Frontend#

The Dynamo Frontend is the API gateway for serving LLM inference requests. It provides OpenAI-compatible HTTP endpoints and KServe gRPC endpoints, handling request preprocessing, routing, and response formatting.

Feature Matrix#

Feature

Status

OpenAI Chat Completions API

✅ Supported

OpenAI Completions API

✅ Supported

KServe gRPC v2 API

✅ Supported

Streaming responses

✅ Supported

Multi-model serving

✅ Supported

Integrated routing

✅ Supported

Tool calling

✅ Supported

Quick Start#

Prerequisites#

  • Dynamo platform installed

  • etcd and nats-server -js running

  • At least one backend worker registered

HTTP Frontend#

python -m dynamo.frontend --http-port 8000

This starts an OpenAI-compatible HTTP server with integrated preprocessing and routing. Backends are auto-discovered when they call register_llm.

KServe gRPC Frontend#

python -m dynamo.frontend --kserve-grpc-server

See the Frontend Guide for KServe-specific configuration and message formats.

Kubernetes#

apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: frontend-example
spec:
  graphs:
    - name: frontend
      replicas: 1
      services:
        - name: Frontend
          image: nvcr.io/nvidia/dynamo/dynamo-vllm:latest
          command:
            - python
            - -m
            - dynamo.frontend
            - --http-port
            - "8000"

Configuration#

Parameter

Default

Description

--http-port

8000

HTTP server port

--kserve-grpc-server

false

Enable KServe gRPC server

--router-mode

round_robin

Routing strategy: round_robin, random, kv

See the Frontend Guide for full configuration options.

Next Steps#

Document

Description

Frontend Guide

KServe gRPC configuration and integration

Router Documentation

KV-aware routing configuration