Frontend#

The Dynamo Frontend is the API gateway for serving LLM inference requests. It provides OpenAI-compatible HTTP endpoints and KServe gRPC endpoints, handling request preprocessing, routing, and response formatting.

Feature Matrix#

Feature	Status
OpenAI Chat Completions API	✅ Supported
OpenAI Completions API	✅ Supported
KServe gRPC v2 API	✅ Supported
Streaming responses	✅ Supported
Multi-model serving	✅ Supported
Integrated routing	✅ Supported
Tool calling	✅ Supported

Quick Start#

Prerequisites#

Dynamo platform installed
etcd and nats-server -js running
At least one backend worker registered

HTTP Frontend#

python -m dynamo.frontend --http-port 8000

This starts an OpenAI-compatible HTTP server with integrated preprocessing and routing. Backends are auto-discovered when they call register_llm.

KServe gRPC Frontend#

python -m dynamo.frontend --kserve-grpc-server

See the Frontend Guide for KServe-specific configuration and message formats.

Kubernetes#

apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: frontend-example
spec:
  graphs:
    - name: frontend
      replicas: 1
      services:
        - name: Frontend
          image: nvcr.io/nvidia/dynamo/dynamo-vllm:latest
          command:
            - python
            - -m
            - dynamo.frontend
            - --http-port
            - "8000"

Configuration#

Parameter	Default	Description
`--http-port`	8000	HTTP server port
`--kserve-grpc-server`	false	Enable KServe gRPC server
`--router-mode`	`round_robin`	Routing strategy: `round_robin`, `random`, `kv`

See the Frontend Guide for full configuration options.

Next Steps#

Document	Description
Frontend Guide	KServe gRPC configuration and integration
Router Documentation	KV-aware routing configuration