Frontend#
The Dynamo Frontend is the API gateway for serving LLM inference requests. It provides OpenAI-compatible HTTP endpoints and KServe gRPC endpoints, handling request preprocessing, routing, and response formatting.
Feature Matrix#
| Feature | Status |
|---|---|
| OpenAI Chat Completions API | ✅ Supported |
| OpenAI Completions API | ✅ Supported |
| KServe gRPC v2 API | ✅ Supported |
| Streaming responses | ✅ Supported |
| Multi-model serving | ✅ Supported |
| Integrated routing | ✅ Supported |
| Tool calling | ✅ Supported |
Quick Start#
Prerequisites#
- Dynamo platform installed
- `etcd` and `nats-server -js` running
- At least one backend worker registered
HTTP Frontend#
```shell
python -m dynamo.frontend --http-port 8000
```

This starts an OpenAI-compatible HTTP server with integrated preprocessing and routing. Backends are auto-discovered when they call `register_llm`.
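Because the frontend speaks the OpenAI API, any OpenAI-style client can talk to it. A minimal stdlib-only sketch of building a chat completion request (the model name here is a placeholder; use whatever your worker registered via `register_llm`):

```python
import json
from urllib import request

# Placeholder model name -- substitute the model your backend registered.
payload = {
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}

req = request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the frontend and at least one worker are running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Setting `"stream": True` instead returns server-sent events, which is what the streaming-responses row in the feature matrix refers to.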
KServe gRPC Frontend#
```shell
python -m dynamo.frontend --kserve-grpc-server
```
See the Frontend Guide for KServe-specific configuration and message formats.
Kubernetes#
```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: frontend-example
spec:
  graphs:
    - name: frontend
      replicas: 1
  services:
    - name: Frontend
      image: nvcr.io/nvidia/dynamo/dynamo-vllm:latest
      command:
        - python
        - -m
        - dynamo.frontend
        - --http-port
        - "8000"
```
Configuration#
| Parameter | Default | Description |
|---|---|---|
| `--http-port` | 8000 | HTTP server port |
| `--kserve-grpc-server` | false | Enable KServe gRPC server |
| `--router-mode` |  | Routing strategy |
See the Frontend Guide for full configuration options.
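Before adjusting routing, you can check which models the frontend has discovered. This sketch assumes the standard OpenAI-compatible `/v1/models` endpoint on the default port:

```python
import json
from urllib import request

# Assumes the frontend is on its default HTTP port (8000).
url = "http://localhost:8000/v1/models"

# Uncomment once the frontend and at least one worker are running:
# with request.urlopen(url) as resp:
#     for model in json.load(resp)["data"]:
#         print(model["id"])
```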
Next Steps#
| Document | Description |
|---|---|
| Frontend Guide | KServe gRPC configuration and integration |
| KV Cache Routing | KV-aware routing configuration |