Managing Models with DynamoModel#
Overview#
DynamoModel is a Kubernetes Custom Resource that represents a machine learning model deployed on Dynamo. It enables you to:
Deploy LoRA adapters on top of running base models
Track model endpoints and their readiness across your cluster
Manage model lifecycle declaratively with Kubernetes
DynamoModel works alongside DynamoGraphDeployment (DGD) or DynamoComponentDeployment (DCD) resources. While DGD/DCD deploy the inference infrastructure (pods, services), DynamoModel handles model-specific operations like loading LoRA adapters.
Quick Start#
Prerequisites#
Before creating a DynamoModel, you need:
A running DynamoGraphDeployment or DynamoComponentDeployment
Components configured with modelRef pointing to your base model
Pods are ready and serving your base model
For complete setup including DGD configuration, see Integration with DynamoGraphDeployment.
Deploy a LoRA Adapter#
1. Create your DynamoModel:
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-lora
  namespace: dynamo-system
spec:
  modelName: my-custom-lora
  baseModelName: Qwen/Qwen3-0.6B # Must match modelRef.name in your DGD
  modelType: lora
  source:
    uri: s3://my-bucket/loras/my-lora
2. Apply and verify:
# Apply the DynamoModel
kubectl apply -f my-lora.yaml
# Check status
kubectl get dynamomodel my-lora
Expected output:
NAME      TOTAL   READY   AGE
my-lora   2       2       30s
That’s it! The operator automatically discovers endpoints and loads the LoRA.
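If you script your rollouts, you can block until the operator reports readiness instead of polling; a minimal sketch, assuming the EndpointsReady condition described under Monitoring & Operations:
# Wait until the operator sets the EndpointsReady condition to True
kubectl wait --for=condition=EndpointsReady dynamomodel/my-lora \
  -n dynamo-system --timeout=120s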
For detailed status monitoring, see Monitoring & Operations.
Understanding DynamoModel#
Model Types#
DynamoModel supports three model types:
| Type | Description | Use Case |
|---|---|---|
| base | Reference to an existing base model | Tracking endpoints for a base model (default) |
| lora | LoRA adapter that extends a base model | Deploy fine-tuned adapters on existing models |
| adapter | Generic model adapter | Future extensibility for other adapter types |
Most users will use lora to deploy fine-tuned models on top of their base model deployments.
How It Works#
When you create a DynamoModel, the operator:
Discovers endpoints: Finds all pods running your baseModelName (by matching modelRef.name in DGD/DCD)
Creates service: Automatically creates a Kubernetes Service to track these pods
Loads LoRA: Calls the LoRA load API on each endpoint (for lora type)
Updates status: Reports which endpoints are ready
Key linkage:
# DGD modelRef.name ↔ DynamoModel baseModelName must match
Worker:
  modelRef:
    name: Qwen/Qwen3-0.6B
---
spec:
  baseModelName: Qwen/Qwen3-0.6B
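To double-check the linkage on a live cluster, you can compare the two fields directly; a small sketch using jsonpath (the resource names my-deployment and my-lora are illustrative, the path follows the DGD example later in this guide):
DGD_MODEL=$(kubectl get dynamographdeployment my-deployment \
  -o jsonpath='{.spec.services.Worker.modelRef.name}')
DM_BASE=$(kubectl get dynamomodel my-lora -o jsonpath='{.spec.baseModelName}')
# The two values must be identical for endpoint discovery to work
[ "$DGD_MODEL" = "$DM_BASE" ] && echo "linked: $DM_BASE" || echo "MISMATCH: '$DGD_MODEL' vs '$DM_BASE'"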
Configuration Overview#
DynamoModel requires just a few key fields to deploy a model or adapter:
| Field | Required | Purpose | Example |
|---|---|---|---|
| modelName | Yes | Model identifier | my-custom-lora |
| baseModelName | Yes | Links to DGD modelRef | Qwen/Qwen3-0.6B |
| modelType | No | Type: base/lora/adapter | lora |
| source.uri | For LoRA | Model location | s3://my-bucket/my-lora |
Example minimal LoRA configuration:
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-lora
spec:
  modelName: my-custom-lora
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://my-bucket/my-lora
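Before creating the resource, you can have the API server validate the manifest against the CRD schema; a quick check using kubectl's server-side dry run:
# Validates the manifest without creating anything
kubectl apply -f my-lora.yaml --dry-run=server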
For complete field specifications, validation rules, and all options, see: 📖 DynamoModel API Reference
Status Summary#
The status shows discovered endpoints and their readiness:
kubectl get dynamomodel my-lora
Key status fields:
totalEndpoints / readyEndpoints: Counts of discovered vs ready endpoints
endpoints[]: List with addresses, pod names, and ready status
conditions: Standard Kubernetes conditions (EndpointsReady, ServicesFound)
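A compact readiness summary can be pulled straight from these fields; for example:
# Prints ready/total, e.g. "2/2"
kubectl get dynamomodel my-lora -o jsonpath='{.status.readyEndpoints}/{.status.totalEndpoints}{"\n"}'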
For detailed status usage, see the Monitoring & Operations section below.
Common Use Cases#
Use Case 1: S3-Hosted LoRA Adapter#
Deploy a LoRA adapter stored in an S3 bucket.
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: customer-support-lora
  namespace: production
spec:
  modelName: customer-support-adapter-v1
  baseModelName: meta-llama/Llama-3.3-70B-Instruct
  modelType: lora
  source:
    uri: s3://my-models-bucket/loras/customer-support/v1
Prerequisites:
S3 bucket accessible from your pods (IAM role or credentials)
Base model meta-llama/Llama-3.3-70B-Instruct running via DGD/DCD
Verification:
# Check LoRA is loaded
kubectl get dynamomodel customer-support-lora -o jsonpath='{.status.readyEndpoints}'
# Should output: 2 (or your number of replicas)
# View which pods are serving
kubectl get dynamomodel customer-support-lora -o jsonpath='{.status.endpoints[*].podName}'
Use Case 2: HuggingFace-Hosted LoRA#
Deploy a LoRA adapter from HuggingFace Hub.
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: multilingual-lora
  namespace: dynamo-system
spec:
  modelName: multilingual-adapter
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: hf://myorg/qwen-multilingual-lora@v1.0.0 # Optional: @revision
Prerequisites:
HuggingFace Hub accessible from your pods
If private repo: HF token configured as secret and mounted in pods
Base model Qwen/Qwen3-0.6B running via DGD/DCD
With HuggingFace token:
# In your DGD/DCD
spec:
  services:
    worker:
      envFromSecret: hf-token-secret # Provides HF_TOKEN env var
      modelRef:
        name: Qwen/Qwen3-0.6B
      # ... rest of config
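The secret referenced by envFromSecret must exist before the worker pods start; a minimal sketch of creating it (the key name HF_TOKEN matches the comment above, the token value is a placeholder):
# Create the HF token secret in the deployment's namespace
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=<your-hf-token> \
  -n dynamo-system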
Use Case 3: Multiple LoRAs on Same Base Model#
Deploy multiple LoRA adapters on the same base model deployment.
---
# LoRA for customer support
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: support-lora
spec:
  modelName: support-adapter
  baseModelName: Qwen/Qwen3-0.6B
  modelType: lora
  source:
    uri: s3://models/support-lora
---
# LoRA for code generation
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: code-lora
spec:
  modelName: code-adapter
  baseModelName: Qwen/Qwen3-0.6B # Same base model
  modelType: lora
  source:
    uri: s3://models/code-lora
Both LoRAs will be loaded on all pods serving Qwen/Qwen3-0.6B. Your application can then route requests to the appropriate adapter.
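Routing happens per request via the served model name. As an illustration only, assuming your Frontend exposes an OpenAI-compatible API (the address and port below are placeholders):
# Request the customer-support adapter by its modelName
curl http://<frontend-address>:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "support-adapter", "messages": [{"role": "user", "content": "Hello"}]}'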
Monitoring & Operations#
Checking Status#
Quick status check:
kubectl get dynamomodel
Example output:
NAME            TOTAL   READY   AGE
my-lora         2       2       5m
customer-lora   4       3       2h
Detailed status:
kubectl describe dynamomodel my-lora
Example output:
Name:         my-lora
Namespace:    dynamo-system
Spec:
  Model Name:       my-custom-lora
  Base Model Name:  Qwen/Qwen3-0.6B
  Model Type:       lora
  Source:
    Uri:  s3://my-bucket/my-lora
Status:
  Ready Endpoints:  2
  Total Endpoints:  2
  Endpoints:
    Address:   http://10.0.1.5:9090
    Pod Name:  worker-0
    Ready:     true
    Address:   http://10.0.1.6:9090
    Pod Name:  worker-1
    Ready:     true
  Conditions:
    Type:    EndpointsReady
    Status:  True
    Reason:  EndpointsDiscovered
Events:
  Type    Reason          Message
  ----    ------          -------
  Normal  EndpointsReady  Discovered 2 ready endpoints for base model Qwen/Qwen3-0.6B
Understanding Readiness#
An endpoint is ready when:
The pod is running and healthy
The LoRA load API call succeeded
Condition states:
EndpointsReady=True: All endpoints are ready (full availability)
EndpointsReady=False, Reason=NotReady: Not all endpoints ready (check message for counts)
EndpointsReady=False, Reason=NoEndpoints: No endpoints found
When readyEndpoints < totalEndpoints, the operator automatically retries loading every 30 seconds.
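To inspect the condition itself rather than the counts, a jq one-liner like this works:
# Shows status, reason, and message of the EndpointsReady condition
kubectl get dynamomodel my-lora -o json | jq '.status.conditions[] | select(.type == "EndpointsReady")'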
Viewing Endpoints#
Get endpoint addresses:
kubectl get dynamomodel my-lora -o jsonpath='{.status.endpoints[*].address}' | tr ' ' '\n'
Output:
http://10.0.1.5:9090
http://10.0.1.6:9090
Get endpoint pod names:
kubectl get dynamomodel my-lora -o jsonpath='{.status.endpoints[*].podName}' | tr ' ' '\n'
Check readiness of each endpoint:
kubectl get dynamomodel my-lora -o json | jq '.status.endpoints[] | {podName, ready}'
Output:
{
  "podName": "worker-0",
  "ready": true
}
{
  "podName": "worker-1",
  "ready": true
}
Updating a Model#
To update a LoRA (e.g., deploy a new version):
# Edit the source URI
kubectl edit dynamomodel my-lora
# Or apply an updated YAML
kubectl apply -f my-lora-v2.yaml
The operator will detect the change and reload the LoRA on all endpoints.
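For scripted updates you can also patch just the source URI in place; a sketch (the v2 path is illustrative):
# Merge-patch only the source URI; the operator reconciles the change
kubectl patch dynamomodel my-lora --type merge \
  -p '{"spec":{"source":{"uri":"s3://my-bucket/loras/my-lora-v2"}}}'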
Deleting a Model#
kubectl delete dynamomodel my-lora
For LoRA models, the operator will:
Unload the LoRA from all endpoints
Clean up associated resources
Remove the DynamoModel CR
The base model deployment (DGD/DCD) continues running normally.
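A quick way to confirm both halves of this behavior:
# The CR should be gone...
kubectl get dynamomodel my-lora   # expect "Error from server (NotFound)"
# ...while the base model workers keep running
kubectl get pods -l nvidia.com/dynamo-component-type=worker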
Troubleshooting#
No Endpoints Found#
Symptom:
status:
  totalEndpoints: 0
  readyEndpoints: 0
  conditions:
  - type: EndpointsReady
    status: "False"
    reason: NoEndpoints
    message: "No endpoint slices found for base model Qwen/Qwen3-0.6B"
Common Causes:
1. Base model deployment not running
# Check if pods exist
kubectl get pods -l nvidia.com/dynamo-component-type=worker
Solution: Deploy your DGD/DCD first and wait for pods to be ready.
2. baseModelName mismatch
# Check modelRef in your DGD
kubectl get dynamographdeployment my-deployment -o yaml | grep -A2 modelRef
Solution: Ensure baseModelName in the DynamoModel exactly matches modelRef.name in the DGD.
3. Pods not ready
# Check pod status
kubectl get pods -l nvidia.com/dynamo-component-type=worker
Solution: Wait for pods to reach the Running and Ready state.
4. Wrong namespace
Solution: Ensure the DynamoModel is in the same namespace as your DGD/DCD (see the check below).
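To rule out the namespace mismatch from cause 4, compare the namespaces directly (the resource names are illustrative):
# Both commands should print the same namespace
kubectl get dynamomodel my-lora -o jsonpath='{.metadata.namespace}{"\n"}'
kubectl get dynamographdeployment my-deployment -o jsonpath='{.metadata.namespace}{"\n"}'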
LoRA Load Failures#
Symptom:
status:
  totalEndpoints: 2
  readyEndpoints: 0 # ← No endpoints ready despite pods existing
  conditions:
  - type: EndpointsReady
    status: "False"
    reason: NoReadyEndpoints
Common Causes:
1. Source URI not accessible
# Check operator logs
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager -f | grep "Failed to load"
Solution:
For S3: Verify bucket permissions, IAM role, credentials
For HuggingFace: Verify the token is valid and the repo exists and is accessible
2. Invalid LoRA format
Solution: Ensure your LoRA weights are in the format expected by your backend framework (vLLM, SGLang, etc.)
3. Endpoint API errors
# Check operator logs for HTTP errors
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager | grep "error"
Solution: Check the backend framework's logs in the worker pods:
kubectl logs worker-0
4. Out of memory
Solution: LoRA adapters require additional memory. Increase memory limits in your DGD (see the memory spot-check below):
resources:
  limits:
    memory: "32Gi" # Increase if needed
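If you suspect memory pressure, current usage is easy to spot-check (requires metrics-server in the cluster; the pod name is illustrative):
# Compare against the memory limit in your DGD
kubectl top pod worker-0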
Status Shows Not Ready#
Symptom: Some endpoints remain not ready for extended periods.
Diagnosis:
# Check which endpoints are not ready
kubectl get dynamomodel my-lora -o json | jq '.status.endpoints[] | select(.ready == false)'
# View operator logs for that specific pod
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager | grep "worker-0"
# Check the worker pod logs
kubectl logs worker-0 | tail -50
Common Causes:
Network issues: Pod can’t reach S3/HuggingFace
Resource constraints: Pod is OOMing or being throttled
API endpoint not responding: Backend framework isn’t serving the LoRA API
When to wait vs investigate:
Wait: If readyEndpoints is increasing over time (LoRAs loading progressively)
Investigate: If stuck at same readyEndpoints for >5 minutes
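Watching the resource makes the trend obvious; the READY column should climb as adapters load:
# Streams updates to the DynamoModel as its status changes
kubectl get dynamomodel my-lora -w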
Viewing Events and Logs#
Check events:
kubectl describe dynamomodel my-lora | tail -20
View operator logs:
# Follow logs
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager -f
# Filter for specific model
kubectl logs -n dynamo-system deployment/dynamo-operator-controller-manager | grep "my-lora"
Common events and messages:
| Event/Message | Meaning | Action |
|---|---|---|
| EndpointsReady | All endpoints are ready | ✅ Good - full service availability |
| NotReady | Not all endpoints ready | ⚠️ Check readyEndpoints count - operator will retry |
| NoReadyEndpoints | Some endpoints failed to load | Check logs for errors |
| NoEndpoints | No pods discovered | Verify DGD running and modelRef matches |
|  | Can't query endpoints | Check operator RBAC permissions |
|  | Reconciliation complete | ✅ Good |
Integration with DynamoGraphDeployment#
This section shows the complete end-to-end workflow for deploying base models and LoRA adapters together.
DynamoModel and DynamoGraphDeployment work together to provide complete model deployment:
DGD: Deploys the infrastructure (pods, services, resources)
DynamoModel: Manages model-specific operations (LoRA loading)
Linking Models to Components#
The connection is established through the modelRef field in your DGD:
Complete example:
---
# 1. Deploy the base model infrastructure
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-deployment
spec:
  backendFramework: vllm
  services:
    Frontend:
      componentType: frontend
      replicas: 1
      dynamoNamespace: my-app
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:latest
    Worker:
      # This modelRef creates the link to DynamoModel
      modelRef:
        name: Qwen/Qwen3-0.6B # ← Key linking field
      componentType: worker
      replicas: 2
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:latest
          args:
            - --model
            - Qwen/Qwen3-0.6B
            - --tensor-parallel-size
            - "1"
---
# 2. Deploy LoRA adapters on top
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: my-lora
spec:
  modelName: my-custom-lora
  baseModelName: Qwen/Qwen3-0.6B # ← Must match modelRef.name above
  modelType: lora
  source:
    uri: s3://my-bucket/loras/my-lora
Deployment Workflow#
Recommended order:
# 1. Deploy base model infrastructure
kubectl apply -f my-deployment.yaml
# 2. Wait for pods to be ready
kubectl wait --for=condition=ready pod -l nvidia.com/dynamo-component-type=worker --timeout=5m
# 3. Deploy LoRA adapters
kubectl apply -f my-lora.yaml
# 4. Verify LoRA is loaded
kubectl get dynamomodel my-lora
What happens behind the scenes:
| Step | DGD | DynamoModel |
|---|---|---|
| 1 | Creates pods with modelRef | - |
| 2 | Pods become running and ready | - |
| 3 | - | CR created, discovers endpoints via auto-created Service |
| 4 | - | Calls LoRA load API on each endpoint |
| 5 | - | All endpoints ready ✓ |
The operator automatically handles all service discovery - you don’t configure services, labels, or selectors manually.
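If you're curious what was created on your behalf, you can list the Services in the namespace (the exact name of the auto-created Service is operator-defined):
kubectl get svc -n dynamo-system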
API Reference#
For complete field specifications, validation rules, and detailed type definitions, see the DynamoModel API Reference.
Summary#
DynamoModel provides declarative model management for Dynamo deployments:
✅ Simple: 2-step deployment of LoRA adapters
✅ Automatic: Endpoint discovery and loading handled by the operator
✅ Observable: Rich status reporting and conditions
✅ Integrated: Works seamlessly with DynamoGraphDeployment
Next Steps:
Try the Quick Start example
Explore Common Use Cases
Check the API Reference for advanced configuration