LoRA Adapters
Serve fine-tuned LoRA adapters with dynamic loading and routing in Dynamo
Serve fine-tuned LoRA adapters with dynamic loading and routing in Dynamo
LoRA (Low-Rank Adaptation) enables efficient fine-tuning and serving of specialized model variants without duplicating full model weights. Dynamo provides built-in support for dynamic LoRA adapter loading, caching, and inference routing.
See the Feature Matrix for full compatibility details.
Dynamo’s LoRA implementation provides:
file://), S3-compatible storage (s3://), or Hugging Face Hub (hf://)/v1/modelsDynamoModel CRDThe LoRA system consists of:
lib/llm/src/lora/): High-performance downloading, caching, and validationcomponents/src/dynamo/common/lora/): Extensible wrapper with custom source supportcomponents/src/dynamo/vllm/handlers.py): Load/unload API and inference integration1. Start Dynamo with LoRA support:
2. Load a LoRA adapter:
3. Run inference with the LoRA:
For production deployments, store LoRA adapters in S3-compatible storage:
Load a LoRA adapter from a source URI.
Request:
Response:
List all loaded LoRA adapters.
Response:
Unload a LoRA adapter from the worker.
Response:
For Kubernetes deployments, use the DynamoModel Custom Resource to declaratively manage LoRA adapters.
When you create a DynamoModel:
baseModelNameFor complete Kubernetes deployment details, see:
Check S3 connectivity:
Check cache directory:
Check worker logs:
curl http://localhost:8081/v1/lorasmodel field in your request matches the lora_nameWhen KV-aware routing is enabled, the router automatically accounts for LoRA adapter identity when computing block hashes. This means:
A will never be confused with blocks cached under adapter B or the base model, even if the token sequences are identical. The adapter name is mixed into the LocalBlockHash computation.BlockStored) from the engine to the router. The router uses the lora_name field on events to route LoRA requests to workers that have matching cached blocks.This works end-to-end across the publisher pipeline, the KV consolidator (for deduplication), and the routing query path.