Component Layers
Each layer contains components that can be adopted independently or combined with others. Components within a layer often complement each other but do not require the full layer to function.
Infrastructure Layer
Dynamic Resource Allocation (DRA) enables flexible GPU sharing in Kubernetes clusters. DRA ComputeDomains provide a construct for managing distributed shared memory across nodes via NVIDIA IMEX. Combined with topology-aware schedulers, DRA enables predictable multi-node disaggregated model inference at production (rack) scale.
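A ComputeDomain is requested declaratively, like other Kubernetes resources. The sketch below is illustrative only: the API group, version, and field names are assumptions and should be checked against the NVIDIA DRA driver's actual CRDs.

```yaml
# Illustrative sketch -- not verified against the NVIDIA DRA driver's CRDs.
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: imex-domain-example        # hypothetical name
spec:
  numNodes: 4                      # nodes that share the IMEX domain
  channel:
    resourceClaimTemplate:
      name: imex-channel-example   # claimed by pods that join the domain
```

Pods that reference the generated resource claim are scheduled into the same IMEX domain, which is what makes cross-node shared memory usable by the workload.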
Optimization Layer
Components that prepare and optimize models for maximum inference performance. Use any combination based on your model types and requirements.
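One common model-preparation transform this layer performs is post-training quantization. The sketch below shows the generic idea (symmetric int8 with a per-tensor scale); it is a conceptual illustration, not the implementation of any specific NVIDIA component.

```python
# Generic symmetric int8 post-training quantization sketch.
def quantize_int8(weights):
    """Map float weights to int8 values plus a per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

q, s = quantize_int8([0.05, -1.27, 0.64])
print(q, s)
```

Real optimizers apply the same idea per channel or per block and calibrate scales on sample data, trading a small accuracy loss for lower memory traffic and higher throughput.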
Deployment Layer
Cloud-native orchestration and infrastructure management. Start with GPU Operator for basic GPU management, then add components as scaling requirements grow. When deploying Dynamo with NVIDIA’s Helm charts, you can optionally install Grove and KAI Scheduler in the same chart.
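When Grove and KAI Scheduler ship in the same chart, enabling them is typically a values override. The fragment below is a hypothetical sketch: the key names are assumptions and may differ from the actual Dynamo Helm chart's values file.

```yaml
# Hypothetical values.yaml fragment -- verify key names against the chart.
grove:
  enabled: true          # gang-aware orchestration of multi-pod replicas
kai-scheduler:
  enabled: true          # topology-aware GPU scheduling
```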
Inference Serving Layer
The runtime engines that handle inference requests. Choose Triton for traditional ML workloads, Dynamo for GenAI, or both for mixed environments.
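Dynamo's frontend speaks the OpenAI-compatible chat-completions protocol, so any standard HTTP client works. The endpoint URL and model name below are assumptions for a local deployment; only the request-body shape is the protocol's.

```python
import json

# Hypothetical local endpoint; adjust for your deployment.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }

body = build_request("example-llm", "What is disaggregated serving?")
print(json.dumps(body))
# POST this body to ENDPOINT with Content-Type: application/json.
```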
Memory and Caching Layer
High-performance memory management and data transfer components. When needed, these unlock advanced capabilities such as disaggregated serving and fast model loading.
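A core idea behind KV-cache reuse in this layer is prefix-based block hashing: identical prompt prefixes map to identical cache keys, so their attention state can be shared instead of recomputed. The block size and hashing scheme below are illustrative, not any component's actual implementation.

```python
import hashlib

BLOCK = 4  # tokens per cache block (illustrative)

def block_hashes(token_ids):
    """Hash each full block together with its prefix, so two prompts
    that share a prefix produce identical keys for the shared blocks."""
    keys, running = [], hashlib.sha256()
    for i in range(0, len(token_ids) - len(token_ids) % BLOCK, BLOCK):
        running.update(str(token_ids[i:i + BLOCK]).encode())
        keys.append(running.copy().hexdigest())
    return keys

a = block_hashes([1, 2, 3, 4, 5, 6, 7, 8])
b = block_hashes([1, 2, 3, 4, 9, 9, 9, 9])
print(a[0] == b[0], a[1] == b[1])  # shared first block, divergent second
```

Because each key also covers the prefix, a cache hit on block N guarantees every earlier block matched too, which is what makes lookup safe.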
Performance Tooling
Tools for benchmarking and configuration. Use these independently or together to optimize your deployment.
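Whatever benchmarking tool you use, the headline numbers reduce to the same arithmetic: aggregate token throughput and tail latency. The dependency-free sketch below shows that reduction; it is not tied to any specific tool.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile (simple, dependency-free)."""
    values = sorted(values)
    k = max(0, min(len(values) - 1, math.ceil(pct / 100 * len(values)) - 1))
    return values[k]

def summarize(latencies_s, tokens_per_request):
    """Reduce per-request measurements to throughput and tail latency.
    Assumes requests ran sequentially; for concurrent runs, divide
    total tokens by wall-clock time instead."""
    return {
        "tokens_per_s": sum(tokens_per_request) / sum(latencies_s),
        "p99_latency_s": percentile(latencies_s, 99),
    }

print(summarize([0.4, 0.5, 0.6, 0.5], [100, 120, 110, 100]))
```

Comparing these two numbers across configurations (batch size, parallelism, cache settings) is the basic loop for tuning a deployment.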