⚠️ Important: This documentation is automatically generated from source code. Do not edit this file directly.
API Reference#
Packages#
nvidia.com/v1alpha1#
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
Resource Types#
Autoscaling#
Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.
Appears in:
All fields of this type are deprecated and ignored.
ComponentKind#
Underlying type: string
ComponentKind represents the type of underlying Kubernetes resource.
Validation:
Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]
Appears in:
| Value | Description |
|---|---|
| `PodClique` | ComponentKindPodClique represents a PodClique resource. |
| `PodCliqueScalingGroup` | ComponentKindPodCliqueScalingGroup represents a PodCliqueScalingGroup resource. |
| `Deployment` | ComponentKindDeployment represents a Deployment resource. |
| `LeaderWorkerSet` | ComponentKindLeaderWorkerSet represents a LeaderWorkerSet resource. |
ConfigMapKeySelector#
ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` | Name of the ConfigMap containing the desired data. | | Required: {} |
| `key` | Key in the ConfigMap to select. If not specified, defaults to "disagg.yaml". | disagg.yaml | |
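A minimal sketch of how this selector might be referenced from a ProfilingConfigSpec; the ConfigMap name is illustrative:

```yaml
profilingConfig:
  configMapRef:
    name: my-dgd-config   # illustrative ConfigMap holding the deployment configuration
    key: disagg.yaml      # optional; defaults to "disagg.yaml"
```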
DeploymentOverridesSpec#
DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` | Name is the desired name for the created DynamoGraphDeployment. | | Optional: {} |
| `namespace` | Namespace is the desired namespace for the created DynamoGraphDeployment. | | Optional: {} |
| `labels` | Labels are additional labels to add to the DynamoGraphDeployment metadata. | | Optional: {} |
| `annotations` | Annotations are additional annotations to add to the DynamoGraphDeployment metadata. | | Optional: {} |
| `workersImage` | WorkersImage specifies the container image to use for DynamoGraphDeployment worker components. | | Optional: {} |
DeploymentStatus#
DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` | Name is the name of the created DynamoGraphDeployment. | | |
| `namespace` | Namespace is the namespace of the created DynamoGraphDeployment. | | |
| `state` | State is the current state of the DynamoGraphDeployment. | | |
| `created` | Created indicates whether the DGD has been successfully created. | | |
DynamoComponentDeployment#
DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `nvidia.com/v1alpha1` | | |
| `kind` | `DynamoComponentDeployment` | | |
| `metadata` | Refer to Kubernetes API documentation for the fields of `metadata`. | | |
| `spec` | Spec defines the desired state for this Dynamo component deployment. | | |
DynamoComponentDeploymentSpec#
DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `backendFramework` | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). | | Enum: [sglang vllm trtllm] |
| `annotations` | Annotations to add to generated Kubernetes resources for this component. | | |
| `labels` | Labels to add to generated Kubernetes resources for this component. | | |
| | The name of the component. | | |
| `componentType` | ComponentType indicates the role of this component (for example, "main"). | | |
| `subComponentType` | SubComponentType indicates the sub-role of this component (for example, "prefill"). | | |
| `dynamoNamespace` | DynamoNamespace is deprecated and will be removed in a future version. | | Optional: {} |
| `globalDynamoNamespace` | GlobalDynamoNamespace indicates that the component will be placed in the global Dynamo namespace. | | |
| `resources` | Resource requests and limits for this component, including CPU, memory, and GPUs/devices. | | |
| `autoscaling` | Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. | | |
| `envs` | Envs defines additional environment variables to inject into the component containers. | | |
| `envFromSecret` | EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables. | | |
| `volumeMounts` | VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component. | | |
| `ingress` | Ingress config to expose the component outside the cluster (or through a service mesh). | | |
| `modelRef` | ModelRef references a model that this component serves. | | |
| `sharedMemory` | SharedMemory controls the tmpfs mounted at `/dev/shm` (enable/disable and size). | | |
| `extraPodMetadata` | ExtraPodMetadata adds labels/annotations to the created Pods. | | |
| `extraPodSpec` | ExtraPodSpec allows overriding the main pod spec configuration. | | |
| `livenessProbe` | LivenessProbe to detect and restart unhealthy containers. | | |
| `readinessProbe` | ReadinessProbe to signal when the container is ready to receive traffic. | | |
| `replicas` | Replicas is the desired number of Pods for this component. | | Minimum: 0 |
| `multinode` | Multinode is the configuration for multinode components. | | |
| `scalingAdapter` | ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter. | | |
DynamoGraphDeployment#
DynamoGraphDeployment is the Schema for the dynamographdeployments API.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `nvidia.com/v1alpha1` | | |
| `kind` | `DynamoGraphDeployment` | | |
| `metadata` | Refer to Kubernetes API documentation for the fields of `metadata`. | | |
| `spec` | Spec defines the desired state for this graph deployment. | | |
| `status` | Status reflects the current observed state of this graph deployment. | | |
DynamoGraphDeploymentRequest#
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It serves as the primary interface for users to request model deployments with specific performance and resource constraints, enabling SLA-driven deployments.
Lifecycle:
- Initial → Pending: Validates spec and prepares for profiling
- Pending → Profiling: Creates and runs profiling job (online or AIC)
- Profiling → Ready/Deploying: Generates DGD spec after profiling completes
- Deploying → Ready: When autoApply=true, monitors DGD until Ready
- Ready: Terminal state when DGD is operational or spec is available
- DeploymentDeleted: Terminal state when auto-created DGD is manually deleted
The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `nvidia.com/v1alpha1` | | |
| `kind` | `DynamoGraphDeploymentRequest` | | |
| `metadata` | Refer to Kubernetes API documentation for the fields of `metadata`. | | |
| `spec` | Spec defines the desired state for this deployment request. | | |
| `status` | Status reflects the current observed state of this deployment request. | | |
DynamoGraphDeploymentRequestSpec#
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. This CRD serves as the primary interface for users to request model deployments with specific performance constraints and resource requirements, enabling SLA-driven deployments.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `model` | Model specifies the model to deploy (e.g., "Qwen/Qwen3-0.6B", "meta-llama/Llama-3-70b"). | | Required: {} |
| `backend` | Backend specifies the inference backend to use. | | Enum: [vllm sglang trtllm] |
| `enableGpuDiscovery` | EnableGpuDiscovery controls whether the profiler should automatically discover GPU resources. | false | Optional: {} |
| `profilingConfig` | ProfilingConfig provides the complete configuration for the profiling job. | | Required: {} |
| `autoApply` | AutoApply indicates whether to automatically create a DynamoGraphDeployment from the generated spec. | false | |
| `deploymentOverrides` | DeploymentOverrides allows customizing metadata for the auto-created DGD. | | Optional: {} |
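A sketch of a complete DynamoGraphDeploymentRequest built from the fields above; all names, the profiler image, and labels are illustrative placeholders, not defaults:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeploymentRequest
metadata:
  name: qwen-dgdr                 # illustrative name
spec:
  model: Qwen/Qwen3-0.6B
  backend: vllm
  autoApply: true                 # create the DGD automatically after profiling
  profilingConfig:
    profilerImage: nvcr.io/example/profiler:latest   # illustrative image
    config: {}                    # arbitrary JSON/YAML passed directly to the profiler
  deploymentOverrides:
    name: qwen-dgd                # name for the auto-created DGD
    labels:
      team: inference             # illustrative label
```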
DynamoGraphDeploymentRequestStatus#
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `state` | State is a high-level textual status of the deployment request lifecycle. | | |
| `backend` | Backend is extracted from profilingConfig.config.engine.backend for display purposes. | | Optional: {} |
| `observedGeneration` | ObservedGeneration reflects the generation of the most recently observed spec. | | |
| `conditions` | Conditions contains the latest observed conditions of the deployment request. | | |
| `profilingResults` | ProfilingResults contains a reference to the ConfigMap holding profiling data. | | Optional: {} |
| `generatedDeployment` | GeneratedDeployment contains the full generated DynamoGraphDeployment specification. | | EmbeddedResource: {} |
| `deployment` | Deployment tracks the auto-created DGD when AutoApply is true. | | Optional: {} |
DynamoGraphDeploymentScalingAdapter#
DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.
The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD’s service replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `nvidia.com/v1alpha1` | | |
| `kind` | `DynamoGraphDeploymentScalingAdapter` | | |
| `metadata` | Refer to Kubernetes API documentation for the fields of `metadata`. | | |
DynamoGraphDeploymentScalingAdapterSpec#
DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `replicas` | Replicas is the desired number of replicas for the target service. | | Minimum: 0 |
| `dgdRef` | DGDRef references the DynamoGraphDeployment and the specific service to scale. | | Required: {} |
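A sketch of an adapter targeting one service of a DGD; the adapter, deployment, and service names are illustrative:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeploymentScalingAdapter
metadata:
  name: my-graph-worker-adapter   # illustrative name
spec:
  replicas: 2
  dgdRef:
    name: my-graph                # DynamoGraphDeployment name
    serviceName: MyWorker         # key in the DGD's spec.services map
```

Because the adapter implements the scale subresource, an autoscaler (or a human running something like `kubectl scale --replicas=4 dynamographdeploymentscalingadapters/my-graph-worker-adapter`) adjusts the service without writing to the DGD directly.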
DynamoGraphDeploymentScalingAdapterStatus#
DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `replicas` | Replicas is the current number of replicas for the target service. | | |
| `selector` | Selector is a label selector string for the pods managed by this adapter. | | |
| `lastScaleTime` | LastScaleTime is the last time the adapter scaled the target service. | | |
DynamoGraphDeploymentServiceRef#
DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` | Name of the DynamoGraphDeployment. | | MinLength: 1 |
| `serviceName` | ServiceName is the key name of the service within the DGD's `spec.services` map to scale. | | MinLength: 1 |
DynamoGraphDeploymentSpec#
DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `pvcs` | PVCs defines a list of persistent volume claims that can be referenced by components. | | MaxItems: 100 |
| `services` | Services are the services to deploy as part of this deployment. | | MaxProperties: 25 |
| `envs` | Envs are environment variables applied to all services in the deployment unless overridden at the service level. | | Optional: {} |
| `backendFramework` | BackendFramework specifies the backend framework (e.g., "sglang", "vllm", "trtllm"). | | Enum: [sglang vllm trtllm] |
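A sketch of a DynamoGraphDeployment using these fields; names and values are illustrative, and the example assumes standard name/value entries for `envs`:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: my-graph                # illustrative name
spec:
  backendFramework: vllm
  envs:
    - name: LOG_LEVEL           # illustrative variable applied to all services
      value: info
  services:
    Frontend:
      componentType: main       # illustrative role
      replicas: 1
    MyWorker:
      replicas: 2
```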
DynamoGraphDeploymentStatus#
DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `state` | State is a high-level textual status of the graph deployment lifecycle. | | |
| `conditions` | Conditions contains the latest observed conditions of the graph deployment. | | |
| `services` | Services contains per-service replica status information. | | |
DynamoModel#
DynamoModel is the Schema for the dynamomodels API
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `nvidia.com/v1alpha1` | | |
| `kind` | `DynamoModel` | | |
| `metadata` | Refer to Kubernetes API documentation for the fields of `metadata`. | | |
| `spec` | | | |
| `status` | | | |
DynamoModelSpec#
DynamoModelSpec defines the desired state of DynamoModel
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `modelName` | ModelName is the full model identifier (e.g., "meta-llama/Llama-3.3-70B-Instruct-lora"). | | Required: {} |
| `baseModelName` | BaseModelName is the base model identifier that matches the service label. | | Required: {} |
| `modelType` | ModelType specifies the type of model (e.g., "base", "lora", "adapter"). | base | Enum: [base lora adapter] |
| `source` | Source specifies the model source location (only applicable for the lora model type). | | |
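A sketch of a DynamoModel for a LoRA adapter; the metadata name, base model name, and source URI are illustrative:

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoModel
metadata:
  name: llama-lora                # illustrative name
spec:
  modelName: meta-llama/Llama-3.3-70B-Instruct-lora
  baseModelName: llama-3-70b-instruct-v1   # must match the service label
  modelType: lora
  source:
    uri: s3://my-bucket/loras/llama-3.3-70b   # illustrative URI
```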
DynamoModelStatus#
DynamoModelStatus defines the observed state of DynamoModel
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `endpoints` | Endpoints is the current list of all endpoints for this model. | | |
| `readyEndpoints` | ReadyEndpoints is the count of endpoints that are ready. | | |
| `totalEndpoints` | TotalEndpoints is the total count of endpoints. | | |
| `conditions` | Conditions represents the latest available observations of the model's state. | | |
EndpointInfo#
EndpointInfo represents a single endpoint (pod) serving the model
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `address` | Address is the full address of the endpoint (e.g., "http://10.0.1.5:9090"). | | |
| `podName` | PodName is the name of the pod serving this endpoint. | | |
| `ready` | Ready indicates whether the endpoint is ready to serve traffic. | | |
ExtraPodMetadata#
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `annotations` | Annotations to add to the created Pods. | | |
| `labels` | Labels to add to the created Pods. | | |
ExtraPodSpec#
Appears in:
ExtraPodSpec allows overriding the generated pod spec; in particular, the `mainContainer` field can be used to override the main container configuration, including probes (see Notes below).
IngressSpec#
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `enabled` | Enabled exposes the component through an ingress or virtual service when true. | | |
| `host` | Host is the base host name to route external traffic to this component. | | |
| `useVirtualService` | UseVirtualService indicates whether to configure a service-mesh VirtualService instead of a standard Ingress. | | |
| `virtualServiceGateway` | VirtualServiceGateway optionally specifies the gateway name to attach the VirtualService to. | | |
| `hostPrefix` | HostPrefix is an optional prefix added before the host. | | |
| `annotations` | Annotations to set on the generated Ingress/VirtualService resources. | | |
| `labels` | Labels to set on the generated Ingress/VirtualService resources. | | |
| `tls` | TLS holds the TLS configuration used by the Ingress/VirtualService. | | |
| `hostSuffix` | HostSuffix is an optional suffix appended after the host. | | |
| `ingressControllerClassName` | IngressControllerClassName selects the ingress controller class (e.g., "nginx"). | | |
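A sketch of an ingress block on a component; the host, prefix, and Secret name are illustrative:

```yaml
services:
  Frontend:
    ingress:
      enabled: true
      host: example.com               # illustrative host
      hostPrefix: my-model            # optional prefix added before the host
      ingressControllerClassName: nginx
      tls:
        secretName: my-tls-secret     # illustrative Secret with cert and key
```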
IngressTLSSpec#
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `secretName` | SecretName is the name of a Kubernetes Secret containing the TLS certificate and key. | | |
ModelReference#
ModelReference identifies a model served by this component
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` | Name is the base model identifier (e.g., "llama-3-70b-instruct-v1"). | | Required: {} |
| `revision` | Revision is the model revision/version (optional). | | |
ModelSource#
ModelSource defines the source location of a model
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `uri` | URI is the model source URI. | | Required: {} |
MultinodeSpec#
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| | Indicates the number of nodes to deploy for multinode components. | 2 | Minimum: 2 |
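A sketch of a multinode service. The field name `nodeCount` is an assumption (the generated table above lost the field name), so verify it against the CRD schema before use:

```yaml
services:
  MyWorker:
    multinode:
      nodeCount: 2   # assumed field name; number of nodes, minimum 2
```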
PVC#
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `create` | Create indicates to create a new PVC. | | |
| `name` | Name is the name of the PVC. | | Required: {} |
| `storageClass` | StorageClass to be used for PVC creation. Required when create is true. | | |
| `size` | Size of the volume in Gi, used during PVC creation. Required when create is true. | | |
| `volumeAccessMode` | VolumeAccessMode is the volume access mode of the PVC. Required when create is true. | | |
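A sketch of a top-level `pvcs` entry using the fields above; the storage class, size, and access mode are illustrative:

```yaml
pvcs:
  - name: model-cache
    create: true                     # create a new PVC rather than reuse one
    storageClass: standard           # illustrative storage class
    size: 100Gi                      # required when create is true
    volumeAccessMode: ReadWriteMany  # required when create is true
```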
ProfilingConfigSpec#
ProfilingConfigSpec defines configuration for the profiling process. This structure maps directly to the profile_sla.py config format. See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `config` | Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler. | | Optional: {} |
| `configMapRef` | ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment configuration. | | Optional: {} |
| `profilerImage` | ProfilerImage specifies the container image to use for profiling jobs. | | Required: {} |
| `outputPVC` | OutputPVC is an optional PersistentVolumeClaim name for storing profiling output. | | Optional: {} |
| `resources` | Resources specifies the compute resource requirements for the profiling job container. | | Optional: {} |
| `tolerations` | Tolerations allows the profiling job to be scheduled on nodes with matching taints. | | Optional: {} |
ResourceItem#
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `cpu` | CPU specifies the CPU resource request/limit (e.g., "1000m", "2"). | | |
| `memory` | Memory specifies the memory resource request/limit (e.g., "4Gi", "8Gi"). | | |
| `gpu` | GPU indicates the number of GPUs to request. | | |
| `gpuType` | GPUType can specify a custom GPU type, e.g. "gpu.intel.com/xe". | | |
| `custom` | Custom specifies additional custom resource requests/limits. | | |
Resources#
Resources defines requested and limits for a component, including CPU, memory, GPUs/devices, and any runtime-specific resources.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `requests` | Requests specifies the minimum resources required by the component. | | |
| `limits` | Limits specifies the maximum resources allowed for the component. | | |
| `claims` | Claims specifies resource claims for dynamic resource allocation. | | |
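A sketch of a component `resources` block built from the ResourceItem fields above; the values are illustrative, and the string form of `gpu` is an assumption:

```yaml
resources:
  requests:
    cpu: "1000m"
    memory: "4Gi"
    gpu: "1"       # number of GPUs; string form assumed
  limits:
    cpu: "2"
    memory: "8Gi"
    gpu: "1"
```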
ScalingAdapter#
ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter for replica management. When enabled (default), the DGDSA owns the replicas field and external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `disable` | Disable indicates whether the ScalingAdapter should be disabled for this service. | false | |
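A sketch of opting a single service out of adapter-managed scaling; the service name is illustrative:

```yaml
services:
  MyWorker:
    scalingAdapter:
      disable: true   # this service's replicas are no longer managed by a DGDSA
```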
ServiceReplicaStatus#
ServiceReplicaStatus contains replica information for a single service.
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `componentKind` | ComponentKind is the underlying resource kind (e.g., "PodClique", "PodCliqueScalingGroup", "Deployment", "LeaderWorkerSet"). | | Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet] |
| `componentName` | ComponentName is the name of the underlying resource. | | |
| `replicas` | Replicas is the total number of non-terminated replicas. | | Minimum: 0 |
| `updatedReplicas` | UpdatedReplicas is the number of replicas at the current/desired revision. | | Minimum: 0 |
| `readyReplicas` | ReadyReplicas is the number of ready replicas. | | Minimum: 0 |
| `availableReplicas` | AvailableReplicas is the number of available replicas. | | Minimum: 0 |
VolumeMount#
VolumeMount references a PVC defined at the top level for volumes to be mounted by the component
Appears in:
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` | Name references a PVC name defined in the top-level PVCs map. | | Required: {} |
| `mountPoint` | MountPoint specifies where to mount the volume. | | |
| `useAsCompilationCache` | UseAsCompilationCache indicates this volume should be used as a compilation cache. | false | |
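A sketch of mounting a top-level PVC into a component; the service name and mount point are illustrative:

```yaml
services:
  MyWorker:
    volumeMounts:
      - name: model-cache            # references a PVC from the top-level pvcs list
        mountPoint: /root/.cache     # illustrative mount point
        useAsCompilationCache: true  # for vLLM, also sets VLLM_CACHE_ROOT (see below)
```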
Operator Default Values Injection#
The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:
- Health Probes: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.
- Security Context: All components receive `fsGroup: 1000` by default to ensure proper file permissions for mounted volumes. This can be overridden via the `extraPodSpec.securityContext` field.
- Shared Memory: All components receive an 8Gi shared memory volume mounted at `/dev/shm` by default (can be disabled or resized via the `sharedMemory` field).
- Environment Variables: Components automatically receive environment variables like `DYN_NAMESPACE`, `DYN_PARENT_DGD_K8S_NAME`, `DYNAMO_PORT`, and backend-specific variables.
- Pod Configuration: Default `terminationGracePeriodSeconds` of 60 seconds and `restartPolicy: Always`.
- Autoscaling: When enabled without explicit metrics, defaults to CPU-based autoscaling with 80% target utilization.
- Backend-Specific Behavior: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (VLLM, SGLang, or TensorRT-LLM).
Pod Specification Defaults#
All components receive the following pod-level defaults unless overridden:
- `terminationGracePeriodSeconds`: 60 seconds
- `restartPolicy`: `Always`
Security Context#
The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:
- `fsGroup: 1000` sets the group ownership of mounted volumes and any files created in those volumes
This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The fsGroup setting is particularly important for:
- Model downloads and caching
- Compilation cache directories
- Persistent volume claims (PVCs)
- SSH key generation in multinode deployments
Overriding Security Context#
To override the default security context, specify your own securityContext in the extraPodSpec of your component:
```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        fsGroup: 2000        # Custom group ID
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
```
Important: When you provide any securityContext object in extraPodSpec, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting runAsNonRoot or setting it to false).
OpenShift and Security Context Constraints#
In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift’s admission controllers to assign them dynamically:
```yaml
services:
  YourWorker:
    extraPodSpec:
      # Provide an empty securityContext: omit fsGroup to let OpenShift
      # assign it based on the SCC (OpenShift injects the appropriate UID range)
      securityContext: {}
```
Alternatively, if you want to keep the default fsGroup: 1000 behavior and are certain your cluster allows it, you don't need to specify anything; the operator defaults will work.
Health Probes by Component Type#
The operator applies different default health probes based on the component type.
Frontend Components#
Frontend components receive the following probe configurations:
Liveness Probe:

- Type: HTTP GET
- Path: `/health`
- Port: `http` (8000)
- Initial Delay: 60 seconds
- Period: 60 seconds
- Timeout: 30 seconds
- Failure Threshold: 10

Readiness Probe:

- Type: Exec command
- Command: `curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""`
- Initial Delay: 60 seconds
- Period: 60 seconds
- Timeout: 30 seconds
- Failure Threshold: 10
Worker Components#
Worker components receive the following probe configurations:
Liveness Probe:

- Type: HTTP GET
- Path: `/live`
- Port: `system` (9090)
- Period: 5 seconds
- Timeout: 30 seconds
- Failure Threshold: 1

Readiness Probe:

- Type: HTTP GET
- Path: `/health`
- Port: `system` (9090)
- Period: 10 seconds
- Timeout: 30 seconds
- Failure Threshold: 60

Startup Probe:

- Type: HTTP GET
- Path: `/live`
- Port: `system` (9090)
- Period: 10 seconds
- Timeout: 5 seconds
- Failure Threshold: 720 (allows up to 2 hours for startup: 10s × 720 = 7200s)
Note
For larger models (typically >70B parameters) or slower storage systems, you may need to increase the `failureThreshold` to allow more time for model loading. Calculate the required threshold from your expected startup time: `failureThreshold = expected_startup_seconds / period`. Override the startup probe in your component specification if the default 2-hour window is insufficient, as in the sketch below.
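A sketch of such an override via `extraPodSpec.mainContainer` (see Notes below); the service name is illustrative, and the assumption that `mainContainer` accepts a standard Kubernetes `startupProbe` stanza should be verified against the CRD schema:

```yaml
services:
  MyWorker:
    extraPodSpec:
      mainContainer:
        startupProbe:
          httpGet:
            path: /live            # same endpoint the default probe uses
            port: system           # named port 9090
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 1440   # 10s × 1440 = 4 hours for very large models
```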
Multinode Deployment Probe Modifications#
For multinode deployments, the operator modifies probes based on the backend framework and node role:
VLLM Backend#
The operator automatically selects between two deployment modes based on parallelism configuration:
Ray-Based Mode (when `world_size` > `GPUs_per_node`):

- Worker nodes: All probes (liveness, readiness, startup) are removed
- Leader nodes: All probes remain active

Data Parallel Mode (when `world_size` × `data_parallel_size` > `GPUs_per_node`):

- Worker nodes: All probes (liveness, readiness, startup) are removed
- Leader nodes: All probes remain active
SGLang Backend#
Worker nodes: All probes (liveness, readiness, startup) are removed
TensorRT-LLM Backend#
- Leader nodes: All probes remain unchanged
- Worker nodes:
  - Liveness and startup probes are removed
  - Readiness probe is replaced with a TCP socket check on the SSH port (2222):
    - Initial Delay: 20 seconds
    - Period: 20 seconds
    - Timeout: 5 seconds
    - Failure Threshold: 10
Environment Variables#
The operator automatically injects environment variables based on component type and configuration:
All Components#
- `DYN_NAMESPACE`: The Dynamo namespace for the component
- `DYN_PARENT_DGD_K8S_NAME`: The parent DynamoGraphDeployment Kubernetes resource name
- `DYN_PARENT_DGD_K8S_NAMESPACE`: The parent DynamoGraphDeployment Kubernetes namespace
Frontend Components#
- `DYNAMO_PORT`: `8000`
- `DYN_HTTP_PORT`: `8000`
Worker Components#
- `DYN_SYSTEM_PORT`: `9090` (automatically enables the system metrics server)
- `DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS`: `["generate"]`
- `DYN_SYSTEM_ENABLED`: `true` (needed for runtime images 0.6.1 and older)
Planner Components#
- `PLANNER_PROMETHEUS_PORT`: `9085`
VLLM Backend (with compilation cache)#
When a volume mount is configured with `useAsCompilationCache: true`:

- `VLLM_CACHE_ROOT`: Set to the mount point of the cache volume
Service Account#
Planner components automatically receive the following service account:
- `serviceAccountName`: `planner-serviceaccount`
Image Pull Secrets#
The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:
1. Scans all Kubernetes secrets of type `kubernetes.io/dockerconfigjson` in the component's namespace
2. Extracts the docker registry server URLs from each secret's authentication configuration
3. Matches the container image's registry host against the discovered registry URLs
4. Automatically injects matching secrets as `imagePullSecrets` in the pod specification
This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.
To disable automatic image pull secret discovery for a specific component, add the following annotation:
```yaml
annotations:
  nvidia.com/disable-image-pull-secret-discovery: "true"
```
Autoscaling Defaults#
When autoscaling is enabled but no metrics are specified, the operator applies:
- Default Metric: CPU utilization
- Target Average Utilization: 80%
Port Configurations#
Default container ports are configured based on component type:
Frontend Components#
- Port: 8000
- Protocol: TCP
- Name: `http`
Worker Components#
- Port: 9090
- Protocol: TCP
- Name: `system`
Planner Components#
- Port: 9085
- Protocol: TCP
- Name: `metrics`
Backend-Specific Configurations#
VLLM#
Ray Head Port: 6379 (for Ray-based multinode deployments)
Data Parallel RPC Port: 13445 (for data parallel multinode deployments)
SGLang#
Distribution Init Port: 29500 (for multinode deployments)
TensorRT-LLM#
- SSH Port: 2222 (for multinode MPI communication)
- OpenMPI Environment: `OMPI_MCA_orte_keep_fqdn_hostnames=1`
Implementation Reference#
For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:
- Health Probes, Security Context & Pod Specifications: `internal/dynamo/graph.go` contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
- Component-Specific Defaults
- Image Pull Secrets: `internal/secrets/docker.go` implements the docker secret indexer and automatic discovery
- Backend-Specific Behavior
- Constants & Annotations: `internal/consts/consts.go` defines annotation keys and other constants
Notes#
- All these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
- User-specified probes (via the `livenessProbe`, `readinessProbe`, or `startupProbe` fields) take precedence over operator defaults
- For security context, if you provide any `securityContext` in `extraPodSpec`, no defaults will be injected, giving you full control
- For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
- The `extraPodSpec.mainContainer` field can be used to override probe configurations set by the operator