SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
⚠️ Important: This documentation is automatically generated from source code. Do not edit this file directly.
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.
Underlying type: string
ComponentKind represents the type of underlying Kubernetes resource.
ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.
DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.
DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.
DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API.
DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment.
DynamoGraphDeployment is the Schema for the dynamographdeployments API.
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It serves as the primary interface for users to request model deployments with specific performance and resource constraints, enabling SLA-driven deployments.
Lifecycle:
The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.
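The immutability rule can be illustrated with a small admission-style check. This is a minimal Python sketch under stated assumptions, not the operator's Go implementation: the `DGDR` dataclass, its `profiling_started` flag, the `validate_update` function, and the `"model"` spec key are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DGDR:
    """Hypothetical, simplified stand-in for a DynamoGraphDeploymentRequest."""

    spec: dict = field(default_factory=dict)
    # Hypothetical flag; the real resource reports its progress through its status.
    profiling_started: bool = False


def validate_update(old: DGDR, new: DGDR) -> None:
    """Reject any spec change once profiling has started."""
    if old.profiling_started and new.spec != old.spec:
        raise ValueError(
            "spec is immutable once profiling starts; delete and recreate the DGDR"
        )


# Before profiling starts, editing the spec is allowed.
validate_update(DGDR(spec={"model": "example-model"}), DGDR(spec={"model": "other-model"}))

# After profiling starts, any spec edit is rejected.
try:
    validate_update(
        DGDR(spec={"model": "example-model"}, profiling_started=True),
        DGDR(spec={"model": "other-model"}, profiling_started=True),
    )
except ValueError as err:
    print(err)
```

Comparing whole specs keeps the rule simple: no field is editable after profiling begins, which matches the "delete and recreate" guidance.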
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. This CRD serves as the primary interface for users to request model deployments with specific performance constraints and resource requirements, enabling SLA-driven deployments.
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.
DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.
The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD’s service replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.
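The single-writer arrangement can be sketched as follows. This is a hypothetical Python model, not the controller's code: `ScalingAdapter`, `set_scale`, `apply_to_dgd`, the nested-dictionary layout of the DGD, and the `my-graph`/`decode` names are all assumptions. The point it shows is that autoscalers write only the adapter's desired replica count, and only the adapter step copies that value into the DGD service.

```python
from dataclasses import dataclass


@dataclass
class ScalingAdapter:
    """Hypothetical stand-in for a DynamoGraphDeploymentScalingAdapter."""

    dgd_name: str
    service_name: str
    replicas: int  # desired replicas, written by HPA/KEDA/Planner via the scale interface

    def set_scale(self, replicas: int) -> None:
        """What an autoscaler does: it updates the adapter, never the DGD directly."""
        if replicas < 0:
            raise ValueError("replicas must be non-negative")
        self.replicas = replicas


def apply_to_dgd(adapter: ScalingAdapter, dgds: dict) -> None:
    """What the adapter controller does: it is the only writer of the service's replicas."""
    service = dgds[adapter.dgd_name]["services"][adapter.service_name]
    service["replicas"] = adapter.replicas


dgds = {"my-graph": {"services": {"decode": {"replicas": 1}}}}
adapter = ScalingAdapter("my-graph", "decode", replicas=1)
adapter.set_scale(3)         # e.g. an HPA decision
apply_to_dgd(adapter, dgds)  # the adapter propagates it to the DGD
print(dgds["my-graph"]["services"]["decode"]["replicas"])  # 3
```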
DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter.
DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter.
DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment.
DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.
DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.
DynamoModel is the Schema for the dynamo models API.
DynamoModelSpec defines the desired state of DynamoModel.
DynamoModelStatus defines the observed state of DynamoModel.
EndpointInfo represents a single endpoint (pod) serving the model.
ModelReference identifies a model served by this component.
ModelSource defines the source location of a model.
ProfilingConfigSpec defines configuration for the profiling process. This structure maps directly to the profile_sla.py config format. See benchmarks/profiler/utils/profiler_argparse.py for the complete schema.
Resources defines resource requests and limits for a component, including CPU, memory, GPUs/devices, and any runtime-specific resources.
ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter for replica management. When enabled, the DGDSA owns the replicas field and external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.
ServiceReplicaStatus contains replica information for a single service.
VolumeMount references a PVC defined at the top level for volumes to be mounted by the component.
The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:
- Health Probes: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour window (720 failures × 10 seconds) to accommodate long model loading times.
- Security Context: All components receive fsGroup: 1000 by default to ensure proper file permissions for mounted volumes. This can be overridden via the extraPodSpec.securityContext field.
- Shared Memory: All components receive an 8Gi shared memory volume mounted at /dev/shm by default (it can be disabled or resized via the sharedMemory field).
- Environment Variables: Components automatically receive environment variables such as DYN_NAMESPACE, DYN_PARENT_DGD_K8S_NAME, DYNAMO_PORT, and backend-specific variables.
- Pod Configuration: terminationGracePeriodSeconds defaults to 60 seconds and restartPolicy to Always.
- Autoscaling: When autoscaling is enabled without explicit metrics, it defaults to CPU-based autoscaling with an 80% target utilization.
- Backend-Specific Behavior: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (vLLM, SGLang, or TensorRT-LLM).
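For intuition about what an 80% CPU target means, the core rule of the Kubernetes Horizontal Pod Autoscaler is `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`. The Python sketch below applies that rule; the function name `desired_replicas` is hypothetical, and the HPA's tolerance band and stabilization windows are deliberately ignored, so real scaling decisions can differ.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_percent: float,
                     target_cpu_percent: float = 80.0) -> int:
    """Core HPA rule: ceil(currentReplicas * currentMetric / targetMetric).

    Simplified: the HPA tolerance band and stabilization windows are ignored.
    """
    return math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)


print(desired_replicas(2, 120.0))  # 3: pods run hotter than the 80% target, so scale out
print(desired_replicas(4, 40.0))   # 2: pods run at half the target, so scale in
```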
All components receive the following pod-level defaults unless overridden:
- terminationGracePeriodSeconds: 60 seconds
- restartPolicy: Always

The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:

- fsGroup: 1000 - Sets the group ownership of mounted volumes and any files created in those volumes

This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The fsGroup setting is particularly important for:
To override the default security context, specify your own securityContext in the extraPodSpec of your component:
Important: When you provide any securityContext object in extraPodSpec, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting runAsNonRoot or setting it to false).
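The all-or-nothing rule above can be modeled in a few lines. This Python sketch is an illustration, not the operator's Go code: `effective_security_context` and the dictionary representation of a pod securityContext are assumptions. It shows that defaults appear only when no securityContext is supplied, and that a supplied object is never merged with them.

```python
from typing import Optional

DEFAULT_FS_GROUP = 1000  # the documented operator default


def effective_security_context(user_ctx: Optional[dict]) -> dict:
    """Return the pod securityContext that the documented rule would produce."""
    if user_ctx is None:
        # Nothing supplied in extraPodSpec: the operator injects its default.
        return {"fsGroup": DEFAULT_FS_GROUP}
    # Something supplied: it is used as-is and no defaults are merged in.
    return dict(user_ctx)


print(effective_security_context(None))                     # {'fsGroup': 1000}
print(effective_security_context({"runAsNonRoot": False}))  # {'runAsNonRoot': False}
```

Note the consequence: if you supply a securityContext but still want group-writable volumes, you must set fsGroup yourself.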
In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift’s admission controllers to assign them dynamically:
Alternatively, if you want to keep the default fsGroup: 1000 behavior and are certain your cluster allows it, you don’t need to specify anything; the operator defaults will apply.
Shared memory is enabled by default for all components:
- Enabled: true (unless explicitly disabled via sharedMemory.disabled)
- Size: 8Gi
- Mount path: /dev/shm
- Volume type: emptyDir with memory medium

To disable shared memory or customize the size, use the sharedMemory field in your component specification.
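The defaults above can be sketched as a small function that produces a pod volume and mount. This is a hypothetical Python model: the `SharedMemory` dataclass mirrors `sharedMemory.disabled` from the text, but its `size` attribute, the `shm_volume` function, and the volume name `shared-memory` are assumptions. The `emptyDir.medium: Memory` and `sizeLimit` keys are standard Kubernetes volume fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedMemory:
    """Hypothetical mirror of a component's sharedMemory settings."""

    disabled: bool = False
    size: str = "8Gi"  # documented default size


def shm_volume(cfg: Optional[SharedMemory]) -> Optional[dict]:
    """Return a memory-backed emptyDir volume and its /dev/shm mount, or None if disabled."""
    cfg = cfg or SharedMemory()  # no settings given: fall back to the defaults
    if cfg.disabled:
        return None
    return {
        "volume": {"name": "shared-memory",
                   "emptyDir": {"medium": "Memory", "sizeLimit": cfg.size}},
        "mount": {"name": "shared-memory", "mountPath": "/dev/shm"},
    }


print(shm_volume(None)["volume"]["emptyDir"])  # {'medium': 'Memory', 'sizeLimit': '8Gi'}
print(shm_volume(SharedMemory(disabled=True)))  # None
```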
The operator applies different default health probes based on the component type.
Frontend components receive the following probe configurations:
- Liveness probe: path /health, port http (8000)
- Readiness probe: command `curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""`

Worker components receive the following probe configurations:

- Liveness probe: path /live, port system (9090)
- Readiness probe: path /health, port system (9090)
- Startup probe: path /live, port system (9090)

:::{note}
For larger models (typically >70B parameters) or slower storage systems, you may need to increase the failureThreshold to allow more time for model loading. Calculate the required threshold from your expected startup time, rounding up: failureThreshold = ceil(expected_startup_seconds / periodSeconds). Override the startup probe in your component specification if the default 2-hour window is insufficient.
:::
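The threshold arithmetic from the note can be checked with a one-line helper. This is a Python sketch; `startup_failure_threshold` is a hypothetical name, and the default period of 10 seconds comes from the worker startup probe described above. It treats the startup window as roughly failureThreshold × periodSeconds and rounds up, so the window never ends before the expected startup time.

```python
import math


def startup_failure_threshold(expected_startup_seconds: float, period_seconds: int = 10) -> int:
    """Smallest failureThreshold whose window covers the expected startup time."""
    return math.ceil(expected_startup_seconds / period_seconds)


print(startup_failure_threshold(2 * 3600))  # 720, the default 2-hour window
print(startup_failure_threshold(3 * 3600))  # 1080 for a 3-hour load
print(startup_failure_threshold(125))       # 13: 12.5 rounds up
```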
For multinode deployments, the operator modifies probes based on the backend framework and node role:
The operator automatically selects between two deployment modes based on parallelism configuration:
Tensor/Pipeline Parallel Mode (when world_size > GPUs_per_node):
--distributed-executor-backend ray)
Data Parallel Mode (when world_size × data_parallel_size > GPUs_per_node):
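The two mode conditions can be combined into one selection function. This Python sketch makes assumptions the text does not state: `multinode_mode` is a hypothetical name; `world_size` is taken to mean tensor_parallel_size × pipeline_parallel_size; the tensor/pipeline check runs first, because any configuration that satisfies it also satisfies the data-parallel condition; and a configuration that meets neither condition is treated as fitting on a single node.

```python
def multinode_mode(world_size: int, data_parallel_size: int, gpus_per_node: int) -> str:
    """Pick a deployment mode from the parallelism settings (illustrative ordering)."""
    # A single model replica needs more GPUs than one node has.
    if world_size > gpus_per_node:
        return "tensor/pipeline parallel"
    # Each replica fits on a node, but all data-parallel replicas together do not.
    if world_size * data_parallel_size > gpus_per_node:
        return "data parallel"
    # Assumption: everything fits on one node, so no multinode mode is needed.
    return "single node"


print(multinode_mode(16, 1, 8))  # tensor/pipeline parallel
print(multinode_mode(8, 2, 8))   # data parallel
print(multinode_mode(4, 2, 8))   # single node
```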
The operator automatically injects environment variables based on component type and configuration:
- DYN_NAMESPACE: The Dynamo namespace for the component
- DYN_PARENT_DGD_K8S_NAME: The parent DynamoGraphDeployment Kubernetes resource name
- DYN_PARENT_DGD_K8S_NAMESPACE: The parent DynamoGraphDeployment Kubernetes namespace
- DYNAMO_PORT: 8000
- DYN_HTTP_PORT: 8000
- DYN_SYSTEM_PORT: 9090 (automatically enables the system metrics server)
- DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS: ["generate"]
- DYN_SYSTEM_ENABLED: true (needed for runtime images 0.6.1 and older)
- PLANNER_PROMETHEUS_PORT: 9085

When a volume mount is configured with useAsCompilationCache: true:

- VLLM_CACHE_ROOT: Set to the mount point of the cache volume

Planner components automatically receive the following service account:

- serviceAccountName: planner-serviceaccount

The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:

- kubernetes.io/dockerconfigjson in the component’s namespace
- imagePullSecrets in the pod specification

This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.
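Matching an image to indexed secrets comes down to extracting the registry host from the image reference. The Python sketch below is hypothetical: `registry_of`, `pull_secrets_for`, the dictionary-shaped index, and the `registry.example.com`/`example-pull-secret` names are illustrative, not the operator's internals. It follows Docker's reference convention: the first path segment is a registry only if it contains `.` or `:` or is `localhost`; otherwise the image is on Docker Hub.

```python
def registry_of(image: str) -> str:
    """Extract the registry host from an image reference (Docker convention)."""
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"  # no explicit registry: Docker Hub


def pull_secrets_for(image: str, index: dict) -> list:
    """Look up the secret names indexed for the image's registry (hypothetical index shape)."""
    return sorted(index.get(registry_of(image), []))


# Hypothetical index: registry host -> names of dockerconfigjson secrets.
index = {"registry.example.com": ["example-pull-secret"]}
print(pull_secrets_for("registry.example.com/team/worker:1.0", index))  # ['example-pull-secret']
print(pull_secrets_for("ubuntu:22.04", index))                          # []
```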
To disable automatic image pull secret discovery for a specific component, add the following annotation:
When autoscaling is enabled but no metrics are specified, the operator applies:
- CPU-based autoscaling with an 80% target utilization

Default container ports are configured based on component type:

- http (8000)
- system (9090)
- metrics

OMPI_MCA_orte_keep_fqdn_hostnames=1

For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:

- internal/dynamo/graph.go: Contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
- internal/secrets/docker.go: Implements the docker secret indexer and automatic discovery
- internal/consts/consts.go: Defines annotation keys and other constants

- livenessProbe, readinessProbe, or startupProbe fields) take precedence over operator defaults
- securityContext in extraPodSpec, no defaults will be injected, giving you full control
- extraPodSpec.mainContainer field can be used to override probe configurations set by the operator