API Reference (K8s)
⚠️ Important: This documentation is automatically generated from source code. Do not edit this file directly.
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.
Appears in:
Underlying type: string
CheckpointMode defines how checkpoint creation is handled
Validation:
Appears in:
Underlying type: string
ComponentKind represents the type of underlying Kubernetes resource.
Validation:
Appears in:
ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.
Appears in:
Underlying type: string
Validation:
Appears in:
Underlying type: string
Validation:
Appears in:
DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.
Appears in:
DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.
Appears in:
DynamoCheckpoint is the Schema for the dynamocheckpoints API. It represents a container checkpoint that can be used to restore pods to a warm state.
DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence. Two checkpoints with the same identity hash are considered equivalent.
Appears in:
DynamoCheckpointJobConfig defines the configuration for the checkpoint creation Job
Appears in:
Underlying type: string
DynamoCheckpointPhase represents the current phase of the checkpoint lifecycle
Validation:
Appears in:
DynamoCheckpointSpec defines the desired state of DynamoCheckpoint
Appears in:
DynamoCheckpointStatus defines the observed state of DynamoCheckpoint
Appears in:
Underlying type: string
DynamoCheckpointStorageType defines the supported storage backends for checkpoints
Validation:
Appears in:
DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API
Appears in:
DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment
Appears in:
DynamoGraphDeployment is the Schema for the dynamographdeployments API.
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It serves as the primary interface for users to request model deployments with specific performance and resource constraints, enabling SLA-driven deployments.
Lifecycle:
The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.
DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated. Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest. v1alpha1 will be removed in a future release.
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. This CRD serves as the primary interface for users to request model deployments with specific performance constraints and resource requirements, enabling SLA-driven deployments.
Appears in:
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.
Appears in:
DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.
The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD’s service replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.
DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter
Appears in:
DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter
Appears in:
DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment
Appears in:
DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.
Appears in:
DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.
Appears in:
DynamoModel is the Schema for the dynamomodels API
DynamoModelSpec defines the desired state of DynamoModel
Appears in:
DynamoModelStatus defines the observed state of DynamoModel
Appears in:
EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components. EPP is responsible for intelligent endpoint selection and KV-aware routing.
Appears in:
EndpointInfo represents a single endpoint (pod) serving the model
Appears in:
FrontendSidecarSpec configures the auto-generated frontend sidecar container. The operator uses these fields together with built-in frontend defaults (command, probes, ports, and Dynamo env vars) to produce a fully configured sidecar container.
Appears in:
ModelReference identifies a model served by this component
Appears in:
ModelSource defines the source location of a model
Appears in:
ProfilingConfigSpec defines configuration for the profiling process. This structure maps directly to the profile_sla.py config format. See dynamo/profiler/utils/profiler_argparse.py for the complete schema.
Appears in:
Resources defines requests and limits for a component, including CPU, memory, GPUs/devices, and any runtime-specific resources.
Appears in:
Underlying type: string
Appears in:
RestartStatus contains the status of the restart of the graph deployment.
Appears in:
Underlying type: string
Appears in:
Underlying type: string
RollingUpdatePhase represents the current phase of a rolling update.
Validation:
Appears in:
RollingUpdateStatus tracks the progress of a rolling update.
Appears in:
ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter for replica management. When enabled, the DGDSA owns the replicas field and external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.
Appears in:
ServiceCheckpointConfig configures checkpointing for a DGD service
Appears in:
ServiceCheckpointStatus contains checkpoint information for a single service.
Appears in:
ServiceReplicaStatus contains replica information for a single service.
Appears in:
VolumeMount references a PVC defined at the top level for volumes to be mounted by the component
Appears in:
Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
Underlying type: string
BackendType specifies the inference backend.
Validation:
Appears in:
Underlying type: string
DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
Validation:
Appears in:
DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
Appears in:
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It provides a simplified, SLA-driven interface for deploying inference models on Dynamo. Users specify a model and optional performance targets; the controller handles profiling, configuration selection, and deployment.
Lifecycle:
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. Only the Model field is required; all other fields are optional and have sensible defaults.
Appears in:
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
Appears in:
FeaturesSpec controls optional Dynamo platform features in the generated deployment.
Appears in:
Underlying type: string
GPUSKUType is the AIC hardware system identifier for a supported GPU.
Validation:
Appears in:
HardwareSpec describes the hardware resources available for profiling and deployment. These fields are typically auto-filled by the operator from cluster discovery.
Appears in:
MockerSpec configures the simulated (mocker) backend.
Appears in:
ModelCacheSpec references a PVC containing pre-downloaded model weights.
Appears in:
OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
Appears in:
ParetoConfig represents a single Pareto-optimal deployment configuration discovered during profiling.
Appears in:
Underlying type: string
ProfilingPhase represents a sub-phase within the profiling pipeline. When the DGDR Phase is “Profiling”, this value indicates which step of the profiling pipeline is currently executing.
Validation:
Appears in:
ProfilingResultsStatus contains the output of the profiling process.
Appears in:
SLASpec defines the service-level agreement targets for profiling optimization.
Appears in:
Underlying type: string
SearchStrategy controls the profiling search depth.
Validation:
Appears in:
WorkloadSpec defines the workload characteristics for SLA-based profiling.
Appears in:
Underlying type: string
CertProvisionMode controls how webhook TLS certificates are managed.
Appears in:
CheckpointConfiguration holds checkpoint/restore settings.
Appears in:
CheckpointOCIConfig holds OCI registry storage configuration.
Appears in:
CheckpointPVCConfig holds PVC storage configuration.
Appears in:
CheckpointS3Config holds S3 storage configuration.
Appears in:
CheckpointStorageConfiguration holds storage backend configuration for checkpoints.
Appears in:
Underlying type: string
DiscoveryBackend is the type for the discovery backend.
Appears in:
DiscoveryConfiguration holds discovery backend settings.
Appears in:
GPUConfiguration holds GPU discovery settings.
Appears in:
GroveConfiguration holds Grove orchestrator settings.
Appears in:
InfrastructureConfiguration holds service mesh and backend addresses.
Appears in:
IngressConfiguration holds ingress settings.
Appears in:
KaiSchedulerConfiguration holds Kai-scheduler settings.
Appears in:
LWSConfiguration holds LWS orchestrator settings.
Appears in:
LeaderElectionConfiguration holds leader election settings.
Appears in:
LoggingConfiguration holds logging settings.
Appears in:
MPIConfiguration holds MPI SSH secret settings.
Appears in:
MetricsServer extends Server with secure serving option.
Appears in:
NamespaceConfiguration determines operator namespace mode.
Appears in:
NamespaceScopeConfiguration holds lease settings for namespace-restricted mode.
Appears in:
OperatorConfiguration is the Schema for the operator configuration.
OrchestratorConfiguration holds orchestrator override settings.
Appears in:
RBACConfiguration holds RBAC settings for cluster-wide mode.
Appears in:
SecurityConfiguration holds HTTP/2 and TLS settings.
Appears in:
Server holds a bind address and port.
Appears in:
ServerConfiguration holds server bind addresses and ports.
Appears in:
WebhookServer extends Server with host and certificate directory.
Appears in:
The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:
Health Probes: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.
Security Context: All components receive fsGroup: 1000 by default to ensure proper file permissions for mounted volumes. This can be overridden via the extraPodSpec.securityContext field.
Shared Memory: All components receive an 8Gi shared memory volume mounted at /dev/shm by default (can be disabled or resized via the sharedMemory field).
Environment Variables: Components automatically receive environment variables like DYN_NAMESPACE, DYN_PARENT_DGD_K8S_NAME, DYNAMO_PORT, and backend-specific variables.
Pod Configuration: Default terminationGracePeriodSeconds of 60 seconds and restartPolicy: Always.
Autoscaling: When enabled without explicit metrics, defaults to CPU-based autoscaling with 80% target utilization.
Backend-Specific Behavior: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (vLLM, SGLang, or TensorRT-LLM).
All components receive the following pod-level defaults unless overridden:
- terminationGracePeriodSeconds: 60 seconds
- restartPolicy: Always

The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:

- fsGroup: 1000 - Sets the group ownership of mounted volumes and any files created in those volumes

This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The fsGroup setting is particularly important for:
To override the default security context, specify your own securityContext in the extraPodSpec of your component:
Important: When you provide any securityContext object in extraPodSpec, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting runAsNonRoot or setting it to false).
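The all-or-nothing override rule can be sketched in a few lines of Python. This is only an illustration of the behavior described here, not the operator's Go implementation: `resolve_security_context` is a hypothetical helper, and treating an empty object as "provided" is an assumption.

```python
from typing import Optional

# Default described in this document: fsGroup 1000 for all components.
DEFAULT_SECURITY_CONTEXT = {"fsGroup": 1000}


def resolve_security_context(user_ctx: Optional[dict]) -> dict:
    """Return the pod-level security context that would apply (illustrative).

    Hypothetical helper: when the user supplies any securityContext object,
    it is used as-is and no operator defaults are merged into it.
    """
    if user_ctx is None:
        # Nothing specified in extraPodSpec: the operator default applies.
        return dict(DEFAULT_SECURITY_CONTEXT)
    # Anything specified (assumed to include an empty object): no defaults.
    return dict(user_ctx)


print(resolve_security_context(None))
print(resolve_security_context({"runAsUser": 0}))
```

Note that the second call does not carry over `fsGroup`: if you still want it alongside your own settings, you have to set it explicitly.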
In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift’s admission controllers to assign them dynamically:
Alternatively, if you want to keep the default fsGroup: 1000 behavior and are certain your cluster allows it, you don’t need to specify anything - the operator defaults will work.
Shared memory is enabled by default for all components:
- Enabled: true (unless explicitly disabled via sharedMemory.disabled)
- Size: 8Gi
- Mount path: /dev/shm
- Volume type: emptyDir with memory medium

To disable shared memory or customize the size, use the sharedMemory field in your component specification.
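For readers who want to see what these defaults amount to in pod terms, the following Python sketch builds the equivalent Kubernetes volume and volume mount as plain dictionaries. The function name and the volume name `shared-memory` are assumptions made for illustration; only the size, mount path, and memory-backed emptyDir come from the defaults described above.

```python
from typing import Optional, Tuple


def shared_memory_volume(size: str = "8Gi", disabled: bool = False) -> Optional[Tuple[dict, dict]]:
    """Build a memory-backed emptyDir volume and its mount (illustrative).

    Returns None when shared memory is disabled, otherwise a
    (volume, volume_mount) pair in Kubernetes pod-spec shape.
    """
    if disabled:
        return None
    name = "shared-memory"  # hypothetical volume name, not taken from the operator
    volume = {"name": name, "emptyDir": {"medium": "Memory", "sizeLimit": size}}
    mount = {"name": name, "mountPath": "/dev/shm"}
    return volume, mount


volume, mount = shared_memory_volume()
print(volume)
print(mount)
```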
The operator applies different default health probes based on the component type.
Frontend components receive the following probe configurations:
Liveness Probe:
- Path: /health
- Port: http (8000)

Readiness Probe:
- Command: curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""

Worker components receive the following probe configurations:

Liveness Probe:
- Path: /live
- Port: system (9090)

Readiness Probe:
- Path: /health
- Port: system (9090)

Startup Probe:
- Path: /live
- Port: system (9090)

:::{note}
For larger models (typically >70B parameters) or slower storage systems, you may need to increase the failureThreshold to allow more time for model loading. Calculate the required threshold based on your expected startup time: failureThreshold = (expected_startup_seconds / period). Override the startup probe in your component specification if the default 2-hour window is insufficient.
:::
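The threshold arithmetic from the note can be checked with a short Python sketch. The function names are hypothetical, the 10-second period is taken from the default worker startup probe described earlier (720 failures × 10 seconds), and rounding up to a whole number of probe attempts is an assumption added so the window is never shorter than the expected startup time.

```python
import math


def startup_window_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Total time a startup probe tolerates before the container is restarted."""
    return failure_threshold * period_seconds


def required_failure_threshold(expected_startup_seconds: float, period_seconds: int = 10) -> int:
    """failureThreshold = expected_startup_seconds / period, rounded up."""
    return math.ceil(expected_startup_seconds / period_seconds)


# Default worker startup probe: 720 failures x 10 seconds = 7200 seconds (2 hours).
print(startup_window_seconds(720, 10))

# A model expected to take 3 hours to load needs a larger threshold.
print(required_failure_threshold(3 * 3600))
```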
For multinode deployments, the operator modifies probes based on the backend framework and node role:
The operator automatically selects between two deployment modes based on parallelism configuration:
Tensor/Pipeline Parallel Mode (when world_size > GPUs_per_node):
--distributed-executor-backend ray

Data Parallel Mode (when world_size × data_parallel_size > GPUs_per_node):
The operator automatically injects environment variables into component containers based on component type, backend framework, and operator configuration. User-provided envs values always take precedence over operator defaults.
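The precedence rule can be pictured as a simple dictionary merge. The sketch below is an assumption-laden illustration rather than the operator's code: `merge_env` is a hypothetical helper and the values shown are made up; only the variable names DYN_NAMESPACE and DYNAMO_PORT come from this document.

```python
from typing import Dict


def merge_env(operator_defaults: Dict[str, str], user_envs: Dict[str, str]) -> Dict[str, str]:
    """Merge environment variables so user-provided values win on conflict."""
    merged = dict(operator_defaults)  # start from what the operator would inject
    merged.update(user_envs)          # user-provided entries override defaults
    return merged


# Example values are hypothetical.
defaults = {"DYN_NAMESPACE": "my-namespace", "DYNAMO_PORT": "8000"}
user = {"DYNAMO_PORT": "9000", "MY_FLAG": "1"}
print(merge_env(defaults, user))
```

Variables the user does not mention keep their operator-supplied values, and variables only the user defines are passed through unchanged.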
These environment variables are injected into every component container regardless of type.
These are injected into all components when the corresponding infrastructure service is configured in the operator’s OperatorConfiguration.
These environment variables are injected when checkpoint/restore is enabled for a component.
The following component types automatically receive dedicated service accounts:
- Planner: planner-serviceaccount
- EPP: epp-serviceaccount

The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:
- kubernetes.io/dockerconfigjson in the component’s namespace
- imagePullSecrets in the pod specification

This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.
To disable automatic image pull secret discovery for a specific component, add the following annotation:
When autoscaling is enabled but no metrics are specified, the operator applies:
- Target CPU utilization: 80%

Default container ports are configured based on component type:
- http
- system
- nixl
- metrics
- grpc
- grpc-health
- metrics

OMPI_MCA_orte_keep_fqdn_hostnames=1

For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:
- internal/dynamo/graph.go - Contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
- internal/dynamo/component_common.go - Base container and pod spec shared by all component types
- internal/dynamo/component_frontend.go
- internal/dynamo/component_worker.go
- internal/dynamo/component_planner.go
- internal/dynamo/component_epp.go
- internal/secrets/docker.go - Implements the docker secret indexer and automatic discovery
- internal/checkpoint/dgd_integration.go - Checkpoint env var injection and volume setup
- internal/consts/consts.go - Defines annotation keys and other constants

- Probes you specify (livenessProbe, readinessProbe, or startupProbe fields) take precedence over operator defaults
- When you provide a securityContext in extraPodSpec, no defaults will be injected, giving you full control
- The extraPodSpec.mainContainer field can be used to override probe configurations set by the operator