API Reference

⚠️ Important: This documentation is automatically generated from source code. Do not edit this file directly.

Packages

nvidia.com/v1alpha1

Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.

This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.

Resource Types

Autoscaling

Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.

Appears in:

DynamoComponentDeploymentSharedSpec
DynamoComponentDeploymentSpec

Field	Description	Default	Validation
`enabled` boolean	Deprecated: This field is ignored.
`minReplicas` integer	Deprecated: This field is ignored.
`maxReplicas` integer	Deprecated: This field is ignored.
`behavior` HorizontalPodAutoscalerBehavior	Deprecated: This field is ignored.
`metrics` MetricSpec array	Deprecated: This field is ignored.
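
Because the operator ignores this field, setting it has no effect on scaling. As a migration aid, the following Python sketch lists the services in a DynamoGraphDeployment manifest (already parsed into a dict) that still carry `autoscaling`. The helper name and the service names in the example manifest are hypothetical; the check only looks for the key and assumes nothing else about the operator.

```python
from typing import Any


def services_with_ignored_autoscaling(dgd: dict[str, Any]) -> list[str]:
    """Return the names of services that still set the deprecated `autoscaling` field.

    The operator ignores this field, so any service listed here needs a
    DynamoGraphDeploymentScalingAdapter-based setup (HPA, KEDA, or Planner)
    in order to actually scale.
    """
    services = (dgd.get("spec") or {}).get("services") or {}
    return sorted(
        name
        for name, svc in services.items()
        if isinstance(svc, dict) and "autoscaling" in svc
    )


# Hypothetical manifest, e.g. parsed with json.load() from the output of
# `kubectl get dynamographdeployment <name> -o json`.
example_dgd = {
    "apiVersion": "nvidia.com/v1alpha1",
    "kind": "DynamoGraphDeployment",
    "metadata": {"name": "example"},
    "spec": {
        "services": {
            "Frontend": {"replicas": 1},
            "VllmDecodeWorker": {"replicas": 2, "autoscaling": {"enabled": True}},
        }
    },
}
print(services_with_ignored_autoscaling(example_dgd))  # ['VllmDecodeWorker']
```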

CheckpointMode

Underlying type: string

CheckpointMode defines how checkpoint creation is handled

Validation:

Enum: [Auto Manual]

Appears in:

ServiceCheckpointConfig

Field	Description
`Auto`	CheckpointModeAuto means the DGD controller will automatically create a Checkpoint CR
`Manual`	CheckpointModeManual means the user must create the Checkpoint CR themselves

ComponentKind

Underlying type: string

ComponentKind represents the type of underlying Kubernetes resource.

Validation:

Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]

Appears in:

ServiceReplicaStatus

Field	Description
`PodClique`	ComponentKindPodClique represents a PodClique resource.
`PodCliqueScalingGroup`	ComponentKindPodCliqueScalingGroup represents a PodCliqueScalingGroup resource.
`Deployment`	ComponentKindDeployment represents a Deployment resource.
`LeaderWorkerSet`	ComponentKindLeaderWorkerSet represents a LeaderWorkerSet resource.

ConfigMapKeySelector

ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.

Appears in:

ProfilingConfigSpec

Field	Description	Default	Validation
`name` string	Name of the ConfigMap containing the desired data.		Required: {}
`key` string	Key in the ConfigMap to select. If not specified, defaults to “disagg.yaml”.	disagg.yaml
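
A tool that reads the same base config the profiler will use has to apply the same default for `key`. A minimal Python sketch, assuming the selector has already been parsed into a dict; the function name and the ConfigMap names in the usage lines are hypothetical.

```python
from typing import Optional

# Documented default for ConfigMapKeySelector.key.
DEFAULT_PROFILING_CONFIG_KEY = "disagg.yaml"


def resolve_configmap_key(selector: dict[str, Optional[str]]) -> tuple[str, str]:
    """Return (configmap_name, key) for a ConfigMapKeySelector dict,
    applying the documented default when `key` is omitted."""
    name = selector.get("name")
    if not name:
        raise ValueError("ConfigMapKeySelector.name is required")
    key = selector.get("key")
    return name, (key if key is not None else DEFAULT_PROFILING_CONFIG_KEY)


print(resolve_configmap_key({"name": "profiling-cfg"}))  # ('profiling-cfg', 'disagg.yaml')
```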

DGDRState

Underlying type: string

Validation:

Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed]

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description
`Initializing`
`Pending`
`Profiling`
`Deploying`
`Ready`
`DeploymentDeleted`
`Failed`

DGDState

Underlying type: string

Validation:

Enum: [initializing pending successful failed]

Appears in:

DeploymentStatus
DynamoGraphDeploymentStatus

Field	Description
`initializing`
`pending`
`successful`
`failed`

DeploymentOverridesSpec

DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Validation
`name` string	Name is the desired name for the created DynamoGraphDeployment. If not specified, defaults to the DGDR name.	Optional: {}
`namespace` string	Namespace is the desired namespace for the created DynamoGraphDeployment. If not specified, defaults to the DGDR namespace.	Optional: {}
`labels` object (keys:string, values:string)	Labels are additional labels to add to the DynamoGraphDeployment metadata. These are merged with auto-generated labels from the profiling process.	Optional: {}
`annotations` object (keys:string, values:string)	Annotations are additional annotations to add to the DynamoGraphDeployment metadata.	Optional: {}
`workersImage` string	WorkersImage specifies the container image to use for DynamoGraphDeployment worker components. This image is used for both temporary DGDs created during online profiling and the final DGD. If omitted, the image from the base config file (e.g., disagg.yaml) is used. Example: “nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1”	Optional: {}
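
The defaulting rules for these overrides can be summarized in a short Python sketch. It covers only the metadata fields (workersImage is left out), and the function, its parameters, and the label values in the test are hypothetical. The reference does not say how collisions between user labels and profiler-generated labels are resolved, so the precedence chosen in the sketch is an assumption.

```python
from typing import Any, Optional


def resolve_dgd_metadata(
    dgdr_name: str,
    dgdr_namespace: str,
    overrides: Optional[dict[str, Any]] = None,
    generated_labels: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Sketch of the metadata an auto-created DGD would end up with."""
    overrides = overrides or {}
    labels = dict(generated_labels or {})
    # ASSUMPTION: on a key collision the user-supplied label wins; the
    # reference only states that the two label sets are merged.
    labels.update(overrides.get("labels") or {})
    return {
        # Documented: name and namespace fall back to the DGDR's own values.
        "name": overrides.get("name") or dgdr_name,
        "namespace": overrides.get("namespace") or dgdr_namespace,
        "labels": labels,
        "annotations": dict(overrides.get("annotations") or {}),
    }
```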

DeploymentStatus

DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description	Default	Validation
`name` string	Name is the name of the created DynamoGraphDeployment.
`namespace` string	Namespace is the namespace of the created DynamoGraphDeployment.
`state` DGDState	State is the current state of the DynamoGraphDeployment. This value is mirrored from the DGD’s status.state field.	initializing	Enum: [initializing pending successful failed]
`created` boolean	Created indicates whether the DGD has been successfully created. Used to prevent recreation if the DGD is manually deleted by users.

DynamoCheckpoint

DynamoCheckpoint is the Schema for the dynamocheckpoints API. It represents a container checkpoint that can be used to restore pods to a warm state.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoCheckpoint`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoCheckpointSpec
`status` DynamoCheckpointStatus

DynamoCheckpointIdentity

DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence. Two checkpoints with the same identity hash are considered equivalent.

Appears in:

DynamoCheckpointSpec

Field	Description	Default	Validation
`model` string	Model is the model identifier (e.g., “meta-llama/Llama-3-70B”)		Required: {}
`backendFramework` string	BackendFramework is the runtime framework (vllm, sglang, trtllm)		Enum: [vllm sglang trtllm] Required: {}
`dynamoVersion` string	DynamoVersion is the Dynamo platform version (optional). If not specified, the version is not included in the identity hash. This ensures checkpoint compatibility across Dynamo releases.		Optional: {}
`tensorParallelSize` integer	TensorParallelSize is the tensor parallel configuration	1	Minimum: 1 Optional: {}
`pipelineParallelSize` integer	PipelineParallelSize is the pipeline parallel configuration	1	Minimum: 1 Optional: {}
`dtype` string	Dtype is the data type (fp16, bf16, fp8, etc.)		Optional: {}
`maxModelLen` integer	MaxModelLen is the maximum sequence length		Minimum: 1 Optional: {}
`extraParameters` object (keys:string, values:string)	ExtraParameters are additional parameters that affect the checkpoint hash. Use them for any framework-specific or custom parameters not covered above.		Optional: {}
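
To make the equivalence rule concrete, here is a Python sketch of one way such a hash could be computed: the identity fields are normalized (documented defaults filled in, an unset dynamoVersion dropped), serialized canonically, and hashed with SHA-256. This is an illustration under those assumptions, not the operator's actual algorithm, and it will not reproduce status.identityHash. It also shows why fields outside spec.identity, such as gpuMemoryService, cannot affect equivalence: they never reach the hash. The function name is hypothetical.

```python
import hashlib
import json
from typing import Any

# Documented defaults for DynamoCheckpointIdentity.
IDENTITY_DEFAULTS: dict[str, Any] = {"tensorParallelSize": 1, "pipelineParallelSize": 1}


def illustrative_identity_hash(identity: dict[str, Any]) -> str:
    """Hash a DynamoCheckpointIdentity dict canonically.

    ILLUSTRATIVE ONLY: this is not the operator's algorithm and will not
    reproduce status.identityHash.
    """
    normalized = {**IDENTITY_DEFAULTS, **identity}
    # An unset dynamoVersion is dropped, so it does not contribute to the hash.
    if not normalized.get("dynamoVersion"):
        normalized.pop("dynamoVersion", None)
    # sort_keys also sorts nested dicts such as extraParameters, so key order
    # in the manifest cannot change the result.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```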

DynamoCheckpointJobConfig

DynamoCheckpointJobConfig defines the configuration for the checkpoint creation Job

Appears in:

DynamoCheckpointSpec

Field	Description	Default	Validation
`podTemplateSpec` PodTemplateSpec	PodTemplateSpec allows customizing the checkpoint Job pod. It should include the container that runs the workload to be checkpointed.		Required: {}
`targetContainerName` string	TargetContainerName is the container in PodTemplateSpec to snapshot.	main	MaxLength: 63 MinLength: 1 Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`sharedMemory` SharedMemorySpec	SharedMemory controls the tmpfs mounted at /dev/shm for the checkpoint Job pod. When omitted, checkpoint Jobs use the same default 8Gi tmpfs as Dynamo components.		Optional: {}
`activeDeadlineSeconds` integer	ActiveDeadlineSeconds specifies the maximum time the Job can run	3600	Minimum: 1 Optional: {}
`backoffLimit` integer	Deprecated: BackoffLimit is ignored. Checkpoint Jobs never retry.		Minimum: 0 Optional: {}
`ttlSecondsAfterFinished` integer	Deprecated: TTLSecondsAfterFinished is ignored. Checkpoint Jobs use a fixed 300 second TTL.		Minimum: 0 Optional: {}
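
The rules for targetContainerName can be checked before submitting a DynamoCheckpoint. A minimal Python sketch, assuming the pod template is available as a dict; the function names are hypothetical. The presence check follows from the description above (the named container lives in podTemplateSpec) and is not a documented admission check.

```python
import re
from typing import Any, Optional

# Constraints from the targetContainerName row: default "main", 1-63
# characters, and a DNS-1123-label-style pattern.
DEFAULT_TARGET_CONTAINER = "main"
_TARGET_CONTAINER_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def effective_target_container(name: Optional[str] = None) -> str:
    """Return the container name the checkpoint Job would snapshot."""
    if name is None:
        return DEFAULT_TARGET_CONTAINER
    if not 1 <= len(name) <= 63:
        raise ValueError(f"targetContainerName must be 1-63 characters, got {len(name)}")
    if not _TARGET_CONTAINER_RE.fullmatch(name):
        raise ValueError(f"targetContainerName {name!r} does not match the required pattern")
    return name


def has_container(pod_template: dict[str, Any], name: str) -> bool:
    """Check that podTemplateSpec actually defines the container to snapshot."""
    containers = (pod_template.get("spec") or {}).get("containers") or []
    return any(c.get("name") == name for c in containers)
```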

DynamoCheckpointPhase

Underlying type: string

DynamoCheckpointPhase represents the current phase of the checkpoint lifecycle

Validation:

Enum: [Pending Creating Ready Failed]

Appears in:

DynamoCheckpointStatus

Field	Description
`Pending`	DynamoCheckpointPhasePending indicates the checkpoint CR has been created but the Job has not started
`Creating`	DynamoCheckpointPhaseCreating indicates the checkpoint Job is running
`Ready`	DynamoCheckpointPhaseReady indicates the checkpoint artifact is available
`Failed`	DynamoCheckpointPhaseFailed indicates the checkpoint creation failed

DynamoCheckpointSpec

DynamoCheckpointSpec defines the desired state of DynamoCheckpoint

Appears in:

DynamoCheckpoint

Field	Description	Validation
`identity` DynamoCheckpointIdentity	Identity defines the inputs that determine checkpoint equivalence	Required: {}
`gpuMemoryService` GPUMemoryServiceSpec	GPUMemoryService enables checkpoint-time GPU Memory Service wiring. It is intentionally outside spec.identity, so it does not affect the checkpoint identity hash or deduplication.	Optional: {}
`job` DynamoCheckpointJobConfig	Job defines the configuration for the checkpoint creation Job	Required: {}

DynamoCheckpointStatus

DynamoCheckpointStatus defines the observed state of DynamoCheckpoint

Appears in:

DynamoCheckpoint

Field	Description	Validation
`phase` DynamoCheckpointPhase	Phase represents the current phase of the checkpoint lifecycle	Enum: [Pending Creating Ready Failed] Optional: {}
`identityHash` string	IdentityHash is the computed hash of the checkpoint identity. This hash is used to identify equivalent checkpoints.	Optional: {}
`location` string	Deprecated: Location is ignored and no longer populated. It is retained only so older objects continue to validate.	Optional: {}
`storageType` DynamoCheckpointStorageType	Deprecated: StorageType is ignored and no longer populated. It is retained only so older objects continue to validate.	Enum: [pvc s3 oci] Optional: {}
`jobName` string	JobName is the name of the checkpoint creation Job	Optional: {}
`createdAt` Time	CreatedAt is the timestamp when the checkpoint became ready	Optional: {}
`message` string	Message provides additional information about the current state	Optional: {}
`conditions` Condition array	DEPRECATED: Conditions are deprecated. Use status.phase instead.	Optional: {}

DynamoCheckpointStorageType

Underlying type: string

Deprecated: StorageType is retained for compatibility with older DynamoCheckpoint status consumers. The current checkpoint flow publishes PVC-backed artifacts discovered from the snapshot-agent DaemonSet.

Validation:

Enum: [pvc s3 oci]

Appears in:

DynamoCheckpointStatus

DynamoComponentDeployment

DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoComponentDeployment`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoComponentDeploymentSpec	Spec defines the desired state for this Dynamo component deployment.

DynamoComponentDeploymentSharedSpec

Appears in:

DynamoComponentDeploymentSpec
DynamoGraphDeploymentSpec

Field	Description	Validation
`annotations` object (keys:string, values:string)	Annotations to add to generated Kubernetes resources for this component (such as Pod, Service, and Ingress when applicable).
`labels` object (keys:string, values:string)	Labels to add to generated Kubernetes resources for this component.
`serviceName` string	The name of the component
`componentType` string	ComponentType indicates the role of this component (for example, “main”).
`subComponentType` string	SubComponentType indicates the sub-role of this component (for example, “prefill”).
`globalDynamoNamespace` boolean	GlobalDynamoNamespace indicates that the component will be placed in the global Dynamo namespace.
`resources` Resources	Resource requests and limits for this component, including CPU, memory, GPUs/devices, and any runtime-specific resources.
`autoscaling` Autoscaling	Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.
`envs` EnvVar array	Envs defines additional environment variables to inject into the component containers.
`envFromSecret` string	EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the component containers.
`volumeMounts` VolumeMount array	VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component.
`ingress` IngressSpec	Ingress config to expose the component outside the cluster (or through a service mesh).
`modelRef` ModelReference	ModelRef references a model that this component serves. When specified, a headless service will be created for endpoint discovery.	Optional: {}
`sharedMemory` SharedMemorySpec	SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size).
`extraPodMetadata` ExtraPodMetadata	ExtraPodMetadata adds labels/annotations to the created Pods.	Optional: {}
`extraPodSpec` ExtraPodSpec	ExtraPodSpec allows overriding the main pod spec configuration. It is a standard Kubernetes PodSpec and also contains a MainContainer field (a standard Kubernetes Container) for overriding the main container configuration.	Optional: {}
`livenessProbe` Probe	LivenessProbe to detect and restart unhealthy containers.
`readinessProbe` Probe	ReadinessProbe to signal when the container is ready to receive traffic.
`replicas` integer	Replicas is the desired number of Pods for this component. When scalingAdapter is enabled, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly.	Minimum: 0
`multinode` MultinodeSpec	Multinode is the configuration for multinode components.
`scalingAdapter` ScalingAdapter	ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter. When enabled, replicas are managed via DGDSA and external autoscalers can scale the service using the Scale subresource. When disabled, replicas can be modified directly.	Optional: {}
`eppConfig` EPPConfig	EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components. Only applicable when ComponentType is “epp”.	Optional: {}
`frontendSidecar` FrontendSidecarSpec	FrontendSidecar configures an auto-generated frontend sidecar container. When specified, the operator injects a fully configured frontend container with all standard Dynamo environment variables, health probes, and ports. This eliminates the need to manually specify these in extraPodSpec.containers. (GAIE)	Optional: {}
`checkpoint` ServiceCheckpointConfig	Checkpoint configures container checkpointing for this service. When enabled, pods can be restored from checkpoint files for faster cold start.	Optional: {}
`topologyConstraint` TopologyConstraint	TopologyConstraint for this service. packDomain is required. When both this and spec.topologyConstraint.packDomain are set, packDomain must be narrower than or equal to the spec-level packDomain.	Optional: {}
`gpuMemoryService` GPUMemoryServiceSpec	GPUMemoryService configures the GPU Memory Service (GMS) sidecar. When enabled, a GMS sidecar is injected and GPU access is managed via DRA.	Optional: {}
`failover` FailoverSpec	Failover configures GMS (GPU Memory Service) failover for this service. For intraPod mode: the main container is cloned into two engine containers (active + standby). For interPod mode: the operator creates a dedicated GMS weight server pod and multiple engine pods per rank that share GPUs via DRA resource claims.	Optional: {}

DynamoComponentDeploymentSpec

DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment

Appears in:

DynamoComponentDeployment

Field	Description	Validation
`backendFramework` string	BackendFramework specifies the backend framework (e.g., “sglang”, “vllm”, “trtllm”)	Enum: [sglang vllm trtllm]
`annotations` object (keys:string, values:string)	Annotations to add to generated Kubernetes resources for this component (such as Pod, Service, and Ingress when applicable).
`labels` object (keys:string, values:string)	Labels to add to generated Kubernetes resources for this component.
`serviceName` string	The name of the component
`componentType` string	ComponentType indicates the role of this component (for example, “main”).
`subComponentType` string	SubComponentType indicates the sub-role of this component (for example, “prefill”).
`globalDynamoNamespace` boolean	GlobalDynamoNamespace indicates that the component will be placed in the global Dynamo namespace.
`resources` Resources	Resource requests and limits for this component, including CPU, memory, GPUs/devices, and any runtime-specific resources.
`autoscaling` Autoscaling	Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.
`envs` EnvVar array	Envs defines additional environment variables to inject into the component containers.
`envFromSecret` string	EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the component containers.
`volumeMounts` VolumeMount array	VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component.
`ingress` IngressSpec	Ingress config to expose the component outside the cluster (or through a service mesh).
`modelRef` ModelReference	ModelRef references a model that this component serves. When specified, a headless service will be created for endpoint discovery.	Optional: {}
`sharedMemory` SharedMemorySpec	SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size).
`extraPodMetadata` ExtraPodMetadata	ExtraPodMetadata adds labels/annotations to the created Pods.	Optional: {}
`extraPodSpec` ExtraPodSpec	ExtraPodSpec allows overriding the main pod spec configuration. It is a standard Kubernetes PodSpec and also contains a MainContainer field (a standard Kubernetes Container) for overriding the main container configuration.	Optional: {}
`livenessProbe` Probe	LivenessProbe to detect and restart unhealthy containers.
`readinessProbe` Probe	ReadinessProbe to signal when the container is ready to receive traffic.
`replicas` integer	Replicas is the desired number of Pods for this component. When scalingAdapter is enabled, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly.	Minimum: 0
`multinode` MultinodeSpec	Multinode is the configuration for multinode components.
`scalingAdapter` ScalingAdapter	ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter. When enabled, replicas are managed via DGDSA and external autoscalers can scale the service using the Scale subresource. When disabled, replicas can be modified directly.	Optional: {}
`eppConfig` EPPConfig	EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components. Only applicable when ComponentType is “epp”.	Optional: {}
`frontendSidecar` FrontendSidecarSpec	FrontendSidecar configures an auto-generated frontend sidecar container. When specified, the operator injects a fully configured frontend container with all standard Dynamo environment variables, health probes, and ports. This eliminates the need to manually specify these in extraPodSpec.containers. (GAIE)	Optional: {}
`checkpoint` ServiceCheckpointConfig	Checkpoint configures container checkpointing for this service. When enabled, pods can be restored from checkpoint files for faster cold start.	Optional: {}
`topologyConstraint` TopologyConstraint	TopologyConstraint for this service. packDomain is required. When both this and spec.topologyConstraint.packDomain are set, packDomain must be narrower than or equal to the spec-level packDomain.	Optional: {}
`gpuMemoryService` GPUMemoryServiceSpec	GPUMemoryService configures the GPU Memory Service (GMS) sidecar. When enabled, a GMS sidecar is injected and GPU access is managed via DRA.	Optional: {}
`failover` FailoverSpec	Failover configures GMS (GPU Memory Service) failover for this service. For intraPod mode: the main container is cloned into two engine containers (active + standby). For interPod mode: the operator creates a dedicated GMS weight server pod and multiple engine pods per rank that share GPUs via DRA resource claims.	Optional: {}

DynamoGraphDeployment

DynamoGraphDeployment is the Schema for the dynamographdeployments API.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoGraphDeployment`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentSpec	Spec defines the desired state for this graph deployment.
`status` DynamoGraphDeploymentStatus	Status reflects the current observed state of this graph deployment.

DynamoGraphDeploymentExperimentalSpec

DynamoGraphDeploymentExperimentalSpec groups graph-level opt-in preview features. Component-level experimental features are represented separately on component specs.

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`kvTransferPolicy` KvTransferPolicy	KvTransferPolicy configures topology-aware routing for KV-cache transfers between prefill and decode workers.		Optional: {}

DynamoGraphDeploymentRequest

DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It serves as the primary interface for users to request model deployments with specific performance and resource constraints, enabling SLA-driven deployments.

Lifecycle:

Initializing → Pending: Validates spec and prepares for profiling
Pending → Profiling: Creates and runs profiling job (online or AIC)
Profiling → Ready/Deploying: Generates DGD spec after profiling completes
Deploying → Ready: When autoApply=true, monitors DGD until Ready
Ready: Terminal state when DGD is operational or spec is available
DeploymentDeleted: Terminal state when auto-created DGD is manually deleted
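
The documented transitions can be encoded as a small table, which is handy when a script watches status.state and wants to flag anything unexpected. A Python sketch with hypothetical names; it models only the transitions listed above. Failed and the transition into DeploymentDeleted are not described there, so they are deliberately left out.

```python
# Forward transitions exactly as listed in the lifecycle above.
TRANSITIONS: dict[str, frozenset[str]] = {
    "Initializing": frozenset({"Pending"}),
    "Pending": frozenset({"Profiling"}),
    "Profiling": frozenset({"Ready", "Deploying"}),
    "Deploying": frozenset({"Ready"}),
    "Ready": frozenset(),
    "DeploymentDeleted": frozenset(),
}
TERMINAL = frozenset({"Ready", "DeploymentDeleted"})


def is_documented_transition(src: str, dst: str) -> bool:
    return dst in TRANSITIONS.get(src, frozenset())


def is_terminal(state: str) -> bool:
    return state in TERMINAL


def validate_history(states: list[str]) -> bool:
    """Check that an observed sequence of status.state values (for example,
    collected from a watch) uses only documented transitions. Repeated
    observations of the same state are treated as no-ops."""
    deduped = [s for i, s in enumerate(states) if i == 0 or s != states[i - 1]]
    return all(is_documented_transition(a, b) for a, b in zip(deduped, deduped[1:]))
```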

The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.

DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated. Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest. v1alpha1 will be removed in a future release.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoGraphDeploymentRequest`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentRequestSpec	Spec defines the desired state for this deployment request.
`status` DynamoGraphDeploymentRequestStatus	Status reflects the current observed state of this deployment request.

DynamoGraphDeploymentRequestSpec

DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. This CRD serves as the primary interface for users to request model deployments with specific performance constraints and resource requirements, enabling SLA-driven deployments.

Appears in:

DynamoGraphDeploymentRequest

Field	Description	Default	Validation
`model` string	Model specifies the model to deploy (e.g., “Qwen/Qwen3-0.6B”, “meta-llama/Llama-3-70b”). This is a high-level identifier for easy reference in kubectl output and logs. The controller automatically sets this value in profilingConfig.config.deployment.model.		Required: {}
`backend` string	Backend specifies the inference backend for profiling. The controller automatically sets this value in profilingConfig.config.engine.backend. Profiling runs on real GPUs or via AIC simulation to collect performance data.		Enum: [auto vllm sglang trtllm] Required: {}
`useMocker` boolean	UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of a real backend deployment. When true, the deployment uses simulated engines that don’t require GPUs, using the profiling data to simulate realistic timing behavior. Mocker is available in all backend images and useful for large-scale experiments. Profiling still runs against the real backend (specified above) to collect performance data.	false
`profilingConfig` ProfilingConfigSpec	ProfilingConfig provides the complete configuration for the profiling job. Note: GPU discovery is automatically attempted to detect GPU resources from Kubernetes cluster nodes. If the operator has node read permissions (cluster-wide or explicitly granted), discovered GPU configuration is used as defaults when hardware configuration is not manually specified (minNumGpusPerEngine, maxNumGpusPerEngine, numGpusPerNode). User-specified values always take precedence over auto-discovered values. If GPU discovery fails (e.g., namespace-restricted operator without node permissions), manual hardware config is required. This configuration is passed directly to the profiler. The structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema). Note: deployment.model and engine.backend are automatically set from the high-level model and backend fields and should not be specified in this config.		Required: {}
`enableGpuDiscovery` boolean	EnableGPUDiscovery controls whether the operator attempts to discover GPU hardware from cluster nodes. DEPRECATED: This field is deprecated and will be removed in v1beta1. GPU discovery is now always attempted automatically. Setting this field has no effect; the operator will always try to discover GPU hardware when node read permissions are available. If discovery is unavailable (e.g., namespace-scoped operator without permissions), manual hardware configuration is required regardless of this setting.	true	Optional: {}
`autoApply` boolean	AutoApply indicates whether to automatically create a DynamoGraphDeployment after profiling completes. If false, only the spec is generated and stored in status. Users can then manually create a DGD using the generated spec.	false
`deploymentOverrides` DeploymentOverridesSpec	DeploymentOverrides allows customizing metadata for the auto-created DGD. Only applicable when AutoApply is true.		Optional: {}
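
Several rules in this table can be checked client-side before applying a DGDR. The Python sketch below is a hypothetical pre-flight helper that encodes only what the table states: the required fields, the backend enum, the two fields the controller fills in itself, and the autoApply dependency of deploymentOverrides. It assumes profilingConfig.config is a nested mapping, as the dotted paths above suggest. It does not replace the API server's validation, and given the deprecation notice, new manifests should target nvidia.com/v1beta1.

```python
from typing import Any

ALLOWED_BACKENDS = {"auto", "vllm", "sglang", "trtllm"}


def check_dgdr_spec(spec: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a v1alpha1 DGDR spec."""
    problems: list[str] = []
    if not spec.get("model"):
        problems.append("model is required")
    if spec.get("backend") not in ALLOWED_BACKENDS:
        problems.append(f"backend must be one of {sorted(ALLOWED_BACKENDS)}")
    profiling = spec.get("profilingConfig")
    if profiling is None:
        problems.append("profilingConfig is required")
    else:
        config = profiling.get("config") or {}
        # These two values are filled in by the controller from spec.model / spec.backend.
        if "model" in (config.get("deployment") or {}):
            problems.append("profilingConfig.config.deployment.model is set from spec.model; remove it")
        if "backend" in (config.get("engine") or {}):
            problems.append("profilingConfig.config.engine.backend is set from spec.backend; remove it")
    if spec.get("deploymentOverrides") and not spec.get("autoApply", False):
        problems.append("deploymentOverrides only applies when autoApply is true")
    return problems
```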

DynamoGraphDeploymentRequestStatus

DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.

Appears in:

DynamoGraphDeploymentRequest

Field	Description	Default	Validation
`state` DGDRState	State is a high-level textual status of the deployment request lifecycle.	Initializing	Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed]
`backend` string	Backend is extracted from profilingConfig.config.engine.backend for display purposes. This field is populated by the controller and shown in kubectl output.		Optional: {}
`observedGeneration` integer	ObservedGeneration reflects the generation of the most recently observed spec. Used to detect spec changes and enforce immutability after profiling starts.
`conditions` Condition array	Conditions contains the latest observed conditions of the deployment request. Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady. Conditions are merged by type on patch updates.
`profilingResults` string	ProfilingResults contains a reference to the ConfigMap holding profiling data. Format: “configmap/<name>”		Optional: {}
`generatedDeployment` RawExtension	GeneratedDeployment contains the full generated DynamoGraphDeployment specification including metadata, based on profiling results. Users can extract this to create a DGD manually, or it’s used automatically when autoApply is true. Stored as RawExtension to preserve all fields including metadata. For mocker backends, this contains the mocker DGD spec.		EmbeddedResource: {} Optional: {}
`deployment` DeploymentStatus	Deployment tracks the auto-created DGD when AutoApply is true. Contains name, namespace, state, and creation status of the managed DGD.		Optional: {}
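
When autoApply is false, the useful outputs live in status. The Python sketch below, with hypothetical helper names, parses the documented profilingResults format and extracts generatedDeployment from a status dict, for example one taken from `kubectl get dynamographdeploymentrequest <name> -o json`. Saving the returned object to a file and running `kubectl apply -f` on it is one way to create the DGD manually.

```python
from typing import Any


def profiling_results_configmap(ref: str) -> str:
    """Extract the ConfigMap name from status.profilingResults
    (documented format: "configmap/<name>")."""
    kind, sep, name = ref.partition("/")
    if kind != "configmap" or not sep or not name:
        raise ValueError(f"unexpected profilingResults value: {ref!r}")
    return name


def generated_dgd(status: dict[str, Any]) -> dict[str, Any]:
    """Return status.generatedDeployment so it can be saved and applied by hand."""
    dgd = status.get("generatedDeployment")
    if not dgd:
        raise LookupError("status.generatedDeployment is not populated yet")
    if dgd.get("kind") != "DynamoGraphDeployment":
        raise ValueError(f"unexpected kind: {dgd.get('kind')!r}")
    return dgd
```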

DynamoGraphDeploymentScalingAdapter

DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.

The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD’s service replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoGraphDeploymentScalingAdapter`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentScalingAdapterSpec
`status` DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterSpec

DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter

Appears in:

DynamoGraphDeploymentScalingAdapter

Field	Description	Default	Validation
`replicas` integer	Replicas is the desired number of replicas for the target service. This field is modified by external autoscalers (HPA/KEDA/Planner) or manually by users.		Minimum: 0 Required: {}
`dgdRef` DynamoGraphDeploymentServiceRef	DGDRef references the DynamoGraphDeployment and the specific service to scale.		Required: {}

DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter

Appears in:

DynamoGraphDeploymentScalingAdapter

Field	Description	Validation
`replicas` integer	Replicas is the current number of replicas for the target service. This is synced from the DGD’s service replicas and is required for the scale subresource.	Optional: {}
`selector` string	Selector is a label selector string for the pods managed by this adapter. Required for HPA compatibility via the scale subresource.	Optional: {}
`lastScaleTime` Time	LastScaleTime is the last time the adapter scaled the target service.	Optional: {}

DynamoGraphDeploymentServiceRef

DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment

Appears in:

DynamoGraphDeploymentScalingAdapterSpec

Field	Description	Default	Validation
`name` string	Name of the DynamoGraphDeployment		MinLength: 1 Required: {}
`serviceName` string	ServiceName is the key name of the service within the DGD’s spec.services map to scale		MinLength: 1 Required: {}
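
For reference, here is a Python sketch that builds a DGDSA manifest respecting the constraints in the two tables above, plus the scaleTargetRef block an autoscaling/v2 HorizontalPodAutoscaler would use to scale it through the scale subresource. Resource and service names are hypothetical. This section does not say whether adapters are normally written by hand or created by the operator when scalingAdapter is enabled, so treat the manifest as illustrative.

```python
from typing import Any

API_VERSION = "nvidia.com/v1alpha1"
KIND = "DynamoGraphDeploymentScalingAdapter"


def scaling_adapter_manifest(
    name: str, namespace: str, dgd_name: str, service_name: str, replicas: int
) -> dict[str, Any]:
    """Build a DGDSA manifest that follows the documented field constraints."""
    if not dgd_name or not service_name:
        raise ValueError("dgdRef.name and dgdRef.serviceName must be non-empty")
    if replicas < 0:
        raise ValueError("spec.replicas must be >= 0")
    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": replicas,
            # serviceName is the key of the service in the DGD's spec.services map.
            "dgdRef": {"name": dgd_name, "serviceName": service_name},
        },
    }


def hpa_scale_target_ref(adapter_name: str) -> dict[str, str]:
    """scaleTargetRef for an autoscaling/v2 HorizontalPodAutoscaler that
    scales the adapter through its scale subresource."""
    return {"apiVersion": API_VERSION, "kind": KIND, "name": adapter_name}
```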

DynamoGraphDeploymentSpec

DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.

Appears in:

DynamoGraphDeployment

Field	Description	Validation
`annotations` object (keys:string, values:string)	Annotations to propagate to all child resources (PCS, DCD, Deployments, and pod templates). Service-level annotations take precedence over these values.	Optional: {}
`labels` object (keys:string, values:string)	Labels to propagate to all child resources (PCS, DCD, Deployments, and pod templates). Service-level labels take precedence over these values.	Optional: {}
`pvcs` PVC array	PVCs defines a list of persistent volume claims that can be referenced by components. Each PVC must have a unique name that can be referenced in component specifications.	MaxItems: 100 Optional: {}
`services` object (keys:string, values:DynamoComponentDeploymentSharedSpec)	Services are the services to deploy as part of this deployment.	MaxProperties: 25 Optional: {}
`envs` EnvVar array	Envs are environment variables applied to all services in the deployment unless overridden by service-specific configuration.	Optional: {}
`backendFramework` string	BackendFramework specifies the backend framework (e.g., “sglang”, “vllm”, “trtllm”).	Enum: [sglang vllm trtllm]
`restart` Restart	Restart specifies the restart policy for the graph deployment.	Optional: {}
`topologyConstraint` SpecTopologyConstraint	TopologyConstraint is the deployment-level topology constraint. When set, topologyProfile is required and names the ClusterTopology CR to use. packDomain is optional here; it can be omitted when only services carry constraints. Services without their own topologyConstraint inherit from this value.	Optional: {}
`experimental` DynamoGraphDeploymentExperimentalSpec	Experimental groups graph-level preview features whose API shape and behavior may change in breaking ways between releases.	Optional: {}
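The validation column above can be read as a short list of checks. The following is a minimal sketch of a client-side pre-check, assuming the spec is available as a plain Python dict (for example, parsed from a YAML manifest). The function name `check_dgd_spec` and the service names in the example are hypothetical, and this is not the operator's admission webhook.

```python
# Hypothetical client-side pre-check mirroring some documented validation rules.
# It is not the operator's webhook and does not cover every constraint.
BACKEND_FRAMEWORKS = {"sglang", "vllm", "trtllm"}


def check_dgd_spec(spec: dict) -> list:
    """Return a list of problems found in a DynamoGraphDeploymentSpec dict."""
    problems = []
    backend = spec.get("backendFramework")
    if backend is not None and backend not in BACKEND_FRAMEWORKS:
        problems.append(f"backendFramework {backend!r} is not one of {sorted(BACKEND_FRAMEWORKS)}")
    if len(spec.get("services", {})) > 25:
        problems.append("services exceeds MaxProperties: 25")
    pvc_names = [pvc.get("name") for pvc in spec.get("pvcs", [])]
    if len(pvc_names) > 100:
        problems.append("pvcs exceeds MaxItems: 100")
    if len(pvc_names) != len(set(pvc_names)):
        problems.append("pvcs must have unique names")
    topology = spec.get("topologyConstraint")
    if topology is not None and not topology.get("topologyProfile"):
        problems.append("topologyConstraint requires topologyProfile")
    return problems


# Example: a deployment-level constraint without a topologyProfile is flagged.
spec = {
    "backendFramework": "vllm",
    "services": {"Frontend": {}, "VllmDecodeWorker": {}},
    "topologyConstraint": {"packDomain": "rack"},
}
print(check_dgd_spec(spec))  # ['topologyConstraint requires topologyProfile']
```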

DynamoGraphDeploymentStatus

DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.

Appears in:

DynamoGraphDeployment

Field	Description	Default	Validation
`observedGeneration` integer	ObservedGeneration is the most recent generation observed by the controller.		Optional: {}
`state` DGDState	State is a high-level textual status of the graph deployment lifecycle.	initializing	Enum: [initializing pending successful failed]
`conditions` Condition array	Conditions contains the latest observed conditions of the graph deployment. The slice is merged by type on patch updates.
`services` object (keys:string, values:ServiceReplicaStatus)	Services contains per-service replica status information. The map key is the service name from spec.services.		Optional: {}
`restart` RestartStatus	Restart contains the status of the restart of the graph deployment.		Optional: {}
`checkpoints` object (keys:string, values:ServiceCheckpointStatus)	Checkpoints contains per-service checkpoint status information. The map key is the service name from spec.services.		Optional: {}
`rollingUpdate` RollingUpdateStatus	RollingUpdate tracks the progress of operator-managed rolling updates. Currently supported only for single-node, non-Grove deployments (DCD/Deployment).		Optional: {}

DynamoModel

DynamoModel is the Schema for the dynamo models API

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoModel`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoModelSpec
`status` DynamoModelStatus

DynamoModelSpec

DynamoModelSpec defines the desired state of DynamoModel

Appears in:

DynamoModel

Field	Description	Default	Validation
`modelName` string	ModelName is the full model identifier (e.g., “meta-llama/Llama-3.3-70B-Instruct-lora”)		Required: {}
`baseModelName` string	BaseModelName is the base model identifier that matches the service label. It is used to discover endpoints via headless services.		Required: {}
`modelType` string	ModelType specifies the type of model (e.g., “base”, “lora”, “adapter”)	base	Enum: [base lora adapter] Optional: {}
`source` ModelSource	Source specifies the model source location (only applicable for lora model type)		Optional: {}

DynamoModelStatus

DynamoModelStatus defines the observed state of DynamoModel

Appears in:

DynamoModel

Field	Description	Validation
`endpoints` EndpointInfo array	Endpoints is the current list of all endpoints for this model	Optional: {}
`readyEndpoints` integer	ReadyEndpoints is the count of endpoints that are ready
`totalEndpoints` integer	TotalEndpoints is the total count of endpoints
`conditions` Condition array	Conditions represents the latest available observations of the model’s state	Optional: {}

EPPConfig

EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components. EPP is responsible for intelligent endpoint selection and KV-aware routing.

Appears in:

Field	Description	Default	Validation
`configMapRef` ConfigMapKeySelector	ConfigMapRef references a user-provided ConfigMap containing EPP configuration. The ConfigMap should contain EndpointPickerConfig YAML. Mutually exclusive with Config.		Optional: {}
`config` EndpointPickerConfig	Config allows specifying EPP EndpointPickerConfig directly as a structured object. The operator will marshal this to YAML and create a ConfigMap automatically. Mutually exclusive with ConfigMapRef. One of ConfigMapRef or Config must be specified (no default configuration). Uses the upstream type from github.com/kubernetes-sigs/gateway-api-inference-extension		Type: object Optional: {}
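Because there is no default configuration, an EPPConfig must set exactly one of the two fields. A minimal sketch of that rule, assuming a dict-shaped spec; `check_epp_config` is a hypothetical helper, and the ConfigMap name and key in the example are placeholders.

```python
def check_epp_config(epp: dict) -> None:
    """Raise ValueError unless exactly one of configMapRef or config is set."""
    has_ref = epp.get("configMapRef") is not None
    has_inline = epp.get("config") is not None
    if has_ref and has_inline:
        raise ValueError("configMapRef and config are mutually exclusive")
    if not has_ref and not has_inline:
        raise ValueError("one of configMapRef or config must be specified (no default)")


# Valid: reference a user-provided ConfigMap (name and key are placeholders).
check_epp_config({"configMapRef": {"name": "epp-config", "key": "epp.yaml"}})
```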

EndpointInfo

EndpointInfo represents a single endpoint (pod) serving the model

Appears in:

DynamoModelStatus

Field	Description	Validation
`address` string	Address is the full address of the endpoint (e.g., “http://10.0.1.5:9090”)
`podName` string	PodName is the name of the pod serving this endpoint	Optional: {}
`ready` boolean	Ready indicates whether the endpoint is ready to serve traffic. For LoRA models, it is true if the POST /loras request succeeded with a 2xx status code. For base models, it is always false (no probing is performed).
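The readiness rule above depends on the model type. A minimal sketch, assuming dict-shaped status entries; `endpoint_ready` and `summarize` are hypothetical helpers, and treating `readyEndpoints`/`totalEndpoints` as plain counts over the `endpoints` list is an assumption rather than something this reference states. The `adapter` model type is not covered because its readiness rule is not described here.

```python
from typing import Optional


def endpoint_ready(model_type: str, post_loras_status: Optional[int]) -> bool:
    """Apply the documented readiness rule for one endpoint."""
    if model_type == "base":
        return False  # base models are not probed, so ready stays false
    if model_type == "lora":
        # Ready only if POST /loras returned a 2xx status code.
        return post_loras_status is not None and 200 <= post_loras_status < 300
    raise ValueError(f"no documented readiness rule for model type {model_type!r}")


def summarize(endpoints: list) -> dict:
    """Assumption: the status counters are plain counts over the endpoints list."""
    return {
        "readyEndpoints": sum(1 for e in endpoints if e.get("ready")),
        "totalEndpoints": len(endpoints),
    }


endpoints = [
    {"address": "http://10.0.1.5:9090", "ready": endpoint_ready("lora", 200)},
    {"address": "http://10.0.1.6:9090", "ready": endpoint_ready("lora", 503)},
]
print(summarize(endpoints))  # {'readyEndpoints': 1, 'totalEndpoints': 2}
```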

ExtraPodMetadata

Appears in:

Field	Description	Default	Validation
`annotations` object (keys:string, values:string)
`labels` object (keys:string, values:string)

ExtraPodSpec

Appears in:

Field	Description	Default	Validation
`mainContainer` Container

FailoverSpec

FailoverSpec configures active-passive failover for a worker component. For intraPod mode: requires gpuMemoryService.enabled; the main container is cloned into engine containers (active + standby) within the same pod. For interPod mode: the operator creates a dedicated GMS weight server pod and multiple engine pods per rank that share GPUs via DRA resource claims.

Appears in:

Field	Description	Default	Validation
`enabled` boolean	Enabled activates failover mode.
`mode` GPUMemoryServiceMode	Mode selects the failover deployment topology. intraPod: engine containers run within the same pod (requires gpuMemoryService.enabled). interPod: a dedicated GMS weight server pod + engine pods per rank (requires Grove).	intraPod	Enum: [intraPod interPod] Optional: {}
`numShadows` integer	NumShadows is the number of shadow (standby) engine pods per rank. Total engine pods per rank = NumShadows + 1 (1 primary + NumShadows shadows). NumShadows is only meaningful for mode=interPod; intraPod uses a fixed 1 primary + 1 shadow sidecar layout and any value other than 1 is rejected at admission time.	1	Minimum: 1 Optional: {}
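The engine count per rank follows directly from `mode` and `numShadows`. A minimal sketch of that arithmetic and the admission rule, with a hypothetical helper name `engines_per_rank`; the count refers to engine pods in interPod mode and to engine containers inside one pod in intraPod mode.

```python
VALID_MODES = {"intraPod", "interPod"}


def engines_per_rank(mode: str = "intraPod", num_shadows: int = 1) -> int:
    """Return 1 primary + num_shadows standby engines for one rank.

    In interPod mode these are engine pods; in intraPod mode they are
    engine containers inside a single pod.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if num_shadows < 1:
        raise ValueError("numShadows has Minimum: 1")
    if mode == "intraPod" and num_shadows != 1:
        raise ValueError("intraPod uses a fixed 1 primary + 1 shadow layout")
    return num_shadows + 1


print(engines_per_rank())               # 2 (intraPod default)
print(engines_per_rank("interPod", 3))  # 4 engine pods per rank
```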

FrontendSidecarSpec

FrontendSidecarSpec configures the auto-generated frontend sidecar container. The operator uses these fields together with built-in frontend defaults (command, probes, ports, and Dynamo env vars) to produce a fully configured sidecar container.

Appears in:

Field	Description	Validation
`image` string	Image is the container image for the frontend sidecar.	Required: {}
`args` string array	Args overrides the default frontend arguments. When specified, these replace the default [“-m”, “dynamo.frontend”] entirely. For example, [“-m”, “dynamo.frontend”, “--router-mode”, “direct”] for GAIE deployments.	Optional: {}
`envFromSecret` string	EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the frontend sidecar container.	Optional: {}
`envs` EnvVar array	Envs defines additional environment variables for the frontend sidecar. These are merged with (and can override) the auto-generated Dynamo env vars.	Optional: {}
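The override rules for `args` and `envs` differ: `args` replaces the defaults wholesale, while `envs` is merged by name. A minimal sketch of both behaviors; `effective_args` and `merge_envs` are hypothetical helpers, and the env var names in the test are placeholders, not the operator's actual auto-generated Dynamo variables.

```python
DEFAULT_FRONTEND_ARGS = ["-m", "dynamo.frontend"]


def effective_args(args):
    """User args replace the defaults entirely; they are never appended.

    Assumption: an empty list is treated the same as an unset field.
    """
    return list(args) if args else list(DEFAULT_FRONTEND_ARGS)


def merge_envs(generated: list, user: list) -> list:
    """Merge env vars by name; a user entry overrides a generated one."""
    merged = {env["name"]: env for env in generated}
    for env in user:
        merged[env["name"]] = env  # keeps the original position, replaces the value
    return list(merged.values())


print(effective_args(None))  # ['-m', 'dynamo.frontend']
print(effective_args(["-m", "dynamo.frontend", "--router-mode", "direct"]))
```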

GMSClientPodSpec

GMSClientPodSpec declares an additional GMS client pod for inter-pod GMS.

Appears in:

GPUMemoryServiceSpec

Field	Description	Default	Validation
`name` string	Name identifies this client pod.		MaxLength: 63 MinLength: 1 Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`
`podTemplate` PodTemplateSpec	PodTemplate configures the pod to run as a GMS client.		Schemaless: {} Type: object

GPUMemoryServiceMode

Underlying type: string

GPUMemoryServiceMode selects the GMS deployment topology.

Appears in:

Field	Description
`intraPod`	GMSModeIntraPod runs GMS as a sidecar within the same pod.
`interPod`	GMSModeInterPod runs GMS as a separate weight server pod and one or more engine pods per rank, sharing GPUs via DRA ResourceClaims and a shared hostPath volume for UDS sockets. Extra client pod rendering is reserved for a follow-up change.

GPUMemoryServiceSpec

GPUMemoryServiceSpec configures the GPU Memory Service (GMS) for a worker component.

Appears in:

Field	Description	Default	Validation
`enabled` boolean	Enabled activates GMS wiring. GPU resources on client containers are replaced with a DRA ResourceClaim for shared GPU access.
`mode` GPUMemoryServiceMode	Mode selects the GMS deployment topology.	intraPod	Enum: [intraPod interPod] Optional: {}
`deviceClassName` string	DeviceClassName is the DRA DeviceClass to request GPUs from.	gpu.nvidia.com	Optional: {}
`extraClientContainers` string array	ExtraClientContainers lists additional user-declared containers that should be wired as GMS clients in pods rendered from the enclosing spec. DGD/DCD services apply this to service pods; DynamoCheckpoint applies this to checkpoint Job pods. In each rendered pod, only matching container names are wired; absent names are ignored.		items:MaxLength: 63 items:MinLength: 1 items:Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`extraClientPods` GMSClientPodSpec array	ExtraClientPods declares additional GMS client pods for inter-pod GMS. This field is reserved for future use and is rejected until inter-pod client orchestration is wired.		Optional: {}
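The `extraClientContainers` rule (wire matching names, ignore absent ones) can be sketched as a filter over a rendered pod's containers. The helper name `gms_wired_containers` and the container name `metrics-helper` are hypothetical; `gms-saver` is the helper name mentioned elsewhere in this reference.

```python
import re

# Same pattern and length bounds as the items of extraClientContainers.
CONTAINER_NAME_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def gms_wired_containers(pod_containers: list, extra_client_containers: list) -> list:
    """Return the extra client container names that exist in this rendered pod."""
    for name in extra_client_containers:
        if not (1 <= len(name) <= 63 and CONTAINER_NAME_RE.fullmatch(name)):
            raise ValueError(f"invalid container name {name!r}")
    present = {c["name"] for c in pod_containers}
    # Names that are absent from this pod are silently ignored.
    return [name for name in extra_client_containers if name in present]


pod = [{"name": "main"}, {"name": "gms-saver"}]
print(gms_wired_containers(pod, ["gms-saver", "metrics-helper"]))  # ['gms-saver']
```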

IngressSpec

Appears in:

Field	Description	Default	Validation
`enabled` boolean	Enabled exposes the component through an ingress or virtual service when true.
`host` string	Host is the base host name to route external traffic to this component.
`useVirtualService` boolean	UseVirtualService indicates whether to configure a service-mesh VirtualService instead of a standard Ingress.
`virtualServiceGateway` string	VirtualServiceGateway optionally specifies the gateway name to attach the VirtualService to.
`hostPrefix` string	HostPrefix is an optional prefix added before the host.
`annotations` object (keys:string, values:string)	Annotations to set on the generated Ingress/VirtualService resources.
`labels` object (keys:string, values:string)	Labels to set on the generated Ingress/VirtualService resources.
`tls` IngressTLSSpec	TLS holds the TLS configuration used by the Ingress/VirtualService.
`hostSuffix` string	HostSuffix is an optional suffix appended after the host.
`ingressControllerClassName` string	IngressControllerClassName selects the ingress controller class (e.g., “nginx”).

IngressTLSSpec

Appears in:

IngressSpec

Field	Description	Default	Validation
`secretName` string	SecretName is the name of a Kubernetes Secret containing the TLS certificate and key.

KvTransferEnforcement

Underlying type: string

KvTransferEnforcement controls how the selected prefill worker’s topology is applied to decode routing.

Validation:

Enum: [required preferred]

Appears in:

KvTransferPolicy

Field	Description
`required`	KvTransferEnforcementRequired enforces same-domain decode worker selection.
`preferred`	KvTransferEnforcementPreferred biases decode worker selection toward the same domain.

KvTransferPolicy

KvTransferPolicy configures topology-aware routing for KV-cache transfers between prefill and decode workers. This graph-wide policy lives under spec.experimental while the API is incubating.

Appears in:

DynamoGraphDeploymentExperimentalSpec

Field	Description	Default	Validation
`labelKey` string	LabelKey is a Kubernetes node label key (e.g. “topology.kubernetes.io/zone”) whose value identifies the topology domain for each worker. The operator copies the node label onto worker pods so the runtime can publish it as worker metadata. The label should correspond to the topology level named in `domain`.		MaxLength: 317 MinLength: 1 Pattern: `^(([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)(\.[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)*/)?([A-Za-z0-9]([-A-Za-z0-9_.]{0,61}[A-Za-z0-9])?)$` Optional: {}
`domain` TopologyDomain	Domain is the logical name for the topology level to enforce (e.g. “zone”, “rack”). The router uses this to match workers that share the same value for the label identified by `labelKey`.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`
`enforcement` KvTransferEnforcement	Enforcement controls how the selected prefill worker’s topology is applied to decode routing. “required” only allows decode workers in the same topology domain as the selected prefill worker. “preferred” keeps all decode workers eligible, but biases selection toward workers in the same topology domain. Defaults to “required”.	required	Enum: [required preferred] Optional: {}
`preferredWeight` float	PreferredWeight is required and used only when enforcement is “preferred”. Higher values create a stronger same-domain routing preference, but do not guarantee same-domain selection. The value is not a probability; worker selection still depends on load and other routing inputs. A value of 0 disables the topology preference; 1 is the strongest supported preference.		Maximum: 1 Minimum: 0 Optional: {}
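The field rules above combine into a small validation routine. A minimal sketch, assuming a dict-shaped policy; `check_kv_transfer_policy` is a hypothetical helper that reuses the documented patterns and bounds. It does not model how the router actually scores workers.

```python
import re

LABEL_KEY_RE = re.compile(
    r"^(([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)(\.[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)*/)?"
    r"([A-Za-z0-9]([-A-Za-z0-9_.]{0,61}[A-Za-z0-9])?)$"
)
DOMAIN_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def check_kv_transfer_policy(policy: dict) -> list:
    problems = []
    label_key = policy.get("labelKey")
    if label_key is not None and not (
        1 <= len(label_key) <= 317 and LABEL_KEY_RE.fullmatch(label_key)
    ):
        problems.append(f"labelKey {label_key!r} is not a valid label key")
    domain = policy.get("domain")
    if domain is not None and not DOMAIN_RE.fullmatch(domain):
        problems.append(f"domain {domain!r} does not match the TopologyDomain pattern")
    enforcement = policy.get("enforcement", "required")  # documented default
    if enforcement not in ("required", "preferred"):
        problems.append(f"enforcement {enforcement!r} must be 'required' or 'preferred'")
    if enforcement == "preferred":
        weight = policy.get("preferredWeight")
        if weight is None:
            problems.append("preferredWeight is required when enforcement is 'preferred'")
        elif not 0 <= weight <= 1:
            problems.append("preferredWeight must be between 0 and 1")
    # With enforcement 'required', preferredWeight is unused, so it is not checked.
    return problems


policy = {
    "labelKey": "topology.kubernetes.io/zone",
    "domain": "zone",
    "enforcement": "preferred",
    "preferredWeight": 0.5,
}
print(check_kv_transfer_policy(policy))  # []
```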

ModelReference

ModelReference identifies a model served by this component

Appears in:

Field	Description	Default	Validation
`name` string	Name is the base model identifier (e.g., “llama-3-70b-instruct-v1”)		Required: {}
`revision` string	Revision is the model revision/version (optional)		Optional: {}

ModelSource

ModelSource defines the source location of a model

Appears in:

DynamoModelSpec

Field	Description	Default	Validation
`uri` string	URI is the model source URI. Supported formats are S3 (s3://bucket/path/to/model) and HuggingFace (hf://org/model@revision_sha).		Required: {}
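The two supported URI shapes can be split into their parts with plain string operations. A minimal sketch; `parse_model_uri` is a hypothetical helper, and tolerating a missing `@revision_sha` suffix is an assumption, since this reference only shows the form with a revision.

```python
def parse_model_uri(uri: str) -> dict:
    """Split a ModelSource URI into its parts (s3:// or hf:// only)."""
    scheme, sep, rest = uri.partition("://")
    if not sep or not rest:
        raise ValueError(f"not a URI: {uri!r}")
    if scheme == "s3":
        bucket, _, path = rest.partition("/")
        if not bucket:
            raise ValueError("s3 URI is missing a bucket")
        return {"scheme": "s3", "bucket": bucket, "path": path}
    if scheme == "hf":
        repo, _, revision = rest.partition("@")
        # Assumption: tolerate a missing @revision_sha suffix (revision=None).
        return {"scheme": "hf", "repo": repo, "revision": revision or None}
    raise ValueError(f"unsupported scheme {scheme!r}")


print(parse_model_uri("s3://bucket/path/to/model"))
print(parse_model_uri("hf://org/model@revision_sha"))
```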

MultinodeSpec

Appears in:

Field	Description	Default	Validation
`nodeCount` integer	Indicates the number of nodes to deploy for multinode components. The total number of GPUs is nodeCount multiplied by the per-node GPU limit. Must be greater than 1.	2	Minimum: 2
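The GPU total for a multinode component is a single multiplication. A minimal sketch with a hypothetical helper `total_gpus`; it converts the per-node count because `ResourceItem.gpu` is a string field, and assumes that string holds a whole number.

```python
def total_gpus(node_count: int, gpus_per_node: str) -> int:
    """Total GPUs for a multinode component: nodeCount x per-node GPU count."""
    if node_count < 2:
        raise ValueError("nodeCount has Minimum: 2")
    # ResourceItem.gpu is a string field, so convert it first.
    return node_count * int(gpus_per_node)


print(total_gpus(2, "8"))  # 16
```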

PVC

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Validation
`create` boolean	Create indicates whether a new PVC should be created.
`name` string	Name is the name of the PVC	Required: {}
`storageClass` string	StorageClass to be used for PVC creation. Required when create is true.
`size` Quantity	Size of the volume in Gi, used during PVC creation. Required when create is true.
`volumeAccessMode` PersistentVolumeAccessMode	VolumeAccessMode is the volume access mode of the PVC. Required when create is true.
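The conditional requirements in this table depend on `create`. A minimal sketch, assuming a dict-shaped PVC entry; `check_pvc` and the PVC name `model-cache` are hypothetical.

```python
def check_pvc(pvc: dict) -> list:
    problems = []
    if not pvc.get("name"):
        problems.append("name is required")
    if pvc.get("create"):
        # These three fields only matter when the PVC is being created.
        for field in ("storageClass", "size", "volumeAccessMode"):
            if not pvc.get(field):
                problems.append(f"{field} is required when create is true")
    return problems


# Reference an existing PVC: only the name is needed.
print(check_pvc({"name": "model-cache"}))  # []
# Ask for a new PVC but omit the storage class and access mode.
print(check_pvc({"name": "model-cache", "create": True, "size": "100Gi"}))
```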

ProfilingConfigSpec

ProfilingConfigSpec defines configuration for the profiling process. This structure maps directly to the profile_sla.py config format. See dynamo/profiler/utils/profiler_argparse.py for the complete schema.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Validation
`config` JSON	Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler. The profiler will validate the configuration and report any errors.	Optional: {} Type: object
`configMapRef` ConfigMapKeySelector	ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment base config file (disagg.yaml). This is separate from the profiling config above. The path to this config will be set as engine.config in the profiling config.	Optional: {}
`profilerImage` string	ProfilerImage specifies the container image to use for profiling jobs. This image contains the profiler code and dependencies needed for SLA-based profiling. Example: “nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1”	Required: {}
`outputPVC` string	OutputPVC is an optional PersistentVolumeClaim name for storing profiling output. If specified, all profiling artifacts (logs, plots, configs, raw data) will be written to this PVC instead of an ephemeral emptyDir volume. This allows users to access complete profiling results after the job completes by mounting the PVC. The PVC must exist in the same namespace as the DGDR. If not specified, profiling uses emptyDir and only essential data is saved to ConfigMaps. Note: ConfigMaps are still created regardless of this setting for planner integration.	Optional: {}
`resources` ResourceRequirements	Resources specifies the compute resource requirements for the profiling job container. If not specified, no resource requests or limits are set.	Optional: {}
`tolerations` Toleration array	Tolerations allows the profiling job to be scheduled on nodes with matching taints. For example, to schedule on GPU nodes, add a toleration for the nvidia.com/gpu taint.	Optional: {}
`nodeSelector` object (keys:string, values:string)	NodeSelector is a selector which must match a node’s labels for the profiling pod to be scheduled on that node. For example, to schedule on ARM64 nodes, use {“kubernetes.io/arch”: “arm64”}.	Optional: {}

ResourceItem

Appears in:

Resources

Field	Description	Default	Validation
`cpu` string	CPU specifies the CPU resource request/limit (e.g., “1000m”, “2”)
`memory` string	Memory specifies the memory resource request/limit (e.g., “4Gi”, “8Gi”)
`gpu` string	GPU indicates the number of GPUs to request. In a multinode deployment, the total number of GPUs is the node count (MultinodeSpec nodeCount) multiplied by this value.
`gpuType` string	GPUType specifies a custom GPU type, e.g. “gpu.intel.com/xe”. If not specified, the GPU type defaults to “nvidia.com/gpu”.
`custom` object (keys:string, values:string)	Custom specifies additional custom resource requests/limits

Resources

Resources defines resource requests and limits for a component, including CPU, memory, GPUs/devices, and any runtime-specific resources.

Appears in:

Field	Description	Default	Validation
`requests` ResourceItem	Requests specifies the minimum resources required by the component
`limits` ResourceItem	Limits specifies the maximum resources allowed for the component
`claims` ResourceClaim array	Claims specifies resource claims for dynamic resource allocation
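To see how ResourceItem fields might line up with a Kubernetes container's resource names, the following is a minimal sketch. It assumes a flat mapping (cpu and memory pass through, gpu is keyed by gpuType with the documented `nvidia.com/gpu` default, custom entries are copied as-is); the operator's actual translation is not described here, and `claims` is deliberately left out. Both helper names are hypothetical.

```python
DEFAULT_GPU_RESOURCE = "nvidia.com/gpu"


def to_k8s_quantities(item: dict) -> dict:
    """Flatten a ResourceItem into a resource-name -> quantity map (assumed mapping)."""
    out = {}
    for key in ("cpu", "memory"):
        if item.get(key):
            out[key] = item[key]
    if item.get("gpu"):
        out[item.get("gpuType") or DEFAULT_GPU_RESOURCE] = item["gpu"]
    out.update(item.get("custom") or {})
    return out


def to_k8s_resources(resources: dict) -> dict:
    # Claims (DRA) are not modeled in this sketch.
    return {
        section: to_k8s_quantities(resources[section])
        for section in ("requests", "limits")
        if resources.get(section)
    }


print(to_k8s_resources({"requests": {"cpu": "2", "memory": "8Gi", "gpu": "1"}, "limits": {"gpu": "1"}}))
# {'requests': {'cpu': '2', 'memory': '8Gi', 'nvidia.com/gpu': '1'}, 'limits': {'nvidia.com/gpu': '1'}}
```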

Restart

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`id` string	ID is an arbitrary string that triggers a restart when changed. Any modification to this value will initiate a restart of the graph deployment according to the strategy.		MinLength: 1 Required: {}
`strategy` RestartStrategy	Strategy specifies the restart strategy for the graph deployment.		Optional: {}

RestartPhase

Underlying type: string

Appears in:

RestartStatus

Field	Description
`Pending`
`Restarting`
`Completed`
`Failed`
`Superseded`

RestartStatus

RestartStatus contains the status of the restart of the graph deployment.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`observedID` string	ObservedID is the restart ID that has been observed and is being processed. Matches the Restart.ID field in the spec.
`phase` RestartPhase	Phase is the phase of the restart.
`inProgress` string array	InProgress contains the names of the services that are currently being restarted.	Optional: {}
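Because `observedID` mirrors `spec.restart.id` once the controller picks up a new value, a client can compare the two to tell whether a requested restart has been noticed yet. A minimal sketch with a hypothetical helper name; the ID strings are placeholders, since any string change triggers a restart.

```python
def restart_not_yet_observed(spec: dict, status: dict) -> bool:
    """True when spec.restart.id has changed but the controller has not picked it up."""
    desired = (spec.get("restart") or {}).get("id")
    observed = (status.get("restart") or {}).get("observedID")
    return desired is not None and desired != observed


spec = {"restart": {"id": "restart-2"}}
status = {"restart": {"observedID": "restart-1", "phase": "Completed"}}
print(restart_not_yet_observed(spec, status))  # True
```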

RestartStrategy

Appears in:

Restart

Field	Description	Default	Validation
`type` RestartStrategyType	Type specifies the restart strategy type.	Sequential	Enum: [Sequential Parallel]
`order` string array	Order specifies the order in which the services should be restarted.		Optional: {}

RestartStrategyType

Underlying type: string

Appears in:

RestartStrategy

Field	Description
`Sequential`
`Parallel`
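The two strategy types can be pictured as batches of services restarted together. A minimal sketch, assuming Sequential restarts one service per batch in `order` and Parallel restarts everything in one batch; falling back to sorted service names when `order` is unset is an assumption, because the operator's default ordering is not documented here. `restart_batches` and the service names are hypothetical.

```python
def restart_batches(services, strategy=None):
    """Group service names into restart batches for a RestartStrategy dict."""
    strategy = strategy or {}
    kind = strategy.get("type", "Sequential")  # documented default
    if kind not in ("Sequential", "Parallel"):
        raise ValueError(f"unknown restart strategy type {kind!r}")
    # Assumption: without an explicit order, fall back to sorted names.
    order = list(strategy.get("order") or sorted(services))
    if kind == "Parallel":
        return [order]
    return [[name] for name in order]


print(restart_batches(["Frontend", "VllmDecodeWorker"]))
print(restart_batches(["Frontend", "VllmDecodeWorker"], {"type": "Parallel"}))
```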

RollingUpdatePhase

Underlying type: string

RollingUpdatePhase represents the current phase of a rolling update.

Validation:

Enum: [Pending InProgress Completed Failed ]

Appears in:

RollingUpdateStatus

Field	Description
`Pending`
`InProgress`
`Completed`
`""`

RollingUpdateStatus

RollingUpdateStatus tracks the progress of a rolling update.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`phase` RollingUpdatePhase	Phase indicates the current phase of the rolling update.	Enum: [Pending InProgress Completed Failed ] Optional: {}
`startTime` Time	StartTime is when the rolling update began.	Optional: {}
`endTime` Time	EndTime is when the rolling update completed (successfully or failed).	Optional: {}
`updatedServices` string array	UpdatedServices is the list of services that have completed the rolling update. A service is considered updated when its new replicas are all ready and old replicas are fully scaled down. Only services of componentType Worker (or Prefill/Decode) are considered.	Optional: {}

ScalingAdapter

ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter for replica management. When enabled, the DGDSA owns the replicas field and external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.

Appears in:

Field	Description	Default	Validation
`enabled` boolean	Enabled indicates whether the ScalingAdapter should be enabled for this service. When true, a DGDSA is created and owns the replicas field. When false (default), no DGDSA is created and replicas can be modified directly in the DGD.	false	Optional: {}

ServiceCheckpointConfig

ServiceCheckpointConfig configures checkpointing for a DGD service

Appears in:

Field	Description	Default	Validation
`enabled` boolean	Enabled indicates whether checkpointing is enabled for this service	false	Optional: {}
`mode` CheckpointMode	Mode defines how checkpoint creation is handled. Auto: the DGD controller creates the Checkpoint CR automatically. Manual: the user must create the Checkpoint CR.	Auto	Enum: [Auto Manual] Optional: {}
`checkpointRef` string	CheckpointRef references an existing DynamoCheckpoint CR by metadata.name. If specified, this service’s Identity is ignored and the referenced checkpoint is used directly.		Optional: {}
`identity` DynamoCheckpointIdentity	Identity defines the checkpoint identity for hash computation. It is used when Mode is Auto or when looking up existing checkpoints, and is required when checkpointRef is not specified.		Optional: {}
`targetContainerName` string	TargetContainerName is the workload container to snapshot and restore.	main	MaxLength: 63 MinLength: 1 Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`job` ServiceCheckpointJobConfig	Job customizes the checkpoint Job that is created in Auto mode.		Optional: {}
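The interplay between `checkpointRef` and `identity`, plus the container-name rule, can be checked before applying a manifest. A minimal sketch; `check_checkpoint_config` and the checkpoint name are hypothetical, and skipping all checks when `enabled` is false is an assumption.

```python
import re

CONTAINER_NAME_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def check_checkpoint_config(cfg: dict) -> list:
    if not cfg.get("enabled", False):
        return []  # assumption: the rules only matter once checkpointing is enabled
    problems = []
    if cfg.get("mode", "Auto") not in ("Auto", "Manual"):
        problems.append("mode must be Auto or Manual")
    # checkpointRef wins; otherwise an identity is needed for hash computation.
    if not cfg.get("checkpointRef") and cfg.get("identity") is None:
        problems.append("identity is required when checkpointRef is not specified")
    target = cfg.get("targetContainerName", "main")  # documented default
    if not (1 <= len(target) <= 63 and CONTAINER_NAME_RE.fullmatch(target)):
        problems.append(f"targetContainerName {target!r} is not a valid container name")
    return problems


print(check_checkpoint_config({"enabled": True, "checkpointRef": "my-checkpoint"}))  # []
print(check_checkpoint_config({"enabled": True}))
```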

ServiceCheckpointJobConfig

ServiceCheckpointJobConfig customizes the checkpoint Job created for a DGD service.

Appears in:

ServiceCheckpointConfig

Field	Description	Default	Validation
`gmsClientContainers` string array	GMSClientContainers lists checkpoint Job containers that should receive GMS client wiring. Requires gpuMemoryService on the service.		items:MaxLength: 63 items:MinLength: 1 items:Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`podTemplate` PodTemplateSpec	PodTemplate customizes the checkpoint Job pod. The operator starts from the selected workload container and merges this template so users can add helper containers such as gms-saver.		Schemaless: {} Type: object Optional: {}

ServiceCheckpointStatus

ServiceCheckpointStatus contains checkpoint information for a single service.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`checkpointName` string	CheckpointName is the name of the associated Checkpoint CR	Optional: {}
`identityHash` string	IdentityHash is the computed hash of the checkpoint identity	Optional: {}
`ready` boolean	Ready indicates if the checkpoint was visible to the worker at startup	Optional: {}

ServiceReplicaStatus

ServiceReplicaStatus contains replica information for a single service.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`componentKind` ComponentKind	ComponentKind is the underlying resource kind (e.g., “PodClique”, “PodCliqueScalingGroup”, “Deployment”, “LeaderWorkerSet”).	Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]
`componentName` string	ComponentName is the name of the primary underlying resource. DEPRECATED: Use ComponentNames instead. This field will be removed in a future release. During rolling updates, this reflects the new (target) component name.
`componentNames` string array	ComponentNames is the list of underlying resource names for this service. During normal operation, this contains a single name. During rolling updates, this contains both old and new component names.	Optional: {}
`replicas` integer	Replicas is the total number of non-terminated replicas. Required for all component kinds.	Minimum: 0
`updatedReplicas` integer	UpdatedReplicas is the number of replicas at the current/desired revision. Required for all component kinds.	Minimum: 0
`readyReplicas` integer	ReadyReplicas is the number of ready replicas. Populated for PodClique, Deployment, and LeaderWorkerSet. Not available for PodCliqueScalingGroup. When nil, the field is omitted from the API response.	Minimum: 0 Optional: {}
`availableReplicas` integer	AvailableReplicas is the number of available replicas. For Deployment: replicas ready for >= minReadySeconds. For PodCliqueScalingGroup: replicas where all constituent PodCliques have >= MinAvailable ready pods. Not available for PodClique or LeaderWorkerSet. When nil, the field is omitted from the API response.	Minimum: 0 Optional: {}
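Since `readyReplicas` and `availableReplicas` are populated only for some component kinds, a status reader has to pick the counter that exists. A minimal sketch encoding the availability rules from this table, with hypothetical helper names; it also prefers `componentNames` over the deprecated `componentName`.

```python
# Which optional counters each kind populates, per the table above.
OPTIONAL_COUNTERS = {
    "PodClique": {"readyReplicas"},
    "PodCliqueScalingGroup": {"availableReplicas"},
    "Deployment": {"readyReplicas", "availableReplicas"},
    "LeaderWorkerSet": {"readyReplicas"},
}


def component_names(status: dict) -> list:
    """Prefer componentNames; fall back to the deprecated componentName."""
    if status.get("componentNames"):
        return list(status["componentNames"])
    return [status["componentName"]] if status.get("componentName") else []


def readiness_summary(status: dict) -> str:
    counters = OPTIONAL_COUNTERS[status["componentKind"]]
    total = status["replicas"]
    if "readyReplicas" in counters and status.get("readyReplicas") is not None:
        return f"{status['readyReplicas']}/{total} ready"
    if "availableReplicas" in counters and status.get("availableReplicas") is not None:
        return f"{status['availableReplicas']}/{total} available"
    # Fall back to the counters that are required for every kind.
    return f"{status['updatedReplicas']}/{total} updated"


print(readiness_summary({"componentKind": "PodCliqueScalingGroup", "replicas": 3, "updatedReplicas": 3, "availableReplicas": 2}))
```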

SharedMemorySpec

Appears in:

Field	Description	Default	Validation
`disabled` boolean	Disabled, when true, opts out of mounting a shared-memory medium for the component. When false (or unset), shared memory is enabled and Size is required (enforced by the validating webhook). Size is ignored when Disabled is true.
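`size` Quantity	Size is the size of the shared-memory medium. Required unless disabled is true; ignored when disabled is true.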
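The webhook rule described for `disabled` and `size` reduces to one condition. A minimal sketch with a hypothetical helper `check_shared_memory`; the size value in the example is a placeholder.

```python
def check_shared_memory(shm: dict) -> list:
    if shm.get("disabled"):
        return []  # size is ignored when shared memory is disabled
    if not shm.get("size"):
        return ["size is required unless disabled is true"]
    return []


print(check_shared_memory({"size": "16Gi"}))  # []
print(check_shared_memory({}))  # ['size is required unless disabled is true']
```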

SpecTopologyConstraint

SpecTopologyConstraint defines deployment-level topology placement requirements. It carries both the topology profile (which ClusterTopology CR to use) and an optional default pack domain that services without their own constraint inherit.

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`topologyProfile` string	TopologyProfile is the name of the ClusterTopology CR that defines the topology hierarchy for this deployment.		MinLength: 1
`packDomain` TopologyDomain	PackDomain is the default topology domain to pack pods within. Optional — omit when only services carry constraints.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$` Optional: {}

TopologyConstraint

TopologyConstraint defines service-level topology placement requirements. The topology profile is inherited from the deployment-level SpecTopologyConstraint; only the pack domain is specified here.

Appears in:

Field	Description	Default	Validation
`packDomain` TopologyDomain	PackDomain is the topology domain to pack pods within. Must match a domain defined in the referenced ClusterTopology CR.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`
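Service-level and deployment-level constraints combine through inheritance: a service's own `packDomain` wins, otherwise the deployment-level default applies. A minimal sketch of that lookup plus the TopologyDomain pattern check; the helper names, the ClusterTopology name `my-cluster-topology`, and the service names are hypothetical.

```python
import re
from typing import Optional

DOMAIN_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def effective_pack_domain(spec: dict, service: str) -> Optional[str]:
    own = (spec.get("services", {}).get(service) or {}).get("topologyConstraint")
    if own is not None:
        return own.get("packDomain")
    # No service-level constraint: inherit the deployment-level default, if any.
    return (spec.get("topologyConstraint") or {}).get("packDomain")


def is_valid_domain(domain: str) -> bool:
    return DOMAIN_RE.fullmatch(domain) is not None


spec = {
    "topologyConstraint": {"topologyProfile": "my-cluster-topology", "packDomain": "rack"},
    "services": {"Frontend": {}, "Worker": {"topologyConstraint": {"packDomain": "host"}}},
}
print(effective_pack_domain(spec, "Frontend"))  # rack (inherited)
print(effective_pack_domain(spec, "Worker"))  # host (service override)
```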

TopologyDomain

Underlying type: string

TopologyDomain is a free-form topology level identifier. Common examples: “region”, “zone”, “datacenter”, “block”, “rack”, “host”, “numa”. When used with a ClusterTopology CR, domain names are defined in the CR’s hierarchy; when used with spec.experimental.kvTransferPolicy.labelKey alone, the value is a user-chosen logical name for the topology level. Must match ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ (lowercase alphanumeric, may contain hyphens but must not start or end with one).

Validation:

Pattern: ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$

Appears in:

VolumeMount

VolumeMount references a PVC defined at the top level for volumes to be mounted by the component

Appears in:

Field	Description	Default	Validation
`name` string	Name references the name of a PVC defined in the top-level pvcs list.		Required: {}
`mountPoint` string	MountPoint specifies where to mount the volume. If useAsCompilationCache is true and mountPoint is not specified, a backend-specific default will be used.
`useAsCompilationCache` boolean	UseAsCompilationCache indicates this volume should be used as a compilation cache. When true, backend-specific environment variables will be set and default mount points may be used.	false
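Each VolumeMount must point at a PVC declared in the deployment's top-level `pvcs` list. A minimal sketch that finds dangling references; `unknown_pvc_refs` and the PVC names are hypothetical.

```python
def unknown_pvc_refs(spec: dict, volume_mounts: list) -> list:
    """Return mount names that do not match any PVC declared in spec.pvcs."""
    declared = {pvc["name"] for pvc in spec.get("pvcs", [])}
    return [m["name"] for m in volume_mounts if m["name"] not in declared]


spec = {"pvcs": [{"name": "model-cache"}]}
mounts = [
    {"name": "model-cache", "mountPoint": "/models"},
    {"name": "compile-cache", "useAsCompilationCache": True},
]
print(unknown_pvc_refs(spec, mounts))  # ['compile-cache']
```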

nvidia.com/v1beta1

Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.

Resource Types

BackendType

Underlying type: string

BackendType specifies the inference backend.

Validation:

Enum: [auto sglang trtllm vllm]

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description
`auto`
`sglang`
`trtllm`
`vllm`

CheckpointMode

Underlying type: string

CheckpointMode defines how checkpoint creation is handled.

Validation:

Enum: [Auto Manual]

Appears in:

ComponentCheckpointConfig

Field	Description
`Auto`	CheckpointModeAuto means the DGD controller creates the DynamoCheckpoint CR automatically.
`Manual`	CheckpointModeManual means the user creates the DynamoCheckpoint CR themselves.

CompilationCacheConfig

CompilationCacheConfig configures a PVC-backed compilation cache for a component. The operator handles backend-specific mount paths and environment variables so users do not need to hand-wire them into the pod template.

Appears in:

Field	Description	Default	Validation
`pvcName` string	pvcName references a user-created PVC by name. The PVC must exist in the same namespace as the DynamoGraphDeployment.		MinLength: 1 Required: {}
`mountPath` string	mountPath overrides the backend-specific default mount path. When empty, the operator selects a default appropriate for the backend framework.		Optional: {}

ComponentCheckpointConfig

ComponentCheckpointConfig configures checkpointing for a DGD component.

Appears in:

ExperimentalSpec

Field	Description	Default	Validation
`mode` CheckpointMode	mode defines how checkpoint creation is handled. `Auto`: DGD controller creates the DynamoCheckpoint CR automatically. `Manual`: user must create the DynamoCheckpoint CR.	Auto	Enum: [Auto Manual] Optional: {}
`checkpointRef` string	checkpointRef references an existing DynamoCheckpoint CR by `metadata.name`. When set, this component’s `identity` is ignored and the referenced checkpoint is used directly.		Optional: {}
`identity` DynamoCheckpointIdentity	identity defines the checkpoint identity for hash computation. Used when `mode` is `Auto` or when looking up existing checkpoints. Required when `checkpointRef` is not specified.		Optional: {}
`targetContainerName` string	targetContainerName is the workload container to snapshot and restore.	main	MaxLength: 63 MinLength: 1 Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`job` ComponentCheckpointJobConfig	job customizes the checkpoint Job that is created in Auto mode.		Optional: {}

ComponentCheckpointJobConfig

ComponentCheckpointJobConfig customizes the checkpoint Job created for a DGD component.

Appears in:

ComponentCheckpointConfig

Field	Description	Default	Validation
`gmsClientContainers` string array	gmsClientContainers lists checkpoint Job containers that should receive GMS client wiring. Requires gpuMemoryService on the component.		items:MaxLength: 63 items:MinLength: 1 items:Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`podTemplate` PodTemplateSpec	podTemplate customizes the checkpoint Job pod. The operator starts from the selected workload container and merges this template so users can add helper containers such as gms-saver.		Schemaless: {} Type: object Optional: {}

ComponentCheckpointStatus

ComponentCheckpointStatus contains checkpoint information for a single component.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`checkpointName` string	checkpointName is the name of the associated DynamoCheckpoint CR.	Optional: {}
`identityHash` string	identityHash is the computed hash of the checkpoint identity.	Optional: {}
`ready` boolean	ready indicates if the checkpoint was visible to the worker at startup.	Optional: {}

ComponentKind

Underlying type: string

ComponentKind represents the type of underlying Kubernetes resource backing a DGD component.

Validation:

Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]

Appears in:

ComponentReplicaStatus

Field	Description
`PodClique`
`PodCliqueScalingGroup`
`Deployment`
`LeaderWorkerSet`

ComponentReplicaStatus

ComponentReplicaStatus contains replica information for a single component.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`componentKind` ComponentKind	componentKind is the underlying resource kind (e.g. `PodClique`, `Deployment`, `LeaderWorkerSet`).	Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]
`componentNames` string array	componentNames is the list of underlying Kubernetes resource names for this Dynamo component. During normal operation this contains a single name; during rolling updates it contains both old and new resource names.	Optional: {}
`replicas` integer	replicas is the total number of non-terminated replicas.	Minimum: 0
`updatedReplicas` integer	updatedReplicas is the number of replicas at the current/desired revision.	Minimum: 0
`readyReplicas` integer	readyReplicas is the number of ready replicas. Populated for `PodClique`, `Deployment`, and `LeaderWorkerSet`; not available for `PodCliqueScalingGroup`.	Minimum: 0 Optional: {}
`availableReplicas` integer	availableReplicas is the number of available replicas. Populated for `Deployment` and `PodCliqueScalingGroup`; not available for `PodClique` or `LeaderWorkerSet`.	Minimum: 0 Optional: {}
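The population rules in this table can be restated as a small lookup, which is handy when writing tooling that reads `status.components`. The following is a hypothetical Python sketch, not operator code: the `POPULATED` table and the `summarize` helper are illustrative names. It also applies the `componentNames` rule above, where more than one name indicates a rolling update in progress.

```python
# Hypothetical helper that mirrors the ComponentReplicaStatus table; not part of the operator.
# replicas and updatedReplicas are always present; the other counters depend on the kind.
POPULATED = {
    "PodClique": {"replicas", "updatedReplicas", "readyReplicas"},
    "PodCliqueScalingGroup": {"replicas", "updatedReplicas", "availableReplicas"},
    "Deployment": {"replicas", "updatedReplicas", "readyReplicas", "availableReplicas"},
    "LeaderWorkerSet": {"replicas", "updatedReplicas", "readyReplicas"},
}


def summarize(status: dict) -> dict:
    """Keep only the counters that are meaningful for this componentKind."""
    kind = status["componentKind"]
    if kind not in POPULATED:
        raise ValueError(f"unknown componentKind: {kind}")
    counters = {field: status.get(field) for field in sorted(POPULATED[kind])}
    # componentNames holds both old and new resource names only during a rolling update.
    rolling = len(status.get("componentNames", [])) > 1
    return {"kind": kind, "counters": counters, "rollingUpdate": rolling}


example = {
    "componentKind": "PodClique",
    "componentNames": ["decode-old", "decode-new"],
    "replicas": 2,
    "updatedReplicas": 1,
    "readyReplicas": 2,
}
print(summarize(example))
```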

ComponentType

Underlying type: string

ComponentType identifies the role of a Dynamo component within a graph. In v1beta1 this is a strict enum. Unlike v1alpha1 (where `subComponentType` was used as a workaround for disaggregated serving), `prefill` and `decode` are first-class values: users can set them directly, and downstream consumers (e.g., the EPP) can filter on the pod label `nvidia.com/dynamo-component-type`.

Validation:

Enum: [frontend worker prefill decode planner epp]

Appears in:

DynamoComponentDeploymentSharedSpec
DynamoComponentDeploymentSpec

Field	Description
`frontend`
`worker`
`prefill`
`decode`
`planner`
`epp`

DGDRPhase

Underlying type: string

DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.

Validation:

Enum: [Pending Profiling Ready Deploying Deployed Failed]

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description
`Pending`
`Profiling`
`Ready`
`Deploying`
`Deployed`
`Failed`

DGDState

Underlying type: string

DGDState is the high-level lifecycle state of a DynamoGraphDeployment.

Validation:

Enum: [initializing pending successful failed]

Appears in:

DynamoGraphDeploymentStatus

Field	Description
`initializing`
`pending`
`successful`
`failed`

DeploymentInfoStatus

DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description	Default	Validation
`replicas` integer	Replicas is the desired number of replicas.		Optional: {}
`availableReplicas` integer	AvailableReplicas is the number of replicas that are available and ready.		Optional: {}

DynamoCheckpointIdentity

DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence. Two checkpoints with the same identity hash are considered equivalent. The type is duplicated from v1alpha1 to keep the v1beta1 type graph self-contained. The DynamoCheckpoint resource itself is not graduating to v1beta1; in v1beta1 this type is used only as a sub-field of ComponentCheckpointConfig.

Appears in:

ComponentCheckpointConfig

Field	Description	Default	Validation
`model` string	model is the model identifier (e.g. “meta-llama/Llama-3-70B”).		MinLength: 1 Required: {}
`backendFramework` string	backendFramework is the runtime framework (`vllm`, `sglang`, `trtllm`).		Enum: [vllm sglang trtllm] Required: {}
`dynamoVersion` string	dynamoVersion is the Dynamo platform version. If empty, the version is not included in the identity hash, so checkpoints remain compatible across releases.		Optional: {}
`tensorParallelSize` integer	tensorParallelSize is the tensor parallel configuration.	1	Minimum: 1 Optional: {}
`pipelineParallelSize` integer	pipelineParallelSize is the pipeline parallel configuration.	1	Minimum: 1 Optional: {}
`dtype` string	dtype is the data type (`fp16`, `bf16`, `fp8`, etc.).		Optional: {}
`maxModelLen` integer	maxModelLen is the maximum sequence length.		Minimum: 1 Optional: {}
`extraParameters` object (keys:string, values:string)	extraParameters are additional parameters that affect the checkpoint hash.		Optional: {}
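To make the equivalence rule concrete, the following hypothetical Python sketch canonicalizes the identity fields and hashes them. The operator's real hash algorithm and field encoding are not documented here. SHA-256 over sorted JSON is an assumption chosen for illustration, as is treating empty `dtype` and `maxModelLen` the same way the table describes for `dynamoVersion`. The sketch does reproduce the two documented behaviors: an empty `dynamoVersion` does not affect the hash, and the parallel sizes default to 1.

```python
import hashlib
import json


def identity_hash(identity: dict) -> str:
    """Hypothetical sketch of a checkpoint identity hash (not the operator's algorithm)."""
    canonical = {
        "model": identity["model"],
        "backendFramework": identity["backendFramework"],
        # Documented defaults: both parallel sizes default to 1.
        "tensorParallelSize": identity.get("tensorParallelSize", 1),
        "pipelineParallelSize": identity.get("pipelineParallelSize", 1),
    }
    # Optional fields contribute only when set, so an empty dynamoVersion
    # keeps checkpoints compatible across Dynamo releases.
    for key in ("dynamoVersion", "dtype", "maxModelLen"):
        if identity.get(key):
            canonical[key] = identity[key]
    extra = identity.get("extraParameters") or {}
    if extra:
        canonical["extraParameters"] = extra
    # sort_keys makes the encoding independent of dict insertion order.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = {"model": "meta-llama/Llama-3-70B", "backendFramework": "vllm"}
print(identity_hash(base) == identity_hash({**base, "dynamoVersion": ""}))
```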

DynamoComponentDeployment

DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API.

v1beta1 is a served version: the API server accepts reads and writes against it and transparently converts to and from v1alpha1, which remains the storage version for now. Conversion is performed by the operator’s conversion webhook; see api/v1alpha1/*_conversion.go.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1beta1`
`kind` string	`DynamoComponentDeployment`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoComponentDeploymentSpec	spec defines the desired state for this Dynamo component deployment.

DynamoComponentDeploymentSharedSpec

DynamoComponentDeploymentSharedSpec is the shared configuration used by both standalone DCDs and by the components embedded in a DynamoGraphDeployment.

In v1beta1 the ten per-component pod-configuration fields that existed in v1alpha1 (resources, envs, envFromSecret, livenessProbe, readinessProbe, volumeMounts, annotations, labels, extraPodMetadata, extraPodSpec) are replaced with a single podTemplate field holding a native corev1.PodTemplateSpec. The operator injects its defaults into the container named "main" and merges user overrides using strategic-merge-by-name semantics. Users can add sidecars, init containers, and pod-level configuration directly in podTemplate without any extraPodSpec-style escape hatch.
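The merge rules for the `"main"` container and for sidecars can be sketched as follows. This is a simplified, hypothetical Python illustration, not the operator's implementation. In the sketch, a user field replaces the operator default key by key. A real strategic merge would also merge nested lists such as `env` by name. Placing an auto-generated `"main"` first is an assumption of the sketch.

```python
import copy


def merge_containers(operator_main: dict, user_containers: list) -> list:
    """Hypothetical, simplified merge-by-name of operator defaults into "main"."""
    merged = []
    found_main = False
    for container in user_containers:
        if container.get("name") == "main":
            found_main = True
            main = copy.deepcopy(operator_main)
            main.update(copy.deepcopy(container))  # user overrides win
            merged.append(main)
        else:
            # Sidecars are user-managed: no defaults are injected, so image is mandatory.
            if not container.get("image"):
                raise ValueError(f"container {container.get('name')!r} must set image")
            merged.append(copy.deepcopy(container))
    if not found_main:
        # No "main" in podTemplate: generate it from the operator defaults.
        merged.insert(0, copy.deepcopy(operator_main))
    return merged


defaults = {"name": "main", "image": "dynamo-runtime:latest", "command": ["python3"]}
user = [{"name": "main", "image": "my-runtime:1.0"}, {"name": "log-shipper", "image": "busybox"}]
for c in merge_containers(defaults, user):
    print(c)
```

In this sketch the user's image replaces the default while the operator's `command` survives, which is the intent of merging overrides into `"main"` rather than replacing it wholesale.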

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Validation
`name` string	name is the stable logical identifier for this component within its DynamoGraphDeployment. It must be unique within the parent’s `spec.components` list. For standalone DynamoComponentDeployment objects, the defaulting webhook populates `name` from `metadata.name` on admission, so users typically do not need to set it explicitly. `name` is decoupled from the underlying Kubernetes resource name so that the operator can rename child workloads (e.g. suffixing worker DCDs with a hash during rolling updates) without losing the stable identity that downstream consumers (labels, status maps, DGDSA references, planner RBAC, EPP filters) depend on.	MaxLength: 63 MinLength: 1 Pattern: `^[A-Za-z0-9]([-A-Za-z0-9]*[A-Za-z0-9])?$` Required: {}
`type` ComponentType	type indicates the role of this component within a Dynamo graph. Drives port mapping, frontend detection, planner RBAC, and the pod label `nvidia.com/dynamo-component-type`. Because `prefill` and `decode` are first-class values, users can set them directly.	Enum: [frontend worker prefill decode planner epp] Optional: {}
`globalDynamoNamespace` boolean	globalDynamoNamespace places the component in the global Dynamo namespace rather than the per-deployment namespace derived from the DGD name.	Optional: {}
`podTemplate` PodTemplateSpec	podTemplate is the pod template used to create the component’s pods. The operator injects its defaults (image, command, env, ports, probes, resources, volume mounts) into the container named `"main"` inside `podTemplate.spec.containers`, merging user overrides by name. If no container named `"main"` is present, the operator auto-generates it with standard defaults. All other containers in `podTemplate.spec.containers` are treated as user-managed sidecars: the operator does not inject defaults into them, so sidecars must specify required fields (e.g. `image`) themselves. The validation webhook rejects pod templates where a non-`"main"` container is missing a required field such as `image`.	Optional: {}
`replicas` integer	replicas is the desired number of Pods for this component. When `scalingAdapter` is set on this component, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly.	Minimum: 0 Optional: {}
`multinode` MultinodeSpec	multinode configures multinode components.	Optional: {}
`sharedMemorySize` Quantity	sharedMemorySize controls the size of the tmpfs mounted at `/dev/shm`. `nil` selects the operator default (8Gi), a positive quantity sets a custom size, and `"0"` disables the shared-memory volume entirely. Simpler replacement for v1alpha1’s `SharedMemorySpec` struct with its `disabled bool` + `size Quantity` pattern.	Optional: {}
`modelRef` ModelReference	modelRef references a model served by this component. When specified, a headless service is created for endpoint discovery.	Optional: {}
`scalingAdapter` ScalingAdapter	scalingAdapter opts this component into using the DynamoGraphDeploymentScalingAdapter. When set (even as an empty object, `scalingAdapter: {}`), a DGDSA is created and owns the `replicas` field so that external autoscalers (HPA/KEDA/Planner) can drive scaling via the Scale subresource. Omit the field to opt out.	Optional: {}
`eppConfig` EPPConfig	eppConfig holds EPP-specific configuration for Endpoint Picker Plugin components. Only meaningful when `type` is `epp`.	Optional: {}
`frontendSidecar` string	frontendSidecar optionally designates a container in `podTemplate.spec.containers` as the frontend sidecar. The value must match the `name` of a container in that list; the operator merges its frontend-sidecar defaults (auto-generated Dynamo env vars, ports, health probes) into that container the same way it merges into `"main"`. The full container definition (image, args, envFrom, env) lives in `podTemplate` — this eliminates the redundant `image`, `args`, `envFromSecret`, and `envs` fields from v1alpha1’s `FrontendSidecarSpec`. The validation webhook rejects values that do not match any container name in `podTemplate.spec.containers`.	Optional: {}
`compilationCache` CompilationCacheConfig	compilationCache configures a PVC-backed compilation cache. The operator handles backend-specific mount paths and environment variables, so users do not need to hand-wire them into `podTemplate`. Extracted from v1alpha1’s `volumeMount.useAsCompilationCache` flag.	Optional: {}
`topologyConstraint` TopologyConstraint	topologyConstraint applies to this component. `topologyConstraint.packDomain` is required. When both this and `spec.topologyConstraint.packDomain` are set, this field’s `packDomain` must be narrower than or equal to the spec-level value.	Optional: {}
`experimental` ExperimentalSpec	experimental groups opt-in preview features whose API shape and behavior may change in breaking ways between v1beta1 releases, including disappearing without a name-preserving graduation path. In v1beta1 this block holds `gpuMemoryService` and `failover` (which remain tightly coupled — failover requires GMS — and are expected to evolve together as the DRA-based GPU sharing story matures), and `checkpoint` (whose interaction with the standalone DynamoCheckpoint resource and identity-hash computation is still settling). Fields here are explicitly NOT covered by the normal v1beta1 deprecation policy; do not depend on them for production workloads.	Optional: {}
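Two of the field rules above are mechanical enough to check client-side before submitting a manifest: the `name` length and pattern, and the three-way meaning of `sharedMemorySize`. The following hypothetical Python sketch restates both. The helper names are illustrative, and Kubernetes quantities are handled only loosely: any zero-valued quantity such as `0Gi` is treated like `"0"`, which is an assumption.

```python
import re

# Pattern and length limits copied from the `name` validation column.
NAME_RE = re.compile(r"^[A-Za-z0-9]([-A-Za-z0-9]*[A-Za-z0-9])?$")


def valid_component_name(name: str) -> bool:
    # MinLength 1, MaxLength 63; alphanumeric at both ends, hyphens allowed inside.
    return 1 <= len(name) <= 63 and NAME_RE.fullmatch(name) is not None


def resolve_shm_size(shared_memory_size):
    """Return the /dev/shm size to mount, or None when the volume is disabled."""
    if shared_memory_size is None:
        return "8Gi"  # documented operator default
    number = re.match(r"^\d+(\.\d+)?", str(shared_memory_size))
    if number and float(number.group(0)) == 0:
        return None  # "0" disables the shared-memory volume entirely
    return str(shared_memory_size)  # positive quantity: custom size


print(valid_component_name("decode-worker"), resolve_shm_size(None), resolve_shm_size("0"))
```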

DynamoComponentDeploymentSpec

DynamoComponentDeploymentSpec defines the desired state of a DynamoComponentDeployment.

Appears in:

DynamoComponentDeployment

Field	Description	Validation
`backendFramework` string	backendFramework specifies the backend framework.	Enum: [sglang vllm trtllm]
`name` string	name is the stable logical identifier for this component within its DynamoGraphDeployment. It must be unique within the parent’s `spec.components` list. For standalone DynamoComponentDeployment objects, the defaulting webhook populates `name` from `metadata.name` on admission, so users typically do not need to set it explicitly. `name` is decoupled from the underlying Kubernetes resource name so that the operator can rename child workloads (e.g. suffixing worker DCDs with a hash during rolling updates) without losing the stable identity that downstream consumers (labels, status maps, DGDSA references, planner RBAC, EPP filters) depend on.	MaxLength: 63 MinLength: 1 Pattern: `^[A-Za-z0-9]([-A-Za-z0-9]*[A-Za-z0-9])?$` Required: {}
`type` ComponentType	type indicates the role of this component within a Dynamo graph. Drives port mapping, frontend detection, planner RBAC, and the pod label `nvidia.com/dynamo-component-type`. Because `prefill` and `decode` are first-class values, users can set them directly.	Enum: [frontend worker prefill decode planner epp] Optional: {}
`globalDynamoNamespace` boolean	globalDynamoNamespace places the component in the global Dynamo namespace rather than the per-deployment namespace derived from the DGD name.	Optional: {}
`podTemplate` PodTemplateSpec	podTemplate is the pod template used to create the component’s pods. The operator injects its defaults (image, command, env, ports, probes, resources, volume mounts) into the container named `"main"` inside `podTemplate.spec.containers`, merging user overrides by name. If no container named `"main"` is present, the operator auto-generates it with standard defaults. All other containers in `podTemplate.spec.containers` are treated as user-managed sidecars: the operator does not inject defaults into them, so sidecars must specify required fields (e.g. `image`) themselves. The validation webhook rejects pod templates where a non-`"main"` container is missing a required field such as `image`.	Optional: {}
`replicas` integer	replicas is the desired number of Pods for this component. When `scalingAdapter` is set on this component, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly.	Minimum: 0 Optional: {}
`multinode` MultinodeSpec	multinode configures multinode components.	Optional: {}
`sharedMemorySize` Quantity	sharedMemorySize controls the size of the tmpfs mounted at `/dev/shm`. `nil` selects the operator default (8Gi), a positive quantity sets a custom size, and `"0"` disables the shared-memory volume entirely. Simpler replacement for v1alpha1’s `SharedMemorySpec` struct with its `disabled bool` + `size Quantity` pattern.	Optional: {}
`modelRef` ModelReference	modelRef references a model served by this component. When specified, a headless service is created for endpoint discovery.	Optional: {}
`scalingAdapter` ScalingAdapter	scalingAdapter opts this component into using the DynamoGraphDeploymentScalingAdapter. When set (even as an empty object, `scalingAdapter: {}`), a DGDSA is created and owns the `replicas` field so that external autoscalers (HPA/KEDA/Planner) can drive scaling via the Scale subresource. Omit the field to opt out.	Optional: {}
`eppConfig` EPPConfig	eppConfig holds EPP-specific configuration for Endpoint Picker Plugin components. Only meaningful when `type` is `epp`.	Optional: {}
`frontendSidecar` string	frontendSidecar optionally designates a container in `podTemplate.spec.containers` as the frontend sidecar. The value must match the `name` of a container in that list; the operator merges its frontend-sidecar defaults (auto-generated Dynamo env vars, ports, health probes) into that container the same way it merges into `"main"`. The full container definition (image, args, envFrom, env) lives in `podTemplate` — this eliminates the redundant `image`, `args`, `envFromSecret`, and `envs` fields from v1alpha1’s `FrontendSidecarSpec`. The validation webhook rejects values that do not match any container name in `podTemplate.spec.containers`.	Optional: {}
`compilationCache` CompilationCacheConfig	compilationCache configures a PVC-backed compilation cache. The operator handles backend-specific mount paths and environment variables, so users do not need to hand-wire them into `podTemplate`. Extracted from v1alpha1’s `volumeMount.useAsCompilationCache` flag.	Optional: {}
`topologyConstraint` TopologyConstraint	topologyConstraint applies to this component. `topologyConstraint.packDomain` is required. When both this and `spec.topologyConstraint.packDomain` are set, this field’s `packDomain` must be narrower than or equal to the spec-level value.	Optional: {}
`experimental` ExperimentalSpec	experimental groups opt-in preview features whose API shape and behavior may change in breaking ways between v1beta1 releases, including disappearing without a name-preserving graduation path. In v1beta1 this block holds `gpuMemoryService` and `failover` (which remain tightly coupled — failover requires GMS — and are expected to evolve together as the DRA-based GPU sharing story matures), and `checkpoint` (whose interaction with the standalone DynamoCheckpoint resource and identity-hash computation is still settling). Fields here are explicitly NOT covered by the normal v1beta1 deprecation policy; do not depend on them for production workloads.	Optional: {}

DynamoGraphDeployment

DynamoGraphDeployment is the Schema for the dynamographdeployments API.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1beta1`
`kind` string	`DynamoGraphDeployment`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentSpec	spec defines the desired state for this graph deployment.
`status` DynamoGraphDeploymentStatus	status reflects the current observed state of this graph deployment.

DynamoGraphDeploymentComponentRef

DynamoGraphDeploymentComponentRef identifies a specific component within a DynamoGraphDeployment. Renamed from v1alpha1’s DynamoGraphDeploymentServiceRef to align with the v1beta1 services -> components and serviceName -> componentName renames.

Appears in:

DynamoGraphDeploymentScalingAdapterSpec

Field	Description	Default	Validation
`name` string	name is the `metadata.name` of the target DynamoGraphDeployment.		MinLength: 1 Required: {}
`componentName` string	componentName is the `name` of the entry within the target DGD’s `spec.components` list to scale.		MinLength: 1 Required: {}

DynamoGraphDeploymentExperimentalSpec

DynamoGraphDeploymentExperimentalSpec groups graph-level opt-in preview features whose API shape and behavior may change in breaking ways between v1beta1 releases. Component-level experimental features live under spec.components[*].experimental.

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`kvTransferPolicy` KvTransferPolicy	kvTransferPolicy configures topology-aware routing for KV-cache transfers between prefill and decode workers.		Optional: {}

v1beta1 DynamoGraphDeploymentRequest

DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It provides a simplified, SLA-driven interface for deploying inference models on Dynamo. Users specify a model and optional performance targets; the controller handles profiling, configuration selection, and deployment.

Lifecycle:

Pending: Spec validated, preparing for profiling
Profiling: Profiling job is running to discover optimal configurations
Ready: Profiling complete, generated DGD spec available in status
Deploying: DGD is being created and rolled out (when autoApply=true)
Deployed: DGD is running and healthy
Failed: An unrecoverable error occurred
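The lifecycle list above can be read as a small state machine. The following hypothetical Python sketch encodes the happy path implied by that list. The transition table is inferred from the list, not taken from the controller. In particular, any non-terminal phase may move to `Failed`, and treating `Deployed` as terminal is an assumption of the sketch. With `autoApply` set to false, the request stops at `Ready` and the generated spec waits in status for manual review.

```python
# Hypothetical transition table inferred from the DGDR lifecycle list; not controller code.
TRANSITIONS = {
    "Pending": {"Profiling", "Failed"},
    "Profiling": {"Ready", "Failed"},
    "Ready": {"Deploying", "Failed"},
    "Deploying": {"Deployed", "Failed"},
    "Deployed": set(),  # assumed terminal for this sketch
    "Failed": set(),    # unrecoverable error
}


def happy_path(auto_apply: bool = True) -> list:
    """Phases a request passes through when nothing fails."""
    path = ["Pending", "Profiling", "Ready"]
    if auto_apply:
        # autoApply (default true) creates the DGD once profiling completes.
        path += ["Deploying", "Deployed"]
    for current, following in zip(path, path[1:]):
        if following not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current} -> {following}")
    return path


print(happy_path())
print(happy_path(auto_apply=False))
```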

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1beta1`
`kind` string	`DynamoGraphDeploymentRequest`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentRequestSpec	Spec defines the desired state for this deployment request.
`status` DynamoGraphDeploymentRequestStatus	Status reflects the current observed state of this deployment request.

v1beta1 DynamoGraphDeploymentRequestSpec

DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. Only the Model field is required; all other fields are optional and have sensible defaults.

Appears in:

DynamoGraphDeploymentRequest

Field	Description	Default	Validation
`model` string	Model specifies the model to deploy (e.g., “Qwen/Qwen3-0.6B”, “meta-llama/Llama-3-70b”). Can be a HuggingFace ID or a private model name.		MinLength: 1 Required: {}
`backend` BackendType	Backend specifies the inference backend to use for profiling and deployment.	auto	Enum: [auto sglang trtllm vllm] Optional: {}
`image` string	Image is the container image for the profiling job (the planner image), for example “nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.1.1”. For Dynamo versions earlier than 1.1.0, use the dynamo-frontend image instead.		Optional: {}
`modelCache` ModelCacheSpec	ModelCache provides optional PVC configuration for pre-downloaded model weights. When provided, weights are loaded from the PVC instead of downloading from HuggingFace.		Optional: {}
`hardware` HardwareSpec	Hardware describes the hardware resources available for profiling and deployment. Typically auto-filled by the operator from cluster discovery.		Optional: {}
`workload` WorkloadSpec	Workload defines the expected workload characteristics for SLA-based profiling.		Optional: {}
`sla` SLASpec	SLA defines service-level agreement targets that drive profiling optimization.		Optional: {}
`overrides` OverridesSpec	Overrides allows customizing the profiling job and the generated DynamoGraphDeployment.		Optional: {}
`features` FeaturesSpec	Features controls optional Dynamo platform features in the generated deployment.		Optional: {}
`searchStrategy` SearchStrategy	SearchStrategy controls the profiling search depth. “rapid” performs a fast sweep; “thorough” explores more configurations.	rapid	Enum: [rapid thorough] Optional: {}
`autoApply` boolean	AutoApply indicates whether to automatically create a DynamoGraphDeployment after profiling completes. If false, the generated spec is stored in status for manual review and application.	true	Optional: {}

v1beta1 DynamoGraphDeploymentRequestStatus

DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.

Appears in:

DynamoGraphDeploymentRequest

Field	Description	Validation
`phase` DGDRPhase	Phase is the high-level lifecycle phase of the deployment request.	Enum: [Pending Profiling Ready Deploying Deployed Failed] Optional: {}
`profilingPhase` ProfilingPhase	ProfilingPhase indicates the current sub-phase of the profiling pipeline. Only meaningful when Phase is “Profiling”. Cleared when profiling completes or fails.	Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] Optional: {}
`dgdName` string	DGDName is the name of the generated or created DynamoGraphDeployment.	Optional: {}
`profilingJobName` string	ProfilingJobName is the name of the Kubernetes Job running the profiler.	Optional: {}
`conditions` Condition array	Conditions contains the latest observed conditions of the deployment request. Standard condition types include: Succeeded, Validation, Profiling, SpecGenerated, DeploymentReady.	Optional: {}
`profilingResults` ProfilingResultsStatus	ProfilingResults contains the output of the profiling process including Pareto-optimal configurations and the selected deployment configuration.	Optional: {}
`deploymentInfo` DeploymentInfoStatus	DeploymentInfo tracks the state of the deployed DynamoGraphDeployment. Populated when a DGD has been created (either via autoApply or manually).	Optional: {}
`observedGeneration` integer	ObservedGeneration is the most recent generation observed by the controller.	Optional: {}

DynamoGraphDeploymentScalingAdapter

DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual components within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.

The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD’s component replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.

v1alpha1 remains the storage version; conversion between served versions is handled by the operator’s conversion webhook (see api/v1alpha1/dynamographdeploymentscalingadapter_conversion.go).

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1beta1`
`kind` string	`DynamoGraphDeploymentScalingAdapter`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentScalingAdapterSpec
`status` DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterSpec

DynamoGraphDeploymentScalingAdapterSpec defines the desired state of a DynamoGraphDeploymentScalingAdapter.

Appears in:

DynamoGraphDeploymentScalingAdapter

Field	Description	Default	Validation
`replicas` integer	replicas is the desired number of replicas for the target component. This field is modified by external autoscalers (HPA/KEDA/Planner) or manually by users.		Minimum: 0 Required: {}
`dgdRef` DynamoGraphDeploymentComponentRef	dgdRef references the DynamoGraphDeployment and the specific component to scale.		Required: {}
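The adapter's role as the single writer of a component's replicas can be sketched as follows. This is a hypothetical Python illustration over plain dicts, not the adapter controller. It assumes that `dgdRef.componentName` is matched against the `name` of each entry in `spec.components`, and it mirrors the applied value into `status.replicas`, the field that the scale subresource reads.

```python
def apply_adapter(dgd: dict, adapter: dict) -> dict:
    """Hypothetical sketch: copy an adapter's desired replicas into its target DGD component."""
    ref = adapter["spec"]["dgdRef"]
    if dgd["metadata"]["name"] != ref["name"]:
        raise ValueError("adapter does not target this DynamoGraphDeployment")
    desired = adapter["spec"]["replicas"]
    if desired < 0:
        raise ValueError("replicas must be >= 0")
    for component in dgd["spec"]["components"]:
        # Assumption: componentName matches the entry's logical `name`.
        if component["name"] == ref["componentName"]:
            component["replicas"] = desired
            # Mirror into status so the scale subresource reports the current count.
            adapter.setdefault("status", {})["replicas"] = desired
            return dgd
    raise KeyError(f"component {ref['componentName']!r} not found")


dgd = {"metadata": {"name": "llm"},
       "spec": {"components": [{"name": "decode", "replicas": 1}, {"name": "prefill", "replicas": 1}]}}
adapter = {"spec": {"replicas": 4, "dgdRef": {"name": "llm", "componentName": "decode"}}}
apply_adapter(dgd, adapter)
print(dgd["spec"]["components"])
```

Routing every change through one writer is what lets HPA, KEDA, or the Planner adjust `spec.replicas` on the adapter without racing each other on the DGD itself.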

DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterStatus defines the observed state of a DynamoGraphDeploymentScalingAdapter.

Appears in:

DynamoGraphDeploymentScalingAdapter

Field	Description	Validation
`replicas` integer	replicas is the current number of replicas for the target component. This is synced from the DGD’s component replicas and is required for the scale subresource.
`selector` string	selector is a label selector string for the pods managed by this adapter. Required for HPA compatibility via the scale subresource.	Optional: {}
`lastScaleTime` Time	lastScaleTime is the last time the adapter scaled the target component.	Optional: {}

DynamoGraphDeploymentSpec

DynamoGraphDeploymentSpec defines the desired state of a DynamoGraphDeployment.

Appears in:

DynamoGraphDeployment

Field	Description	Validation
`annotations` object (keys:string, values:string)	annotations to propagate to all child resources (PCS, DCD, Deployments, and pod templates). Component-level (`podTemplate`) values take precedence on conflict.	Optional: {}
`labels` object (keys:string, values:string)	labels to propagate to all child resources. Same precedence rules as `annotations`.	Optional: {}
`components` DynamoComponentDeploymentSharedSpec array	components are the components deployed as part of this graph. Each entry carries its own stable logical `name`, and names must be unique within the list. Component types are generally repeatable, except `type: epp` which may appear at most once.	MaxItems: 25 Optional: {}
`env` EnvVar array	env is prepended to every component’s environment. Component-specific env entries with the same name take precedence and may reference values from this list.	Optional: {}
`backendFramework` string	backendFramework specifies the backend framework (e.g. “sglang”, “vllm”, “trtllm”).	Enum: [sglang vllm trtllm]
`restart` Restart	restart specifies the restart policy for the graph deployment.	Optional: {}
`topologyConstraint` SpecTopologyConstraint	topologyConstraint is the deployment-level topology constraint. When set, `spec.topologyConstraint.clusterTopologyName` names the ClusterTopology CR to use. `spec.topologyConstraint.packDomain` is optional at this level and can be omitted when only components carry constraints. Components without their own `topologyConstraint` inherit from this value.	Optional: {}
`experimental` DynamoGraphDeploymentExperimentalSpec	experimental groups graph-level preview features whose API shape and behavior may change in breaking ways between v1beta1 releases.	Optional: {}
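The list and precedence rules in this table can be sketched in a few lines of Python. The helper names are hypothetical. The `env` merge relies on the usual Kubernetes behavior, where the later entry wins when a name is duplicated in a container's env list. That behavior is why prepending the graph-level list lets component entries override it. `$(VAR)` references are not expanded in this sketch.

```python
from collections import Counter


def validate_components(components: list) -> None:
    """Hypothetical checks mirroring the `components` rules."""
    if len(components) > 25:
        raise ValueError("at most 25 components (MaxItems: 25)")
    dupes = [n for n, count in Counter(c["name"] for c in components).items() if count > 1]
    if dupes:
        raise ValueError(f"duplicate component names: {dupes}")
    if sum(1 for c in components if c.get("type") == "epp") > 1:
        raise ValueError("type: epp may appear at most once")


def effective_env(graph_env: list, component_env: list) -> dict:
    # Graph-level env is prepended; a later (component) entry with the same name wins.
    result = {}
    for var in graph_env + component_env:
        result[var["name"]] = var.get("value")
    return result


def effective_annotations(graph_level: dict, pod_template_level: dict) -> dict:
    # Component-level (podTemplate) values take precedence on conflict.
    return {**graph_level, **pod_template_level}


print(effective_env([{"name": "LOG_LEVEL", "value": "info"}], [{"name": "LOG_LEVEL", "value": "debug"}]))
```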

DynamoGraphDeploymentStatus

DynamoGraphDeploymentStatus defines the observed state of a DynamoGraphDeployment. Unchanged between v1alpha1 and v1beta1.

Appears in:

DynamoGraphDeployment

Field	Description	Default	Validation
`observedGeneration` integer	observedGeneration is the most recent generation observed by the controller.		Optional: {}
`state` DGDState	state is a high-level textual status of the graph deployment lifecycle.	initializing	Enum: [initializing pending successful failed]
`conditions` Condition array	conditions contains the latest observed conditions of the graph deployment. Merged by type on patch updates.		Optional: {}
`components` object (keys:string, values:ComponentReplicaStatus)	components contains per-component replica status information, keyed by component name.		Optional: {}
`restart` RestartStatus	restart contains the status of a graph-level restart.		Optional: {}
`checkpoints` object (keys:string, values:ComponentCheckpointStatus)	checkpoints contains per-component checkpoint status, keyed by component name.		Optional: {}
`rollingUpdate` RollingUpdateStatus	rollingUpdate tracks the progress of operator-managed rolling updates. Currently only supported for single-node, non-Grove deployments (DCD/Deployment).		Optional: {}

EPPConfig

EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components.

Appears in:

Field	Description	Default	Validation
`configMapRef` ConfigMapKeySelector	configMapRef references a user-provided ConfigMap containing EPP configuration. Mutually exclusive with `config`.		Optional: {}
`config` EndpointPickerConfig	config allows specifying EPP `EndpointPickerConfig` directly as a structured object. The operator marshals this to YAML and creates a ConfigMap automatically. Mutually exclusive with `configMapRef`. One of `configMapRef` or `config` must be specified.		Type: object Optional: {}
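Because `configMapRef` and `config` are mutually exclusive and one of them must be set, exactly one source is valid. The following hypothetical Python check restates that rule. The helper name is illustrative. Whether the operator enforces the rule in a webhook or at reconcile time is not stated here.

```python
def validate_epp_config(epp_config: dict) -> str:
    """Return which EPP config source is used; raise unless exactly one is set."""
    has_ref = epp_config.get("configMapRef") is not None
    has_inline = epp_config.get("config") is not None
    if has_ref == has_inline:  # both set, or neither set
        raise ValueError("set exactly one of configMapRef or config")
    return "configMapRef" if has_ref else "config"


print(validate_epp_config({"configMapRef": {"name": "epp-config", "key": "config.yaml"}}))
```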

ExperimentalSpec

ExperimentalSpec groups opt-in preview features whose API shape and behavior may change in breaking ways between v1beta1 releases (including disappearing without a name-preserving graduation path). Fields placed under experimental are explicitly NOT covered by the normal v1beta1 deprecation policy and should not be relied on for production workloads. Features graduate out of this block (and become first-class fields on the shared spec) once their API is considered stable.

Appears in:

DynamoComponentDeploymentSharedSpec
DynamoComponentDeploymentSpec

Field	Description	Validation
`gpuMemoryService` GPUMemoryServiceSpec	gpuMemoryService configures the GPU Memory Service (GMS). When set, GPU access for GMS clients is managed via DRA.	Optional: {}
`failover` FailoverSpec	failover configures active-passive GPU failover for this component. Requires `gpuMemoryService` to also be set, and `failover.mode` must match `gpuMemoryService.mode` (enforced by the validation webhook).	Optional: {}
`checkpoint` ComponentCheckpointConfig	checkpoint configures container-image snapshotting and restore for this component. When set, the DGD controller can produce a DynamoCheckpoint CR from a running pod and later restore pods from that checkpoint for faster cold start. The user-facing shape of this field — especially its interaction with the standalone DynamoCheckpoint resource and the identity-hash computation — is still settling, which is why it lives under `experimental` in v1beta1 instead of at the top level.	Optional: {}

FailoverSpec

FailoverSpec configures active-passive failover for a worker component. The main container is cloned into two engine containers (active and standby) that share GPUs via DRA, and the standby acquires the flock (file lock) when the active engine fails. Failover requires that `gpuMemoryService` is also set and that `failover.mode` matches `gpuMemoryService.mode`. It also requires the `nvidia.com/dynamo-kube-discovery-mode: container` annotation on the DGD. See ExperimentalSpec for the stability caveat.

Appears in:

ExperimentalSpec

Field	Description	Default	Validation
`mode` GPUMemoryServiceMode	mode selects the failover deployment topology. Must match `spec.experimental.gpuMemoryService.mode` (or `spec.components[*].experimental.gpuMemoryService.mode` inside a DynamoGraphDeployment).	IntraPod	Enum: [IntraPod InterPod] Optional: {}
`numShadows` integer	numShadows is the number of shadow (standby) engine containers per rank. Reserved for future use; the operator currently creates exactly one shadow.	1	Maximum: 1 Minimum: 1 Optional: {}
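The failover prerequisites are spread across several fields, so they are collected below in one hypothetical Python check over a component's `experimental` block and the DGD's annotations. Only the mode match is documented as webhook-enforced. The other checks restate documented requirements, and where each one is enforced is not stated. Both modes default to `IntraPod` when omitted.

```python
ANNOTATION = "nvidia.com/dynamo-kube-discovery-mode"


def validate_failover(experimental: dict, dgd_annotations: dict) -> None:
    """Hypothetical check of the documented failover prerequisites."""
    failover = experimental.get("failover")
    if failover is None:
        return  # failover not requested
    gms = experimental.get("gpuMemoryService")
    if gms is None:
        raise ValueError("failover requires gpuMemoryService")
    # Both fields default to IntraPod when omitted.
    if failover.get("mode", "IntraPod") != gms.get("mode", "IntraPod"):
        raise ValueError("failover.mode must match gpuMemoryService.mode")
    if failover.get("numShadows", 1) != 1:
        raise ValueError("numShadows is currently fixed at 1")
    if dgd_annotations.get(ANNOTATION) != "container":
        raise ValueError(f"DGD needs annotation {ANNOTATION}: container")


validate_failover({"gpuMemoryService": {}, "failover": {}}, {ANNOTATION: "container"})
print("failover prerequisites satisfied")
```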

FeaturesSpec

FeaturesSpec controls optional Dynamo platform features in the generated deployment.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Default	Validation
`planner` RawExtension	Planner is the raw SLA planner configuration passed to the planner service. Its schema is defined by dynamo.planner.config.planner_config.PlannerConfig. Go treats this as opaque bytes; the Planner service validates it at startup. The presence of this field (non-null) enables the planner in the generated DGD.		Type: object Optional: {}
`mocker` MockerSpec	Mocker configures the simulated (mocker) backend for testing without GPUs.		Optional: {}

GMSClientPodSpec

GMSClientPodSpec declares an additional GMS client pod for inter-pod GMS.

Appears in:

GPUMemoryServiceSpec

Field	Description	Default	Validation
`name` string	name identifies this client pod.		MaxLength: 63 MinLength: 1 Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`
`podTemplate` PodTemplateSpec	podTemplate configures the pod to run as a GMS client.		Schemaless: {} Type: object

GPUMemoryServiceMode

Underlying type: string

GPUMemoryServiceMode selects the GMS deployment topology.

Appears in:

Field	Description
`IntraPod`	GMSModeIntraPod runs GMS as a sidecar within the same pod.
`InterPod`	GMSModeInterPod runs GMS as rank-local pods that share GPUs through DRA. Extra client pod rendering is reserved for a follow-up change.

GPUMemoryServiceSpec

GPUMemoryServiceSpec configures the GPU Memory Service (GMS) for a worker component. The operator injects GMS wiring and replaces the main container’s GPU resources with a DRA ResourceClaim for shared GPU access. See ExperimentalSpec for the stability caveat.

Appears in:

ExperimentalSpec

Field	Description	Default	Validation
`mode` GPUMemoryServiceMode	mode selects the GMS deployment topology.	IntraPod	Enum: [IntraPod InterPod] Optional: {}
`deviceClassName` string	deviceClassName is the DRA `DeviceClass` to request GPUs from.	gpu.nvidia.com	Optional: {}
`extraClientContainers` string array	extraClientContainers lists additional user-declared containers that should be wired as GMS clients in service pods. Checkpoint Job clients are declared under checkpoint.job.gmsClientContainers. In each rendered pod, only matching container names are wired; absent names are ignored.		items:MaxLength: 63 items:MinLength: 1 items:Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`extraClientPods` GMSClientPodSpec array	extraClientPods declares additional GMS client pods for inter-pod GMS. This field is reserved for future use and is rejected until inter-pod client orchestration is wired.		Optional: {}

GPUSKUType

Underlying type: string

GPUSKUType is the AIC hardware system identifier for a supported GPU.

Validation:

Enum: [gb200_sxm gb10 b200_sxm h200_sxm h100_sxm h100_pcie a100_sxm a100_pcie a30 l40s l40 l4 v100_sxm v100_pcie t4 mi200 mi300]

Appears in:

HardwareSpec

Field	Description
`gb200_sxm`	Blackwell
`gb10`	Blackwell
`b200_sxm`	Blackwell
`h200_sxm`	Hopper
`h100_sxm`	Hopper
`h100_pcie`	Hopper
`a100_sxm`	Ampere
`a100_pcie`	Ampere
`a30`	Ampere
`l40s`	Ada
`l40`	Ada
`l4`	Ada
`v100_sxm`	Older NVIDIA
`v100_pcie`	Older NVIDIA
`t4`	Older NVIDIA
`mi200`	AMD
`mi300`	AMD

HardwareSpec

HardwareSpec describes the GPU hardware for profiling and deployment. All fields are auto-detected from cluster GPU nodes when omitted (requires cluster-wide mode with GPU discovery enabled). gpuSku is a selector (it restricts which nodes are considered); the other fields are pure overrides passed to the profiler. If all four of gpuSku, vramMb, totalGpus, and numGpusPerNode are set, discovery is skipped.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Validation
`gpuSku` GPUSKUType	GPUSKU selects the GPU type to target. When omitted, auto-detected by selecting the GPU with the highest node count, then highest VRAM. In mixed-GPU clusters, set this to choose which GPU type to use. Discovery and totalGpus are then restricted to nodes matching this SKU.	Enum: [gb200_sxm gb10 b200_sxm h200_sxm h100_sxm h100_pcie a100_sxm a100_pcie a30 l40s l40 l4 v100_sxm v100_pcie t4 mi200 mi300] Optional: {}
`vramMb` float	VRAMMB is the VRAM per GPU in MiB. When omitted, auto-detected from cluster GPU nodes.	Optional: {}
`totalGpus` integer	TotalGPUs is the GPU budget for profiling and deployment. The profiler uses this to determine parallelism and replica count. When omitted, computed by counting GPUs on discovered nodes (filtered by gpuSku when set), temporarily capped at 32 to limit profiler search space. This cap may be removed in a future release. Set this field explicitly to override.	Optional: {}
`numGpusPerNode` integer	NumGPUsPerNode is the number of GPUs per node. When omitted, auto-detected from cluster GPU nodes.	Optional: {}
`interconnect` string	Interconnect describes the primary GPU-to-GPU interconnect within a node. Usage: this is capability metadata used for profiling, planning, and deployment decisions. It does NOT configure or enable any GPU interconnect; it only describes what is available or assumed. When omitted, the operator may attempt best-effort discovery (it currently distinguishes “nvlink” from “pcie” based on the DCGM NVLink link count); if discovery is unavailable, the field may remain empty. Impact of wrong or missing values: if set more optimistically than reality (e.g., “nvlink” when only PCIe is present), performance models may overestimate intra-node bandwidth and choose overly aggressive parallelism or layouts, degrading performance relative to expectations; if set more pessimistically than reality (e.g., “pcie” when NVLink is present), the system may choose conservative plans and leave performance on the table; if unset and undiscovered, consumers should treat the interconnect as unknown and fall back to conservative assumptions. Example values: “pcie”, “nvlink”. Other values may be accepted but may not be auto-detected.	Optional: {}
`rdma` boolean	RDMA indicates whether the cluster has RDMA-capable networking available for Dynamo data movement. Usage: this is capability metadata used for profiling, planning, and deployment decisions. It does NOT install, enable, or configure RDMA (e.g., drivers, SR-IOV, NVIDIA network operator, GPUDirect settings); it only expresses availability or intent. When omitted, the operator may attempt best-effort discovery (e.g., via node labels indicating RDMA/SR-IOV capability and/or the presence of NVIDIA network-operator RDMA components); if discovery is unavailable, the field may remain unset. Impact of wrong or missing values: a false positive (true when RDMA is not actually usable end to end) may cause plans or deployments to assume RDMA is available, which, depending on runtime transport selection and fallback behavior, can lead to connection or setup failures or performance regressions; a false negative (false when RDMA is available) typically avoids RDMA-optimized paths and falls back to non-RDMA transports, usually remaining functional but potentially slower; if unset and undiscovered, consumers should treat RDMA availability as unknown and use conservative defaults or fallback transports.	Optional: {}
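
The selection and counting rules described for `gpuSku` and `totalGpus` can be sketched as follows. This is a simplified Python illustration under stated assumptions. Discovered nodes are modeled by a hypothetical `GpuNode` record, and discovery is skipped only when the four core fields (`gpuSku`, `vramMb`, `totalGpus`, `numGpusPerNode`) are all set. Detection of `vramMb`, `numGpusPerNode`, `interconnect`, and `rdma` is not modeled.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

TOTAL_GPU_CAP = 32  # documented temporary cap on the auto-computed totalGpus
CORE_FIELDS = ("gpuSku", "vramMb", "totalGpus", "numGpusPerNode")


@dataclass
class GpuNode:  # hypothetical model of a discovered GPU node
    sku: str        # GPUSKUType value, e.g. "h100_sxm"
    vram_mb: float  # VRAM per GPU in MiB
    gpus: int       # GPUs on this node


def select_sku(nodes: list[GpuNode]) -> str:
    """Pick the SKU with the most nodes, breaking ties by the largest VRAM."""
    node_count: dict[str, int] = defaultdict(int)
    vram: dict[str, float] = defaultdict(float)
    for node in nodes:
        node_count[node.sku] += 1
        vram[node.sku] = max(vram[node.sku], node.vram_mb)
    # max() raises ValueError when no GPU nodes were discovered.
    return max(node_count, key=lambda sku: (node_count[sku], vram[sku]))


def resolve_hardware(spec: dict, nodes: list[GpuNode]) -> dict:
    """Fill in gpuSku and totalGpus as the field descriptions describe (sketch)."""
    resolved = dict(spec)
    if all(field in spec for field in CORE_FIELDS):
        return resolved  # every core field is set: discovery is skipped
    sku = spec.get("gpuSku") or select_sku(nodes)
    resolved["gpuSku"] = sku
    if "totalGpus" not in spec:
        # Count GPUs only on nodes of the selected SKU, then apply the cap.
        matching = sum(node.gpus for node in nodes if node.sku == sku)
        resolved["totalGpus"] = min(matching, TOTAL_GPU_CAP)
    return resolved
```

In a cluster with three 8-GPU H100 nodes and three 8-GPU H200 nodes, the node counts tie, so the higher-VRAM `h200_sxm` wins. `totalGpus` then resolves to 24.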

KvTransferEnforcement

Underlying type: string

KvTransferEnforcement controls how the selected prefill worker’s topology is applied to decode routing.

Validation:

Enum: [required preferred]

Appears in:

KvTransferPolicy

Field	Description
`required`	KvTransferEnforcementRequired enforces same-domain decode worker selection.
`preferred`	KvTransferEnforcementPreferred biases decode worker selection toward the same domain.

KvTransferPolicy

KvTransferPolicy configures topology-aware routing for KV-cache transfers between prefill and decode workers. This is a graph-wide concern placed under spec.experimental while the API is incubating.

Appears in:

DynamoGraphDeploymentExperimentalSpec

Field	Description	Default	Validation
`labelKey` string	labelKey is a Kubernetes node label key (e.g. “topology.kubernetes.io/zone”) whose value identifies the topology domain for each worker. The operator copies the node label onto worker pods so the runtime can publish it as worker metadata. The label should correspond to the topology level named in `domain`.		MaxLength: 317 MinLength: 1 Pattern: `^(([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)(\.[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)*/)?([A-Za-z0-9]([-A-Za-z0-9_.]{0,61}[A-Za-z0-9])?)$` Optional: {}
`domain` TopologyDomain	domain is the logical name for the topology level to enforce (e.g. “zone”, “rack”). The router uses this to match workers that share the same value for the label identified by `labelKey`.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`
`enforcement` KvTransferEnforcement	enforcement controls how the selected prefill worker’s topology is applied to decode routing. “required” only allows decode workers in the same topology domain as the selected prefill worker. “preferred” keeps all decode workers eligible, but biases selection toward workers in the same topology domain. Defaults to “required”.	required	Enum: [required preferred] Optional: {}
`preferredWeight` float	preferredWeight must be set when enforcement is “preferred” and is used only in that mode. Higher values create a stronger same-domain routing preference but do not guarantee same-domain selection. The value is not a probability; worker selection still depends on load and other routing inputs. A value of 0 disables the topology preference; 1 is the strongest supported preference.		Maximum: 1 Minimum: 0 Optional: {}
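
To make the difference between the two enforcement modes concrete, here is a toy Python model. The scoring rule (subtracting `preferredWeight` from a normalized load for same-domain workers) and the `DecodeWorker` record are hypothetical. They are not the router's actual algorithm. The sketch only illustrates the documented properties: `required` filters, `preferred` biases, a weight of 0 disables the bias, and a nonzero weight does not guarantee a same-domain pick.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodeWorker:  # hypothetical worker record
    name: str
    domain: str  # value of the node label named by labelKey
    load: float  # hypothetical normalized load; lower is better


def pick_decode(
    workers: list[DecodeWorker],
    prefill_domain: str,
    enforcement: str = "required",
    preferred_weight: float = 0.0,
) -> Optional[DecodeWorker]:
    """Pick a decode worker for a request whose prefill ran in prefill_domain."""
    if enforcement == "required":
        # Only workers in the prefill worker's topology domain are eligible.
        # Returning None when none exist is a hypothetical choice for this sketch.
        same = [w for w in workers if w.domain == prefill_domain]
        return min(same, key=lambda w: w.load) if same else None
    if enforcement != "preferred":
        raise ValueError(f"unknown enforcement {enforcement!r}")
    if not 0.0 <= preferred_weight <= 1.0:
        raise ValueError("preferredWeight must be within [0, 1]")

    # Every worker stays eligible; same-domain workers get a score bonus.
    def score(w: DecodeWorker) -> float:
        bonus = preferred_weight if w.domain == prefill_domain else 0.0
        return w.load - bonus

    return min(workers, key=score)
```

With a busy same-domain worker (load 0.9) and an idle remote one (load 0.1), a weight of 0.5 still picks the remote worker, while a weight of 1.0 tips the choice to the same domain.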

MockerSpec

MockerSpec configures the simulated (mocker) backend.

Appears in:

FeaturesSpec

Field	Description	Default	Validation
`enabled` boolean	Enabled indicates whether to deploy mocker workers instead of real inference workers. Useful for large-scale testing without GPUs.		Optional: {}

ModelCacheSpec

ModelCacheSpec references a PVC containing pre-downloaded model weights.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Default	Validation
`pvcName` string	PVCName is the name of the PersistentVolumeClaim containing model weights. The PVC must exist in the same namespace as the DGDR.		Optional: {}
`pvcModelPath` string	PVCModelPath is the path to the model checkpoint directory within the PVC (e.g. “deepseek-r1” or “models/Llama-3.1-405B-FP8”).		Optional: {}
`pvcMountPath` string	PVCMountPath is the mount path for the PVC inside the container.	/opt/model-cache	Optional: {}
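
Assume the operator mounts the PVC at `pvcMountPath` and the model directory sits at `pvcModelPath` relative to the PVC root. Under that assumption, the in-container model directory is the two paths joined. A minimal Python sketch:

```python
from pathlib import PurePosixPath

DEFAULT_MOUNT_PATH = "/opt/model-cache"  # documented default for pvcMountPath


def model_dir_in_container(model_cache: dict) -> str:
    """Join mount path and model path (assumed layout, not confirmed operator logic)."""
    mount = model_cache.get("pvcMountPath") or DEFAULT_MOUNT_PATH
    # Strip a leading "/" so the model path stays relative to the mount point.
    return str(PurePosixPath(mount) / model_cache["pvcModelPath"].lstrip("/"))
```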

ModelReference

ModelReference identifies a model served by a component. When specified, a headless service is created for endpoint discovery.

Appears in:

Field	Description	Default	Validation
`name` string	name is the base model identifier (e.g. “llama-3-70b-instruct-v1”).		MinLength: 1 Required: {}
`revision` string	revision is the model revision/version.		Optional: {}

MultinodeSpec

MultinodeSpec configures a multinode component.

Appears in:

Field	Description	Default	Validation
`nodeCount` integer	nodeCount is the number of nodes to deploy for the multinode component. Total GPUs used is `nodeCount * container GPU request`.	2	Minimum: 2 Optional: {}
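
The GPU total is a straight product. A minimal sketch (hypothetical helper) that also enforces the documented minimum of 2 nodes:

```python
def multinode_total_gpus(node_count: int, gpus_per_container: int) -> int:
    """Total GPUs = nodeCount * container GPU request, per the field description."""
    if node_count < 2:
        raise ValueError("nodeCount has Minimum: 2")
    return node_count * gpus_per_container
```

For example, `nodeCount: 4` with a container requesting 8 GPUs consumes 32 GPUs.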

OptimizationType

Underlying type: string

OptimizationType defines the optimization target for SLA-based profiling.

Validation:

Enum: [latency throughput]

Appears in:

SLASpec

Field	Description
`latency`
`throughput`

OverridesSpec

OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Default	Validation
`profilingJob` JobSpec	ProfilingJob allows overriding the profiling Job specification. Fields set here are merged into the controller-generated Job spec.		Optional: {}
`dgd` RawExtension	DGD allows providing a full or partial nvidia.com/v1alpha1 DynamoGraphDeployment to use as the base for the generated deployment. Fields from profiling results are merged on top. Use this to override backend worker images. The field is stored as a raw embedded resource rather than a typed *v1alpha1.DynamoGraphDeployment to avoid a circular import: v1alpha1 already imports v1beta1 as the conversion hub and Go does not allow import cycles. The EmbeddedResource marker tells the API server to validate that the value is a well-formed Kubernetes object (has apiVersion/kind), but does not enforce that it is specifically a DynamoGraphDeployment. Full type validation (correct apiVersion, kind, and field schema) is performed by the controller during reconciliation.		EmbeddedResource: {} Optional: {}
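
Two predicates can illustrate how validation is split between the API server and the controller. This Python sketch is an assumption-level model, not the operator's code. The first predicate mirrors the EmbeddedResource guarantee (apiVersion and kind are present). The second adds the apiVersion/kind check that the controller performs. Field-schema validation is omitted.

```python
EXPECTED_API_VERSION = "nvidia.com/v1alpha1"
EXPECTED_KIND = "DynamoGraphDeployment"


def apiserver_accepts(obj: dict) -> bool:
    """EmbeddedResource-level check: the object only needs apiVersion and kind."""
    return bool(obj.get("apiVersion")) and bool(obj.get("kind"))


def controller_accepts(obj: dict) -> bool:
    """Stricter reconciliation-time check on the embedded base DGD (sketch)."""
    return (
        apiserver_accepts(obj)
        and obj["apiVersion"] == EXPECTED_API_VERSION
        and obj["kind"] == EXPECTED_KIND
    )
```

A ConfigMap passes the first check but fails the second. A mistyped kind is therefore reported during reconciliation rather than at admission.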

ParetoConfig

ParetoConfig represents a single Pareto-optimal deployment configuration discovered during profiling.

Appears in:

ProfilingResultsStatus

Field	Description	Default	Validation
`config` RawExtension	Config is the full deployment configuration for this Pareto point.		Type: object

ProfilingPhase

Underlying type: string

ProfilingPhase represents a sub-phase within the profiling pipeline. When the DGDR Phase is “Profiling”, this value indicates which step of the profiling pipeline is currently executing.

Validation:

Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description
`Initializing`	Profiler is loading the DGD template, detecting GPU hardware, and resolving the model architecture from Hugging Face.
`SweepingPrefill`	Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts for prefill, measuring TTFT at each configuration.
`SweepingDecode`	Sweeping parallelization strategies and concurrency levels for decode, measuring ITL at each configuration.
`SelectingConfig`	Filtering results against SLA targets and selecting the most cost-efficient configuration that meets TTFT/ITL requirements.
`BuildingCurves`	Building detailed interpolation curves (ISL→TTFT for prefill, KV-usage×context-length→ITL for decode) using the selected configs.
`GeneratingDGD`	Packaging profiling data into a ConfigMap and generating the final DGD YAML with planner integration.
`Done`	Profiling pipeline finished successfully.

ProfilingResultsStatus

ProfilingResultsStatus contains the output of the profiling process.

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description	Default	Validation
`pareto` ParetoConfig array	Pareto is the list of Pareto-optimal deployment configurations discovered during profiling. Each entry represents a different cost/performance trade-off.		Optional: {}
`selectedConfig` RawExtension	SelectedConfig is the recommended configuration chosen by the profiler based on the SLA targets. This is the configuration used for deployment when autoApply is true.		Type: object Optional: {}

Restart

Restart specifies the restart policy for a graph deployment.

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`id` string	id is an arbitrary string that triggers a restart when changed. Any modification to this value initiates a restart of the graph deployment according to the configured strategy.		MinLength: 1 Required: {}
`strategy` RestartStrategy	strategy specifies the restart strategy for the graph deployment.		Optional: {}

RestartPhase

Underlying type: string

RestartPhase enumerates phases of a graph-level restart.

Appears in:

RestartStatus

Field	Description
`Pending`
`Restarting`
`Completed`
`Failed`
`Superseded`

RestartStatus

RestartStatus contains the status of a graph-level restart.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`observedID` string	observedID is the restart ID currently being processed. Matches `Restart.id` in the spec.
`phase` RestartPhase	phase is the phase of the restart.
`inProgress` string array	inProgress contains the names of the components currently being restarted.	Optional: {}

RestartStrategy

RestartStrategy defines how components are restarted.

Appears in:

Restart

Field	Description	Default	Validation
`type` RestartStrategyType	type specifies the restart strategy type.	Sequential	Enum: [Sequential Parallel] Optional: {}
`order` string array	order is the complete ordered set of component names for sequential restarts. Omit or leave empty to use the controller’s default order. This field must not be set for parallel restarts.		Optional: {}

RestartStrategyType

Underlying type: string

RestartStrategyType enumerates restart strategies.

Appears in:

RestartStrategy

Field	Description
`Sequential`
`Parallel`
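
The restart trigger and the `order` constraint can be sketched in Python as follows. These are hypothetical helpers. Comparing `Restart.id` with `RestartStatus.observedID` is a simplification of the controller's behavior. Reading "complete ordered set" as "every component exactly once" is an interpretation of the field description.

```python
from __future__ import annotations

from typing import Optional


def restart_requested(spec_restart: Optional[dict], restart_status: Optional[dict]) -> bool:
    """A new restart is pending when spec.restart.id differs from the observed ID."""
    if not spec_restart:
        return False
    return spec_restart["id"] != (restart_status or {}).get("observedID")


def strategy_errors(strategy: dict, components: list[str]) -> list[str]:
    """Check RestartStrategy: order is Sequential-only and must list every component."""
    kind = strategy.get("type", "Sequential")  # documented default
    order = strategy.get("order") or []  # empty means "use the default order"
    if kind not in ("Sequential", "Parallel"):
        return [f"unknown restart type {kind!r}"]
    if kind == "Parallel":
        return ["order must not be set for Parallel restarts"] if order else []
    if order and sorted(order) != sorted(components):
        return ["order must name every component exactly once"]
    return []
```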

RollingUpdatePhase

Underlying type: string

RollingUpdatePhase represents the current phase of a rolling update.

Validation:

Enum: [Pending InProgress Completed Failed ]

Appears in:

RollingUpdateStatus

Field	Description
`Pending`
`InProgress`
`Completed`
`Failed`
`""`	(empty string)

RollingUpdateStatus

RollingUpdateStatus tracks the progress of an operator-managed rolling update.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`phase` RollingUpdatePhase	phase indicates the current phase of the rolling update.	Enum: [Pending InProgress Completed Failed ] Optional: {}
`startTime` Time	startTime is when the rolling update began.	Optional: {}
`endTime` Time	endTime is when the rolling update completed (successfully or failed).	Optional: {}
`updatedComponents` string array	updatedComponents is the list of components that have completed the rolling update.	Optional: {}

SLASpec

SLASpec defines the service-level agreement targets for profiling optimization.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Validation
`ttft` float	TTFT is the Time To First Token target in milliseconds.	Optional: {}
`itl` float	ITL is the Inter-Token Latency target in milliseconds.	Optional: {}
`e2eLatency` float	E2ELatency is the target end-to-end request latency in milliseconds. Alternative to specifying TTFT + ITL.	Optional: {}
`optimizationType` OptimizationType	OptimizationType is the optimization target for SLA profiling. Valid values: latency, throughput.	Enum: [latency throughput] Optional: {}
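
A common approximation links the three latency targets: end-to-end latency ≈ TTFT + (OSL − 1) × ITL. The first token arrives after TTFT, and each of the remaining OSL − 1 tokens adds roughly one ITL. The helper below encodes that textbook approximation for sanity-checking targets. It is not necessarily the formula the profiler uses.

```python
def approx_e2e_ms(ttft_ms: float, itl_ms: float, osl: int) -> float:
    """Approximate end-to-end latency: first token, then (osl - 1) inter-token gaps."""
    if osl < 1:
        raise ValueError("osl must be at least 1")
    return ttft_ms + (osl - 1) * itl_ms
```

With `ttft: 200`, `itl: 20`, and the default WorkloadSpec `osl` of 1000, the approximation gives 20,180 ms. The same targets expressed as `e2eLatency` would therefore be roughly 20.2 s.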

ScalingAdapter

ScalingAdapter opts a component into using the DynamoGraphDeploymentScalingAdapter (DGDSA). When scalingAdapter is set on a component (even as an empty object, `scalingAdapter: {}`), the DGDSA is created and owns the replicas field so that external autoscalers (HPA, KEDA, or Planner) can drive scaling via the Scale subresource. Omitting the field opts the component out.

Appears in:

SearchStrategy

Underlying type: string

SearchStrategy controls the profiling search depth.

Validation:

Enum: [rapid thorough]

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description
`rapid`
`thorough`

SpecTopologyConstraint

SpecTopologyConstraint defines deployment-level topology placement requirements.

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`clusterTopologyName` string	clusterTopologyName is the name of the ClusterTopology resource that defines the topology hierarchy for this deployment.		MinLength: 1
`packDomain` TopologyDomain	packDomain is the default topology domain to pack pods within. Optional; omit when only components carry constraints.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$` Optional: {}

TopologyConstraint

TopologyConstraint defines component-level topology placement requirements. The topology profile is inherited from the deployment-level SpecTopologyConstraint.

Appears in:

Field	Description	Default	Validation
`packDomain` TopologyDomain	packDomain is the topology domain to pack pods within. Must match a domain defined in the referenced ClusterTopology CR.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`

TopologyDomain

Underlying type: string

Validation:

Pattern: ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$

Appears in:

WorkloadSpec

WorkloadSpec defines the workload characteristics for SLA-based profiling.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Default	Validation
`isl` integer	ISL is the Input Sequence Length (number of tokens).	4000	Optional: {}
`osl` integer	OSL is the Output Sequence Length (number of tokens).	1000	Optional: {}
`concurrency` float	Concurrency is the target concurrency level. Required (or RequestRate) when the planner is disabled.		Optional: {}
`requestRate` float	RequestRate is the target request rate (req/s). Required (or Concurrency) when the planner is disabled.		Optional: {}
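
The following sketch shows the planner-disabled requirement and the relation between the two load knobs. `workload_errors` is a hypothetical pre-check. The caller decides whether the planner is enabled (per FeaturesSpec, a non-null `planner` enables it). `concurrency_from_rate` applies Little's law (in-flight requests = arrival rate × time in system), a general queueing identity rather than a documented profiler conversion.

```python
from __future__ import annotations


def workload_errors(workload: dict, planner_enabled: bool) -> list[str]:
    """Concurrency or requestRate must be set when the planner is disabled."""
    if planner_enabled:
        return []
    if workload.get("concurrency") is None and workload.get("requestRate") is None:
        return ["set concurrency or requestRate when the planner is disabled"]
    return []


def concurrency_from_rate(request_rate: float, e2e_latency_s: float) -> float:
    """Little's law: average in-flight requests = arrival rate x time in system."""
    return request_rate * e2e_latency_s
```

For example, 5 req/s with requests that each take about 20 s implies roughly 100 concurrent requests.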

operator.config.dynamo.nvidia.com/v1alpha1

Resource Types

OperatorConfiguration

CertProvisionMode

Underlying type: string

CertProvisionMode controls how webhook TLS certificates are managed.

Appears in:

WebhookServer

Field	Description
`auto`	CertProvisionModeAuto uses the built-in cert-controller to generate and rotate certificates.
`manual`	CertProvisionModeManual expects certificates to be provided externally (e.g., by cert-manager or a cluster administrator).

CheckpointConfiguration

CheckpointConfiguration holds checkpoint/restore settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`enabled` boolean	Enabled indicates whether checkpoint functionality is enabled
`seccomp` CheckpointSeccompConfiguration	Seccomp controls the localhost seccomp profile applied to checkpoint and restore pods. A nil value means “use the default profile”; set Seccomp.Disabled=true to disable seccomp injection entirely.
`storage` CheckpointStorageConfiguration	Storage optionally configures the namespace-local checkpoint PVC that workload pods mount. When omitted, the operator preserves the legacy behavior of discovering storage from a snapshot-agent DaemonSet in the workload namespace.

CheckpointOCIConfig

Deprecated: CheckpointOCIConfig is retained for compatibility and ignored by the current snapshot flow.

Appears in:

CheckpointStorageConfiguration

Field	Description	Default	Validation
`uri` string	URI is the legacy OCI URI (oci://registry/repository).
`credentialsSecretRef` string	CredentialsSecretRef is the legacy docker config secret name.

CheckpointPVCConfig

CheckpointPVCConfig configures the namespace-local PVC mounted into checkpoint and restore workload pods.

Appears in:

CheckpointStorageConfiguration

Field	Description	Default	Validation
`pvcName` string	PVCName is the PVC name in each workload namespace.
`basePath` string	BasePath is the mount path inside checkpoint and restore workload pods.
`create` boolean	Create tells the operator to create the PVC in workload namespaces when it is missing. When false, the PVC must already exist.
`size` string	Size is the storage request used when Create is true.
`storageClassName` string	StorageClassName is the optional StorageClass name used when Create is true.
`accessMode` string	AccessMode is the PVC access mode used when Create is true.

CheckpointS3Config

Deprecated: CheckpointS3Config is retained for compatibility and ignored by the current snapshot flow.

Appears in:

CheckpointStorageConfiguration

Field	Description	Default	Validation
`uri` string	URI is the legacy S3 URI (s3://[endpoint/]bucket/prefix).
`credentialsSecretRef` string	CredentialsSecretRef is the legacy credentials secret name.

CheckpointSeccompConfiguration

CheckpointSeccompConfiguration controls the localhost seccomp profile applied to checkpoint and restore pods. The profile blocks io_uring syscalls (which CRIU cannot dump). Default behavior (zero-value substruct, or absent substruct) applies DefaultSeccompProfile. Set Disabled=true on OpenShift (custom localhost profiles require privileged SCC) or when using a CRIU build with io_uring support. Set Profile to override the default path.

Appears in:

CheckpointConfiguration

Field	Description	Default	Validation
`disabled` boolean	Disabled, when true, suppresses seccomp profile injection entirely. Use this for clusters where custom localhost profiles are not allowed (e.g. OpenShift’s restricted-v2 SCC) or for CRIU builds that handle io_uring natively.
`profile` string	Profile is the localhost seccomp profile path. Empty falls back to DefaultSeccompProfile. Ignored when Disabled is true.
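
A small resolver can show the precedence among `disabled`, `profile`, and the default. This Python sketch is illustrative. `DEFAULT_SECCOMP_PROFILE` is a placeholder because this reference does not give the actual DefaultSeccompProfile path.

```python
from typing import Optional

# Placeholder: the real DefaultSeccompProfile path lives in operator source.
DEFAULT_SECCOMP_PROFILE = "<DefaultSeccompProfile>"


def effective_seccomp_profile(seccomp: Optional[dict]) -> Optional[str]:
    """Return the localhost profile to inject, or None when injection is disabled."""
    cfg = seccomp or {}  # absent or zero-value substruct -> default behavior
    if cfg.get("disabled"):
        return None  # Disabled wins; Profile is ignored
    return cfg.get("profile") or DEFAULT_SECCOMP_PROFILE  # empty -> default
```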

CheckpointStorageConfiguration

CheckpointStorageConfiguration configures checkpoint storage for operator pod mutations. Only PVC storage is implemented today.

Appears in:

CheckpointConfiguration

Field	Description	Default	Validation
`type` string	Type is the storage backend type. Only pvc is implemented today.
`pvc` CheckpointPVCConfig	PVC configuration for pvc-based settings.
`s3` CheckpointS3Config	Deprecated: S3 is retained for compatibility and ignored.
`oci` CheckpointOCIConfig	Deprecated: OCI is retained for compatibility and ignored.

DRAConfiguration

DRAConfiguration holds Dynamic Resource Allocation (resource.k8s.io/v1) settings.

NOTE: auto-detection here only verifies that the resource.k8s.io/v1 API is registered on the apiserver (Kubernetes 1.34+). It does NOT verify that a GPU-specific DRA resource driver (e.g. nvidia/k8s-dra-driver-gpu) is installed, that its DeviceClass exists, or that node-level GPU drivers are compatible. An admin can use enabled: false to force-off DRA integration on clusters where the API is present but the GPU driver stack is not wired up — this makes the operator fail GMS / inter-pod failover admissions early with a clear error instead of letting pods Pend with a confusing “resourceclaim not found” at schedule time.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`enabled` boolean	Enabled overrides auto-detection of the resource.k8s.io/v1 API. nil = auto-detect. Setting true requires detection to also succeed (the operator will exit at startup otherwise).
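
The tri-state `enabled` override resolves against API detection roughly as follows. This is a minimal Python sketch of the documented semantics: nil follows detection, false forces DRA off, and true requires detection to succeed. `SystemExit` stands in for the operator exiting at startup.

```python
from typing import Optional


def dra_enabled(enabled: Optional[bool], api_detected: bool) -> bool:
    """Resolve DRAConfiguration.enabled against detection of resource.k8s.io/v1."""
    if enabled is None:
        return api_detected  # nil: follow auto-detection
    if enabled and not api_detected:
        # Documented behavior: the operator exits at startup in this case.
        raise SystemExit("dra.enabled=true but resource.k8s.io/v1 is not served")
    return enabled
```

Detection only confirms that the API is served, so `enabled: false` remains the way to force DRA off when the GPU DRA driver stack is missing.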

DiscoveryBackend

Underlying type: string

DiscoveryBackend is the type for the discovery backend.

Appears in:

DiscoveryConfiguration

Field	Description
`kubernetes`	DiscoveryBackendKubernetes is the Kubernetes discovery backend
`etcd`	DiscoveryBackendEtcd is the etcd discovery backend

DiscoveryConfiguration

DiscoveryConfiguration holds discovery backend settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`backend` DiscoveryBackend	Backend is the discovery backend: “kubernetes” or “etcd”	kubernetes

GPUConfiguration

GPUConfiguration holds GPU discovery settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`discoveryEnabled` boolean	DiscoveryEnabled indicates whether GPU discovery is enabled	true

GroveConfiguration

GroveConfiguration holds Grove orchestrator settings.

Appears in:

OrchestratorConfiguration

Field	Description	Default	Validation
`enabled` boolean	Enabled overrides auto-detection. nil = auto-detect.
`terminationDelay` Duration	TerminationDelay configures the termination delay for Grove PodCliqueSets	15m

InfrastructureConfiguration

InfrastructureConfiguration holds service mesh and backend addresses.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`natsAddress` string	NATSAddress is the address of the NATS server
`etcdAddress` string	ETCDAddress is the address of the etcd server
`modelExpressURL` string	ModelExpressURL is the URL of the Model Express server to inject into all pods
`prometheusEndpoint` string	PrometheusEndpoint is the URL of the Prometheus endpoint to use for metrics

IngressConfiguration

IngressConfiguration holds ingress settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`virtualServiceGateway` string	VirtualServiceGateway is the name of the Istio virtual service gateway
`controllerClassName` string	ControllerClassName is the ingress controller class name
`controllerTLSSecretName` string	ControllerTLSSecretName is the TLS secret for the ingress controller
`hostSuffix` string	HostSuffix is the suffix for ingress hostnames

IstioMeshConfiguration

IstioMeshConfiguration holds Istio-specific mesh settings.

Appears in:

ServiceMeshConfiguration

Field	Description	Default	Validation
`tlsMode` string	TLSMode is the Istio TLS mode for DestinationRules. Supported values: “DISABLE”, “SIMPLE”, “ISTIO_MUTUAL”, “MUTUAL”. Defaults to “SIMPLE”.
`insecureSkipVerify` boolean	InsecureSkipVerify skips TLS certificate verification in DestinationRules. Defaults to true (matching upstream GAIE behavior with self-signed certs).
`clientCertificate` string	ClientCertificate is the path (in the istio-proxy sidecar’s filesystem) to the file holding the client-side TLS certificate used for mTLS. REQUIRED when TLSMode is “MUTUAL”; ignored for other modes.
`privateKey` string	PrivateKey is the path (in the istio-proxy sidecar’s filesystem) to the file holding the client-side TLS private key used for mTLS. REQUIRED when TLSMode is “MUTUAL”; ignored for other modes.
`caCertificates` string	CaCertificates is the optional path (in the istio-proxy sidecar’s filesystem) to the file holding CA certificates used to verify the server certificate. Used only when TLSMode is “MUTUAL”; for other modes the field is ignored.
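
A small validator can check the mode-dependent requirements above before rollout. This is a hypothetical Python sketch derived from the field descriptions, not the operator's validation code.

```python
from __future__ import annotations

SUPPORTED_TLS_MODES = {"DISABLE", "SIMPLE", "ISTIO_MUTUAL", "MUTUAL"}


def istio_errors(istio: dict) -> list[str]:
    """Validate IstioMeshConfiguration according to its field descriptions (sketch)."""
    mode = istio.get("tlsMode") or "SIMPLE"  # documented default
    if mode not in SUPPORTED_TLS_MODES:
        return [f"unsupported tlsMode {mode!r}"]
    errors = []
    if mode == "MUTUAL":
        # Client cert and key are required only for MUTUAL; caCertificates stays optional.
        for field in ("clientCertificate", "privateKey"):
            if not istio.get(field):
                errors.append(f"{field} is required when tlsMode is MUTUAL")
    return errors
```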

KaiSchedulerConfiguration

KaiSchedulerConfiguration holds Kai-scheduler settings.

Appears in:

OrchestratorConfiguration

Field	Description	Default	Validation
`enabled` boolean	Enabled overrides auto-detection. nil = auto-detect.

LWSConfiguration

LWSConfiguration holds LWS orchestrator settings.

Appears in:

OrchestratorConfiguration

Field	Description	Default	Validation
`enabled` boolean	Enabled overrides auto-detection. nil = auto-detect.

LeaderElectionConfiguration

LeaderElectionConfiguration holds leader election settings.

Appears in:

OperatorConfiguration

Field	Description	Default
`enabled` boolean	Enabled enables leader election for controller manager	false
`id` string	ID is the leader election resource identity
`namespace` string	Namespace is the namespace for the leader election resource

LoggingConfiguration

LoggingConfiguration holds logging settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`level` string	Level is the log level (e.g., “info”, “debug”)	info
`format` string	Format is the log format (e.g., “json”, “text”)	json

MPIConfiguration

MPIConfiguration holds MPI SSH secret settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`sshSecretName` string	SSHSecretName is the name of the secret containing the SSH key for MPI
`sshSecretNamespace` string	SSHSecretNamespace is the namespace where the MPI SSH secret is located

MetricsServer

MetricsServer extends Server with secure serving option.

Appears in:

ServerConfiguration

Field	Description	Default	Validation
`bindAddress` string	BindAddress is the address the server binds to
`port` integer	Port is the port the server listens on
`secure` boolean	Secure enables secure serving for the metrics endpoint. nil = default to true (secure by default).

NamespaceConfiguration

NamespaceConfiguration determines operator namespace mode.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`restricted` string	Deprecated: Namespace-restricted mode is deprecated and will be removed in a future release. Use cluster-wide mode (leave Restricted empty) instead.
`scope` NamespaceScopeConfiguration	Deprecated: Scope is only used in namespace-restricted mode, which is deprecated.

NamespaceScopeConfiguration

Deprecated: NamespaceScopeConfiguration is used only by the deprecated namespace-restricted mode and will be removed in a future release.

Appears in:

NamespaceConfiguration

Field	Description	Default	Validation
`leaseDuration` Duration	LeaseDuration is the duration of namespace scope marker lease before expiration	30s
`leaseRenewInterval` Duration	LeaseRenewInterval is the interval for renewing namespace scope marker lease	10s

OperatorConfiguration

OperatorConfiguration is the Schema for the operator configuration.

Field	Description	Default	Validation
`apiVersion` string	`operator.config.dynamo.nvidia.com/v1alpha1`
`kind` string	`OperatorConfiguration`
`server` ServerConfiguration	Server configuration (metrics, health probes, webhooks)
`leaderElection` LeaderElectionConfiguration	Leader election configuration
`namespace` NamespaceConfiguration	Namespace configuration (restricted vs cluster-wide)
`orchestrators` OrchestratorConfiguration	Orchestrator configuration with optional overrides
`dra` DRAConfiguration	DRA (Dynamic Resource Allocation) settings with optional override
`infrastructure` InfrastructureConfiguration	Service mesh and infrastructure addresses
`ingress` IngressConfiguration	Ingress configuration
`serviceMesh` ServiceMeshConfiguration	ServiceMesh configures automatic generation of service-mesh resources (e.g., Istio DestinationRules) for EPP components.
`rbac` RBACConfiguration	RBAC configuration for cross-namespace resource management (cluster-wide mode)
`mpi` MPIConfiguration	MPI SSH secret configuration
`checkpoint` CheckpointConfiguration	Checkpoint/restore configuration
`discovery` DiscoveryConfiguration	Discovery backend configuration
`gpu` GPUConfiguration	GPU discovery configuration
`logging` LoggingConfiguration	Logging configuration
`security` SecurityConfiguration	HTTP/2 and TLS settings

OrchestratorConfiguration

OrchestratorConfiguration holds orchestrator override settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`grove` GroveConfiguration	Grove orchestrator configuration
`lws` LWSConfiguration	LWS orchestrator configuration
`kaiScheduler` KaiSchedulerConfiguration	KaiScheduler configuration

RBACConfiguration

RBACConfiguration holds RBAC settings for cluster-wide mode.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`plannerClusterRoleName` string	PlannerClusterRoleName is the ClusterRole for planner
`dgdrProfilingClusterRoleName` string	DGDRProfilingClusterRoleName is the ClusterRole for DGDR profiling jobs
`eppClusterRoleName` string	EPPClusterRoleName is the ClusterRole for EPP

SecurityConfiguration

SecurityConfiguration holds HTTP/2 and TLS settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`enableHTTP2` boolean	EnableHTTP2 enables HTTP/2 for metrics and webhook servers	false

Server

Server holds a bind address and port.

Appears in:

Field	Description	Default	Validation
`bindAddress` string	BindAddress is the address the server binds to
`port` integer	Port is the port the server listens on

ServerConfiguration

ServerConfiguration holds server bind addresses and ports.

Appears in:

OperatorConfiguration

Field	Description	Default
`metrics` MetricsServer	Metrics server configuration	{ bindAddress:0.0.0.0 port:8080 secure:true }
`healthProbe` Server	Health probe server configuration	{ bindAddress:0.0.0.0 port:8081 }
`webhook` WebhookServer	Webhook server configuration	{ certDir:/tmp/k8s-webhook-server/serving-certs host:0.0.0.0 port:9443 }

ServiceMeshConfiguration

ServiceMeshConfiguration holds service mesh integration settings. The operator uses this to generate mesh-specific resources (e.g., Istio DestinationRules) for EPP components so that sidecar proxies connect correctly without double-TLS issues.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`provider` string	Provider selects the service mesh implementation. Supported: “istio”, "". Empty string disables service mesh resource generation.
`istio` IstioMeshConfiguration	Istio holds Istio-specific settings. Only used when Provider is “istio”.
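A minimal sketch that turns on Istio resource generation for EPP components. The `istio` block (IstioMeshConfiguration) is omitted because its fields are not listed in this section.

```yaml
apiVersion: operator.config.dynamo.nvidia.com/v1alpha1
kind: OperatorConfiguration
serviceMesh:
  provider: istio  # set to "" (the empty string) to disable mesh resource generation
```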

WebhookServer

WebhookServer extends Server with host and certificate directory.

Appears in:

ServerConfiguration

Field	Description	Default
`bindAddress` string	BindAddress is the address the server binds to
`port` integer	Port is the port the server listens on
`host` string	Host is the address the webhook server binds to
`certDir` string	CertDir is the directory containing TLS certificates
`certProvisionMode` CertProvisionMode	CertProvisionMode controls certificate management: “auto” (built-in cert-controller) or “manual” (external)	auto
`secretName` string	SecretName is the name of the Kubernetes Secret holding webhook TLS certificates	webhook-server-cert
`serviceName` string	ServiceName is the name of the Kubernetes Service fronting the webhook server. Used to generate certificate SANs. Set by the Helm chart.
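A sketch of the webhook block for clusters that provision webhook certificates externally instead of using the built-in cert-controller. The Service name is a hypothetical placeholder; according to the table it is normally set by the Helm chart.

```yaml
server:
  webhook:
    certProvisionMode: manual  # certificates come from an external source
    secretName: webhook-server-cert  # documented default Secret name
    serviceName: dynamo-operator-webhook  # hypothetical placeholder; normally set by the Helm chart
```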

Operator Default Values Injection

The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:

Health Probes: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour budget (a failure threshold of 720 at a 10-second period) to accommodate long model loading times.
Security Context: All components receive fsGroup: 1000 by default to ensure proper file permissions for mounted volumes. This can be overridden via the extraPodSpec.securityContext field.
Shared Memory: All components receive an 8Gi shared memory volume mounted at /dev/shm by default (can be disabled or resized via the sharedMemory field).
Environment Variables: Components automatically receive environment variables like DYN_NAMESPACE, DYN_PARENT_DGD_K8S_NAME, DYNAMO_PORT, and backend-specific variables.
Pod Configuration: Default terminationGracePeriodSeconds of 60 seconds and restartPolicy: Always.
Autoscaling: The legacy autoscaling field defaults to CPU-based autoscaling with 80% target utilization when enabled without explicit metrics. That field is deprecated and ignored; use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner instead.
Backend-Specific Behavior: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (vLLM, SGLang, or TensorRT-LLM).

Pod Specification Defaults

All components receive the following pod-level defaults unless overridden:

terminationGracePeriodSeconds: 60 seconds
restartPolicy: Always

Security Context

The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:

fsGroup: 1000 - Sets the group ownership of mounted volumes and any files created in those volumes

This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The fsGroup setting is particularly important for:

Model downloads and caching
Compilation cache directories
Persistent volume claims (PVCs)
SSH key generation in multinode deployments

Overriding Security Context

To override the default security context, specify your own securityContext in the extraPodSpec of your component:

```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        fsGroup: 2000  # Custom group ID
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
```

Important: When you provide any securityContext object in extraPodSpec, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting runAsNonRoot or setting it to false).

OpenShift and Security Context Constraints

In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift’s admission controllers to assign them dynamically:

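```yaml
services:
  YourWorker:
    extraPodSpec:
      # An empty object counts as a provided securityContext, so the operator
      # injects no defaults (including fsGroup: 1000) and OpenShift's SCC
      # admission assigns fsGroup and the UID range. A bare `securityContext:`
      # key followed only by comments parses as null, which is not "provided".
      securityContext: {}
```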

Alternatively, if you want to keep the default fsGroup: 1000 behavior and are certain your cluster allows it, you don’t need to specify anything; the operator defaults apply.

Shared Memory Configuration

Shared memory is enabled by default for all components:

Enabled: true (unless explicitly disabled via sharedMemory.disabled)
Size: 8Gi
Mount Path: /dev/shm
Volume Type: emptyDir with memory medium

To disable shared memory or customize the size, use the sharedMemory field in your component specification.
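A sketch of both customizations on hypothetical services. `sharedMemory.disabled` is documented above; the `size` key is an assumption about the SharedMemorySpec field name, so confirm it against the SharedMemorySpec type before relying on it.

```yaml
services:
  YourWorker:
    sharedMemory:
      size: 16Gi  # assumed field name; raises /dev/shm above the 8Gi default
  YourPlanner:
    sharedMemory:
      disabled: true  # no memory-backed emptyDir is mounted at /dev/shm
```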

Health Probes by Component Type

The operator applies different default health probes based on the component type.

Frontend Components

Frontend components receive the following probe configurations:

Liveness Probe:

Type: HTTP GET
Path: /health
Port: http (8000)
Initial Delay: 60 seconds
Period: 60 seconds
Timeout: 30 seconds
Failure Threshold: 10

Readiness Probe:

Type: Exec command
Command: curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""
Initial Delay: 60 seconds
Period: 60 seconds
Timeout: 30 seconds
Failure Threshold: 10
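For comparison, the frontend defaults above correspond roughly to the following standard Kubernetes probe definitions. This is a sketch of the documented values, not the operator's literal output. Wrapping the readiness command in `/bin/sh -c` is an assumption: exec probes run without a shell, so the pipe and `${DYNAMO_PORT}` expansion need one.

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: http  # named container port, 8000
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 30
  failureThreshold: 10
readinessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - 'curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""'
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 30
  failureThreshold: 10
```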

Worker Components

Worker components receive the following probe configurations:

Liveness Probe:

Type: HTTP GET
Path: /live
Port: system (9090)
Period: 5 seconds
Timeout: 30 seconds
Failure Threshold: 1

Readiness Probe:

Type: HTTP GET
Path: /health
Port: system (9090)
Period: 10 seconds
Timeout: 30 seconds
Failure Threshold: 60

Startup Probe:

Type: HTTP GET
Path: /live
Port: system (9090)
Period: 10 seconds
Timeout: 5 seconds
Failure Threshold: 720 (allows up to 2 hours for startup: 10s × 720 = 7200s)

:::{note}
For larger models (typically >70B parameters) or slower storage systems, you may need to increase the failureThreshold to allow more time for model loading. Calculate the required threshold from your expected startup time: failureThreshold = ceil(expected_startup_seconds / periodSeconds). Override the startup probe in your component specification if the default 2-hour window is insufficient.
:::
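A sketch of such an override for a hypothetical `YourWorker` service that needs about three hours to load: with the default 10-second period, 10800 / 10 = 1080 allowed failures. It uses `extraPodSpec.mainContainer`, which the Notes at the end of this page describe as a way to override operator-set probes; the path, port, and timeout mirror the worker defaults.

```yaml
services:
  YourWorker:
    extraPodSpec:
      mainContainer:
        startupProbe:
          httpGet:
            path: /live
            port: system  # named container port, 9090
          periodSeconds: 10
          timeoutSeconds: 5
          # failureThreshold = ceil(expected_startup_seconds / periodSeconds) = 10800 / 10
          failureThreshold: 1080  # about 3 hours
```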

Multinode Deployment Probe Modifications

For multinode deployments, the operator modifies probes based on the backend framework and node role:

vLLM Backend

The operator automatically selects between two deployment modes based on parallelism configuration:

Tensor/Pipeline Parallel Mode (when world_size > GPUs_per_node):

Uses Ray for distributed execution (--distributed-executor-backend ray)
Leader nodes: Start the Ray head and run vLLM; all probes remain active
Worker nodes: Run Ray agents only; all probes (liveness, readiness, startup) are removed

Data Parallel Mode (otherwise, when world_size × data_parallel_size > GPUs_per_node):

Worker nodes: All probes (liveness, readiness, startup) are removed
Leader nodes: All probes remain active

SGLang Backend

Worker nodes: All probes (liveness, readiness, startup) are removed

TensorRT-LLM Backend

Leader nodes: All probes remain unchanged
Worker nodes:
- Liveness and startup probes are removed
- Readiness probe is replaced with a TCP socket check on SSH port (2222):
  - Initial Delay: 20 seconds
  - Period: 20 seconds
  - Timeout: 5 seconds
  - Failure Threshold: 10
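Expressed as a standard Kubernetes probe, the replacement readiness check on TensorRT-LLM worker nodes looks roughly like this (a sketch of the documented values, not the operator's literal output):

```yaml
readinessProbe:
  tcpSocket:
    port: 2222  # SSH port used for multinode MPI communication
  initialDelaySeconds: 20
  periodSeconds: 20
  timeoutSeconds: 5
  failureThreshold: 10
```

The check only confirms that something is accepting connections on port 2222; it does not probe the inference engine itself.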

Environment Variables

The operator automatically injects environment variables into component containers based on component type, backend framework, and operator configuration. User-provided envs values always take precedence over operator defaults.

All Components

These environment variables are injected into every component container regardless of type.

Variable	Purpose	Default	Type	Source
`DYN_NAMESPACE`	Dynamo service namespace used for service discovery and routing	Derived from DGD spec	`string`	Downward API annotation on checkpoint-restored pods
`DYN_COMPONENT`	Identifies the component type for runtime behavior	One of: `frontend`, `worker`, `prefill`, `decode`, `planner`, `epp`	`string`	Set from component spec
`DYN_PARENT_DGD_K8S_NAME`	Kubernetes name of the parent DynamoGraphDeployment resource	—	`string`	Set from DGD metadata
`DYN_PARENT_DGD_K8S_NAMESPACE`	Kubernetes namespace of the parent DynamoGraphDeployment resource	—	`string`	Set from DGD metadata
`POD_NAME`	Current pod name	—	`string`	Downward API (`metadata.name`)
`POD_NAMESPACE`	Current pod namespace	—	`string`	Downward API (`metadata.namespace`)
`POD_UID`	Current pod UID	—	`string`	Downward API (`metadata.uid`)
`DYN_DISCOVERY_BACKEND`	Service discovery backend for inter-component communication	`kubernetes`	`string`	Options: `kubernetes`, `etcd`

Infrastructure (Conditional)

These are injected into all components when the corresponding infrastructure service is configured in the operator’s OperatorConfiguration.

Variable	Purpose	Default	Type	Condition
`NATS_SERVER`	NATS messaging server address	—	`string`	Set when `infrastructure.natsAddress` is configured
`ETCD_ENDPOINTS`	etcd endpoint addresses for distributed state	—	`string`	Set when `infrastructure.etcdAddress` is configured
`MODEL_EXPRESS_URL`	Model Express service URL for model management	—	`string`	Set when `infrastructure.modelExpressURL` is configured
`PROMETHEUS_ENDPOINT`	Prometheus endpoint for metrics collection	—	`string`	Set when `infrastructure.prometheusEndpoint` is configured
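A sketch of the corresponding OperatorConfiguration block. The field names come from the Condition column above; the addresses are hypothetical in-cluster service names.

```yaml
apiVersion: operator.config.dynamo.nvidia.com/v1alpha1
kind: OperatorConfiguration
infrastructure:
  natsAddress: nats://nats.dynamo-system.svc.cluster.local:4222  # hypothetical; injected as NATS_SERVER
  etcdAddress: http://etcd.dynamo-system.svc.cluster.local:2379  # hypothetical; injected as ETCD_ENDPOINTS
  prometheusEndpoint: http://prometheus.monitoring.svc.cluster.local:9090  # hypothetical; injected as PROMETHEUS_ENDPOINT
```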

Frontend Components

Variable	Purpose	Default	Type
`DYNAMO_PORT`	HTTP port the frontend listens on	`8000`	`int`
`DYN_HTTP_PORT`	HTTP port for the frontend service (alias)	`8000`	`int`
`DYN_NAMESPACE_PREFIX`	Namespace prefix used for frontend request routing	Same as `DYN_NAMESPACE`	`string`

Worker Components

Variable	Purpose	Default	Type
`DYN_SYSTEM_ENABLED`	Enables the system HTTP server for health checks and metrics	`true`	`string` (boolean)
`DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS`	Endpoints whose health status is used for readiness	`["generate"]`	`string` (JSON array)
`DYN_SYSTEM_PORT`	Port for the system HTTP server (health, metrics)	`9090`	`int`
`DYN_HEALTH_CHECK_ENABLED`	Controls the legacy health check mechanism; set to `false` so that health checks are served by the system server instead	`false`	`string` (boolean)
`NIXL_TELEMETRY_ENABLE`	Enables or disables NIXL telemetry collection	`n`	`string`
`NIXL_TELEMETRY_EXPORTER`	Telemetry exporter format for NIXL metrics	`prometheus`	`string`
`NIXL_TELEMETRY_PROMETHEUS_PORT`	Port for NIXL Prometheus metrics endpoint	`19090`	`int`
`DYN_NAMESPACE_WORKER_SUFFIX`	Hash suffix appended to worker namespace for rolling updates	—	`string`

Planner Components

Variable	Purpose	Default	Type
`PLANNER_PROMETHEUS_PORT`	Port for the planner’s Prometheus metrics endpoint	`9085`	`int`

EPP (Endpoint Picker Plugin) Components

Variable	Purpose	Default	Type
`USE_STREAMING`	Enables streaming mode for inference request proxying	`true`	`string` (boolean)
`RUST_LOG`	Rust log level and filter configuration	`debug,dynamo_llm::kv_router=trace`	`string`
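Because user-provided `envs` take precedence over operator defaults, any value in these tables can be replaced per component. A sketch for a hypothetical EPP service that replaces the default trace-level logging:

```yaml
services:
  YourEpp:
    componentType: epp
    envs:
      - name: RUST_LOG
        value: info  # replaces the default "debug,dynamo_llm::kv_router=trace"
```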

vLLM Backend

Variable	Purpose	Default	Type	Condition
`VLLM_CACHE_ROOT`	Directory for vLLM compilation cache artifacts	—	`string`	Set when a volume mount has `useAsCompilationCache: true`
`VLLM_NIXL_SIDE_CHANNEL_HOST`	Host IP for the NIXL side channel in multiprocessing mode	Pod IP	`string`	Multinode mp backend only (Downward API: `status.podIP`)

TensorRT-LLM Backend

Variable	Purpose	Default	Type	Condition
`OMPI_MCA_orte_keep_fqdn_hostnames`	Instructs OpenMPI to preserve FQDN hostnames for inter-node communication	`1`	`string`	Multinode deployments only

Service Accounts

The following component types automatically receive dedicated service accounts:

Planner: planner-serviceaccount
EPP: epp-serviceaccount

Image Pull Secrets

The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:

Scans all Kubernetes secrets of type kubernetes.io/dockerconfigjson in the component’s namespace
Extracts the docker registry server URLs from each secret’s authentication configuration
Matches the container image’s registry host against the discovered registry URLs
Automatically injects matching secrets as imagePullSecrets in the pod specification
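A sketch of a Secret that this discovery would match for images hosted on `nvcr.io`, such as `nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1`. The Secret name, namespace, and credentials are placeholders; the registry host used as the key under `auths` is what gets compared with the image's registry.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: nvcr-pull-secret  # hypothetical name
  namespace: my-dynamo-ns  # must be the component's namespace
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {
      "auths": {
        "nvcr.io": {
          "username": "$oauthtoken",
          "password": "<NGC_API_KEY>"
        }
      }
    }
```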

This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.

To disable automatic image pull secret discovery for a specific component, add the following annotation:

```yaml
annotations:
  nvidia.com/disable-image-pull-secret-discovery: "true"
```

Autoscaling Defaults

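The autoscaling field is deprecated and ignored; use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner instead. For reference, when that field is enabled but no metrics are specified, the defaults are: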

Default Metric: CPU utilization
Target Average Utilization: 80%

Port Configurations

Default container ports are configured based on component type:

Frontend Components

Port: 8000
Protocol: TCP
Name: http

Worker Components

Port: 9090 (system)
Protocol: TCP
Name: system
Port: 19090 (NIXL)
Protocol: TCP
Name: nixl
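In container-spec terms, the worker defaults correspond roughly to this `ports` list (a sketch, not the operator's literal output; the ports line up with `DYN_SYSTEM_PORT` and `NIXL_TELEMETRY_PROMETHEUS_PORT` above):

```yaml
ports:
  - name: system
    containerPort: 9090
    protocol: TCP
  - name: nixl
    containerPort: 19090
    protocol: TCP
```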

Planner Components

Port: 9085
Protocol: TCP
Name: metrics

EPP Components

Port: 9002 (gRPC)
Protocol: TCP
Name: grpc
Port: 9003 (gRPC health)
Protocol: TCP
Name: grpc-health
Port: 9090 (metrics)
Protocol: TCP
Name: metrics

Backend-Specific Configurations

vLLM

Ray Head Port: 6379 (for Ray cluster coordination in multinode TP/PP deployments)
Data Parallel RPC Port: 13445 (for data parallel multinode deployments)

SGLang

Distribution Init Port: 29500 (for multinode deployments)

TensorRT-LLM

SSH Port: 2222 (for multinode MPI communication)
OpenMPI Environment: OMPI_MCA_orte_keep_fqdn_hostnames=1

Implementation Reference

For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:

Health Probes, Security Context & Pod Specifications: internal/dynamo/graph.go - Contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
Component-Specific Defaults:
- internal/dynamo/component_common.go - Base container and pod spec shared by all component types
- internal/dynamo/component_frontend.go
- internal/dynamo/component_worker.go
- internal/dynamo/component_planner.go
- internal/dynamo/component_epp.go
Image Pull Secrets: internal/secrets/docker.go - Implements the docker secret indexer and automatic discovery
Backend-Specific Behavior:
Checkpoint / Restore:
- internal/checkpoint/podspec.go - Checkpoint env var injection and volume setup
- internal/checkpoint/resolve.go - Checkpoint resolution logic
- internal/checkpoint/resource.go - Checkpoint resource management
Constants & Annotations: internal/consts/consts.go - Defines annotation keys and other constants

Notes

All these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
User-specified probes (via livenessProbe, readinessProbe, or startupProbe fields) take precedence over operator defaults
For security context, if you provide any securityContext in extraPodSpec, no defaults will be injected, giving you full control
For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
The extraPodSpec.mainContainer field can be used to override probe configurations set by the operator

API Reference

Packages

nvidia.com/v1alpha1

Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.

This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.

Resource Types

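Autoscaling

Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.

Appears in:

DynamoComponentDeploymentSharedSpec
DynamoComponentDeploymentSpec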

Field	Description	Default	Validation
`enabled` boolean	Deprecated: This field is ignored.
`minReplicas` integer	Deprecated: This field is ignored.
`maxReplicas` integer	Deprecated: This field is ignored.
`behavior` HorizontalPodAutoscalerBehavior	Deprecated: This field is ignored.
`metrics` MetricSpec array	Deprecated: This field is ignored.

CheckpointMode

Underlying type: string

CheckpointMode defines how checkpoint creation is handled

Validation:

Enum: [Auto Manual]

Appears in:

ServiceCheckpointConfig

Field	Description
`Auto`	CheckpointModeAuto means the DGD controller will automatically create a Checkpoint CR
`Manual`	CheckpointModeManual means the user must create the Checkpoint CR themselves

ComponentKind

Underlying type: string

ComponentKind represents the type of underlying Kubernetes resource.

Validation:

Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]

Appears in:

ServiceReplicaStatus

Field	Description
`PodClique`	ComponentKindPodClique represents a PodClique resource.
`PodCliqueScalingGroup`	ComponentKindPodCliqueScalingGroup represents a PodCliqueScalingGroup resource.
`Deployment`	ComponentKindDeployment represents a Deployment resource.
`LeaderWorkerSet`	ComponentKindLeaderWorkerSet represents a LeaderWorkerSet resource.

ConfigMapKeySelector

ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.

Appears in:

ProfilingConfigSpec

Field	Description	Default	Validation
`name` string	Name of the ConfigMap containing the desired data.		Required: {}
`key` string	Key in the ConfigMap to select. If not specified, defaults to “disagg.yaml”.	disagg.yaml

DGDRState

Underlying type: string

Validation:

Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed]

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description
`Initializing`
`Pending`
`Profiling`
`Deploying`
`Ready`
`DeploymentDeleted`
`Failed`

DGDState

Underlying type: string

Validation:

Enum: [initializing pending successful failed]

Appears in:

DeploymentStatus

Field	Description
`initializing`
`pending`
`successful`
`failed`

DeploymentOverridesSpec

DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Validation
`name` string	Name is the desired name for the created DynamoGraphDeployment. If not specified, defaults to the DGDR name.	Optional: {}
`namespace` string	Namespace is the desired namespace for the created DynamoGraphDeployment. If not specified, defaults to the DGDR namespace.	Optional: {}
`labels` object (keys:string, values:string)	Labels are additional labels to add to the DynamoGraphDeployment metadata. These are merged with auto-generated labels from the profiling process.	Optional: {}
`annotations` object (keys:string, values:string)	Annotations are additional annotations to add to the DynamoGraphDeployment metadata.	Optional: {}
`workersImage` string	WorkersImage specifies the container image to use for DynamoGraphDeployment worker components. This image is used for both temporary DGDs created during online profiling and the final DGD. If omitted, the image from the base config file (e.g., disagg.yaml) is used. Example: “nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1”	Optional: {}
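A sketch of a DeploymentOverridesSpec object with every field set. The values are placeholders, and the name of the enclosing field in DynamoGraphDeploymentRequestSpec is not shown in this section, so only the object's contents appear.

```yaml
# Contents of a DeploymentOverridesSpec object (enclosing field name not shown).
name: llama-70b-serving  # defaults to the DGDR name when omitted
namespace: inference  # defaults to the DGDR namespace when omitted
labels:
  team: ml-platform  # merged with labels generated by profiling
annotations:
  example.com/owner: ml-platform
workersImage: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1  # used for profiling DGDs and the final DGD
```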

DeploymentStatus

DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description	Default	Validation
`name` string	Name is the name of the created DynamoGraphDeployment.
`namespace` string	Namespace is the namespace of the created DynamoGraphDeployment.
`state` DGDState	State is the current state of the DynamoGraphDeployment. This value is mirrored from the DGD’s status.state field.	initializing	Enum: [initializing pending successful failed]
`created` boolean	Created indicates whether the DGD has been successfully created. Used to prevent recreation if the DGD is manually deleted by users.

DynamoCheckpoint

DynamoCheckpoint is the Schema for the dynamocheckpoints API. It represents a container checkpoint that can be used to restore pods to a warm state.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoCheckpoint`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoCheckpointSpec
`status` DynamoCheckpointStatus

DynamoCheckpointIdentity

DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence. Two checkpoints with the same identity hash are considered equivalent.

Appears in:

DynamoCheckpointSpec

Field	Description	Default	Validation
`model` string	Model is the model identifier (e.g., “meta-llama/Llama-3-70B”)		Required: {}
`backendFramework` string	BackendFramework is the runtime framework (vllm, sglang, trtllm)		Enum: [vllm sglang trtllm] Required: {}
`dynamoVersion` string	DynamoVersion is the Dynamo platform version (optional). If not specified, the version is not included in the identity hash. This ensures checkpoint compatibility across Dynamo releases.		Optional: {}
`tensorParallelSize` integer	TensorParallelSize is the tensor parallel configuration	1	Minimum: 1 Optional: {}
`pipelineParallelSize` integer	PipelineParallelSize is the pipeline parallel configuration	1	Minimum: 1 Optional: {}
`dtype` string	Dtype is the data type (fp16, bf16, fp8, etc.)		Optional: {}
`maxModelLen` integer	MaxModelLen is the maximum sequence length		Minimum: 1 Optional: {}
`extraParameters` object (keys:string, values:string)	ExtraParameters are additional parameters that affect the checkpoint hash. Use them for any framework-specific or custom parameters not covered above.		Optional: {}

DynamoCheckpointJobConfig

DynamoCheckpointJobConfig defines the configuration for the checkpoint creation Job

Appears in:

DynamoCheckpointSpec

Field	Description	Default	Validation
`podTemplateSpec` PodTemplateSpec	PodTemplateSpec allows customizing the checkpoint Job pod. It should include the container that runs the workload to be checkpointed.		Required: {}
`targetContainerName` string	TargetContainerName is the container in PodTemplateSpec to snapshot.	main	MaxLength: 63 MinLength: 1 Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`sharedMemory` SharedMemorySpec	SharedMemory controls the tmpfs mounted at /dev/shm for the checkpoint Job pod. When omitted, checkpoint Jobs use the same default 8Gi tmpfs as Dynamo components.		Optional: {}
`activeDeadlineSeconds` integer	ActiveDeadlineSeconds specifies the maximum time the Job can run	3600	Minimum: 1 Optional: {}
`backoffLimit` integer	Deprecated: BackoffLimit is ignored. Checkpoint Jobs never retry.		Minimum: 0 Optional: {}
`ttlSecondsAfterFinished` integer	Deprecated: TTLSecondsAfterFinished is ignored. Checkpoint Jobs use a fixed 300 second TTL.		Minimum: 0 Optional: {}

DynamoCheckpointPhase

Underlying type: string

DynamoCheckpointPhase represents the current phase of the checkpoint lifecycle

Validation:

Enum: [Pending Creating Ready Failed]

Appears in:

DynamoCheckpointStatus

Field	Description
`Pending`	DynamoCheckpointPhasePending indicates the checkpoint CR has been created but the Job has not started
`Creating`	DynamoCheckpointPhaseCreating indicates the checkpoint Job is running
`Ready`	DynamoCheckpointPhaseReady indicates the checkpoint artifact is available
`Failed`	DynamoCheckpointPhaseFailed indicates the checkpoint creation failed

DynamoCheckpointSpec

DynamoCheckpointSpec defines the desired state of DynamoCheckpoint

Appears in:

DynamoCheckpoint

Field	Description	Validation
`identity` DynamoCheckpointIdentity	Identity defines the inputs that determine checkpoint equivalence	Required: {}
`gpuMemoryService` GPUMemoryServiceSpec	GPUMemoryService enables checkpoint-time GPU Memory Service wiring. It is intentionally outside spec.identity, so it does not affect the checkpoint identity hash or deduplication.	Optional: {}
`job` DynamoCheckpointJobConfig	Job defines the configuration for the checkpoint creation Job	Required: {}
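A sketch of a complete DynamoCheckpoint manifest built from the fields documented in DynamoCheckpointIdentity, DynamoCheckpointJobConfig, and DynamoCheckpointSpec. The name, model settings, and image are placeholders; a real pod template also needs the GPU resources and the command that starts the workload to be checkpointed.

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoCheckpoint
metadata:
  name: llama3-70b-vllm-tp4  # hypothetical name
spec:
  identity:  # these fields feed the identity hash
    model: meta-llama/Llama-3-70B
    backendFramework: vllm
    tensorParallelSize: 4
    pipelineParallelSize: 1
    dtype: bf16
    maxModelLen: 8192
  job:
    activeDeadlineSeconds: 3600  # documented default
    targetContainerName: main  # documented default; must match a container below
    podTemplateSpec:
      spec:
        restartPolicy: Never  # Job pods must use Never or OnFailure
        containers:
          - name: main
            image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1
            # command, args, and GPU resources for the workload go here
```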

DynamoCheckpointStatus

DynamoCheckpointStatus defines the observed state of DynamoCheckpoint

Appears in:

DynamoCheckpoint

Field	Description	Validation
`phase` DynamoCheckpointPhase	Phase represents the current phase of the checkpoint lifecycle	Enum: [Pending Creating Ready Failed] Optional: {}
`identityHash` string	IdentityHash is the computed hash of the checkpoint identity. This hash is used to identify equivalent checkpoints.	Optional: {}
`location` string	Deprecated: Location is ignored and no longer populated. It is retained only so older objects continue to validate.	Optional: {}
`storageType` DynamoCheckpointStorageType	Deprecated: StorageType is ignored and no longer populated. It is retained only so older objects continue to validate.	Enum: [pvc s3 oci] Optional: {}
`jobName` string	JobName is the name of the checkpoint creation Job	Optional: {}
`createdAt` Time	CreatedAt is the timestamp when the checkpoint became ready	Optional: {}
`message` string	Message provides additional information about the current state	Optional: {}
`conditions` Condition array	DEPRECATED: Conditions are deprecated. Use status.phase instead.	Optional: {}

DynamoCheckpointStorageType

Underlying type: string

Validation:

Enum: [pvc s3 oci]

Appears in:

DynamoCheckpointStatus

DynamoComponentDeployment

DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoComponentDeployment`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoComponentDeploymentSpec	Spec defines the desired state for this Dynamo component deployment.

DynamoComponentDeploymentSharedSpec

Appears in:

Field	Description	Validation
`annotations` object (keys:string, values:string)	Annotations to add to generated Kubernetes resources for this component (such as Pod, Service, and Ingress when applicable).
`labels` object (keys:string, values:string)	Labels to add to generated Kubernetes resources for this component.
`serviceName` string	The name of the component
`componentType` string	ComponentType indicates the role of this component (for example, “main”).
`subComponentType` string	SubComponentType indicates the sub-role of this component (for example, “prefill”).
`globalDynamoNamespace` boolean	GlobalDynamoNamespace indicates that the Component will be placed in the global Dynamo namespace
`resources` Resources	Resources requested and limits for this component, including CPU, memory, GPUs/devices, and any runtime-specific resources.
`autoscaling` Autoscaling	Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.
`envs` EnvVar array	Envs defines additional environment variables to inject into the component containers.
`envFromSecret` string	EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the component containers.
`volumeMounts` VolumeMount array	VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component.
`ingress` IngressSpec	Ingress config to expose the component outside the cluster (or through a service mesh).
`modelRef` ModelReference	ModelRef references a model that this component serves. When specified, a headless service is created for endpoint discovery.	Optional: {}
`sharedMemory` SharedMemorySpec	SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size).
`extraPodMetadata` ExtraPodMetadata	ExtraPodMetadata adds labels/annotations to the created Pods.	Optional: {}
`extraPodSpec` ExtraPodSpec	ExtraPodSpec allows overriding the main pod spec configuration. It is a standard Kubernetes PodSpec. It also contains a MainContainer (standard Kubernetes Container) field that allows overriding the main container configuration.	Optional: {}
`livenessProbe` Probe	LivenessProbe to detect and restart unhealthy containers.
`readinessProbe` Probe	ReadinessProbe to signal when the container is ready to receive traffic.
`replicas` integer	Replicas is the desired number of Pods for this component. When scalingAdapter is enabled, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly.	Minimum: 0
`multinode` MultinodeSpec	Multinode is the configuration for multinode components.
`scalingAdapter` ScalingAdapter	ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter. When enabled, replicas are managed via DGDSA and external autoscalers can scale the service using the Scale subresource. When disabled, replicas can be modified directly.	Optional: {}
`eppConfig` EPPConfig	EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components. Only applicable when ComponentType is “epp”.	Optional: {}
`frontendSidecar` FrontendSidecarSpec	FrontendSidecar configures an auto-generated frontend sidecar container. When specified, the operator injects a fully configured frontend container with all standard Dynamo environment variables, health probes, and ports. This eliminates the need to manually specify these in extraPodSpec.containers. (GAIE)	Optional: {}
`checkpoint` ServiceCheckpointConfig	Checkpoint configures container checkpointing for this service. When enabled, pods can be restored from checkpoint files for faster cold starts.	Optional: {}
`topologyConstraint` TopologyConstraint	TopologyConstraint for this service. packDomain is required. When both this and spec.topologyConstraint.packDomain are set, packDomain must be narrower than or equal to the spec-level packDomain.	Optional: {}
`gpuMemoryService` GPUMemoryServiceSpec	GPUMemoryService configures the GPU Memory Service (GMS) sidecar. When enabled, a GMS sidecar is injected and GPU access is managed via DRA.	Optional: {}
`failover` FailoverSpec	Failover configures GMS (GPU Memory Service) failover for this service. For intraPod mode: the main container is cloned into two engine containers (active + standby). For interPod mode: the operator creates a dedicated GMS weight server pod and multiple engine pods per rank that share GPUs via DRA resource claims.	Optional: {}

DynamoComponentDeploymentSpec

DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment

Appears in:

DynamoComponentDeployment

Field	Description	Validation
`backendFramework` string	BackendFramework specifies the backend framework (e.g., “sglang”, “vllm”, “trtllm”)	Enum: [sglang vllm trtllm]
`annotations` object (keys:string, values:string)	Annotations to add to generated Kubernetes resources for this component (such as Pod, Service, and Ingress when applicable).
`labels` object (keys:string, values:string)	Labels to add to generated Kubernetes resources for this component.
`serviceName` string	The name of the component
`componentType` string	ComponentType indicates the role of this component (for example, “main”).
`subComponentType` string	SubComponentType indicates the sub-role of this component (for example, “prefill”).
`globalDynamoNamespace` boolean	GlobalDynamoNamespace indicates that the Component will be placed in the global Dynamo namespace
`resources` Resources	Resources requested and limits for this component, including CPU, memory, GPUs/devices, and any runtime-specific resources.
`autoscaling` Autoscaling	Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.
`envs` EnvVar array	Envs defines additional environment variables to inject into the component containers.
`envFromSecret` string	EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the component containers.
`volumeMounts` VolumeMount array	VolumeMounts references PVCs defined at the top level for volumes to be mounted by the component.
`ingress` IngressSpec	Ingress config to expose the component outside the cluster (or through a service mesh).
`modelRef` ModelReference	ModelRef references a model that this component serves. When specified, a headless service is created for endpoint discovery.	Optional: {}
`sharedMemory` SharedMemorySpec	SharedMemory controls the tmpfs mounted at /dev/shm (enable/disable and size).
`extraPodMetadata` ExtraPodMetadata	ExtraPodMetadata adds labels/annotations to the created Pods.	Optional: {}
`extraPodSpec` ExtraPodSpec	ExtraPodSpec allows overriding the main pod spec configuration. It is a standard Kubernetes PodSpec that also contains a MainContainer field (a standard Kubernetes Container) for overriding the main container configuration.	Optional: {}
`livenessProbe` Probe	LivenessProbe to detect and restart unhealthy containers.
`readinessProbe` Probe	ReadinessProbe to signal when the container is ready to receive traffic.
`replicas` integer	Replicas is the desired number of Pods for this component. When scalingAdapter is enabled, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly.	Minimum: 0
`multinode` MultinodeSpec	Multinode is the configuration for multinode components.
`scalingAdapter` ScalingAdapter	ScalingAdapter configures whether this service uses the DynamoGraphDeploymentScalingAdapter. When enabled, replicas are managed via DGDSA and external autoscalers can scale the service using the Scale subresource. When disabled, replicas can be modified directly.	Optional: {}
`eppConfig` EPPConfig	EPPConfig defines EPP-specific configuration options for Endpoint Picker Plugin components. Only applicable when ComponentType is “epp”.	Optional: {}
`frontendSidecar` FrontendSidecarSpec	FrontendSidecar configures an auto-generated frontend sidecar container. When specified, the operator injects a fully configured frontend container with all standard Dynamo environment variables, health probes, and ports. This eliminates the need to manually specify these in extraPodSpec.containers. Applies to GAIE (Gateway API Inference Extension) deployments.	Optional: {}
`checkpoint` ServiceCheckpointConfig	Checkpoint configures container checkpointing for this service. When enabled, pods can be restored from checkpoint files for faster cold starts.	Optional: {}
`topologyConstraint` TopologyConstraint	TopologyConstraint for this service. packDomain is required. When both this and spec.topologyConstraint.packDomain are set, packDomain must be narrower than or equal to the spec-level packDomain.	Optional: {}
`gpuMemoryService` GPUMemoryServiceSpec	GPUMemoryService configures the GPU Memory Service (GMS) sidecar. When enabled, a GMS sidecar is injected and GPU access is managed via DRA.	Optional: {}
`failover` FailoverSpec	Failover configures GMS (GPU Memory Service) failover for this service. For intraPod mode: the main container is cloned into two engine containers (active + standby). For interPod mode: the operator creates a dedicated GMS weight server pod and multiple engine pods per rank that share GPUs via DRA resource claims.	Optional: {}

DynamoGraphDeployment

DynamoGraphDeployment is the Schema for the dynamographdeployments API.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoGraphDeployment`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentSpec	Spec defines the desired state for this graph deployment.
`status` DynamoGraphDeploymentStatus	Status reflects the current observed state of this graph deployment.
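As an orientation aid, the sketch below builds a minimal DynamoGraphDeployment manifest as a Python dict and checks three limits from the tables on this page: at most 25 entries in spec.services, at most 100 entries in spec.pvcs, and non-negative replicas. The graph name, namespace, service name, and field values are hypothetical; only the field names and limits come from this reference. The real checks are enforced by the API server and the operator, not by code like this.

```python
import json
from typing import Any, Dict

# Limits copied from the DynamoGraphDeploymentSpec validation column.
MAX_SERVICES = 25  # spec.services: MaxProperties 25
MAX_PVCS = 100     # spec.pvcs: MaxItems 100


def build_dgd(name: str, namespace: str) -> Dict[str, Any]:
    """Return a minimal DynamoGraphDeployment manifest (illustrative values)."""
    return {
        "apiVersion": "nvidia.com/v1alpha1",
        "kind": "DynamoGraphDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "backendFramework": "vllm",  # Enum: sglang | vllm | trtllm
            "services": {
                # Hypothetical service; the map key is the service name.
                "PrefillWorker": {
                    "componentType": "main",
                    "subComponentType": "prefill",
                    "replicas": 2,
                    "resources": {"limits": {"gpu": "1"}},
                },
            },
        },
    }


def check_limits(manifest: Dict[str, Any]) -> None:
    """Raise ValueError if the manifest breaks one of the documented limits."""
    spec = manifest["spec"]
    services = spec.get("services", {})
    if len(services) > MAX_SERVICES:
        raise ValueError(f"at most {MAX_SERVICES} services are allowed")
    if len(spec.get("pvcs", [])) > MAX_PVCS:
        raise ValueError(f"at most {MAX_PVCS} PVCs are allowed")
    for svc_name, svc in services.items():
        if svc.get("replicas", 0) < 0:
            raise ValueError(f"{svc_name}: replicas must be >= 0")


if __name__ == "__main__":
    dgd = build_dgd("example-graph", "dynamo")
    check_limits(dgd)
    print(json.dumps(dgd, indent=2))
```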

DynamoGraphDeploymentExperimentalSpec

DynamoGraphDeploymentExperimentalSpec groups graph-level opt-in preview features. Component-level experimental features are represented separately on component specs.

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`kvTransferPolicy` KvTransferPolicy	KvTransferPolicy configures topology-aware routing for KV-cache transfers between prefill and decode workers.		Optional: {}

DynamoGraphDeploymentRequest

Lifecycle:

Initializing → Pending: Validates spec and prepares for profiling
Pending → Profiling: Creates and runs profiling job (online or AIC)
Profiling → Ready/Deploying: Generates DGD spec after profiling completes
Deploying → Ready: When autoApply=true, monitors DGD until Ready
Ready: Terminal state when DGD is operational or spec is available
DeploymentDeleted: Terminal state when auto-created DGD is manually deleted

The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.
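The documented lifecycle can be modelled as a small transition table. This is an illustrative sketch, not the controller's implementation: it encodes only the transitions listed above, treats Initializing and Pending as the only states in which the spec may still change (since the spec becomes immutable once profiling starts), and leaves out Failed, which appears in the status enum but not in the lifecycle description.

```python
from typing import Dict, Set

# Documented transitions only (Failed is intentionally not modelled).
TRANSITIONS: Dict[str, Set[str]] = {
    "Initializing": {"Pending"},
    "Pending": {"Profiling"},
    "Profiling": {"Deploying", "Ready"},
    "Deploying": {"Ready"},
    "Ready": set(),              # terminal
    "DeploymentDeleted": set(),  # terminal; its source state is not specified above
}

# Profiling has not started yet in these states, so the spec may still change.
MUTABLE_STATES = {"Initializing", "Pending"}


def can_transition(current: str, target: str) -> bool:
    """True if current -> target is one of the documented transitions."""
    return target in TRANSITIONS.get(current, set())


def state_after_profiling(auto_apply: bool) -> str:
    """With autoApply the generated DGD is deployed; otherwise the DGDR stops at Ready."""
    return "Deploying" if auto_apply else "Ready"


def spec_is_mutable(state: str) -> bool:
    return state in MUTABLE_STATES
```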

DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated. Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest. v1alpha1 will be removed in a future release.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoGraphDeploymentRequest`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentRequestSpec	Spec defines the desired state for this deployment request.
`status` DynamoGraphDeploymentRequestStatus	Status reflects the current observed state of this deployment request.

DynamoGraphDeploymentRequestSpec

Appears in:

DynamoGraphDeploymentRequest

Field	Description	Default	Validation
`model` string	Model specifies the model to deploy (e.g., “Qwen/Qwen3-0.6B”, “meta-llama/Llama-3-70b”). This is a high-level identifier for easy reference in kubectl output and logs. The controller automatically sets this value in profilingConfig.config.deployment.model.		Required: {}
`backend` string	Backend specifies the inference backend for profiling. The controller automatically sets this value in profilingConfig.config.engine.backend. Profiling runs on real GPUs or via AIC simulation to collect performance data.		Enum: [auto vllm sglang trtllm] Required: {}
`useMocker` boolean	UseMocker indicates whether to deploy a mocker DynamoGraphDeployment instead of a real backend deployment. When true, the deployment uses simulated engines that don’t require GPUs, using the profiling data to simulate realistic timing behavior. Mocker is available in all backend images and useful for large-scale experiments. Profiling still runs against the real backend (specified above) to collect performance data.	false
`profilingConfig` ProfilingConfigSpec	ProfilingConfig provides the complete configuration for the profiling job and is passed directly to the profiler; its structure matches the profile_sla config format exactly (see ProfilingConfigSpec for schema). Note: the operator automatically attempts GPU discovery on the cluster's nodes. If it has node read permissions (cluster-wide or explicitly granted), the discovered GPU configuration is used as the default for any hardware setting that is not specified manually (minNumGpusPerEngine, maxNumGpusPerEngine, numGpusPerNode). User-specified values always take precedence over auto-discovered values. If GPU discovery fails (e.g., a namespace-restricted operator without node permissions), manual hardware configuration is required. Note: deployment.model and engine.backend are set automatically from the high-level model and backend fields and should not be specified in this config.		Required: {}
`enableGpuDiscovery` boolean	EnableGPUDiscovery controls whether the operator attempts to discover GPU hardware from cluster nodes. DEPRECATED: This field is deprecated and will be removed in v1beta1. GPU discovery is now always attempted automatically, so setting this field has no effect: the operator always tries to discover GPU hardware when node read permissions are available. If discovery is unavailable (e.g., a namespace-scoped operator without permissions), manual hardware configuration is required regardless of this setting.	true	Optional: {}
`autoApply` boolean	AutoApply indicates whether to automatically create a DynamoGraphDeployment after profiling completes. If false, only the spec is generated and stored in status. Users can then manually create a DGD using the generated spec.	false
`deploymentOverrides` DeploymentOverridesSpec	DeploymentOverrides allows customizing metadata for the auto-created DGD. Only applicable when AutoApply is true.		Optional: {}
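A minimal v1alpha1 request can be assembled as a plain dict, shown below together with a helper that mirrors the documented rule of copying `model` and `backend` into `profilingConfig.config.deployment.model` and `profilingConfig.config.engine.backend`. The helper only illustrates that rule and is not the controller's code; the request name is hypothetical, while the model and profiler image reuse the examples from this page. Because v1alpha1 DGDR is deprecated, new manifests should target nvidia.com/v1beta1.

```python
import copy
from typing import Any, Dict

ALLOWED_BACKENDS = {"auto", "vllm", "sglang", "trtllm"}  # spec.backend Enum


def build_dgdr(name: str, model: str, backend: str, profiler_image: str,
               auto_apply: bool = False) -> Dict[str, Any]:
    """Return a minimal v1alpha1 DynamoGraphDeploymentRequest manifest."""
    if backend not in ALLOWED_BACKENDS:
        raise ValueError(f"backend must be one of {sorted(ALLOWED_BACKENDS)}")
    return {
        "apiVersion": "nvidia.com/v1alpha1",
        "kind": "DynamoGraphDeploymentRequest",
        "metadata": {"name": name},
        "spec": {
            "model": model,
            "backend": backend,
            "autoApply": auto_apply,  # documented default: false
            "profilingConfig": {"profilerImage": profiler_image, "config": {}},
        },
    }


def with_injected_fields(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Copy spec.model and spec.backend into profilingConfig.config,
    mirroring the documented controller behaviour (illustration only)."""
    out = copy.deepcopy(spec)
    config = out["profilingConfig"].setdefault("config", {})
    config.setdefault("deployment", {})["model"] = spec["model"]
    config.setdefault("engine", {})["backend"] = spec["backend"]
    return out
```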

DynamoGraphDeploymentRequestStatus

DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.

Appears in:

DynamoGraphDeploymentRequest

Field	Description	Default	Validation
`state` DGDRState	State is a high-level textual status of the deployment request lifecycle.	Initializing	Enum: [Initializing Pending Profiling Deploying Ready DeploymentDeleted Failed]
`backend` string	Backend is extracted from profilingConfig.config.engine.backend for display purposes. This field is populated by the controller and shown in kubectl output.		Optional: {}
`observedGeneration` integer	ObservedGeneration reflects the generation of the most recently observed spec. Used to detect spec changes and enforce immutability after profiling starts.
`conditions` Condition array	Conditions contains the latest observed conditions of the deployment request. Standard condition types include: Validation, Profiling, SpecGenerated, DeploymentReady. Conditions are merged by type on patch updates.
`profilingResults` string	ProfilingResults contains a reference to the ConfigMap holding profiling data. Format: “configmap/<name>”		Optional: {}
`generatedDeployment` RawExtension	GeneratedDeployment contains the full generated DynamoGraphDeployment specification including metadata, based on profiling results. Users can extract this to create a DGD manually, or it’s used automatically when autoApply is true. Stored as RawExtension to preserve all fields including metadata. For mocker backends, this contains the mocker DGD spec.		EmbeddedResource: {} Optional: {}
`deployment` DeploymentStatus	Deployment tracks the auto-created DGD when AutoApply is true. Contains name, namespace, state, and creation status of the managed DGD.		Optional: {}
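When autoApply is false, users read the results from the status. The sketch below assumes the status has already been fetched as a dict (for example via kubectl with `-o json`); it pulls the ConfigMap name out of the documented “configmap/<name>” format and returns the generated DGD. The ConfigMap name in the usage is hypothetical.

```python
from typing import Any, Dict, Optional


def profiling_configmap_name(status: Dict[str, Any]) -> Optional[str]:
    """Extract <name> from status.profilingResults ("configmap/<name>")."""
    ref = status.get("profilingResults")
    if not ref:
        return None  # optional field, not populated yet
    kind, sep, name = ref.partition("/")
    if kind != "configmap" or not sep or not name:
        raise ValueError(f"unexpected profilingResults format: {ref!r}")
    return name


def generated_dgd(status: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the generated DynamoGraphDeployment (metadata included), if any."""
    return status.get("generatedDeployment")
```

With autoApply=false, the dict returned by `generated_dgd` is what a user would serialize and apply to create the DGD manually.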

DynamoGraphDeploymentScalingAdapter

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoGraphDeploymentScalingAdapter`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentScalingAdapterSpec
`status` DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterSpec

DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter

Appears in:

DynamoGraphDeploymentScalingAdapter

Field	Description	Default	Validation
`replicas` integer	Replicas is the desired number of replicas for the target service. This field is modified by external autoscalers (HPA/KEDA/Planner) or manually by users.		Minimum: 0 Required: {}
`dgdRef` DynamoGraphDeploymentServiceRef	DGDRef references the DynamoGraphDeployment and the specific service to scale.		Required: {}

DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter

Appears in:

DynamoGraphDeploymentScalingAdapter

Field	Description	Validation
`replicas` integer	Replicas is the current number of replicas for the target service. This is synced from the DGD’s service replicas and is required for the scale subresource.	Optional: {}
`selector` string	Selector is a label selector string for the pods managed by this adapter. Required for HPA compatibility via the scale subresource.	Optional: {}
`lastScaleTime` Time	LastScaleTime is the last time the adapter scaled the target service.	Optional: {}

DynamoGraphDeploymentServiceRef

DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment

Appears in:

DynamoGraphDeploymentScalingAdapterSpec

Field	Description	Default	Validation
`name` string	Name of the DynamoGraphDeployment		MinLength: 1 Required: {}
`serviceName` string	ServiceName is the key name of the service within the DGD’s spec.services map to scale		MinLength: 1 Required: {}
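The sketch below builds a DynamoGraphDeploymentScalingAdapter that points at one service of a graph, applying the MinLength and Minimum rules from the two tables above. The adapter, graph, and service names are hypothetical; once such an object exists, HPA, KEDA, or Planner would adjust `spec.replicas` through the Scale subresource.

```python
from typing import Any, Dict


def build_scaling_adapter(adapter_name: str, dgd_name: str,
                          service_name: str, replicas: int) -> Dict[str, Any]:
    """Return a DynamoGraphDeploymentScalingAdapter manifest targeting one service."""
    if not dgd_name or not service_name:
        raise ValueError("dgdRef.name and dgdRef.serviceName need MinLength 1")
    if replicas < 0:
        raise ValueError("spec.replicas has Minimum 0")
    return {
        "apiVersion": "nvidia.com/v1alpha1",
        "kind": "DynamoGraphDeploymentScalingAdapter",
        "metadata": {"name": adapter_name},
        "spec": {
            "replicas": replicas,
            # serviceName must be a key of the DGD's spec.services map.
            "dgdRef": {"name": dgd_name, "serviceName": service_name},
        },
    }
```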

DynamoGraphDeploymentSpec

DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.

Appears in:

DynamoGraphDeployment

Field	Description	Validation
`annotations` object (keys:string, values:string)	Annotations to propagate to all child resources (PCS, DCD, Deployments, and pod templates). Service-level annotations take precedence over these values.	Optional: {}
`labels` object (keys:string, values:string)	Labels to propagate to all child resources (PCS, DCD, Deployments, and pod templates). Service-level labels take precedence over these values.	Optional: {}
`pvcs` PVC array	PVCs defines a list of persistent volume claims that can be referenced by components. Each PVC must have a unique name that can be referenced in component specifications.	MaxItems: 100 Optional: {}
`services` object (keys:string, values:DynamoComponentDeploymentSharedSpec)	Services are the services to deploy as part of this deployment.	MaxProperties: 25 Optional: {}
`envs` EnvVar array	Envs are environment variables applied to all services in the deployment unless overridden by service-specific configuration.	Optional: {}
`backendFramework` string	BackendFramework specifies the backend framework (e.g., “sglang”, “vllm”, “trtllm”).	Enum: [sglang vllm trtllm]
`restart` Restart	Restart specifies the restart policy for the graph deployment.	Optional: {}
`topologyConstraint` SpecTopologyConstraint	TopologyConstraint is the deployment-level topology constraint. When set, topologyProfile is required and names the ClusterTopology CR to use. packDomain is optional here — it can be omitted when only services carry constraints. Services without their own topologyConstraint inherit from this value.	Optional: {}
`experimental` DynamoGraphDeploymentExperimentalSpec	Experimental groups graph-level preview features whose API shape and behavior may change in breaking ways between releases.	Optional: {}
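Graph-level labels, annotations, and envs act as defaults that service-level values override. A minimal sketch of that precedence, assuming a plain "service key wins" merge for labels and annotations and a merge by env var `name` for envs (the operator's exact merge logic is not described here):

```python
from typing import Dict, List, Optional


def effective_metadata(graph_level: Optional[Dict[str, str]],
                       service_level: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Merge graph-level labels/annotations with service-level ones.
    Service-level values win on key conflicts."""
    merged = dict(graph_level or {})
    merged.update(service_level or {})
    return merged


def effective_envs(graph_envs: List[Dict[str, str]],
                   service_envs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Apply graph-wide envs, letting a service env with the same name override."""
    by_name = {env["name"]: env for env in graph_envs}
    for env in service_envs:
        by_name[env["name"]] = env  # same name: service value replaces graph value
    return list(by_name.values())
```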

DynamoGraphDeploymentStatus

DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.

Appears in:

DynamoGraphDeployment

Field	Description	Default	Validation
`observedGeneration` integer	ObservedGeneration is the most recent generation observed by the controller.		Optional: {}
`state` DGDState	State is a high-level textual status of the graph deployment lifecycle.	initializing	Enum: [initializing pending successful failed]
`conditions` Condition array	Conditions contains the latest observed conditions of the graph deployment. The slice is merged by type on patch updates.
`services` object (keys:string, values:ServiceReplicaStatus)	Services contains per-service replica status information. The map key is the service name from spec.services.		Optional: {}
`restart` RestartStatus	Restart contains the status of the restart of the graph deployment.		Optional: {}
`checkpoints` object (keys:string, values:ServiceCheckpointStatus)	Checkpoints contains per-service checkpoint status information. The map key is the service name from spec.services.		Optional: {}
`rollingUpdate` RollingUpdateStatus	RollingUpdate tracks the progress of operator-managed rolling updates. Currently only supported for single-node, non-Grove deployments (DCD/Deployment).		Optional: {}

DynamoModel

DynamoModel is the Schema for the dynamo models API

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1alpha1`
`kind` string	`DynamoModel`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoModelSpec
`status` DynamoModelStatus

DynamoModelSpec

DynamoModelSpec defines the desired state of DynamoModel

Appears in:

DynamoModel

Field	Description	Default	Validation
`modelName` string	ModelName is the full model identifier (e.g., “meta-llama/Llama-3.3-70B-Instruct-lora”)		Required: {}
`baseModelName` string	BaseModelName is the base model identifier that matches the service label. It is used to discover endpoints via headless services.		Required: {}
`modelType` string	ModelType specifies the type of model (e.g., “base”, “lora”, “adapter”)	base	Enum: [base lora adapter] Optional: {}
`source` ModelSource	Source specifies the model source location (only applicable for lora model type)		Optional: {}

DynamoModelStatus

DynamoModelStatus defines the observed state of DynamoModel

Appears in:

DynamoModel

Field	Description	Validation
`endpoints` EndpointInfo array	Endpoints is the current list of all endpoints for this model	Optional: {}
`readyEndpoints` integer	ReadyEndpoints is the count of endpoints that are ready
`totalEndpoints` integer	TotalEndpoints is the total count of endpoints
`conditions` Condition array	Conditions represents the latest available observations of the model’s state	Optional: {}

EPPConfig

EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components. EPP is responsible for intelligent endpoint selection and KV-aware routing.

Appears in:

Field	Description	Default	Validation
`configMapRef` ConfigMapKeySelector	ConfigMapRef references a user-provided ConfigMap containing EPP configuration. The ConfigMap should contain EndpointPickerConfig YAML. Mutually exclusive with Config.		Optional: {}
`config` EndpointPickerConfig	Config allows specifying EPP EndpointPickerConfig directly as a structured object. The operator will marshal this to YAML and create a ConfigMap automatically. Mutually exclusive with ConfigMapRef. One of ConfigMapRef or Config must be specified (no default configuration). Uses the upstream type from github.com/kubernetes-sigs/gateway-api-inference-extension		Type: object Optional: {}
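The exactly-one-of rule for EPP configuration can be expressed as a small check. This sketch only mirrors the documented constraint; the ConfigMap name and key in the usage are hypothetical, and the operator performs its own validation.

```python
from typing import Any, Dict


def epp_config_source(epp: Dict[str, Any]) -> str:
    """Return which EPP config source is set, enforcing exactly one of the two."""
    has_ref = epp.get("configMapRef") is not None
    has_inline = epp.get("config") is not None
    if has_ref and has_inline:
        raise ValueError("configMapRef and config are mutually exclusive")
    if not (has_ref or has_inline):
        raise ValueError("one of configMapRef or config must be set (no default)")
    return "configMapRef" if has_ref else "config"
```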

EndpointInfo

EndpointInfo represents a single endpoint (pod) serving the model

Appears in:

DynamoModelStatus

Field	Description	Validation
`address` string	Address is the full address of the endpoint (e.g., “http://10.0.1.5:9090”)
`podName` string	PodName is the name of the pod serving this endpoint	Optional: {}
`ready` boolean	Ready indicates whether the endpoint is ready to serve traffic. For LoRA models: true if the POST /loras request succeeded with a 2xx status code. For base models: always false (no probing is performed).
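The documented readiness semantics, and the `readyEndpoints`/`totalEndpoints` counts in DynamoModelStatus, can be sketched as follows. The `adapter` model type is not covered by the description above, so this sketch raises for it rather than guessing; the HTTP status code is passed in and no request is made.

```python
from typing import Any, Dict, List, Optional


def endpoint_ready(model_type: str, lora_post_status: Optional[int]) -> bool:
    """Ready flag as described for EndpointInfo.ready."""
    if model_type == "lora":
        # Ready only if POST /loras returned a 2xx status code.
        return lora_post_status is not None and 200 <= lora_post_status < 300
    if model_type == "base":
        return False  # base models are not probed
    raise ValueError(f"readiness for model type {model_type!r} is not described")


def endpoint_counts(endpoints: List[Dict[str, Any]]) -> Dict[str, int]:
    """Derive readyEndpoints / totalEndpoints from a list of EndpointInfo dicts."""
    ready = sum(1 for ep in endpoints if ep.get("ready"))
    return {"readyEndpoints": ready, "totalEndpoints": len(endpoints)}
```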

ExtraPodMetadata

Appears in:

Field	Description	Default	Validation
`annotations` object (keys:string, values:string)
`labels` object (keys:string, values:string)

ExtraPodSpec

Appears in:

Field	Description	Default	Validation
`mainContainer` Container

FailoverSpec

Appears in:

Field	Description	Default	Validation
`enabled` boolean	Enabled activates failover mode.
`mode` GPUMemoryServiceMode	Mode selects the failover deployment topology. intraPod: engine containers run within the same pod (requires gpuMemoryService.enabled). interPod: a dedicated GMS weight server pod + engine pods per rank (requires Grove).	intraPod	Enum: [intraPod interPod] Optional: {}
`numShadows` integer	NumShadows is the number of shadow (standby) engine pods per rank. Total engine pods per rank = NumShadows + 1 (1 primary + NumShadows shadows). NumShadows is only meaningful for mode=interPod; intraPod uses a fixed 1 primary + 1 shadow sidecar layout and any value other than 1 is rejected at admission time.	1	Minimum: 1 Optional: {}
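The per-rank engine count follows directly from numShadows. This sketch encodes the documented defaults and constraints (intraPod requires gpuMemoryService.enabled and exactly one shadow); it does not check the Grove requirement for interPod, and it illustrates the rules rather than reproducing the operator's admission logic.

```python
from typing import Any, Dict


def engines_per_rank(failover: Dict[str, Any], gms_enabled: bool) -> int:
    """Return the number of engines per rank (1 primary + numShadows shadows)."""
    mode = failover.get("mode", "intraPod")  # documented default
    shadows = failover.get("numShadows", 1)  # documented default, Minimum 1
    if shadows < 1:
        raise ValueError("numShadows must be >= 1")
    if mode == "intraPod":
        if not gms_enabled:
            raise ValueError("intraPod failover requires gpuMemoryService.enabled")
        if shadows != 1:
            raise ValueError("intraPod uses a fixed 1 primary + 1 shadow layout")
    elif mode != "interPod":
        raise ValueError(f"unknown failover mode {mode!r}")
    # interPod: engine pods (the separate GMS weight server pod is not counted).
    # intraPod: engine containers inside one pod.
    return shadows + 1
```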

FrontendSidecarSpec

Appears in:

Field	Description	Validation
`image` string	Image is the container image for the frontend sidecar.	Required: {}
`args` string array	Args overrides the default frontend arguments. When specified, these replace the default [“-m”, “dynamo.frontend”] entirely. For example, [“-m”, “dynamo.frontend”, “--router-mode”, “direct”] for GAIE deployments.	Optional: {}
`envFromSecret` string	EnvFromSecret references a Secret whose key/value pairs will be exposed as environment variables in the frontend sidecar container.	Optional: {}
`envs` EnvVar array	Envs defines additional environment variables for the frontend sidecar. These are merged with (and can override) the auto-generated Dynamo env vars.	Optional: {}
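The two override rules above differ: args replace the defaults wholesale, while envs are merged on top of the generated ones. A sketch of both, with envs simplified to a name-to-value dict (the real field is an EnvVar array) and hypothetical env var names in the usage:

```python
from typing import Dict, List, Optional

DEFAULT_FRONTEND_ARGS = ["-m", "dynamo.frontend"]


def frontend_args(user_args: Optional[List[str]]) -> List[str]:
    """User args replace the defaults entirely; they are not appended."""
    # Treating an empty list like "unset" is an assumption of this sketch.
    return list(user_args) if user_args else list(DEFAULT_FRONTEND_ARGS)


def frontend_envs(generated: Dict[str, str], user: Dict[str, str]) -> Dict[str, str]:
    """Merge auto-generated env vars with user envs; user values win."""
    merged = dict(generated)
    merged.update(user)
    return merged
```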

GMSClientPodSpec

GMSClientPodSpec declares an additional GMS client pod for inter-pod GMS.

Appears in:

GPUMemoryServiceSpec

Field	Description	Default	Validation
`name` string	Name identifies this client pod.		MaxLength: 63 MinLength: 1 Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`
`podTemplate` PodTemplateSpec	PodTemplate configures the pod to run as a GMS client.		Schemaless: {} Type: object

GPUMemoryServiceMode

Underlying type: string

GPUMemoryServiceMode selects the GMS deployment topology.

Appears in:

Field	Description
`intraPod`	GMSModeIntraPod runs GMS as a sidecar within the same pod.
`interPod`	GMSModeInterPod runs GMS as a separate weight server pod and one or more engine pods per rank, sharing GPUs via DRA ResourceClaims and a shared hostPath volume for UDS sockets. Extra client pod rendering is reserved for a follow-up change.

GPUMemoryServiceSpec

GPUMemoryServiceSpec configures the GPU Memory Service (GMS) for a worker component.

Appears in:

Field	Description	Default	Validation
`enabled` boolean	Enabled activates GMS wiring. GPU resources on client containers are replaced with a DRA ResourceClaim for shared GPU access.
`mode` GPUMemoryServiceMode	Mode selects the GMS deployment topology.	intraPod	Enum: [intraPod interPod] Optional: {}
`deviceClassName` string	DeviceClassName is the DRA DeviceClass to request GPUs from.	gpu.nvidia.com	Optional: {}
`extraClientContainers` string array	ExtraClientContainers lists additional user-declared containers that should be wired as GMS clients in pods rendered from the enclosing spec. DGD/DCD services apply this to service pods; DynamoCheckpoint applies this to checkpoint Job pods. In each rendered pod, only matching container names are wired; absent names are ignored.		items:MaxLength: 63 items:MinLength: 1 items:Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`extraClientPods` GMSClientPodSpec array	ExtraClientPods declares additional GMS client pods for inter-pod GMS. This field is reserved for future use and is rejected until inter-pod client orchestration is wired.		Optional: {}
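A sketch of the extraClientContainers rules, assuming the field values have been read into a dict: names must satisfy the documented pattern and length limits, only names present in a given pod are wired, and a non-empty extraClientPods is rejected. The container names in the usage are hypothetical.

```python
import re
from typing import Any, Dict, List

# Pattern and length limits from the extraClientContainers item validation.
CONTAINER_NAME_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def validate_gms(gms: Dict[str, Any]) -> None:
    """Reject invalid client container names and any extraClientPods entries."""
    for name in gms.get("extraClientContainers", []):
        if not (1 <= len(name) <= 63 and CONTAINER_NAME_RE.fullmatch(name)):
            raise ValueError(f"invalid container name {name!r}")
    if gms.get("extraClientPods"):
        raise ValueError("extraClientPods is reserved and currently rejected")


def extra_clients_to_wire(pod_containers: List[str], gms: Dict[str, Any]) -> List[str]:
    """Names from extraClientContainers that exist in this pod; absent names are ignored."""
    wanted = set(gms.get("extraClientContainers", []))
    return [name for name in pod_containers if name in wanted]
```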

IngressSpec

Appears in:

Field	Description	Default	Validation
`enabled` boolean	Enabled exposes the component through an ingress or virtual service when true.
`host` string	Host is the base host name to route external traffic to this component.
`useVirtualService` boolean	UseVirtualService indicates whether to configure a service-mesh VirtualService instead of a standard Ingress.
`virtualServiceGateway` string	VirtualServiceGateway optionally specifies the gateway name to attach the VirtualService to.
`hostPrefix` string	HostPrefix is an optional prefix added before the host.
`annotations` object (keys:string, values:string)	Annotations to set on the generated Ingress/VirtualService resources.
`labels` object (keys:string, values:string)	Labels to set on the generated Ingress/VirtualService resources.
`tls` IngressTLSSpec	TLS holds the TLS configuration used by the Ingress/VirtualService.
`hostSuffix` string	HostSuffix is an optional suffix appended after the host.
`ingressControllerClassName` string	IngressControllerClassName selects the ingress controller class (e.g., “nginx”).

IngressTLSSpec

Appears in:

IngressSpec

Field	Description	Default	Validation
`secretName` string	SecretName is the name of a Kubernetes Secret containing the TLS certificate and key.

KvTransferEnforcement

Underlying type: string

KvTransferEnforcement controls how the selected prefill worker’s topology is applied to decode routing.

Validation:

Enum: [required preferred]

Appears in:

KvTransferPolicy

Field	Description
`required`	KvTransferEnforcementRequired enforces same-domain decode worker selection.
`preferred`	KvTransferEnforcementPreferred biases decode worker selection toward the same domain.

KvTransferPolicy

KvTransferPolicy configures topology-aware routing for KV-cache transfers between prefill and decode workers. This graph-wide policy lives under spec.experimental while the API is incubating.

Appears in:

DynamoGraphDeploymentExperimentalSpec

Field	Description	Default	Validation
`labelKey` string	LabelKey is a Kubernetes node label key (e.g. “topology.kubernetes.io/zone”) whose value identifies the topology domain for each worker. The operator copies the node label onto worker pods so the runtime can publish it as worker metadata. The label should correspond to the topology level named in `domain`.		MaxLength: 317 MinLength: 1 Pattern: `^(([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)(\.[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)*/)?([A-Za-z0-9]([-A-Za-z0-9_.]{0,61}[A-Za-z0-9])?)$` Optional: {}
`domain` TopologyDomain	Domain is the logical name for the topology level to enforce (e.g. “zone”, “rack”). The router uses this to match workers that share the same value for the label identified by `labelKey`.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`
`enforcement` KvTransferEnforcement	Enforcement controls how the selected prefill worker’s topology is applied to decode routing. “required” only allows decode workers in the same topology domain as the selected prefill worker. “preferred” keeps all decode workers eligible, but biases selection toward workers in the same topology domain. Defaults to “required”.	required	Enum: [required preferred] Optional: {}
`preferredWeight` float	PreferredWeight is required when enforcement is “preferred” and is used only in that case. Higher values create a stronger same-domain routing preference but do not guarantee same-domain selection. The value is not a probability; worker selection still depends on load and other routing inputs. A value of 0 disables the topology preference; 1 is the strongest supported preference.		Maximum: 1 Minimum: 0 Optional: {}
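The validation and defaulting rules for kvTransferPolicy can be collected into one check. This is a client-side sketch that reuses the documented pattern, length, and range limits; it is not the CRD's schema validation, and the label key and domain values in the usage are only examples.

```python
import re
from typing import Any, Dict

LABEL_KEY_RE = re.compile(
    r"^(([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)(\.[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)*/)?"
    r"([A-Za-z0-9]([-A-Za-z0-9_.]{0,61}[A-Za-z0-9])?)$"
)
DOMAIN_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def validate_kv_transfer_policy(policy: Dict[str, Any]) -> str:
    """Check a kvTransferPolicy dict against the documented rules; return the enforcement."""
    key = policy.get("labelKey")
    if key is not None and not (1 <= len(key) <= 317 and LABEL_KEY_RE.fullmatch(key)):
        raise ValueError(f"invalid labelKey {key!r}")
    domain = policy.get("domain")
    if domain is not None and not DOMAIN_RE.fullmatch(domain):
        raise ValueError(f"invalid domain {domain!r}")
    enforcement = policy.get("enforcement", "required")  # documented default
    if enforcement not in ("required", "preferred"):
        raise ValueError(f"invalid enforcement {enforcement!r}")
    if enforcement == "preferred":
        weight = policy.get("preferredWeight")
        if weight is None:
            raise ValueError("preferredWeight is required when enforcement is 'preferred'")
        if not 0 <= weight <= 1:
            raise ValueError("preferredWeight must be within [0, 1]")
    # With "required", preferredWeight is simply unused.
    return enforcement
```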

ModelReference

ModelReference identifies a model served by this component

Appears in:

Field	Description	Default	Validation
`name` string	Name is the base model identifier (e.g., “llama-3-70b-instruct-v1”)		Required: {}
`revision` string	Revision is the model revision/version (optional)		Optional: {}

ModelSource

ModelSource defines the source location of a model

Appears in:

DynamoModelSpec

Field	Description	Default	Validation
`uri` string	URI is the model source URI. Supported formats: S3 (s3://bucket/path/to/model) and HuggingFace (hf://org/model@revision_sha).		Required: {}
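A sketch that classifies a model URI against the two documented formats using the standard library's `urllib.parse`. Requiring a revision after `@` for HuggingFace URIs follows the documented format literally; the operator may be more or less strict.

```python
from urllib.parse import urlparse


def classify_model_uri(uri: str) -> str:
    """Return "s3" or "huggingface" for a supported URI, else raise ValueError."""
    parsed = urlparse(uri)
    path = parsed.path.lstrip("/")
    if parsed.scheme == "s3":
        # s3://bucket/path/to/model: bucket in netloc, object path after it.
        if parsed.netloc and path:
            return "s3"
    elif parsed.scheme == "hf":
        # hf://org/model@revision_sha: org in netloc, "model@revision" as path.
        model, sep, revision = path.partition("@")
        if parsed.netloc and model and sep and revision and "/" not in model:
            return "huggingface"
    raise ValueError(f"unsupported model URI: {uri!r}")
```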

MultinodeSpec

Appears in:

Field	Description	Default	Validation
`nodeCount` integer	NodeCount is the number of nodes to deploy for multinode components. The total number of GPUs is nodeCount * GPU limit. Must be greater than 1.	2	Minimum: 2
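The GPU arithmetic described for MultinodeSpec and ResourceItem, written out as a sketch. It applies the formula exactly as stated (nodeCount times the GPU limit, with nodeCount defaulting to 2) and the nvidia.com/gpu default for gpuType; the service dicts in the usage are hypothetical.

```python
from typing import Any, Dict


def total_gpus(service: Dict[str, Any]) -> int:
    """nodeCount * resources.limits.gpu for multinode services, else the GPU limit."""
    gpu_limit = int(service.get("resources", {}).get("limits", {}).get("gpu", "0"))
    multinode = service.get("multinode")
    if multinode is None:
        return gpu_limit
    node_count = multinode.get("nodeCount", 2)  # documented default
    if node_count < 2:
        raise ValueError("multinode.nodeCount must be >= 2")
    return node_count * gpu_limit


def gpu_resource_name(service: Dict[str, Any]) -> str:
    """Resource name used for GPUs; defaults to nvidia.com/gpu."""
    return service.get("resources", {}).get("limits", {}).get("gpuType") or "nvidia.com/gpu"
```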

PVC

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Validation
`create` boolean	Create indicates to create a new PVC
`name` string	Name is the name of the PVC	Required: {}
`storageClass` string	StorageClass to be used for PVC creation. Required when create is true.
`size` Quantity	Size of the volume in Gi, used during PVC creation. Required when create is true.
`volumeAccessMode` PersistentVolumeAccessMode	VolumeAccessMode is the volume access mode of the PVC. Required when create is true.
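The conditional requirements for PVC can be checked as below. This sketch treats an empty string as missing; the PVC name, storage class, size, and access mode in the usage are hypothetical values.

```python
from typing import Any, Dict

REQUIRED_WHEN_CREATING = ("storageClass", "size", "volumeAccessMode")


def validate_pvc(pvc: Dict[str, Any]) -> None:
    """Raise ValueError if a PVC entry misses a required field."""
    if not pvc.get("name"):
        raise ValueError("name is required")
    if pvc.get("create"):
        missing = [f for f in REQUIRED_WHEN_CREATING if not pvc.get(f)]
        if missing:
            raise ValueError("create=true requires: " + ", ".join(missing))
```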

ProfilingConfigSpec

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Validation
`config` JSON	Config is the profiling configuration as arbitrary JSON/YAML. This will be passed directly to the profiler. The profiler will validate the configuration and report any errors.	Optional: {} Type: object
`configMapRef` ConfigMapKeySelector	ConfigMapRef is an optional reference to a ConfigMap containing the DynamoGraphDeployment base config file (disagg.yaml). This is separate from the profiling config above. The path to this config will be set as engine.config in the profiling config.	Optional: {}
`profilerImage` string	ProfilerImage specifies the container image to use for profiling jobs. This image contains the profiler code and dependencies needed for SLA-based profiling. Example: “nvcr.io/nvidia/ai-dynamo/vllm-runtime:1.1.1”	Required: {}
`outputPVC` string	OutputPVC is an optional PersistentVolumeClaim name for storing profiling output. If specified, all profiling artifacts (logs, plots, configs, raw data) will be written to this PVC instead of an ephemeral emptyDir volume. This allows users to access complete profiling results after the job completes by mounting the PVC. The PVC must exist in the same namespace as the DGDR. If not specified, profiling uses emptyDir and only essential data is saved to ConfigMaps. Note: ConfigMaps are still created regardless of this setting for planner integration.	Optional: {}
`resources` ResourceRequirements	Resources specifies the compute resource requirements for the profiling job container. If not specified, no resource requests or limits are set.	Optional: {}
`tolerations` Toleration array	Tolerations allows the profiling job to be scheduled on nodes with matching taints. For example, to schedule on GPU nodes, add a toleration for the nvidia.com/gpu taint.	Optional: {}
`nodeSelector` object (keys:string, values:string)	NodeSelector is a selector which must match a node’s labels for the profiling pod to be scheduled on that node. For example, to schedule on ARM64 nodes, use {“kubernetes.io/arch”: “arm64”}.	Optional: {}

ResourceItem

Appears in:

Resources

Field	Description	Default	Validation
`cpu` string	CPU specifies the CPU resource request/limit (e.g., “1000m”, “2”)
`memory` string	Memory specifies the memory resource request/limit (e.g., “4Gi”, “8Gi”)
`gpu` string	GPU indicates the number of GPUs to request. For multinode deployments, the total number of GPUs is multinode.nodeCount * gpu.
`gpuType` string	GPUType can specify a custom GPU type, e.g. “gpu.intel.com/xe”. If not specified, the GPU type defaults to “nvidia.com/gpu”.
`custom` object (keys:string, values:string)	Custom specifies additional custom resource requests/limits

Resources

Resources defines the resource requests and limits for a component, including CPU, memory, GPUs/devices, and any runtime-specific resources.

Appears in:

Field	Description	Default	Validation
`requests` ResourceItem	Requests specifies the minimum resources required by the component
`limits` ResourceItem	Limits specifies the maximum resources allowed for the component
`claims` ResourceClaim array	Claims specifies resource claims for dynamic resource allocation

Restart

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`id` string	ID is an arbitrary string that triggers a restart when changed. Any modification to this value will initiate a restart of the graph deployment according to the strategy.		MinLength: 1 Required: {}
`strategy` RestartStrategy	Strategy specifies the restart strategy for the graph deployment.		Optional: {}
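Because `id` is an opaque trigger, one workable pattern is to set it to a fresh value, such as a UTC timestamp, whenever a restart is wanted, then compare it with `observedID` in the status to see whether the controller has picked it up. The sketch below assumes the Restart struct lives at `spec.restart` and RestartStatus at `status.restart`; those field names and the timestamp format are assumptions.

```python
from datetime import datetime, timezone


def restart_patch(strategy_type: str = "Sequential") -> dict:
    """Build a merge patch that requests a restart by setting a new Restart.ID.

    Assumption: the Restart struct sits under spec.restart in the DGD.
    Second resolution is used, so two requests in the same second would collide.
    """
    new_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return {"spec": {"restart": {"id": new_id, "strategy": {"type": strategy_type}}}}


def restart_acknowledged(dgd: dict) -> bool:
    """True once status.restart.observedID matches the current spec.restart.id."""
    spec_id = dgd.get("spec", {}).get("restart", {}).get("id")
    observed = dgd.get("status", {}).get("restart", {}).get("observedID")
    return spec_id is not None and spec_id == observed


patch = restart_patch()
dgd = {"spec": patch["spec"], "status": {"restart": {"observedID": "older-id"}}}
print(restart_acknowledged(dgd))  # False until the controller catches up
```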

RestartPhase

Underlying type: string

Appears in:

RestartStatus

Field	Description
`Pending`
`Restarting`
`Completed`
`Failed`
`Superseded`

RestartStatus

RestartStatus contains the status of the restart of the graph deployment.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`observedID` string	ObservedID is the restart ID that has been observed and is being processed. Matches the Restart.ID field in the spec.
`phase` RestartPhase	Phase is the phase of the restart.
`inProgress` string array	InProgress contains the names of the services that are currently being restarted.	Optional: {}

RestartStrategy

Appears in:

Restart

Field	Description	Default	Validation
`type` RestartStrategyType	Type specifies the restart strategy type.	Sequential	Enum: [Sequential Parallel]
`order` string array	Order specifies the order in which the services should be restarted.		Optional: {}
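One way to read the two strategy types is as a batching plan: `Sequential` restarts one service at a time and `Parallel` restarts them together. The sketch below turns a strategy into such a plan. This reference does not say how services missing from `order` are handled, or whether `Parallel` uses `order` at all; the sketch assumes unlisted services follow the listed ones in alphabetical order and that `Parallel` produces a single batch.

```python
def restart_plan(services: list, strategy: dict) -> list:
    """Return a list of batches; each batch is restarted together (hypothetical model)."""
    order = strategy.get("order") or []
    # Keep listed services in the given order, dropping unknown names and duplicates.
    listed = list(dict.fromkeys(s for s in order if s in services))
    # Assumption: services not named in `order` follow alphabetically.
    rest = sorted(s for s in services if s not in listed)
    ordered = listed + rest
    if strategy.get("type", "Sequential") == "Parallel":
        return [ordered] if ordered else []
    return [[s] for s in ordered]


services = ["decode", "frontend", "prefill"]
print(restart_plan(services, {"type": "Sequential", "order": ["frontend", "prefill"]}))
# [['frontend'], ['prefill'], ['decode']]
print(restart_plan(services, {"type": "Parallel"}))
# [['decode', 'frontend', 'prefill']]
```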

RestartStrategyType

Underlying type: string

Appears in:

RestartStrategy

Field	Description
`Sequential`
`Parallel`

RollingUpdatePhase

Underlying type: string

RollingUpdatePhase represents the current phase of a rolling update.

Validation:

Enum: [Pending InProgress Completed Failed ]

Appears in:

RollingUpdateStatus

Field	Description
`Pending`
`InProgress`
`Completed`
`""`

RollingUpdateStatus

RollingUpdateStatus tracks the progress of a rolling update.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`phase` RollingUpdatePhase	Phase indicates the current phase of the rolling update.	Enum: [Pending InProgress Completed Failed ] Optional: {}
`startTime` Time	StartTime is when the rolling update began.	Optional: {}
`endTime` Time	EndTime is when the rolling update completed (successfully or failed).	Optional: {}
`updatedServices` string array	UpdatedServices is the list of services that have completed the rolling update. A service is considered updated when its new replicas are all ready and old replicas are fully scaled down. Only services of componentType Worker (or Prefill/Decode) are considered.	Optional: {}
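The completion rule behind `updatedServices` can be expressed as a small predicate. The sketch below assumes the per-service counts for the new and old revisions are already known (in practice they come from the underlying workloads); the input shape with `newDesired`, `newReady`, and `oldReplicas` keys is hypothetical, and component types are compared case-insensitively.

```python
# Component types whose services participate in rolling-update tracking.
TRACKED_TYPES = {"worker", "prefill", "decode"}


def service_updated(new_desired: int, new_ready: int, old_replicas: int) -> bool:
    """A service is updated once all new replicas are ready and old ones are gone."""
    return new_ready >= new_desired and old_replicas == 0


def updated_services(services: dict) -> list:
    """Names of tracked services that have finished rolling (hypothetical input shape)."""
    return sorted(
        name
        for name, s in services.items()
        if s["type"].lower() in TRACKED_TYPES
        and service_updated(s["newDesired"], s["newReady"], s["oldReplicas"])
    )


services = {
    "frontend": {"type": "frontend", "newDesired": 1, "newReady": 1, "oldReplicas": 0},
    "decode": {"type": "decode", "newDesired": 2, "newReady": 2, "oldReplicas": 0},
    "prefill": {"type": "prefill", "newDesired": 2, "newReady": 2, "oldReplicas": 1},
}
print(updated_services(services))  # ['decode']
```

`frontend` is excluded because only worker-type services are tracked, and `prefill` still has an old replica running.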

ScalingAdapter

Appears in:

Field	Description	Default	Validation
`enabled` boolean	Enabled indicates whether the ScalingAdapter should be enabled for this service. When true, a DGDSA is created and owns the replicas field. When false (default), no DGDSA is created and replicas can be modified directly in the DGD.	false	Optional: {}

ServiceCheckpointConfig

ServiceCheckpointConfig configures checkpointing for a DGD service

Appears in:

Field	Description	Default	Validation
`enabled` boolean	Enabled indicates whether checkpointing is enabled for this service	false	Optional: {}
`mode` CheckpointMode	Mode defines how checkpoint creation is handled. `Auto`: the DGD controller creates the Checkpoint CR automatically. `Manual`: the user must create the Checkpoint CR.	Auto	Enum: [Auto Manual] Optional: {}
`checkpointRef` string	CheckpointRef references an existing DynamoCheckpoint CR by metadata.name. If specified, this service’s Identity is ignored and the referenced checkpoint is used directly.		Optional: {}
`identity` DynamoCheckpointIdentity	Identity defines the checkpoint identity used for hash computation. It is used when Mode is Auto or when looking up existing checkpoints, and is required when checkpointRef is not specified.		Optional: {}
`targetContainerName` string	TargetContainerName is the workload container to snapshot and restore.	main	MaxLength: 63 MinLength: 1 Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`job` ServiceCheckpointJobConfig	Job customizes the checkpoint Job that is created in Auto mode.		Optional: {}
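The relationship between `checkpointRef`, `identity`, and `targetContainerName` can be summarized as a validation pass. The sketch below checks only the documented rules: the defaults (`mode: Auto`, `targetContainerName: main`), the requirement for `identity` when `checkpointRef` is absent, and the container-name pattern and length. The function name and messages are hypothetical, and skipping all checks when `enabled` is false is an assumption; the real webhook may check more.

```python
import re

# Pattern and length limits documented for targetContainerName.
CONTAINER_NAME = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def checkpoint_errors(cfg: dict) -> list:
    """Return problems found in a ServiceCheckpointConfig (hypothetical helper)."""
    if not cfg.get("enabled", False):
        return []  # assumption: nothing to validate when checkpointing is off
    errors = []
    mode = cfg.get("mode", "Auto")  # documented default
    if mode not in ("Auto", "Manual"):
        errors.append(f"mode: unsupported value {mode!r}")
    # identity is required unless checkpointRef names an existing DynamoCheckpoint.
    if not cfg.get("checkpointRef") and not cfg.get("identity"):
        errors.append("identity: required when checkpointRef is not specified")
    name = cfg.get("targetContainerName", "main")  # documented default
    if not (1 <= len(name) <= 63 and CONTAINER_NAME.fullmatch(name)):
        errors.append(f"targetContainerName: invalid name {name!r}")
    return errors


# "llama-70b-ckpt" is a hypothetical DynamoCheckpoint name.
print(checkpoint_errors({"enabled": True, "checkpointRef": "llama-70b-ckpt"}))  # []
print(checkpoint_errors({"enabled": True, "targetContainerName": "Main"}))
```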

ServiceCheckpointJobConfig

ServiceCheckpointJobConfig customizes the checkpoint Job created for a DGD service.

Appears in:

ServiceCheckpointConfig

Field	Description	Default	Validation
`gmsClientContainers` string array	GMSClientContainers lists checkpoint Job containers that should receive GMS client wiring. Requires gpuMemoryService on the service.		items:MaxLength: 63 items:MinLength: 1 items:Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`podTemplate` PodTemplateSpec	PodTemplate customizes the checkpoint Job pod. The operator starts from the selected workload container and merges this template so users can add helper containers such as gms-saver.		Schemaless: {} Type: object Optional: {}

ServiceCheckpointStatus

ServiceCheckpointStatus contains checkpoint information for a single service.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`checkpointName` string	CheckpointName is the name of the associated Checkpoint CR	Optional: {}
`identityHash` string	IdentityHash is the computed hash of the checkpoint identity	Optional: {}
`ready` boolean	Ready indicates if the checkpoint was visible to the worker at startup	Optional: {}

ServiceReplicaStatus

ServiceReplicaStatus contains replica information for a single service.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`componentKind` ComponentKind	ComponentKind is the underlying resource kind (e.g., “PodClique”, “PodCliqueScalingGroup”, “Deployment”, “LeaderWorkerSet”).	Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]
`componentName` string	ComponentName is the name of the primary underlying resource. DEPRECATED: Use ComponentNames instead. This field will be removed in a future release. During rolling updates, this reflects the new (target) component name.
`componentNames` string array	ComponentNames is the list of underlying resource names for this service. During normal operation, this contains a single name. During rolling updates, this contains both old and new component names.	Optional: {}
`replicas` integer	Replicas is the total number of non-terminated replicas. Required for all component kinds.	Minimum: 0
`updatedReplicas` integer	UpdatedReplicas is the number of replicas at the current/desired revision. Required for all component kinds.	Minimum: 0
`readyReplicas` integer	ReadyReplicas is the number of ready replicas. Populated for PodClique, Deployment, and LeaderWorkerSet. Not available for PodCliqueScalingGroup. When nil, the field is omitted from the API response.	Minimum: 0 Optional: {}
`availableReplicas` integer	AvailableReplicas is the number of available replicas. For Deployment: replicas ready for >= minReadySeconds. For PodCliqueScalingGroup: replicas where all constituent PodCliques have >= MinAvailable ready pods. Not available for PodClique or LeaderWorkerSet. When nil, the field is omitted from the API response.	Minimum: 0 Optional: {}
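Because `readyReplicas` and `availableReplicas` are populated only for some component kinds, a client reading ServiceReplicaStatus should check `componentKind` before relying on either. The mapping below restates the field descriptions; the `is_fully_ready` heuristic (compare whichever optional counter is populated against `replicas`) is an assumption, not an operator-defined readiness rule.

```python
# Optional counters populated per componentKind, as described in the table above.
POPULATED = {
    "PodClique": {"readyReplicas"},
    "PodCliqueScalingGroup": {"availableReplicas"},
    "Deployment": {"readyReplicas", "availableReplicas"},
    "LeaderWorkerSet": {"readyReplicas"},
}


def is_fully_ready(status: dict) -> bool:
    """Heuristic: every non-terminated replica is updated and ready/available."""
    fields = POPULATED[status["componentKind"]]
    counter = "readyReplicas" if "readyReplicas" in fields else "availableReplicas"
    # A nil counter is omitted from the API response, so treat absence as 0.
    ready = status.get(counter, 0)
    return status["updatedReplicas"] == status["replicas"] and ready == status["replicas"]


pcsg = {"componentKind": "PodCliqueScalingGroup", "replicas": 2, "updatedReplicas": 2, "availableReplicas": 2}
lws = {"componentKind": "LeaderWorkerSet", "replicas": 3, "updatedReplicas": 3, "readyReplicas": 2}
print(is_fully_ready(pcsg), is_fully_ready(lws))  # True False
```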

SharedMemorySpec

Appears in:

Field	Description	Default	Validation
`disabled` boolean	Disabled, when true, opts out of mounting a shared-memory medium for the component. When false (or unset), shared memory is enabled and Size is required (enforced by the validating webhook). Size is ignored when Disabled is true.
`size` Quantity
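The interplay between `disabled` and `size` described above amounts to one validation rule. The sketch below models it; the function name and error message are hypothetical, and the webhook's actual error text may differ.

```python
def shm_errors(spec: dict) -> list:
    """Validate a v1alpha1 SharedMemorySpec: size is required unless disabled is true."""
    if spec.get("disabled", False):
        return []  # size is ignored when disabled is true
    if not spec.get("size"):
        return ["size: required when shared memory is enabled"]
    return []


print(shm_errors({"size": "16Gi"}))                   # []
print(shm_errors({}))                                 # ['size: required when shared memory is enabled']
print(shm_errors({"disabled": True, "size": "1Gi"}))  # []
```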

SpecTopologyConstraint

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`topologyProfile` string	TopologyProfile is the name of the ClusterTopology CR that defines the topology hierarchy for this deployment.		MinLength: 1
`packDomain` TopologyDomain	PackDomain is the default topology domain to pack pods within. Optional — omit when only services carry constraints.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$` Optional: {}

TopologyConstraint

TopologyConstraint defines service-level topology placement requirements. The topology profile is inherited from the deployment-level SpecTopologyConstraint; only the pack domain is specified here.

Appears in:

Field	Description	Default	Validation
`packDomain` TopologyDomain	PackDomain is the topology domain to pack pods within. Must match a domain defined in the referenced ClusterTopology CR.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`
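Pack domains are plain names constrained by the TopologyDomain pattern; their meaning comes from the hierarchy in the referenced ClusterTopology CR. The sketch below validates a name against that pattern and checks that a service-level domain is narrower than or equal to the deployment-level one, a rule the v1beta1 API states explicitly for `topologyConstraint`. The hierarchy list (`zone`, `rack`, `host`, broadest first) is a made-up example, not the ClusterTopology schema.

```python
import re

# Documented TopologyDomain pattern (fullmatch makes the ^...$ anchors implicit).
DOMAIN_PATTERN = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")

# Hypothetical hierarchy from a ClusterTopology CR, broadest first.
HIERARCHY = ["zone", "rack", "host"]


def valid_domain(name: str) -> bool:
    """Check a TopologyDomain value against the documented pattern."""
    return DOMAIN_PATTERN.fullmatch(name) is not None


def narrower_or_equal(service_domain: str, spec_domain: str) -> bool:
    """True if service_domain sits at or below spec_domain in HIERARCHY."""
    if service_domain not in HIERARCHY or spec_domain not in HIERARCHY:
        return False  # must match a domain defined in the ClusterTopology
    return HIERARCHY.index(service_domain) >= HIERARCHY.index(spec_domain)


print(valid_domain("rack"), valid_domain("Rack"), valid_domain("rack-"))  # True False False
print(narrower_or_equal("host", "rack"), narrower_or_equal("zone", "rack"))  # True False
```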

TopologyDomain

Underlying type: string

Validation:

Pattern: ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$

Appears in:

VolumeMount

VolumeMount references a PVC, defined in the top-level PVCs map, that the component mounts as a volume.

Appears in:

Field	Description	Default	Validation
`name` string	Name references a PVC name defined in the top-level PVCs map		Required: {}
`mountPoint` string	MountPoint specifies where to mount the volume. If useAsCompilationCache is true and mountPoint is not specified, a backend-specific default will be used.
`useAsCompilationCache` boolean	UseAsCompilationCache indicates this volume should be used as a compilation cache. When true, backend-specific environment variables will be set and default mount points may be used.	false

nvidia.com/v1beta1

Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.

Resource Types

BackendType

Underlying type: string

BackendType specifies the inference backend.

Validation:

Enum: [auto sglang trtllm vllm]

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description
`auto`
`sglang`
`trtllm`
`vllm`

CheckpointMode

Underlying type: string

CheckpointMode defines how checkpoint creation is handled.

Validation:

Enum: [Auto Manual]

Appears in:

ComponentCheckpointConfig

Field	Description
`Auto`	CheckpointModeAuto means the DGD controller creates the DynamoCheckpoint CR automatically.
`Manual`	CheckpointModeManual means the user creates the DynamoCheckpoint CR themselves.

CompilationCacheConfig

Appears in:

Field	Description	Default	Validation
`pvcName` string	pvcName references a user-created PVC by name. The PVC must exist in the same namespace as the DynamoGraphDeployment.		MinLength: 1 Required: {}
`mountPath` string	mountPath overrides the backend-specific default mount path. When empty, the operator selects a default appropriate for the backend framework.		Optional: {}

ComponentCheckpointConfig

ComponentCheckpointConfig configures checkpointing for a DGD component.

Appears in:

ExperimentalSpec

Field	Description	Default	Validation
`mode` CheckpointMode	mode defines how checkpoint creation is handled. `Auto`: DGD controller creates the DynamoCheckpoint CR automatically. `Manual`: user must create the DynamoCheckpoint CR.	Auto	Enum: [Auto Manual] Optional: {}
`checkpointRef` string	checkpointRef references an existing DynamoCheckpoint CR by `metadata.name`. When set, this component’s `identity` is ignored and the referenced checkpoint is used directly.		Optional: {}
`identity` DynamoCheckpointIdentity	identity defines the checkpoint identity for hash computation. Used when `mode` is `Auto` or when looking up existing checkpoints. Required when `checkpointRef` is not specified.		Optional: {}
`targetContainerName` string	targetContainerName is the workload container to snapshot and restore.	main	MaxLength: 63 MinLength: 1 Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`job` ComponentCheckpointJobConfig	job customizes the checkpoint Job that is created in Auto mode.		Optional: {}

ComponentCheckpointJobConfig

ComponentCheckpointJobConfig customizes the checkpoint Job created for a DGD component.

Appears in:

ComponentCheckpointConfig

Field	Description	Default	Validation
`gmsClientContainers` string array	gmsClientContainers lists checkpoint Job containers that should receive GMS client wiring. Requires gpuMemoryService on the component.		items:MaxLength: 63 items:MinLength: 1 items:Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`podTemplate` PodTemplateSpec	podTemplate customizes the checkpoint Job pod. The operator starts from the selected workload container and merges this template so users can add helper containers such as gms-saver.		Schemaless: {} Type: object Optional: {}

ComponentCheckpointStatus

ComponentCheckpointStatus contains checkpoint information for a single component.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`checkpointName` string	checkpointName is the name of the associated DynamoCheckpoint CR.	Optional: {}
`identityHash` string	identityHash is the computed hash of the checkpoint identity.	Optional: {}
`ready` boolean	ready indicates if the checkpoint was visible to the worker at startup.	Optional: {}

ComponentKind

Underlying type: string

ComponentKind represents the type of underlying Kubernetes resource backing a DGD component.

Validation:

Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]

Appears in:

ComponentReplicaStatus

Field	Description
`PodClique`
`PodCliqueScalingGroup`
`Deployment`
`LeaderWorkerSet`

ComponentReplicaStatus

ComponentReplicaStatus contains replica information for a single component.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`componentKind` ComponentKind	componentKind is the underlying resource kind (e.g. `PodClique`, `Deployment`, `LeaderWorkerSet`).	Enum: [PodClique PodCliqueScalingGroup Deployment LeaderWorkerSet]
`componentNames` string array	componentNames is the list of underlying Kubernetes resource names for this Dynamo component. During normal operation this contains a single name; during rolling updates it contains both old and new resource names.	Optional: {}
`replicas` integer	replicas is the total number of non-terminated replicas.	Minimum: 0
`updatedReplicas` integer	updatedReplicas is the number of replicas at the current/desired revision.	Minimum: 0
`readyReplicas` integer	readyReplicas is the number of ready replicas. Populated for `PodClique`, `Deployment`, and `LeaderWorkerSet`; not available for `PodCliqueScalingGroup`.	Minimum: 0 Optional: {}
`availableReplicas` integer	availableReplicas is the number of available replicas. Populated for `Deployment` and `PodCliqueScalingGroup`; not available for `PodClique` or `LeaderWorkerSet`.	Minimum: 0 Optional: {}

ComponentType

Underlying type: string

Validation:

Enum: [frontend worker prefill decode planner epp]

Appears in:

Field	Description
`frontend`
`worker`
`prefill`
`decode`
`planner`
`epp`

DGDRPhase

Underlying type: string

DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.

Validation:

Enum: [Pending Profiling Ready Deploying Deployed Failed]

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description
`Pending`
`Profiling`
`Ready`
`Deploying`
`Deployed`
`Failed`

DGDState

Underlying type: string

DGDState is the high-level lifecycle state of a DynamoGraphDeployment.

Validation:

Enum: [initializing pending successful failed]

Appears in:

DynamoGraphDeploymentStatus

Field	Description
`initializing`
`pending`
`successful`
`failed`

DeploymentInfoStatus

DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description	Default	Validation
`replicas` integer	Replicas is the desired number of replicas.		Optional: {}
`availableReplicas` integer	AvailableReplicas is the number of replicas that are available and ready.		Optional: {}

DynamoCheckpointIdentity

Appears in:

ComponentCheckpointConfig

Field	Description	Default	Validation
`model` string	model is the model identifier (e.g. “meta-llama/Llama-3-70B”).		MinLength: 1 Required: {}
`backendFramework` string	backendFramework is the runtime framework (`vllm`, `sglang`, `trtllm`).		Enum: [vllm sglang trtllm] Required: {}
`dynamoVersion` string	dynamoVersion is the Dynamo platform version. If empty, the version is not included in the identity hash, so checkpoints remain compatible across releases.		Optional: {}
`tensorParallelSize` integer	tensorParallelSize is the tensor parallel configuration.	1	Minimum: 1 Optional: {}
`pipelineParallelSize` integer	pipelineParallelSize is the pipeline parallel configuration.	1	Minimum: 1 Optional: {}
`dtype` string	dtype is the data type (`fp16`, `bf16`, `fp8`, etc.).		Optional: {}
`maxModelLen` integer	maxModelLen is the maximum sequence length.		Minimum: 1 Optional: {}
`extraParameters` object (keys:string, values:string)	extraParameters are additional parameters that affect the checkpoint hash.		Optional: {}
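To make the hashing rules concrete, the sketch below computes an identity hash from the documented fields, applying the documented defaults (`tensorParallelSize` and `pipelineParallelSize` of 1) and leaving `dynamoVersion` out when it is empty. The canonical-JSON encoding and SHA-256 are assumptions chosen for illustration; the operator's actual algorithm is not described here, so these values will not match `identityHash` in status.

```python
import hashlib
import json

IDENTITY_DEFAULTS = {"tensorParallelSize": 1, "pipelineParallelSize": 1}


def identity_hash(identity: dict) -> str:
    """Illustrative hash of a DynamoCheckpointIdentity; NOT the operator's algorithm."""
    fields = {**IDENTITY_DEFAULTS, **identity}
    # Drop empty values: an empty dynamoVersion is thereby excluded from the hash,
    # and an unset field hashes the same as an explicitly empty one.
    fields = {k: v for k, v in fields.items() if v not in (None, "", {})}
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = identity_hash({"model": "meta-llama/Llama-3-70B", "backendFramework": "vllm"})
b = identity_hash({"model": "meta-llama/Llama-3-70B", "backendFramework": "vllm",
                   "tensorParallelSize": 1, "dynamoVersion": ""})
print(a == b)  # True: defaults and an empty dynamoVersion do not change the hash
```

Setting `dynamoVersion` to a concrete release, or changing any parallelism setting, yields a different hash in this model, which is why pinning the version ties a checkpoint to one release.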

DynamoComponentDeployment

DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1beta1`
`kind` string	`DynamoComponentDeployment`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoComponentDeploymentSpec	spec defines the desired state for this Dynamo component deployment.

DynamoComponentDeploymentSharedSpec

DynamoComponentDeploymentSharedSpec is the shared configuration used by both standalone DCDs and by the components embedded in a DynamoGraphDeployment.

Appears in:

Field	Description	Validation
`name` string	name is the stable logical identifier for this component within its DynamoGraphDeployment. It must be unique within the parent’s `spec.components` list. For standalone DynamoComponentDeployment objects, the defaulting webhook populates `name` from `metadata.name` on admission, so users typically do not need to set it explicitly. `name` is decoupled from the underlying Kubernetes resource name so that the operator can rename child workloads (e.g. suffixing worker DCDs with a hash during rolling updates) without losing the stable identity that downstream consumers (labels, status maps, DGDSA references, planner RBAC, EPP filters) depend on.	MaxLength: 63 MinLength: 1 Pattern: `^[A-Za-z0-9]([-A-Za-z0-9]*[A-Za-z0-9])?$` Required: {}
`type` ComponentType	type indicates the role of this component within a Dynamo graph. Drives port mapping, frontend detection, planner RBAC, and the pod label `nvidia.com/dynamo-component-type`. Because `prefill` and `decode` are first-class values, users can set them directly.	Enum: [frontend worker prefill decode planner epp] Optional: {}
`globalDynamoNamespace` boolean	globalDynamoNamespace places the component in the global Dynamo namespace rather than the per-deployment namespace derived from the DGD name.	Optional: {}
`podTemplate` PodTemplateSpec	podTemplate is the pod template used to create the component’s pods. The operator injects its defaults (image, command, env, ports, probes, resources, volume mounts) into the container named `"main"` inside `podTemplate.spec.containers`, merging user overrides by name. If no container named `"main"` is present, the operator auto-generates it with standard defaults. All other containers in `podTemplate.spec.containers` are treated as user-managed sidecars: the operator does not inject defaults into them, so sidecars must specify required fields (e.g. `image`) themselves. The validation webhook rejects pod templates where a non-`"main"` container is missing a required field such as `image`.	Optional: {}
`replicas` integer	replicas is the desired number of Pods for this component. When `scalingAdapter` is set on this component, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly.	Minimum: 0 Optional: {}
`multinode` MultinodeSpec	multinode configures multinode components.	Optional: {}
`sharedMemorySize` Quantity	sharedMemorySize controls the size of the tmpfs mounted at `/dev/shm`. `nil` selects the operator default (8Gi), a positive quantity sets a custom size, and `"0"` disables the shared-memory volume entirely. Simpler replacement for v1alpha1’s `SharedMemorySpec` struct with its `disabled bool` + `size Quantity` pattern.	Optional: {}
`modelRef` ModelReference	modelRef references a model served by this component. When specified, a headless service is created for endpoint discovery.	Optional: {}
`scalingAdapter` ScalingAdapter	scalingAdapter opts this component into using the DynamoGraphDeploymentScalingAdapter. When set (even as an empty object, `scalingAdapter: {}`), a DGDSA is created and owns the `replicas` field so that external autoscalers (HPA/KEDA/Planner) can drive scaling via the Scale subresource. Omit the field to opt out.	Optional: {}
`eppConfig` EPPConfig	eppConfig holds EPP-specific configuration for Endpoint Picker Plugin components. Only meaningful when `type` is `epp`.	Optional: {}
`frontendSidecar` string	frontendSidecar optionally designates a container in `podTemplate.spec.containers` as the frontend sidecar. The value must match the `name` of a container in that list; the operator merges its frontend-sidecar defaults (auto-generated Dynamo env vars, ports, health probes) into that container the same way it merges into `"main"`. The full container definition (image, args, envFrom, env) lives in `podTemplate` — this eliminates the redundant `image`, `args`, `envFromSecret`, and `envs` fields from v1alpha1’s `FrontendSidecarSpec`. The validation webhook rejects values that do not match any container name in `podTemplate.spec.containers`.	Optional: {}
`compilationCache` CompilationCacheConfig	compilationCache configures a PVC-backed compilation cache. The operator handles backend-specific mount paths and environment variables, so users do not need to hand-wire them into `podTemplate`. Extracted from v1alpha1’s `volumeMount.useAsCompilationCache` flag.	Optional: {}
`topologyConstraint` TopologyConstraint	topologyConstraint applies to this component. `topologyConstraint.packDomain` is required. When both this and `spec.topologyConstraint.packDomain` are set, this field’s `packDomain` must be narrower than or equal to the spec-level value.	Optional: {}
`experimental` ExperimentalSpec	experimental groups opt-in preview features whose API shape and behavior may change in breaking ways between v1beta1 releases, including disappearing without a name-preserving graduation path. In v1beta1 this block holds `gpuMemoryService` and `failover` (which remain tightly coupled — failover requires GMS — and are expected to evolve together as the DRA-based GPU sharing story matures), and `checkpoint` (whose interaction with the standalone DynamoCheckpoint resource and identity-hash computation is still settling). Fields here are explicitly NOT covered by the normal v1beta1 deprecation policy; do not depend on them for production workloads.	Optional: {}
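The three cases for `sharedMemorySize` can be written as a small resolver. The sketch below returns the tmpfs size mounted at `/dev/shm`, or `None` when the volume is disabled. The 8Gi default comes from the field description; the helper itself is hypothetical, and it only inspects the numeric prefix of the string instead of fully parsing a Kubernetes `Quantity`.

```python
import re

DEFAULT_SHM_SIZE = "8Gi"  # operator default when sharedMemorySize is nil


def effective_shm_size(component: dict):
    """Return the /dev/shm tmpfs size, or None if the volume is disabled."""
    size = component.get("sharedMemorySize")
    if size is None:
        return DEFAULT_SHM_SIZE
    # Simplification: look only at the numeric part of the Quantity string.
    number = re.match(r"\d+(\.\d+)?", str(size))
    if number and float(number.group()) == 0:
        return None
    return str(size)


print(effective_shm_size({}))                            # 8Gi
print(effective_shm_size({"sharedMemorySize": "16Gi"}))  # 16Gi
print(effective_shm_size({"sharedMemorySize": "0"}))     # None
```

Under this model, a v1alpha1 SharedMemorySpec with `disabled: true` corresponds to `sharedMemorySize: "0"`, and one with `size: 16Gi` corresponds to `sharedMemorySize: 16Gi`.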

DynamoComponentDeploymentSpec

DynamoComponentDeploymentSpec defines the desired state of a DynamoComponentDeployment.

Appears in:

DynamoComponentDeployment

Field	Description	Validation
`backendFramework` string	backendFramework specifies the backend framework.	Enum: [sglang vllm trtllm]
`name` string	name is the stable logical identifier for this component within its DynamoGraphDeployment. It must be unique within the parent’s `spec.components` list. For standalone DynamoComponentDeployment objects, the defaulting webhook populates `name` from `metadata.name` on admission, so users typically do not need to set it explicitly. `name` is decoupled from the underlying Kubernetes resource name so that the operator can rename child workloads (e.g. suffixing worker DCDs with a hash during rolling updates) without losing the stable identity that downstream consumers (labels, status maps, DGDSA references, planner RBAC, EPP filters) depend on.	MaxLength: 63 MinLength: 1 Pattern: `^[A-Za-z0-9]([-A-Za-z0-9]*[A-Za-z0-9])?$` Required: {}
`type` ComponentType	type indicates the role of this component within a Dynamo graph. Drives port mapping, frontend detection, planner RBAC, and the pod label `nvidia.com/dynamo-component-type`. Because `prefill` and `decode` are first-class values, users can set them directly.	Enum: [frontend worker prefill decode planner epp] Optional: {}
`globalDynamoNamespace` boolean	globalDynamoNamespace places the component in the global Dynamo namespace rather than the per-deployment namespace derived from the DGD name.	Optional: {}
`podTemplate` PodTemplateSpec	podTemplate is the pod template used to create the component’s pods. The operator injects its defaults (image, command, env, ports, probes, resources, volume mounts) into the container named `"main"` inside `podTemplate.spec.containers`, merging user overrides by name. If no container named `"main"` is present, the operator auto-generates it with standard defaults. All other containers in `podTemplate.spec.containers` are treated as user-managed sidecars: the operator does not inject defaults into them, so sidecars must specify required fields (e.g. `image`) themselves. The validation webhook rejects pod templates where a non-`"main"` container is missing a required field such as `image`.	Optional: {}
`replicas` integer	replicas is the desired number of Pods for this component. When `scalingAdapter` is set on this component, this field is managed by the DynamoGraphDeploymentScalingAdapter and should not be modified directly.	Minimum: 0 Optional: {}
`multinode` MultinodeSpec	multinode configures multinode components.	Optional: {}
`sharedMemorySize` Quantity	sharedMemorySize controls the size of the tmpfs mounted at `/dev/shm`. `nil` selects the operator default (8Gi), a positive quantity sets a custom size, and `"0"` disables the shared-memory volume entirely. Simpler replacement for v1alpha1’s `SharedMemorySpec` struct with its `disabled bool` + `size Quantity` pattern.	Optional: {}
`modelRef` ModelReference	modelRef references a model served by this component. When specified, a headless service is created for endpoint discovery.	Optional: {}
`scalingAdapter` ScalingAdapter	scalingAdapter opts this component into using the DynamoGraphDeploymentScalingAdapter. When set (even as an empty object, `scalingAdapter: {}`), a DGDSA is created and owns the `replicas` field so that external autoscalers (HPA/KEDA/Planner) can drive scaling via the Scale subresource. Omit the field to opt out.	Optional: {}
`eppConfig` EPPConfig	eppConfig holds EPP-specific configuration for Endpoint Picker Plugin components. Only meaningful when `type` is `epp`.	Optional: {}
`frontendSidecar` string	frontendSidecar optionally designates a container in `podTemplate.spec.containers` as the frontend sidecar. The value must match the `name` of a container in that list; the operator merges its frontend-sidecar defaults (auto-generated Dynamo env vars, ports, health probes) into that container the same way it merges into `"main"`. The full container definition (image, args, envFrom, env) lives in `podTemplate` — this eliminates the redundant `image`, `args`, `envFromSecret`, and `envs` fields from v1alpha1’s `FrontendSidecarSpec`. The validation webhook rejects values that do not match any container name in `podTemplate.spec.containers`.	Optional: {}
`compilationCache` CompilationCacheConfig	compilationCache configures a PVC-backed compilation cache. The operator handles backend-specific mount paths and environment variables, so users do not need to hand-wire them into `podTemplate`. Extracted from v1alpha1’s `volumeMount.useAsCompilationCache` flag.	Optional: {}
`topologyConstraint` TopologyConstraint	topologyConstraint applies to this component. `topologyConstraint.packDomain` is required. When both this and `spec.topologyConstraint.packDomain` are set, this field’s `packDomain` must be narrower than or equal to the spec-level value.	Optional: {}
`experimental` ExperimentalSpec	experimental groups opt-in preview features whose API shape and behavior may change in breaking ways between v1beta1 releases, including disappearing without a name-preserving graduation path. In v1beta1 this block holds `gpuMemoryService` and `failover` (which remain tightly coupled — failover requires GMS — and are expected to evolve together as the DRA-based GPU sharing story matures), and `checkpoint` (whose interaction with the standalone DynamoCheckpoint resource and identity-hash computation is still settling). Fields here are explicitly NOT covered by the normal v1beta1 deprecation policy; do not depend on them for production workloads.	Optional: {}
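The `podTemplate` and `frontendSidecar` descriptions imply two webhook checks: every container other than `main` must carry its own `image`, and `frontendSidecar` must name an existing container. The sketch below models only those two checks; the function name, messages, and the placeholder image reference are hypothetical. Note that the designated frontend sidecar is itself a non-`main` container, so it also needs an `image`.

```python
def pod_template_errors(component: dict) -> list:
    """Model of two documented webhook checks on a component podTemplate (hypothetical)."""
    containers = component.get("podTemplate", {}).get("spec", {}).get("containers", [])
    names = {c.get("name") for c in containers}
    errors = []
    for c in containers:
        # "main" is filled in by the operator; every other container is user-managed.
        if c.get("name") != "main" and not c.get("image"):
            errors.append(f"container {c.get('name')!r}: image is required")
    sidecar = component.get("frontendSidecar")
    if sidecar is not None and sidecar not in names:
        errors.append(f"frontendSidecar: no container named {sidecar!r}")
    return errors


component = {
    "name": "worker",
    "frontendSidecar": "frontend",
    "podTemplate": {"spec": {"containers": [
        {"name": "main"},  # the operator generates/merges the main container
        {"name": "frontend", "image": "registry.example.com/frontend:tag"},  # placeholder
    ]}},
}
print(pod_template_errors(component))  # []
```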

DynamoGraphDeployment

DynamoGraphDeployment is the Schema for the dynamographdeployments API.

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1beta1`
`kind` string	`DynamoGraphDeployment`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentSpec	spec defines the desired state for this graph deployment.
`status` DynamoGraphDeploymentStatus	status reflects the current observed state of this graph deployment.

DynamoGraphDeploymentComponentRef

Appears in:

DynamoGraphDeploymentScalingAdapterSpec

Field	Description	Default	Validation
`name` string	name is the `metadata.name` of the target DynamoGraphDeployment.		MinLength: 1 Required: {}
`componentName` string	componentName is the `name` of the entry within the target DGD’s `spec.components` list to scale.		MinLength: 1 Required: {}

DynamoGraphDeploymentExperimentalSpec

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`kvTransferPolicy` KvTransferPolicy	kvTransferPolicy configures topology-aware routing for KV-cache transfers between prefill and decode workers.		Optional: {}

v1beta1 DynamoGraphDeploymentRequest

Lifecycle:

Pending: Spec validated, preparing for profiling
Profiling: Profiling job is running to discover optimal configurations
Ready: Profiling complete, generated DGD spec available in status
Deploying: DGD is being created and rolled out (when autoApply=true)
Deployed: DGD is running and healthy
Failed: An unrecoverable error occurred
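Read as a state machine, the lifecycle above looks roughly like the sketch below. The transition table is inferred from the phase descriptions (in particular, that a request stops at `Ready` when `autoApply` is false and that any phase can move to `Failed`); it is not taken from the controller, and treating `Deployed` as terminal is a simplification.

```python
# Forward transitions inferred from the phase descriptions; Failed is reachable from any phase.
NEXT = {
    "Pending": "Profiling",
    "Profiling": "Ready",
    "Ready": "Deploying",
    "Deploying": "Deployed",
}


def next_phase(phase: str, auto_apply: bool = True, error: bool = False) -> str:
    """Return the phase that follows `phase` in this simplified model."""
    if error:
        return "Failed"
    if phase == "Ready" and not auto_apply:
        return "Ready"  # the generated DGD spec waits in status for manual review
    return NEXT.get(phase, phase)  # Deployed and Failed are treated as terminal


def walk(auto_apply: bool) -> list:
    """Follow transitions from Pending until the phase stops changing."""
    phases = ["Pending"]
    while True:
        nxt = next_phase(phases[-1], auto_apply)
        if nxt == phases[-1]:
            return phases
        phases.append(nxt)


print(walk(True))   # ['Pending', 'Profiling', 'Ready', 'Deploying', 'Deployed']
print(walk(False))  # ['Pending', 'Profiling', 'Ready']
```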

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1beta1`
`kind` string	`DynamoGraphDeploymentRequest`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentRequestSpec	Spec defines the desired state for this deployment request.
`status` DynamoGraphDeploymentRequestStatus	Status reflects the current observed state of this deployment request.

v1beta1 DynamoGraphDeploymentRequestSpec

DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. Only the Model field is required; all other fields are optional and have sensible defaults.

Appears in:

DynamoGraphDeploymentRequest

Field	Description	Default	Validation
`model` string	Model specifies the model to deploy (e.g., “Qwen/Qwen3-0.6B”, “meta-llama/Llama-3-70b”). Can be a HuggingFace ID or a private model name.		MinLength: 1 Required: {}
`backend` BackendType	Backend specifies the inference backend to use for profiling and deployment.	auto	Enum: [auto sglang trtllm vllm] Optional: {}
`image` string	Image is the container image reference for the profiling job (planner image). Example: “nvcr.io/nvidia/ai-dynamo/dynamo-planner:1.1.1”. For Dynamo < 1.1.0, use dynamo-frontend.		Optional: {}
`modelCache` ModelCacheSpec	ModelCache provides optional PVC configuration for pre-downloaded model weights. When provided, weights are loaded from the PVC instead of downloading from HuggingFace.		Optional: {}
`hardware` HardwareSpec	Hardware describes the hardware resources available for profiling and deployment. Typically auto-filled by the operator from cluster discovery.		Optional: {}
`workload` WorkloadSpec	Workload defines the expected workload characteristics for SLA-based profiling.		Optional: {}
`sla` SLASpec	SLA defines service-level agreement targets that drive profiling optimization.		Optional: {}
`overrides` OverridesSpec	Overrides allows customizing the profiling job and the generated DynamoGraphDeployment.		Optional: {}
`features` FeaturesSpec	Features controls optional Dynamo platform features in the generated deployment.		Optional: {}
`searchStrategy` SearchStrategy	SearchStrategy controls the profiling search depth. “rapid” performs a fast sweep; “thorough” explores more configurations.	rapid	Enum: [rapid thorough] Optional: {}
`autoApply` boolean	AutoApply indicates whether to automatically create a DynamoGraphDeployment after profiling completes. If false, the generated spec is stored in status for manual review and application.	true	Optional: {}
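Since only `model` is required, a minimal request is small. The sketch below shows one as a Python dict and fills in the documented defaults (`backend: auto`, `searchStrategy: rapid`, `autoApply: true`) along with the enum checks. The helper is illustrative only: in a cluster, defaulting and validation happen server-side. The model ID is one of the examples from the field description; the metadata name is hypothetical.

```python
SPEC_DEFAULTS = {"backend": "auto", "searchStrategy": "rapid", "autoApply": True}
ENUMS = {"backend": {"auto", "sglang", "trtllm", "vllm"}, "searchStrategy": {"rapid", "thorough"}}


def with_defaults(spec: dict) -> dict:
    """Apply documented defaults and check the required/enum constraints (illustrative)."""
    if not spec.get("model"):
        raise ValueError("spec.model is required and must be non-empty")
    merged = {**SPEC_DEFAULTS, **spec}
    for field, allowed in ENUMS.items():
        if merged[field] not in allowed:
            raise ValueError(f"spec.{field}: {merged[field]!r} not in {sorted(allowed)}")
    return merged


request = {
    "apiVersion": "nvidia.com/v1beta1",
    "kind": "DynamoGraphDeploymentRequest",
    "metadata": {"name": "qwen3-small"},  # hypothetical name
    "spec": with_defaults({"model": "Qwen/Qwen3-0.6B"}),
}
print(request["spec"])
# {'backend': 'auto', 'searchStrategy': 'rapid', 'autoApply': True, 'model': 'Qwen/Qwen3-0.6B'}
```

Setting `autoApply: false` in the same spec would leave the generated DGD in status for review instead of deploying it.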

v1beta1 DynamoGraphDeploymentRequestStatus

DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.

Appears in:

DynamoGraphDeploymentRequest

Field	Description	Validation
`phase` DGDRPhase	Phase is the high-level lifecycle phase of the deployment request.	Enum: [Pending Profiling Ready Deploying Deployed Failed] Optional: {}
`profilingPhase` ProfilingPhase	ProfilingPhase indicates the current sub-phase of the profiling pipeline. Only meaningful when Phase is “Profiling”. Cleared when profiling completes or fails.	Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done] Optional: {}
`dgdName` string	DGDName is the name of the generated or created DynamoGraphDeployment.	Optional: {}
`profilingJobName` string	ProfilingJobName is the name of the Kubernetes Job running the profiler.	Optional: {}
`conditions` Condition array	Conditions contains the latest observed conditions of the deployment request. Standard condition types include: Succeeded, Validation, Profiling, SpecGenerated, DeploymentReady.	Optional: {}
`profilingResults` ProfilingResultsStatus	ProfilingResults contains the output of the profiling process including Pareto-optimal configurations and the selected deployment configuration.	Optional: {}
`deploymentInfo` DeploymentInfoStatus	DeploymentInfo tracks the state of the deployed DynamoGraphDeployment. Populated when a DGD has been created (either via autoApply or manually).	Optional: {}
`observedGeneration` integer	ObservedGeneration is the most recent generation observed by the controller.	Optional: {}

DynamoGraphDeploymentScalingAdapter

v1alpha1 remains the storage version; conversion between served versions is handled by the operator’s conversion webhook (see api/v1alpha1/dynamographdeploymentscalingadapter_conversion.go).

Field	Description	Default	Validation
`apiVersion` string	`nvidia.com/v1beta1`
`kind` string	`DynamoGraphDeploymentScalingAdapter`
`metadata` ObjectMeta	Refer to Kubernetes API documentation for fields of `metadata`.
`spec` DynamoGraphDeploymentScalingAdapterSpec
`status` DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterSpec

DynamoGraphDeploymentScalingAdapterSpec defines the desired state of a DynamoGraphDeploymentScalingAdapter.

Appears in:

DynamoGraphDeploymentScalingAdapter

Field	Description	Default	Validation
`replicas` integer	replicas is the desired number of replicas for the target component. This field is modified by external autoscalers (HPA/KEDA/Planner) or manually by users.		Minimum: 0 Required: {}
`dgdRef` DynamoGraphDeploymentComponentRef	dgdRef references the DynamoGraphDeployment and the specific component to scale.		Required: {}

DynamoGraphDeploymentScalingAdapterStatus

DynamoGraphDeploymentScalingAdapterStatus defines the observed state of a DynamoGraphDeploymentScalingAdapter.

Appears in:

DynamoGraphDeploymentScalingAdapter

Field	Description	Validation
`replicas` integer	replicas is the current number of replicas for the target component. This is synced from the DGD’s component replicas and is required for the scale subresource.
`selector` string	selector is a label selector string for the pods managed by this adapter. Required for HPA compatibility via the scale subresource.	Optional: {}
`lastScaleTime` Time	lastScaleTime is the last time the adapter scaled the target component.	Optional: {}
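HPA and KEDA do not read the adapter's fields directly. They go through the scale subresource, which exposes a `spec.replicas` / `status.replicas` / `status.selector` view shaped like an `autoscaling/v1` `Scale` object. The sketch below projects an adapter onto that shape to show which fields each side owns. It assumes the CRD maps its scale subresource to exactly these paths. The helper `adapter_scale_view` and the example names are hypothetical, and `spec.dgdRef` is omitted because the projection does not use it.

```python
def adapter_scale_view(adapter: dict) -> dict:
    """Project a DGDSA onto the shape of an autoscaling/v1 Scale object."""
    replicas = adapter["spec"]["replicas"]
    if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 0:
        raise ValueError("spec.replicas must be an integer >= 0")
    status = adapter.get("status", {})
    meta = adapter["metadata"]
    return {
        "apiVersion": "autoscaling/v1",
        "kind": "Scale",
        "metadata": {"name": meta["name"], "namespace": meta.get("namespace", "default")},
        # Desired count: written by HPA/KEDA/Planner or by a user.
        "spec": {"replicas": replicas},
        # Observed count and pod selector: synced into status by the operator.
        "status": {
            "replicas": status.get("replicas", 0),
            "selector": status.get("selector", ""),
        },
    }


# Hypothetical adapter for a decode component that is scaling from 2 to 4.
adapter = {
    "metadata": {"name": "my-graph-decode", "namespace": "dynamo"},
    "spec": {"replicas": 4},
    "status": {"replicas": 2, "selector": "app=my-graph-decode"},
}
scale = adapter_scale_view(adapter)
print(scale["spec"]["replicas"], scale["status"]["replicas"])
```

An empty `status.selector` leaves HPA unable to find the pods whose metrics it should read, which is why the selector is described as required for HPA compatibility.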

DynamoGraphDeploymentSpec

DynamoGraphDeploymentSpec defines the desired state of a DynamoGraphDeployment.

Appears in:

DynamoGraphDeployment

Field	Description	Validation
`annotations` object (keys:string, values:string)	annotations to propagate to all child resources (PCS, DCD, Deployments, and pod templates). Component-level (`podTemplate`) values take precedence on conflict.	Optional: {}
`labels` object (keys:string, values:string)	labels to propagate to all child resources. Same precedence rules as `annotations`.	Optional: {}
`components` DynamoComponentDeploymentSharedSpec array	components are the components deployed as part of this graph. Each entry carries its own stable logical `name`, and names must be unique within the list. Component types are generally repeatable, except `type: epp` which may appear at most once.	MaxItems: 25 Optional: {}
`env` EnvVar array	env is prepended to every component’s environment. Component-specific env entries with the same name take precedence and may reference values from this list.	Optional: {}
`backendFramework` string	backendFramework specifies the backend framework (e.g. “sglang”, “vllm”, “trtllm”).	Enum: [sglang vllm trtllm]
`restart` Restart	restart specifies the restart policy for the graph deployment.	Optional: {}
`topologyConstraint` SpecTopologyConstraint	topologyConstraint is the deployment-level topology constraint. When set, `spec.topologyConstraint.clusterTopologyName` names the ClusterTopology CR to use. `spec.topologyConstraint.packDomain` is optional at this level and can be omitted when only components carry constraints. Components without their own `topologyConstraint` inherit from this value.	Optional: {}
`experimental` DynamoGraphDeploymentExperimentalSpec	experimental groups graph-level preview features whose API shape and behavior may change in breaking ways between v1beta1 releases.	Optional: {}
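The list-level rules for `components` and the `env` precedence rule can be sketched as follows. This is a minimal illustration, not the operator's webhook. It assumes the merged container env is `spec.env` followed by the component's own entries and that Kubernetes keeps the last definition of a duplicated name, which is what makes component entries win. The helper names, and every component `type` other than `epp`, are hypothetical.

```python
from collections import Counter

MAX_COMPONENTS = 25  # MaxItems on spec.components


def validate_components(components: list) -> None:
    """Check the list-level rules documented for spec.components."""
    if len(components) > MAX_COMPONENTS:
        raise ValueError(f"at most {MAX_COMPONENTS} components are allowed")
    counts = Counter(comp["name"] for comp in components)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"component names must be unique: {duplicates}")
    if sum(1 for comp in components if comp.get("type") == "epp") > 1:
        raise ValueError("type: epp may appear at most once")


def effective_env(graph_env: list, component_env: list) -> list:
    """Prepend graph-level env; a component entry with the same name wins."""
    merged = {}
    # Later entries overwrite earlier ones; a dict keeps each name's first position.
    for var in graph_env + component_env:
        merged[var["name"]] = var
    return list(merged.values())
```

Because graph-level variables come first, a component entry can also refer to one of them with the usual Kubernetes `$(NAME)` syntax.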

DynamoGraphDeploymentStatus

DynamoGraphDeploymentStatus defines the observed state of a DynamoGraphDeployment. Unchanged between v1alpha1 and v1beta1.

Appears in:

DynamoGraphDeployment

Field	Description	Default	Validation
`observedGeneration` integer	observedGeneration is the most recent generation observed by the controller.		Optional: {}
`state` DGDState	state is a high-level textual status of the graph deployment lifecycle.	initializing	Enum: [initializing pending successful failed]
`conditions` Condition array	conditions contains the latest observed conditions of the graph deployment. Merged by type on patch updates.		Optional: {}
`components` object (keys:string, values:ComponentReplicaStatus)	components contains per-component replica status information, keyed by component name.		Optional: {}
`restart` RestartStatus	restart contains the status of a graph-level restart.		Optional: {}
`checkpoints` object (keys:string, values:ComponentCheckpointStatus)	checkpoints contains per-component checkpoint status, keyed by component name.		Optional: {}
`rollingUpdate` RollingUpdateStatus	rollingUpdate tracks the progress of operator-managed rolling updates. Currently only supported for single-node, non-Grove deployments (DCD/Deployment).		Optional: {}

EPPConfig

EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components.

Appears in:

Field	Description	Default	Validation
`configMapRef` ConfigMapKeySelector	configMapRef references a user-provided ConfigMap containing EPP configuration. Mutually exclusive with `config`.		Optional: {}
`config` EndpointPickerConfig	config specifies the EPP `EndpointPickerConfig` directly as a structured object. The operator marshals this to YAML and creates a ConfigMap automatically. Mutually exclusive with `configMapRef`; exactly one of `configMapRef` or `config` must be specified.		Type: object Optional: {}
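Together, "mutually exclusive" and "one must be specified" amount to an exactly-one-of rule. A minimal sketch of that check follows. The helper name is hypothetical, and the inline `config` payload in the usage is a placeholder rather than a real `EndpointPickerConfig`.

```python
def epp_config_source(epp: dict) -> str:
    """Return which EPPConfig field is in use, enforcing exactly-one-of."""
    has_ref = epp.get("configMapRef") is not None
    has_inline = epp.get("config") is not None
    if has_ref == has_inline:  # both set, or neither set
        raise ValueError("set exactly one of configMapRef or config")
    return "configMapRef" if has_ref else "config"


# configMapRef is a standard ConfigMapKeySelector (name + key).
print(epp_config_source({"configMapRef": {"name": "epp-config", "key": "config.yaml"}}))
```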

ExperimentalSpec

Appears in:

Field	Description	Validation
`gpuMemoryService` GPUMemoryServiceSpec	gpuMemoryService configures the GPU Memory Service (GMS). When set, GPU access for GMS clients is managed via DRA.	Optional: {}
`failover` FailoverSpec	failover configures active-passive GPU failover for this component. Requires `gpuMemoryService` to also be set, and `failover.mode` must match `gpuMemoryService.mode` (enforced by the validation webhook).	Optional: {}
`checkpoint` ComponentCheckpointConfig	checkpoint configures container-image snapshotting and restore for this component. When set, the DGD controller can produce a DynamoCheckpoint CR from a running pod and later restore pods from that checkpoint for faster cold start. The user-facing shape of this field (especially its interaction with the standalone DynamoCheckpoint resource and the identity-hash computation) is still settling, which is why it lives under `experimental` in v1beta1 instead of at the top level.	Optional: {}

FailoverSpec

Appears in:

ExperimentalSpec

Field	Description	Default	Validation
`mode` GPUMemoryServiceMode	mode selects the failover deployment topology. Must match `spec.experimental.gpuMemoryService.mode` (or `spec.components[*].experimental.gpuMemoryService.mode` inside a DynamoGraphDeployment).	IntraPod	Enum: [IntraPod InterPod] Optional: {}
`numShadows` integer	numShadows is the number of shadow (standby) engine containers per rank. Reserved for future use; the operator currently creates exactly one shadow.	1	Maximum: 1 Minimum: 1 Optional: {}
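The cross-field rules that the validation webhook is described as enforcing for `failover` can be sketched as below. This is an illustration only. It assumes both `mode` fields are compared after the documented `IntraPod` default is applied, which this reference does not state explicitly. The function name is hypothetical.

```python
GMS_MODES = {"IntraPod", "InterPod"}
DEFAULT_MODE = "IntraPod"  # documented default for both mode fields


def validate_failover(experimental: dict) -> None:
    """Sketch of the documented failover/GMS consistency rules."""
    failover = experimental.get("failover")
    if failover is None:
        return  # failover not requested; nothing to check
    gms = experimental.get("gpuMemoryService")
    if gms is None:
        raise ValueError("failover requires gpuMemoryService to be set")
    failover_mode = failover.get("mode", DEFAULT_MODE)
    gms_mode = gms.get("mode", DEFAULT_MODE)
    for mode in (failover_mode, gms_mode):
        if mode not in GMS_MODES:
            raise ValueError(f"unknown mode: {mode!r}")
    if failover_mode != gms_mode:
        raise ValueError(f"failover.mode {failover_mode} != gpuMemoryService.mode {gms_mode}")
    if failover.get("numShadows", 1) != 1:
        raise ValueError("numShadows must be 1 (Minimum: 1, Maximum: 1)")
```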

FeaturesSpec

FeaturesSpec controls optional Dynamo platform features in the generated deployment.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Default	Validation
`planner` RawExtension	Planner is the raw SLA planner configuration passed to the planner service. Its schema is defined by dynamo.planner.config.planner_config.PlannerConfig. Go treats this as opaque bytes; the Planner service validates it at startup. The presence of this field (non-null) enables the planner in the generated DGD.		Type: object Optional: {}
`mocker` MockerSpec	Mocker configures the simulated (mocker) backend for testing without GPUs.		Optional: {}

GMSClientPodSpec

GMSClientPodSpec declares an additional GMS client pod for inter-pod GMS.

Appears in:

GPUMemoryServiceSpec

Field	Description	Default	Validation
`name` string	name identifies this client pod.		MaxLength: 63 MinLength: 1 Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`
`podTemplate` PodTemplateSpec	podTemplate configures the pod to run as a GMS client.		Schemaless: {} Type: object

GPUMemoryServiceMode

Underlying type: string

GPUMemoryServiceMode selects the GMS deployment topology.

Appears in:

Field	Description
`IntraPod`	GMSModeIntraPod runs GMS as a sidecar within the same pod.
`InterPod`	GMSModeInterPod runs GMS as rank-local pods that share GPUs through DRA. Extra client pod rendering is reserved for a follow-up change.

GPUMemoryServiceSpec

Appears in:

ExperimentalSpec

Field	Description	Default	Validation
`mode` GPUMemoryServiceMode	mode selects the GMS deployment topology.	IntraPod	Enum: [IntraPod InterPod] Optional: {}
`deviceClassName` string	deviceClassName is the DRA `DeviceClass` to request GPUs from.	gpu.nvidia.com	Optional: {}
`extraClientContainers` string array	extraClientContainers lists additional user-declared containers that should be wired as GMS clients in service pods. Checkpoint Job clients are declared under `checkpoint.job.gmsClientContainers`. In each rendered pod, only matching container names are wired; absent names are ignored.		items:MaxLength: 63 items:MinLength: 1 items:Pattern: `^[a-z0-9]([-a-z0-9]*[a-z0-9])?$` Optional: {}
`extraClientPods` GMSClientPodSpec array	extraClientPods declares additional GMS client pods for inter-pod GMS. This field is reserved for future use and is rejected until inter-pod client orchestration is wired.		Optional: {}
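The matching rule for `extraClientContainers` ("only matching container names are wired; absent names are ignored") reduces to a filtered intersection, applied per rendered pod. The sketch below covers only name validation and selection, not the actual client wiring. Helper and container names are hypothetical.

```python
import re

# Same pattern as the name / extraClientContainers validation above.
DNS1123_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def validate_client_name(name: str) -> None:
    if not 1 <= len(name) <= 63 or DNS1123_LABEL.fullmatch(name) is None:
        raise ValueError(f"invalid GMS client name: {name!r}")


def wired_client_containers(pod_containers: list, extra_client_containers: list) -> list:
    """Return the containers in one rendered pod that get wired as GMS clients."""
    for name in extra_client_containers:
        validate_client_name(name)
    requested = set(extra_client_containers)
    # Requested names that are not present in this pod are silently ignored.
    return [c for c in pod_containers if c in requested]
```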

GPUSKUType

Underlying type: string

GPUSKUType is the AIC hardware system identifier for a supported GPU.

Validation:

Enum: [gb200_sxm gb10 b200_sxm h200_sxm h100_sxm h100_pcie a100_sxm a100_pcie a30 l40s l40 l4 v100_sxm v100_pcie t4 mi200 mi300]

Appears in:

HardwareSpec

Field	Description
`gb200_sxm`	Blackwell
`gb10`	Blackwell
`b200_sxm`	Blackwell
`h200_sxm`	Hopper
`h100_sxm`	Hopper
`h100_pcie`	Hopper
`a100_sxm`	Ampere
`a100_pcie`	Ampere
`a30`	Ampere
`l40s`	Ada
`l40`	Ada
`l4`	Ada
`v100_sxm`	Older NVIDIA
`v100_pcie`	Older NVIDIA
`t4`	Older NVIDIA
`mi200`	AMD
`mi300`	AMD

HardwareSpec

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Validation
`gpuSku` GPUSKUType	GPUSKU selects the GPU type to target. When omitted, auto-detected by selecting the GPU with the highest node count, then highest VRAM. In mixed-GPU clusters, set this to choose which GPU type to use. Discovery and totalGpus are then restricted to nodes matching this SKU.	Enum: [gb200_sxm gb10 b200_sxm h200_sxm h100_sxm h100_pcie a100_sxm a100_pcie a30 l40s l40 l4 v100_sxm v100_pcie t4 mi200 mi300] Optional: {}
`vramMb` float	VRAMMB is the VRAM per GPU in MiB. When omitted, auto-detected from cluster GPU nodes.	Optional: {}
`totalGpus` integer	TotalGPUs is the GPU budget for profiling and deployment. The profiler uses this to determine parallelism and replica count. When omitted, computed by counting GPUs on discovered nodes (filtered by gpuSku when set), temporarily capped at 32 to limit profiler search space. This cap may be removed in a future release. Set this field explicitly to override.	Optional: {}
`numGpusPerNode` integer	NumGPUsPerNode is the number of GPUs per node. When omitted, auto-detected from cluster GPU nodes.	Optional: {}
`interconnect` string	Interconnect describes the primary GPU-to-GPU interconnect within a node. Semantics / usage: - This is capability metadata used for profiling, planning, and deployment decisions. - It does NOT configure or enable any GPU interconnect; it only describes what is available/assumed. - When omitted, the operator may attempt best-effort discovery (currently distinguishes “nvlink” vs “pcie” based on DCGM NVLink link count). If discovery is unavailable, it may remain empty. Impact of wrong / missing values: - If set more optimistically than reality (e.g., “nvlink” when only PCIe is present), performance models may overestimate intra-node bandwidth and choose overly aggressive parallelism or layouts, resulting in degraded performance compared to expectations. - If set more pessimistically than reality (e.g., “pcie” when NVLink is present), the system may choose conservative plans and leave performance on the table. - If unset and undiscovered, consumers should treat the interconnect as unknown and fall back to conservative assumptions. Example values: “pcie”, “nvlink”. Other values may be accepted but may not be auto-detected.	Optional: {}
`rdma` boolean	RDMA indicates whether the cluster has RDMA-capable networking available for Dynamo data movement. Semantics / usage: - This is capability metadata used for profiling, planning, and deployment decisions. - It does NOT install, enable, or configure RDMA (e.g., drivers, SR-IOV, NVIDIA network operator, GPUDirect settings). It only expresses availability/intent. - When omitted, the operator may attempt best-effort discovery (e.g., via node labels indicating RDMA/SR-IOV capability and/or presence of NVIDIA network-operator RDMA components). If discovery is unavailable, it may remain unset. Impact of wrong / missing values: - False positive (set true when RDMA is not actually usable end-to-end) may cause plans or deployments to assume RDMA is available; depending on the runtime transport selection and fallback behavior, this can lead to connection/setup failures or performance regressions. - False negative (set false when RDMA is available) will typically avoid RDMA-optimized paths and fall back to non-RDMA transports, usually remaining functional but potentially slower. - If unset and undiscovered, consumers should treat RDMA availability as unknown and use conservative defaults / fallback transports.	Optional: {}
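The auto-detection described for `gpuSku` and `totalGpus` can be sketched as follows. The SKU with the most nodes wins, ties go to the larger per-GPU VRAM, the count is restricted to the chosen SKU, and an auto-computed total is capped at 32. This models only the documented rules, not the operator's discovery code. `GPUNode`, the helper names, and the VRAM figures in the usage are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

AUTO_TOTAL_GPU_CAP = 32  # temporary cap on auto-computed totalGpus


@dataclass
class GPUNode:
    sku: str        # a GPUSKUType value, e.g. "h100_sxm"
    vram_mb: float  # VRAM per GPU in MiB
    gpus: int       # GPUs on this node


def pick_sku(nodes: list) -> str:
    """Most nodes wins; ties are broken by the larger per-GPU VRAM."""
    if not nodes:
        raise ValueError("no GPU nodes discovered")
    node_count = defaultdict(int)
    vram = defaultdict(float)
    for node in nodes:
        node_count[node.sku] += 1
        vram[node.sku] = max(vram[node.sku], node.vram_mb)
    return max(node_count, key=lambda sku: (node_count[sku], vram[sku]))


def total_gpus(nodes: list, gpu_sku=None, explicit_total=None) -> int:
    if explicit_total is not None:
        return explicit_total  # an explicit spec.hardware.totalGpus is used as-is
    sku = gpu_sku or pick_sku(nodes)
    count = sum(node.gpus for node in nodes if node.sku == sku)
    return min(count, AUTO_TOTAL_GPU_CAP)


# Illustrative mixed cluster: three 8-GPU H100 nodes and two 8-GPU A100 nodes.
nodes = [GPUNode("h100_sxm", 81559, 8)] * 3 + [GPUNode("a100_sxm", 81920, 8)] * 2
print(pick_sku(nodes), total_gpus(nodes), total_gpus(nodes, gpu_sku="a100_sxm"))
```

In a mixed cluster like this one, setting `gpuSku` is the only way to target the minority GPU type.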

KvTransferEnforcement

Underlying type: string

KvTransferEnforcement controls how the selected prefill worker’s topology is applied to decode routing.

Validation:

Enum: [required preferred]

Appears in:

KvTransferPolicy

Field	Description
`required`	KvTransferEnforcementRequired enforces same-domain decode worker selection.
`preferred`	KvTransferEnforcementPreferred biases decode worker selection toward the same domain.

KvTransferPolicy

Appears in:

DynamoGraphDeploymentExperimentalSpec

Field	Description	Default	Validation
`labelKey` string	labelKey is a Kubernetes node label key (e.g. “topology.kubernetes.io/zone”) whose value identifies the topology domain for each worker. The operator copies the node label onto worker pods so the runtime can publish it as worker metadata. The label should correspond to the topology level named in `domain`.		MaxLength: 317 MinLength: 1 Pattern: `^(([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)(\.[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)*/)?([A-Za-z0-9]([-A-Za-z0-9_.]{0,61}[A-Za-z0-9])?)$` Optional: {}
`domain` TopologyDomain	domain is the logical name for the topology level to enforce (e.g. “zone”, “rack”). The router uses this to match workers that share the same value for the label identified by `labelKey`.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`
`enforcement` KvTransferEnforcement	enforcement controls how the selected prefill worker’s topology is applied to decode routing. “required” only allows decode workers in the same topology domain as the selected prefill worker. “preferred” keeps all decode workers eligible, but biases selection toward workers in the same topology domain. Defaults to “required”.	required	Enum: [required preferred] Optional: {}
`preferredWeight` float	preferredWeight is required when enforcement is “preferred” and is used only in that mode. Higher values create a stronger same-domain routing preference, but do not guarantee same-domain selection. The value is not a probability; worker selection still depends on load and other routing inputs. A value of 0 disables the topology preference; 1 is the strongest supported preference.		Maximum: 1 Minimum: 0 Optional: {}
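The sketch below checks a KvTransferPolicy against the documented patterns and bounds, then shows what "required" means for the eligible decode set. "Preferred" keeps every decode worker eligible. The actual biasing formula is internal to the router and is not modeled here. Helper names and worker IDs are hypothetical.

```python
import re

# labelKey pattern from the table above (qualified Kubernetes label key).
LABEL_KEY = re.compile(
    r"^(([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)(\.[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?)*/)?"
    r"([A-Za-z0-9]([-A-Za-z0-9_.]{0,61}[A-Za-z0-9])?)$"
)
DOMAIN = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def validate_kv_transfer_policy(policy: dict) -> str:
    key = policy.get("labelKey")
    if key is not None and (not 1 <= len(key) <= 317 or LABEL_KEY.fullmatch(key) is None):
        raise ValueError(f"invalid labelKey: {key!r}")
    if DOMAIN.fullmatch(policy.get("domain", "")) is None:
        raise ValueError("domain must be a lowercase DNS-style name")
    enforcement = policy.get("enforcement", "required")
    if enforcement not in ("required", "preferred"):
        raise ValueError(f"unknown enforcement: {enforcement!r}")
    if enforcement == "preferred":
        weight = policy.get("preferredWeight")
        if weight is None or not 0 <= weight <= 1:
            raise ValueError("preferred enforcement needs preferredWeight in [0, 1]")
    return enforcement


def eligible_decode_workers(worker_domains: dict, prefill_domain: str, enforcement: str) -> list:
    """worker_domains maps a decode worker ID to its topology label value."""
    if enforcement == "required":
        return [w for w, d in worker_domains.items() if d == prefill_domain]
    # "preferred": every worker stays eligible; the bias is applied by the router.
    return list(worker_domains)
```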

MockerSpec

MockerSpec configures the simulated (mocker) backend.

Appears in:

FeaturesSpec

Field	Description	Default	Validation
`enabled` boolean	Enabled indicates whether to deploy mocker workers instead of real inference workers. Useful for large-scale testing without GPUs.		Optional: {}

ModelCacheSpec

ModelCacheSpec references a PVC containing pre-downloaded model weights.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Default	Validation
`pvcName` string	PVCName is the name of the PersistentVolumeClaim containing model weights. The PVC must exist in the same namespace as the DGDR.		Optional: {}
`pvcModelPath` string	PVCModelPath is the path to the model checkpoint directory within the PVC (e.g. “deepseek-r1” or “models/Llama-3.1-405B-FP8”).		Optional: {}
`pvcMountPath` string	PVCMountPath is the mount path for the PVC inside the container.	/opt/model-cache	Optional: {}

ModelReference

ModelReference identifies a model served by a component. When specified, a headless service is created for endpoint discovery.

Appears in:

Field	Description	Default	Validation
`name` string	name is the base model identifier (e.g. “llama-3-70b-instruct-v1”).		MinLength: 1 Required: {}
`revision` string	revision is the model revision/version.		Optional: {}

MultinodeSpec

MultinodeSpec configures a multinode component.

Appears in:

Field	Description	Default	Validation
`nodeCount` integer	nodeCount is the number of nodes to deploy for the multinode component. Total GPUs used is `nodeCount * container GPU request`.	2	Minimum: 2 Optional: {}

OptimizationType

Underlying type: string

OptimizationType defines the optimization target for SLA-based profiling.

Validation:

Enum: [latency throughput]

Appears in:

SLASpec

Field	Description
`latency`
`throughput`

OverridesSpec

OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Default	Validation
`profilingJob` JobSpec	ProfilingJob allows overriding the profiling Job specification. Fields set here are merged into the controller-generated Job spec.		Optional: {}
`dgd` RawExtension	DGD allows providing a full or partial nvidia.com/v1alpha1 DynamoGraphDeployment to use as the base for the generated deployment. Fields from profiling results are merged on top. Use this to override backend worker images. The field is stored as a raw embedded resource rather than a typed *v1alpha1.DynamoGraphDeployment to avoid a circular import: v1alpha1 already imports v1beta1 as the conversion hub and Go does not allow import cycles. The EmbeddedResource marker tells the API server to validate that the value is a well-formed Kubernetes object (has apiVersion/kind), but does not enforce that it is specifically a DynamoGraphDeployment. Full type validation (correct apiVersion, kind, and field schema) is performed by the controller during reconciliation.		EmbeddedResource: {} Optional: {}
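The two-stage validation and the "profiling results merged on top" behavior can be sketched as follows. This reference does not document the exact merge semantics, so this sketch assumes a recursive map merge in which profiled values win and lists are replaced wholesale. The helper names and the nested keys in the usage (`services`, `decode`, `image`) are hypothetical.

```python
import copy


def check_embedded_dgd(obj: dict) -> None:
    """API-server-level check first, then the stricter controller-level check."""
    if not obj.get("apiVersion") or not obj.get("kind"):
        raise ValueError("overrides.dgd must be a Kubernetes object with apiVersion and kind")
    if obj["apiVersion"] != "nvidia.com/v1alpha1" or obj["kind"] != "DynamoGraphDeployment":
        raise ValueError("overrides.dgd must be a nvidia.com/v1alpha1 DynamoGraphDeployment")


def merge_profiled(base: dict, profiled: dict) -> dict:
    """Overlay profiler output on the user-supplied base; profiled values win."""
    result = copy.deepcopy(base)
    for key, value in profiled.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_profiled(result[key], value)
        else:
            # Scalars and lists are replaced wholesale in this sketch.
            result[key] = copy.deepcopy(value)
    return result


# A user override pins a worker image; the profiler contributes a replica count.
base = {
    "apiVersion": "nvidia.com/v1alpha1",
    "kind": "DynamoGraphDeployment",
    "spec": {"services": {"decode": {"image": "registry.example.com/worker:custom"}}},
}
check_embedded_dgd(base)
merged = merge_profiled(base, {"spec": {"services": {"decode": {"replicas": 2}}}})
print(merged["spec"]["services"]["decode"])
```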

ParetoConfig

ParetoConfig represents a single Pareto-optimal deployment configuration discovered during profiling.

Appears in:

ProfilingResultsStatus

Field	Description	Default	Validation
`config` RawExtension	Config is the full deployment configuration for this Pareto point.		Type: object

ProfilingPhase

Underlying type: string

ProfilingPhase represents a sub-phase within the profiling pipeline. When the DGDR Phase is “Profiling”, this value indicates which step of the profiling pipeline is currently executing.

Validation:

Enum: [Initializing SweepingPrefill SweepingDecode SelectingConfig BuildingCurves GeneratingDGD Done]

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description
`Initializing`	Profiler is loading the DGD template, detecting GPU hardware, and resolving the model architecture from HuggingFace.
`SweepingPrefill`	Sweeping parallelization strategies (TP/TEP/DEP) across GPU counts for prefill, measuring TTFT at each configuration.
`SweepingDecode`	Sweeping parallelization strategies and concurrency levels for decode, measuring ITL at each configuration.
`SelectingConfig`	Filtering results against SLA targets and selecting the most cost-efficient configuration that meets TTFT/ITL requirements.
`BuildingCurves`	Building detailed interpolation curves (ISL→TTFT for prefill, KV-usage×context-length→ITL for decode) using the selected configs.
`GeneratingDGD`	Packaging profiling data into a ConfigMap and generating the final DGD YAML with planner integration.
`Done`	Profiling pipeline finished successfully.
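A client that shows DGDR progress can combine `phase` and `profilingPhase` as sketched below. It ignores `profilingPhase` outside the "Profiling" phase, as the status field description requires. The sketch assumes the enum order above matches the order in which the pipeline runs, which the descriptions suggest but do not guarantee. `describe_progress` is a hypothetical helper.

```python
# Assumes the enum order above is the order in which the pipeline runs.
PROFILING_PHASES = [
    "Initializing", "SweepingPrefill", "SweepingDecode",
    "SelectingConfig", "BuildingCurves", "GeneratingDGD", "Done",
]


def describe_progress(status: dict) -> str:
    phase = status.get("phase") or "Unknown"
    sub = status.get("profilingPhase")
    # profilingPhase is only meaningful while phase is "Profiling".
    if phase != "Profiling" or sub not in PROFILING_PHASES:
        return phase
    step = PROFILING_PHASES.index(sub) + 1
    return f"Profiling ({sub}, step {step}/{len(PROFILING_PHASES)})"


print(describe_progress({"phase": "Profiling", "profilingPhase": "SweepingDecode"}))
```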

ProfilingResultsStatus

ProfilingResultsStatus contains the output of the profiling process.

Appears in:

DynamoGraphDeploymentRequestStatus

Field	Description	Default	Validation
`pareto` ParetoConfig array	Pareto is the list of Pareto-optimal deployment configurations discovered during profiling. Each entry represents a different cost/performance trade-off.		Optional: {}
`selectedConfig` RawExtension	SelectedConfig is the recommended configuration chosen by the profiler based on the SLA targets. This is the configuration used for deployment when autoApply is true.		Type: object Optional: {}

Restart

Restart specifies the restart policy for a graph deployment.

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`id` string	id is an arbitrary string that triggers a restart when changed. Any modification to this value initiates a restart of the graph deployment according to the configured strategy.		MinLength: 1 Required: {}
`strategy` RestartStrategy	strategy specifies the restart strategy for the graph deployment.		Optional: {}

RestartPhase

Underlying type: string

RestartPhase enumerates phases of a graph-level restart.

Appears in:

RestartStatus

Field	Description
`Pending`
`Restarting`
`Completed`
`Failed`
`Superseded`

RestartStatus

RestartStatus contains the status of a graph-level restart.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`observedID` string	observedID is the restart ID currently being processed. Matches `Restart.id` in the spec.
`phase` RestartPhase	phase is the phase of the restart.
`inProgress` string array	inProgress contains the names of the components currently being restarted.	Optional: {}

RestartStrategy

RestartStrategy defines how components are restarted.

Appears in:

Restart

Field	Description	Default	Validation
`type` RestartStrategyType	type specifies the restart strategy type.	Sequential	Enum: [Sequential Parallel] Optional: {}
`order` string array	order is the complete ordered set of component names for sequential restarts. Omit or leave empty to use the controller’s default order. This field must not be set for parallel restarts.		Optional: {}
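The documented constraints on `restart.strategy` are: `type` defaults to Sequential, `order` is forbidden for Parallel, and a non-empty `order` is the complete set of component names. They can be checked as in the sketch below. The function name and component names are hypothetical. A restart itself is triggered separately, by changing `restart.id`.

```python
def validate_restart_strategy(strategy: dict, component_names: list) -> str:
    stype = strategy.get("type", "Sequential")
    order = strategy.get("order") or []
    if stype not in ("Sequential", "Parallel"):
        raise ValueError(f"unknown restart strategy: {stype!r}")
    if stype == "Parallel" and order:
        raise ValueError("order must not be set for Parallel restarts")
    if stype == "Sequential" and order:
        # A non-empty order must name every component exactly once.
        if len(order) != len(set(order)) or set(order) != set(component_names):
            raise ValueError("order must list every component exactly once")
    return stype


components = ["frontend", "prefill", "decode"]
print(validate_restart_strategy({"order": ["decode", "prefill", "frontend"]}, components))
```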

RestartStrategyType

Underlying type: string

RestartStrategyType enumerates restart strategies.

Appears in:

RestartStrategy

Field	Description
`Sequential`
`Parallel`

RollingUpdatePhase

Underlying type: string

RollingUpdatePhase represents the current phase of a rolling update.

Validation:

Enum: [Pending InProgress Completed Failed ]

Appears in:

RollingUpdateStatus

Field	Description
`Pending`
`InProgress`
`Completed`
`Failed`
`""`

RollingUpdateStatus

RollingUpdateStatus tracks the progress of an operator-managed rolling update.

Appears in:

DynamoGraphDeploymentStatus

Field	Description	Validation
`phase` RollingUpdatePhase	phase indicates the current phase of the rolling update.	Enum: [Pending InProgress Completed Failed ] Optional: {}
`startTime` Time	startTime is when the rolling update began.	Optional: {}
`endTime` Time	endTime is when the rolling update completed (successfully or failed).	Optional: {}
`updatedComponents` string array	updatedComponents is the list of components that have completed the rolling update.	Optional: {}

SLASpec

SLASpec defines the service-level agreement targets for profiling optimization.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Validation
`ttft` float	TTFT is the Time To First Token target in milliseconds.	Optional: {}
`itl` float	ITL is the Inter-Token Latency target in milliseconds.	Optional: {}
`e2eLatency` float	E2ELatency is the target end-to-end request latency in milliseconds. Alternative to specifying TTFT + ITL.	Optional: {}
`optimizationType` OptimizationType	OptimizationType is the optimization target for SLA profiling. Valid values: latency, throughput.	Enum: [latency throughput] Optional: {}
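To relate `e2eLatency` to `ttft` and `itl`, a common back-of-the-envelope approximation treats a request as the first token after TTFT followed by (OSL − 1) further tokens at ITL each. This is an assumed approximation for sizing targets, not necessarily the formula the profiler uses. The default OSL of 1000 is taken from WorkloadSpec, and the helper names are hypothetical.

```python
def approx_e2e_latency_ms(ttft_ms: float, itl_ms: float, osl: int = 1000) -> float:
    """First token after TTFT, then (osl - 1) further tokens at ITL each."""
    if osl < 1:
        raise ValueError("osl must be at least 1")
    return ttft_ms + (osl - 1) * itl_ms


def itl_budget_ms(e2e_ms: float, ttft_ms: float, osl: int = 1000) -> float:
    """Largest ITL that keeps the approximate e2e latency within e2e_ms."""
    if osl < 2:
        raise ValueError("osl must be at least 2 to leave room for decode tokens")
    return (e2e_ms - ttft_ms) / (osl - 1)


# ttft=200 ms, itl=20 ms, default osl=1000: 200 + 999 * 20 = 20180 ms
print(approx_e2e_latency_ms(200, 20))
```

With long outputs, ITL dominates the end-to-end figure, so an `e2eLatency` target mostly constrains decode.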

ScalingAdapter

Appears in:

SearchStrategy

Underlying type: string

SearchStrategy controls the profiling search depth.

Validation:

Enum: [rapid thorough]

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description
`rapid`
`thorough`

SpecTopologyConstraint

SpecTopologyConstraint defines deployment-level topology placement requirements.

Appears in:

DynamoGraphDeploymentSpec

Field	Description	Default	Validation
`clusterTopologyName` string	clusterTopologyName is the name of the ClusterTopology resource that defines the topology hierarchy for this deployment.		MinLength: 1
`packDomain` TopologyDomain	packDomain is the default topology domain to pack pods within. Optional; omit when only components carry constraints.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$` Optional: {}

TopologyConstraint

TopologyConstraint defines component-level topology placement requirements. The ClusterTopology reference (`clusterTopologyName`) is inherited from the deployment-level SpecTopologyConstraint.

Appears in:

Field	Description	Default	Validation
`packDomain` TopologyDomain	packDomain is the topology domain to pack pods within. Must match a domain defined in the referenced ClusterTopology CR.		Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`
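The sketch below resolves the effective topology for one component. The ClusterTopology name always comes from `spec.topologyConstraint`, and `packDomain` comes from the component when it sets one, otherwise from the deployment level. Treating a component constraint without a deployment-level constraint as an error is an assumption drawn from the inheritance rule, not a documented webhook message. Names are hypothetical.

```python
def effective_topology(spec: dict, component: dict):
    """Resolve the ClusterTopology name and packDomain for one component."""
    graph = spec.get("topologyConstraint")
    own = component.get("topologyConstraint")
    if graph is None:
        if own is not None:
            raise ValueError("component topologyConstraint needs spec.topologyConstraint.clusterTopologyName")
        return None  # no topology-aware placement requested
    pack = (own or {}).get("packDomain") or graph.get("packDomain")
    return {"clusterTopologyName": graph["clusterTopologyName"], "packDomain": pack}


spec = {"topologyConstraint": {"clusterTopologyName": "gb200-cluster", "packDomain": "rack"}}
print(effective_topology(spec, {"topologyConstraint": {"packDomain": "host"}}))
```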

TopologyDomain

Underlying type: string

Validation:

Pattern: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$`

Appears in:

WorkloadSpec

WorkloadSpec defines the workload characteristics for SLA-based profiling.

Appears in:

DynamoGraphDeploymentRequestSpec

Field	Description	Default	Validation
`isl` integer	ISL is the Input Sequence Length (number of tokens).	4000	Optional: {}
`osl` integer	OSL is the Output Sequence Length (number of tokens).	1000	Optional: {}
`concurrency` float	Concurrency is the target concurrency level. Required (or RequestRate) when the planner is disabled.		Optional: {}
`requestRate` float	RequestRate is the target request rate (req/s). Required (or Concurrency) when the planner is disabled.		Optional: {}
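The sketch below shows how the workload defaults and the "concurrency or requestRate" requirement interact with the planner switch. It assumes the planner counts as enabled exactly when `features.planner` is non-null, as described under FeaturesSpec. `resolve_workload` is a hypothetical helper.

```python
WORKLOAD_DEFAULTS = {"isl": 4000, "osl": 1000}


def resolve_workload(spec: dict) -> dict:
    workload = {**WORKLOAD_DEFAULTS, **(spec.get("workload") or {})}
    # A non-null features.planner enables the planner in the generated DGD.
    planner_enabled = (spec.get("features") or {}).get("planner") is not None
    if not planner_enabled and workload.get("concurrency") is None and workload.get("requestRate") is None:
        raise ValueError("set workload.concurrency or workload.requestRate when the planner is disabled")
    return workload


print(resolve_workload({"workload": {"concurrency": 32}}))
```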

operator.config.dynamo.nvidia.com/v1alpha1

Resource Types

OperatorConfiguration

CertProvisionMode

Underlying type: string

CertProvisionMode controls how webhook TLS certificates are managed.

Appears in:

WebhookServer

Field	Description
`auto`	CertProvisionModeAuto uses the built-in cert-controller to generate and rotate certificates.
`manual`	CertProvisionModeManual expects certificates to be provided externally (e.g., cert-manager, admin).

CheckpointConfiguration

CheckpointConfiguration holds checkpoint/restore settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`enabled` boolean	Enabled indicates whether checkpoint/restore functionality is enabled.
`seccomp` CheckpointSeccompConfiguration	Seccomp controls the localhost seccomp profile applied to checkpoint and restore pods. A nil value means “use the default profile”; set Seccomp.Disabled=true to disable seccomp injection entirely.
`storage` CheckpointStorageConfiguration	Storage optionally configures the namespace-local checkpoint PVC that workload pods mount. When omitted, the operator preserves the legacy behavior of discovering storage from a snapshot-agent DaemonSet in the workload namespace.

CheckpointOCIConfig

Deprecated: CheckpointOCIConfig is retained for compatibility and ignored by the current snapshot flow.

Appears in:

CheckpointStorageConfiguration

Field	Description	Default	Validation
`uri` string	URI is the legacy OCI URI (oci://registry/repository).
`credentialsSecretRef` string	CredentialsSecretRef is the legacy docker config secret name.

CheckpointPVCConfig

CheckpointPVCConfig configures the namespace-local PVC mounted into checkpoint and restore workload pods.

Appears in:

CheckpointStorageConfiguration

Field	Description	Default	Validation
`pvcName` string	PVCName is the PVC name in each workload namespace.
`basePath` string	BasePath is the mount path inside checkpoint and restore workload pods.
`create` boolean	Create tells the operator to create the PVC in workload namespaces when it is missing. When false, the PVC must already exist.
`size` string	Size is the storage request used when Create is true.
`storageClassName` string	StorageClassName is the optional StorageClass name used when Create is true.
`accessMode` string	AccessMode is the PVC access mode used when Create is true.

CheckpointS3Config

Deprecated: CheckpointS3Config is retained for compatibility and ignored by the current snapshot flow.

Appears in:

CheckpointStorageConfiguration

Field	Description	Default	Validation
`uri` string	URI is the legacy S3 URI (s3://[endpoint/]bucket/prefix).
`credentialsSecretRef` string	CredentialsSecretRef is the legacy credentials secret name.

CheckpointSeccompConfiguration

Appears in:

CheckpointConfiguration

Field	Description	Default	Validation
`disabled` boolean	Disabled, when true, suppresses seccomp profile injection entirely. Use this for clusters where custom localhost profiles are not allowed (e.g. OpenShift’s restricted-v2 SCC) or for CRIU builds that handle io_uring natively.
`profile` string	Profile is the localhost seccomp profile path. Empty falls back to DefaultSeccompProfile. Ignored when Disabled is true.
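The precedence between `disabled`, `profile`, and the built-in default can be sketched as a small resolver that returns a Kubernetes `seccompProfile` object (`type: Localhost` plus `localhostProfile`), or nothing when injection is disabled. The value of DefaultSeccompProfile is not given in this reference, so the constant below is a hypothetical placeholder. Where in the pod spec the operator places the profile is also not modeled.

```python
# Placeholder: the real DefaultSeccompProfile value is not given in this reference.
DEFAULT_SECCOMP_PROFILE = "profiles/default-checkpoint.json"


def checkpoint_seccomp_profile(seccomp):
    """Return a seccompProfile dict for checkpoint/restore pods, or None when disabled."""
    if seccomp is not None and seccomp.get("disabled"):
        return None  # Disabled wins; Profile is ignored
    profile = (seccomp or {}).get("profile") or DEFAULT_SECCOMP_PROFILE
    return {"type": "Localhost", "localhostProfile": profile}


print(checkpoint_seccomp_profile({"profile": "custom/criu.json"}))
```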

CheckpointStorageConfiguration

CheckpointStorageConfiguration configures checkpoint storage for operator pod mutations. Only PVC storage is implemented today.

Appears in:

CheckpointConfiguration

Field	Description	Default	Validation
`type` string	Type is the storage backend type. Only pvc is implemented today.
`pvc` CheckpointPVCConfig	PVC holds the PVC storage settings used when `type` is pvc.
`s3` CheckpointS3Config	Deprecated: S3 is retained for compatibility and ignored.
`oci` CheckpointOCIConfig	Deprecated: OCI is retained for compatibility and ignored.

DRAConfiguration

DRAConfiguration holds Dynamic Resource Allocation (resource.k8s.io/v1) settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`enabled` boolean	Enabled overrides auto-detection of the resource.k8s.io/v1 API. nil = auto-detect. Setting true requires detection to also succeed (the operator will exit at startup otherwise).
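`enabled` is effectively tri-state: nil auto-detects, false forces DRA off, and true requires the `resource.k8s.io/v1` API to be detected or the operator exits at startup. The sketch below captures that decision table. The function name is hypothetical, and a raised exception stands in for the operator's startup exit.

```python
from typing import Optional


def resolve_dra_enabled(enabled: Optional[bool], api_detected: bool) -> bool:
    """None auto-detects, False forces DRA off, True requires the API to exist."""
    if enabled is None:
        return api_detected
    if enabled and not api_detected:
        # Stands in for the documented behavior: the operator exits at startup.
        raise RuntimeError("dra.enabled=true but resource.k8s.io/v1 was not detected")
    return enabled


print(resolve_dra_enabled(None, True), resolve_dra_enabled(False, True))
```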

DiscoveryBackend

Underlying type: string

DiscoveryBackend is the type for the discovery backend.

Appears in:

DiscoveryConfiguration

Field	Description
`kubernetes`	DiscoveryBackendKubernetes is the Kubernetes discovery backend
`etcd`	DiscoveryBackendEtcd is the etcd discovery backend

DiscoveryConfiguration

DiscoveryConfiguration holds discovery backend settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`backend` DiscoveryBackend	Backend is the discovery backend: “kubernetes” or “etcd”	kubernetes

GPUConfiguration

GPUConfiguration holds GPU discovery settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`discoveryEnabled` boolean	DiscoveryEnabled indicates whether GPU discovery is enabled	true

GroveConfiguration

GroveConfiguration holds Grove orchestrator settings.

Appears in:

OrchestratorConfiguration

Field	Description	Default	Validation
`enabled` boolean	Enabled overrides auto-detection. nil = auto-detect.
`terminationDelay` Duration	TerminationDelay configures the termination delay for Grove PodCliqueSets	15m

InfrastructureConfiguration

InfrastructureConfiguration holds service mesh and backend addresses.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`natsAddress` string	NATSAddress is the address of the NATS server
`etcdAddress` string	ETCDAddress is the address of the etcd server
`modelExpressURL` string	ModelExpressURL is the URL of the Model Express server to inject into all pods
`prometheusEndpoint` string	PrometheusEndpoint is the URL of the Prometheus endpoint to use for metrics

IngressConfiguration

IngressConfiguration holds ingress settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`virtualServiceGateway` string	VirtualServiceGateway is the name of the Istio virtual service gateway
`controllerClassName` string	ControllerClassName is the ingress controller class name
`controllerTLSSecretName` string	ControllerTLSSecretName is the TLS secret for the ingress controller
`hostSuffix` string	HostSuffix is the suffix for ingress hostnames

IstioMeshConfiguration

IstioMeshConfiguration holds Istio-specific mesh settings.

Appears in:

ServiceMeshConfiguration

Field	Description	Default	Validation
`tlsMode` string	TLSMode is the Istio TLS mode for DestinationRules. Supported values: “DISABLE”, “SIMPLE”, “ISTIO_MUTUAL”, “MUTUAL”. Defaults to “SIMPLE”.
`insecureSkipVerify` boolean	InsecureSkipVerify skips TLS certificate verification in DestinationRules. Defaults to true (matching upstream GAIE behavior with self-signed certs).
`clientCertificate` string	ClientCertificate is the path (in the istio-proxy sidecar’s filesystem) to the file holding the client-side TLS certificate used for mTLS. REQUIRED when TLSMode is “MUTUAL”; ignored for other modes.
`privateKey` string	PrivateKey is the path (in the istio-proxy sidecar’s filesystem) to the file holding the client-side TLS private key used for mTLS. REQUIRED when TLSMode is “MUTUAL”; ignored for other modes.
`caCertificates` string	CaCertificates is the optional path (in the istio-proxy sidecar’s filesystem) to the file holding CA certificates used to verify the server certificate. Used only when TLSMode is “MUTUAL”; for other modes the field is ignored.

KaiSchedulerConfiguration

KaiSchedulerConfiguration holds Kai-scheduler settings.

Appears in:

OrchestratorConfiguration

Field	Description	Default	Validation
`enabled` boolean	Enabled overrides auto-detection. nil = auto-detect.

LWSConfiguration

LWSConfiguration holds LWS orchestrator settings.

Appears in:

OrchestratorConfiguration

Field	Description	Default	Validation
`enabled` boolean	Enabled overrides auto-detection. nil = auto-detect.

LeaderElectionConfiguration

LeaderElectionConfiguration holds leader election settings.

Appears in:

OperatorConfiguration

Field	Description	Default
`enabled` boolean	Enabled enables leader election for controller manager	false
`id` string	ID is the leader election resource identity
`namespace` string	Namespace is the namespace for the leader election resource

LoggingConfiguration

LoggingConfiguration holds logging settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`level` string	Level is the log level (e.g., “info”, “debug”)	info
`format` string	Format is the log format (e.g., “json”, “text”)	json

MPIConfiguration

MPIConfiguration holds MPI SSH secret settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`sshSecretName` string	SSHSecretName is the name of the secret containing the SSH key for MPI
`sshSecretNamespace` string	SSHSecretNamespace is the namespace where the MPI SSH secret is located

MetricsServer

MetricsServer extends Server with a secure-serving option.

Appears in:

ServerConfiguration

Field	Description	Default	Validation
`bindAddress` string	BindAddress is the address the server binds to
`port` integer	Port is the port the server listens on
`secure` boolean	Secure enables secure serving for the metrics endpoint. nil = default to true (secure by default).

NamespaceConfiguration

NamespaceConfiguration determines operator namespace mode.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`restricted` string	Deprecated: Namespace-restricted mode is deprecated and will be removed in a future release. Use cluster-wide mode (leave Restricted empty) instead.
`scope` NamespaceScopeConfiguration	Deprecated: Scope is only used in namespace-restricted mode, which is deprecated.

NamespaceScopeConfiguration

Deprecated: NamespaceScopeConfiguration is used only by the deprecated namespace-restricted mode and will be removed in a future release.

Appears in:

NamespaceConfiguration

Field	Description	Default	Validation
`leaseDuration` Duration	LeaseDuration is how long the namespace scope marker lease lasts before it expires	30s
`leaseRenewInterval` Duration	LeaseRenewInterval is the interval at which the namespace scope marker lease is renewed	10s

OperatorConfiguration

OperatorConfiguration is the Schema for the operator configuration.

Field	Description	Default	Validation
`apiVersion` string	`operator.config.dynamo.nvidia.com/v1alpha1`
`kind` string	`OperatorConfiguration`
`server` ServerConfiguration	Server configuration (metrics, health probes, webhooks)
`leaderElection` LeaderElectionConfiguration	Leader election configuration
`namespace` NamespaceConfiguration	Namespace configuration (restricted vs cluster-wide)
`orchestrators` OrchestratorConfiguration	Orchestrator configuration with optional overrides
`dra` DRAConfiguration	DRA (Dynamic Resource Allocation) settings with optional override
`infrastructure` InfrastructureConfiguration	Service mesh and infrastructure addresses
`ingress` IngressConfiguration	Ingress configuration
`serviceMesh` ServiceMeshConfiguration	ServiceMesh configures automatic generation of service-mesh resources (e.g., Istio DestinationRules) for EPP components.
`rbac` RBACConfiguration	RBAC configuration for cross-namespace resource management (cluster-wide mode)
`mpi` MPIConfiguration	MPI SSH secret configuration
`checkpoint` CheckpointConfiguration	Checkpoint/restore configuration
`discovery` DiscoveryConfiguration	Discovery backend configuration
`gpu` GPUConfiguration	GPU discovery configuration
`logging` LoggingConfiguration	Logging configuration
`security` SecurityConfiguration	HTTP/2 and TLS settings
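
A minimal OperatorConfiguration file might look like the following sketch. It uses only fields listed in the tables in this section. The lease ID and namespace are hypothetical placeholders, and sections that are left out should fall back to their documented defaults.

```yaml
apiVersion: operator.config.dynamo.nvidia.com/v1alpha1
kind: OperatorConfiguration
leaderElection:
  enabled: true                    # default: false
  id: dynamo-operator-leader       # hypothetical lease identity
  namespace: dynamo-system         # hypothetical namespace
dra:
  enabled: true                    # omit to auto-detect; true also requires detection to succeed
discovery:
  backend: kubernetes              # "kubernetes" (default) or "etcd"
gpu:
  discoveryEnabled: true           # default: true
logging:
  level: debug                     # default: info
  format: json                     # default: json
security:
  enableHTTP2: false               # default: false
```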

OrchestratorConfiguration

OrchestratorConfiguration holds orchestrator override settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`grove` GroveConfiguration	Grove orchestrator configuration
`lws` LWSConfiguration	LWS orchestrator configuration
`kaiScheduler` KaiSchedulerConfiguration	KaiScheduler configuration
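
Each orchestrator's `enabled` flag is effectively tri-state. Leaving it unset keeps auto-detection, and `true` or `false` overrides it. The fragment below is a sketch of an `orchestrators` section that forces Grove on, forces LWS off, and leaves Kai-scheduler on auto-detection. The values are illustrative, not recommendations.

```yaml
orchestrators:
  grove:
    enabled: true            # override auto-detection
    terminationDelay: 30m    # default: 15m
  lws:
    enabled: false           # override auto-detection: do not use LWS
  kaiScheduler: {}           # enabled unset (nil): auto-detect
```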

RBACConfiguration

RBACConfiguration holds RBAC settings for cluster-wide mode.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`plannerClusterRoleName` string	PlannerClusterRoleName is the ClusterRole for planner
`dgdrProfilingClusterRoleName` string	DGDRProfilingClusterRoleName is the ClusterRole for DGDR profiling jobs
`eppClusterRoleName` string	EPPClusterRoleName is the ClusterRole for EPP

SecurityConfiguration

SecurityConfiguration holds HTTP/2 and TLS settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`enableHTTP2` boolean	EnableHTTP2 enables HTTP/2 for metrics and webhook servers	false

Server

Server holds a bind address and port.

Appears in:

ServerConfiguration

Field	Description	Default	Validation
`bindAddress` string	BindAddress is the address the server binds to
`port` integer	Port is the port the server listens on

ServerConfiguration

ServerConfiguration holds server bind addresses and ports.

Appears in:

OperatorConfiguration

Field	Description	Default
`metrics` MetricsServer	Metrics server configuration	{ bindAddress:0.0.0.0 port:8080 secure:true }
`healthProbe` Server	Health probe server configuration	{ bindAddress:0.0.0.0 port:8081 }
`webhook` WebhookServer	Webhook server configuration	{ certDir:/tmp/k8s-webhook-server/serving-certs host:0.0.0.0 port:9443 }
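
Written out explicitly, the documented server defaults correspond to the `server` section below. Restating the defaults is mainly useful as a starting point for changing a single value, such as moving the metrics endpoint to a different port.

```yaml
server:
  metrics:
    bindAddress: 0.0.0.0
    port: 8080
    secure: true                  # leaving secure unset also means true
  healthProbe:
    bindAddress: 0.0.0.0
    port: 8081
  webhook:
    host: 0.0.0.0
    port: 9443
    certDir: /tmp/k8s-webhook-server/serving-certs
```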

ServiceMeshConfiguration

ServiceMeshConfiguration holds service-mesh settings.

Appears in:

OperatorConfiguration

Field	Description	Default	Validation
`provider` string	Provider selects the service mesh implementation. Supported values: “istio” and “” (empty string). An empty string disables service mesh resource generation.
`istio` IstioMeshConfiguration	Istio holds Istio-specific settings. Only used when Provider is “istio”.
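
For example, the `serviceMesh` section below sketches how to have the operator generate Istio DestinationRules that use mutual TLS for EPP traffic. The certificate paths are hypothetical. They must point to files that actually exist in the istio-proxy sidecar's filesystem.

```yaml
serviceMesh:
  provider: istio                      # "" disables service-mesh resource generation
  istio:
    tlsMode: MUTUAL                    # default: SIMPLE
    insecureSkipVerify: false          # default: true
    # Required when tlsMode is MUTUAL; paths in the istio-proxy sidecar (hypothetical)
    clientCertificate: /etc/certs/client.pem
    privateKey: /etc/certs/client-key.pem
    # Optional: CA bundle used to verify the server certificate
    caCertificates: /etc/certs/ca.pem
```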

WebhookServer

WebhookServer extends Server with host and certificate directory.

Appears in:

ServerConfiguration

Field	Description	Default
`bindAddress` string	BindAddress is the address the server binds to
`port` integer	Port is the port the server listens on
`host` string	Host is the address the webhook server binds to
`certDir` string	CertDir is the directory containing TLS certificates
`certProvisionMode` CertProvisionMode	CertProvisionMode controls certificate management: “auto” (built-in cert-controller) or “manual” (external)	auto
`secretName` string	SecretName is the name of the Kubernetes Secret holding webhook TLS certificates	webhook-server-cert
`serviceName` string	ServiceName is the name of the Kubernetes Service fronting the webhook server. Used to generate certificate SANs. Set by the Helm chart.
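
With `certProvisionMode: manual`, the built-in cert-controller is not used, and an external tool is expected to supply the webhook certificates. The sketch below assumes that an external issuer, such as cert-manager, maintains the Secret named by `secretName` and that its contents are available under `certDir`. The Secret and Service names are hypothetical. In a Helm install, `serviceName` is normally set by the chart.

```yaml
server:
  webhook:
    certProvisionMode: manual          # default: auto (built-in cert-controller)
    secretName: dynamo-webhook-tls     # hypothetical Secret managed outside the operator
    serviceName: dynamo-webhook        # hypothetical; used to generate certificate SANs
    certDir: /tmp/k8s-webhook-server/serving-certs
```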

Operator Default Values Injection

The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:

Health Probes: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.
Security Context: All components receive fsGroup: 1000 by default to ensure proper file permissions for mounted volumes. This can be overridden via the extraPodSpec.securityContext field.
Shared Memory: All components receive an 8Gi shared memory volume mounted at /dev/shm by default (can be disabled or resized via the sharedMemory field).
Environment Variables: Components automatically receive environment variables like DYN_NAMESPACE, DYN_PARENT_DGD_K8S_NAME, DYNAMO_PORT, and backend-specific variables.
Pod Configuration: Default terminationGracePeriodSeconds of 60 seconds and restartPolicy: Always.
Autoscaling: When enabled without explicit metrics, defaults to CPU-based autoscaling with 80% target utilization.
Backend-Specific Behavior: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (VLLM, SGLang, or TensorRT-LLM).

Pod Specification Defaults

All components receive the following pod-level defaults unless overridden:

terminationGracePeriodSeconds: 60 seconds
restartPolicy: Always

Security Context

The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:

fsGroup: 1000 - Sets the group ownership of mounted volumes and any files created in those volumes

This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The fsGroup setting is particularly important for:

Model downloads and caching
Compilation cache directories
Persistent volume claims (PVCs)
SSH key generation in multinode deployments

Overriding Security Context

To override the default security context, specify your own securityContext in the extraPodSpec of your component:

```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        fsGroup: 2000  # Custom group ID
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
```

OpenShift and Security Context Constraints

In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift’s admission controllers to assign them dynamically:

```yaml
services:
  YourWorker:
    extraPodSpec:
      # An empty (non-null) securityContext counts as user-provided, so the
      # operator skips its fsGroup: 1000 default. OpenShift's SCC admission
      # then assigns fsGroup and the UID range instead.
      securityContext: {}
```
Alternatively, if you want to keep the default fsGroup: 1000 behavior and are certain your cluster allows it, you don’t need to specify anything: the operator defaults will work.

Shared Memory Configuration

Shared memory is enabled by default for all components:

Enabled: true (unless explicitly disabled via sharedMemory.disabled)
Size: 8Gi
Mount Path: /dev/shm
Volume Type: emptyDir with memory medium

To disable shared memory or customize the size, use the sharedMemory field in your component specification.
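
Both options are sketched below on two hypothetical services. The example assumes that the size is set through a `size` field next to `disabled`. This document names only `sharedMemory.disabled`, so check the CRD schema before relying on `size`.

```yaml
services:
  YourWorker:
    sharedMemory:
      size: 16Gi        # assumed field name; the default size is 8Gi
  YourFrontend:
    sharedMemory:
      disabled: true    # no /dev/shm emptyDir volume is added
```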

Health Probes by Component Type

The operator applies different default health probes based on the component type.

Frontend Components

Frontend components receive the following probe configurations:

Liveness Probe:

Type: HTTP GET
Path: /health
Port: http (8000)
Initial Delay: 60 seconds
Period: 60 seconds
Timeout: 30 seconds
Failure Threshold: 10

Readiness Probe:

Type: Exec command
Command: curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""
Initial Delay: 60 seconds
Period: 60 seconds
Timeout: 30 seconds
Failure Threshold: 10

Worker Components

Worker components receive the following probe configurations:

Liveness Probe:

Type: HTTP GET
Path: /live
Port: system (9090)
Period: 5 seconds
Timeout: 30 seconds
Failure Threshold: 1

Readiness Probe:

Type: HTTP GET
Path: /health
Port: system (9090)
Period: 10 seconds
Timeout: 30 seconds
Failure Threshold: 60

Startup Probe:

Type: HTTP GET
Path: /live
Port: system (9090)
Period: 10 seconds
Timeout: 5 seconds
Failure Threshold: 720 (allows up to 2 hours for startup: 10s × 720 = 7200s)
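
The startup budget is period × failure threshold, so 10 s × 720 = 7200 s (2 hours). If a model takes longer than that to load, a user-specified `startupProbe` takes precedence over the default, as described in the Notes at the end of this section. The sketch below raises the budget to 3 hours (10 s × 1080 = 10800 s). It assumes the probe field sits directly under the service entry, and `YourWorker` is a placeholder.

```yaml
services:
  YourWorker:
    startupProbe:
      httpGet:
        path: /live
        port: system          # named container port 9090
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 1080  # 10s x 1080 = 10800s = 3 hours
```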

Multinode Deployment Probe Modifications

For multinode deployments, the operator modifies probes based on the backend framework and node role:

VLLM Backend

The operator automatically selects between two deployment modes based on parallelism configuration:

Tensor/Pipeline Parallel Mode (when world_size > GPUs_per_node):

Uses Ray for distributed execution (--distributed-executor-backend ray)
Leader nodes: Start the Ray head and run vLLM; all probes remain active
Worker nodes: Run Ray agents only; all probes (liveness, readiness, startup) are removed

Data Parallel Mode (when world_size ≤ GPUs_per_node but world_size × data_parallel_size > GPUs_per_node):

Worker nodes: All probes (liveness, readiness, startup) are removed
Leader nodes: All probes remain active
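
The selection rule can be restated compactly. Here W is world_size, D is data_parallel_size, and G is GPUs_per_node. The rule assumes the two conditions are checked in this order, which makes the modes mutually exclusive.

```latex
\text{mode} =
\begin{cases}
\text{tensor/pipeline parallel (Ray)} & \text{if } W > G \\
\text{data parallel} & \text{if } W \le G \text{ and } W \cdot D > G
\end{cases}
```

For instance, with G = 8, W = 16 selects Ray mode. W = 4 with D = 4 gives W · D = 16 > 8, which selects data-parallel mode.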

SGLang Backend

Worker nodes: All probes (liveness, readiness, startup) are removed

TensorRT-LLM Backend

Leader nodes: All probes remain unchanged
Worker nodes:
- Liveness and startup probes are removed
- Readiness probe is replaced with a TCP socket check on SSH port (2222):
  - Initial Delay: 20 seconds
  - Period: 20 seconds
  - Timeout: 5 seconds
  - Failure Threshold: 10

Environment Variables

All Components

These environment variables are injected into every component container regardless of type.

Variable	Purpose	Default	Type	Source
`DYN_NAMESPACE`	Dynamo service namespace used for service discovery and routing	Derived from DGD spec	`string`	Downward API annotation on checkpoint-restored pods
`DYN_COMPONENT`	Identifies the component type for runtime behavior	One of: `frontend`, `worker`, `prefill`, `decode`, `planner`, `epp`	`string`	Set from component spec
`DYN_PARENT_DGD_K8S_NAME`	Kubernetes name of the parent DynamoGraphDeployment resource	—	`string`	Set from DGD metadata
`DYN_PARENT_DGD_K8S_NAMESPACE`	Kubernetes namespace of the parent DynamoGraphDeployment resource	—	`string`	Set from DGD metadata
`POD_NAME`	Current pod name	—	`string`	Downward API (`metadata.name`)
`POD_NAMESPACE`	Current pod namespace	—	`string`	Downward API (`metadata.namespace`)
`POD_UID`	Current pod UID	—	`string`	Downward API (`metadata.uid`)
`DYN_DISCOVERY_BACKEND`	Service discovery backend for inter-component communication	`kubernetes`	`string`	Options: `kubernetes`, `etcd`

Infrastructure (Conditional)

These are injected into all components when the corresponding infrastructure service is configured in the operator’s OperatorConfiguration.

Variable	Purpose	Default	Type	Condition
`NATS_SERVER`	NATS messaging server address	—	`string`	Set when `infrastructure.natsAddress` is configured
`ETCD_ENDPOINTS`	etcd endpoint addresses for distributed state	—	`string`	Set when `infrastructure.etcdAddress` is configured
`MODEL_EXPRESS_URL`	Model Express service URL for model management	—	`string`	Set when `infrastructure.modelExpressURL` is configured
`PROMETHEUS_ENDPOINT`	Prometheus endpoint for metrics collection	—	`string`	Set when `infrastructure.prometheusEndpoint` is configured
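
As a sketch, an `infrastructure` section like the one below in the OperatorConfiguration would cause all four variables above to be injected into every component. Each variable presumably carries the configured value. The addresses are hypothetical and depend on how NATS, etcd, Model Express, and Prometheus are deployed in your cluster.

```yaml
infrastructure:
  natsAddress: nats://nats.dynamo-system:4222                 # -> NATS_SERVER
  etcdAddress: http://etcd.dynamo-system:2379                 # -> ETCD_ENDPOINTS
  modelExpressURL: http://model-express.dynamo-system:8080    # -> MODEL_EXPRESS_URL
  prometheusEndpoint: http://prometheus.monitoring:9090       # -> PROMETHEUS_ENDPOINT
```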

Frontend Components

Variable	Purpose	Default	Type
`DYNAMO_PORT`	HTTP port the frontend listens on	`8000`	`int`
`DYN_HTTP_PORT`	HTTP port for the frontend service (alias)	`8000`	`int`
`DYN_NAMESPACE_PREFIX`	Namespace prefix used for frontend request routing	Same as `DYN_NAMESPACE`	`string`

Worker Components

Variable	Purpose	Default	Type
`DYN_SYSTEM_ENABLED`	Enables the system HTTP server for health checks and metrics	`true`	`string` (boolean)
`DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS`	Endpoints whose health status is used for readiness	`["generate"]`	`string` (JSON array)
`DYN_SYSTEM_PORT`	Port for the system HTTP server (health, metrics)	`9090`	`int`
`DYN_HEALTH_CHECK_ENABLED`	Toggles the legacy health check mechanism; set to `false` so that the system server handles health checks instead	`false`	`string` (boolean)
`NIXL_TELEMETRY_ENABLE`	Enables or disables NIXL telemetry collection	`n`	`string`
`NIXL_TELEMETRY_EXPORTER`	Telemetry exporter format for NIXL metrics	`prometheus`	`string`
`NIXL_TELEMETRY_PROMETHEUS_PORT`	Port for NIXL Prometheus metrics endpoint	`19090`	`int`
`DYN_NAMESPACE_WORKER_SUFFIX`	Hash suffix appended to worker namespace for rolling updates	—	`string`

Planner Components

Variable	Purpose	Default	Type
`PLANNER_PROMETHEUS_PORT`	Port for the planner’s Prometheus metrics endpoint	`9085`	`int`

EPP (Endpoint Picker Plugin) Components

Variable	Purpose	Default	Type
`USE_STREAMING`	Enables streaming mode for inference request proxying	`true`	`string` (boolean)
`RUST_LOG`	Rust log level and filter configuration	`debug,dynamo_llm::kv_router=trace`	`string`

VLLM Backend

Variable	Purpose	Default	Type	Condition
`VLLM_CACHE_ROOT`	Directory for vLLM compilation cache artifacts	—	`string`	Set when a volume mount has `useAsCompilationCache: true`
`VLLM_NIXL_SIDE_CHANNEL_HOST`	Host IP for the NIXL side channel in multiprocessing mode	Pod IP	`string`	Multinode mp backend only (Downward API: `status.podIP`)

TensorRT-LLM Backend

Variable	Purpose	Default	Type	Condition
`OMPI_MCA_orte_keep_fqdn_hostnames`	Instructs OpenMPI to preserve FQDN hostnames for inter-node communication	`1`	`string`	Multinode deployments only

Service Accounts

The following component types automatically receive dedicated service accounts:

Planner: planner-serviceaccount
EPP: epp-serviceaccount

Image Pull Secrets

The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:

Scans all Kubernetes secrets of type kubernetes.io/dockerconfigjson in the component’s namespace
Extracts the docker registry server URLs from each secret’s authentication configuration
Matches the container image’s registry host against the discovered registry URLs
Automatically injects matching secrets as imagePullSecrets in the pod specification
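
For example, the Secret sketched below would be discovered for any component image hosted on nvcr.io in the same namespace. The Secret name, namespace, and credentials are hypothetical placeholders. The `type` and the `.dockerconfigjson` key follow the standard Kubernetes format for registry credentials.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: nvcr-pull-secret         # hypothetical name
  namespace: dynamo-workloads    # must be the component's namespace
type: kubernetes.io/dockerconfigjson
stringData:
  # The registry host under "auths" (nvcr.io) is what gets matched against
  # the registry host of an image such as nvcr.io/nvidia/<image>:<tag>
  .dockerconfigjson: |
    {"auths": {"nvcr.io": {"username": "<username>", "password": "<password>"}}}
```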

To disable automatic image pull secret discovery for a specific component, add the following annotation:

```yaml
annotations:
  nvidia.com/disable-image-pull-secret-discovery: "true"
```

Autoscaling Defaults

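When autoscaling is enabled but no metrics are specified, the operator applies the defaults below. Note that the autoscaling field is deprecated and ignored; use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead.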

Default Metric: CPU utilization
Target Average Utilization: 80%

Port Configurations

Default container ports are configured based on component type:

Frontend Components

Port: 8000
Protocol: TCP
Name: http

Worker Components

Port: 9090 (system)
Protocol: TCP
Name: system
Port: 19090 (NIXL)
Protocol: TCP
Name: nixl

Planner Components

Port: 9085
Protocol: TCP
Name: metrics

EPP Components

Port: 9002 (gRPC)
Protocol: TCP
Name: grpc
Port: 9003 (gRPC health)
Protocol: TCP
Name: grpc-health
Port: 9090 (metrics)
Protocol: TCP
Name: metrics

Backend-Specific Configurations

VLLM

Ray Head Port: 6379 (for Ray cluster coordination in multinode TP/PP deployments)
Data Parallel RPC Port: 13445 (for data parallel multinode deployments)

SGLang

Distribution Init Port: 29500 (for multinode deployments)

TensorRT-LLM

SSH Port: 2222 (for multinode MPI communication)
OpenMPI Environment: OMPI_MCA_orte_keep_fqdn_hostnames=1

Implementation Reference

For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:

Health Probes, Security Context & Pod Specifications: internal/dynamo/graph.go - Contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
Component-Specific Defaults:
- internal/dynamo/component_common.go - Base container and pod spec shared by all component types
- internal/dynamo/component_frontend.go
- internal/dynamo/component_worker.go
- internal/dynamo/component_planner.go
- internal/dynamo/component_epp.go
Image Pull Secrets: internal/secrets/docker.go - Implements the docker secret indexer and automatic discovery
Backend-Specific Behavior:
Checkpoint / Restore:
- internal/checkpoint/podspec.go - Checkpoint env var injection and volume setup
- internal/checkpoint/resolve.go - Checkpoint resolution logic
- internal/checkpoint/resource.go - Checkpoint resource management
Constants & Annotations: internal/consts/consts.go - Defines annotation keys and other constants

Notes

All these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
User-specified probes (via livenessProbe, readinessProbe, or startupProbe fields) take precedence over operator defaults
For security context, if you provide any securityContext in extraPodSpec, no defaults will be injected, giving you full control
For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
The extraPodSpec.mainContainer field can be used to override probe configurations set by the operator
