⚠️ Important: This documentation is automatically generated from source code. Do not edit this file directly.
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
Deprecated: This field is deprecated and ignored. Use DynamoGraphDeploymentScalingAdapter with HPA, KEDA, or Planner for autoscaling instead. See docs/kubernetes/autoscaling.md for migration guidance. This field will be removed in a future API version.
Appears in:
Underlying type: string
CheckpointMode defines how checkpoint creation is handled
Validation:
Appears in:
Underlying type: string
ComponentKind represents the type of underlying Kubernetes resource.
Validation:
Appears in:
ConfigMapKeySelector selects a specific key from a ConfigMap. Used to reference external configuration data stored in ConfigMaps.
Appears in:
Underlying type: string
Validation:
Appears in:
Underlying type: string
Validation:
Appears in:
DeploymentOverridesSpec allows users to customize metadata for auto-created DynamoGraphDeployments. When autoApply is enabled, these overrides are applied to the generated DGD resource.
Appears in:
DeploymentStatus tracks the state of an auto-created DynamoGraphDeployment. This status is populated when autoApply is enabled and a DGD is created.
Appears in:
DynamoCheckpoint is the Schema for the dynamocheckpoints API. It represents a container checkpoint that can be used to restore pods to a warm state.
DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence. Two checkpoints with the same identity hash are considered equivalent.
Appears in:
DynamoCheckpointJobConfig defines the configuration for the checkpoint creation Job
Appears in:
Underlying type: string
DynamoCheckpointPhase represents the current phase of the checkpoint lifecycle
Validation:
Appears in:
DynamoCheckpointSpec defines the desired state of DynamoCheckpoint
Appears in:
DynamoCheckpointStatus defines the observed state of DynamoCheckpoint
Appears in:
Underlying type: string
Deprecated: StorageType is retained for compatibility with older DynamoCheckpoint status consumers. The current checkpoint flow publishes PVC-backed artifacts discovered from the snapshot-agent DaemonSet.
Validation:
Appears in:
DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API
Appears in:
DynamoComponentDeploymentSpec defines the desired state of DynamoComponentDeployment
Appears in:
DynamoGraphDeployment is the Schema for the dynamographdeployments API.
DynamoGraphDeploymentExperimentalSpec groups graph-level opt-in preview features. Component-level experimental features are represented separately on component specs.
Appears in:
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It serves as the primary interface for users to request model deployments with specific performance and resource constraints, enabling SLA-driven deployments.
Lifecycle:
The spec becomes immutable once profiling starts. Users must delete and recreate the DGDR to modify configuration after this point.
DEPRECATION NOTICE: v1alpha1 DynamoGraphDeploymentRequest is deprecated. Please migrate to nvidia.com/v1beta1 DynamoGraphDeploymentRequest. v1alpha1 will be removed in a future release.
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. This CRD serves as the primary interface for users to request model deployments with specific performance constraints and resource requirements, enabling SLA-driven deployments.
Appears in:
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest. The controller updates this status as the DGDR progresses through its lifecycle.
Appears in:
DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual services within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.
The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD’s service replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.
DynamoGraphDeploymentScalingAdapterSpec defines the desired state of DynamoGraphDeploymentScalingAdapter
Appears in:
DynamoGraphDeploymentScalingAdapterStatus defines the observed state of DynamoGraphDeploymentScalingAdapter
Appears in:
DynamoGraphDeploymentServiceRef identifies a specific service within a DynamoGraphDeployment
Appears in:
DynamoGraphDeploymentSpec defines the desired state of DynamoGraphDeployment.
Appears in:
DynamoGraphDeploymentStatus defines the observed state of DynamoGraphDeployment.
Appears in:
DynamoModel is the Schema for the dynamo models API
DynamoModelSpec defines the desired state of DynamoModel
Appears in:
DynamoModelStatus defines the observed state of DynamoModel
Appears in:
EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components. EPP is responsible for intelligent endpoint selection and KV-aware routing.
Appears in:
EndpointInfo represents a single endpoint (pod) serving the model
Appears in:
Appears in:
Appears in:
FailoverSpec configures active-passive failover for a worker component. For intraPod mode: requires gpuMemoryService.enabled; the main container is cloned into engine containers (active + standby) within the same pod. For interPod mode: the operator creates a dedicated GMS weight server pod and multiple engine pods per rank that share GPUs via DRA resource claims.
Appears in:
FrontendSidecarSpec configures the auto-generated frontend sidecar container. The operator uses these fields together with built-in frontend defaults (command, probes, ports, and Dynamo env vars) to produce a fully configured sidecar container.
Appears in:
GMSClientPodSpec declares an additional GMS client pod for inter-pod GMS.
Appears in:
Underlying type: string
GPUMemoryServiceMode selects the GMS deployment topology.
Appears in:
GPUMemoryServiceSpec configures the GPU Memory Service (GMS) for a worker component.
Appears in:
Appears in:
Appears in:
Underlying type: string
KvTransferEnforcement controls how the selected prefill worker’s topology is applied to decode routing.
Validation:
Appears in:
KvTransferPolicy configures topology-aware routing for KV-cache transfers
between prefill and decode workers. This graph-wide policy lives under
spec.experimental while the API is incubating.
Appears in:
ModelReference identifies a model served by this component
Appears in:
ModelSource defines the source location of a model
Appears in:
Appears in:
Appears in:
ProfilingConfigSpec defines configuration for the profiling process. This structure maps directly to the profile_sla.py config format. See dynamo/profiler/utils/profiler_argparse.py for the complete schema.
Appears in:
Appears in:
Resources defines requests and limits for a component, including CPU, memory, GPUs/devices, and any runtime-specific resources.
Appears in:
Appears in:
Underlying type: string
Appears in:
RestartStatus contains the status of the restart of the graph deployment.
Appears in:
Appears in:
Underlying type: string
Appears in:
Underlying type: string
RollingUpdatePhase represents the current phase of a rolling update.
Validation:
Appears in:
RollingUpdateStatus tracks the progress of a rolling update.
Appears in:
ScalingAdapter configures whether a service uses the DynamoGraphDeploymentScalingAdapter for replica management. When enabled, the DGDSA owns the replicas field and external autoscalers (HPA, KEDA, Planner) can control scaling via the Scale subresource.
Appears in:
ServiceCheckpointConfig configures checkpointing for a DGD service
Appears in:
ServiceCheckpointJobConfig customizes the checkpoint Job created for a DGD service.
Appears in:
ServiceCheckpointStatus contains checkpoint information for a single service.
Appears in:
ServiceReplicaStatus contains replica information for a single service.
Appears in:
Appears in:
SpecTopologyConstraint defines deployment-level topology placement requirements. It carries both the topology profile (which ClusterTopology CR to use) and an optional default pack domain that services without their own constraint inherit.
Appears in:
TopologyConstraint defines service-level topology placement requirements. The topology profile is inherited from the deployment-level SpecTopologyConstraint; only the pack domain is specified here.
Appears in:
Underlying type: string
TopologyDomain is a free-form topology level identifier.
Common examples: “region”, “zone”, “datacenter”, “block”, “rack”, “host”, “numa”.
When used with a ClusterTopology CR, domain names are defined in the CR’s
hierarchy; when used with spec.experimental.kvTransferPolicy.labelKey
alone, the value is a user-chosen logical name for the topology level.
Must match ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ (lowercase alphanumeric,
may contain hyphens but must not start or end with one).
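The pattern above can be checked locally before a manifest is submitted. The following is a minimal sketch, not part of the operator; the helper name `is_valid_topology_domain` is hypothetical, and only the regular expression is taken from this reference.

```python
import re

# Pattern copied from the TopologyDomain validation rule in this reference.
TOPOLOGY_DOMAIN_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def is_valid_topology_domain(name: str) -> bool:
    """Return True if `name` satisfies the TopologyDomain pattern.

    Hypothetical helper for illustration; the API server performs the
    authoritative validation.
    """
    return TOPOLOGY_DOMAIN_RE.fullmatch(name) is not None


# Lowercase alphanumerics with inner hyphens are accepted.
print(is_valid_topology_domain("zone-a"))
# Leading or trailing hyphens and uppercase letters are rejected.
print(is_valid_topology_domain("-rack"))
print(is_valid_topology_domain("Rack"))
```

A single character such as "a" is valid because the grouped suffix in the pattern is optional.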
Validation:
^[a-z0-9]([a-z0-9-]*[a-z0-9])?$
Appears in:
VolumeMount references a PVC defined at the top level for volumes to be mounted by the component
Appears in:
Package v1beta1 contains API Schema definitions for the nvidia.com v1beta1 API group.
Underlying type: string
BackendType specifies the inference backend.
Validation:
Appears in:
Underlying type: string
CheckpointMode defines how checkpoint creation is handled.
Validation:
Appears in:
CompilationCacheConfig configures a PVC-backed compilation cache for a component. The operator handles backend-specific mount paths and environment variables so users do not need to hand-wire them into the pod template.
Appears in:
ComponentCheckpointConfig configures checkpointing for a DGD component.
Appears in:
ComponentCheckpointJobConfig customizes the checkpoint Job created for a DGD component.
Appears in:
ComponentCheckpointStatus contains checkpoint information for a single component.
Appears in:
Underlying type: string
ComponentKind represents the type of underlying Kubernetes resource backing a DGD component.
Validation:
Appears in:
ComponentReplicaStatus contains replica information for a single component.
Appears in:
Underlying type: string
ComponentType identifies the role of a Dynamo component within a graph.
In v1beta1 this is a strict enum. Unlike v1alpha1 (where subComponentType
was used as a workaround for disaggregated serving), prefill and decode
are first-class values: users can set them directly and downstream consumers
(e.g., the EPP) can filter on the pod label nvidia.com/dynamo-component-type.
Validation:
Appears in:
Underlying type: string
DGDRPhase represents the lifecycle phase of a DynamoGraphDeploymentRequest.
Validation:
Appears in:
Underlying type: string
DGDState is the high-level lifecycle state of a DynamoGraphDeployment.
Validation:
Appears in:
DeploymentInfoStatus tracks the state of the deployed DynamoGraphDeployment.
Appears in:
DynamoCheckpointIdentity defines the inputs that determine checkpoint equivalence.
Two checkpoints with the same identity hash are considered equivalent.
Duplicated from v1alpha1 to keep the v1beta1 type graph self-contained. The
DynamoCheckpoint resource itself is not graduating in this MR; this type is
only used as a sub-field of ComponentCheckpointConfig.
Appears in:
DynamoComponentDeployment is the Schema for the dynamocomponentdeployments API.
v1beta1 is a served version: the API server accepts reads and writes against it, and transparently converts to/from v1alpha1 (still the storage version until a later MR flips it). Conversion goes through the operator’s conversion webhook; see api/v1alpha1/*_conversion.go.
DynamoComponentDeploymentSharedSpec is the shared configuration used by both standalone DCDs and by the components embedded in a DynamoGraphDeployment.
In v1beta1 the ten per-component pod-configuration fields that existed in
v1alpha1 (resources, envs, envFromSecret, livenessProbe, readinessProbe,
volumeMounts, annotations, labels, extraPodMetadata, extraPodSpec) are
replaced with a single podTemplate field holding a native
corev1.PodTemplateSpec. The operator injects its defaults into the
container named "main" and merges user overrides using strategic-merge-by-name
semantics. Users can add sidecars, init containers, and pod-level configuration
directly in podTemplate without any extraPodSpec-style escape hatch.
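The merge-by-name behavior described above can be illustrated with a simplified sketch. This is not the operator's implementation (the operator is written in Go); the function `merge_containers` and the dictionary shapes are hypothetical, and the merge here is shallow, whereas a real strategic merge also merges nested lists such as env and volumeMounts.

```python
def merge_containers(defaults, overrides):
    """Merge two container lists keyed by each container's 'name'.

    Illustrative assumption: a container in `overrides` whose name matches
    a default container updates that container's fields in place, and any
    other container is appended (for example, a user-defined sidecar).
    """
    merged = {c["name"]: dict(c) for c in defaults}
    order = [c["name"] for c in defaults]
    for container in overrides:
        name = container["name"]
        if name in merged:
            # User-supplied fields win over operator defaults (shallow merge).
            merged[name].update(container)
        else:
            merged[name] = dict(container)
            order.append(name)
    return [merged[name] for name in order]


# Hypothetical operator defaults for the container named "main".
operator_defaults = [{"name": "main", "image": "default:latest", "command": ["run"]}]
# Hypothetical user podTemplate containers: override "main", add a sidecar.
user_containers = [
    {"name": "main", "image": "custom:1"},
    {"name": "sidecar", "image": "log:1"},
]
result = merge_containers(operator_defaults, user_containers)
```

In this sketch the "main" container keeps the default command while taking the user's image, and the sidecar is appended unchanged.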
Appears in:
DynamoComponentDeploymentSpec defines the desired state of a DynamoComponentDeployment.
Appears in:
DynamoGraphDeployment is the Schema for the dynamographdeployments API.
v1beta1 is a served version: the API server accepts reads and writes against it, and transparently converts to/from v1alpha1 (still the storage version until a later MR flips it). Conversion goes through the operator’s conversion webhook; see api/v1alpha1/*_conversion.go.
DynamoGraphDeploymentComponentRef identifies a specific component within a
DynamoGraphDeployment. Renamed from v1alpha1’s DynamoGraphDeploymentServiceRef
to align with the v1beta1 services -> components and
serviceName -> componentName renames.
Appears in:
DynamoGraphDeploymentExperimentalSpec groups graph-level opt-in preview
features whose API shape and behavior may change in breaking ways between
v1beta1 releases. Component-level experimental features live under
spec.components[*].experimental.
Appears in:
DynamoGraphDeploymentRequest is the Schema for the dynamographdeploymentrequests API. It provides a simplified, SLA-driven interface for deploying inference models on Dynamo. Users specify a model and optional performance targets; the controller handles profiling, configuration selection, and deployment.
Lifecycle:
DynamoGraphDeploymentRequestSpec defines the desired state of a DynamoGraphDeploymentRequest. Only the Model field is required; all other fields are optional and have sensible defaults.
Appears in:
DynamoGraphDeploymentRequestStatus represents the observed state of a DynamoGraphDeploymentRequest.
Appears in:
DynamoGraphDeploymentScalingAdapter provides a scaling interface for individual components within a DynamoGraphDeployment. It implements the Kubernetes scale subresource, enabling integration with HPA, KEDA, and custom autoscalers.
The adapter acts as an intermediary between autoscalers and the DGD, ensuring that only the adapter controller modifies the DGD’s component replicas. This prevents conflicts when multiple autoscaling mechanisms are in play.
v1alpha1 remains the storage version; conversion between served versions is handled by the operator’s conversion webhook (see api/v1alpha1/dynamographdeploymentscalingadapter_conversion.go).
DynamoGraphDeploymentScalingAdapterSpec defines the desired state of a DynamoGraphDeploymentScalingAdapter.
Appears in:
DynamoGraphDeploymentScalingAdapterStatus defines the observed state of a DynamoGraphDeploymentScalingAdapter.
Appears in:
DynamoGraphDeploymentSpec defines the desired state of a DynamoGraphDeployment.
Appears in:
DynamoGraphDeploymentStatus defines the observed state of a DynamoGraphDeployment. Unchanged between v1alpha1 and v1beta1.
Appears in:
EPPConfig contains configuration for EPP (Endpoint Picker Plugin) components.
Appears in:
ExperimentalSpec groups opt-in preview features whose API shape and behavior
may change in breaking ways between v1beta1 releases (including disappearing
without a name-preserving graduation path). Fields placed under
experimental are explicitly NOT covered by the normal v1beta1 deprecation
policy and should not be relied on for production workloads. Features
graduate out of this block (and become first-class fields on the shared
spec) once their API is considered stable.
Appears in:
FailoverSpec configures active-passive failover for a worker component.
The main container is cloned into two engine containers (active + standby)
sharing GPUs via DRA, and the standby acquires the flock when the active
engine fails. Failover requires that gpuMemoryService is also set, and that
failover.mode matches gpuMemoryService.mode. Also requires the
nvidia.com/dynamo-kube-discovery-mode: container annotation on the DGD.
See ExperimentalSpec for the stability caveat.
Appears in:
FeaturesSpec controls optional Dynamo platform features in the generated deployment.
Appears in:
GMSClientPodSpec declares an additional GMS client pod for inter-pod GMS.
Appears in:
Underlying type: string
GPUMemoryServiceMode selects the GMS deployment topology.
Appears in:
GPUMemoryServiceSpec configures the GPU Memory Service (GMS) for a
worker component. The operator injects GMS wiring and replaces the main
container’s GPU resources with a DRA ResourceClaim for shared GPU access.
See ExperimentalSpec for the stability caveat.
Appears in:
Underlying type: string
GPUSKUType is the AIC hardware system identifier for a supported GPU.
Validation:
Appears in:
HardwareSpec describes the GPU hardware for profiling and deployment. All fields are auto-detected from cluster GPU nodes when omitted (requires cluster-wide mode with GPU discovery enabled). gpuSku is a selector (restricts which nodes are considered); the other fields are pure overrides passed to the profiler. If all four fields are set, discovery is skipped.
Appears in:
Underlying type: string
KvTransferEnforcement controls how the selected prefill worker’s topology is applied to decode routing.
Validation:
Appears in:
KvTransferPolicy configures topology-aware routing for KV-cache transfers
between prefill and decode workers. This is a graph-wide concern placed
under spec.experimental while the API is incubating.
Appears in:
MockerSpec configures the simulated (mocker) backend.
Appears in:
ModelCacheSpec references a PVC containing pre-downloaded model weights.
Appears in:
ModelReference identifies a model served by a component. When specified, a headless service is created for endpoint discovery.
Appears in:
MultinodeSpec configures a multinode component.
Appears in:
Underlying type: string
OptimizationType defines the optimization target for SLA-based profiling.
Validation:
Appears in:
OverridesSpec allows customizing the profiling job and the generated DynamoGraphDeployment.
Appears in:
ParetoConfig represents a single Pareto-optimal deployment configuration discovered during profiling.
Appears in:
Underlying type: string
ProfilingPhase represents a sub-phase within the profiling pipeline. When the DGDR Phase is “Profiling”, this value indicates which step of the profiling pipeline is currently executing.
Validation:
Appears in:
ProfilingResultsStatus contains the output of the profiling process.
Appears in:
Restart specifies the restart policy for a graph deployment.
Appears in:
Underlying type: string
RestartPhase enumerates phases of a graph-level restart.
Appears in:
RestartStatus contains the status of a graph-level restart.
Appears in:
RestartStrategy defines how components are restarted.
Appears in:
Underlying type: string
RestartStrategyType enumerates restart strategies.
Appears in:
Underlying type: string
RollingUpdatePhase represents the current phase of a rolling update.
Validation:
Appears in:
RollingUpdateStatus tracks the progress of an operator-managed rolling update.
Appears in:
SLASpec defines the service-level agreement targets for profiling optimization.
Appears in:
ScalingAdapter opts a component into using the DynamoGraphDeploymentScalingAdapter
(DGDSA). When scalingAdapter is set on a component (even as an empty
object, scalingAdapter: {}), the DGDSA is created and owns the replicas
field so that external autoscalers (HPA/KEDA/Planner) can drive scaling via
the Scale subresource. Omitting the field opts the component out.
Appears in:
Underlying type: string
SearchStrategy controls the profiling search depth.
Validation:
Appears in:
SpecTopologyConstraint defines deployment-level topology placement requirements.
Appears in:
TopologyConstraint defines component-level topology placement requirements.
The topology profile is inherited from the deployment-level
SpecTopologyConstraint.
Appears in:
Underlying type: string
TopologyDomain is a free-form topology level identifier.
Common examples: “region”, “zone”, “datacenter”, “block”, “rack”, “host”, “numa”.
When used with a ClusterTopology CR, domain names are defined in the CR’s
hierarchy; when used with spec.experimental.kvTransferPolicy.labelKey
alone, the value is a user-chosen logical name for the topology level.
Validation:
^[a-z0-9]([a-z0-9-]*[a-z0-9])?$
Appears in:
WorkloadSpec defines the workload characteristics for SLA-based profiling.
Appears in:
Underlying type: string
CertProvisionMode controls how webhook TLS certificates are managed.
Appears in:
CheckpointConfiguration holds checkpoint/restore settings.
Appears in:
Deprecated: CheckpointOCIConfig is retained for compatibility and ignored by the current snapshot flow.
Appears in:
CheckpointPVCConfig configures the namespace-local PVC mounted into checkpoint and restore workload pods.
Appears in:
Deprecated: CheckpointS3Config is retained for compatibility and ignored by the current snapshot flow.
Appears in:
CheckpointSeccompConfiguration controls the localhost seccomp profile applied to checkpoint and restore pods. The profile blocks io_uring syscalls (which CRIU cannot dump). Default behavior (zero-value substruct, or absent substruct) applies DefaultSeccompProfile. Set Disabled=true on OpenShift (custom localhost profiles require privileged SCC) or when using a CRIU build with io_uring support. Set Profile to override the default path.
Appears in:
CheckpointStorageConfiguration configures checkpoint storage for operator pod mutations. Only PVC storage is implemented today.
Appears in:
DRAConfiguration holds Dynamic Resource Allocation (resource.k8s.io/v1) settings.
NOTE: auto-detection here only verifies that the resource.k8s.io/v1 API is
registered on the apiserver (Kubernetes 1.34+). It does NOT verify that a
GPU-specific DRA resource driver (e.g. nvidia/k8s-dra-driver-gpu) is
installed, that its DeviceClass exists, or that node-level GPU drivers are
compatible. An admin can use enabled: false to force-off DRA integration
on clusters where the API is present but the GPU driver stack is not wired
up — this makes the operator fail GMS / inter-pod failover admissions early
with a clear error instead of letting pods Pend with a confusing
“resourceclaim not found” at schedule time.
Appears in:
Underlying type: string
DiscoveryBackend is the type for the discovery backend.
Appears in:
DiscoveryConfiguration holds discovery backend settings.
Appears in:
GPUConfiguration holds GPU discovery settings.
Appears in:
GroveConfiguration holds Grove orchestrator settings.
Appears in:
InfrastructureConfiguration holds service mesh and backend addresses.
Appears in:
IngressConfiguration holds ingress settings.
Appears in:
IstioMeshConfiguration holds Istio-specific mesh settings.
Appears in:
KaiSchedulerConfiguration holds Kai-scheduler settings.
Appears in:
LWSConfiguration holds LWS orchestrator settings.
Appears in:
LeaderElectionConfiguration holds leader election settings.
Appears in:
LoggingConfiguration holds logging settings.
Appears in:
MPIConfiguration holds MPI SSH secret settings.
Appears in:
MetricsServer extends Server with secure serving option.
Appears in:
NamespaceConfiguration determines operator namespace mode.
Appears in:
Deprecated: NamespaceScopeConfiguration is used only by the deprecated namespace-restricted mode and will be removed in a future release.
Appears in:
OperatorConfiguration is the Schema for the operator configuration.
OrchestratorConfiguration holds orchestrator override settings.
Appears in:
RBACConfiguration holds RBAC settings for cluster-wide mode.
Appears in:
SecurityConfiguration holds HTTP/2 and TLS settings.
Appears in:
Server holds a bind address and port.
Appears in:
ServerConfiguration holds server bind addresses and ports.
Appears in:
ServiceMeshConfiguration holds service mesh integration settings. The operator uses this to generate mesh-specific resources (e.g., Istio DestinationRules) for EPP components so that sidecar proxies connect correctly without double-TLS issues.
Appears in:
WebhookServer extends Server with host and certificate directory.
Appears in:
The Dynamo operator automatically applies default values to various fields when they are not explicitly specified in your deployments. These defaults include:
Health Probes: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.
Security Context: All components receive fsGroup: 1000 by default to ensure proper file permissions for mounted volumes. This can be overridden via the extraPodSpec.securityContext field.
Shared Memory: All components receive an 8Gi shared memory volume mounted at /dev/shm by default (can be disabled or resized via the sharedMemory field).
Environment Variables: Components automatically receive environment variables like DYN_NAMESPACE, DYN_PARENT_DGD_K8S_NAME, DYNAMO_PORT, and backend-specific variables.
Pod Configuration: Default terminationGracePeriodSeconds of 60 seconds and restartPolicy: Always.
Autoscaling: When enabled without explicit metrics, defaults to CPU-based autoscaling with 80% target utilization.
Backend-Specific Behavior: For multinode deployments, probes are automatically modified or removed for worker nodes depending on the backend framework (VLLM, SGLang, or TensorRT-LLM).
All components receive the following pod-level defaults unless overridden:
terminationGracePeriodSeconds: 60 seconds
restartPolicy: Always
The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:
fsGroup: 1000 - Sets the group ownership of mounted volumes and any files created in those volumes
This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The fsGroup setting is particularly important for:
To override the default security context, specify your own securityContext in the extraPodSpec of your component:
Important: When you provide any securityContext object in extraPodSpec, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting runAsNonRoot or setting it to false).
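The all-or-nothing rule above can be sketched as follows. This is an illustration of the documented behavior, not the operator's code; `effective_security_context` is a hypothetical name, and treating a missing securityContext key as "not provided" is an assumption.

```python
# Default taken from this document: fsGroup 1000 for all components.
DEFAULT_SECURITY_CONTEXT = {"fsGroup": 1000}


def effective_security_context(extra_pod_spec):
    """Return the pod security context that would apply.

    Assumption for illustration: if the user supplies any securityContext
    in extraPodSpec, it is used as-is and no defaults are merged in.
    """
    user_ctx = (extra_pod_spec or {}).get("securityContext")
    if user_ctx is not None:
        return dict(user_ctx)
    return dict(DEFAULT_SECURITY_CONTEXT)


# No override: the default fsGroup applies.
print(effective_security_context(None))
# User override: defaults are not injected, so fsGroup is absent.
print(effective_security_context({"securityContext": {"runAsUser": 0}}))
```

The practical consequence is that an override which still needs fsGroup: 1000 must set it explicitly.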
In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift’s admission controllers to assign them dynamically:
Alternatively, if you want to keep the default fsGroup: 1000 behavior and are certain your cluster allows it, you don’t need to specify anything - the operator defaults will work.
Shared memory is enabled by default for all components:
Enabled: true (unless explicitly disabled via sharedMemory.disabled)
Size: 8Gi
Mount path: /dev/shm
Volume type: emptyDir with memory medium
To disable shared memory or customize the size, use the sharedMemory field in your component specification.
The operator applies different default health probes based on the component type.
Frontend components receive the following probe configurations:
Liveness Probe:
Path: /health
Port: http (8000)
Readiness Probe:
Command: curl -s http://localhost:${DYNAMO_PORT}/health | jq -e ".status == \"healthy\""
Worker components receive the following probe configurations:
Liveness Probe:
Path: /live
Port: system (9090)
Readiness Probe:
Path: /health
Port: system (9090)
Startup Probe:
Path: /live
Port: system (9090)
:::{note}
For larger models (typically >70B parameters) or slower storage systems, you may need to increase the failureThreshold to allow more time for model loading. Calculate the required threshold based on your expected startup time: failureThreshold = (expected_startup_seconds / period). Override the startup probe in your component specification if the default 2-hour window is insufficient.
:::
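The threshold calculation in the note can be worked through with a short sketch. The function name `startup_failure_threshold` is hypothetical; the 10-second default period comes from the 720 × 10 seconds figure quoted in this document, and rounding up is an assumption made so that the window is never shorter than the expected startup time.

```python
import math


def startup_failure_threshold(expected_startup_seconds, period_seconds=10):
    """Compute failureThreshold = expected_startup_seconds / period, rounded up."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return math.ceil(expected_startup_seconds / period_seconds)


# The default 2-hour window corresponds to 720 failures at a 10-second period.
default_threshold = startup_failure_threshold(2 * 60 * 60)
# A model expected to take up to 3 hours to load needs a larger threshold.
three_hour_threshold = startup_failure_threshold(3 * 60 * 60)
```

With these inputs, `default_threshold` is 720 and `three_hour_threshold` is 1080.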
For multinode deployments, the operator modifies probes based on the backend framework and node role:
The operator automatically selects between two deployment modes based on parallelism configuration:
Tensor/Pipeline Parallel Mode (when world_size > GPUs_per_node):
--distributed-executor-backend ray
Data Parallel Mode (when world_size × data_parallel_size > GPUs_per_node):
The operator automatically injects environment variables into component containers based on component type, backend framework, and operator configuration. User-provided envs values always take precedence over operator defaults.
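The precedence rule above (user-provided envs override operator defaults) can be sketched as a merge keyed by variable name. This is an illustration only; `merge_env` is a hypothetical helper, and the example values are made up, although the variable names DYN_NAMESPACE and DYNAMO_PORT appear in this document.

```python
def merge_env(operator_defaults, user_envs):
    """Merge env var lists by name, letting user values replace defaults.

    Each entry is a dict with 'name' and 'value' keys, mirroring the shape
    of a Kubernetes EnvVar. Defaults keep their position; new user
    variables are appended.
    """
    merged = {env["name"]: env for env in operator_defaults}
    for env in user_envs:
        merged[env["name"]] = env
    return list(merged.values())


# Hypothetical values for illustration.
defaults = [
    {"name": "DYN_NAMESPACE", "value": "default"},
    {"name": "DYNAMO_PORT", "value": "8000"},
]
user = [
    {"name": "DYNAMO_PORT", "value": "9000"},
    {"name": "MY_FLAG", "value": "1"},
]
effective_env = merge_env(defaults, user)
```

Here DYNAMO_PORT takes the user's value, DYN_NAMESPACE keeps its default, and MY_FLAG is added at the end.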
These environment variables are injected into every component container regardless of type.
These are injected into all components when the corresponding infrastructure service is configured in the operator’s OperatorConfiguration.
The following component types automatically receive dedicated service accounts:
planner-serviceaccount
epp-serviceaccount
The operator automatically discovers and injects image pull secrets for container images. When a component specifies a container image, the operator:
Looks up secrets of type kubernetes.io/dockerconfigjson in the component’s namespace
Injects the matching secrets as imagePullSecrets in the pod specification
This eliminates the need to manually specify image pull secrets for each component. The operator maintains an internal index of docker secrets and their associated registries, refreshing this index periodically.
To disable automatic image pull secret discovery for a specific component, add the following annotation:
When autoscaling is enabled but no metrics are specified, the operator applies:
CPU utilization target: 80%
Default container ports are configured based on component type:
http
system
nixl
metrics
grpc
grpc-health
metrics
OMPI_MCA_orte_keep_fqdn_hostnames=1
For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:
internal/dynamo/graph.go - Contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
internal/dynamo/component_common.go - Base container and pod spec shared by all component types
internal/dynamo/component_frontend.go
internal/dynamo/component_worker.go
internal/dynamo/component_planner.go
internal/dynamo/component_epp.go
internal/secrets/docker.go - Implements the docker secret indexer and automatic discovery
internal/checkpoint/podspec.go - Checkpoint env var injection and volume setup
internal/checkpoint/resolve.go - Checkpoint resolution logic
internal/checkpoint/resource.go - Checkpoint resource management
internal/consts/consts.go - Defines annotation keys and other constants
User-specified probes (via the livenessProbe, readinessProbe, or startupProbe fields) take precedence over operator defaults
If you provide any securityContext in extraPodSpec, no defaults will be injected, giving you full control
The extraPodSpec.mainContainer field can be used to override probe configurations set by the operator