Component Catalog
AICR recipes are composed of components — the individual software packages that make up a GPU-accelerated Kubernetes runtime. This page lists every component that can appear in a recipe.
Note: Components are included as appropriate in recipes. Not every component listed here will appear in a recipe.
The source of truth is recipes/registry.yaml. Each entry in the registry defines the component’s Helm chart (or Kustomize source), default version, namespace, and node scheduling configuration. If a component is not listed there, it cannot appear in a recipe.
Components
How Components Are Selected
Not every component appears in every recipe. The recipe engine selects components based on the overlay chain for your environment:
- Base components (cert-manager, kube-prometheus-stack) appear in most recipes.
- Cloud-specific components (aws-efa, aws-ebs-csi-driver) are added when the service matches.
- Intent-specific components (agentgateway, agentgateway-crds) are added based on workload intent (e.g., inference recipes include the inference gateway).
- Platform-specific components (slinky-slurm-operator, slinky-slurm, kubeflow-trainer, dynamo-platform) are added when the recipe selects a matching --platform. For --platform slurm, all three Slinky pieces (slinky-slurm-operator-crds, slinky-slurm-operator, slinky-slurm) are declared inline per slurm leaf overlay, the same shape dynamo-platform uses across *-inference-dynamo leaves. Leaves that want the operator only inline the CRDs + operator and omit the slinky-slurm componentRef.
- Accelerator/OS-specific tuning (nodewright-customizations, nvidia-dra-driver-gpu) varies by hardware and OS combination.
NFD Topology Updater
Production GPU leaf recipes (H100, GB200, RTX Pro 6000 on EKS / AKS / GKE / OKE / LKE) enable the NFD Topology Updater. It publishes per-node NodeResourceTopology custom resources that describe NUMA zones, GPU-to-NUMA affinity, and NIC-to-NUMA affinity. Runtime consumers (NUMA-aware schedulers, debugging via kubectl get noderesourcetopologies) can read them without further configuration.
The Topology Updater requires the kubelet podResources gRPC socket. The KubeletPodResources feature gate has been on by default since Kubernetes 1.15 (Beta) and reached GA in Kubernetes 1.28; AICR’s recipe constraints on the affected leaves require Kubernetes ≥ 1.30, so this requirement is satisfied in practice. Recipes targeting Kubernetes < 1.15 must enable the feature gate explicitly. Kind / KWOK simulated clusters do not run a real kubelet and therefore leave the Topology Updater disabled, so kind-based recipes will not see NodeResourceTopology objects.
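The version floor above can be checked before relying on the Topology Updater. The following is a minimal sketch, not part of AICR: the helper name k8s_at_least is hypothetical, and the commented usage assumes kubectl and jq are installed.

```shell
# Return 0 when a Kubernetes gitVersion string (e.g. "v1.31.2") is at least
# the given major.minor. The helper name is hypothetical, not an AICR command.
k8s_at_least() {
  local version="${1#v}" want_major="$2" want_minor="$3"
  local major="${version%%.*}"
  local rest="${version#*.}"
  local minor="${rest%%.*}"
  # Drop any non-numeric suffix (some managed clusters report e.g. "30+").
  minor="${minor%%[!0-9]*}"
  if [ "$major" -gt "$want_major" ]; then
    return 0
  fi
  [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

# Example usage (requires kubectl and jq):
#   server="$(kubectl version -o json | jq -r '.serverVersion.gitVersion')"
#   k8s_at_least "$server" 1 30 && kubectl get noderesourcetopologies
```

On a cluster that passes the check and runs a leaf recipe with the Topology Updater enabled, the final kubectl command should list one NodeResourceTopology object per node.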
See the upstream Topology Updater docs for runtime consumer examples.
To see exactly which components appear in a given recipe, generate one with aicr recipe.
The output lists every component with its pinned version and configuration values.
Adding Components
New components are added declaratively in recipes/registry.yaml — no Go code required. See the Contributing Guide and Bundler Development docs for details.
Upgrade Notes
This section lists migration steps for upgrading from a prior AICR-generated bundle to a newer one that changes how a component delivers its Kubernetes resources.
A generated recipe is a point-in-time artifact of the AICR binary that produced it: the embedded registry, overlays, manifest paths, and chart pins are part of that binary’s surface. When upgrading AICR, regenerate the recipe from scratch with the new binary (aicr recipe ...) before re-bundling. aicr bundle --recipe <old-file> against a newer binary may fail if the saved recipe references manifest paths the new release has moved or removed (see Bundle Generation Fails for the specific error).
gpu-operator: dcgm-exporter ConfigMap moved into the main release
Earlier bundles shipped the dcgm-exporter ConfigMap as a post-manifest in a separate Helm release named gpu-operator-post. The in-cluster ConfigMap therefore carries Helm ownership annotations (meta.helm.sh/release-name and meta.helm.sh/release-namespace) that point at that release.
Newer bundles render the ConfigMap directly from the main gpu-operator chart’s dcgmExporter.config.data values. On upgrade, Helm 3 refuses to adopt the existing ConfigMap because its ownership annotations name a different release, and the upgrade fails with an invalid ownership metadata error.
Fresh installs are not affected. To migrate an existing cluster, remove the stale gpu-operator-post release before applying the new bundle.
Raw Helm (per-component bundle / deploy.sh):
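A minimal sketch of that removal follows. The release name gpu-operator-post comes from the text above; the function name and the default gpu-operator namespace are assumptions, so pass the namespace your bundle actually installs into.

```shell
# Remove the stale post-manifest release so the main chart can own the ConfigMap.
# Assumption: the release lives in the "gpu-operator" namespace unless overridden.
remove_stale_release() {
  local namespace="${1:-gpu-operator}"
  # Skip quietly when the release is already gone (for example on a re-run).
  if helm status gpu-operator-post -n "$namespace" >/dev/null 2>&1; then
    helm uninstall gpu-operator-post -n "$namespace"
  else
    echo "gpu-operator-post not found in namespace $namespace; nothing to remove"
  fi
}

# Example usage:
#   remove_stale_release gpu-operator
```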
helm uninstall removes the ConfigMap it owns; the next gpu-operator upgrade re-creates it from values.
Helmfile — the new bundle no longer references gpu-operator-post, so helmfile apply will not prune it on its own. Run the helm uninstall above first, then helmfile apply.
Argo CD — delete the stale Application (it will not self-prune unless an ApplicationSet was managing it), then sync the updated gpu-operator application:
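One way to script those two steps with the argocd CLI is sketched below. It assumes the Application names match the release names (gpu-operator-post and gpu-operator) and that the CLI is already logged in; both are assumptions, and the function name is hypothetical.

```shell
# Delete the stale Argo CD Application, then sync the updated one.
# Assumption: Application names mirror the Helm release names used above.
migrate_argocd() {
  local stale_app="${1:-gpu-operator-post}" main_app="${2:-gpu-operator}"
  # --cascade also deletes the resources the Application manages,
  # which includes the old dcgm-exporter ConfigMap; --yes skips the prompt.
  argocd app delete "$stale_app" --cascade --yes || return 1
  argocd app sync "$main_app"
}

# Example usage:
#   migrate_argocd gpu-operator-post gpu-operator
```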
Flux — delete the stale HelmRelease so Flux uninstalls the release and removes the ConfigMap, then reconcile the updated gpu-operator HelmRelease. The example below assumes the Flux control plane runs in flux-system; substitute the namespace where your Flux installation lives:
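A sketch of the Flux steps, under the stated flux-system assumption. The HelmRelease names are assumed to match the release names above, and the function name is hypothetical.

```shell
# Delete the stale HelmRelease, then reconcile the updated one.
# Assumption: HelmRelease objects are named after the Helm releases.
migrate_flux() {
  local flux_ns="${1:-flux-system}"
  # kubectl delete waits for the HelmRelease finalizer to finish, during which
  # Flux uninstalls the Helm release and removes the ConfigMap it owns.
  kubectl delete helmreleases.helm.toolkit.fluxcd.io gpu-operator-post -n "$flux_ns" || return 1
  flux reconcile helmrelease gpu-operator -n "$flux_ns"
}

# Example usage:
#   migrate_flux flux-system
```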
After migration, confirm the ConfigMap is owned by the gpu-operator release:
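The check can be sketched as a small helper that reads Helm's meta.helm.sh/release-name annotation and compares it to the expected release. The helper name is hypothetical, and the ConfigMap name and namespace are left as arguments because they depend on your bundle. It assumes the deployer installs through Helm, so the ownership annotation is present.

```shell
# Succeed only when Helm's ownership annotation names the expected release.
# Arguments: ConfigMap name, namespace, expected release name.
configmap_owned_by() {
  local configmap="$1" namespace="$2" expected="$3" actual
  # Dots inside the annotation key must be escaped in the JSONPath expression.
  actual="$(kubectl get configmap "$configmap" -n "$namespace" \
    -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}')" || return 1
  echo "meta.helm.sh/release-name: $actual"
  [ "$actual" = "$expected" ]
}

# Example usage (substitute the ConfigMap name from your cluster):
#   configmap_owned_by <configmap-name> gpu-operator gpu-operator
```

A non-zero exit status with a printed value of gpu-operator-post would indicate that the stale release still owns the ConfigMap.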