Recipe Development Guide

This guide covers how to create, modify, and validate recipe metadata.

Quick Start: Contributing a Recipe

New to recipe development? Follow these minimal steps to contribute:

1. Copy an existing overlay (details)

$ cp recipes/overlays/h100-eks-ubuntu-training.yaml recipes/overlays/gb200-eks-ubuntu-training.yaml

2. Edit criteria and components (criteria, components)

# recipes/overlays/gb200-eks-ubuntu-training.yaml
spec:
  base: eks-training  # Inherit from intermediate recipe
  criteria:
    service: eks
    accelerator: gb200  # Changed from h100
    os: ubuntu
    intent: training
  componentRefs:
    - name: gpu-operator
      version: v26.3.2
      valuesFile: components/gpu-operator/eks-gb200-training.yaml
      overrides:
        driver:
          version: "580.82.07"  # GB200-specific driver

3. Run tests (details)

$ make test     # Validates schema, criteria, references, constraints
$ make qualify  # Includes end-to-end tests before submitting

4. Open PR (best practices)

  • Include test output showing recipe generation works
  • Explain why the recipe is needed (new hardware, workload, platform)

Overview

Recipe metadata files define component configurations for GPU-accelerated Kubernetes deployments using a base-plus-overlay architecture with three composition mechanisms — single-parent inheritance, explicit mixin composition, and criteria-wildcard matching:

  • Base values (overlays/base.yaml) - universal defaults
  • Intermediate recipes (eks.yaml, eks-training.yaml) - shared configurations for categories
  • Leaf recipes (gb200-eks-ubuntu-training.yaml) - hardware/workload-specific overrides
  • Mixins (mixins/*.yaml) - composable fragments (OS constraints, platform components) that leaf overlays reference via spec.mixins instead of duplicating content
  • Criteria-wildcard overlays (gb200-any.yaml) - cross-cutting overlays picked up automatically by the resolver when their wildcard criteria match the query, without being referenced via spec.base or spec.mixins
  • Inline overrides - per-recipe customization without new files

Recipe files in recipes/ are embedded at compile time. Integrators can extend or override using the --data flag (see Advanced Topics).

For query matching and overlay merging internals, see Data Architecture.

Recipe Structure

Multi-Level Inheritance

Recipes use spec.base to inherit configurations. Chains progress from general (base) to specific (leaf):

base.yaml → eks.yaml → eks-training.yaml → gb200-eks-ubuntu-training.yaml

Intermediate recipes (partial criteria) capture shared configs:

# eks-training.yaml
spec:
  base: eks
  criteria:
    service: eks
    intent: training  # Partial - no accelerator/OS
  componentRefs:
    - name: gpu-operator
      valuesFile: components/gpu-operator/values-eks-training.yaml

Leaf recipes (complete criteria) match user queries:

# gb200-eks-ubuntu-training.yaml
spec:
  base: eks-training  # Inherits from intermediate
  criteria:
    service: eks
    accelerator: gb200
    os: ubuntu
    intent: training  # Complete
  componentRefs:
    - name: gpu-operator
      overrides:
        driver:
          version: "580.82.07"  # Hardware-specific override

Leaf recipes with mixins compose shared fragments:

# h100-eks-ubuntu-training-kubeflow.yaml
spec:
  base: h100-eks-ubuntu-training
  mixins:
    - os-ubuntu          # Shared Ubuntu constraints (from recipes/mixins/)
    - platform-kubeflow  # Kubeflow trainer component (from recipes/mixins/)
  criteria:
    service: eks
    accelerator: h100
    os: ubuntu
    intent: training
    platform: kubeflow

Mixins use kind: RecipeMixin and carry only constraints and componentRefs. They live in recipes/mixins/ and are applied after inheritance chain merging. See Data Architecture for details.

Some platforms declare their full component stack inline per leaf overlay rather than via a platform mixin. This is the case for --platform slurm and --platform dynamo, where each leaf carries hardware-specific tuning (GPU GRES strings, accelerator resource limits) that the mixin merge path cannot represent cleanly. Other platforms like --platform kubeflow and --platform inference still use the platform-kubeflow / platform-inference mixins shown above, since their leaf-specific tuning is minimal.

For example, --platform slurm leaves inline three componentRefs:

  • slinky-slurm-operator-crds — SchedMD Slinky CRDs
  • slinky-slurm-operator — the operator and admission webhook
  • slinky-slurm — the Slinky-managed Slurm cluster instance (Controller / LoginSet / NodeSet / RestApi), with leaf-specific overrides (e.g. H100 GRES wiring on the nodesets.slinky map)

This is the same shape dynamo-platform uses across the *-inference-dynamo leaves. See recipes/overlays/h100-eks-ubuntu-training-slurm.yaml for the full example.

When authoring a recipe targeting Talos (criteria.os: talos), append the os-talos mixin to your overlay’s spec.mixins list (e.g. spec.mixins: [os-talos], or [platform-kubeflow, os-talos] if you already mix in a non-OS fragment). OS-scoped mixins are mutually exclusive — combining os-ubuntu and os-talos in one overlay is a recipe authoring error, not a supported composition. The mixin overrides namespaces for affected components and supplies PSA-privileged Namespace manifests via componentRefs[].preManifestFiles, which are applied before each chart — see Talos integration for the component list and labels.

Cross-cutting overlays with wildcard criteria apply across one criteria dimension without being referenced via spec.base or listed in spec.mixins. The resolver can return multiple independent maximal-leaf overlays for a single query, so a service: any overlay is picked up alongside the service-specific maximal leaf and its inheritance chain:

# gb200-any.yaml: applies to every GB200 query (any service, any intent)
spec:
  base: base
  criteria:
    service: any  # Wildcard: matches eks, oke, gke, etc.
    accelerator: gb200
  validation:
    deployment:
      checks:
        - operator-health
        - expected-resources
        - gpu-operator-version
        - check-nvidia-smi
      constraints:
        - name: Deployment.gpu-operator.version
          value: ">= v25.10.0"

Only use this pattern when the content is truly uniform across the wildcard dimension — if values diverge per service, keep them inline in each service-specific overlay. NCCL performance thresholds, for example, are explicitly not a good fit for this pattern: each service has a different network fabric (EFA, TCPXO, RoCE, etc.) and the same bandwidth number is rarely correct across two fabrics. The intent-scoped gb200-any-training.yaml shape that previously carried a cross-service NCCL threshold was retired in #1052 in favor of per-leaf performance blocks. See Data Architecture for when to use wildcard overlays vs mixins.
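
The wildcard behavior described above can be sketched in Python. This is an illustration only, not the AICR resolver: `overlay_matches` is a hypothetical helper, and treating a criteria field that the overlay omits as "matches anything" is an assumption based on how partial-criteria overlays are described in this guide.

```python
WILDCARD = "any"


def overlay_matches(overlay_criteria: dict, query: dict) -> bool:
    """Return True when every criteria field the overlay sets is satisfied by the query.

    A field set to "any" matches every query value. Fields the overlay does
    not set are not checked at all (assumption, see lead-in).
    """
    for field, wanted in overlay_criteria.items():
        if wanted == WILDCARD:
            continue
        if query.get(field) != wanted:
            return False
    return True


# Criteria copied from the gb200-any.yaml example.
gb200_any = {"service": "any", "accelerator": "gb200"}

eks_query = {"service": "eks", "accelerator": "gb200", "intent": "training"}
h100_query = {"service": "eks", "accelerator": "h100", "intent": "training"}

print(overlay_matches(gb200_any, eks_query))   # wildcard service, same accelerator
print(overlay_matches(gb200_any, h100_query))  # accelerator differs
```

The sketch covers only the per-overlay match test. Selecting the maximal leaves and merging their inheritance chains is a separate step covered in Data Architecture.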

Merge order: base.yaml (lowest) → intermediate → leaf → mixins (highest)

Merge rules:

  • Constraints: same-named overridden, new added
  • ComponentRefs: same-named merged field-by-field, new added
  • validation.<phase> blocks merge per-field: checks and constraints union and deduplicate when non-empty (constraints by name, overlay wins on same-name); an explicit empty list (checks: [] / constraints: []) clears the inherited list, while an omitted/null field inherits it; nodeSelection replaced wholesale when set; timeout/infrastructure overlay-wins-if-non-empty
  • Criteria: not inherited (each recipe defines its own)
  • Mixin constraints/components must not conflict with the inheritance chain or other mixins
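
The list rules for validation.<phase> blocks can be sketched in Python as follows. This is a minimal illustration of the stated rules, not the AICR merge code: `merge_checks` and `merge_constraints` are hypothetical names, and the output ordering (inherited entries first, new overlay entries appended) is an assumption.

```python
from typing import Optional


def merge_checks(inherited: list, overlay: Optional[list]) -> list:
    """Merge a phase's `checks` list."""
    if overlay is None:   # omitted/null field: inherit the parent's list
        return list(inherited)
    if not overlay:       # explicit empty list: clear the inherited list
        return []
    merged = list(inherited)
    for check in overlay:
        if check not in merged:   # union with de-duplication
            merged.append(check)
    return merged


def merge_constraints(inherited: list, overlay: Optional[list]) -> list:
    """Merge a phase's `constraints` list, keyed by constraint name."""
    if overlay is None:
        return list(inherited)
    if not overlay:
        return []
    by_name = {c["name"]: c for c in inherited}
    for constraint in overlay:
        by_name[constraint["name"]] = constraint   # overlay wins on same name
    return list(by_name.values())


parent_checks = ["operator-health", "expected-resources"]
print(merge_checks(parent_checks, ["expected-resources", "check-nvidia-smi"]))
print(merge_checks(parent_checks, []))     # cleared
print(merge_checks(parent_checks, None))   # inherited
```

Keeping "omitted" (`None`) distinct from "explicitly empty" (`[]`) is what allows a leaf overlay to opt out of inherited checks without redefining the whole phase.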

Inference performance constraints

The inference-perf performance check reads named entries from validation.performance.constraints. Two are pass/fail thresholds (comparator values, 10% tolerance applied by the evaluator) and the rest are optional inputs that tune the benchmark per accelerator (bare values, no comparator):

validation:
  performance:
    checks: [inference-perf]
    constraints:
      - name: inference-throughput           # >= only; output tokens/sec
        value: ">= 50000"
      - name: inference-ttft-p99             # <= only; TTFT p99 in ms
        value: "<= 2000"
      - name: inference-model                # optional; HF model ID
        value: Qwen/Qwen3-8B
      - name: inference-concurrency-per-gpu  # optional; positive integer
        value: "256"
      - name: inference-routing-mode         # optional; dynamo-router or gateway-epp
        value: dynamo-router

inference-model and inference-concurrency-per-gpu resolve with precedence recipe constraint > AICR_INFERENCE_PERF_* catalog env > compiled default (Qwen3-8B at 256/GPU). Set them per overlay to pick the right model and load for each accelerator — exactly as the throughput/TTFT thresholds already vary per overlay — while the compiled defaults cover overlays that omit them. Because the thresholds are only meaningful at a specific model + concurrency, pin all four together in an overlay rather than relying on the global defaults for the inputs. inference-routing-mode resolves from the recipe only, defaulting to dynamo-router; set gateway-epp to validate the GAIE/EPP path through the AICR-managed inference gateway.
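
The precedence for the two optional inputs can be sketched in Python. This is an illustration under stated assumptions, not the validator's code: `resolve_input` is a hypothetical helper, the catalog environment value is passed in as a plain argument because the exact AICR_INFERENCE_PERF_* variable names are not listed here, and the default model is written as the full `Qwen/Qwen3-8B` ID used in the example above.

```python
from typing import Optional

# Compiled defaults as described above (Qwen3-8B at 256/GPU).
COMPILED_DEFAULTS = {
    "inference-model": "Qwen/Qwen3-8B",
    "inference-concurrency-per-gpu": "256",
}


def resolve_input(name: str, recipe_constraints: list,
                  catalog_env_value: Optional[str] = None) -> str:
    """Resolve one input: recipe constraint > catalog env > compiled default."""
    for constraint in recipe_constraints:
        if constraint.get("name") == name:
            return str(constraint["value"])
    if catalog_env_value:   # unset or empty falls through to the default
        return catalog_env_value
    return COMPILED_DEFAULTS[name]


overlay_constraints = [{"name": "inference-concurrency-per-gpu", "value": "128"}]

# Recipe value wins even when the catalog env supplies one.
print(resolve_input("inference-concurrency-per-gpu", overlay_constraints, "64"))
# No recipe constraint and no env value: compiled default.
print(resolve_input("inference-model", overlay_constraints))
```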

Component Types

Helm components (most common):

componentRefs:
  - name: gpu-operator
    type: Helm
    version: v26.3.2
    valuesFile: components/gpu-operator/values.yaml
    overrides:
      driver:
        version: "580.82.07"

Kustomize components:

componentRefs:
  - name: my-app
    type: Kustomize
    source: https://github.com/example/my-app
    tag: v1.0.0
    path: deploy/production

A component must have either Helm or Kustomize configuration, not both.

Component Configuration

Configuration Patterns

Pattern 1: ValuesFile only (large, reusable configs)

componentRefs:
  - name: cert-manager
    valuesFile: components/cert-manager/eks-values.yaml

Pattern 2: Overrides only (small, recipe-specific configs)

componentRefs:
  - name: nvsentinel
    overrides:
      namespace: nvsentinel
      sentinel:
        enabled: true

Pattern 3: Hybrid (shared base + recipe tweaks)

componentRefs:
  - name: gpu-operator
    valuesFile: components/gpu-operator/eks-gb200-training.yaml
    overrides:
      driver:
        version: "580.82.07"  # Override just this field

Value Merge Precedence

Values merge from lowest to highest precedence:

Base → ValuesFile → Overrides → CLI --set flags

Deep merge: only specified fields replaced, unspecified preserved. Arrays replaced entirely (not element-by-element).

Example:

# Base:       driver.version="550.54.15", driver.repository="nvcr.io/nvidia"
# ValuesFile: driver.version="570.86.16"
# Override:   driver.version="580.13.01"
# Result:     driver.version="580.13.01", driver.repository="nvcr.io/nvidia" (preserved)
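
The deep-merge behavior can be reproduced with a short Python sketch. It is not the AICR implementation: `deep_merge` is a hypothetical helper that follows the stated rules (nested maps merge key by key, scalars and arrays are replaced wholesale), applied to the same values as the example above.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` is layered on top of `base`."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are maps: merge recursively so unspecified keys survive.
            result[key] = deep_merge(result[key], value)
        else:
            # Scalars and lists are replaced entirely, not merged element by element.
            result[key] = value
    return result


base = {"driver": {"version": "550.54.15", "repository": "nvcr.io/nvidia"}}
values_file = {"driver": {"version": "570.86.16"}}
overrides = {"driver": {"version": "580.13.01"}}

merged = {}
for layer in (base, values_file, overrides):   # lowest to highest precedence
    merged = deep_merge(merged, layer)

print(merged)
```

CLI --set values would be applied as one more layer after `overrides` under the same rules.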

File Naming Conventions

File names are for human readability—matching uses spec.criteria, not file names.

Overlay naming: {accelerator}-{service}-{os}-{intent}-{platform}.yaml (platform always last)

| File Type | Pattern | Example |
| --- | --- | --- |
| Service | {service}.yaml | eks.yaml |
| Service + intent | {service}-{intent}.yaml | eks-training.yaml |
| Full criteria | {accel}-{service}-{os}-{intent}.yaml | gb200-eks-ubuntu-training.yaml |
| + platform | {accel}-{service}-{os}-{intent}-{platform}.yaml | gb200-eks-ubuntu-training-kubeflow.yaml |
| Mixin (OS) | os-{os}.yaml | os-ubuntu.yaml |
| Mixin (platform) | platform-{platform}.yaml | platform-kubeflow.yaml |
| Component values | values-{service}-{intent}.yaml | values-eks-training.yaml |
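
As a small illustration of the overlay naming pattern, the Python sketch below builds a file name from criteria values. `overlay_filename` is a hypothetical helper used only for this example. Because matching uses spec.criteria, nothing in AICR is stated to depend on a function like this.

```python
from typing import Optional


def overlay_filename(accelerator: str, service: str, os_name: str,
                     intent: str, platform: Optional[str] = None) -> str:
    """Build {accelerator}-{service}-{os}-{intent}[-{platform}].yaml."""
    parts = [accelerator, service, os_name, intent]
    if platform:
        parts.append(platform)   # platform always goes last
    return "-".join(parts) + ".yaml"


print(overlay_filename("gb200", "eks", "ubuntu", "training"))
print(overlay_filename("gb200", "eks", "ubuntu", "training", "kubeflow"))
```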

Constraints and Validation

Constraints

Constraints validate deployment requirements against cluster snapshots:

constraints:
  - name: K8s.server.version
    value: ">= 1.32.4"
  - name: OS.release.ID
    value: ubuntu
  - name: OS.release.VERSION_ID
    value: "24.04"

Common measurement paths

| Path | Example |
| --- | --- |
| K8s.server.version | 1.32.4 |
| OS.release.ID | ubuntu, rhel |
| OS.release.VERSION_ID | 24.04 |
| GPU.hardware.model | h100, l40s |

Operators: >=, <=, >, <, ==, !=, or exact match (no operator)

Add constraints when: recipe needs specific K8s features, driver versions, OS capabilities, or hardware. Skip when universal or redundant with component self-checks.
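
To make the operator semantics concrete, here is a minimal Python sketch of evaluating one constraint value against a measured value. It is an assumption-laden illustration, not the AICR evaluator: `satisfies` is a hypothetical helper, and comparing dotted numeric versions component by component (with an optional leading "v") is an assumed behavior. Values that are not plain dotted numbers fall back to string comparison.

```python
import operator
import re

_OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not read as ">".
_EXPR = re.compile(r"^\s*(>=|<=|==|!=|>|<)\s*(.+?)\s*$")


def _version_key(text: str):
    """Parse '1.32.4' or 'v1.32.4' into (1, 32, 4); return None otherwise."""
    parts = text.strip().lstrip("v").split(".")
    if not all(part.isdigit() for part in parts):
        return None
    return tuple(int(part) for part in parts)


def satisfies(expression: str, actual: str) -> bool:
    """Check a measured value against a constraint such as '>= 1.32.4'."""
    match = _EXPR.match(expression)
    if match is None:
        # No operator: exact match.
        return actual.strip() == expression.strip()
    op, expected = match.group(1), match.group(2)
    actual_key, expected_key = _version_key(actual), _version_key(expected)
    if actual_key is not None and expected_key is not None:
        return _OPS[op](actual_key, expected_key)
    return _OPS[op](actual.strip(), expected)


print(satisfies(">= 1.32.4", "1.33.0"))
print(satisfies(">= 1.32.4", "1.9.9"))   # numeric comparison: 9 < 32
print(satisfies("ubuntu", "ubuntu"))
```

Comparing versions numerically matters here: as plain strings, "1.9.9" sorts after "1.32.4", which would wrongly pass the constraint.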

Validation Phases

Optional multi-phase validation beyond basic constraints:

# expectedResources are declared on componentRefs, not under validation
componentRefs:
  - name: gpu-operator
    type: Helm
    expectedResources:
      - kind: Deployment
        name: gpu-operator
        namespace: gpu-operator
      - kind: DaemonSet
        name: nvidia-driver-daemonset
        namespace: gpu-operator

validation:
  # Readiness phase has no checks; constraints are evaluated inline from snapshot.
  deployment:
    checks: [expected-resources]
  performance:
    infrastructure: nccl-doctor
    checks: [nccl-bandwidth-test]

Phases: deployment, performance, conformance (readiness constraints are evaluated implicitly)

Testing

# Validate constraints
$ aicr validate --recipe recipe.yaml --snapshot snapshot.yaml

# Phase-specific
$ aicr validate --recipe recipe.yaml --snapshot snapshot.yaml --phase deployment

# Run validation tests
$ go test -v ./pkg/recipe/... -run TestConstraintPathsUseValidMeasurementTypes

Working with Recipes

Adding a New Recipe

When: new platform, hardware, workload type, or combined criteria

Steps:

  1. Create overlay in recipes/overlays/ with criteria and componentRefs
  2. If the recipe shares OS constraints or platform components with other overlays, reference existing mixins via spec.mixins instead of duplicating (or create new mixins in recipes/mixins/)
  3. Create component values files if using valuesFile
  4. Run tests: make test
  5. Test generation: aicr recipe --service eks --accelerator gb200 --format yaml

Example:

# recipes/overlays/gb200-eks-ubuntu-training.yaml
apiVersion: aicr.nvidia.com/v1alpha1
kind: RecipeMetadata
metadata:
  name: gb200-eks-ubuntu-training
spec:
  base: eks-training
  criteria:
    service: eks
    accelerator: gb200
    os: ubuntu
    intent: training
  componentRefs:
    - name: gpu-operator
      version: v26.3.2
      valuesFile: components/gpu-operator/eks-gb200-training.yaml

Updating Recipes

Updating versions:

# Update component version
componentRefs:
  - name: gpu-operator
    version: v26.3.2  # Changed from v26.3.1

Adding components:

componentRefs:
  - name: new-component
    version: v1.0.0
    valuesFile: components/new-component/values.yaml
    dependencyRefs: [existing-component]  # Optional

Test changes: aicr recipe --service eks --accelerator gb200 --format yaml

Adding a Component Readiness Gate

A component can declare a readiness gate so that, when a bundle is built with aicr bundle --readiness-hooks, the deploy blocks until a component-specific signal is actually healthy — not just until the chart’s own resources report Ready. This matters for operators whose true readiness lives in a custom resource the deployer can’t assess natively (e.g. gpu-operator’s ClusterPolicy reaching status.state: ready).

Convention: drop a Chainsaw Test at recipes/components/<name>/readiness.yaml. There is no registry field to set — the bundler discovers the file by path. Components without one are simply not gated.

# recipes/components/gpu-operator/readiness.yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: gpu-operator-readiness
spec:
  # The gate CLI owns the outer retry/poll loop, so a single assert only needs
  # a short window to confirm the current state.
  timeouts:
    assert: 30s
  steps:
    - name: clusterpolicy-ready
      try:
        - assert:
            resource:
              apiVersion: nvidia.com/v1
              kind: ClusterPolicy
              status:
                state: ready

When --readiness-hooks is set, the bundler wraps this test into a NNN-<name>-readiness/ folder containing a Job that runs the gate CLI (ghcr.io/nvidia/aicr-gate, which embeds Chainsaw). The deploy blocks on that Job — via helm --wait for the helm deployer (the gate Job is a post-install,post-upgrade hook, and --wait blocks on hook completion regardless of --wait-for-jobs), or via Argo CD’s built-in batch/Job health on the next sync-wave for the argocd/argocd-helm deployers. Keep spec.timeouts.assert shorter than the gate’s per-test timeout so a single poll can’t outlast one gate iteration. See Readiness Gates for the deploy-time behavior.

Best Practices

Do:

  • Use minimum criteria fields needed for matching
  • Keep base recipe universal and conservative
  • Use mixins for shared OS constraints or platform components instead of duplicating across leaf overlays
  • Always explain why settings exist (1-2 sentences)
  • Follow naming conventions ({accel}-{service}-{os}-{intent}-{platform})
  • Run make test before committing
  • Test recipe generation after changes

Don’t:

  • Add environment-specific settings to base
  • Over-specify criteria (too narrow = fewer matches)
  • Create duplicate criteria combinations
  • Duplicate OS or platform content across leaf overlays (use mixins instead)
  • Skip validation tests
  • Forget to update context when values change

Testing and Validation

Automated Tests

Tests in pkg/recipe/yaml_test.go validate:

  • Schema conformance (YAML structure)
  • Criteria enum values (service, accelerator, intent, OS, platform)
  • File references (valuesFile, dependencyRefs)
  • Constraint syntax (measurement paths, operators)
  • No duplicate criteria
  • Merge consistency
  • No dependency cycles
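
The "no dependency cycles" check amounts to cycle detection over the dependencyRefs graph. The Python sketch below shows one common way to do it (depth-first search with three visit states). It is not the code in pkg/recipe/yaml_test.go, and `has_dependency_cycle` plus the component names in the sample data are hypothetical.

```python
def has_dependency_cycle(dependency_refs: dict) -> bool:
    """Return True if the component -> dependencyRefs graph contains a cycle."""
    unvisited, in_progress, done = 0, 1, 2
    state = {}

    def visit(name: str) -> bool:
        state[name] = in_progress
        for dep in dependency_refs.get(name, []):
            dep_state = state.get(dep, unvisited)
            if dep_state == in_progress:
                return True   # back edge: dep is already on the current path
            if dep_state == unvisited and visit(dep):
                return True
        state[name] = done
        return False

    return any(
        state.get(name, unvisited) == unvisited and visit(name)
        for name in dependency_refs
    )


acyclic = {"new-component": ["existing-component"], "existing-component": []}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}

print(has_dependency_cycle(acyclic))
print(has_dependency_cycle(cyclic))
```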

Running Tests

$ make test                     # All tests
$ go test -v ./pkg/recipe/...   # Recipe tests only
$ go test -v ./pkg/recipe/... -run TestAllMetadataFilesConformToSchema  # Specific test

Test Workflow

  1. Create the recipe overlay in recipes/overlays/
  2. Run make test to validate
  3. Test generation: aicr recipe --service eks --accelerator gb200 --format yaml
  4. Inspect bundle: aicr bundle -r recipe.yaml -o ./test-bundles

Tests run automatically on PRs, main pushes, and release builds.

Advanced Topics

External Data Sources

Integrators can extend or override embedded recipe data using the --data flag without modifying the OSS codebase. This enables:

  • Custom recipes for proprietary hardware
  • Private component values with organization-specific settings
  • Extended registries with internal Helm charts
  • Rapid iteration without rebuilding binaries
  • New criteria values (service / accelerator / OS / intent / platform) admitted at runtime via the catalog-driven criteria registry — no rebuild required

See Data Extension for the full walkthrough (folder layout, registry rules, strict mode, debugging). The summary below is for quick reference.

Directory structure

./my-data/
├── registry.yaml             # Extends/overrides component registry
├── overlays/
│   └── custom-recipe.yaml    # New or override existing recipe
├── mixins/
│   └── os-custom.yaml        # Custom mixin fragments
└── components/
    └── my-operator/
        └── values.yaml       # Component values

Usage:

# Recipe generation
$ aicr recipe --service eks --accelerator gb200 --data ./my-data --output recipe.yaml

# Bundle generation
$ aicr bundle --recipe recipe.yaml --data ./my-data --deployer argocd --output ./bundle

# Debug loading
$ aicr --debug recipe --service eks --data ./my-data

Precedence: Embedded data (lowest) → External data (highest)

Behavior:

  • Overlays: Same metadata.name replaces embedded
  • Registry: Merged; same-named components replaced
  • Values: External valuesFile references take precedence
  • Criteria values: External overlays’ spec.criteria values become valid CLI / API inputs at runtime via the criteria registry; --criteria-strict (or AICR_CRITERIA_STRICT=1) rejects external-only values for OSS CI gates

Validation:

$ aicr --debug recipe --service eks --data ./my-data --dry-run
$ aicr recipe --service eks --data ./my-data --output /dev/stdout

Regional registry overrides

A handful of components ship images from regional, account-scoped container registries rather than a single public URI. The clearest example today is the AWS EFA device plugin, whose canonical home is <account>.dkr.ecr.<region>.amazonaws.com/eks/aws-efa-k8s-device-plugin — a per-region private ECR that every EKS node is auto-authorized to pull from. AWS publishes these add-ons regionally for three reasons: pulls go over the AWS internal backbone (no NAT egress), no Docker Hub / public-registry rate limits, and the image stays available even when the public internet or another region is degraded.

AICR ships a sensible default for each such image (e.g., us-west-2 for aws-efa), but customers deploying in a different region need to override the registry’s region segment. Two override paths cover the common cases:

Bundle-time override (single region per bundle). Use --set to bake a specific region into the bundle:

$ aicr bundle --recipe recipe.yaml \
>   --set awsefa:image.repository=602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-efa-k8s-device-plugin \
>   -o ./bundle

Install-time override (one bundle, many regions). Use --dynamic to declare the path as install-time-fillable, then provide the value via helm install --set (or your GitOps tool):

$ aicr bundle --recipe recipe.yaml \
>   --dynamic awsefa:image.repository \
>   --deployer helm \
>   -o ./bundle

# Per-cluster install
$ helm install ... --set image.repository=602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/aws-efa-k8s-device-plugin

--dynamic is supported with helm, argocd-helm, and flux deployers; argocd does not support it (use argocd-helm instead). See Dynamic Install-Time Values for the broader pattern.

Partition-aware variants. Standard AWS uses account ID 602401143452. GovCloud and China use different accounts and URI suffixes:

| Partition | Account ID | URI shape |
| --- | --- | --- |
| aws (standard) | 602401143452 | <account>.dkr.ecr.<region>.amazonaws.com |
| aws-us-gov (GovCloud) | 013241004608 | <account>.dkr.ecr.<region>.amazonaws.com |
| aws-cn (China) | 961992271922 | <account>.dkr.ecr.<region>.amazonaws.com.cn |

Substitute the appropriate account and suffix in the --set / install-time value.
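
The substitution can be scripted. The Python sketch below builds the image repository value from a partition and region using only the account IDs and suffixes in the table above. `efa_image_repository` is a hypothetical helper. Confirm the account ID for your specific region against the AWS documentation before relying on the result.

```python
# Account ID and registry domain suffix per partition, copied from the table above.
ECR_PARTITIONS = {
    "aws": ("602401143452", "amazonaws.com"),
    "aws-us-gov": ("013241004608", "amazonaws.com"),
    "aws-cn": ("961992271922", "amazonaws.com.cn"),
}


def efa_image_repository(partition: str, region: str) -> str:
    """Return the regional ECR repository for the AWS EFA device plugin image."""
    account, suffix = ECR_PARTITIONS[partition]
    return f"{account}.dkr.ecr.{region}.{suffix}/eks/aws-efa-k8s-device-plugin"


# Value to pass as --set awsefa:image.repository=... or at install time.
print(efa_image_repository("aws", "us-east-1"))
print(efa_image_repository("aws-cn", "cn-northwest-1"))
```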

Troubleshooting

Debug overlay matching:

$ aicr recipe --service eks --accelerator gb200 --format json | jq '.metadata.appliedOverlays'
$ aicr recipe --service eks --accelerator gb200 --format json | jq '.componentRefs[].version'

Common issues:

| Issue | Solution |
| --- | --- |
| Test: “duplicate criteria” | Combine overlays or differentiate criteria |
| Test: “valuesFile not found” | Create file or fix path in recipe |
| Test: “unknown component” | Use registered bundler name |
| Recipe returns empty | Check criteria fields match query |
| Wrong values in bundle | Verify merge precedence (base → valuesFile → overrides) |

Validation:

$ make qualify  # Full qualification
$ make test     # All tests
$ aicr recipe --service eks --accelerator gb200 --format yaml  # Test generation

Submitting Your Recipe

Recipes that target hardware AICR maintainers cannot independently re-run require an evidence bundle so a reviewer can verify the recipe without owning the hardware. The bundle is a signed, OCI-distributed artifact that captures the resolved recipe, the cluster snapshot, the validator phase results, a CycloneDX BOM, and a manifest of per-file hashes. It is produced by adding two flags to the same aicr validate invocation you already use to check the recipe — no separate build step.

When You Need Evidence

You need an evidence bundle when your PR adds or changes a recipe whose criteria reach hardware or a service that AICR maintainers cannot independently re-run — most non-H100 GPUs, non-EKS services, and specialty fabrics fall into this bucket. The recipe-evidence CI gate posts a sticky Markdown comment on every PR touching recipes/** and fails closed when a touched recipe has no matching recipes/evidence/<recipe>.yaml pointer.

Non-material edits (comments, formatting, displayName, description, key-order) produce the same material-slice digest and do not require a fresh bundle — the existing pointer stays valid. The CI gate’s canonicalizer collapses these to the same digest, so the gate passes without re-attestation. See ADR-007 § Material-slice canonicalization for the slice definition.

Producing the Bundle

Run aicr validate against the cluster that exercises your recipe and add --emit-attestation (writes the bundle to disk) and --push (signs and uploads the OCI artifact):

# 1. Capture snapshot and resolve the recipe you're contributing.
$ aicr snapshot --output snapshot.yaml
$ aicr recipe --service eks --accelerator gb200 --os ubuntu \
>   --intent training --output recipe.yaml

# 2. Validate with attestation emission. Replace the OCI ref with a
#    registry you control (GHCR, GitLab Container Registry, Harbor,
#    AWS ECR, Google Artifact Registry, Azure Container Registry,
#    or JFrog Artifactory; any OCI 1.1 registry with Referrers API
#    support).
$ aicr validate \
>   --recipe recipe.yaml \
>   --snapshot snapshot.yaml \
>   --emit-attestation ./out \
>   --push ghcr.io/<owner>/aicr-evidence

# 3. Commit the pointer. The bundle bytes live in OCI; the repo
#    only stores the locator.
$ mkdir -p recipes/evidence
$ cp ./out/pointer.yaml recipes/evidence/<recipe-name>.yaml
$ git add recipes/evidence/<recipe-name>.yaml

--push signs the bundle (cosign keyless via Sigstore) and attaches it to the OCI artifact as a Sigstore Bundle referrer. The tag is just a label — the bundle is pinned by its sha256: digest — so omitting it lets aicr derive a unique per-recipe tag (<recipe-slug>-<short-fingerprint>, e.g. gb200-eks-ubuntu-training-3f9a1c2b4d5e).

For the full bundle layout, flag reference, tag derivation, OIDC token precedence, and registry compatibility notes, see Emitting recipe evidence. For the end-to-end producer-and-consumer walkthrough, see the Recipe Evidence Demo.

Self-Verifying Before You Open the PR

Run the verifier locally — it is the same code the CI gate runs against the committed pointer, so failures here will block merge:

$ aicr evidence verify recipes/evidence/<recipe-name>.yaml

Exit 0 means signature, schema, inventory, manifest hashes, fingerprint match against the recipe’s criteria, and BOM cross-reference all passed. A non-zero exit writes a structured Markdown report describing the specific check that failed. See aicr evidence verify for the full check list and exit-code semantics.

What to Include in the PR

The recipe-evidence CI gate posts a Markdown summary as a sticky comment, so you do not need to inline the verifier output. The PR template asks for three additional pieces of context the verifier cannot infer:

  • The OCI ref of the pushed bundle, digest-pinned, so a maintainer can audit it directly: ghcr.io/<owner>/aicr-evidence@sha256:<digest>.
  • The cluster you attested from — cloud, accelerator SKU, OS, Kubernetes version, node count. The fingerprint dimensions are in the predicate, but the human description is what the maintainer reads first.
  • Evidence disposition. If aicr evidence verify reported a non-zero exit with a 1 in the JSON output’s exit field (signature valid, recorded phase results show failures), include a short justification in the PR template’s “Evidence disposition” section. The maintainer either applies the evidence/known-failure label and merges, or requests changes. See Exit-1 Review Process for what counts as an acceptable reason — broadly: optional check not applicable to your hardware, performance ceiling limited by your test bed, or a validator under known active rework.

If You Cannot Push to a Registry

You can still produce a bundle locally without --push. The resulting ./out/summary-bundle/ directory is unsigned but otherwise complete:

$ aicr validate --recipe recipe.yaml --snapshot snapshot.yaml \
>   --emit-attestation ./out
$ aicr evidence verify ./out/summary-bundle

The verifier records the signature step as “skipped (unsigned)” and the manifest-hash chain becomes self-consistency only — useful for catching accidental corruption during development, but not acceptable for the CI gate, which requires a signed bundle bound to a committed pointer.

  • For mechanical changes that touch recipes/** but carry no recipe semantics (file renames, comment-only changes, license header sweeps, self-bootstrapping evidence-pipeline changes), ask a maintainer to apply evidence/exempt per the bypass policy. Self-applying that label is not appropriate.

“I don’t have the hardware right now, please merge” is not a valid exempt path — see the bypass policy’s “Inappropriate uses.”
