Recipe Health

This page reports the structural health of every recipe AICR can resolve — one row per leaf criteria combination (service × accelerator × OS × intent × platform). It answers “across the whole matrix, what is the current structural state of each recipe?” and is the catalog-wide complement to per-recipe conformance evidence.

The matrix is computed hermetically and offline: every signal is a pure read of the resolved recipe — no Helm render, no GPU, no cluster, no network. It is regenerated from the recipe catalog by make recipe-health-docs and is kept current by a weekly bot PR. make recipe-health-check is an advisory staleness check (it is not wired into make qualify or the merge gate). The full design is recorded in ADR-009.

What the columns mean

Status is the rolled-up structural verdict per recipe:

  • pass — the recipe is structurally sound.
  • warn — a non-fatal structural concern was surfaced.
  • fail — a graded structural signal failed (e.g. the recipe does not resolve).
  • unknown — a transient resolver error (a re-runnable timeout) prevented a confident verdict; the recipe is held rather than penalized. unknown is never silently read as pass.
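
The roll-up described above can be sketched as a small function. This is a minimal illustration, not AICR's implementation: the `Status` enum, the `roll_up` name, and the precedence order (fail, then unknown, then warn, then pass) are assumptions, since the page does not say how competing signals are ranked. The one property taken from the text is that unknown never collapses into pass.

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"
    UNKNOWN = "unknown"


# ASSUMPTION: this precedence is illustrative; the source does not define it.
_PRECEDENCE = (Status.FAIL, Status.UNKNOWN, Status.WARN, Status.PASS)


def roll_up(signals):
    """Collapse per-signal results into a single Status (hypothetical helper)."""
    present = set(signals)
    if not present:
        # ASSUMPTION: with no signals there is no basis for a verdict,
        # so report unknown rather than silently reading it as pass.
        return Status.UNKNOWN
    for status in _PRECEDENCE:
        if status in present:
            return status
    raise ValueError("signals must contain Status values")


if __name__ == "__main__":
    print(roll_up([Status.PASS, Status.UNKNOWN]).value)
```

Under these assumptions, a single transient resolver error holds the recipe at unknown even when every other signal passes.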

Structural soundness is not a validation verdict. A recipe that resolves cleanly is structurally sound; that does not make it validated or performant. Runtime/validation claims come only from signed conformance evidence, which is out of scope for this matrix today (see the Evidence column below).

chart_pinned (folded into Status). chart_pinned is one of the graded signals behind Status: it checks that every resolved Helm component references an explicit chart version, per ADR-006. This is layer 1 only (the chart-version pin, not image-digest pinning), and it is a render-free read of the resolved recipe: it does not pull or template the chart.
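
A render-free pin check of this kind might look like the sketch below. The shape of the resolved recipe is an assumption: the `type`, `name`, and `version` keys and the `unpinned_helm_components` helper are hypothetical and are not taken from AICR or ADR-006. The point is only that the check reads fields already present in the resolved recipe and never touches the chart itself.

```python
def unpinned_helm_components(components):
    """Return the names of Helm components with no explicit chart version.

    ASSUMPTION: each component is a dict with hypothetical keys
    'type', 'name', and 'version'; the real resolved-recipe schema may differ.
    """
    unpinned = []
    for component in components:
        if component.get("type") != "helm":
            continue  # only Helm components carry a chart version
        version = (component.get("version") or "").strip()
        if not version:
            unpinned.append(component["name"])
    return unpinned


def chart_pinned(components):
    """True when every Helm component references an explicit chart version."""
    return not unpinned_helm_components(components)


if __name__ == "__main__":
    # Illustrative data only; names and versions are made up.
    sample = [
        {"type": "helm", "name": "component-a", "version": "1.2.3"},
        {"type": "helm", "name": "component-b", "version": ""},
        {"type": "manifest", "name": "component-c"},
    ]
    print(unpinned_helm_components(sample))
```

Returning the offending names, rather than a bare boolean, keeps a failing signal actionable.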

Coverage is a descriptor — it is never graded, so a deliberately minimal recipe is never penalized for declaring fewer checks. It is a compact per-phase summary of the declared validation checks, in the form R:n D:n P:n C:n — the count of named checks declared for the readiness, deployment, performance, and conformance phases respectively.
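
For readers who consume this page programmatically, the R:n D:n P:n C:n form is easy to produce and parse. The sketch below is illustrative: the function names and the dictionary keys are assumptions, and only the string format and the phase order (readiness, deployment, performance, conformance) come from the description above.

```python
import re

# Phase order as described for the Coverage column.
PHASES = ("readiness", "deployment", "performance", "conformance")

_COVERAGE_RE = re.compile(r"^R:(\d+) D:(\d+) P:(\d+) C:(\d+)$")


def format_coverage(counts):
    """Render per-phase check counts as 'R:n D:n P:n C:n'.

    Phases missing from `counts` are treated as zero declared checks.
    """
    r, d, p, c = (counts.get(phase, 0) for phase in PHASES)
    return f"R:{r} D:{d} P:{p} C:{c}"


def parse_coverage(text):
    """Parse 'R:n D:n P:n C:n' back into a dict keyed by phase name."""
    match = _COVERAGE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a coverage summary: {text!r}")
    return dict(zip(PHASES, (int(n) for n in match.groups())))


if __name__ == "__main__":
    print(parse_coverage("R:0 D:4 P:1 C:11"))
```

Because Coverage is a descriptor, a parser like this should report the counts without ranking or thresholding them.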

Evidence is a literal pending for every recipe today. No conformance attestations exist yet, so the column is honestly uniform: it reports the absence of evidence rather than overstating what is known. A differentiated, evidence-derived column lands once the first signed attestation does.

Summary

  • Recipes: 37
  • Pass: 37 · Warn: 0 · Fail: 0 · Unknown: 0
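
The summary counts are a plain tally of the Status column. A minimal sketch, assuming the statuses are available as a list of strings (the `summarize` helper is hypothetical):

```python
from collections import Counter

STATUSES = ("pass", "warn", "fail", "unknown")


def summarize(statuses):
    """Tally Status values into the counts shown in the summary."""
    counts = Counter(statuses)
    unexpected = set(counts) - set(STATUSES)
    if unexpected:
        raise ValueError(f"unexpected status values: {sorted(unexpected)}")
    summary = {"recipes": sum(counts.values())}
    summary.update({status: counts.get(status, 0) for status in STATUSES})
    return summary


if __name__ == "__main__":
    # 37 passing recipes, matching the summary on this page.
    print(summarize(["pass"] * 37))
```

Rejecting unrecognized values keeps a typo in a status from being dropped from the totals unnoticed.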

Recipes

| Recipe | Service | Accelerator | OS | Intent | Platform | Status | Coverage | Evidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| a100-any | | a100 | | | | pass | R:0 D:4 P:0 C:0 | pending |
| b200-any | | b200 | | | | pass | R:0 D:4 P:0 C:0 | pending |
| gb200-any | | gb200 | | | | pass | R:0 D:4 P:0 C:0 | pending |
| h100-any | | h100 | | | | pass | R:0 D:4 P:0 C:0 | pending |
| h200-any | | h200 | | | | pass | R:0 D:4 P:0 C:0 | pending |
| rtx-pro-6000-any | | rtx-pro-6000 | | | | pass | R:0 D:4 P:0 C:0 | pending |
| monitoring-hpa | | | | | | pass | R:0 D:0 P:0 C:0 | pending |
| a100-aks-ubuntu-training-kubeflow | aks | a100 | ubuntu | training | kubeflow | pass | R:0 D:4 P:0 C:10 | pending |
| h100-aks-ubuntu-inference-dynamo | aks | h100 | ubuntu | inference | dynamo | pass | R:0 D:4 P:1 C:11 | pending |
| h100-aks-ubuntu-training-kubeflow | aks | h100 | ubuntu | training | kubeflow | pass | R:0 D:4 P:0 C:10 | pending |
| bcm-inference | bcm | | | inference | | pass | R:0 D:0 P:0 C:5 | pending |
| h100-bcm-ubuntu-training | bcm | h100 | ubuntu | training | | pass | R:0 D:4 P:0 C:5 | pending |
| a100-eks-ubuntu-training-kubeflow | eks | a100 | ubuntu | training | kubeflow | pass | R:0 D:4 P:0 C:10 | pending |
| gb200-eks-ubuntu-inference-dynamo | eks | gb200 | ubuntu | inference | dynamo | pass | R:0 D:4 P:1 C:10 | pending |
| gb200-eks-ubuntu-training-kubeflow | eks | gb200 | ubuntu | training | kubeflow | pass | R:0 D:4 P:2 C:8 | pending |
| h100-eks-ubuntu-inference-dynamo | eks | h100 | ubuntu | inference | dynamo | pass | R:0 D:4 P:1 C:11 | pending |
| h100-eks-ubuntu-inference-nim | eks | h100 | ubuntu | inference | nim | pass | R:0 D:4 P:0 C:11 | pending |
| h100-eks-ubuntu-training-kubeflow | eks | h100 | ubuntu | training | kubeflow | pass | R:0 D:4 P:1 C:10 | pending |
| h100-eks-ubuntu-training-slurm | eks | h100 | ubuntu | training | slurm | pass | R:0 D:4 P:0 C:10 | pending |
| h200-eks-inference | eks | h200 | | inference | | pass | R:0 D:4 P:0 C:5 | pending |
| h200-eks-training | eks | h200 | | training | | pass | R:0 D:4 P:1 C:10 | pending |
| rtx-pro-6000-eks-ubuntu-inference-dynamo | eks | rtx-pro-6000 | ubuntu | inference | dynamo | pass | R:0 D:4 P:1 C:11 | pending |
| rtx-pro-6000-eks-ubuntu-inference-nim | eks | rtx-pro-6000 | ubuntu | inference | nim | pass | R:0 D:4 P:0 C:11 | pending |
| a100-gke-cos-training-kubeflow | gke | a100 | cos | training | kubeflow | pass | R:0 D:4 P:0 C:10 | pending |
| b200-gke-cos-inference-dynamo | gke | b200 | cos | inference | dynamo | pass | R:0 D:4 P:0 C:11 | pending |
| b200-gke-cos-training-kubeflow | gke | b200 | cos | training | kubeflow | pass | R:0 D:4 P:0 C:10 | pending |
| h100-gke-cos-inference-dynamo | gke | h100 | cos | inference | dynamo | pass | R:0 D:4 P:1 C:11 | pending |
| h100-gke-cos-training-kubeflow | gke | h100 | cos | training | kubeflow | pass | R:0 D:4 P:1 C:10 | pending |
| h100-gke-cos-training-slurm | gke | h100 | cos | training | slurm | pass | R:0 D:4 P:0 C:10 | pending |
| h100-kind-inference-dynamo | kind | h100 | | inference | dynamo | pass | R:0 D:4 P:0 C:11 | pending |
| h100-kind-training-kubeflow | kind | h100 | | training | kubeflow | pass | R:0 D:4 P:0 C:10 | pending |
| h100-kind-training-slurm | kind | h100 | | training | slurm | pass | R:0 D:4 P:0 C:9 | pending |
| rtx-pro-6000-lke-ubuntu-inference | lke | rtx-pro-6000 | ubuntu | inference | | pass | R:0 D:4 P:0 C:8 | pending |
| rtx-pro-6000-lke-ubuntu-training | lke | rtx-pro-6000 | ubuntu | training | | pass | R:0 D:4 P:0 C:8 | pending |
| a100-oke-ubuntu-training-kubeflow | oke | a100 | ubuntu | training | kubeflow | pass | R:0 D:4 P:0 C:8 | pending |
| gb200-oke-ubuntu-inference-dynamo | oke | gb200 | ubuntu | inference | dynamo | pass | R:0 D:4 P:1 C:10 | pending |
| gb200-oke-ubuntu-training-kubeflow | oke | gb200 | ubuntu | training | kubeflow | pass | R:0 D:4 P:1 C:8 | pending |