AICR Contributor Guide

You’ve landed in the contributor entry point. Its job is to give you, in five minutes, a clear answer to four questions:

  1. What is AICR, and what shape does it have?
  2. Is the change I want to make in or out of scope?
  3. Which file or package do I touch?
  4. Where do I go next?

For dev-environment setup, run make tools-setup and read DEVELOPMENT.md. For contribution mechanics (DCO, CI, signing), see CONTRIBUTING.md. For the coding rules every PR is graded against, see CLAUDE.md.

What AICR Is

AICR is a design-time tool. Given a description of a target environment — cluster, accelerator, intent, OS, platform — it generates validated GPU-cluster configuration artifacts that an established deployment tool (Helm, Argo CD, Flux, Helmfile) consumes.

| Artifact | Role | Produced by |
| --- | --- | --- |
| Snapshot | Normalized state of an existing cluster (input) | aicr snapshot or the in-cluster Job |
| Recipe | Declarative spec resolved from registry, criteria, overlays, and mixins | aicr recipe |
| Validation report | Recipe constraints evaluated against a snapshot or live cluster | aicr validate |
| Bundle | Per-component deployment artifact in a tool-specific format | aicr bundle |
| Evidence | Signed conformance attestation for a validated recipe | aicr evidence |

Each stage produces a serializable artifact (file, stdout, or ConfigMap) and is independently invocable. Reproducibility — same inputs, same outputs — is non-negotiable.

┌──────────┐    ┌────────┐    ┌──────────┐    ┌────────┐
│ Snapshot │───▶│ Recipe │───▶│ Validate │───▶│ Bundle │
└──────────┘    └────────┘    └──────────┘    └────────┘
  capture        generate      check           emit
  cluster        optimized     constraints     deployment
  state          config        vs. actual      artifacts

Stages can be invoked individually or chained. Inputs and outputs flow through files, stdout, or Kubernetes ConfigMaps (cm://namespace/name), which lets the snapshot agent hand off to a CLI or API server running outside the cluster.
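To make the cm://namespace/name hand-off concrete, here is a minimal sketch of splitting such a reference into its two parts. The type ConfigMapRef, the function parseConfigMapURI, and the example namespace and name are hypothetical and exist only for illustration; the real parsing helper lives in pkg/k8s/pod and may behave differently.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ConfigMapRef is a hypothetical type for this sketch; it is not AICR's API.
type ConfigMapRef struct {
	Namespace string
	Name      string
}

// parseConfigMapURI splits a "cm://namespace/name" reference into its parts.
// It rejects anything that does not have exactly two non-empty segments.
func parseConfigMapURI(uri string) (ConfigMapRef, error) {
	const scheme = "cm://"
	if !strings.HasPrefix(uri, scheme) {
		return ConfigMapRef{}, errors.New("not a cm:// URI")
	}
	rest := strings.TrimPrefix(uri, scheme)
	ns, name, found := strings.Cut(rest, "/")
	if !found || ns == "" || name == "" || strings.Contains(name, "/") {
		return ConfigMapRef{}, fmt.Errorf("expected cm://namespace/name, got %q", uri)
	}
	return ConfigMapRef{Namespace: ns, Name: name}, nil
}

func main() {
	// The namespace and name below are made-up example values.
	ref, err := parseConfigMapURI("cm://gpu-operator/aicr-snapshot")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("namespace=%s name=%s\n", ref.Namespace, ref.Name)
}
```

Rejecting malformed references up front keeps a bad input from turning into a confusing Kubernetes API error later in the pipeline.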

What AICR Is Not

AICR is not a deployment engine. The boundary matters: half of all “can AICR do X?” questions resolve against this list.

| AICR does | AICR does not |
| --- | --- |
| Generate values.yaml, Application, HelmRelease, etc. | Run kubectl apply or helm install |
| Validate constraints against a snapshot | Wait for resources to become ready |
| Sign and verify evidence bundles | Implement uninstall, rollback, or upgrade |
| Capture cluster state via a one-shot Job | Reconcile drift or run as a controller |
| Emit per-component artifacts in known formats | Orchestrate cross-component dependencies at runtime |

The deployment tool that consumes AICR’s output (Helm, Argo CD, Flux, Helmfile) owns release reconciliation and lifecycle.

On terminology. Code under pkg/bundler/deployer includes things we call deployers. They are output adapters that serialize a bundle in a tool-specific format. They do not perform deployment.
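One way to picture the distinction is an interface whose only capability is turning a bundle into files. The sketch below is an assumption for illustration: Bundle, OutputAdapter, valuesFileAdapter, and the flat key/value output are hypothetical and do not reflect the real types under pkg/bundler/deployer.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Bundle and OutputAdapter are hypothetical names used only in this sketch.
type Bundle struct {
	Component string
	Values    map[string]string
}

// OutputAdapter turns a bundle into files (path -> content). It has no
// method that talks to a cluster: serialization is its whole job.
type OutputAdapter interface {
	Render(b Bundle) (map[string]string, error)
}

// valuesFileAdapter writes one flat "key: value" file per component.
// Real Helm values are nested; the flat form keeps the sketch short.
type valuesFileAdapter struct{}

func (valuesFileAdapter) Render(b Bundle) (map[string]string, error) {
	if b.Component == "" {
		return nil, fmt.Errorf("bundle has no component name")
	}
	keys := make([]string, 0, len(b.Values))
	for k := range b.Values {
		keys = append(keys, k)
	}
	sort.Strings(keys) // sorted keys keep the output reproducible

	var sb strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&sb, "%s: %q\n", k, b.Values[k])
	}
	return map[string]string{b.Component + "/values.yaml": sb.String()}, nil
}

func main() {
	var adapter OutputAdapter = valuesFileAdapter{}
	files, err := adapter.Render(Bundle{
		Component: "example-component", // made-up component name
		Values:    map[string]string{"replicas": "2", "image": "example:1.0"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for path, content := range files {
		fmt.Printf("--- %s\n%s", path, content)
	}
}
```

Because the interface returns file contents and nothing else, an adapter written against it has no way to apply, wait, or roll back.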

The only in-cluster component is the snapshot agent — a one-shot Kubernetes Job that captures state into a ConfigMap and exits. It is an input collector, not a runtime component, and is not part of the deployed system.

Is My Change In Scope?

This is the single most useful question to answer before writing code. If your change lands in the right-hand column, file an issue or ADR first: the review will not get past make qualify regardless of code quality.

| In scope (artifact generation) | Out of scope (deployment-time) |
| --- | --- |
| New recipe overlay or mixin | Apply / wait / uninstall logic embedded in AICR |
| New registry entry for an upstream chart | Drift detection or reconciliation loops |
| New snapshot collector dimension | In-cluster controllers or operators owned by AICR |
| New constraint operator | Custom deployment mechanisms (e.g., a “direct” deployer that calls kubectl apply) |
| New bundle output for a community-standard tool | Custom or proprietary delivery pipelines |
| Supply-chain provenance (SBOM, attestation, signing) | Anything that keeps AICR running past artifact generation |

A rule of thumb: if a feature requires AICR to keep running after artifact generation, or to drive kubectl and direct API calls to deploy what it produces, it belongs in a deployment tool — not AICR.

Where Does My Change Go?

The contributor decision matrix. Find the row that matches your intent; the linked page has the walkthrough.

| I want to… | Touch | Guide |
| --- | --- | --- |
| Make an existing Helm or Kustomize chart available to recipes | recipes/registry.yaml entry | /aicr/contributor-guide/components |
| Pin a chart version, set values, or define scheduling for a specific cluster shape | Recipe overlay in recipes/overlays/ | /aicr/contributor-guide/recipes-overlays-and-mixins |
| Share OS or platform fragments across overlays | Recipe mixin in recipes/mixins/ | /aicr/contributor-guide/recipes-overlays-and-mixins |
| Capture a new dimension of cluster / OS / GPU state | New collector in pkg/collector/<kind>/ | /aicr/contributor-guide/collectors |
| Add a new declarative constraint operator (>=, tolerance, etc.) | pkg/constraints | /aicr/contributor-guide/validators |
| Add a container-per-validator check (NCCL variant, perf benchmark) | validators/<phase>/ + recipes/validators/catalog.yaml | /aicr/contributor-guide/validators |
| Warn or block on a component misconfiguration at bundle time | pkg/bundler/validations/checks.go + registry.yaml | /aicr/contributor-guide/validators |
| Verify a deployed component is healthy via chainsaw | recipes/checks/<name>/health-check.yaml | /aicr/contributor-guide/validators |
| Add or change a CLI flag or subcommand | pkg/cli/<name>.go + register in pkg/cli/root.go | /aicr/contributor-guide/cli |
| Add an HTTP endpoint | pkg/server/<name>_handler.go + api/aicr/v1/server.yaml | /aicr/contributor-guide/api-server |
| Add a new community-standard bundle output format | pkg/bundler/deployer/<name>/ | Open an issue first; discuss before coding |
| Cut a release | tools/release + git push origin <tag> | /aicr/contributor-guide/maintaining-aicr |
| Run unit / chainsaw / KWOK / e2e tests | make qualify and friends | /aicr/contributor-guide/testing |

Your First Contribution

A typical first PR path:

  1. Read CONTRIBUTING.md and CLAUDE.md.
  2. Run make tools-setup then make qualify. If qualify is green on main, your environment works.
  3. Find your change in the decision matrix. Open the linked guide and follow its walkthrough.
  4. Write the change. Run make qualify (or the package-scoped subset) until it passes. Lint and test failures must be fixed locally — do not rely on CI.
  5. If you changed registry.yaml, a component values file, or a chart pin, run make bom-docs and commit the regenerated docs/user/container-images.md. CLAUDE.md treats this as a hard rule.
  6. Sign the commit (git commit -S), open a PR, and let CI run.

Use the PR template in .github/PULL_REQUEST_TEMPLATE.md as-is — do not inline a modified copy. Reviewers grade against CLAUDE.md; familiarity with the anti-patterns table is the difference between “shipped” and “round trip.”

Codebase Shape

Three layers. The separation is the single most enforced rule in review.

┌─────────────────────┐    ┌─────────────────────┐
│       pkg/cli       │    │     pkg/server      │   user interaction
│   (CLI commands)    │    │   (HTTP handlers)   │   (no business logic)
└──────────┬──────────┘    └──────────┬──────────┘
           │                          │
           ▼                          ▼
         ┌──────────────────────────────┐
         │        pkg/client/v1         │   shared facade
         │        (aicr.Client)         │
         └────────────────┬─────────────┘
┌───────────────────────────────────────────────────┐
│ pkg/recipe       pkg/collector     pkg/validator  │   functional packages
│ pkg/bundler      pkg/snapshotter   pkg/evidence   │   (business logic lives here)
│ pkg/constraints  pkg/serializer    pkg/manifest   │
└───────────────────────────────────────────────────┘

pkg/cli and pkg/server parse input, validate it, and format output. They contain no business logic. All business logic lives in functional packages, composed by the pkg/client/v1 facade so both entry points share it. Adding business logic to pkg/cli or pkg/server handlers is a boundary violation and will be rejected.

Packages

| Package | Responsibility |
| --- | --- |
| User interaction | |
| pkg/cli | CLI flags, output formatting, exit-code mapping. See /aicr/contributor-guide/cli |
| pkg/server | HTTP server: middleware chain + REST handlers (thin adapters). See /aicr/contributor-guide/api-server |
| pkg/client/v1 | aicr.Client facade: shared SDK used by CLI, server, and external Go callers |
| Recipe and data | |
| pkg/recipe | Recipe resolution, overlay/mixin composition, registry. See /aicr/contributor-guide/recipes-overlays-and-mixins |
| pkg/recipe/oskind | Single source of truth for OS criterion string values (ubuntu, rhel, cos, amazonlinux, talos). Imported by pkg/recipe, pkg/collector, pkg/snapshotter, and the CLI. |
| pkg/constraints | Declarative constraint operators (>=, <=, tolerance) and evaluation |
| pkg/measurement | Schema for collector output and validator input |
| pkg/serializer | Deterministic YAML/JSON for evidence and bundles |
| pkg/config | CLI/server config file (--config) loader |
| Collection and validation | |
| pkg/collector | Parallel system state collection. See /aicr/contributor-guide/collectors |
| pkg/snapshotter | Orchestrates collectors, aggregates measurements |
| pkg/validator | Constraint evaluation; container-per-validator runner. See /aicr/contributor-guide/validators |
| pkg/fingerprint | Cluster shape fingerprint for caching and provenance |
| Bundle generation | |
| pkg/bundler | Per-component bundle generation entry point. See /aicr/contributor-guide/components |
| pkg/bundler/deployer | Output adapters: helm, helmfile, argocd, argocdhelm, flux |
| pkg/bundler/validations | Bundle-time component validation checks. See /aicr/contributor-guide/validators |
| pkg/component | Bundler utilities and test helpers |
| pkg/manifest, pkg/helm, pkg/bom | Manifest rendering, chart inspection, BOM extraction |
| Supply chain | |
| pkg/evidence | Conformance evidence capture, signing, verification |
| pkg/oci | OCI artifact push/pull for evidence and bundles |
| pkg/mirror | Air-gap mirror for charts and images |
| pkg/trust | Sigstore trust root management |
| pkg/build | Build provenance metadata |
| Cross-cutting | |
| pkg/k8s/client | Singleton Kubernetes clientset (in-cluster + kubeconfig) |
| pkg/k8s/pod | Shared K8s Job/Pod helpers (wait, logs, ConfigMap URI parsing) |
| pkg/errors | Structured errors with codes; HTTP and exit-code mapping |
| pkg/defaults | Centralized timeouts, limits, configuration constants |
| pkg/logging | Structured slog setup with TTY / NO_COLOR detection |
| pkg/header, pkg/version, pkg/diff | API negotiation headers, build version, snapshot/recipe diff |

Community-Standard Deployment Targets

AICR emits artifacts in formats consumed by community-standard deployment tools:

| Deployer | Output | Use case |
| --- | --- | --- |
| helm | Per-component values.yaml + install script | Direct Helm install |
| helmfile | helmfile.yaml declarative release manifest | GitOps with Helmfile |
| argocd | Argo CD Application manifests with sync-waves | Argo CD GitOps |
| argocdhelm | Argo CD Application referencing per-component Helm charts | Argo CD + upstream Helm |
| flux | Flux HelmRelease + Kustomization manifests | Flux GitOps |

We are open to adding additional community-standard targets when there is demonstrated demand. We do not add custom or proprietary deployment mechanisms: they pull deployment-time orchestration into AICR — the boundary we are explicitly maintaining.

Deployment Topologies

AICR can be invoked in three shapes. None are runtime components in the deployed cluster — all are design-time tooling.

  • CLI. Single binary. Local development, CI pipelines, troubleshooting.
  • API server. Stateless HTTP service for programmatic recipe and bundle generation. Scales horizontally behind a load balancer. Returns artifacts; does not deploy them. See /aicr/contributor-guide/api-server.
  • Snapshot agent (one-shot Job). A Kubernetes Job that runs once, captures state into a ConfigMap, and exits. The CLI or API server reads the ConfigMap as input. No reconcile loop, no controller.

Architectural Principles

Coding rules and anti-patterns live in CLAUDE.md. The four principles that shape the architecture itself:

  • Metadata is separate from how it is consumed. Validated configuration exists independent of how it is rendered, packaged, or deployed. A recipe is a value, not a procedure.
  • Correctness must be reproducible. Same inputs → same outputs. This rules out hidden state, implicit defaults, and non-deterministic serialization. serializer.MarshalYAMLDeterministic is required wherever output feeds a digest, signature, or fingerprint.
  • Recipe specialization requires explicit intent. Generic intent must never silently resolve to a specialized configuration. If two recipes both match, the user explicitly chooses; AICR does not guess.
  • Trust requires verifiable provenance. Every released artifact carries verifiable, non-falsifiable proof of where it came from. See SECURITY.md.
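The reproducibility principle above can be illustrated with a small digest check. This sketch does not use serializer.MarshalYAMLDeterministic; as a stand-in it relies on Go's encoding/json, which emits map keys in sorted order, and the digest function and sample criteria are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// digest serializes v with encoding/json, which emits map keys in sorted
// order, and returns the SHA-256 of the resulting bytes as hex.
func digest(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Same content, different insertion order (made-up sample criteria).
	a := map[string]any{"os": "ubuntu", "accelerator": "h100"}
	b := map[string]any{"accelerator": "h100", "os": "ubuntu"}

	da, err := digest(a)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	db, err := digest(b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("digests match:", da == db)
}
```

If the serializer iterated the map in Go's randomized order instead, the same recipe could hash differently from run to run, and any signature or fingerprint built on that digest would stop being verifiable.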

Key Engineering Decisions

These five decisions shape how the codebase is laid out. Code that fights them tends to read as a re-architecture.

  • Concurrent collection with errgroup. Collectors run in parallel; failure of any collector cancels the rest via context. Fail-fast is the default; best-effort partial collection would hide systemic problems behind partial data.
  • Pluggable collectors via factory. Collectors implement a common interface and are constructed by pkg/collector.Factory, then wired into the snapshot run by pkg/snapshotter. Adding a state source means a new Factory method and one g.Go(...) line in the snapshotter — no edits to existing collectors. See /aicr/contributor-guide/collectors.
  • Immutable recipe store. The store is read-only after initialization, and mutations happen only on per-request clones. It therefore needs no locks, and the API server is safe for concurrent reads.
  • Singleton Kubernetes client. pkg/k8s/client caches a single clientset to avoid connection exhaustion. Both in-cluster and out-of-cluster auth are supported transparently.
  • Watch over poll. Long-running K8s operations use the watch API rather than polling loops. See pkg/k8s/pod.

Where to Next

By contributor task:

By reference: