AICR Contributor Guide
You’ve landed in the contributor entry point. Its job is to give you, in five minutes, a clear answer to four questions:
- What is AICR, and what shape does it have?
- Is the change I want to make in or out of scope?
- Which file or package do I touch?
- Where do I go next?
For dev-environment setup, run make tools-setup and read DEVELOPMENT.md.
For contribution mechanics (DCO, CI, signing), see CONTRIBUTING.md.
For the coding rules every PR is graded against, see CLAUDE.md.
What AICR Is
AICR is a design-time tool. Given a description of a target environment — cluster, accelerator, intent, OS, platform — it generates validated GPU-cluster configuration artifacts that an established deployment tool (Helm, Argo CD, Flux, Helmfile) consumes.
Each stage produces a serializable artifact and can be invoked on its own or chained with the others. Inputs and outputs flow through files, stdout, or Kubernetes ConfigMaps (cm://namespace/name), which lets the in-cluster snapshot agent hand off to a CLI or API server running outside the cluster. Reproducibility (same inputs, same outputs) is non-negotiable.
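As an illustration of the cm://namespace/name form, here is a minimal sketch of how such a reference could be split into its parts. The ConfigMapRef type and parseConfigMapRef function are hypothetical names invented for this example; they are not taken from the AICR codebase, and the real parsing rules may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ConfigMapRef is a hypothetical type for this sketch, not an AICR type.
type ConfigMapRef struct {
	Namespace string
	Name      string
}

// parseConfigMapRef splits "cm://namespace/name" into namespace and name.
// It rejects anything that does not have exactly those two segments.
func parseConfigMapRef(uri string) (ConfigMapRef, error) {
	const scheme = "cm://"
	if !strings.HasPrefix(uri, scheme) {
		return ConfigMapRef{}, errors.New("not a cm:// reference")
	}
	ns, name, found := strings.Cut(strings.TrimPrefix(uri, scheme), "/")
	if !found || ns == "" || name == "" || strings.Contains(name, "/") {
		return ConfigMapRef{}, fmt.Errorf("malformed ConfigMap reference %q", uri)
	}
	return ConfigMapRef{Namespace: ns, Name: name}, nil
}

func main() {
	// The namespace and name below are made-up example values.
	ref, err := parseConfigMapRef("cm://example-ns/example-snapshot")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ref.Namespace, ref.Name)
}
```

Anything that does not carry the cm:// scheme would presumably be handled as a file path or stdout by the caller; that dispatch is left out of the sketch.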
What AICR Is Not
AICR is not a deployment engine. The boundary matters: half of all “can AICR do X?” questions resolve against this list.
The deployment tool that consumes AICR’s output (Helm, Argo CD, Flux, Helmfile) owns release reconciliation and lifecycle.
On terminology: the packages under pkg/bundler/deployer are called
deployers. They are output adapters that serialize a bundle in a
tool-specific format. They do not perform deployment.
The only in-cluster component is the snapshot agent — a one-shot Kubernetes Job that captures state into a ConfigMap and exits. It is an input collector, not a runtime component, and is not part of the deployed system.
Is My Change In Scope?
The single most useful question to answer before writing code. If you
land in the right-hand column, file an issue or ADR first — the
review will not get past make qualify regardless of code quality.
A rule of thumb: if a feature requires AICR to keep running after
artifact generation, or to drive kubectl or make direct API calls to
deploy what it produces, it belongs in a deployment tool, not in AICR.
Where Does My Change Go?
The contributor decision matrix. Find the row that matches your intent; the linked page has the walkthrough.
Your First Contribution
A typical first PR path:
- Read CONTRIBUTING.md and CLAUDE.md.
- Run make tools-setup, then make qualify. If qualify is green on main, your environment works.
- Find your change in the decision matrix. Open the linked guide and follow its walkthrough.
- Write the change. Run make qualify (or the package-scoped subset) until it passes. Lint and test failures must be fixed locally; do not rely on CI.
- If you changed registry.yaml, a component values file, or a chart pin, run make bom-docs and commit the regenerated docs/user/container-images.md. CLAUDE.md treats this as a hard rule.
- Sign the commit (git commit -S), open a PR, and let CI run.
Use the PR template in .github/PULL_REQUEST_TEMPLATE.md as-is — do
not inline a modified copy. Reviewers grade against
CLAUDE.md;
familiarity with the anti-patterns table is the difference between
“shipped” and “round trip.”
Codebase Shape
Three layers. The separation is the single most enforced rule in review.
pkg/cli and pkg/server parse input, validate it, and format
output. They contain no business logic. All business logic lives
in functional packages, composed by the pkg/client/v1 facade so
both entry points share it. Adding business logic to pkg/cli or
pkg/server handlers is a boundary violation and will be rejected.
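The layering rule can be sketched as two thin adapters that share one facade. Everything named here (Facade, Criteria, Recipe, runCLI, handleQuery) is a simplified stand-in invented for illustration; it does not reproduce the real pkg/client/v1 API or the real CLI and server code.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Criteria and Recipe are simplified stand-ins, not AICR's real types.
type Criteria struct {
	Accelerator string
	Intent      string
}

type Recipe struct{ Name string }

// Facade plays the role of the shared business-logic entry point.
type Facade struct{}

// GenerateRecipe holds the only business rule in this sketch.
func (Facade) GenerateRecipe(c Criteria) (Recipe, error) {
	if c.Accelerator == "" || c.Intent == "" {
		return Recipe{}, errors.New("accelerator and intent are required")
	}
	return Recipe{Name: c.Accelerator + "-" + c.Intent}, nil
}

// runCLI parses positional arguments, delegates, and formats text output.
func runCLI(f Facade, args []string) (string, error) {
	if len(args) != 2 {
		return "", errors.New("usage: recipe <accelerator> <intent>")
	}
	r, err := f.GenerateRecipe(Criteria{Accelerator: args[0], Intent: args[1]})
	if err != nil {
		return "", err
	}
	return "recipe: " + r.Name, nil
}

// handleQuery parses a URL query string, delegates, and formats JSON output.
func handleQuery(f Facade, rawQuery string) (string, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", fmt.Errorf("parse query: %w", err)
	}
	r, err := f.GenerateRecipe(Criteria{Accelerator: q.Get("accelerator"), Intent: q.Get("intent")})
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(`{"recipe":%q}`, r.Name), nil
}

func main() {
	f := Facade{}
	text, err := runCLI(f, []string{"h100", "training"})
	fmt.Println(text, err)
	body, err := handleQuery(f, "accelerator=h100&intent=training")
	fmt.Println(body, err)
}
```

Both adapters differ only in how they parse input and format output. A rule added to GenerateRecipe reaches both entry points at once, which is the property the boundary is meant to protect.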
Packages
Community-Standard Deployment Targets
AICR emits artifacts in formats consumed by community-standard deployment tools:
We are open to adding further community-standard targets when there is demonstrated demand. We do not add custom or proprietary deployment mechanisms: they would pull deployment-time orchestration into AICR, crossing the boundary we are explicitly maintaining.
Deployment Topologies
AICR can be invoked in three shapes. None are runtime components in the deployed cluster — all are design-time tooling.
- CLI. Single binary. Local development, CI pipelines, troubleshooting.
- API server. Stateless HTTP service for programmatic recipe and bundle generation. Scales horizontally behind a load balancer. Returns artifacts; does not deploy them. See /aicr/contributor-guide/api-server.
- Snapshot agent (one-shot Job). A Kubernetes Job that runs once, captures state into a ConfigMap, and exits. The CLI or API server reads the ConfigMap as input. No reconcile loop, no controller.
Architectural Principles
Coding rules and anti-patterns live in CLAUDE.md. The four principles that shape the architecture itself:
- Metadata is separate from how it is consumed. Validated configuration exists independent of how it is rendered, packaged, or deployed. A recipe is a value, not a procedure.
- Correctness must be reproducible. Same inputs → same outputs. This rules out hidden state, implicit defaults, and non-deterministic serialization. serializer.MarshalYAMLDeterministic is required wherever output feeds a digest, signature, or fingerprint.
- Recipe specialization requires explicit intent. Generic intent must never silently resolve to a specialized configuration. If two recipes both match, the user explicitly chooses; AICR does not guess.
- Trust requires verifiable provenance. Every released artifact carries verifiable, non-falsifiable proof of where it came from. See SECURITY.md.
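To make the reproducibility principle concrete, the sketch below shows why serialization order matters once output feeds a digest. Go randomizes map iteration order, so the sketch sorts keys before writing. marshalSorted and digest are hypothetical helpers written for this example; they are not serializer.MarshalYAMLDeterministic, whose implementation is not shown in this guide, and the keys and values are made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// marshalSorted is a hypothetical stand-in for a deterministic serializer.
// It emits one "key: value" line per entry, always in sorted key order.
func marshalSorted(values map[string]string) []byte {
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sorting removes that

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s: %q\n", k, values[k])
	}
	return []byte(b.String())
}

// digest returns the hex SHA-256 of the deterministic serialization.
func digest(values map[string]string) string {
	sum := sha256.Sum256(marshalSorted(values))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Same content, different insertion order (illustrative values only).
	a := map[string]string{"accelerator": "example-gpu", "os": "example-os"}
	b := map[string]string{"os": "example-os", "accelerator": "example-gpu"}
	fmt.Println(digest(a) == digest(b))
}
```

If the serializer ranged over the map directly, two runs over identical input could yield different bytes and therefore different digests, which would break any signature or fingerprint computed downstream.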
Key Engineering Decisions
These five decisions shape how the codebase is laid out. Code that fights them tends to read as a re-architecture.
- Concurrent collection with errgroup. Collectors run in parallel; failure of any collector cancels the rest via context. Fail-fast is the default; best-effort partial collection would hide systemic problems behind partial data.
- Pluggable collectors via factory. Collectors implement a common interface and are constructed by pkg/collector.Factory, then wired into the snapshot run by pkg/snapshotter. Adding a state source means a new Factory method and one g.Go(...) line in the snapshotter, with no edits to existing collectors. See /aicr/contributor-guide/collectors.
- Immutable recipe store. Read-only after init. Mutations on per-request clones. No locks; API server is safe for concurrent reads.
- Singleton Kubernetes client. pkg/k8s/client caches a single clientset to avoid connection exhaustion. Both in-cluster and out-of-cluster auth are supported transparently.
- Watch over poll. Long-running K8s operations use the watch API rather than polling loops. See pkg/k8s/pod.
Where to Next
By contributor task:
- Adding recipes, overlays, mixins, components → /aicr/contributor-guide/recipes-overlays-and-mixins
- Adding a registry entry (Helm or Kustomize chart) → /aicr/contributor-guide/components
- Adding a CLI command → /aicr/contributor-guide/cli
- Adding an HTTP endpoint → /aicr/contributor-guide/api-server
- Adding a snapshot collector → /aicr/contributor-guide/collectors
- Adding a validator check → /aicr/contributor-guide/validators
- Adding a bundle-time component validation → /aicr/contributor-guide/validators
- Maintaining recipes and cutting releases → /aicr/contributor-guide/maintaining-aicr
- Writing or running tests (unit, chainsaw, KWOK, e2e) → /aicr/contributor-guide/testing
- Using the project’s Claude skills (snapshot analysis, docs audit, demos, decks, OpenVEX, release notes) → /aicr/contributor-guide/claude-skills
By reference:
- CONTRIBUTING.md — contribution process, DCO, CI/CD, E2E testing
- DEVELOPMENT.md — dev environment setup and Make targets
- RELEASING.md — release process for maintainers
- SECURITY.md — supply-chain security, attestation verification
- CLAUDE.md — coding rules, error wrapping, context, HTTP, logging, K8s patterns
- docs/design/ — accepted ADRs
- docs/integrator/ — embedding AICR in your platform
- docs/user/ — end-user reference (CLI flags, API endpoints, component catalog)