Using AICR as a Go library


AICR ships as both a CLI and a Go library. External projects that need to resolve validated recipes, generate bundles, or collect observed state can import AICR directly. This page is for those consumers.

Which package to import

Import the github.com/NVIDIA/aicr/pkg/client/v1 package. This is the stable facade.

import aicr "github.com/NVIDIA/aicr/pkg/client/v1"

The facade provides a single Client type with constructors for the supported recipe sources. Internally it delegates to the functional packages under pkg/*.

You may also import pkg/* subpackages directly, but their APIs are not covered by the same stability guarantees — see the public API surface for the details.

Installing

go get github.com/NVIDIA/aicr@latest

For reproducibility in downstream projects, pin a specific tag:

go get github.com/NVIDIA/aicr@v0.11.1

Quick start

package main

import (
	"context"
	"log"
	"time"

	aicr "github.com/NVIDIA/aicr/pkg/client/v1"
)

func main() {
	// FilesystemSource layers an external recipe directory over the
	// embedded recipe data. Use this in production today; OCISource
	// is reserved but not yet implemented (NewClient returns
	// ErrCodeUnavailable when given one; see the constructor's
	// godoc for the current state).
	client, err := aicr.NewClient(
		aicr.WithRecipeSource(
			aicr.FilesystemSource("/etc/aicr/recipes"),
		),
	)
	if err != nil {
		log.Fatal(err)
	}
	// Always Close when done: this releases the Client's cached
	// metadata store and component registry from the recipe
	// package's per-DataProvider caches.
	defer client.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	result, err := client.ResolveRecipe(ctx, aicr.RecipeRequest{
		Service:     "eks", // K8s flavour, not cloud vendor; map aws→eks etc. on your side
		Region:      "us-east-1",
		Accelerator: "h100",
		Nodes:       8,        // worker-node count, not GPU count
		OS:          "ubuntu", // REQUIRED to reach the OS-pinned kubeflow overlay; see "Recipe sources" below
		Intent:      "training",
		Platform:    "kubeflow",
	})
	if err != nil {
		log.Fatalf("resolve recipe: %v", err)
	}

	log.Printf("resolved recipe %s (%d components)", result.Name, len(result.Components))
}
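The Service comment above asks callers to translate cloud-vendor names into AICR's Kubernetes-flavour values themselves. A minimal sketch of that translation follows; serviceForVendor and its table are hypothetical, and only the aws→eks pair comes from this page, so populate the remaining entries from the Service values your AICR version accepts.

```go
package main

import (
	"fmt"
	"strings"
)

// vendorToService maps cloud-vendor names from your own configuration to
// AICR Service values. Only the aws→eks pair comes from the AICR docs; the
// rest of this (hypothetical) table is yours to fill in.
var vendorToService = map[string]string{
	"aws": "eks",
}

// serviceForVendor normalizes a vendor name and looks up its Service value.
// Unknown vendors return an error instead of being passed through, so a raw
// vendor name never reaches ResolveRecipe by accident.
func serviceForVendor(vendor string) (string, error) {
	key := strings.ToLower(strings.TrimSpace(vendor))
	svc, ok := vendorToService[key]
	if !ok {
		return "", fmt.Errorf("no AICR service mapping for vendor %q", vendor)
	}
	return svc, nil
}

func main() {
	svc, err := serviceForVendor(" AWS ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("service:", svc) // service: eks
}
```

Failing on an unknown vendor, rather than forwarding it unchanged, keeps a typo or a new cloud from silently resolving against the wrong recipe.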

Snapshotting and validation

Beyond recipe resolution, the facade exposes the rest of the Snapshot → Validate workflow. Both methods (CollectSnapshot and ValidateState) are stateless with respect to the Client's recipe source; they are exposed on the Client only to keep the facade uniform and to leave room for future per-Client telemetry hooks.

// CollectSnapshot deploys a snapshotter Job to the target cluster and
// returns the resulting Snapshot. Its config argument is a facade-owned
// struct that mirrors most fields of the underlying
// pkg/snapshotter.AgentConfig (the in-pod network-discovery fields
// ClusterConfigPath and DiscoverNetwork are not surfaced on the facade).
snap, err := client.CollectSnapshot(ctx, &aicr.AgentConfig{
	Kubeconfig:         "/path/to/target-kubeconfig",
	Namespace:          "aicr-snapshot",
	Image:              "ghcr.io/nvidia/aicr:v0.11.1",
	ServiceAccountName: "aicr-agent",
	Timeout:            5 * time.Minute,
	Cleanup:            true,
})
if err != nil {
	log.Fatalf("collect snapshot: %v", err)
}

// ValidateState runs the validation phases against the resolved recipe
// and the observed snapshot. With no WithValidationPhases option it runs
// all three phases (Deployment, Conformance, Performance) in canonical order.
results, err := client.ValidateState(ctx, result, snap)
if err != nil {
	log.Fatalf("validate state: %v", err)
}
for _, r := range results {
	log.Printf("phase=%s status=%s duration=%s", r.Phase, r.Status, r.Duration)
}

The recipe argument to ValidateState MUST be the *RecipeResult returned by the same Client’s ResolveRecipe (or LoadRecipe) call — the unexported internal recipe state is required for constraint evaluation.
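One common Go pattern for this kind of ownership check is to stamp each result with an unexported pointer to the object that produced it. The sketch below illustrates that pattern with hypothetical client and recipeResult types; it is not AICR's actual implementation, which this page does not describe.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for illustration only; these are not AICR types.
type client struct{ name string }

type recipeResult struct {
	Name  string
	owner *client // unexported: in a library, other packages cannot set it
}

var errInvalidRequest = errors.New("invalid request")

// resolve stamps each result with the client that produced it.
func (c *client) resolve(name string) *recipeResult {
	return &recipeResult{Name: name, owner: c}
}

// validate accepts only results that this same client produced.
func (c *client) validate(r *recipeResult) error {
	if c == nil || r == nil {
		return fmt.Errorf("%w: nil client or recipe", errInvalidRequest)
	}
	if r.owner != c {
		return fmt.Errorf("%w: recipe %q was not produced by this client", errInvalidRequest, r.Name)
	}
	return nil
}

func main() {
	a, b := &client{name: "a"}, &client{name: "b"}
	r := a.resolve("demo")
	fmt.Println(a.validate(r))        // <nil>
	fmt.Println(b.validate(r) != nil) // true
}
```

A hand-built result has a nil owner, so it fails the same check as a result borrowed from another client.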

To restrict the run to specific phases, pass WithValidationPhases in the order you want them executed:

results, err := client.ValidateState(ctx, result, snap,
	aicr.WithValidationPhases(aicr.PhaseDeployment, aicr.PhaseConformance))

Valid phase values are PhaseDeployment, PhaseConformance, and PhasePerformance (canonical execution order). An unrecognized phase is rejected with ErrCodeInvalidRequest before any cluster work, so a typo cannot silently degrade to an empty run.
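The up-front rejection of unknown phases can be sketched as follows. The phase type, its string values, and resolvePhases are hypothetical stand-ins for aicr.PhaseDeployment and its siblings; the behaviour shown (an empty request means all phases in canonical order, a requested order is preserved, and any unknown phase fails before work starts) is the one described above.

```go
package main

import "fmt"

// phase is a hypothetical stand-in for the facade's phase values.
type phase string

const (
	phaseDeployment  phase = "deployment"
	phaseConformance phase = "conformance"
	phasePerformance phase = "performance"
)

// canonicalPhases is the default run order when no phases are requested.
var canonicalPhases = []phase{phaseDeployment, phaseConformance, phasePerformance}

// resolvePhases validates the request before any (simulated) cluster work.
// An unknown phase fails the whole call, so a typo cannot become an empty run.
func resolvePhases(requested []phase) ([]phase, error) {
	if len(requested) == 0 {
		return append([]phase(nil), canonicalPhases...), nil
	}
	known := make(map[phase]bool, len(canonicalPhases))
	for _, p := range canonicalPhases {
		known[p] = true
	}
	for _, p := range requested {
		if !known[p] {
			return nil, fmt.Errorf("invalid request: unknown validation phase %q", p)
		}
	}
	return requested, nil // the caller's order is kept as-is
}

func main() {
	ps, err := resolvePhases(nil)
	fmt.Println(ps, err) // [deployment conformance performance] <nil>

	_, err = resolvePhases([]phase{"deploymnet"})
	fmt.Println(err)
}
```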

Loading an existing recipe

When a recipe has already been resolved and persisted (for example a recipe file checked into a GitOps repo, or a cm:// ConfigMap URI), load it back through the same Client with LoadRecipe instead of re-resolving from criteria:

result, err := client.LoadRecipe(ctx, "/etc/aicr/recipe.yaml", "")
if err != nil {
	log.Fatalf("load recipe: %v", err)
}

LoadRecipe hydrates overlay inputs (kind: RecipeMetadata) against the Client’s own data provider and returns a Client-owned *RecipeResult ready for ValidateState / BundleComponents — it passes the same ownership check as a ResolveRecipe result. An already-hydrated RecipeResult file is returned with its provider bound to the Client. The kubeconfig argument (third parameter) is only needed when the recipe path (first argument) is a cm:// ConfigMap URI.
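Because only cm:// URIs need the kubeconfig argument, a caller that accepts both kinds of path can decide what to pass with a prefix check. A small sketch, assuming a hypothetical helper name; the cm:// URI and kubeconfig path in main are placeholders, not the documented URI format.

```go
package main

import (
	"fmt"
	"strings"
)

// kubeconfigArg returns the value to pass as LoadRecipe's third argument.
// Per the docs, the kubeconfig is only needed for cm:// ConfigMap URIs, so
// plain file paths get "" and never depend on cluster credentials.
// (Hypothetical helper; the name is not part of the AICR API.)
func kubeconfigArg(recipePath, kubeconfig string) string {
	if strings.HasPrefix(recipePath, "cm://") {
		return kubeconfig
	}
	return ""
}

func main() {
	const kubeconfig = "/home/me/.kube/config" // placeholder path
	for _, p := range []string{"/etc/aicr/recipe.yaml", "cm://example/recipe"} {
		fmt.Printf("%s -> %q\n", p, kubeconfigArg(p, kubeconfig))
	}
}
```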

For unit tests that exercise ValidateState without a live cluster, pass aicr.WithValidationNoCluster(true): every check reports as "skipped - no-cluster mode" and no Kubernetes resources are created. The remaining validation options (WithValidationNamespace, WithValidationRunID, WithValidationCleanup, WithValidationImagePullSecrets, WithValidationTolerations, WithValidationNodeSelector) cover the settings a production controller needs to control.

Recipe sources

AICR exposes two production recipe sources today, plus one reserved for future use; pick one via aicr.WithRecipeSource:

| Source | Constructor | Status |
|---|---|---|
| Embedded | aicr.EmbeddedSource() | Production. Uses only AICR's built-in recipe data with no external overlay. |
| Local filesystem | aicr.FilesystemSource(path) | Production. Use a directory containing a registry.yaml (layered over the embedded recipe data). |
| OCI registry | aicr.OCISource(registry, tag) | Reserved; not yet implemented. NewClient returns ErrCodeUnavailable when this source is selected. |

EmbeddedSource resolves against the recipe data compiled into the AICR binary — no filesystem path required. Use it when you want AICR’s bundled recipe data and no local overrides. FilesystemSource layers an external directory over that same embedded data, so files in the directory override their embedded equivalents.
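The override rule for FilesystemSource (a file in the external directory shadows its embedded counterpart) behaves like a two-layer io/fs.FS. The sketch below illustrates only that lookup order, using testing/fstest.MapFS in place of both layers; layeredFS, demoFS, readString, and the overlays/base.yaml file name are hypothetical, and this is not how AICR implements the source.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// layeredFS serves a file from overlay when it exists there and falls back
// to base otherwise. Conceptual sketch only, not AICR's implementation.
type layeredFS struct {
	overlay, base fs.FS
}

func (l layeredFS) Open(name string) (fs.File, error) {
	f, err := l.overlay.Open(name)
	if err == nil {
		return f, nil
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return nil, err // a real overlay error should not be masked by the fallback
	}
	return l.base.Open(name)
}

// demoFS builds a two-layer filesystem with illustrative file names.
func demoFS() layeredFS {
	embedded := fstest.MapFS{
		"registry.yaml":      {Data: []byte("embedded registry")},
		"overlays/base.yaml": {Data: []byte("embedded base")},
	}
	local := fstest.MapFS{
		"registry.yaml": {Data: []byte("local registry")},
	}
	return layeredFS{overlay: local, base: embedded}
}

// readString reads a whole file from fsys as a string.
func readString(fsys fs.FS, name string) (string, error) {
	data, err := fs.ReadFile(fsys, name)
	return string(data), err
}

func main() {
	fsys := demoFS()
	for _, name := range []string{"registry.yaml", "overlays/base.yaml"} {
		s, err := readString(fsys, name)
		if err != nil {
			fmt.Println(name, "error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", name, s)
	}
}
```

This sketch does not merge directory listings: opening a directory that exists in the overlay returns only the overlay's entries.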

Client options

Beyond WithRecipeSource, NewClient accepts these functional options:

allowLists, err := aicr.ParseAllowListsFromEnv()
if err != nil {
	log.Fatal(err)
}

client, err := aicr.NewClient(
	aicr.WithRecipeSource(aicr.EmbeddedSource()),
	aicr.WithVersion("1.2.3"),
	aicr.WithAllowLists(allowLists),
)
  • WithVersion(version string) stamps the given version string into resolved recipe metadata (accessible via result.Resolved().Metadata.Version). Typically the consuming binary’s build version.
  • WithAllowLists(al *AllowLists) fences which criteria values the Client’s resolve path accepts. A resolve whose criteria fall outside the allowlist is rejected before the recipe is built. Pass nil (or omit the option) to allow all values.
  • ParseAllowListsFromEnv() builds an AllowLists from the AICR_ALLOWED_ACCELERATORS, AICR_ALLOWED_SERVICES, AICR_ALLOWED_INTENTS, and AICR_ALLOWED_OS environment variables. It returns nil when none are set — WithAllowLists treats a nil AllowLists as allow-all, so the result is always safe to pass straight to WithAllowLists.
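WithRecipeSource, WithVersion, and WithAllowLists follow Go's functional-options pattern: each option is a function that mutates the client configuration, and the constructor applies them in order. A generic sketch of the pattern follows; every name is hypothetical, and the defaults are invented for the sketch and do not describe what NewClient does when an option is omitted.

```go
package main

import "fmt"

// config collects the settings options can change (hypothetical fields).
type config struct {
	source  string
	version string
}

// option is a functional option: a function that mutates the config.
type option func(*config)

func withSource(s string) option { return func(c *config) { c.source = s } }

func withVersion(v string) option { return func(c *config) { c.version = v } }

// newConfig starts from (invented) defaults and applies options in order,
// so in this sketch a later option overrides an earlier one for the same field.
func newConfig(opts ...option) config {
	c := config{source: "embedded", version: "dev"}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(withVersion("1.2.3"))
	fmt.Printf("source=%s version=%s\n", c.source, c.version) // source=embedded version=1.2.3
}
```

The pattern lets the constructor gain new options in a minor release without changing its signature, which fits the compatibility policy described below.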

AllowLists is a facade-owned struct whose Accelerators, Services, Intents, and OSTypes fields are plain []string slices, so callers can construct one directly without depending on pkg/recipe’s enum identifiers. When you already hold a pkg/recipe.AllowLists, use aicr.WrapAllowLists to project it onto the facade shape.
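The nil-means-allow-all contract can be sketched with a local mirror of the AllowLists shape. Two assumptions are made here and flagged in the code: that each AICR_ALLOWED_* variable holds a comma-separated list, and that an empty field places no restriction on that dimension. Check ParseAllowListsFromEnv's godoc for the real parsing rules. The slices package requires Go 1.21 or later.

```go
package main

import (
	"fmt"
	"os"
	"slices"
	"strings"
)

// allowLists is a hypothetical mirror of the facade's AllowLists shape:
// plain []string fields, where a nil *allowLists means "allow everything".
type allowLists struct {
	Accelerators, Services, Intents, OSTypes []string
}

// splitList parses a comma-separated value. ASSUMPTION: the comma format
// is invented for this sketch.
func splitList(v string) []string {
	var out []string
	for _, part := range strings.Split(v, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// parseAllowLists reads the four AICR_ALLOWED_* variables through lookup
// and returns nil when none of them is set, matching the documented contract.
func parseAllowLists(lookup func(string) (string, bool)) *allowLists {
	al := &allowLists{}
	set := false
	for name, dst := range map[string]*[]string{
		"AICR_ALLOWED_ACCELERATORS": &al.Accelerators,
		"AICR_ALLOWED_SERVICES":     &al.Services,
		"AICR_ALLOWED_INTENTS":      &al.Intents,
		"AICR_ALLOWED_OS":           &al.OSTypes,
	} {
		if v, ok := lookup(name); ok {
			*dst = splitList(v)
			set = true
		}
	}
	if !set {
		return nil
	}
	return al
}

// allowsService reports whether svc passes. A nil receiver allows all;
// ASSUMPTION: an empty Services field also places no restriction.
func (al *allowLists) allowsService(svc string) bool {
	if al == nil || len(al.Services) == 0 {
		return true
	}
	return slices.Contains(al.Services, svc)
}

func main() {
	al := parseAllowLists(os.LookupEnv)
	fmt.Println("eks allowed:", al.allowsService("eks"))
}
```

Only the Services check is shown; the other three fields would follow the same shape.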

Resolving from criteria

ResolveRecipe takes the stable RecipeRequest shape and returns the facade *RecipeResult, a deliberately small struct exposing the Name, Version, and Components of the resolved recipe.

When you already hold an *aicr.Criteria value (for example, in a REST handler that parsed criteria from an incoming HTTP request and wrapped them with aicr.WrapCriteria), use ResolveRecipeFromCriteria instead. It returns the same facade *RecipeResult; call result.Resolved() when you need the complete underlying *pkg/recipe.RecipeResult (constraints, deployment order, validation config, metadata):

rec, err := client.ResolveRecipeFromCriteria(ctx, aicr.WrapCriteria(criteria))
if err != nil {
	log.Fatalf("resolve recipe: %v", err)
}

// Facade surface: Name, Version, Components.
log.Printf("recipe %s components: %d", rec.Name, len(rec.Components))

// Full upstream shape, when needed.
resolved := rec.Resolved()
log.Printf("recipe constraints: %d", len(resolved.Constraints))

The returned *RecipeResult carries:

  • Name, Version, TranslatedAt: stable identity
  • Components: []ComponentRef (Name, Kind, Version, Source, Chart, Namespace)
  • Resolved(): the upstream *pkg/recipe.RecipeResult for callers that need constraints, deployment order, validation config, or metadata (e.g., evidence emission). Do not mutate it, and do not retain it past the facade RecipeResult's lifetime; marshal it first if you need to persist it.
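If you need to keep what Resolved() returns, serialize it while the facade result is still alive and store the bytes (or a decoded copy) instead of the pointer. A sketch with a hypothetical upstreamRecipe stand-in; JSON is used for illustration, and this page does not state which encoding the real type is designed for.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// upstreamRecipe is a hypothetical stand-in for the value returned by
// Resolved(); the real *pkg/recipe.RecipeResult has many more fields.
type upstreamRecipe struct {
	Name        string   `json:"name"`
	Constraints []string `json:"constraints"`
}

// marshalForStorage serializes the borrowed value; store these bytes
// rather than the pointer itself.
func marshalForStorage(r *upstreamRecipe) ([]byte, error) {
	return json.Marshal(r)
}

// ownedCopy round-trips through JSON to get a copy that shares no
// slices or maps with the borrowed value.
func ownedCopy(r *upstreamRecipe) (upstreamRecipe, error) {
	var out upstreamRecipe
	data, err := marshalForStorage(r)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(data, &out)
	return out, err
}

func main() {
	borrowed := &upstreamRecipe{Name: "demo", Constraints: []string{"placeholder-constraint"}}
	owned, err := ownedCopy(borrowed)
	if err != nil {
		fmt.Println("copy:", err)
		return
	}
	// Demonstration only: never mutate the real Resolved() value.
	borrowed.Constraints[0] = "mutated"
	fmt.Println(owned.Constraints[0]) // placeholder-constraint
}
```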

Criteria is a facade-owned struct whose enum-typed fields project to plain strings, decoupling the public surface from pkg/recipe’s enum identifiers. Construct one directly or wrap an upstream *pkg/recipe.Criteria via aicr.WrapCriteria. Allowlist enforcement (WithAllowLists) applies here just as it does on ResolveRecipe; a nil Client, nil context, or nil criteria each return ErrCodeInvalidRequest, and the same facade-level timeout bounds the resolve.

To extract a single value from a resolved recipe, use SelectFromRecipe with a dot-path selector. It hydrates the recipe’s component values and returns the value at the path; an empty selector returns the entire hydrated structure, and a nil *RecipeResult returns ErrCodeInvalidRequest. This mirrors the aicr query CLI command:

v, err := aicr.SelectFromRecipe(rec, "components.gpu-operator.values.driver.version")
if err != nil {
	log.Fatalf("select: %v", err)
}
log.Printf("driver version: %v", v)
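Conceptually, a dot-path selector walks nested maps one key at a time. The sketch below shows that idea on a plain map[string]any; selectPath is hypothetical, handles no list indices or escaped dots, and the driver version in main is a placeholder value. SelectFromRecipe's exact path grammar may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// selectPath walks a nested map[string]any along a dot-separated path.
// An empty path returns the whole structure, as SelectFromRecipe does.
// Simplified sketch: no list indexing, no escaping of dots inside keys.
func selectPath(root any, path string) (any, error) {
	if path == "" {
		return root, nil
	}
	cur := root
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, fmt.Errorf("cannot descend into %T at key %q", cur, key)
		}
		next, ok := m[key]
		if !ok {
			return nil, fmt.Errorf("key %q not found in path %q", key, path)
		}
		cur = next
	}
	return cur, nil
}

func main() {
	doc := map[string]any{
		"components": map[string]any{
			"gpu-operator": map[string]any{
				"values": map[string]any{
					"driver": map[string]any{"version": "1.0.0"}, // placeholder
				},
			},
		},
	}
	v, err := selectPath(doc, "components.gpu-operator.values.driver.version")
	fmt.Println(v, err) // 1.0.0 <nil>
}
```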

Errors

All errors returned by the facade are *pkg/errors.StructuredError values carrying an ErrorCode. Use errors.As to inspect:

import (
	stderrors "errors"

	aicr "github.com/NVIDIA/aicr/pkg/client/v1"
	aicrerrors "github.com/NVIDIA/aicr/pkg/errors"
)

_, err := client.ResolveRecipe(ctx, req)
var se *aicrerrors.StructuredError
if stderrors.As(err, &se) && se.Code == aicrerrors.ErrCodeInvalidRequest {
	// handle invalid input
}
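errors.As is preferable to a direct type assertion because it keeps matching after your own code wraps a facade error with %w. A self-contained sketch, using hypothetical local types that mimic the StructuredError shape rather than the real pkg/errors package:

```go
package main

import (
	"errors"
	"fmt"
)

// errorCode and structuredError are hypothetical local stand-ins that
// mimic the shape described for pkg/errors; they are not the AICR types.
type errorCode string

const codeInvalidRequest errorCode = "INVALID_REQUEST"

type structuredError struct {
	Code    errorCode
	Message string
}

func (e *structuredError) Error() string { return string(e.Code) + ": " + e.Message }

// hasCode reports whether any error in err's chain is a *structuredError
// carrying code; errors.As unwraps %w-wrapped errors along the way.
func hasCode(err error, code errorCode) bool {
	var se *structuredError
	return errors.As(err, &se) && se.Code == code
}

// resolveForTenant is a hypothetical caller that adds context to a
// facade-style error before returning it.
func resolveForTenant(tenant string) error {
	base := &structuredError{Code: codeInvalidRequest, Message: "nil criteria"}
	return fmt.Errorf("resolve for tenant %s: %w", tenant, base)
}

func main() {
	err := resolveForTenant("t1")
	fmt.Println(hasCode(err, codeInvalidRequest)) // true
}
```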

Context handling

ResolveRecipe (and every other context-aware facade method) honours context cancellation. Each entry point with a per-operation cap unconditionally wraps the caller's context with context.WithTimeout against that cap. The effective deadline is the earlier of the caller's deadline and the facade cap, per context.WithTimeout semantics: a caller passing a tighter deadline keeps it, and a caller passing context.Background() gets the facade cap.

Per-operation caps:

  • ResolveRecipe / BundleComponents: defaults.RecipeOperationTimeout
  • CollectSnapshot: caller-controlled via AgentConfig.Timeout, falling back to defaults.SnapshotOperationTimeout when unset
  • ValidateState: defaults.ValidationOperationTimeout
  • MakeBundle: opt-in via BundleOptions.Timeout. When unset (0) the caller’s context governs unchanged — large bundles, --vendor-charts, and attestation/signing can exceed any fixed cap. The REST /v1/bundle handler sets it to defaults.BundleHandlerTimeout; the CLI bundle command leaves it 0.

Passing a nil context.Context returns ErrCodeInvalidRequest. Callers with no deadline of their own should pass context.Background() (or a deadline-bounded child of it).

Compatibility

The facade’s exported API follows Semantic Versioning:

  • Major bumps may rename, remove, or change the shape of exported types and function signatures.
  • Minor bumps may add new exported types, fields, or methods.
  • Patch bumps are bug-fix-only.

Today AICR is pre-1.0. Pin a patch version in your go.mod and audit diffs on upgrade.
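When auditing an upgrade, it can help to flag version moves that may change the exported API under the policy above. A sketch, assuming plain vMAJOR.MINOR.PATCH tags (no prerelease or build suffixes); needsAPIReview is hypothetical, and it also treats any minor bump on v0 as needing review, a cautious assumption while AICR is pre-1.0 rather than a stated AICR rule.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "vMAJOR.MINOR.PATCH"; suffixes are not handled.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("want vMAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("bad version component %q in %q", p, v)
		}
		out[i] = n
	}
	return out, nil
}

// needsAPIReview flags any major bump, plus (ASSUMPTION, pre-1.0 caution)
// any minor bump while the major version is 0.
func needsAPIReview(from, to string) (bool, error) {
	a, err := parseVersion(from)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(to)
	if err != nil {
		return false, err
	}
	if a[0] != b[0] {
		return true, nil
	}
	return a[0] == 0 && a[1] != b[1], nil
}

func main() {
	review, err := needsAPIReview("v0.11.1", "v0.12.0")
	fmt.Println(review, err) // true <nil>
}
```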

See also