API Server Architecture

aicrd exposes AICR configuration recipe generation and bundle creation capabilities through an HTTP REST API.

Overview

The API server provides HTTP REST access to Steps 2 and 4 of the AICR workflow — recipe generation (GET /v1/recipe) and bundle creation (POST /v1/bundle). Built on Go’s net/http with middleware for rate limiting, metrics, request tracking, and graceful shutdown.

Four-Step Workflow Context

┌──────────────┐      ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   Snapshot   │─────▶│    Recipe    │─────▶│   Validate   │─────▶│    Bundle    │
└──────────────┘      └──────────────┘      └──────────────┘      └──────────────┘
 CLI/Agent only          API Server            CLI only             API Server

API Server scope:

  • Recipe generation (Step 2, query mode only — no snapshot analysis) and bundle creation (Step 4)
  • Health, readiness, and Prometheus metrics endpoints
  • SLSA Build Level 3 attestations on released images
  • No snapshot capture, no validation, no ConfigMap I/O — use the CLI for those

API Server configuration:

  • Criteria allowlists for accelerator, service, intent, OS via AICR_ALLOWED_* env vars
  • Value overrides on /v1/bundle via ?set=bundler:path=value and ?dynamic=component:path (helm and argocd-helm deployers)
  • Node scheduling via ?system-node-selector and ?accelerated-node-selector

For the complete workflow (snapshot → recipe → validate → bundle, ConfigMap I/O via cm://namespace/name, agent deployment, Chainsaw E2E in tests/chainsaw/cli/), use the CLI.

Architecture Diagram

Request Flow

Complete Request Flow with Middleware

Component Details

Entry Point: cmd/aicrd/main.go

Minimal entry point:

package main

import (
	"log"

	"github.com/NVIDIA/aicr/pkg/api"
)

func main() {
	if err := api.Serve(); err != nil {
		log.Fatal(err)
	}
}

API Package: pkg/api/server.go

Responsibilities: initialize structured logging; parse criteria allowlists; create recipe builder, query handler, and bundle handler with allowlist configuration; install signal handling; run server with middleware; handle graceful shutdown.

Key Features: version info injected via ldflags (version, commit, date); routes /v1/recipe, /v1/query, /v1/bundle; allowlists from AICR_ALLOWED_* env vars; production defaults; graceful shutdown on SIGINT/SIGTERM.

Initialization Flow

func Serve() error {
	// Signal handling spans pre-Run setup and request handling
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	logging.SetDefaultStructuredLogger(name, version)

	allowLists, err := recipe.ParseAllowListsFromEnv()
	if err != nil {
		return errors.Wrap(errors.ErrCodeInternal, "failed to parse allowlists from environment", err)
	}

	rb := recipe.NewBuilder(recipe.WithVersion(version), recipe.WithAllowLists(allowLists))
	bb, err := bundler.New(bundler.WithAllowLists(allowLists))
	if err != nil {
		return errors.Wrap(errors.ErrCodeInternal, "failed to create bundler", err)
	}

	s := server.New(
		server.WithName(name),
		server.WithVersion(version),
		server.WithHandler(map[string]http.HandlerFunc{
			"/v1/recipe": rb.HandleRecipes,
			"/v1/query":  rb.HandleQuery,
			"/v1/bundle": bb.HandleBundles,
		}),
	)
	return s.Run(ctx)
}

Server Infrastructure: pkg/server/

Production-ready HTTP server implementation. Core files:

server.go — Server struct (config, HTTP server, rate limiter, ready state); functional options; graceful shutdown via signal.NotifyContext and errgroup; default root handler listing routes.
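
The functional options mentioned above follow Go's standard functional-options pattern. A minimal sketch of that pattern (the struct fields and defaults here are illustrative, not the actual pkg/server internals; only the option names mirror the Initialization Flow snippet):

```go
package main

import (
	"fmt"
	"net/http"
)

// Server holds configuration assembled from options.
// Fields are illustrative stand-ins for the real struct in pkg/server.
type Server struct {
	name     string
	version  string
	handlers map[string]http.HandlerFunc
}

// Option mutates a Server during construction.
type Option func(*Server)

func WithName(n string) Option    { return func(s *Server) { s.name = n } }
func WithVersion(v string) Option { return func(s *Server) { s.version = v } }
func WithHandler(h map[string]http.HandlerFunc) Option {
	return func(s *Server) { s.handlers = h }
}

// New applies each option over sensible defaults.
func New(opts ...Option) *Server {
	s := &Server{name: "aicrd", handlers: map[string]http.HandlerFunc{}}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	s := New(WithName("aicrd"), WithVersion("v1.0.0"))
	fmt.Println(s.name, s.version) // aicrd v1.0.0
}
```

The pattern keeps the constructor signature stable as new settings are added, which is why it suits a server whose configuration surface grows over time.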

config.go — Configuration struct with defaults; PORT env var; read/write/idle/shutdown timeouts; rate-limit parameters.

middleware.go — Middleware chain builder; request ID (UUID generation/validation), rate limiting (token bucket), panic recovery, structured logging.

health.go — /health (liveness, always 200) and /ready (readiness, 503 when not ready); JSON status + timestamp.

errors.go — Standardized error response struct, error codes (RATE_LIMIT_EXCEEDED, INTERNAL_ERROR, …), WriteError helper with request ID.

metrics.go — Prometheus metrics:

  • aicr_http_requests_total (counter; method, path, status)
  • aicr_http_request_duration_seconds (histogram; method, path)
  • aicr_http_requests_in_flight (gauge)
  • aicr_rate_limit_rejects_total (counter)
  • aicr_panic_recoveries_total (counter)

context.go — Context key type for request ID storage.

doc.go — Package documentation: usage, endpoints, error handling, deployment.

Request Processing Pipeline

Recipe Handler: pkg/recipe/handler.go

HTTP handler for recipe generation endpoint. Supports both GET (query parameters) and POST (criteria body) methods.

Handler Flow

func (b *Builder) HandleRecipes(w http.ResponseWriter, r *http.Request) {
	var criteria *Criteria
	var err error

	// 1. Route based on HTTP method
	switch r.Method {
	case http.MethodGet:
		// 2a. Parse query parameters for GET
		criteria, err = ParseCriteriaFromRequest(r)
	case http.MethodPost:
		// 2b. Parse request body for POST (JSON or YAML)
		criteria, err = ParseCriteriaFromBody(r.Body, r.Header.Get("Content-Type"))
		defer r.Body.Close()
	default:
		// Reject other methods: 405 METHOD_NOT_ALLOWED
		w.Header().Set("Allow", "GET, POST")
		return
	}
	if err != nil {
		// 400 with parse error details
		return
	}

	// 3. Validate criteria format
	if err := criteria.Validate(); err != nil {
		// 400 with error details
		return
	}

	// 4. Validate against allowlists (if configured)
	if b.AllowLists != nil {
		if err := b.AllowLists.ValidateCriteria(criteria); err != nil {
			// 400 with allowed values in error details
			return
		}
	}

	// 5. Build recipe
	recipe, err := b.BuildFromCriteria(r.Context(), criteria)
	if err != nil {
		// 500 INTERNAL_ERROR
		return
	}

	// 6. Set cache headers
	w.Header().Set("Cache-Control", "public, max-age=600")

	// 7. Respond with JSON
	serializer.RespondJSON(w, http.StatusOK, recipe)
}

POST Request Body Format

POST requests accept a RecipeCriteria resource (Kubernetes-style):

kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: my-criteria
spec:
  service: eks
  accelerator: gb200
  os: ubuntu
  intent: training

Supported content types:

  • application/json - JSON format
  • application/x-yaml - YAML format

Query Parameter Parsing

| Parameter | Type | Validation | Example |
|-----------|------|------------|---------|
| service | ServiceType | Enum: eks, gke, aks, oke, kind, lke, any | service=eks |
| accelerator | AcceleratorType | Enum: h100, gb200, b200, a100, l40, rtx-pro-6000, any | accelerator=h100 |
| gpu | AcceleratorType | Alias for accelerator | gpu=h100 |
| intent | IntentType | Enum: training, inference, any | intent=training |
| os | OSType | Enum: ubuntu, rhel, cos, amazonlinux, talos, any | os=ubuntu |
| nodes | int | >= 0 | nodes=8 |

Recipe Builder: pkg/recipe/builder.go

Shared with the CLI — the same logic described in the CLI architecture documentation.

API Endpoints

Recipe Generation

Endpoints GET /v1/recipe (query parameters) and POST /v1/recipe (criteria body, application/json or application/x-yaml). See Query Parameter Parsing above for the GET parameter table and POST Request Body Format above for the body schema.

Response: 200 OK

{
  "apiVersion": "aicr.nvidia.com/v1alpha1",
  "kind": "Recipe",
  "metadata": {
    "version": "v1.0.0",
    "created": "2025-12-25T12:00:00Z",
    "appliedOverlays": [
      "base",
      "eks",
      "eks-training",
      "gb200-eks-training"
    ],
    "excludedOverlays": [
      {
        "name": "h100-eks-ubuntu-training",
        "reason": "mixin-constraint-failed"
      }
    ],
    "constraintWarnings": [
      {
        "overlay": "h100-eks-ubuntu-training",
        "constraint": "OS.sysctl./proc/sys/kernel/osrelease",
        "expected": ">= 6.8",
        "actual": "5.15.0",
        "reason": "mixin-constraint-failed: expected >= 6.8, got 5.15.0"
      }
    ]
  },
  "criteria": {
    "service": "eks",
    "accelerator": "gb200",
    "intent": "training",
    "os": "any"
  },
  "componentRefs": [
    {
      "name": "gpu-operator",
      "version": "v25.3.3",
      "order": 1
    }
  ],
  "constraints": {
    "driver": {
      "version": "580.82.07"
    }
  }
}

metadata.excludedOverlays is optional. When present, it contains structured {name, reason} entries so API consumers can distinguish direct constraint failures from post-compose mixin fallback.

Error Response: 400 Bad Request

{
  "code": "INVALID_REQUEST",
  "message": "Invalid recipe criteria",
  "details": {
    "error": "[INVALID_REQUEST] invalid accelerator parameter: [INVALID_REQUEST] invalid accelerator type: invalid-gpu"
  },
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-12-25T12:00:00Z",
  "retryable": false
}

Rate Limited: 429 Too Many Requests

{
  "code": "RATE_LIMIT_EXCEEDED",
  "message": "Rate limit exceeded",
  "details": {
    "limit": 100,
    "burst": 200
  },
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-12-25T12:00:00Z",
  "retryable": true
}

Headers:

  • X-Request-Id - Unique request identifier
  • X-RateLimit-Limit - Total requests allowed per second
  • X-RateLimit-Remaining - Requests remaining in current window
  • X-RateLimit-Reset - Unix timestamp when window resets
  • Cache-Control - Caching policy (public, max-age=600, matching the handler's cache header)

Health Check

Endpoint: GET /health

Response: 200 OK

{
  "status": "healthy",
  "timestamp": "2025-12-25T12:00:00Z"
}

Readiness Check

Endpoint: GET /ready

Response: 200 OK (ready) or 503 Service Unavailable (not ready)

{
  "status": "ready",
  "timestamp": "2025-12-25T12:00:00Z"
}

Metrics

Endpoint: GET /metrics

Response: Prometheus text format

# HELP aicr_http_requests_total Total number of HTTP requests
# TYPE aicr_http_requests_total counter
aicr_http_requests_total{method="GET",path="/v1/recipe",status="200"} 1234
# HELP aicr_http_request_duration_seconds HTTP request latency in seconds
# TYPE aicr_http_request_duration_seconds histogram
aicr_http_request_duration_seconds_bucket{method="GET",path="/v1/recipe",le="0.005"} 1000
aicr_http_request_duration_seconds_sum{method="GET",path="/v1/recipe"} 12.34
aicr_http_request_duration_seconds_count{method="GET",path="/v1/recipe"} 1234
# HELP aicr_http_requests_in_flight Current number of HTTP requests being processed
# TYPE aicr_http_requests_in_flight gauge
aicr_http_requests_in_flight 5
# HELP aicr_rate_limit_rejects_total Total number of requests rejected due to rate limiting
# TYPE aicr_rate_limit_rejects_total counter
aicr_rate_limit_rejects_total 42
# HELP aicr_panic_recoveries_total Total number of panics recovered in HTTP handlers
# TYPE aicr_panic_recoveries_total counter
aicr_panic_recoveries_total 0

Root

Endpoint: GET /

Response: 200 OK

{
  "service": "aicrd",
  "version": "v1.0.0",
  "routes": [
    "/v1/recipe"
  ]
}

Usage Examples

cURL Examples

# Basic recipe request
curl "http://localhost:8080/v1/recipe?os=ubuntu&gpu=h100"

# Full specification
curl "http://localhost:8080/v1/recipe?os=ubuntu&service=eks&accelerator=gb200&intent=training&nodes=8"

# With request ID
curl -H "X-Request-Id: 550e8400-e29b-41d4-a716-446655440000" \
  "http://localhost:8080/v1/recipe?os=ubuntu&gpu=h100"

# Health check
curl http://localhost:8080/health

# Readiness check
curl http://localhost:8080/ready

# Metrics
curl http://localhost:8080/metrics

Demo API Server Deployment

Note: This section describes the demonstration deployment of the aicrd API server for testing and development purposes only. It is not a production service. Users should self-host the aicrd API server in their own infrastructure for production use. See the Kubernetes Deployment section below for deployment guidance.

Example: Google Cloud Run

The demo API server is deployed to Google Cloud Run as an example of how to deploy aicrd:

Demo Configuration:

  • Platform: Google Cloud Run (fully managed serverless)
  • Authentication: Public access (for demo purposes)
  • Auto-scaling: 0-100 instances based on load
  • Region: us-west1

CI/CD Pipeline (on-tag.yaml):

Supply Chain Security:

  • SLSA Build Level 3 compliance
  • Signed SBOMs in SPDX format
  • Attestations logged in Rekor transparency log
  • Verification: gh attestation verify oci://ghcr.io/nvidia/aicrd:TAG --owner nvidia

Demo Monitoring:

  • Health endpoint: /health
  • Readiness endpoint: /ready
  • Prometheus metrics: /metrics
  • Request tracing with X-Request-Id headers

Scaling Behavior (demo):

  • Min instances: 0 (scales to zero when idle)
  • Max instances: 100 (automatic scaling)
  • Cold start: 2-3 seconds
  • Request timeout: 30 seconds
  • Concurrency: 80 requests per instance

Cloud Run Benefits (for reference):

  • Zero operational overhead
  • Automatic HTTPS with managed certificates
  • Built-in DDoS protection
  • Pay-per-use pricing (scales to zero)
  • Global load balancing

Client Libraries

Go Client:

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

func getRecipe(os, gpu string) (*Recipe, error) {
	baseURL := "http://localhost:8080/v1/recipe"
	params := url.Values{}
	params.Add("os", os)
	params.Add("gpu", gpu)

	resp, err := http.Get(baseURL + "?" + params.Encode())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %d", resp.StatusCode)
	}

	var recipe Recipe
	if err := json.NewDecoder(resp.Body).Decode(&recipe); err != nil {
		return nil, err
	}

	return &recipe, nil
}

Python Client:

import requests

def get_recipe(os, gpu):
    url = "http://localhost:8080/v1/recipe"
    params = {"os": os, "gpu": gpu}

    response = requests.get(url, params=params)
    response.raise_for_status()

    return response.json()

# Usage
recipe = get_recipe("ubuntu", "h100")
print(f"Applied overlays: {recipe['metadata']['appliedOverlays']}")

Kubernetes Deployment

Deployment Manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aicrd
  template:
    metadata:
      labels:
        app: aicrd
    spec:
      containers:
      - name: server
        image: ghcr.io/nvidia/aicrd:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: PORT
          value: "8080"
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  selector:
    app: aicrd
  ports:
  - port: 80
    targetPort: http
  type: ClusterIP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  selector:
    matchLabels:
      app: aicrd
  endpoints:
  - port: http
    path: /metrics
    interval: 30s

Ingress with TLS

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aicrd
  namespace: aicr-system
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - api.aicr.nvidia.com
    secretName: aicr-api-tls
  rules:
  - host: api.aicr.nvidia.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: aicrd
            port:
              number: 80

HorizontalPodAutoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aicrd
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: aicr_http_requests_in_flight
      target:
        type: AverageValue
        averageValue: "50"

Performance Characteristics

Throughput

  • Rate Limit: 100 requests/second per instance (configurable)
  • Burst: 200 requests (configurable)
  • Target Latency: p50 <10ms, p99 <50ms
  • Max Concurrent: Limited by rate limiter

Resource Usage

  • CPU: ~50m idle, ~200m at 100 req/s
  • Memory: ~100MB baseline, ~200MB at peak
  • Disk: None (stateless, embedded recipe data)

Scalability

  • Horizontal: Fully stateless, linear scaling
  • Vertical: Recipe store cached in memory (sync.Once)
  • Load Balancing: Round-robin or least-connections

Caching Strategy

  • Recipe Store: Loaded once per process, cached globally
  • Client-Side: 10-minute cache via Cache-Control header (max-age=600)
  • CDN: Recommended for public-facing deployments
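
The once-per-process recipe store load can be sketched with sync.Once (the loadStore helper and the literal map contents are illustrative; the real server parses an embedded YAML recipe store):

```go
package main

import (
	"fmt"
	"sync"
)

var (
	storeOnce sync.Once
	store     map[string]string
)

// loadStore performs the expensive load exactly once per process;
// every later call returns the cached result, even under concurrency.
func loadStore() map[string]string {
	storeOnce.Do(func() {
		// Stand-in for parsing the embedded recipe data.
		store = map[string]string{"gpu-operator": "v25.3.3"}
	})
	return store
}

func main() {
	loadStore()                                 // first call loads
	fmt.Println(loadStore()["gpu-operator"])    // later calls hit the cache
}
```

sync.Once is safe to call from concurrent request handlers, which is what makes this pattern fit a stateless HTTP server.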

Error Handling

Error Response Format

All errors follow a consistent JSON structure:

{
  "code": "ERROR_CODE",
  "message": "Human-readable error message",
  "details": {"key": "value"},
  "requestId": "uuid",
  "timestamp": "2025-12-25T12:00:00Z",
  "retryable": false
}

Error Codes

| Code | HTTP Status | Description | Retryable |
|------|-------------|-------------|-----------|
| RATE_LIMIT_EXCEEDED | 429 | Too many requests | Yes |
| INVALID_REQUEST | 400 | Invalid parameters or disallowed criteria value | No |
| METHOD_NOT_ALLOWED | 405 | Wrong HTTP method | No |
| INTERNAL_ERROR | 500 | Server error | Yes |
| SERVICE_UNAVAILABLE | 503 | Not ready | Yes |

Allowlist Validation Error Example:

When a request uses a criteria value not in the configured allowlist:

{
  "code": "INVALID_REQUEST",
  "message": "accelerator type not allowed",
  "details": {
    "requested": "gb200",
    "allowed": ["h100", "l40"]
  },
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2026-01-27T12:00:00Z",
  "retryable": false
}

Error Handling Strategy

  1. Validation Errors: Return 400 with specific error message
  2. Rate Limiting: Return 429 with Retry-After header
  3. Panics: Recover, log, return 500
  4. Context Cancellation: Return early, cleanup resources
  5. Resource Exhaustion: Rate limiting prevents this

Security

Attack Mitigation

Rate Limiting:

  • Token bucket algorithm prevents abuse
  • Per-instance limit (shared across all clients)
  • Configurable limits and burst

Header Attacks:

  • 64KB header size limit
  • 5-second header read timeout
  • Prevents slowloris attacks

Resource Exhaustion:

  • Request timeouts (read, write, idle)
  • In-flight request limits
  • Graceful shutdown prevents connection drops

Input Validation:

  • Strict enum validation
  • Version string parsing with bounds
  • UUID validation for request IDs
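
Request-ID validation of the kind listed above can be sketched with a regexp (the server actually validates via github.com/google/uuid; the regexp and the validRequestID helper name are stand-ins to keep the sketch stdlib-only):

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidRe accepts the canonical 8-4-4-4-12 hex form.
var uuidRe = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// validRequestID rejects malformed X-Request-Id values so arbitrary
// client input never flows into logs as a trusted identifier.
func validRequestID(id string) bool {
	return uuidRe.MatchString(id)
}

func main() {
	fmt.Println(validRequestID("550e8400-e29b-41d4-a716-446655440000")) // true
	fmt.Println(validRequestID("not-a-uuid"))                           // false
}
```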

Production Considerations

TLS:

  • Use reverse proxy (nginx, Envoy) for TLS termination
  • Or add TLS support to server (future enhancement)

Authentication:

  • Add API key middleware (future enhancement)
  • Or use service mesh mTLS (Istio, Linkerd)

Authorization:

  • Currently none (public API)
  • Could add rate limits per API key

Monitoring:

  • Prometheus metrics for observability
  • Request ID tracking for distributed tracing
  • Structured logging for debugging

Monitoring & Observability

Prometheus Metrics

Request Metrics:

  • aicr_http_requests_total - Total requests by method, path, status
  • aicr_http_request_duration_seconds - Request latency histogram
  • aicr_http_requests_in_flight - Current active requests

Error Metrics:

  • aicr_rate_limit_rejects_total - Rate limit rejections
  • aicr_panic_recoveries_total - Panic recoveries

Grafana Dashboard

Example queries:

# Request rate
rate(aicr_http_requests_total[5m])

# Error rate
rate(aicr_http_requests_total{status=~"5.."}[5m])

# Latency percentiles
histogram_quantile(0.99, rate(aicr_http_request_duration_seconds_bucket[5m]))

# Rate limit rejections
rate(aicr_rate_limit_rejects_total[5m])

Alerting Rules

groups:
- name: aicrd
  rules:
  - alert: HighErrorRate
    expr: rate(aicr_http_requests_total{status=~"5.."}[5m]) > 0.05
    for: 5m
    annotations:
      summary: High error rate on aicrd

  - alert: HighLatency
    expr: histogram_quantile(0.99, rate(aicr_http_request_duration_seconds_bucket[5m])) > 0.1
    for: 5m
    annotations:
      summary: High latency on aicrd

  - alert: HighRateLimitRejects
    expr: rate(aicr_rate_limit_rejects_total[5m]) > 10
    for: 5m
    annotations:
      summary: High rate limit rejections

Distributed Tracing

Request ID tracking enables correlation:

  1. Client sends request with X-Request-Id header
  2. Server logs all operations with request ID
  3. Response includes same X-Request-Id
  4. Client can correlate logs across services

Future: OpenTelemetry integration for full tracing

Testing Strategy

Unit Tests

  • Handler validation logic
  • Middleware functionality
  • Error response formatting
  • Query parsing

Integration Tests

  • Full HTTP request/response cycle
  • Rate limiting behavior
  • Graceful shutdown
  • Health/ready endpoints

Load Tests

  • Sustained load at rate limit
  • Burst handling
  • Latency under load
  • Memory stability

Example Test

func TestRecipeHandler(t *testing.T) {
	// Create test server
	builder := recipe.NewBuilder()
	handler := builder.HandleRecipes

	// Create test request
	req := httptest.NewRequest(
		"GET",
		"/v1/recipe?os=ubuntu&gpu=h100",
		nil,
	)
	w := httptest.NewRecorder()

	// Execute handler
	handler(w, req)

	// Verify response
	assert.Equal(t, http.StatusOK, w.Code)

	var resp recipe.Recipe
	err := json.Unmarshal(w.Body.Bytes(), &resp)
	assert.NoError(t, err)
	assert.Equal(t, "ubuntu", resp.Criteria.OS)
}

Dependencies

External Libraries

  • net/http - Standard HTTP server
  • golang.org/x/time/rate - Rate limiting
  • golang.org/x/sync/errgroup - Concurrent error handling
  • github.com/prometheus/client_golang - Prometheus metrics
  • github.com/google/uuid - UUID generation
  • gopkg.in/yaml.v3 - Recipe store parsing
  • log/slog - Structured logging

Internal Packages

  • pkg/recipe - Recipe building logic
  • pkg/measurement - Data model
  • pkg/version - Semantic versioning
  • pkg/serializer - JSON response formatting
  • pkg/logging - Logging configuration

Build & Deployment

Automated CI/CD Pipeline

Production builds are automated through GitHub Actions workflows. When a semantic version tag is pushed (e.g., v0.8.12), the on-tag.yaml workflow:

  1. Validates code with Go CI (tests + linting)
  2. Builds multi-platform binaries and container images with GoReleaser and ko
  3. Generates SBOMs (SPDX for binaries and for containers)
  4. Attests images with SLSA v1.0 provenance and SBOM attestations
  5. Deploys to Google Cloud Run with Workload Identity Federation

Supply Chain Security:

  • SLSA Build Level 3 compliance
  • Cosign keyless signing with Fulcio + Rekor
  • GitHub Attestation API for provenance
  • Multi-platform builds: darwin/linux × amd64/arm64

Verify Release Artifacts:

# Get latest release tag
export TAG=$(curl -s https://api.github.com/repos/NVIDIA/aicr/releases/latest | jq -r '.tag_name')

# Verify attestations
gh attestation verify oci://ghcr.io/nvidia/aicrd:${TAG} --owner nvidia

For detailed CI/CD architecture, see CONTRIBUTING.md and Architecture Overview.

Local Build Configuration

For local development and testing:

VERSION ?= $(shell git describe --tags --always --dirty)
COMMIT ?= $(shell git rev-parse --short HEAD)
DATE ?= $(shell date -u +%Y-%m-%dT%H:%M:%SZ)

LDFLAGS := -X github.com/NVIDIA/aicr/pkg/api.version=$(VERSION)
LDFLAGS += -X github.com/NVIDIA/aicr/pkg/api.commit=$(COMMIT)
LDFLAGS += -X github.com/NVIDIA/aicr/pkg/api.date=$(DATE)

go build -ldflags="$(LDFLAGS)" -o bin/aicrd ./cmd/aicrd

Container Image

Production images are built with ko (automated in CI/CD). For local development:

FROM golang:1.26-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -ldflags="-X github.com/NVIDIA/aicr/pkg/api.version=v1.0.0" \
    -o /bin/aicrd ./cmd/aicrd

FROM alpine:3.19
RUN apk --no-cache add ca-certificates
COPY --from=builder /bin/aicrd /usr/local/bin/
EXPOSE 8080
ENTRYPOINT ["aicrd"]

Note: Production images use distroless base (gcr.io/distroless/static) for minimal attack surface.

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| PORT | 8080 | Server port |
| AICR_ALLOWED_ACCELERATORS | (none) | Comma-separated list of allowed GPU types (e.g., h100,l40). If not set, all types allowed. |
| AICR_ALLOWED_SERVICES | (none) | Comma-separated list of allowed K8s services (e.g., eks,gke). If not set, all services allowed. |
| AICR_ALLOWED_INTENTS | (none) | Comma-separated list of allowed intents (e.g., training). If not set, all intents allowed. |
| AICR_ALLOWED_OS | (none) | Comma-separated list of allowed OS types (e.g., ubuntu,rhel). If not set, all OS types allowed. |

Criteria Allowlists

When allowlist environment variables are configured, the API server validates incoming requests against the allowed values. This enables operators to restrict the API to specific configurations.

# Start server with restricted accelerators
export AICR_ALLOWED_ACCELERATORS=h100,l40
export AICR_ALLOWED_SERVICES=eks,gke
./aicrd

# Server logs on startup:
# INFO criteria allowlists configured accelerators=2 services=2 intents=0 os_types=0
# DEBUG criteria allowlists loaded accelerators=["h100","l40"] services=["eks","gke"] intents=[] os_types=[]

Validation behavior:

  • Requests with disallowed values return HTTP 400 with error details
  • The any value is always allowed regardless of allowlist
  • Both /v1/recipe and /v1/bundle endpoints enforce allowlists
  • CLI (aicr) is not affected by allowlists

Extension and Operating Patterns

Forward-looking guidance — Future Enhancements, Production Deployment Patterns, Reliability Patterns, Performance Optimization, Security Hardening, and Observability extensions — has moved to a dedicated page so this document can stay focused on what the API server does today.

See API Server: Extension and Operating Patterns.
