API Server Architecture
The `aicrd` server provides HTTP REST API access to AICR configuration recipe generation and bundle creation capabilities.
Overview
The API server provides HTTP REST access to Steps 2 and 4 of the AICR workflow — recipe generation (GET /v1/recipe) and bundle creation (POST /v1/bundle). Built on Go’s net/http with middleware for rate limiting, metrics, request tracking, and graceful shutdown.
Four-Step Workflow Context
API Server scope:
- Recipe generation (Step 2, query mode only — no snapshot analysis) and bundle creation (Step 4)
- Health, readiness, and Prometheus metrics endpoints
- SLSA Build Level 3 attestations on released images
- No snapshot capture, no validation, no ConfigMap I/O — use the CLI for those
API Server configuration:
- Criteria allowlists for accelerator, service, intent, and OS via `AICR_ALLOWED_*` env vars
- Value overrides on `/v1/bundle` via `?set=bundler:path=value` and `?dynamic=component:path` (helm and argocd-helm deployers); see the request sketch after this list
- Node scheduling via `?system-node-selector` and `?accelerated-node-selector`
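For illustration, a client might assemble these query parameters with Go's net/url before calling `/v1/bundle`. The component names, paths, selector values, and host below are placeholders, not documented configuration:

```go
// Hypothetical sketch of building a POST /v1/bundle URL with the query
// parameters listed above; concrete values are placeholders.
package main

import (
	"fmt"
	"net/url"
)

func main() {
	q := url.Values{}
	q.Add("set", "bundler:path=value")            // value override (placeholder format as shown above)
	q.Add("dynamic", "component:path")            // dynamic override (placeholder)
	q.Add("system-node-selector", "key=value")    // node scheduling (label format assumed)
	q.Add("accelerated-node-selector", "key=value")

	u := url.URL{Scheme: "https", Host: "aicrd.example.com", Path: "/v1/bundle", RawQuery: q.Encode()}
	fmt.Println(u.String()) // POST this URL with the bundle criteria body
}
```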
For the complete workflow (snapshot → recipe → validate → bundle, ConfigMap I/O via cm://namespace/name, agent deployment, Chainsaw E2E in tests/chainsaw/cli/), use the CLI.
Architecture Diagram
Request Flow
Complete Request Flow with Middleware
Component Details
Entry Point: cmd/aicrd/main.go
Minimal entry point:
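A hedged sketch of what this entry point looks like, assuming version, commit, and date are injected via `-ldflags`; the local `run` helper stands in for the real pkg/api wiring and is not the actual exported API:

```go
// Hypothetical sketch of cmd/aicrd/main.go.
package main

import (
	"fmt"
	"os"
)

// Populated at build time via -ldflags "-X main.version=... -X main.commit=... -X main.date=...".
var (
	version = "dev"
	commit  = "none"
	date    = "unknown"
)

func run() error {
	// Stub: in the real binary this call lands in pkg/api (server construction,
	// signal handling, graceful shutdown).
	return nil
}

func main() {
	fmt.Printf("aicrd %s (commit %s, built %s)\n", version, commit, date)
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```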
API Package: pkg/api/server.go
Responsibilities: initialize structured logging; parse criteria allowlists; create recipe builder, query handler, and bundle handler with allowlist configuration; install signal handling; run server with middleware; handle graceful shutdown.
Key Features: version info injected via ldflags (version, commit, date); routes /v1/recipe, /v1/query, /v1/bundle; allowlists from AICR_ALLOWED_* env vars; production defaults; graceful shutdown on SIGINT/SIGTERM.
Initialization Flow
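As a rough sketch of the allowlist-parsing step of initialization, assuming the `AICR_ALLOWED_*` variables hold comma-separated values (the exact variable names and value format are assumptions based on the pattern described above):

```go
// Illustrative sketch of reading AICR_ALLOWED_* allowlists from the environment.
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseAllowlist splits a comma-separated env var into a set of allowed values;
// an unset or empty variable means "no restriction".
func parseAllowlist(envVar string) map[string]struct{} {
	raw := strings.TrimSpace(os.Getenv(envVar))
	if raw == "" {
		return nil
	}
	allowed := make(map[string]struct{})
	for _, v := range strings.Split(raw, ",") {
		if v = strings.TrimSpace(v); v != "" {
			allowed[v] = struct{}{}
		}
	}
	return allowed
}

func main() {
	// Hypothetical variable names following the AICR_ALLOWED_* convention.
	for _, name := range []string{
		"AICR_ALLOWED_ACCELERATOR",
		"AICR_ALLOWED_SERVICE",
		"AICR_ALLOWED_INTENT",
		"AICR_ALLOWED_OS",
	} {
		fmt.Printf("%s -> %v\n", name, parseAllowlist(name))
	}
}
```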
Server Infrastructure: pkg/server/
Production-ready HTTP server implementation. Core files:
server.go — Server struct (config, HTTP server, rate limiter, ready state); functional options; graceful shutdown via signal.NotifyContext and errgroup; default root handler listing routes.
config.go — Configuration struct with defaults; PORT env var; read/write/idle/shutdown timeouts; rate-limit parameters.
middleware.go — Middleware chain builder; request ID (UUID generation/validation), rate limiting (token bucket), panic recovery, and structured logging; see the sketch after this list.
health.go — /health (liveness, always 200) and /ready (readiness, 503 when not ready); JSON status + timestamp.
errors.go — Standardized error response struct, error codes (RATE_LIMIT_EXCEEDED, INTERNAL_ERROR, …), WriteError helper with request ID.
metrics.go — Prometheus metrics:
- `aicr_http_requests_total` (counter; method, path, status)
- `aicr_http_request_duration_seconds` (histogram; method, path)
- `aicr_http_requests_in_flight` (gauge)
- `aicr_rate_limit_rejects_total` (counter)
- `aicr_panic_recoveries_total` (counter)
context.go — Context key type for request ID storage.
doc.go — Package documentation: usage, endpoints, error handling, deployment.
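The following is a minimal sketch, not the actual pkg/server code, of how such a middleware chain can be assembled with golang.org/x/time/rate and github.com/google/uuid; names and the chaining order are illustrative:

```go
// Sketch of a request ID + rate limiting + panic recovery middleware chain.
package main

import (
	"log/slog"
	"net/http"

	"github.com/google/uuid"
	"golang.org/x/time/rate"
)

type middleware func(http.Handler) http.Handler

// chain applies middlewares so the first one listed runs first (outermost).
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

func requestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-Id")
		if _, err := uuid.Parse(id); err != nil {
			id = uuid.NewString() // generate when missing or invalid
		}
		w.Header().Set("X-Request-Id", id)
		next.ServeHTTP(w, r)
	})
}

func rateLimit(l *rate.Limiter) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if !l.Allow() {
				http.Error(w, `{"code":"RATE_LIMIT_EXCEEDED"}`, http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

func recoverPanic(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if p := recover(); p != nil {
				slog.Error("panic recovered", "panic", p)
				http.Error(w, `{"code":"INTERNAL_ERROR"}`, http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) })

	limiter := rate.NewLimiter(rate.Limit(100), 200) // 100 req/s, burst 200
	handler := chain(mux, requestID, rateLimit(limiter), recoverPanic)
	_ = http.ListenAndServe(":8080", handler)
}
```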
Request Processing Pipeline
Recipe Handler: pkg/recipe/handler.go
HTTP handler for recipe generation endpoint. Supports both GET (query parameters) and POST (criteria body) methods.
Handler Flow
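A hedged sketch of the GET path through the handler: parse query criteria, validate against the configured allowlist, build the recipe, and write a cacheable JSON response. The `service` parameter, the error code name, and the allowlist contents are illustrative, and a placeholder stands in for the shared recipe builder (the POST path, which decodes a criteria body, is covered in the next section):

```go
// Illustrative GET /v1/recipe handler flow.
package main

import (
	"encoding/json"
	"net/http"
)

// allowedService is an illustrative allowlist; "any" is always permitted.
var allowedService = map[string]bool{"any": true, "inference": true}

func recipeHandler(w http.ResponseWriter, r *http.Request) {
	service := r.URL.Query().Get("service")
	if service == "" {
		service = "any"
	}
	if !allowedService[service] {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusBadRequest)
		json.NewEncoder(w).Encode(map[string]string{
			"code":    "VALIDATION_ERROR", // illustrative code name
			"message": "service value not allowed",
		})
		return
	}

	// In the real handler the shared recipe builder (pkg/recipe/builder.go)
	// produces the recipe; a placeholder stands in here.
	recipe := map[string]any{"service": service}

	w.Header().Set("Content-Type", "application/json")
	w.Header().Set("Cache-Control", "public, max-age=300")
	json.NewEncoder(w).Encode(recipe)
}

func main() {
	http.HandleFunc("/v1/recipe", recipeHandler)
	_ = http.ListenAndServe(":8080", nil)
}
```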
POST Request Body Format
POST requests accept a RecipeCriteria resource (Kubernetes-style):
Supported content types:
- `application/json` - JSON format
- `application/x-yaml` - YAML format
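As an illustrative sketch of the content-type handling and the Kubernetes-style body shape: the RecipeCriteria field names below are assumptions drawn from the criteria dimensions in this document (accelerator, service, intent, OS), not the authoritative schema.

```go
// Hypothetical decoding of the POST body by Content-Type.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"

	"gopkg.in/yaml.v3"
)

// RecipeCriteria is a Kubernetes-style resource wrapper (fields assumed).
type RecipeCriteria struct {
	APIVersion string `json:"apiVersion" yaml:"apiVersion"`
	Kind       string `json:"kind" yaml:"kind"`
	Spec       struct {
		Accelerator string `json:"accelerator" yaml:"accelerator"`
		Service     string `json:"service" yaml:"service"`
		Intent      string `json:"intent" yaml:"intent"`
		OS          string `json:"os" yaml:"os"`
	} `json:"spec" yaml:"spec"`
}

func decodeCriteria(r *http.Request) (*RecipeCriteria, error) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		return nil, err
	}
	var c RecipeCriteria
	switch ct := r.Header.Get("Content-Type"); {
	case strings.HasPrefix(ct, "application/x-yaml"):
		err = yaml.Unmarshal(body, &c)
	case strings.HasPrefix(ct, "application/json"), ct == "":
		err = json.Unmarshal(body, &c)
	default:
		err = fmt.Errorf("unsupported content type %q", ct)
	}
	if err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	// decodeCriteria would be called from the POST branch of the recipe handler.
	_ = decodeCriteria
}
```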
Query Parameter Parsing
Recipe Builder: pkg/recipe/builder.go
Shared with the CLI; same logic as described in the CLI architecture documentation.
API Endpoints
Recipe Generation
Endpoints GET /v1/recipe (query parameters) and POST /v1/recipe (criteria body, application/json or application/x-yaml). See Query Parameter Parsing above for the GET parameter table and POST Request Body Format above for the body schema.
Response: 200 OK
metadata.excludedOverlays is optional. When present, it contains structured {name, reason} entries so API consumers can distinguish direct constraint failures from post-compose mixin fallback.
Error Response: 400 Bad Request
Rate Limited: 429 Too Many Requests
Headers:
- `X-Request-Id` - Unique request identifier
- `X-RateLimit-Limit` - Total requests allowed per second
- `X-RateLimit-Remaining` - Requests remaining in current window
- `X-RateLimit-Reset` - Unix timestamp when window resets
- `Cache-Control` - Caching policy (public, max-age=300)
Health Check
Endpoint: GET /health
Response: 200 OK
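A minimal sketch of a liveness handler returning the JSON status and timestamp described for pkg/server/health.go; the exact field names are assumptions:

```go
// Illustrative /health handler: always 200 with status and timestamp.
package main

import (
	"encoding/json"
	"net/http"
	"time"
)

func healthHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{
		"status":    "ok",
		"timestamp": time.Now().UTC().Format(time.RFC3339),
	})
}

func main() {
	http.HandleFunc("/health", healthHandler)
	_ = http.ListenAndServe(":8080", nil)
}
```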
Readiness Check
Endpoint: GET /ready
Response: 200 OK (ready) or 503 Service Unavailable (not ready)
Metrics
Endpoint: GET /metrics
Response: Prometheus text format
Root
Endpoint: GET /
Response: 200 OK
Usage Examples
cURL Examples
Demo API Server Deployment
Note: This section describes the demonstration deployment of the `aicrd` API server for testing and development purposes only. It is not a production service. Users should self-host the `aicrd` API server in their own infrastructure for production use. See the Kubernetes Deployment section below for deployment guidance.
Example: Google Cloud Run
The demo API server is deployed to Google Cloud Run as an example of how to deploy aicrd:
Demo Configuration:
- Platform: Google Cloud Run (fully managed serverless)
- Authentication: Public access (for demo purposes)
- Auto-scaling: 0-100 instances based on load
- Region: `us-west1`
CI/CD Pipeline (on-tag.yaml):
Supply Chain Security:
- SLSA Build Level 3 compliance
- Signed SBOMs in SPDX format
- Attestations logged in Rekor transparency log
- Verification: `gh attestation verify oci://ghcr.io/nvidia/aicrd:TAG --owner nvidia`
Demo Monitoring:
- Health endpoint: `/health`
- Readiness endpoint: `/ready`
- Prometheus metrics: `/metrics`
- Request tracing with `X-Request-Id` headers
Scaling Behavior (demo):
- Min instances: 0 (scales to zero when idle)
- Max instances: 100 (automatic scaling)
- Cold start: 2-3 seconds
- Request timeout: 30 seconds
- Concurrency: 80 requests per instance
Cloud Run Benefits (for reference):
- Zero operational overhead
- Automatic HTTPS with managed certificates
- Built-in DDoS protection
- Pay-per-use pricing (scales to zero)
- Global load balancing
Client Libraries
Go Client:
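A hedged example of calling GET /v1/recipe with plain net/http; the base URL and the `service` query parameter are placeholders:

```go
// Minimal Go client sketch for the recipe endpoint.
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func main() {
	base := "https://aicrd.example.com" // placeholder endpoint
	q := url.Values{}
	q.Set("service", "any") // hypothetical criteria parameter

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, base+"/v1/recipe?"+q.Encode(), nil)
	if err != nil {
		panic(err)
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, resp.Header.Get("X-Request-Id"))
	fmt.Println(string(body))
}
```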
Python Client:
Kubernetes Deployment
Deployment Manifest
Ingress with TLS
HorizontalPodAutoscaler
Performance Characteristics
Throughput
- Rate Limit: 100 requests/second per instance (configurable)
- Burst: 200 requests (configurable)
- Target Latency: p50 <10ms, p99 <50ms
- Max Concurrent: Limited by rate limiter
Resource Usage
- CPU: ~50m idle, ~200m at 100 req/s
- Memory: ~100MB baseline, ~200MB at peak
- Disk: None (stateless, embedded recipe data)
Scalability
- Horizontal: Fully stateless, linear scaling
- Vertical: Recipe store cached in memory (sync.Once)
- Load Balancing: Round-robin or least-connections
Caching Strategy
- Recipe Store: Loaded once per process, cached globally
- Client-Side: 5-minute cache via Cache-Control header
- CDN: Recommended for public-facing deployments
Error Handling
Error Response Format
All errors follow a consistent JSON structure:
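As a sketch of how such a structure and a WriteError-style helper might look; the field names and the VALIDATION_ERROR code are assumptions, not the exact pkg/server/errors.go definitions:

```go
// Illustrative standardized error response and helper.
package main

import (
	"encoding/json"
	"net/http"
)

// ErrorResponse mirrors the consistent JSON error structure (fields assumed).
type ErrorResponse struct {
	Code      string `json:"code"`      // e.g. RATE_LIMIT_EXCEEDED, INTERNAL_ERROR
	Message   string `json:"message"`   // human-readable description
	RequestID string `json:"requestId"` // correlates with the X-Request-Id header
}

func writeError(w http.ResponseWriter, status int, code, msg, requestID string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	json.NewEncoder(w).Encode(ErrorResponse{Code: code, Message: msg, RequestID: requestID})
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		writeError(w, http.StatusBadRequest, "VALIDATION_ERROR", "example error", r.Header.Get("X-Request-Id"))
	})
	_ = http.ListenAndServe(":8080", nil)
}
```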
Error Codes
Allowlist Validation Error Example:
When a request uses a criteria value not in the configured allowlist:
Error Handling Strategy
- Validation Errors: Return 400 with specific error message
- Rate Limiting: Return 429 with Retry-After header
- Panics: Recover, log, return 500
- Context Cancellation: Return early, cleanup resources
- Resource Exhaustion: Rate limiting prevents this
Security
Attack Mitigation
Rate Limiting:
- Token bucket algorithm prevents abuse
- Per-instance limit (shared across all clients)
- Configurable limits and burst
Header Attacks:
- 64KB header size limit
- 5-second header read timeout
- Prevents slowloris attacks
Resource Exhaustion:
- Request timeouts (read, write, idle)
- In-flight request limits
- Graceful shutdown prevents connection drops
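A minimal sketch of this shutdown pattern using signal.NotifyContext and errgroup, as described for pkg/server/server.go; the timeout values are illustrative:

```go
// Graceful shutdown: drain in-flight requests on SIGINT/SIGTERM.
package main

import (
	"context"
	"errors"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"golang.org/x/sync/errgroup"
)

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	srv := &http.Server{
		Addr:              ":8080",
		ReadHeaderTimeout: 5 * time.Second, // also mitigates slowloris attacks
		Handler:           http.DefaultServeMux,
	}

	g, gctx := errgroup.WithContext(ctx)
	g.Go(func() error {
		if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
			return err
		}
		return nil
	})
	g.Go(func() error {
		<-gctx.Done() // signal received or the server failed
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		return srv.Shutdown(shutdownCtx)
	})

	if err := g.Wait(); err != nil {
		panic(err)
	}
}
```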
Input Validation:
- Strict enum validation
- Version string parsing with bounds
- UUID validation for request IDs
Production Considerations
TLS:
- Use reverse proxy (nginx, Envoy) for TLS termination
- Or add TLS support to server (future enhancement)
Authentication:
- Add API key middleware (future enhancement)
- Or use service mesh mTLS (Istio, Linkerd)
Authorization:
- Currently none (public API)
- Could add rate limits per API key
Monitoring:
- Prometheus metrics for observability
- Request ID tracking for distributed tracing
- Structured logging for debugging
Monitoring & Observability
Prometheus Metrics
Request Metrics:
- `aicr_http_requests_total` - Total requests by method, path, status
- `aicr_http_request_duration_seconds` - Request latency histogram
- `aicr_http_requests_in_flight` - Current active requests
Error Metrics:
- `aicr_rate_limit_rejects_total` - Rate limit rejections
- `aicr_panic_recoveries_total` - Panic recoveries
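For reference, a sketch of registering metrics like these with github.com/prometheus/client_golang; the bucket choices and the example observations are illustrative, not the actual pkg/server/metrics.go code:

```go
// Illustrative metric registration and exposure on /metrics.
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "aicr_http_requests_total",
		Help: "Total HTTP requests by method, path, and status.",
	}, []string{"method", "path", "status"})

	requestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "aicr_http_request_duration_seconds",
		Help:    "Request latency by method and path.",
		Buckets: prometheus.DefBuckets,
	}, []string{"method", "path"})

	inFlight = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "aicr_http_requests_in_flight",
		Help: "Current number of in-flight requests.",
	})
)

func main() {
	// Example observations; real values come from the metrics middleware.
	requestsTotal.WithLabelValues("GET", "/v1/recipe", "200").Inc()
	requestDuration.WithLabelValues("GET", "/v1/recipe").Observe(0.012)
	inFlight.Set(0)

	http.Handle("/metrics", promhttp.Handler())
	_ = http.ListenAndServe(":8080", nil)
}
```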
Grafana Dashboard
Example queries:
Alerting Rules
Distributed Tracing
Request ID tracking enables correlation:
- Client sends a request with an `X-Request-Id` header
- Server logs all operations with the request ID
- Response includes the same `X-Request-Id`
- Client can correlate logs across services
Future: OpenTelemetry integration for full tracing
Testing Strategy
Unit Tests
- Handler validation logic
- Middleware functionality
- Error response formatting
- Query parsing
Integration Tests
- Full HTTP request/response cycle
- Rate limiting behavior
- Graceful shutdown
- Health/ready endpoints
Load Tests
- Sustained load at rate limit
- Burst handling
- Latency under load
- Memory stability
Example Test
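A hedged example of an integration-style test using net/http/httptest; it exercises a stand-in health handler rather than the real pkg/server implementation:

```go
// Illustrative handler test: full HTTP request/response cycle against /health.
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"
)

// healthHandler is a stand-in for the real liveness handler.
func healthHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{
		"status":    "ok",
		"timestamp": time.Now().UTC().Format(time.RFC3339),
	})
}

func TestHealthEndpoint(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(healthHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/health")
	if err != nil {
		t.Fatalf("GET /health: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		t.Fatalf("expected 200, got %d", resp.StatusCode)
	}

	var body map[string]string
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		t.Fatalf("decode body: %v", err)
	}
	if body["status"] != "ok" {
		t.Errorf("expected status ok, got %q", body["status"])
	}
}
```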
Dependencies
External Libraries
- `net/http` - Standard HTTP server
- `golang.org/x/time/rate` - Rate limiting
- `golang.org/x/sync/errgroup` - Concurrent error handling
- `github.com/prometheus/client_golang` - Prometheus metrics
- `github.com/google/uuid` - UUID generation
- `gopkg.in/yaml.v3` - Recipe store parsing
- `log/slog` - Structured logging
Internal Packages
- `pkg/recipe` - Recipe building logic
- `pkg/measurement` - Data model
- `pkg/version` - Semantic versioning
- `pkg/serializer` - JSON response formatting
- `pkg/logging` - Logging configuration
Build & Deployment
Automated CI/CD Pipeline
Production builds are automated through GitHub Actions workflows. When a semantic version tag is pushed (e.g., v0.8.12), the on-tag.yaml workflow:
- Validates code with Go CI (tests + linting)
- Builds multi-platform binaries and container images with GoReleaser and ko
- Generates SBOMs in SPDX format for both binaries and container images
- Attests images with SLSA v1.0 provenance and SBOM attestations
- Deploys to Google Cloud Run with Workload Identity Federation
Supply Chain Security:
- SLSA Build Level 3 compliance
- Cosign keyless signing with Fulcio + Rekor
- GitHub Attestation API for provenance
- Multi-platform builds: darwin/linux × amd64/arm64
Verify Release Artifacts:
For detailed CI/CD architecture, see CONTRIBUTING.md and Architecture Overview.
Local Build Configuration
For local development and testing:
Container Image
Production images are built with ko (automated in CI/CD). For local development:
Note: Production images use distroless base (gcr.io/distroless/static) for minimal attack surface.
Environment Variables
Criteria Allowlists
When allowlist environment variables are configured, the API server validates incoming requests against the allowed values. This enables operators to restrict the API to specific configurations.
Validation behavior:
- Requests with disallowed values return HTTP 400 with error details
- The `any` value is always allowed regardless of the allowlist
- Both `/v1/recipe` and `/v1/bundle` endpoints enforce allowlists
- The CLI (`aicr`) is not affected by allowlists
Extension and Operating Patterns
Forward-looking guidance — Future Enhancements, Production Deployment Patterns, Reliability Patterns, Performance Optimization, Security Hardening, and Observability extensions — has moved to a dedicated page so this document can stay focused on what the API server does today.
See API Server: Extension and Operating Patterns.
References
Official Documentation
- net/http Package - Go standard HTTP library
- golang.org/x/time/rate - Token bucket rate limiter
- errgroup - Concurrent error handling
- context Package - Request cancellation and deadlines
- slog Package - Structured logging
Production Patterns
- Kubernetes Patterns - Deployment, scaling, networking
- Twelve-Factor App - Cloud-native application principles
- Google SRE Book - Site reliability engineering
- Release Engineering - Deployment best practices
HTTP & APIs
- HTTP/2 in Go - HTTP/2 server push
- RESTful API Design - Google Cloud API design guide
- OpenAPI Specification - API documentation standard
- API Versioning - Version management strategies
Observability
- Prometheus Go Client - Metrics collection
- OpenTelemetry Go - Distributed tracing
- Grafana Dashboards - Metrics visualization
- Jaeger Tracing - Distributed tracing backend
Security
- OWASP API Security - API security risks
- HTTP Security Headers - Security header reference
- Rate Limiting Strategies - Google Cloud guide
- mTLS in Kubernetes - Istio mutual TLS
Performance
- Go Performance Tips - Optimization techniques
- pprof Profiler - CPU and memory profiling
- High Performance Go - Dave Cheney’s workshop
- Go Memory Model - Concurrency guarantees
Reliability
- Circuit Breaker Pattern - Failure isolation
- Retry with Backoff - Resilient retries
- Chaos Engineering - Resilience testing principles
- SLOs and Error Budgets - Reliability targets