API Server Architecture
The `aicrd` server provides HTTP REST API access to AICR configuration recipe generation and bundle creation capabilities.
Overview
The API server provides HTTP REST access to Steps 2 and 4 of the AICR workflow — recipe generation (GET /v1/recipe) and bundle creation (POST /v1/bundle). Built on Go’s net/http with middleware for rate limiting, metrics, request tracking, and graceful shutdown.
Four-Step Workflow Context
API Server scope:
- Recipe generation (Step 2, query mode only — no snapshot analysis) and bundle creation (Step 4)
- Health, readiness, and Prometheus metrics endpoints
- SLSA Build Level 3 attestations on released images
- No snapshot capture, no validation, no ConfigMap I/O — use the CLI for those
API Server configuration:
- Criteria allowlists for accelerator, service, intent, and OS via `AICR_ALLOWED_*` env vars
- Value overrides on `/v1/bundle` via `?set=bundler:path=value` and `?dynamic=component:path` (helm and argocd-helm deployers); see the request sketch after this list
- Node scheduling via `?system-node-selector` and `?accelerated-node-selector`
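For illustration, a client might assemble these query parameters with Go's net/url before calling `/v1/bundle`. The component names, paths, selector values, and host below are placeholders, not documented configuration:

```go
// Hypothetical sketch of building a POST /v1/bundle URL with the query
// parameters listed above; concrete values are placeholders.
package main

import (
	"fmt"
	"net/url"
)

func main() {
	q := url.Values{}
	q.Add("set", "bundler:path=value")            // value override (placeholder format as shown above)
	q.Add("dynamic", "component:path")            // dynamic override (placeholder)
	q.Add("system-node-selector", "key=value")    // node scheduling (label format assumed)
	q.Add("accelerated-node-selector", "key=value")

	u := url.URL{Scheme: "https", Host: "aicrd.example.com", Path: "/v1/bundle", RawQuery: q.Encode()}
	fmt.Println(u.String()) // POST this URL with the bundle criteria body
}
```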
For the complete workflow (snapshot → recipe → validate → bundle, ConfigMap I/O via cm://namespace/name, agent deployment, Chainsaw E2E in tests/chainsaw/cli/), use the CLI.
Architecture Diagram
Request Flow
Complete Request Flow with Middleware
Component Details
Entry Point: cmd/aicrd/main.go
Minimal entry point:
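A hedged sketch of what this entry point looks like, assuming version, commit, and date are injected via `-ldflags`; the local `run` helper stands in for the real pkg/api wiring and is not the actual exported API:

```go
// Hypothetical sketch of cmd/aicrd/main.go.
package main

import (
	"fmt"
	"os"
)

// Populated at build time via -ldflags "-X main.version=... -X main.commit=... -X main.date=...".
var (
	version = "dev"
	commit  = "none"
	date    = "unknown"
)

func run() error {
	// Stub: in the real binary this call lands in pkg/api (server construction,
	// signal handling, graceful shutdown).
	return nil
}

func main() {
	fmt.Printf("aicrd %s (commit %s, built %s)\n", version, commit, date)
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```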
API Package: pkg/api/server.go
Responsibilities: initialize structured logging; parse criteria allowlists; create recipe builder, query handler, and bundle handler with allowlist configuration; install signal handling; run server with middleware; handle graceful shutdown.
Key Features: version info injected via ldflags (version, commit, date); routes /v1/recipe, /v1/query, /v1/bundle; allowlists from AICR_ALLOWED_* env vars; production defaults; graceful shutdown on SIGINT/SIGTERM.
Initialization Flow
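As a rough sketch of the allowlist-parsing step of initialization, assuming the `AICR_ALLOWED_*` variables hold comma-separated values (the exact variable names and value format are assumptions based on the pattern described above):

```go
// Illustrative sketch of reading AICR_ALLOWED_* allowlists from the environment.
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseAllowlist splits a comma-separated env var into a set of allowed values;
// an unset or empty variable means "no restriction".
func parseAllowlist(envVar string) map[string]struct{} {
	raw := strings.TrimSpace(os.Getenv(envVar))
	if raw == "" {
		return nil
	}
	allowed := make(map[string]struct{})
	for _, v := range strings.Split(raw, ",") {
		if v = strings.TrimSpace(v); v != "" {
			allowed[v] = struct{}{}
		}
	}
	return allowed
}

func main() {
	// Hypothetical variable names following the AICR_ALLOWED_* convention.
	for _, name := range []string{
		"AICR_ALLOWED_ACCELERATOR",
		"AICR_ALLOWED_SERVICE",
		"AICR_ALLOWED_INTENT",
		"AICR_ALLOWED_OS",
	} {
		fmt.Printf("%s -> %v\n", name, parseAllowlist(name))
	}
}
```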
Server Infrastructure: pkg/server/
Production-ready HTTP server implementation. Core files:
server.go — Server struct (config, HTTP server, rate limiter, ready state); functional options; graceful shutdown via signal.NotifyContext and errgroup; default root handler listing routes.
config.go — Configuration struct with defaults; PORT env var; read/write/idle/shutdown timeouts; rate-limit parameters.
middleware.go — Middleware chain builder; request ID (UUID generation/validation), rate limiting (token bucket), panic recovery, and structured logging; see the sketch after this list.
health.go — /health (liveness, always 200) and /ready (readiness, 503 when not ready); JSON status + timestamp.
errors.go — Standardized error response struct, error codes (RATE_LIMIT_EXCEEDED, INTERNAL_ERROR, …), WriteError helper with request ID.
metrics.go — Prometheus metrics:
- `aicr_http_requests_total` (counter; method, path, status)
- `aicr_http_request_duration_seconds` (histogram; method, path)
- `aicr_http_requests_in_flight` (gauge)
- `aicr_rate_limit_rejects_total` (counter)
- `aicr_panic_recoveries_total` (counter)
context.go — Context key type for request ID storage.
doc.go — Package documentation: usage, endpoints, error handling, deployment.
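The following is a minimal sketch, not the actual pkg/server code, of how such a middleware chain can be assembled with golang.org/x/time/rate and github.com/google/uuid; names and the chaining order are illustrative:

```go
// Sketch of a request ID + rate limiting + panic recovery middleware chain.
package main

import (
	"log/slog"
	"net/http"

	"github.com/google/uuid"
	"golang.org/x/time/rate"
)

type middleware func(http.Handler) http.Handler

// chain applies middlewares so the first one listed runs first (outermost).
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

func requestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-Id")
		if _, err := uuid.Parse(id); err != nil {
			id = uuid.NewString() // generate when missing or invalid
		}
		w.Header().Set("X-Request-Id", id)
		next.ServeHTTP(w, r)
	})
}

func rateLimit(l *rate.Limiter) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if !l.Allow() {
				http.Error(w, `{"code":"RATE_LIMIT_EXCEEDED"}`, http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

func recoverPanic(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if p := recover(); p != nil {
				slog.Error("panic recovered", "panic", p)
				http.Error(w, `{"code":"INTERNAL_ERROR"}`, http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) })

	limiter := rate.NewLimiter(rate.Limit(100), 200) // 100 req/s, burst 200
	handler := chain(mux, requestID, rateLimit(limiter), recoverPanic)
	_ = http.ListenAndServe(":8080", handler)
}
```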
Request Processing Pipeline
Recipe Handler: pkg/recipe/handler.go
HTTP handler for recipe generation endpoint. Supports both GET (query parameters) and POST (criteria body) methods.
Handler Flow
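A hedged sketch of the GET path through the handler: parse query criteria, validate against the configured allowlist, build the recipe, and write a cacheable JSON response. The `service` parameter, the error code name, and the allowlist contents are illustrative, and a placeholder stands in for the shared recipe builder (the POST path, which decodes a criteria body, is covered in the next section):

```go
// Illustrative GET /v1/recipe handler flow.
package main

import (
	"encoding/json"
	"net/http"
)

// allowedService is an illustrative allowlist; "any" is always permitted.
var allowedService = map[string]bool{"any": true, "inference": true}

func recipeHandler(w http.ResponseWriter, r *http.Request) {
	service := r.URL.Query().Get("service")
	if service == "" {
		service = "any"
	}
	if !allowedService[service] {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusBadRequest)
		json.NewEncoder(w).Encode(map[string]string{
			"code":    "VALIDATION_ERROR", // illustrative code name
			"message": "service value not allowed",
		})
		return
	}

	// In the real handler the shared recipe builder (pkg/recipe/builder.go)
	// produces the recipe; a placeholder stands in here.
	recipe := map[string]any{"service": service}

	w.Header().Set("Content-Type", "application/json")
	w.Header().Set("Cache-Control", "public, max-age=300")
	json.NewEncoder(w).Encode(recipe)
}

func main() {
	http.HandleFunc("/v1/recipe", recipeHandler)
	_ = http.ListenAndServe(":8080", nil)
}
```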
POST Request Body Format
POST requests accept a RecipeCriteria resource (Kubernetes-style):
Supported content types:
- `application/json` - JSON format
- `application/x-yaml` - YAML format
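As an illustrative sketch of the content-type handling and the Kubernetes-style body shape: the RecipeCriteria field names below are assumptions drawn from the criteria dimensions in this document (accelerator, service, intent, OS), not the authoritative schema.

```go
// Hypothetical decoding of the POST body by Content-Type.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"

	"gopkg.in/yaml.v3"
)

// RecipeCriteria is a Kubernetes-style resource wrapper (fields assumed).
type RecipeCriteria struct {
	APIVersion string `json:"apiVersion" yaml:"apiVersion"`
	Kind       string `json:"kind" yaml:"kind"`
	Spec       struct {
		Accelerator string `json:"accelerator" yaml:"accelerator"`
		Service     string `json:"service" yaml:"service"`
		Intent      string `json:"intent" yaml:"intent"`
		OS          string `json:"os" yaml:"os"`
	} `json:"spec" yaml:"spec"`
}

func decodeCriteria(r *http.Request) (*RecipeCriteria, error) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		return nil, err
	}
	var c RecipeCriteria
	switch ct := r.Header.Get("Content-Type"); {
	case strings.HasPrefix(ct, "application/x-yaml"):
		err = yaml.Unmarshal(body, &c)
	case strings.HasPrefix(ct, "application/json"), ct == "":
		err = json.Unmarshal(body, &c)
	default:
		err = fmt.Errorf("unsupported content type %q", ct)
	}
	if err != nil {
		return nil, err
	}
	return &c, nil
}

func main() {
	// decodeCriteria would be called from the POST branch of the recipe handler.
	_ = decodeCriteria
}
```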
Query Parameter Parsing
Recipe Builder: pkg/recipe/builder.go
Shared with the CLI; same logic as described in the CLI architecture documentation.
API Endpoints
Recipe Generation
Endpoints GET /v1/recipe (query parameters) and POST /v1/recipe (criteria body, application/json or application/x-yaml). See Query Parameter Parsing above for the GET parameter table and POST Request Body Format above for the body schema.
Response: 200 OK
metadata.excludedOverlays is optional. When present, it contains structured {name, reason} entries so API consumers can distinguish direct constraint failures from post-compose mixin fallback.
Error Response: 400 Bad Request
Rate Limited: 429 Too Many Requests
Headers:
- `X-Request-Id` - Unique request identifier
- `X-RateLimit-Limit` - Total requests allowed per second
- `X-RateLimit-Remaining` - Requests remaining in current window
- `X-RateLimit-Reset` - Unix timestamp when window resets
- `Cache-Control` - Caching policy (public, max-age=300)
Health Check
Endpoint: GET /health
Response: 200 OK
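A minimal sketch of a liveness handler returning the JSON status and timestamp described for pkg/server/health.go; the exact field names are assumptions:

```go
// Illustrative /health handler: always 200 with status and timestamp.
package main

import (
	"encoding/json"
	"net/http"
	"time"
)

func healthHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{
		"status":    "ok",
		"timestamp": time.Now().UTC().Format(time.RFC3339),
	})
}

func main() {
	http.HandleFunc("/health", healthHandler)
	_ = http.ListenAndServe(":8080", nil)
}
```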
Readiness Check
Endpoint: GET /ready
Response: 200 OK (ready) or 503 Service Unavailable (not ready)
Metrics
Endpoint: GET /metrics
Response: Prometheus text format
Root
Endpoint: GET /
Response: 200 OK
Usage Examples
cURL Examples
Demo API Server Deployment
Note: This section describes the demonstration deployment of the `aicrd` API server for testing and development purposes only. It is not a production service. Users should self-host the `aicrd` API server in their own infrastructure for production use. See the Kubernetes Deployment section below for deployment guidance.
Example: Google Cloud Run
The demo API server is deployed to Google Cloud Run as an example of how to deploy aicrd:
Demo Configuration:
- Platform: Google Cloud Run (fully managed serverless)
- Authentication: Public access (for demo purposes)
- Auto-scaling: 0-100 instances based on load
- Region: `us-west1`
CI/CD Pipeline (on-tag.yaml):
Supply Chain Security:
- SLSA Build Level 3 compliance
- Signed SBOMs in SPDX format
- Attestations logged in Rekor transparency log
- Verification: `gh attestation verify oci://ghcr.io/nvidia/aicrd:TAG --owner nvidia`
Demo Monitoring:
- Health endpoint: `/health`
- Readiness endpoint: `/ready`
- Prometheus metrics: `/metrics`
- Request tracing with `X-Request-Id` headers
Scaling Behavior (demo):
- Min instances: 0 (scales to zero when idle)
- Max instances: 100 (automatic scaling)
- Cold start: 2-3 seconds
- Request timeout: 30 seconds
- Concurrency: 80 requests per instance
Cloud Run Benefits (for reference):
- Zero operational overhead
- Automatic HTTPS with managed certificates
- Built-in DDoS protection
- Pay-per-use pricing (scales to zero)
- Global load balancing
Client Libraries
Go Client:
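A hedged example of calling GET /v1/recipe with plain net/http; the base URL and the `service` query parameter are placeholders:

```go
// Minimal Go client sketch for the recipe endpoint.
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func main() {
	base := "https://aicrd.example.com" // placeholder endpoint
	q := url.Values{}
	q.Set("service", "any") // hypothetical criteria parameter

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, base+"/v1/recipe?"+q.Encode(), nil)
	if err != nil {
		panic(err)
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, resp.Header.Get("X-Request-Id"))
	fmt.Println(string(body))
}
```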
Python Client:
Kubernetes Deployment
Deployment Manifest
Ingress with TLS
HorizontalPodAutoscaler
Performance Characteristics
Throughput
- Rate Limit: 100 requests/second per instance (configurable)
- Burst: 200 requests (configurable)
- Target Latency: p50 <10ms, p99 <50ms
- Max Concurrent: Limited by rate limiter
Resource Usage
- CPU: ~50m idle, ~200m at 100 req/s
- Memory: ~100MB baseline, ~200MB at peak
- Disk: None (stateless, embedded recipe data)
Scalability
- Horizontal: Fully stateless, linear scaling
- Vertical: Recipe store cached in memory (sync.Once)
- Load Balancing: Round-robin or least-connections
Caching Strategy
- Recipe Store: Loaded once per process, cached globally
- Client-Side: 5-minute cache via Cache-Control header
- CDN: Recommended for public-facing deployments
Error Handling
Error Response Format
All errors follow a consistent JSON structure:
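As a sketch of how such a structure and a WriteError-style helper might look; the field names and the VALIDATION_ERROR code are assumptions, not the exact pkg/server/errors.go definitions:

```go
// Illustrative standardized error response and helper.
package main

import (
	"encoding/json"
	"net/http"
)

// ErrorResponse mirrors the consistent JSON error structure (fields assumed).
type ErrorResponse struct {
	Code      string `json:"code"`      // e.g. RATE_LIMIT_EXCEEDED, INTERNAL_ERROR
	Message   string `json:"message"`   // human-readable description
	RequestID string `json:"requestId"` // correlates with the X-Request-Id header
}

func writeError(w http.ResponseWriter, status int, code, msg, requestID string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	json.NewEncoder(w).Encode(ErrorResponse{Code: code, Message: msg, RequestID: requestID})
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		writeError(w, http.StatusBadRequest, "VALIDATION_ERROR", "example error", r.Header.Get("X-Request-Id"))
	})
	_ = http.ListenAndServe(":8080", nil)
}
```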
Error Codes
Allowlist Validation Error Example:
When a request uses a criteria value not in the configured allowlist:
Error Handling Strategy
- Validation Errors: Return 400 with specific error message
- Rate Limiting: Return 429 with Retry-After header
- Panics: Recover, log, return 500
- Context Cancellation: Return early, cleanup resources
- Resource Exhaustion: Rate limiting prevents this
Security
Attack Mitigation
Rate Limiting:
- Token bucket algorithm prevents abuse
- Per-instance limit (shared across all clients)
- Configurable limits and burst
Header Attacks:
- 64KB header size limit
- 5-second header read timeout
- Prevents slowloris attacks
Resource Exhaustion:
- Request timeouts (read, write, idle)
- In-flight request limits
- Graceful shutdown prevents connection drops
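A minimal sketch of this shutdown pattern using signal.NotifyContext and errgroup, as described for pkg/server/server.go; the timeout values are illustrative:

```go
// Graceful shutdown: drain in-flight requests on SIGINT/SIGTERM.
package main

import (
	"context"
	"errors"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"golang.org/x/sync/errgroup"
)

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	srv := &http.Server{
		Addr:              ":8080",
		ReadHeaderTimeout: 5 * time.Second, // also mitigates slowloris attacks
		Handler:           http.DefaultServeMux,
	}

	g, gctx := errgroup.WithContext(ctx)
	g.Go(func() error {
		if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
			return err
		}
		return nil
	})
	g.Go(func() error {
		<-gctx.Done() // signal received or the server failed
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		return srv.Shutdown(shutdownCtx)
	})

	if err := g.Wait(); err != nil {
		panic(err)
	}
}
```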
Input Validation:
- Strict enum validation
- Version string parsing with bounds
- UUID validation for request IDs
Production Considerations
TLS:
- Use reverse proxy (nginx, Envoy) for TLS termination
- Or add TLS support to server (future enhancement)
Authentication:
- Add API key middleware (future enhancement)
- Or use service mesh mTLS (Istio, Linkerd)
Authorization:
- Currently none (public API)
- Could add rate limits per API key
Monitoring:
- Prometheus metrics for observability
- Request ID tracking for distributed tracing
- Structured logging for debugging
Monitoring & Observability
Prometheus Metrics
Request Metrics:
- `aicr_http_requests_total` - Total requests by method, path, status
- `aicr_http_request_duration_seconds` - Request latency histogram
- `aicr_http_requests_in_flight` - Current active requests
Error Metrics:
- `aicr_rate_limit_rejects_total` - Rate limit rejections
- `aicr_panic_recoveries_total` - Panic recoveries
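For reference, a sketch of registering metrics like these with github.com/prometheus/client_golang; the bucket choices and the example observations are illustrative, not the actual pkg/server/metrics.go code:

```go
// Illustrative metric registration and exposure on /metrics.
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requestsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "aicr_http_requests_total",
		Help: "Total HTTP requests by method, path, and status.",
	}, []string{"method", "path", "status"})

	requestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "aicr_http_request_duration_seconds",
		Help:    "Request latency by method and path.",
		Buckets: prometheus.DefBuckets,
	}, []string{"method", "path"})

	inFlight = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "aicr_http_requests_in_flight",
		Help: "Current number of in-flight requests.",
	})
)

func main() {
	// Example observations; real values come from the metrics middleware.
	requestsTotal.WithLabelValues("GET", "/v1/recipe", "200").Inc()
	requestDuration.WithLabelValues("GET", "/v1/recipe").Observe(0.012)
	inFlight.Set(0)

	http.Handle("/metrics", promhttp.Handler())
	_ = http.ListenAndServe(":8080", nil)
}
```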
Grafana Dashboard
Example queries:
Alerting Rules
Distributed Tracing
Request ID tracking enables correlation:
- Client sends a request with an `X-Request-Id` header
- Server logs all operations with the request ID
- Response includes the same `X-Request-Id`
- Client can correlate logs across services
Future: OpenTelemetry integration for full tracing
Testing Strategy
Unit Tests
- Handler validation logic
- Middleware functionality
- Error response formatting
- Query parsing
Integration Tests
- Full HTTP request/response cycle
- Rate limiting behavior
- Graceful shutdown
- Health/ready endpoints
Load Tests
- Sustained load at rate limit
- Burst handling
- Latency under load
- Memory stability
Example Test
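A hedged example of an integration-style test using net/http/httptest; it exercises a stand-in health handler rather than the real pkg/server implementation:

```go
// Illustrative handler test: full HTTP request/response cycle against /health.
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"
)

// healthHandler is a stand-in for the real liveness handler.
func healthHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{
		"status":    "ok",
		"timestamp": time.Now().UTC().Format(time.RFC3339),
	})
}

func TestHealthEndpoint(t *testing.T) {
	srv := httptest.NewServer(http.HandlerFunc(healthHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/health")
	if err != nil {
		t.Fatalf("GET /health: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		t.Fatalf("expected 200, got %d", resp.StatusCode)
	}

	var body map[string]string
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		t.Fatalf("decode body: %v", err)
	}
	if body["status"] != "ok" {
		t.Errorf("expected status ok, got %q", body["status"])
	}
}
```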
Dependencies
External Libraries
- `net/http` - Standard HTTP server
- `golang.org/x/time/rate` - Rate limiting
- `golang.org/x/sync/errgroup` - Concurrent error handling
- `github.com/prometheus/client_golang` - Prometheus metrics
- `github.com/google/uuid` - UUID generation
- `gopkg.in/yaml.v3` - Recipe store parsing
- `log/slog` - Structured logging
Internal Packages
- `pkg/recipe` - Recipe building logic
- `pkg/measurement` - Data model
- `pkg/version` - Semantic versioning
- `pkg/serializer` - JSON response formatting
- `pkg/logging` - Logging configuration
Build & Deployment
Automated CI/CD Pipeline
Production builds are automated through GitHub Actions workflows. When a semantic version tag is pushed (e.g., v0.8.12), the on-tag.yaml workflow:
- Validates code with Go CI (tests + linting)
- Builds multi-platform binaries and container images with GoReleaser and ko
- Generates SBOMs in SPDX format for both binaries and container images
- Attests images with SLSA v1.0 provenance and SBOM attestations
- Deploys to Google Cloud Run with Workload Identity Federation
Supply Chain Security:
- SLSA Build Level 3 compliance
- Cosign keyless signing with Fulcio + Rekor
- GitHub Attestation API for provenance
- Multi-platform builds: darwin/linux × amd64/arm64
Verify Release Artifacts:
For detailed CI/CD architecture, see CONTRIBUTING.md and Architecture Overview.
Local Build Configuration
For local development and testing:
Container Image
Production images are built with ko (automated in CI/CD). For local development:
Note: Production images use distroless base (gcr.io/distroless/static) for minimal attack surface.
Environment Variables
Criteria Allowlists
When allowlist environment variables are configured, the API server validates incoming requests against the allowed values. This enables operators to restrict the API to specific configurations.
Validation behavior:
- Requests with disallowed values return HTTP 400 with error details
- The `any` value is always allowed regardless of the allowlist
- Both `/v1/recipe` and `/v1/bundle` endpoints enforce allowlists
- The CLI (`aicr`) is not affected by allowlists
Extension and Operating Patterns
Forward-looking guidance — Future Enhancements, Production Deployment Patterns, Reliability Patterns, Performance Optimization, Security Hardening, and Observability extensions — has moved to a dedicated page so this document can stay focused on what the API server does today.
See API Server: Extension and Operating Patterns.
References
Official Documentation
- net/http Package - Go standard HTTP library
- golang.org/x/time/rate - Token bucket rate limiter
- errgroup - Concurrent error handling
- context Package - Request cancellation and deadlines
- slog Package - Structured logging
Production Patterns
- Kubernetes Patterns - Deployment, scaling, networking
- Twelve-Factor App - Cloud-native application principles
- Google SRE Book - Site reliability engineering
- Release Engineering - Deployment best practices
HTTP & APIs
- HTTP/2 in Go - HTTP/2 server push
- RESTful API Design - Google Cloud API design guide
- OpenAPI Specification - API documentation standard
- API Versioning - Version management strategies
Observability
- Prometheus Go Client - Metrics collection
- OpenTelemetry Go - Distributed tracing
- Grafana Dashboards - Metrics visualization
- Jaeger Tracing - Distributed tracing backend
Security
- OWASP API Security - API security risks
- HTTP Security Headers - Security header reference
- Rate Limiting Strategies - Google Cloud guide
- mTLS in Kubernetes - Istio mutual TLS
Performance
- Go Performance Tips - Optimization techniques
- pprof Profiler - CPU and memory profiling
- High Performance Go - Dave Cheney’s workshop
- Go Memory Model - Concurrency guarantees
Reliability
- Circuit Breaker Pattern - Failure isolation
- Retry with Backoff - Resilient retries
- Chaos Engineering - Resilience testing principles
- SLOs and Error Budgets - Reliability targets