Kubernetes Deployment
Deploy the AICR API Server in your Kubernetes cluster for self-hosted recipe generation.
Overview
API Server deployment enables self-hosted recipe generation:
- Isolated deployment: Recipe data stays within your infrastructure
- Custom recipes: Modify embedded recipe data (see recipes/)
- High availability: Deploy multiple replicas with load balancing
- Observability: Prometheus /metrics endpoint and structured logging
API Server scope:
- Recipe generation from query parameters (query mode)
- Does not capture snapshots (use agent Job or CLI)
- Generates bundles via POST /v1/bundle
- Does not analyze snapshots (query mode only)
Agent deployment (separate component):
- Kubernetes Job captures cluster configuration
- Writes snapshot to ConfigMap via Kubernetes API
- Requires RBAC: ServiceAccount with ConfigMap create/update permissions
- See Agent Deployment
Typical workflow:
- Deploy agent Job → Captures snapshot → Writes to ConfigMap
- CLI reads ConfigMap → Generates recipe → Writes to file or ConfigMap
- CLI reads recipe → Generates bundle → Writes to filesystem
- Apply bundle to cluster (Helm install, kubectl apply)
Quick Start
Helm chart: Not yet available. Use the manual manifests below.
Manual Deployment
1. Create Namespace
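A minimal sketch of this step, assuming the namespace is named aicr (the name used by the kubectl commands under Troubleshooting). The file name aicr-namespace.yaml is an arbitrary choice.

```shell
# Write a minimal Namespace manifest. The name "aicr" matches the
# "-n aicr" flag used by the troubleshooting commands in this guide.
cat > aicr-namespace.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: aicr
EOF

# Apply it with:
#   kubectl apply -f aicr-namespace.yaml
```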
2. Create Deployment
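The following is only a sketch of what a Deployment for the aicrd server might look like. Several values are assumptions, not taken from the project: the resource name and label aicr-api-server, the image placeholder, the container port 8080, and the resource requests and limits. The readiness probe uses a plain TCP check so that no health endpoint path has to be guessed.

```shell
# Write a Deployment manifest for the API server.
# Names, image, port, and resource sizes below are illustrative assumptions.
cat > aicr-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aicr-api-server            # hypothetical name
  namespace: aicr
  labels:
    app: aicr-api-server           # hypothetical label
spec:
  replicas: 2                      # more than one replica for availability
  selector:
    matchLabels:
      app: aicr-api-server
  template:
    metadata:
      labels:
        app: aicr-api-server
    spec:
      containers:
        - name: aicrd
          image: REPLACE_WITH_AICRD_IMAGE   # placeholder, not a real image reference
          ports:
            - name: http
              containerPort: 8080           # assumed listen port
          readinessProbe:
            tcpSocket:
              port: http                    # TCP check; no endpoint path assumed
          resources:
            requests:
              cpu: 100m                     # illustrative starting point
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
EOF

# After replacing the image placeholder, apply with:
#   kubectl apply -f aicr-deployment.yaml
```

Starting with small requests and adjusting them from observed usage is usually easier than guessing large values up front.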
3. Create Service
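A sketch of a ClusterIP Service for this step. It assumes the API server pods carry the label app: aicr-api-server and expose a container port named http; both names are hypothetical and should match whatever the Deployment actually uses.

```shell
# Write a ClusterIP Service that fronts the API server pods.
# The label and port name are assumptions and must match the Deployment.
cat > aicr-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: aicr-api-server            # hypothetical name
  namespace: aicr
  labels:
    app: aicr-api-server
spec:
  type: ClusterIP
  selector:
    app: aicr-api-server           # must match the pod template labels
  ports:
    - name: http
      port: 80                     # port exposed inside the cluster
      targetPort: http             # named container port on the pods
EOF

# Apply it with:
#   kubectl apply -f aicr-service.yaml
```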
4. Create Ingress (Optional)
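If the API needs to be reachable from outside the cluster, an Ingress along these lines may be a starting point. The hostname aicr.example.com, the ingress class nginx, and the backend Service name aicr-api-server with a port named http are all placeholders chosen for illustration.

```shell
# Write an Ingress that routes all paths on one host to the API Service.
# Host, ingress class, and Service name are placeholders.
cat > aicr-ingress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aicr-api-server
  namespace: aicr
spec:
  ingressClassName: nginx          # assumed ingress class
  rules:
    - host: aicr.example.com       # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: aicr-api-server
                port:
                  name: http
EOF

# Apply it with:
#   kubectl apply -f aicr-ingress.yaml
```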
Capturing Snapshots (Agent)
The API server only generates recipes and bundles — it does not capture
cluster state. Snapshot capture is a separate concern handled by the AICR
agent Job, including its RBAC (ServiceAccount, Role, ClusterRole), the
privileged-mode requirement, ConfigMap storage (cm://<ns>/<name>), and the
full snapshot → recipe → bundle CLI flow. That material is documented
canonically in Agent Deployment and is not
duplicated here.
Configuration Options
Environment Variables
Note: The API server uses structured JSON logging to stderr. The CLI supports three logging modes (CLI/Text/JSON), but the API server always uses JSON for consistent log aggregation.
ConfigMap for Custom Recipe Data (Advanced)
Note: This example shows the concept of mounting custom recipe data. The actual recipe format uses a base-plus-overlay architecture. See recipes/ for the current schema (overlays/*.yaml including base.yaml).
Mount in deployment:
High Availability
Horizontal Pod Autoscaler
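A sketch of a CPU-based autoscaler, assuming the Deployment is named aicr-api-server (a hypothetical name) and that the cluster runs a metrics server. The replica bounds and the 70% utilization target are illustrative values, not project recommendations.

```shell
# Write a HorizontalPodAutoscaler that scales on average CPU utilization.
# Target name, replica bounds, and threshold are illustrative assumptions.
cat > aicr-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aicr-api-server
  namespace: aicr
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aicr-api-server          # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request
EOF

# Apply it with:
#   kubectl apply -f aicr-hpa.yaml
```

Utilization is computed relative to the container's CPU request, so the target only works if the Deployment sets one.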
Pod Disruption Budget
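A sketch of a disruption budget that keeps at least one API server pod running during voluntary disruptions such as node drains. The label app: aicr-api-server is an assumed pod label.

```shell
# Write a PodDisruptionBudget that keeps one pod available during drains.
# The selector label is an assumption and must match the pod labels.
cat > aicr-pdb.yaml <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: aicr-api-server
  namespace: aicr
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: aicr-api-server
EOF

# Apply it with:
#   kubectl apply -f aicr-pdb.yaml
```

With minAvailable: 1, the budget only permits evictions when the Deployment runs two or more replicas.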
Monitoring
Prometheus ServiceMonitor
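A sketch of a ServiceMonitor that scrapes the /metrics endpoint. It assumes the Prometheus Operator is installed (ServiceMonitor is one of its custom resources) and that the API server Service carries the label app: aicr-api-server and a port named http; those names and the 30s interval are illustrative.

```shell
# Write a ServiceMonitor for the Prometheus Operator.
# Service label, port name, and scrape interval are assumptions.
cat > aicr-servicemonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aicr-api-server
  namespace: aicr
spec:
  selector:
    matchLabels:
      app: aicr-api-server         # matches labels on the Service, not the pods
  endpoints:
    - port: http                   # named Service port
      path: /metrics
      interval: 30s
EOF

# Apply it with:
#   kubectl apply -f aicr-servicemonitor.yaml
```

Depending on how the Prometheus resource is configured, the ServiceMonitor may need extra labels before it is picked up.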
Grafana Dashboard
Key panels:
- Request rate (by status code)
- Request duration (p50, p95, p99)
- Error rate
- Rate limit rejections
- Active connections
Security
Network Policies
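A sketch of a policy that only admits traffic to the API server pods from one namespace. The pod label app: aicr-api-server, the container port 8080, and the source namespace ingress-nginx are assumptions made for illustration.

```shell
# Write a NetworkPolicy that restricts ingress to the API server pods.
# Pod label, port, and source namespace are illustrative assumptions.
cat > aicr-networkpolicy.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: aicr-api-server
  namespace: aicr
spec:
  podSelector:
    matchLabels:
      app: aicr-api-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed source namespace
      ports:
        - protocol: TCP
          port: 8080               # assumed container port
EOF

# Apply it with:
#   kubectl apply -f aicr-networkpolicy.yaml
```

A policy like this also blocks Prometheus unless its namespace is added under from, and it has no effect on clusters whose network plugin does not enforce NetworkPolicy.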
Pod Security Standards
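One way to apply Pod Security Standards is through Pod Security Admission labels on the namespace. The sketch below assumes the aicr namespace and the restricted level; whether the aicrd pods satisfy restricted depends on their security context, which is not shown here.

```shell
# Write a Namespace manifest carrying Pod Security Admission labels.
# The "restricted" level is an assumption about what the API server pods can meet.
cat > aicr-namespace-pss.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: aicr
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted      # also warn on apply
EOF

# Apply it with:
#   kubectl apply -f aicr-namespace-pss.yaml
```

The agent Job requires privileged mode, so it would be rejected in a namespace that enforces restricted. Running the agent in a separate namespace, or choosing a looser level, avoids that conflict.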
RBAC (If API server needs K8s access)
Troubleshooting
Check Pod Status
Check Service
Check Ingress
Performance Issues
Connection Refused
- Check service exists: kubectl get svc -n aicr
- Check endpoints: kubectl get endpoints -n aicr
- Check pod is ready: kubectl get pods -n aicr
- Check readiness probe: kubectl describe pod -n aicr <pod-name>
Rate Limiting
Check rate limit settings:
Adjust via deployment:
Upgrading
Rolling Update
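A rolling update can be sketched as a small shell helper around standard kubectl commands. The function name aicr_rollout, the Deployment name aicr-api-server, and the container name aicrd are hypothetical; substitute the names used in your manifests.

```shell
# Roll the Deployment to a new image and wait for the rollout to finish.
# Usage: aicr_rollout <image-reference>
aicr_rollout() {
  image="${1:-}"
  if [ -z "$image" ]; then
    echo "usage: aicr_rollout <image-reference>" >&2
    return 1
  fi
  # Deployment "aicr-api-server" and container "aicrd" are assumed names.
  kubectl set image deployment/aicr-api-server "aicrd=${image}" -n aicr &&
    kubectl rollout status deployment/aicr-api-server -n aicr --timeout=120s
}

# If the new version misbehaves, revert to the previous revision:
#   kubectl rollout undo deployment/aicr-api-server -n aicr
```

Because kubectl rollout status exits non-zero when the rollout does not complete within the timeout, the helper can gate later steps in a script.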
The aicrd server is stateless — it holds no persistent data, so there is
nothing to back up beyond the manifests in this guide (keep them in version
control). Standard Kubernetes patterns apply unchanged for blue-green/canary
rollouts, backup/restore of resource definitions, and right-sizing requests
and limits (start small — see the requests/limits in the
Deployment above — and adjust from kubectl top
output or a Vertical Pod Autoscaler). Refer to the upstream
Kubernetes documentation
for these; none require AICR-specific handling.
See Also
- API Reference - API endpoint documentation
- Automation - CI/CD integration
- Data Flow - Understanding data architecture
- API Server Architecture - Internal architecture