Kubernetes Deployment


Deploy the AICR API Server in your Kubernetes cluster for self-hosted recipe generation.

Overview

API Server deployment enables self-hosted recipe generation:

  • Isolated deployment: Recipe data stays within your infrastructure
  • Custom recipes: Modify embedded recipe data (see recipes/)
  • High availability: Deploy multiple replicas with load balancing
  • Observability: Prometheus /metrics endpoint and structured logging

API Server scope:

  • Generates recipes from query parameters (query mode)
  • Generates bundles via POST /v1/bundle
  • Does not capture snapshots (use the agent Job or CLI)
  • Does not analyze snapshots (query mode only)

Agent deployment (separate component):

  • Kubernetes Job captures cluster configuration
  • Writes snapshot to ConfigMap via Kubernetes API
  • Requires RBAC: ServiceAccount with ConfigMap create/update permissions
  • See Agent Deployment

Typical workflow:

  1. Deploy agent Job → Captures snapshot → Writes to ConfigMap
  2. CLI reads ConfigMap → Generates recipe → Writes to file or ConfigMap
  3. CLI reads recipe → Generates bundle → Writes to filesystem
  4. Apply bundle to cluster (Helm install, kubectl apply)

Quick Start

# Create namespace
kubectl create namespace aicr

# Deploy the API server and its Service (save the manifests from Manual Deployment below as deployment.yaml and service.yaml)
kubectl apply -f deployment.yaml -f service.yaml

# Wait for the rollout, then check pods and Service
kubectl rollout status deployment/aicrd -n aicr
kubectl get pods -n aicr
kubectl get svc -n aicr

Helm chart: Not yet available. Use the manual manifests below.

Manual Deployment

1. Create Namespace

# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: aicr
  labels:
    app: aicrd

kubectl apply -f namespace.yaml

2. Create Deployment

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aicrd
  namespace: aicr
  labels:
    app: aicrd
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aicrd
  template:
    metadata:
      labels:
        app: aicrd
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        fsGroup: 65532
        seccompProfile:
          type: RuntimeDefault  # required by the "restricted" Pod Security Standard

      containers:
        - name: api-server
          # Pin a release tag in production: with :latest and IfNotPresent,
          # a node keeps using whichever :latest image it cached first.
          image: ghcr.io/nvidia/aicrd:latest
          imagePullPolicy: IfNotPresent

          ports:
            - name: http
              containerPort: 8080
              protocol: TCP

          env:
            - name: PORT
              value: "8080"
            - name: AICR_LOG_LEVEL
              value: "info"

          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3

          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3

          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi

          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]

kubectl apply -f deployment.yaml
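The probe settings above trade reaction time against tolerance for slow responses. As a rough sketch (an approximation that ignores kubelet scheduling jitter), once the server stops answering, Kubernetes acts within about failureThreshold × periodSeconds + timeoutSeconds of the last successful probe. The helper name probe_reaction_seconds is hypothetical.

```shell
#!/bin/sh
# Approximate upper bound on how long a probe takes to act after the last
# successful check: failureThreshold further probes, one every periodSeconds,
# the last of which may run for the full timeoutSeconds.
# probe_reaction_seconds is a hypothetical helper for illustration only.
probe_reaction_seconds() {
  # $1 = periodSeconds, $2 = timeoutSeconds, $3 = failureThreshold
  echo $(( $3 * $1 + $2 ))
}

# Values from deployment.yaml above.
echo "liveness:  restart after ~$(probe_reaction_seconds 30 5 3)s"
echo "readiness: out of Service endpoints after ~$(probe_reaction_seconds 10 5 3)s"
# liveness:  restart after ~95s
# readiness: out of Service endpoints after ~35s
```

Lowering periodSeconds shortens both windows at the cost of more probe traffic per pod.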

3. Create Service

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: aicrd
  namespace: aicr
  labels:
    app: aicrd
spec:
  type: ClusterIP
  selector:
    app: aicrd
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP

kubectl apply -f service.yaml

4. Create Ingress (Optional)

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aicrd
  namespace: aicr
  annotations:
    # Requires cert-manager and a ClusterIssuer named letsencrypt-prod.
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # ingress-nginx: maximum requests per second accepted from one client IP.
    nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - aicr.yourdomain.com
      secretName: aicr-tls
  rules:
    - host: aicr.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: aicrd
                port:
                  number: 80

kubectl apply -f ingress.yaml

Capturing Snapshots (Agent)

The API server only generates recipes and bundles; it does not capture cluster state. Snapshot capture is handled by the separate AICR agent Job. Its RBAC (ServiceAccount, Role, ClusterRole), the privileged-mode requirement, ConfigMap storage (cm://<ns>/<name>), and the full snapshot → recipe → bundle CLI flow are documented in Agent Deployment and are not repeated here.

Configuration Options

Environment Variables

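| Variable       | Default | Description                             |
| -------------- | ------- | --------------------------------------- |
| PORT           | 8080    | HTTP server port                        |
| AICR_LOG_LEVEL | info    | Logging level: debug, info, warn, error |
| RATE_LIMIT     | 100     | Requests per second                     |
| RATE_BURST     | 200     | Burst capacity                          |
| READ_TIMEOUT   | 30s     | HTTP read timeout                       |
| WRITE_TIMEOUT  | 30s     | HTTP write timeout                      |
| IDLE_TIMEOUT   | 60s     | HTTP idle timeout                       |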

Note: The API server uses structured JSON logging to stderr. The CLI supports three logging modes (CLI/Text/JSON), but the API server always uses JSON for consistent log aggregation.

ConfigMap for Custom Recipe Data (Advanced)

Note: This example shows the concept of mounting custom recipe data. The actual recipe format uses a base-plus-overlay architecture. See recipes/ for the current schema (overlays/*.yaml including base.yaml).

# configmap.yaml - example of mounting custom recipe data
apiVersion: v1
kind: ConfigMap
metadata:
  name: aicr-recipe-data
  namespace: aicr
data:
  # ConfigMap keys may only contain letters, digits, "-", "_" and ".",
  # so the overlays/ path is restored by the volume's items mapping below.
  base.yaml: |
    # Your custom base recipe
    apiVersion: aicr.nvidia.com/v1alpha1
    kind: RecipeMetadata
    # ... (see recipes/overlays/base.yaml for schema)

Mount in the Deployment (merge into deployment.yaml):

spec:
  template:
    spec:
      volumes:
        - name: recipe-data
          configMap:
            name: aicr-recipe-data
            # Only listed keys are projected; add one entry per overlay file.
            items:
              - key: base.yaml
                path: overlays/base.yaml
      containers:
        - name: api-server
          volumeMounts:
            - name: recipe-data
              mountPath: /data
          env:
            - name: RECIPE_DATA_PATH
              value: /data

High Availability

Horizontal Pod Autoscaler

# hpa.yaml
# Requires metrics-server (the resource metrics API). Once the HPA owns the
# replica count, re-applying deployment.yaml resets it to spec.replicas (3);
# consider removing spec.replicas from the Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aicrd
  namespace: aicr
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aicrd
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15

kubectl apply -f hpa.yaml
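To see how hpa.yaml reacts to load, the sketch below applies the documented HPA formula, desired = ceil(current × currentUtilization / targetUtilization), to each metric, keeps the larger result, and clamps it to minReplicas/maxReplicas. Utilization is relative to the container requests in deployment.yaml (100m CPU, 128Mi memory), so a 70% CPU target means an average of 70m per pod. This is a simplification: the controller also ignores changes within a tolerance band (10% by default) and applies the behavior policies, both omitted here. hpa_desired is a hypothetical helper.

```shell
#!/bin/sh
# Simplified HPA replica calculation for hpa.yaml:
#   desired = ceil(currentReplicas * currentUtilization / targetUtilization)
# computed per metric; the larger proposal wins, then it is clamped.
# Omits the tolerance band and the behavior (rate-limit) policies.

MIN_REPLICAS=3 MAX_REPLICAS=10
CPU_REQUEST_M=100 MEM_REQUEST_MI=128   # requests from deployment.yaml
CPU_TARGET=70 MEM_TARGET=80            # averageUtilization targets (%)

# Integer ceil(a / b) for non-negative a and positive b.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# hpa_desired CURRENT_REPLICAS AVG_CPU_MILLICORES AVG_MEMORY_MI
# (hypothetical helper for illustration; not part of AICR or Kubernetes)
hpa_desired() {
  local current cpu_util mem_util by_cpu by_mem desired
  current=$1
  cpu_util=$(( $2 * 100 / CPU_REQUEST_M ))   # integer percent of request
  mem_util=$(( $3 * 100 / MEM_REQUEST_MI ))
  by_cpu=$(ceil_div $(( current * cpu_util )) "$CPU_TARGET")
  by_mem=$(ceil_div $(( current * mem_util )) "$MEM_TARGET")
  desired=$(( by_cpu > by_mem ? by_cpu : by_mem ))
  [ "$desired" -lt "$MIN_REPLICAS" ] && desired=$MIN_REPLICAS
  [ "$desired" -gt "$MAX_REPLICAS" ] && desired=$MAX_REPLICAS
  echo "$desired"
}

hpa_desired 3 140 64   # CPU at 140% of request, memory at 50%: prints 6
```

In the real controller the behavior block then caps each step: scale-up can at most double the replica count per 15 s, and scale-down removes at most 50% of pods per 60 s, based on the highest recommendation from the last 300 s. At maxReplicas: 10 the Deployment requests 1 CPU and 1280Mi in total (limits: 5 CPU, 5Gi), which is worth checking against node capacity.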

Pod Disruption Budget

# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: aicrd
  namespace: aicr
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: aicrd

kubectl apply -f pdb.yaml
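A PodDisruptionBudget only limits voluntary disruptions such as node drains: the eviction API permits currentHealthy - minAvailable evictions at a time, never fewer than zero. The sketch below (pdb_allowed_disruptions is a hypothetical helper) shows what minAvailable: 2 means at different replica counts. At the HPA floor of 3 replicas a drain evicts one aicrd pod at a time, and with only 2 healthy pods further evictions are refused until a replacement becomes Ready.

```shell
#!/bin/sh
# Voluntary disruptions a PodDisruptionBudget with minAvailable permits:
#   disruptionsAllowed = max(0, currentHealthy - minAvailable)
# pdb_allowed_disruptions is a hypothetical helper for illustration only.
pdb_allowed_disruptions() {
  local allowed
  allowed=$(( $1 - $2 ))
  [ "$allowed" -lt 0 ] && allowed=0
  echo "$allowed"
}

# minAvailable: 2 from pdb.yaml, at several healthy-replica counts.
for healthy in 2 3 10; do
  echo "healthy=$healthy allowed=$(pdb_allowed_disruptions "$healthy" 2)"
done
# healthy=2 allowed=0
# healthy=3 allowed=1
# healthy=10 allowed=8
```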

Monitoring

Prometheus ServiceMonitor

# servicemonitor.yaml
# Requires the Prometheus Operator CRDs. Your Prometheus must watch the aicr
# namespace, and if it selects ServiceMonitors by label (serviceMonitorSelector),
# add that label here too.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aicrd
  namespace: aicr
  labels:
    app: aicrd
spec:
  selector:
    matchLabels:
      app: aicrd
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s

kubectl apply -f servicemonitor.yaml

Grafana Dashboard

Key panels:

  • Request rate (by status code)
  • Request duration (p50, p95, p99)
  • Error rate
  • Rate limit rejections
  • Active connections

Security

Network Policies

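# networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: aicrd
  namespace: aicr
spec:
  podSelector:
    matchLabels:
      app: aicrd
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector: {}  # any namespace (ingress controller, Prometheus, ...)
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53  # DNS
        - protocol: TCP
          port: 53  # DNS falls back to TCP for large responses
    - to:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace.
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: TCP
          port: 443  # Kubernetes API
      # On many clusters the API server is not a pod in kube-system; if this
      # rule does not match, allow its endpoint address with an ipBlock rule
      # instead (see: kubectl get endpoints kubernetes -n default).

kubectl apply -f networkpolicy.yaml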

Pod Security Standards

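# namespace.yaml with Pod Security Standards labels added
apiVersion: v1
kind: Namespace
metadata:
  name: aicr
  labels:
    app: aicrd
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Under the restricted profile, pods are rejected unless they run as non-root, set allowPrivilegeEscalation: false, drop ALL capabilities, and set seccompProfile.type to RuntimeDefault or Localhost. Check that the Deployment's securityContext meets all four before enforcing.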

RBAC (if the API server needs Kubernetes API access)

# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aicrd
  namespace: aicr

---
# clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aicrd
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods"]
    verbs: ["get", "list"]

---
# clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aicrd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: aicrd
subjects:
  - kind: ServiceAccount
    name: aicrd
    namespace: aicr

The Deployment runs as the namespace's default ServiceAccount unless you add serviceAccountName: aicrd under spec.template.spec.

Troubleshooting

Check Pod Status

# Pod status
kubectl get pods -n aicr

# Describe pods
kubectl describe pod -n aicr -l app=aicrd

# View logs (with -l, kubectl shows the last 10 lines per pod unless --tail is set)
kubectl logs -n aicr -l app=aicrd

# Follow logs
kubectl logs -n aicr -l app=aicrd -f

Check Service

# Service status
kubectl get svc -n aicr

# Endpoints
kubectl get endpoints -n aicr

# Test from within cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl http://aicrd.aicr.svc.cluster.local/health

Check Ingress

# Ingress status
kubectl get ingress -n aicr

# Describe ingress
kubectl describe ingress aicrd -n aicr

# Check cert-manager certificate
kubectl get certificate -n aicr

Performance Issues

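# Check resource usage (requires metrics-server)
kubectl top pods -n aicr

# Check HPA status
kubectl get hpa -n aicr

# Check metrics (port-forward does not depend on tools inside the container image)
kubectl port-forward -n aicr svc/aicrd 8080:80
# ...then, in a second terminal:
curl -s http://localhost:8080/metrics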

Connection Refused

  1. Check service exists: kubectl get svc -n aicr
  2. Check endpoints: kubectl get endpoints -n aicr
  3. Check the pods are Ready: kubectl get pods -n aicr
  4. Check readiness probe: kubectl describe pod -n aicr <pod-name>

Rate Limiting

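Check the rate limit settings in the Deployment's pod template (no output means the defaults from the Environment Variables table apply):

kubectl set env deployment/aicrd -n aicr --list | grep RATE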

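Adjust them in the Deployment's env list, or run kubectl set env deployment/aicrd -n aicr RATE_LIMIT=200 RATE_BURST=400 (either change triggers a rolling restart):

env:
  - name: RATE_LIMIT
    value: "200"  # Increase limit
  - name: RATE_BURST
    value: "400"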
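To reason about what a RATE_LIMIT/RATE_BURST pair admits, the sketch below simulates a token bucket in one-second steps. Treating these variables as a token bucket (refill RATE_LIMIT tokens per second, hold at most RATE_BURST) is an assumption based on their names and descriptions, not confirmed aicrd behavior; simulate_token_bucket is a hypothetical helper and the traffic numbers are illustrative only.

```shell
#!/bin/sh
# Coarse one-second-step token bucket, ASSUMING RATE_LIMIT/RATE_BURST behave
# like one (not confirmed here): the bucket starts full with BURST tokens,
# regains RATE tokens at each later step (capped at BURST), and every
# admitted request spends one token.
# simulate_token_bucket is a hypothetical helper, not part of aicrd.
simulate_token_bucket() {
  local rate burst tokens second arrived admitted
  rate=$1 burst=$2
  shift 2
  tokens=$burst second=0
  for arrived in "$@"; do
    if [ "$second" -gt 0 ]; then
      tokens=$(( tokens + rate > burst ? burst : tokens + rate ))
    fi
    admitted=$(( arrived < tokens ? arrived : tokens ))
    tokens=$(( tokens - admitted ))
    echo "t=${second}s arrived=$arrived admitted=$admitted rejected=$(( arrived - admitted ))"
    second=$(( second + 1 ))
  done
}

# Defaults (100/s, burst 200): a 300-request spike, two heavy seconds, then light load.
simulate_token_bucket 100 200 300 150 150 50
# t=0s arrived=300 admitted=200 rejected=100
# t=1s arrived=150 admitted=100 rejected=50
# t=2s arrived=150 admitted=100 rejected=50
# t=3s arrived=50 admitted=50 rejected=0
```

With the raised values from the snippet above (RATE_LIMIT=200, RATE_BURST=400), the same 300-request spike is admitted in full on the first step.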

Upgrading

Rolling Update

# Update image
kubectl set image deployment/aicrd \
  api-server=ghcr.io/nvidia/aicrd:v0.8.0 \
  -n aicr

# Watch rollout
kubectl rollout status deployment/aicrd -n aicr

# Roll back if needed
kubectl rollout undo deployment/aicrd -n aicr

The aicrd server is stateless: it holds no persistent data, so there is nothing to back up beyond the manifests in this guide, which you should keep in version control. Standard Kubernetes patterns apply unchanged for blue-green or canary rollouts, backup and restore of resource definitions, and right-sizing requests and limits. Start with the requests and limits in the Deployment above and adjust them from kubectl top output or a Vertical Pod Autoscaler. None of these require AICR-specific handling; see the upstream Kubernetes documentation.

See Also