Kubernetes Deployment | NVIDIA AI Cluster Runtime

Deploy the AICR API Server in your Kubernetes cluster for self-hosted recipe generation.

Overview

API Server deployment enables self-hosted recipe generation:

Isolated deployment: Recipe data stays within your infrastructure
Custom recipes: Modify embedded recipe data (see recipes/)
High availability: Deploy multiple replicas with load balancing
Observability: Prometheus /metrics endpoint and structured logging

API Server scope:

Recipe generation from query parameters (query mode)
Does not capture snapshots (use agent Job or CLI)
Generates bundles via POST /v1/bundle
Does not analyze snapshots (query mode only)

Agent deployment (separate component):

Kubernetes Job captures cluster configuration
Writes snapshot to ConfigMap via Kubernetes API
Requires RBAC: ServiceAccount with ConfigMap create/update permissions
See Agent Deployment

Typical workflow:

Deploy agent Job → Captures snapshot → Writes to ConfigMap
CLI reads ConfigMap → Generates recipe → Writes to file or ConfigMap
CLI reads recipe → Generates bundle → Writes to filesystem
Apply bundle to cluster (Helm install, kubectl apply)
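
The four steps above can be chained in one script. The following is a minimal sketch that reuses the commands documented on this page; the `run` helper, the `aicr_workflow` function, and the `DRY_RUN` variable are illustrative names, not part of AICR. The final apply step is left as a comment because the command depends on what the generated bundle contains.

```shell
#!/usr/bin/env bash
# Sketch of the snapshot -> recipe -> bundle workflow.
# `run`, `aicr_workflow`, and DRY_RUN are hypothetical helpers for previewing commands.
set -euo pipefail

NAMESPACE="gpu-operator"   # namespace used by the agent manifests on this page

run() {
  # Print the command; execute it unless DRY_RUN=1.
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

aicr_workflow() {
  # 1. Deploy the agent Job and wait until it has written the snapshot.
  run kubectl apply -f agent-job.yaml
  run kubectl wait --for=condition=complete job/aicr -n "$NAMESPACE" --timeout=5m

  # 2. Generate a recipe from the snapshot ConfigMap.
  run aicr recipe --snapshot "cm://$NAMESPACE/aicr-snapshot" \
    --intent training --platform kubeflow --output recipe.yaml

  # 3. Generate a bundle from the recipe.
  run aicr bundle --recipe recipe.yaml --output ./bundles

  # 4. Apply the bundle (Helm install or kubectl apply); the exact command
  #    depends on the generated bundle contents.
}
```

Calling `aicr_workflow` with `DRY_RUN=1` exported prints the commands without executing them, which is a cheap way to review the sequence before touching a cluster.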

Quick Start

$ # Create namespace
$ kubectl create namespace aicr
$ 
$ # Deploy API server (save the manifest from the Deployment section below as aicrd-deployment.yaml)
$ kubectl apply -f aicrd-deployment.yaml
$ 
$ # Check deployment
$ kubectl get pods -n aicr
$ kubectl get svc -n aicr

Helm chart: Not yet available. Use the manual manifests below.

Manual Deployment

1. Create Namespace

# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: aicr
  labels:
    app: aicrd

$ kubectl apply -f namespace.yaml

2. Create Deployment

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aicrd
  namespace: aicr
  labels:
    app: aicrd
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aicrd
  template:
    metadata:
      labels:
        app: aicrd
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        fsGroup: 65532
        seccompProfile:
          type: RuntimeDefault  # Required by the "restricted" Pod Security Standard

      containers:
        - name: api-server
          image: ghcr.io/nvidia/aicrd:latest  # Pin a release tag for reproducible rollouts
          imagePullPolicy: IfNotPresent

          ports:
            - name: http
              containerPort: 8080
              protocol: TCP

          env:
            - name: PORT
              value: "8080"
            - name: AICR_LOG_LEVEL
              value: "info"

          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3

          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3

          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi

          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]

$ kubectl apply -f deployment.yaml

3. Create Service

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: aicrd
  namespace: aicr
  labels:
    app: aicrd
spec:
  type: ClusterIP
  selector:
    app: aicrd
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP

$ kubectl apply -f service.yaml

4. Create Ingress (Optional)

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aicrd
  namespace: aicr
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"  # ingress-nginx per-client requests per second
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - aicr.yourdomain.com
      secretName: aicr-tls
  rules:
    - host: aicr.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: aicrd
                port:
                  number: 80

$ kubectl apply -f ingress.yaml

Agent Deployment

Deploy the AICR Agent as a Kubernetes Job to automatically capture cluster configuration.

1. Create RBAC Resources

# agent-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aicr
  namespace: gpu-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: aicr
  namespace: gpu-operator
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: aicr
  namespace: gpu-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: aicr
subjects:
- kind: ServiceAccount
  name: aicr
  namespace: gpu-operator  # Must match ServiceAccount namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aicr
rules:
- apiGroups: [""]
  resources: ["nodes", "pods"]
  verbs: ["get", "list"]
- apiGroups: ["nvidia.com"]
  resources: ["clusterpolicies"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aicr
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: aicr
subjects:
- kind: ServiceAccount
  name: aicr
  namespace: gpu-operator

$ kubectl apply -f agent-rbac.yaml

2. Create Agent Job

# agent-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: aicr
  namespace: gpu-operator
  labels:
    app: aicr-agent
spec:
  template:
    metadata:
      labels:
        app: aicr-agent
    spec:
      serviceAccountName: aicr
      restartPolicy: Never

      containers:
      - name: aicr
        image: ghcr.io/nvidia/aicr:latest
        imagePullPolicy: IfNotPresent

        command:
        - aicr
        - snapshot
        - --output
        - cm://gpu-operator/aicr-snapshot

        securityContext:
          privileged: true
          runAsUser: 0
          runAsGroup: 0
      hostPID: true
      hostNetwork: true
      hostIPC: true
      volumes:
      - name: systemd
        hostPath:
          path: /run/systemd
          type: Directory

Note: The agent defaults to privileged mode, which is required for GPU, SystemD, and OS collectors. For PSS-restricted namespaces where only the Kubernetes collector is needed, use --privileged=false when deploying via the CLI. See Agent Deployment for details.

$ kubectl apply -f agent-job.yaml
$ 
$ # Wait for completion
$ kubectl wait --for=condition=complete job/aicr -n gpu-operator --timeout=5m
$ 
$ # Verify ConfigMap was created
$ kubectl get configmap aicr-snapshot -n gpu-operator
$ 
$ # View snapshot data
$ kubectl get configmap aicr-snapshot -n gpu-operator -o jsonpath='{.data.snapshot\.yaml}'

3. Generate Recipe from ConfigMap

$ # Using CLI (local or in another Job)
$ aicr recipe --snapshot cm://gpu-operator/aicr-snapshot \
>              --intent training \
>              --platform kubeflow \
>              --output recipe.yaml
$ 
$ # Or write recipe back to ConfigMap
$ aicr recipe --snapshot cm://gpu-operator/aicr-snapshot \
>              --intent training \
>              --platform kubeflow \
>              --output cm://gpu-operator/aicr-recipe

4. Generate Bundle

$ # From file
$ aicr bundle --recipe recipe.yaml --output ./bundles
$ 
$ # From ConfigMap
$ aicr bundle --recipe cm://gpu-operator/aicr-recipe --output ./bundles

E2E Testing

Validate the complete workflow:

$ # Run all CLI integration tests (no cluster needed)
$ make e2e
$ 
$ # Run cluster-based E2E tests (requires Kind cluster)
$ make e2e-tilt

CLI tests use Kyverno Chainsaw for declarative YAML assertions. See tests/chainsaw/README.md for details.

Configuration Options

Environment Variables

Variable	Default	Description
`PORT`	8080	HTTP server port
`AICR_LOG_LEVEL`	info	Logging level: debug, info, warn, error
`RATE_LIMIT`	100	Requests per second
`RATE_BURST`	200	Burst capacity
`READ_TIMEOUT`	30s	HTTP read timeout
`WRITE_TIMEOUT`	30s	HTTP write timeout
`IDLE_TIMEOUT`	60s	HTTP idle timeout

Note: The API server uses structured JSON logging to stderr. The CLI supports three logging modes (CLI/Text/JSON), but the API server always uses JSON for consistent log aggregation.

ConfigMap for Custom Recipe Data (Advanced)

Note: This example shows the concept of mounting custom recipe data. The actual recipe format uses a base-plus-overlay architecture. See recipes/ for the current schema (overlays/*.yaml including base.yaml).

# configmap.yaml - Example showing custom recipe data mounting
apiVersion: v1
kind: ConfigMap
metadata:
  name: aicr-recipe-data
  namespace: aicr
data:
  # ConfigMap keys cannot contain "/", so the overlays/ directory is
  # created at mount time through items[].path in the volume definition.
  base.yaml: |
    # Your custom base recipe
    apiVersion: aicr.nvidia.com/v1alpha1
    kind: RecipeMetadata
    # ... (see recipes/overlays/base.yaml for schema)

Mount in deployment:

spec:
  template:
    spec:
      volumes:
        - name: recipe-data
          configMap:
            name: aicr-recipe-data
            items:
              - key: base.yaml
                path: overlays/base.yaml  # Appears as /data/overlays/base.yaml
      containers:
        - name: api-server
          volumeMounts:
            - name: recipe-data
              mountPath: /data
          env:
            - name: RECIPE_DATA_PATH
              value: /data

High Availability

Horizontal Pod Autoscaler

# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aicrd
  namespace: aicr
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aicrd
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15

$ kubectl apply -f hpa.yaml
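
To see how the targets above translate into replica counts, the core formula documented for the Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to minReplicas and maxReplicas. The sketch below applies that formula with integer arithmetic. The function name `desired_replicas` is hypothetical, and the sketch ignores the controller's tolerance band, pod readiness handling, and the `behavior` policies, so treat its output as an estimate.

```shell
#!/usr/bin/env bash
# desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL [MIN] [MAX]
# Integer sketch of ceil(current * currentUtil / targetUtil), clamped to [MIN, MAX].
# MIN and MAX default to the minReplicas/maxReplicas values in hpa.yaml.
desired_replicas() {
  local current=$1 util=$2 target=$3 min=${4:-3} max=${5:-10}
  # Integer ceiling division: (a + b - 1) / b
  local desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}
```

For example, `desired_replicas 3 140 70` models three pods averaging 140% CPU utilization against the 70% target. When several metrics are configured, as in this manifest, the controller evaluates each one and uses the largest result.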

Pod Disruption Budget

# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: aicrd
  namespace: aicr
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: aicrd

$ kubectl apply -f pdb.yaml
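
With `minAvailable: 2`, the number of pods that may be evicted voluntarily at one time is the count of healthy pods minus two, floored at zero. A small sketch of that arithmetic follows; `allowed_disruptions` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# allowed_disruptions HEALTHY_PODS MIN_AVAILABLE
# Voluntary evictions permitted right now under an integer minAvailable.
allowed_disruptions() {
  local healthy=$1 min_available=$2
  local allowed=$(( healthy - min_available ))
  if [ "$allowed" -lt 0 ]; then allowed=0; fi
  echo "$allowed"
}
```

At the three-replica baseline used on this page, `allowed_disruptions 3 2` gives one eviction at a time, so a node drain proceeds pod by pod. The live value is reported by `kubectl get pdb aicrd -n aicr -o jsonpath='{.status.disruptionsAllowed}'`.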

Monitoring

Prometheus ServiceMonitor

# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aicrd
  namespace: aicr
  labels:
    app: aicrd
spec:
  selector:
    matchLabels:
      app: aicrd
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s

$ kubectl apply -f servicemonitor.yaml

Grafana Dashboard

Key panels:

Request rate (by status code)
Request duration (p50, p95, p99)
Error rate
Rate limit rejections
Active connections

Security

Network Policies

# networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: aicrd
  namespace: aicr
spec:
  podSelector:
    matchLabels:
      app: aicrd
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53  # DNS
        - protocol: TCP
          port: 53  # DNS over TCP (large responses)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system  # Label set automatically on every namespace
      ports:
        - protocol: TCP
          port: 443  # Kubernetes API (add an ipBlock rule if the API server runs outside the pod network)

Pod Security Standards

# Add to namespace
apiVersion: v1
kind: Namespace
metadata:
  name: aicr
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

RBAC (If API server needs K8s access)

# serviceaccount.yaml
# Reference it from the Deployment with spec.template.spec.serviceAccountName: aicrd
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aicrd
  namespace: aicr

---
# clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aicrd
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods"]
    verbs: ["get", "list"]

---
# clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aicrd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: aicrd
subjects:
  - kind: ServiceAccount
    name: aicrd
    namespace: aicr

Troubleshooting

Check Pod Status

$ # Pod status
$ kubectl get pods -n aicr
$ 
$ # Describe pod
$ kubectl describe pod -n aicr -l app=aicrd
$ 
$ # View logs
$ kubectl logs -n aicr -l app=aicrd
$ 
$ # Follow logs
$ kubectl logs -n aicr -l app=aicrd -f

Check Service

$ # Service status
$ kubectl get svc -n aicr
$ 
$ # Endpoints
$ kubectl get endpoints -n aicr
$ 
$ # Test from within cluster
$ kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
>   curl http://aicrd.aicr.svc.cluster.local/health

Check Ingress

$ # Ingress status
$ kubectl get ingress -n aicr
$ 
$ # Describe ingress
$ kubectl describe ingress aicrd -n aicr
$ 
$ # Check cert-manager certificate
$ kubectl get certificate -n aicr

Performance Issues

$ # Check resource usage
$ kubectl top pods -n aicr
$ 
$ # Check HPA status
$ kubectl get hpa -n aicr
$ 
$ # Check metrics
$ kubectl exec -n aicr -it deploy/aicrd -- \
>   wget -qO- http://localhost:8080/metrics
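
If the container image does not ship `wget` (minimal images often omit it), the same endpoints can be read through the Kubernetes API server's service proxy instead of `kubectl exec`. The following is a sketch under a few assumptions: your user is allowed to use `services/proxy` in the `aicr` namespace, the Service is named `aicrd` with a port named `http` as in service.yaml, and no NetworkPolicy blocks traffic from the API server. The helper name `aicrd_get` is hypothetical.

```shell
#!/usr/bin/env bash
# aicrd_get PATH
# Fetch an HTTP path from the aicrd Service through the API server proxy.
aicrd_get() {
  local path=${1#/}   # strip one leading slash, e.g. /metrics -> metrics
  kubectl get --raw "/api/v1/namespaces/aicr/services/aicrd:http/proxy/${path}"
}
```

For example, `aicrd_get /metrics | head` shows the first Prometheus samples, and `aicrd_get /health` checks the liveness endpoint without starting a debug pod.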

Connection Refused

Check service exists: kubectl get svc -n aicr
Check endpoints: kubectl get endpoints -n aicr
Check pod is ready: kubectl get pods -n aicr
Check readiness probe: kubectl describe pod -n aicr <pod-name>

Rate Limiting

Check rate limit settings:

$ kubectl exec -n aicr deploy/aicrd -- env | grep RATE

Adjust via deployment:

env:
  - name: RATE_LIMIT
    value: "200"  # Increase limit
  - name: RATE_BURST
    value: "400"
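
When choosing values, it can help to estimate how long a client can exceed the steady rate before requests start being rejected. The sketch below assumes token-bucket semantics, meaning a bucket of `RATE_BURST` tokens refilled at `RATE_LIMIT` tokens per second; this page does not state which algorithm the server uses, so treat the result as a rough guide. The function name `seconds_until_limited` is hypothetical.

```shell
#!/usr/bin/env bash
# seconds_until_limited INCOMING_RPS RATE_LIMIT RATE_BURST
# Under a token-bucket model, a full bucket drains at (incoming - rate) tokens per second.
seconds_until_limited() {
  local incoming=$1 rate=$2 burst=$3
  if [ "$incoming" -le "$rate" ]; then
    echo "never"   # refill keeps up with demand
    return 0
  fi
  echo $(( burst / (incoming - rate) ))   # whole seconds, rounded down
}
```

For example, `seconds_until_limited 150 100 200` models a client sending 150 requests per second against the default settings, and `seconds_until_limited 300 200 400` models the raised limits shown above.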

Upgrading

Rolling Update

$ # Update image
$ kubectl set image deployment/aicrd \
>   api-server=ghcr.io/nvidia/aicrd:v0.8.0 \
>   -n aicr
$ 
$ # Watch rollout
$ kubectl rollout status deployment/aicrd -n aicr
$ 
$ # Rollback if needed
$ kubectl rollout undo deployment/aicrd -n aicr

Blue-Green Deployment

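$ # Deploy new version (a separate Deployment whose pods carry the label version: v2)
$ kubectl apply -f deployment-v2.yaml
$ 
$ # Switch service (the patch merges version: v2 into the existing selector)
$ kubectl patch service aicrd -n aicr \
>   -p '{"spec":{"selector":{"version":"v2"}}}'
$ 
$ # Delete old deployment (substitute the name of your previous Deployment)
$ kubectl delete deployment aicrd-v1 -n aicr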

Backup and Disaster Recovery

Export Configuration

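$ # Export workload resources ("all" omits Ingress, ConfigMap, Secret, PodDisruptionBudget, and NetworkPolicy)
$ kubectl get all -n aicr -o yaml > aicr-backup.yaml
$ 
$ # Export specific resources
$ kubectl get deployment,service,ingress,configmap,hpa,pdb,networkpolicy -n aicr -o yaml > aicr-config.yaml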

Restore from Backup

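$ # Recreate the namespace first; the exports do not include it
$ kubectl create namespace aicr
$ 
$ # Restore resources (prefer the specific export; "get all" output also contains Pods and ReplicaSets)
$ kubectl apply -f aicr-config.yaml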

Cost Optimization

Resource Limits

Start with minimal resources:

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi

Monitor and adjust based on usage.

Vertical Pod Autoscaler (Optional)

# vpa.yaml
# Avoid combining Auto mode with an HPA that scales on CPU or memory;
# the two controllers would act on the same signals.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: aicrd
  namespace: aicr
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aicrd
  updatePolicy:
    updateMode: "Auto"