Configuration Guide#
This guide covers configuration options for the CuOpt NIM Operator deployment.
Image Configuration#
CuOpt Image Versions#
Update the image tag in cuopt-nimservice.yaml:
| CUDA Version | Image Tag |
|---|---|
| CUDA 12.9 | 25.12.0-cuda12.9-py3.13 |
spec:
  image:
    repository: nvcr.io/nvidia/cuopt/cuopt
    tag: "25.12.0-cuda12.9-py3.13"
    pullPolicy: IfNotPresent
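After updating the tag, re-apply the manifest so the operator rolls the pod onto the new image. A minimal sketch, assuming the file name above and the nim-service namespace used in the full example at the end of this guide:
# Apply the updated NIMService manifest and watch the pod restart on the new image
kubectl apply -f cuopt-nimservice.yaml
kubectl get pods -n nim-service -w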
Resource Configuration#
GPU Resources#
Configure GPU allocation:
spec:
  resources:
    limits:
      nvidia.com/gpu: 1  # Number of GPUs
Memory Resources#
For workloads requiring specific memory allocation:
spec:
  resources:
    limits:
      nvidia.com/gpu: 1
      memory: "32Gi"
    requests:
      memory: "16Gi"
Environment Variables#
CuOpt supports several environment variables for configuration:
spec:
  env:
    - name: CUOPT_DATA_DIR
      value: /model-store
    - name: CUOPT_SERVER_LOG_LEVEL
      value: info  # Options: debug, info, warning, error
    - name: CUOPT_SERVER_PORT
      value: "8000"
Storage Configuration#
The deployment can optionally use persistent storage so that datasets are passed through the filesystem rather than over HTTP. If data is sent over HTTP (the default), this storage is not needed.
spec:
  storage:
    pvc:
      create: true
      size: 10Gi
      storageClass: ""  # Uses the default storage class
      volumeAccessMode: "ReadWriteOnce"
For a custom storage class:
spec:
  storage:
    pvc:
      create: true
      size: 20Gi
      storageClass: "fast-ssd"
      volumeAccessMode: "ReadWriteOnce"
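If data is passed through the filesystem instead of HTTP, it has to land under the directory CuOpt reads from (CUOPT_DATA_DIR, /model-store above). One way to stage a file is kubectl cp into the running pod; the dataset file and pod name below are placeholders:
# Copy a local dataset into the volume mounted at /model-store in the CuOpt pod
kubectl cp cuopt_problem_data.json nim-service/<cuopt-pod-name>:/model-store/cuopt_problem_data.json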
Networking Configuration#
Service Configuration#
Default ClusterIP service:
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
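With the default ClusterIP service, a quick way to reach the API from outside the cluster is a port-forward. A sketch using the service name and namespace from the full example below, and the health endpoint referenced by the probes later in this guide:
# Forward the service port locally (leave this running)
kubectl port-forward -n nim-service svc/cuopt-service 8000:8000
# In another shell, check readiness through the forwarded port
curl http://localhost:8000/v2/health/ready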
For NodePort access:
spec:
  expose:
    service:
      type: NodePort
      port: 8000
      nodePort: 30800
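A NodePort service is reachable on any node's address at the chosen port. A sketch; the node IP is a placeholder and 30800 matches the nodePort above:
# Hit the readiness endpoint via a cluster node's address and the NodePort
curl http://<node-ip>:30800/v2/health/ready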
For LoadBalancer (cloud environments):
Note: Currently the cuOpt service does not support scaling; there can only be one instance of the pod per service, so a LoadBalancer service is usually unnecessary.
spec:
  expose:
    service:
      type: LoadBalancer
      port: 8000
Ingress Configuration#
To expose CuOpt externally via ingress:
spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        rules:
          - host: cuopt.example.com
            http:
              paths:
                - backend:
                    service:
                      name: cuopt-service
                      port:
                        number: 8000
                  path: /
                  pathType: Prefix
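Once DNS (or a local hosts entry) points cuopt.example.com at the ingress controller, requests route through the ingress to the service. A sketch; --resolve pins the example hostname to the ingress address and is only needed while DNS is not yet in place:
# Test the ingress route against the readiness endpoint
curl --resolve cuopt.example.com:80:<ingress-ip> http://cuopt.example.com/v2/health/ready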
With TLS:
spec:
  expose:
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        tls:
          - hosts:
              - cuopt.example.com
            secretName: cuopt-tls-secret
        rules:
          - host: cuopt.example.com
            http:
              paths:
                - backend:
                    service:
                      name: cuopt-service
                      port:
                        number: 8000
                  path: /
                  pathType: Prefix
Scaling Configuration#
Currently the cuOpt service does not support scaling. Only a single instance of the pod per service is supported.
Health Probes#
Liveness Probe#
Determines whether the container is still running; if the probe fails, the container is restarted:
spec:
  livenessProbe:
    enabled: true
    probe:
      failureThreshold: 3
      httpGet:
        path: /v2/health/live
        port: api
      initialDelaySeconds: 15
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
Readiness Probe#
Determines if the container is ready to accept traffic:
spec:
  readinessProbe:
    enabled: true
    probe:
      failureThreshold: 30
      httpGet:
        path: /v2/health/ready
        port: api
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
Startup Probe#
For slower-starting containers; liveness and readiness checks do not begin until the startup probe succeeds:
spec:
  startupProbe:
    enabled: true
    probe:
      failureThreshold: 30
      httpGet:
        path: /v2/health/ready
        port: api
      periodSeconds: 10
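To see how the probes behave in practice (restarts from a failing liveness probe, readiness flapping, and so on), inspect the pod's events and restart count. A sketch; the pod name is a placeholder:
# Show recent events, including probe failures and restarts, for the CuOpt pod
kubectl describe pod -n nim-service <cuopt-pod-name> | grep -A 20 "Events:"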
Monitoring Configuration#
Enable Prometheus metrics and ServiceMonitor:
spec:
  metrics:
    enabled: true
    serviceMonitor:
      additionalLabels:
        release: kube-prometheus-stack
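If the Prometheus Operator is installed (for example via kube-prometheus-stack, as the label above suggests), it discovers the ServiceMonitor automatically. A sketch to confirm the object exists and carries the expected label:
# Confirm the ServiceMonitor was created with the label Prometheus selects on
kubectl get servicemonitor -n nim-service --show-labels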
Full Configuration Example#
Here’s a complete production-ready configuration:
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# Full production-ready CuOpt NIMService configuration
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: cuopt-service
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nvidia/cuopt/cuopt
    tag: "25.12.0-cuda12.9-py3.13"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  env:
    - name: CUOPT_DATA_DIR
      value: /model-store
    - name: CUOPT_SERVER_LOG_LEVEL
      value: info
    - name: CUOPT_SERVER_PORT
      value: "8000"
  storage:
    pvc:
      create: true
      size: 20Gi
      storageClass: "fast-ssd"
      volumeAccessMode: "ReadWriteOnce"
  resources:
    limits:
      nvidia.com/gpu: 1
      memory: "32Gi"
    requests:
      memory: "16Gi"
  expose:
    service:
      type: ClusterIP
      port: 8000
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        tls:
          - hosts:
              - cuopt.example.com
            secretName: cuopt-tls-secret
        rules:
          - host: cuopt.example.com
            http:
              paths:
                - backend:
                    service:
                      name: cuopt-service
                      port:
                        number: 8000
                  path: /
                  pathType: Prefix
  metrics:
    enabled: true
    serviceMonitor:
      additionalLabels:
        release: kube-prometheus-stack
  scale:
    enabled: false
  livenessProbe:
    enabled: true
    probe:
      failureThreshold: 3
      httpGet:
        path: /v2/health/live
        port: api
      initialDelaySeconds: 15
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
  readinessProbe:
    enabled: true
    probe:
      failureThreshold: 30
      httpGet:
        path: /v2/health/ready
        port: api
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
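A sketch of applying the full manifest and checking that the operator brings the service up, assuming it is saved as cuopt-nimservice.yaml and the nim-service namespace already exists:
# Apply the full configuration, then check the custom resource and its pod
kubectl apply -f cuopt-nimservice.yaml
kubectl get nimservices.apps.nvidia.com -n nim-service
kubectl get pods -n nim-service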