Configuration Guide#

This guide covers configuration options for the CuOpt NIM Operator deployment.

Image Configuration#

CuOpt Image Versions#

Update the image tag in cuopt-nimservice.yaml:

CUDA Version    Image Tag
CUDA 12.9       25.12.0-cuda12.9-py3.13

spec:
  image:
    repository: nvcr.io/nvidia/cuopt/cuopt
    tag: "25.12.0-cuda12.9-py3.13"
    pullPolicy: IfNotPresent

Resource Configuration#

GPU Resources#

Configure GPU allocation:

spec:
  resources:
    limits:
      nvidia.com/gpu: 1    # Number of GPUs

Memory Resources#

For workloads requiring specific memory allocation:

spec:
  resources:
    limits:
      nvidia.com/gpu: 1
      memory: "32Gi"
    requests:
      memory: "16Gi"

Environment Variables#

CuOpt supports several environment variables for configuration:

spec:
  env:
    - name: CUOPT_DATA_DIR
      value: /model-store
    - name: CUOPT_SERVER_LOG_LEVEL
      value: info          # Options: debug, info, warning, error
    - name: CUOPT_SERVER_PORT
      value: "8000"
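Once the service is running, the configured port can be checked against the health endpoint. This is a sketch; it assumes the service name cuopt-service and namespace nim-service used elsewhere in this guide.

```shell
# Forward the service port locally, then query the readiness endpoint.
# Service name and namespace are assumptions from the examples in this guide.
kubectl -n nim-service port-forward svc/cuopt-service 8000:8000 &
curl -s http://localhost:8000/v2/health/ready
```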

Storage Configuration#

The deployment can optionally use persistent storage so that datasets are passed through the filesystem rather than over HTTP. If data is sent over HTTP (the default), this storage is not needed.

spec:
  storage:
    pvc:
      create: true
      size: 10Gi
      storageClass: ""           # Uses default storage class
      volumeAccessMode: "ReadWriteOnce"

For custom storage class:

spec:
  storage:
    pvc:
      create: true
      size: 20Gi
      storageClass: "fast-ssd"
      volumeAccessMode: "ReadWriteOnce"
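To confirm that the claim was created and bound after applying either configuration (a sketch, assuming the nim-service namespace used elsewhere in this guide):

```shell
# List PVCs in the service namespace; the STATUS column should read "Bound"
# once a volume has been provisioned by the storage class.
kubectl -n nim-service get pvc
```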

Networking Configuration#

Service Configuration#

Default ClusterIP service:

spec:
  expose:
    service:
      type: ClusterIP
      port: 8000

For NodePort access:

spec:
  expose:
    service:
      type: NodePort
      port: 8000
      nodePort: 30800
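With the NodePort service above, CuOpt is reachable on port 30800 of any cluster node. A quick check, where the node address is a placeholder:

```shell
# Query the readiness endpoint via the NodePort.
# <node-ip> is a placeholder for the address of any cluster node.
curl -s http://<node-ip>:30800/v2/health/ready
```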

For LoadBalancer (cloud environments):

Note: Currently the CuOpt service does not support scaling; there can be only one instance of the pod per service, so a LoadBalancer service is unnecessary.

spec:
  expose:
    service:
      type: LoadBalancer
      port: 8000

Ingress Configuration#

To expose CuOpt externally via ingress:

spec:
  expose:
    service:
      type: ClusterIP
      port: 8000
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        rules:
          - host: cuopt.example.com
            http:
              paths:
              - backend:
                  service:
                    name: cuopt-service
                    port:
                      number: 8000
                path: /
                pathType: Prefix
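Once DNS (or a local hosts entry) resolves the ingress host to the ingress controller, the route can be verified end to end. A sketch, assuming the example host cuopt.example.com; the controller address is a placeholder:

```shell
# Query the readiness endpoint through the ingress controller.
# If DNS is not set up yet, --resolve pins the host to the controller's IP.
curl -s --resolve cuopt.example.com:80:<ingress-controller-ip> \
  http://cuopt.example.com/v2/health/ready
```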

With TLS:

spec:
  expose:
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        tls:
          - hosts:
              - cuopt.example.com
            secretName: cuopt-tls-secret
        rules:
          - host: cuopt.example.com
            http:
              paths:
              - backend:
                  service:
                    name: cuopt-service
                    port:
                      number: 8000
                path: /
                pathType: Prefix

Scaling Configuration#

Currently the CuOpt service does not support scaling; only a single instance of the pod per service is supported.

Health Probes#

Liveness Probe#

Determines if the container is running:

spec:
  livenessProbe:
    enabled: true
    probe:
      failureThreshold: 3
      httpGet:
        path: /v2/health/live
        port: api
      initialDelaySeconds: 15
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1

Readiness Probe#

Determines if the container is ready to accept traffic:

spec:
  readinessProbe:
    enabled: true
    probe:
      failureThreshold: 30
      httpGet:
        path: /v2/health/ready
        port: api
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1

Startup Probe#

For slower starting containers:

spec:
  startupProbe:
    enabled: true
    probe:
      failureThreshold: 30
      httpGet:
        path: /v2/health/ready
        port: api
      periodSeconds: 10

Monitoring Configuration#

Enable Prometheus metrics and ServiceMonitor:

spec:
  metrics:
    enabled: true
    serviceMonitor:
      additionalLabels:
        release: kube-prometheus-stack
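If the Prometheus Operator is installed, the resulting ServiceMonitor can be inspected to confirm it carries the release label your Prometheus instance selects on (a sketch; the resource lands in the service's namespace):

```shell
# Verify the ServiceMonitor exists and shows the expected labels,
# including the release label configured under additionalLabels.
kubectl -n nim-service get servicemonitors --show-labels
```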

Full Configuration Example#

Here’s a complete production-ready configuration:

cuopt-nimservice-full.yaml

# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# Full production-ready CuOpt NIMService configuration
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: cuopt-service
  namespace: nim-service
spec:
  image:
    repository: nvcr.io/nvidia/cuopt/cuopt
    tag: "25.12.0-cuda12.9-py3.13"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  env:
    - name: CUOPT_DATA_DIR
      value: /model-store
    - name: CUOPT_SERVER_LOG_LEVEL
      value: info
    - name: CUOPT_SERVER_PORT
      value: "8000"
  storage:
    pvc:
      create: true
      size: 20Gi
      storageClass: "fast-ssd"
      volumeAccessMode: "ReadWriteOnce"
  resources:
    limits:
      nvidia.com/gpu: 1
      memory: "32Gi"
    requests:
      memory: "16Gi"
  expose:
    service:
      type: ClusterIP
      port: 8000
    ingress:
      enabled: true
      spec:
        ingressClassName: nginx
        tls:
          - hosts:
              - cuopt.example.com
            secretName: cuopt-tls-secret
        rules:
          - host: cuopt.example.com
            http:
              paths:
              - backend:
                  service:
                    name: cuopt-service
                    port:
                      number: 8000
                path: /
                pathType: Prefix
  metrics:
    enabled: true
    serviceMonitor:
      additionalLabels:
        release: kube-prometheus-stack
  scale:
    enabled: false
  livenessProbe:
    enabled: true
    probe:
      failureThreshold: 3
      httpGet:
        path: /v2/health/live
        port: api
      initialDelaySeconds: 15
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
  readinessProbe:
    enabled: true
    probe:
      failureThreshold: 30
      httpGet:
        path: /v2/health/ready
        port: api
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
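To deploy this configuration, apply the manifest and watch the resources come up (a sketch, assuming the NIM Operator is already installed and watching the nim-service namespace):

```shell
# Apply the full configuration and monitor the rollout.
kubectl apply -f cuopt-nimservice-full.yaml
kubectl -n nim-service get nimservices
kubectl -n nim-service get pods -w
```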