# Appendix A: Helm Chart Values

## Deployment Parameters

| Name | Description | Value |
| --- | --- | --- |
| `affinity` | Affinity settings for the deployment. Allows you to constrain pods to specific nodes. | `{}` |
| `securityContext` | Specify privilege and access control settings for the container (only affects the main container). | `{}` |
| `envVars` | Adds arbitrary environment variables to the main container as key-value pairs. | `{}` |
| `extraVolumes` | Adds arbitrary additional volumes to the deployment definition. | `{}` |
| `image.repository` | NIM-LLM image repository. | `""` |
| `image.tag` | Image tag. | `""` |
| `image.pullPolicy` | Image pull policy. | `""` |
| `imagePullSecrets` | Specify secret names that are needed by the main container and any init containers. Object keys are the names of the secrets. | `{}` |
| `nodeSelector` | Specify labels to ensure that NeMo Inference is deployed only on certain nodes (depending on cluster setup, `nvidia.com/gpu.present: "true"` is likely the best choice). | `{}` |
| `podAnnotations` | Specify additional annotations for the main deployment pods. | `{}` |
| `podSecurityContext` | Specify privilege and access control settings for the pod (only affects the main pod). |  |
| `podSecurityContext.runAsUser` | Specify the user UID for the pod. | `1000` |
| `podSecurityContext.runAsGroup` | Specify the group ID for the pod. | `1000` |
| `podSecurityContext.fsGroup` | Specify the file system owner group ID. | `1000` |
| `replicaCount` | Specify the replica count for the deployment. | `1` |
| `resources` | Specify resource limits and requests for the running service. |  |
| `resources.limits.nvidia.com/gpu` | Specify the number of GPUs to present to the running service. | `1` |
| `serviceAccount.create` | Specifies whether a service account should be created. | `false` |
| `serviceAccount.annotations` | Specifies annotations to be added to the service account. | `{}` |
| `serviceAccount.automount` | Specifies whether to automatically mount the service account to the container. | `{}` |
| `serviceAccount.name` | Specify the name of the service account to use. If it is not set and `create` is `true`, a name is generated using a fullname template. | `""` |
| `tolerations` | Specify tolerations for pod assignment. Allows the scheduler to schedule pods onto nodes with matching taints. |  |
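For orientation, the following is a minimal sketch of a custom values file that exercises several of these parameters. The image repository, tag, and secret name are placeholders rather than chart defaults; substitute values for your environment.

```yaml
# custom-values.yaml -- deployment overrides (all names below are placeholders)
image:
  repository: "nvcr.io/nim/your-org/your-model"  # hypothetical repository
  tag: "1.0.0"                                   # hypothetical tag
  pullPolicy: IfNotPresent

# Object keys are the secret names; "ngc-secret" is assumed to already exist.
imagePullSecrets:
  ngc-secret: {}

# Keep pods on GPU nodes, as suggested in the nodeSelector description above.
nodeSelector:
  nvidia.com/gpu.present: "true"

resources:
  limits:
    nvidia.com/gpu: 1

replicaCount: 1
```

Passing the file with `helm install <release> <chart> -f custom-values.yaml` merges these keys over the chart defaults.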

## Autoscaling Parameters

Values used for autoscaling. If autoscaling is not enabled, these values are ignored. Override them on a per-model basis according to quality-of-service and cost metrics. Autoscaling is not recommended unless you use the custom metrics API with something like prometheus-adapter; standard CPU and memory metrics are of limited use when scaling NIM.

| Name | Description | Value |
| --- | --- | --- |
| `autoscaling.enabled` | Enable the horizontal pod autoscaler. | `false` |
| `autoscaling.minReplicas` | Specify minimum replicas for autoscaling. | `1` |
| `autoscaling.maxReplicas` | Specify maximum replicas for autoscaling. | `10` |
| `autoscaling.metrics` | Array of metrics for autoscaling. | `[]` |
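A sketch of what such an override could look like, using a hypothetical GPU cache-utilization metric exposed through prometheus-adapter; entries in `autoscaling.metrics` follow the Kubernetes `autoscaling/v2` HPA metrics schema:

```yaml
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          name: gpu_cache_usage_perc   # hypothetical metric; must be served by your custom metrics API
        target:
          type: AverageValue
          averageValue: "0.75"         # scale out when average cache usage exceeds 75%
```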

## Ingress Parameters

| Name | Description | Value |
| --- | --- | --- |
| `ingress.enabled` | Enables ingress. | `false` |
| `ingress.className` | Specify the class name for ingress. | `""` |
| `ingress.annotations` | Specify additional annotations for ingress. | `{}` |
| `ingress.hosts` | Specify a list of hosts, each containing lists of paths. |  |
| `ingress.hosts[0].host` | Specify the name of the host. | `chart-example.local` |
| `ingress.hosts[0].paths[0].path` | Specify the ingress path. | `/` |
| `ingress.hosts[0].paths[0].pathType` | Specify the path type. | `ImplementationSpecific` |
| `ingress.hosts[0].paths[0].serviceType` | Specify the service type. It can be `nemo` or `openai`; make sure your model serves the appropriate port(s). | `openai` |
| `ingress.tls` | Specify a list of pairs of TLS `secretName` and hosts. | `[]` |
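As an illustration, a values fragment that enables ingress for the OpenAI-compatible endpoint might look like the following; the hostname, class name, and TLS secret are placeholders, and an ingress controller is assumed to be installed already:

```yaml
ingress:
  enabled: true
  className: "nginx"            # assumes an NGINX ingress controller
  hosts:
    - host: nim.example.com     # placeholder hostname
      paths:
        - path: /
          pathType: ImplementationSpecific
          serviceType: openai   # route to the OpenAI-compatible service port
  tls:
    - secretName: nim-tls       # hypothetical pre-created TLS secret
      hosts:
        - nim.example.com
```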

## Probe Parameters

| Name | Description | Value |
| --- | --- | --- |
| `livenessProbe.enabled` | Enable livenessProbe. | `true` |
| `livenessProbe.method` | Probe method: `http` or `script` (no script is currently provided). | `http` |
| `livenessProbe.path` | livenessProbe endpoint path. | `/v1/health/live` |
| `livenessProbe.initialDelaySeconds` | Initial delay seconds for livenessProbe. | `15` |
| `livenessProbe.timeoutSeconds` | Timeout seconds for livenessProbe. | `1` |
| `livenessProbe.periodSeconds` | Period seconds for livenessProbe. | `10` |
| `livenessProbe.successThreshold` | Success threshold for livenessProbe. | `1` |
| `livenessProbe.failureThreshold` | Failure threshold for livenessProbe. | `3` |
| `readinessProbe.enabled` | Enable readinessProbe. | `true` |
| `readinessProbe.path` | readinessProbe endpoint path. | `/v1/health/ready` |
| `readinessProbe.initialDelaySeconds` | Initial delay seconds for readinessProbe. | `15` |
| `readinessProbe.timeoutSeconds` | Timeout seconds for readinessProbe. | `1` |
| `readinessProbe.periodSeconds` | Period seconds for readinessProbe. | `10` |
| `readinessProbe.successThreshold` | Success threshold for readinessProbe. | `1` |
| `readinessProbe.failureThreshold` | Failure threshold for readinessProbe. | `3` |
| `startupProbe.enabled` | Enable startupProbe. | `true` |
| `startupProbe.path` | startupProbe endpoint path. | `/v1/health/ready` |
| `startupProbe.initialDelaySeconds` | Initial delay seconds for startupProbe. | `40` |
| `startupProbe.timeoutSeconds` | Timeout seconds for startupProbe. | `1` |
| `startupProbe.periodSeconds` | Period seconds for startupProbe. | `10` |
| `startupProbe.successThreshold` | Success threshold for startupProbe. | `1` |
| `startupProbe.failureThreshold` | Failure threshold for startupProbe. | `180` |
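Large models can take a long time to download and load, and the startup probe is the usual knob to loosen: the pod is restarted once `failureThreshold * periodSeconds` elapses without a ready response. A sketch that doubles the default startup budget to roughly one hour:

```yaml
startupProbe:
  enabled: true
  path: /v1/health/ready
  initialDelaySeconds: 40
  periodSeconds: 10
  failureThreshold: 360   # 360 x 10s = ~1 hour allowed for startup before restart
```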

## Storage Parameters

| Name | Description | Value |
| --- | --- | --- |
| `persistence` | Specify settings for the volume the model is served from: the `/model-store` path if `model.legacyCompat` is enabled, otherwise the `/.cache` volume. |  |
| `persistence.enabled` | Enable persistent volumes. | `false` |
| `persistence.existingClaimName` | Specify an existing claim. If using `existingClaim`, run only one replica or use a `ReadWriteMany` storage setup. | `""` |
| `persistence.class` | Specify the persistent volume storage class. If `null` (the default), no `storageClassName` spec is set, choosing the default provisioner. | `""` |
| `persistence.retain` | Specify whether the persistent volume should survive when the Helm chart is upgraded or deleted. | `""` |
| `persistence.createPV` | Set to `true` if you need the chart to create a PV for hostPath use cases. | `false` |
| `persistence.accessMode` | Specify `accessModes`. If using an NFS or similar setup, you can use `ReadWriteMany`. | `ReadWriteOnce` |
| `persistence.size` | Specify the size of the claim (for example, `8Gi`). | `50Gi` |
| `hostPath` | Configures the model cache on local disk on the nodes using hostPath; for special cases only. Investigate and understand the security implications before using this option. | `""` |

## Service Parameters

| Name | Description | Value |
| --- | --- | --- |
| `service.type` | Specify the service type for the deployment. | `ClusterIP` |
| `service.name` | Override the default service name. | `""` |
| `service.http_port` | Specify the HTTP port for the service. | `8080` |
| `service.annotations` | Specify additional annotations to be added to the service. | `{}` |
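For instance, to expose the service on a node port instead of the cluster-internal default, a fragment like this (port and annotations unchanged from the defaults) would do:

```yaml
service:
  type: NodePort   # reachable from outside the cluster without an ingress
  http_port: 8080
  annotations: {}
```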

## OpenTelemetry Parameters

| Name | Description | Value |
| --- | --- | --- |
| `zipkinDeployed` | Specify whether this chart should deploy Zipkin for metrics. | `false` |
| `otelDeployed` | Specify whether this chart should deploy OpenTelemetry for metrics. | `false` |
| `otelEnabled` | Specify whether this chart should sink metrics to OpenTelemetry. | `false` |
| `otelEnvVars` | Environment variables to configure OTEL in the container; sane defaults are set in the chart. | `{}` |
| `logLevel` | Log level to set for the container and metrics collection. | `{}` |

OpenTelemetry configurations can be found in the values section of the OpenTelemetry repository.

**Note**

Configure the OpenTelemetry exporters according to your needs. The Helm chart includes sample configurations for exporting traces to Zipkin and metrics to an OTLP-compatible receiver, stored at `opentelemetry-collector.config.exporters.zipkin` and `opentelemetry-collector.config.exporters.otlp`, respectively.

For example, if your metrics setup operates in a pull-based fashion and you want to expose NIM metrics in Prometheus format, you can do so by replacing the OTLP exporter with a Prometheus exporter.
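A sketch of that swap, assuming the subchart values layout described above; the endpoint is a placeholder for the address you want the collector to expose metrics on:

```yaml
opentelemetry-collector:
  config:
    exporters:
      prometheus:
        endpoint: "0.0.0.0:8889"   # placeholder; Prometheus scrapes this port on the collector
    service:
      pipelines:
        metrics:
          exporters: [prometheus]  # use the prometheus exporter instead of otlp
```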