Appendix A: Helm Chart Values
| Name | Description | Value |
|------|-------------|-------|
| affinity | Affinity settings for the deployment. Constrains pods to specific nodes. | {} |
| securityContext | Specify privilege and access control settings for the container (only affects the main container). | {} |
| envVars | Adds arbitrary environment variables to the main container as key-value pairs. | {} |
| extraVolumes | Adds arbitrary additional volumes to the deployment definition. | {} |
| image.repository | NIM-LLM image repository. | "" |
| image.tag | Image tag. | "" |
| image.pullPolicy | Image pull policy. | "" |
| imagePullSecrets | Specify the names of secrets needed by the main container and any init containers. Object keys are the names of the secrets. | {} |
| nodeSelector | Specify labels to ensure that NeMo Inference is deployed only on certain nodes (likely best set to nvidia.com/gpu.present: "true", depending on cluster setup). | {} |
| podAnnotations | Specify additional annotations for the main deployment pods. | {} |
| podSecurityContext | Specify privilege and access control settings for the pod (only affects the main pod). | |
| podSecurityContext.runAsUser | Specify the user UID for the pod. | 1000 |
| podSecurityContext.runAsGroup | Specify the group ID for the pod. | 1000 |
| podSecurityContext.fsGroup | Specify the file system owner group ID. | 1000 |
| replicaCount | Specify the replica count for the deployment. | 1 |
| resources | Specify resource limits and requests for the running service. | |
| resources.limits.nvidia.com/gpu | Specify the number of GPUs to present to the running service. | 1 |
| serviceAccount.create | Specifies whether a service account should be created. | false |
| serviceAccount.annotations | Specifies annotations to add to the service account. | {} |
| serviceAccount.automount | Specifies whether to automatically mount the service account into the container. | {} |
| serviceAccount.name | Specify the name of the service account to use. If not set and create is true, a name is generated using a fullname template. | "" |
| tolerations | Specify tolerations for pod assignment. Allows the scheduler to schedule pods onto nodes with matching taints. | |
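As a point of reference, the scheduling- and resource-related values above combine in a values file as follows. This is a minimal sketch: the node label and taint key are examples and depend on how your cluster is set up.

```yaml
# values.yaml (sketch) -- pin pods to GPU nodes and request one GPU.
replicaCount: 1

nodeSelector:
  nvidia.com/gpu.present: "true"   # example label; adjust to your cluster

tolerations:
  - key: nvidia.com/gpu            # example taint key; adjust to your cluster
    operator: Exists
    effect: NoSchedule

resources:
  limits:
    nvidia.com/gpu: 1              # number of GPUs presented to the service

podSecurityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
```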
The following values are used for autoscaling. If autoscaling is not enabled, they are ignored. Override them on a per-model basis using quality-of-service metrics as well as cost metrics. Enabling autoscaling is not recommended unless you use the custom metrics API, for example through the prometheus-adapter; the standard CPU and memory metrics are of limited use for scaling NIM.
| Name | Description | Value |
|------|-------------|-------|
| autoscaling.enabled | Enable the horizontal pod autoscaler. | false |
| autoscaling.minReplicas | Specify the minimum number of replicas for autoscaling. | 1 |
| autoscaling.maxReplicas | Specify the maximum number of replicas for autoscaling. | 10 |
| autoscaling.metrics | Array of metrics for autoscaling. | [] |
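Following the recommendation above to scale on custom metrics rather than CPU or memory, `autoscaling.metrics` can carry standard HPA v2 metric entries. A sketch, assuming a per-pod custom metric is already served through the custom metrics API (for example via prometheus-adapter); the metric name and target value here are hypothetical:

```yaml
# values.yaml (sketch) -- HPA driven by a custom per-pod metric exposed
# through the custom metrics API (e.g. via prometheus-adapter).
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: gpu_cache_usage_perc   # hypothetical metric name
        target:
          type: AverageValue
          averageValue: "0.75"         # hypothetical target
```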
| Name | Description | Value |
|------|-------------|-------|
| ingress.enabled | Enables ingress. | false |
| ingress.className | Specify the class name for the ingress. | "" |
| ingress.annotations | Specify additional annotations for the ingress. | {} |
| ingress.hosts | Specify a list of hosts, each containing a list of paths. | |
| ingress.hosts[0].host | Specify the name of the host. | chart-example.local |
| ingress.hosts[0].paths[0].path | Specify the ingress path. | / |
| ingress.hosts[0].paths[0].pathType | Specify the path type. | ImplementationSpecific |
| ingress.hosts[0].paths[0].serviceType | Specify the service type. It can be nemo or openai; make sure your model serves the appropriate port(s). | openai |
| ingress.tls | Specify a list of pairs of TLS secretName and hosts. | [] |
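Put together, an ingress configuration exposing the OpenAI-compatible endpoint might look like the sketch below. The ingress class, host, and TLS secret name are placeholders to adjust for your environment:

```yaml
# values.yaml (sketch) -- expose the OpenAI-compatible endpoint via ingress.
ingress:
  enabled: true
  className: nginx                  # example ingress class
  hosts:
    - host: chart-example.local     # placeholder host
      paths:
        - path: /
          pathType: ImplementationSpecific
          serviceType: openai
  tls:
    - secretName: example-tls       # placeholder TLS secret
      hosts:
        - chart-example.local
```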
| Name | Description | Value |
|------|-------------|-------|
| livenessProbe.enabled | Enable the liveness probe. | true |
| livenessProbe.method | Liveness probe method: http or script (no script is currently provided). | http |
| livenessProbe.path | Liveness probe endpoint path. | /v1/health/live |
| livenessProbe.initialDelaySeconds | Initial delay seconds for the liveness probe. | 15 |
| livenessProbe.timeoutSeconds | Timeout seconds for the liveness probe. | 1 |
| livenessProbe.periodSeconds | Period seconds for the liveness probe. | 10 |
| livenessProbe.successThreshold | Success threshold for the liveness probe. | 1 |
| livenessProbe.failureThreshold | Failure threshold for the liveness probe. | 3 |
| readinessProbe.enabled | Enable the readiness probe. | true |
| readinessProbe.path | Readiness probe endpoint path. | /v1/health/ready |
| readinessProbe.initialDelaySeconds | Initial delay seconds for the readiness probe. | 15 |
| readinessProbe.timeoutSeconds | Timeout seconds for the readiness probe. | 1 |
| readinessProbe.periodSeconds | Period seconds for the readiness probe. | 10 |
| readinessProbe.successThreshold | Success threshold for the readiness probe. | 1 |
| readinessProbe.failureThreshold | Failure threshold for the readiness probe. | 3 |
| startupProbe.enabled | Enable the startup probe. | true |
| startupProbe.path | Startup probe endpoint path. | /v1/health/ready |
| startupProbe.initialDelaySeconds | Initial delay seconds for the startup probe. | 40 |
| startupProbe.timeoutSeconds | Timeout seconds for the startup probe. | 1 |
| startupProbe.periodSeconds | Period seconds for the startup probe. | 10 |
| startupProbe.successThreshold | Success threshold for the startup probe. | 1 |
| startupProbe.failureThreshold | Failure threshold for the startup probe. | 180 |
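With the defaults above, a model has up to 40 + 180 × 10 seconds (roughly 30 minutes) to become ready before the startup probe gives up and the pod is restarted. A sketch of widening that window for a very large model; the threshold chosen here is an example:

```yaml
# values.yaml (sketch) -- allow a slower startup for a large model.
# 40s initial delay + 360 failures x 10s period ~= 1 hour before restart.
startupProbe:
  enabled: true
  path: /v1/health/ready
  initialDelaySeconds: 40
  periodSeconds: 10
  failureThreshold: 360   # example value; tune to your model's load time
```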
| Name | Description | Value |
|------|-------------|-------|
| persistence | Specify settings to modify the /model-store volume (if model.legacyCompat is enabled) or the /.cache volume where the model is served from. | |
| persistence.enabled | Enable persistent volumes. | false |
| persistence.existingClaimName | Specify an existing claim. If using an existing claim, run only one replica or use a ReadWriteMany storage setup. | "" |
| persistence.class | Specify the persistent volume storage class. If null (the default), no storageClassName spec is set, selecting the default provisioner. | "" |
| persistence.retain | Specify whether the persistent volume should survive when the Helm chart is upgraded or deleted. | "" |
| persistence.createPV | Set to true if you need the chart to create a PV for hostPath use cases. | false |
| persistence.accessMode | Specify accessModes. If using an NFS or similar setup, you can use ReadWriteMany. | ReadWriteOnce |
| persistence.size | Specify the size of the claim (for example, 8Gi). | 50Gi |
| hostPath | Configures the model cache on local disk on the nodes using hostPath; for special cases only. Investigate and understand the security implications before using this option. | "" |
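A common pattern is to point the model cache at a pre-provisioned shared claim so multiple replicas reuse one download. A minimal sketch, assuming a ReadWriteMany-capable claim already exists; the claim name is a placeholder:

```yaml
# values.yaml (sketch) -- reuse an existing shared claim for the model cache.
persistence:
  enabled: true
  existingClaimName: model-cache-pvc   # placeholder claim name
  accessMode: ReadWriteMany            # needed when running more than one replica
```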
| Name | Description | Value |
|------|-------------|-------|
| service.type | Specify the service type for the deployment. | ClusterIP |
| service.name | Override the default service name. | "" |
| service.http_port | Specify the HTTP port for the service. | 8080 |
| service.annotations | Specify additional annotations to add to the service. | {} |
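For illustration, the service values combine as follows; the annotation shown is an example and depends on your monitoring setup:

```yaml
# values.yaml (sketch) -- in-cluster service on port 8080.
service:
  type: ClusterIP
  http_port: 8080
  annotations:
    prometheus.io/scrape: "true"   # example annotation; adjust to your setup
```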
| Name | Description | Value |
|------|-------------|-------|
| zipkinDeployed | Specify whether this chart should deploy Zipkin for metrics. | false |
| otelDeployed | Specify whether this chart should deploy OpenTelemetry for metrics. | false |
| otelEnabled | Specify whether this chart should sink metrics to OpenTelemetry. | false |
| otelEnvVars | Environment variables to configure OpenTelemetry in the container; the chart ships sane defaults. | {} |
| logLevel | Log level to set for the container and metrics collection. | {} |
OpenTelemetry configuration options can be found in the values section of the OpenTelemetry repository. Configure the OpenTelemetry exporters according to your needs. The chart provides sample configuration for exporting traces to Zipkin and metrics to an OTLP-compatible receiver, stored at opentelemetry-collector.config.exporters.zipkin and opentelemetry-collector.config.exporters.otlp, respectively.
For example, if your metrics setup operates in a pull-based fashion and you want to expose NIM metrics in Prometheus format, you can replace the OTLP exporter with a Prometheus exporter.
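A sketch of that swap, assuming the values are passed through to the upstream opentelemetry-collector subchart; the scrape port and pipeline names here are examples and depend on the subchart's defaults:

```yaml
# values.yaml (sketch) -- replace the OTLP metrics exporter with a pull-based
# Prometheus exporter scraped by your monitoring stack.
opentelemetry-collector:
  config:
    exporters:
      prometheus:
        endpoint: "0.0.0.0:8889"   # example port for Prometheus to scrape
    service:
      pipelines:
        metrics:
          exporters: [prometheus]  # pipeline name depends on chart defaults
```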