Appendix A: Helm Chart Parameters

Refer to the Kubernetes API reference for details on the Kubernetes objects that these values configure, such as affinity rules, security contexts, and tolerations.

The Value column shows the default value of each parameter.
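Parameter names use dot notation for nesting. In a values file, each dot becomes one level of indentation, and `[0]` refers to the first entry of a list. The exception is `nvidia.com/gpu`, which is a single key that contains a dot. The sketch below is a minimal custom values file that overrides a few defaults. The image repository and tag are placeholders, not real image names.

```yaml
# custom-values.yaml: minimal override sketch; keys not set here keep their defaults
image:
  repository: "<registry>/<nim-image>"   # placeholder: substitute your NIM image
  tag: "<tag>"                           # placeholder image tag
  pullPolicy: IfNotPresent
replicaCount: 1
resources:
  limits:
    nvidia.com/gpu: 1   # `resources.limits.nvidia.com/gpu`: one key, not two levels
```

You would typically pass such a file with `helm install <release> <chart> -f custom-values.yaml`.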

Deployment Parameters

Name	Description	Value
`affinity`	Affinity settings for the deployment. Use these to constrain pods to specific nodes.	`{}`
`securityContext`	Specify privilege and access control settings for the container (affects only the main container).	`{}`
`envVars`	Adds arbitrary environment variables to the main container, given as key-value pairs.	`{}`
`extraVolumes`	Adds arbitrary additional volumes to the deployment set definition.	`{}`
`image.repository`	NIM-LLM image repository.	`""`
`image.tag`	Image tag.	`""`
`image.pullPolicy`	Image pull policy.	`""`
`imagePullSecrets`	Specify the names of the secrets used to pull images for the main container and any init containers. Object keys are the secret names.	`{}`
`nodeSelector`	Specify node labels so that NIM is deployed only on certain nodes. Depending on your cluster setup, `nvidia.com/gpu.present: "true"` is likely the best choice.	`{}`
`podAnnotations`	Specify additional annotations for the main deployment pods.	`{}`
`podSecurityContext`	Specify privilege and access control settings for the pod (affects only the main pod).
`podSecurityContext.runAsUser`	Specify the user ID (UID) the pod runs as.	`1000`
`podSecurityContext.runAsGroup`	Specify the group ID (GID) the pod runs as.	`1000`
`podSecurityContext.fsGroup`	Specify the group ID that owns the pod's file systems.	`1000`
`replicaCount`	Specify the replica count for the deployment.	`1`
`resources`	Specify resource limits and requests for the running service.
`resources.limits.nvidia.com/gpu`	Specify the number of GPUs to present to the running service.	`1`
`serviceAccount.create`	Specifies whether a service account should be created.	`false`
`serviceAccount.annotations`	Specifies annotations to be added to the service account.	`{}`
`serviceAccount.automount`	Specifies whether to automatically mount the service account to the container.	`{}`
`serviceAccount.name`	Specify the name of the service account to use. If it is not set and `serviceAccount.create` is `true`, a name is generated using the fullname template.	`""`
`tolerations`	Specify tolerations for pod assignment. Tolerations allow the scheduler to place pods on nodes with matching taints.
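The fragment below is a sketch of the scheduling parameters. It pins the pods to GPU nodes and tolerates a GPU taint. It assumes your GPU nodes carry the `nvidia.com/gpu.present` label and an `nvidia.com/gpu` taint with effect `NoSchedule`. Label and taint names vary by cluster, so check yours. The environment variable name is hypothetical.

```yaml
nodeSelector:
  nvidia.com/gpu.present: "true"   # quoted: Kubernetes label values must be strings
tolerations:
  - key: nvidia.com/gpu            # assumed taint key on the GPU nodes
    operator: Exists               # tolerate the taint regardless of its value
    effect: NoSchedule
envVars:
  EXAMPLE_SETTING: "value"         # hypothetical variable, in key-value form
```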

Autoscaling Parameters

These values configure autoscaling and are ignored unless autoscaling is enabled. Override them on a per-model basis, guided by quality-of-service and cost metrics. Autoscaling isn't recommended unless you use the custom metrics API, for example through prometheus-adapter. Standard CPU and memory metrics are of limited use in scaling NIM.

Name	Description	Value
`autoscaling.enabled`	Enable horizontal pod autoscaler.	`false`
`autoscaling.minReplicas`	Specify minimum replicas for autoscaling.	`1`
`autoscaling.maxReplicas`	Specify maximum replicas for autoscaling.	`10`
`autoscaling.metrics`	Array of metrics for autoscaling.	`[]`
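The recommended setup scales on custom metrics, so a sketch of `autoscaling.metrics` might look like the fragment below. It makes two assumptions. First, the chart passes each entry unchanged into the `spec.metrics` list of an `autoscaling/v2` HorizontalPodAutoscaler. Second, prometheus-adapter exposes a per-pod metric. The metric name and target value are hypothetical and should come from your own quality-of-service and cost analysis.

```yaml
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          name: num_requests_running   # hypothetical metric exposed via prometheus-adapter
        target:
          type: AverageValue
          averageValue: "10"           # hypothetical per-pod target
```

With a `Pods` metric and an `AverageValue` target, the autoscaler adds replicas when the metric's average across pods exceeds the target. The replica count stays within the `minReplicas` and `maxReplicas` bounds.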

Ingress Parameters

Name	Description	Value
`ingress.enabled`	Enables ingress.	`false`
`ingress.className`	Specify class name for Ingress.	`""`
`ingress.annotations`	Specify additional annotations for ingress.	`{}`
`ingress.hosts`	Specify a list of hosts, each containing a list of paths.
`ingress.hosts[0].host`	Specify name of host.	`chart-example.local`
`ingress.hosts[0].paths[0].path`	Specify ingress path.	`/`
`ingress.hosts[0].paths[0].pathType`	Specify path type.	`ImplementationSpecific`
`ingress.hosts[0].paths[0].serviceType`	Specify the service type: `nemo` or `openai`. Make sure your model serves the corresponding port(s).	`openai`
`ingress.tls`	Specify a list of TLS entries, each pairing a `secretName` with its `hosts`.	`[]`
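The sketch below shows how the ingress parameters nest. It assumes an NGINX ingress controller is installed (hence `className: nginx`). The hostname and TLS secret name are hypothetical, and the secret must already exist in the release namespace.

```yaml
ingress:
  enabled: true
  className: nginx              # assumes an NGINX ingress controller
  hosts:
    - host: nim.example.com     # hypothetical hostname
      paths:
        - path: /
          pathType: Prefix
          serviceType: openai   # route to the OpenAI-compatible port
  tls:
    - secretName: nim-tls       # hypothetical TLS secret
      hosts:
        - nim.example.com
```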

Probe Parameters

Name	Description	Value
`livenessProbe.enabled`	Enable livenessProbe.	`true`
`livenessProbe.method`	Liveness probe method: `http` or `script`. No script is currently provided.	`http`
`livenessProbe.path`	Liveness probe endpoint path.	`/v1/health/live`
`livenessProbe.initialDelaySeconds`	Initial delay seconds for livenessProbe.	`15`
`livenessProbe.timeoutSeconds`	Timeout seconds for livenessProbe.	`1`
`livenessProbe.periodSeconds`	Period seconds for livenessProbe.	`10`
`livenessProbe.successThreshold`	Success threshold for livenessProbe.	`1`
`livenessProbe.failureThreshold`	Failure threshold for livenessProbe.	`3`
`readinessProbe.enabled`	Enable readinessProbe.	`true`
`readinessProbe.path`	Readiness probe endpoint path.	`/v1/health/ready`
`readinessProbe.initialDelaySeconds`	Initial delay seconds for readinessProbe.	`15`
`readinessProbe.timeoutSeconds`	Timeout seconds for readinessProbe.	`1`
`readinessProbe.periodSeconds`	Period seconds for readinessProbe.	`10`
`readinessProbe.successThreshold`	Success threshold for readinessProbe.	`1`
`readinessProbe.failureThreshold`	Failure threshold for readinessProbe.	`3`
`startupProbe.enabled`	Enable startupProbe.	`true`
`startupProbe.path`	Startup probe endpoint path.	`/v1/health/ready`
`startupProbe.initialDelaySeconds`	Initial delay seconds for startupProbe.	`40`
`startupProbe.timeoutSeconds`	Timeout seconds for startupProbe.	`1`
`startupProbe.periodSeconds`	Period seconds for startupProbe.	`10`
`startupProbe.successThreshold`	Success threshold for startupProbe.	`1`
`startupProbe.failureThreshold`	Failure threshold for startupProbe.	`180`
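The startup probe limits how long the model may take to load before the kubelet kills and restarts the container. That limit is roughly `initialDelaySeconds + failureThreshold × periodSeconds`. With the defaults, it is 40 + 180 × 10 = 1840 seconds, or about 31 minutes. If a large model needs longer, you can raise `failureThreshold` as in the sketch below. The value 360 is illustrative, not a recommendation.

```yaml
startupProbe:
  # Approximate startup budget = initialDelaySeconds + failureThreshold * periodSeconds
  #                            = 40 + 360 * 10 = 3640 s (about 61 minutes)
  failureThreshold: 360
```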

Storage Parameters

Name	Description	Value
`persistence`	Specify settings for the volume the model is served from. The volume is mounted at `/model-store` if `model.legacyCompat` is enabled, and at `/.cache` otherwise.
`persistence.enabled`	Enable persistent volumes.	`false`
`persistence.existingClaimName`	Specify an existing claim. If you use an existing claim, run only one replica or use a ReadWriteMany storage setup.	`""`
`persistence.class`	Specify the persistent volume storage class. If null (the default), no `storageClassName` is set and the default provisioner is used.	`""`
`persistence.retain`	Specify whether the persistent volume should be retained when the Helm chart is upgraded or deleted.	`""`
`persistence.createPV`	Set to `true` to have the chart create a PV, for hostPath use cases.	`false`
`persistence.accessMode`	Specify the access mode. With NFS or similar shared storage, you can use `ReadWriteMany`.	`ReadWriteOnce`
`persistence.size`	Specify the size of the claim (for example, `8Gi`).	`50Gi`
`hostPath`	Configures the model cache on the nodes' local disk using hostPath, for special cases. Investigate and understand the security implications before using this option.	`""`
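As the `persistence.existingClaimName` row notes, sharing one existing claim across several replicas requires ReadWriteMany storage. The sketch below assumes a PersistentVolumeClaim named `nim-model-cache` (hypothetical). It also assumes the claim was already created on ReadWriteMany-capable storage, such as NFS.

```yaml
replicaCount: 2
persistence:
  enabled: true
  existingClaimName: nim-model-cache   # hypothetical PVC; must be ReadWriteMany for 2+ replicas
  retain: true                         # keep the volume when the chart is upgraded or deleted
```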

Service Parameters

Name	Description	Value
`service.type`	Specify service type for the deployment.	`ClusterIP`
`service.name`	Override the default service name.	`""`
`service.http_port`	Specify the HTTP port for the service.	`8080`
`service.annotations`	Specify additional annotations to add to the service.	`{}`
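The default `ClusterIP` service is reachable only from inside the cluster. The sketch below exposes it externally instead and assumes your cluster has a load-balancer provider. The annotation key is hypothetical, because real load-balancer annotation keys are provider-specific.

```yaml
service:
  type: LoadBalancer                 # requires a load-balancer provider in the cluster
  http_port: 8080                    # the default, shown for clarity
  annotations:
    example.com/owner: ml-platform   # hypothetical annotation
```

For quick testing, you can keep the default `ClusterIP`. Running `kubectl port-forward service/<service-name> 8080:8080` then forwards the port to your workstation.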