Text Embedding Microservices (Latest)

Appendix A: Helm Chart Parameters

| Name | Description | Value |
|------|-------------|-------|
| `affinity` | Affinity settings for the deployment; allows you to constrain pods to nodes. | `{}` |
| `securityContext` | Privilege and access control settings for the container (only affects the main container). | `{}` |
| `envVars` | Arbitrary environment variables for the main container, as key-value pairs. | `{}` |
| `extraVolumes` | Arbitrary additional volumes for the deployment definition. | `{}` |
| `image.repository` | NIM-LLM image repository. | `""` |
| `image.tag` | Image tag. | `""` |
| `image.pullPolicy` | Image pull policy. | `""` |
| `imagePullSecrets` | Secret names needed by the main container and any init containers; object keys are the names of the secrets. | `{}` |
| `nodeSelector` | Labels to ensure that NeMo Inference is deployed only on certain nodes (depending on cluster setup, `nvidia.com/gpu.present: "true"` is likely the best choice). | `{}` |
| `podAnnotations` | Additional annotations for the main deployment pods. | `{}` |
| `podSecurityContext` | Privilege and access control settings for the pod (only affects the main pod). | |
| `podSecurityContext.runAsUser` | User UID for the pod. | `1000` |
| `podSecurityContext.runAsGroup` | Group ID for the pod. | `1000` |
| `podSecurityContext.fsGroup` | File system owner group ID. | `1000` |
| `replicaCount` | Replica count for the deployment. | `1` |
| `resources` | Resource limits and requests for the running service. | |
| `resources.limits.nvidia.com/gpu` | Number of GPUs to present to the running service. | `1` |
| `serviceAccount.create` | Whether a service account should be created. | `false` |
| `serviceAccount.annotations` | Annotations to add to the service account. | `{}` |
| `serviceAccount.automount` | Whether to automatically mount the service account in the container. | `{}` |
| `serviceAccount.name` | Name of the service account to use. If not set and `create` is `true`, a name is generated from the fullname template. | `""` |
| `tolerations` | Tolerations for pod assignment; allows the scheduler to place pods on nodes with matching taints. | |
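
Taken together, a typical values override for this table looks like the following minimal sketch. The repository path, image tag, and secret name are placeholders, not values shipped with the chart:

```yaml
# custom-values.yaml -- illustrative only; repository, tag, and secret
# name below are placeholders.
image:
  repository: "nvcr.io/nim/example-embedding"  # placeholder repository
  tag: "1.0.0"                                 # placeholder tag
  pullPolicy: IfNotPresent
imagePullSecrets:
  ngc-secret: {}            # object keys are the secret names
nodeSelector:
  nvidia.com/gpu.present: "true"
podSecurityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
resources:
  limits:
    nvidia.com/gpu: 1
replicaCount: 1
```

Apply the file at install time, for example with `helm install my-embedding-nim <chart> -f custom-values.yaml`.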

Values used for autoscaling. If autoscaling is not enabled, these values are ignored. They should be overridden on a per-model basis according to your quality-of-service and cost metrics. Autoscaling is not recommended unless you use the custom metrics API with something like prometheus-adapter; the standard CPU and memory metrics are of limited use for scaling NIM.

| Name | Description | Value |
|------|-------------|-------|
| `autoscaling.enabled` | Enable the horizontal pod autoscaler. | `false` |
| `autoscaling.minReplicas` | Minimum replicas for autoscaling. | `1` |
| `autoscaling.maxReplicas` | Maximum replicas for autoscaling. | `10` |
| `autoscaling.metrics` | Array of metrics for autoscaling. | `[]` |
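
If you do wire up the custom metrics API (for example via prometheus-adapter), entries in `autoscaling.metrics` follow the standard HPA `autoscaling/v2` metric schema. A sketch, where the metric name is hypothetical and must be one your adapter actually exposes:

```yaml
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods                      # standard autoscaling/v2 Pods metric
      pods:
        metric:
          name: request_queue_depth   # hypothetical metric name
        target:
          type: AverageValue
          averageValue: "10"          # scale up above 10 queued requests per pod
```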

| Name | Description | Value |
|------|-------------|-------|
| `ingress.enabled` | Enables ingress. | `false` |
| `ingress.className` | Class name for the ingress. | `""` |
| `ingress.annotations` | Additional annotations for the ingress. | `{}` |
| `ingress.hosts` | List of hosts, each containing a list of paths. | |
| `ingress.hosts[0].host` | Name of the host. | `chart-example.local` |
| `ingress.hosts[0].paths[0].path` | Ingress path. | `/` |
| `ingress.hosts[0].paths[0].pathType` | Path type. | `ImplementationSpecific` |
| `ingress.hosts[0].paths[0].serviceType` | Service type; can be `nemo` or `openai`. Make sure your model serves the appropriate port(s). | `openai` |
| `ingress.tls` | List of pairs of TLS `secretName` and hosts. | `[]` |
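
A sketch of an ingress built from these values; the `nginx` class name assumes an NGINX ingress controller is installed in your cluster:

```yaml
ingress:
  enabled: true
  className: "nginx"            # assumes an NGINX ingress controller
  annotations: {}
  hosts:
    - host: chart-example.local
      paths:
        - path: /
          pathType: ImplementationSpecific
          serviceType: openai   # or nemo; match the port(s) your model serves
  tls: []                       # add secretName/hosts pairs here for TLS
```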

| Name | Description | Value |
|------|-------------|-------|
| `livenessProbe.enabled` | Enable the liveness probe. | `true` |
| `livenessProbe.method` | Liveness probe method: `http` or `script` (no script is currently provided). | `http` |
| `livenessProbe.path` | Liveness probe endpoint path. | `/v1/health/live` |
| `livenessProbe.initialDelaySeconds` | Initial delay seconds for the liveness probe. | `15` |
| `livenessProbe.timeoutSeconds` | Timeout seconds for the liveness probe. | `1` |
| `livenessProbe.periodSeconds` | Period seconds for the liveness probe. | `10` |
| `livenessProbe.successThreshold` | Success threshold for the liveness probe. | `1` |
| `livenessProbe.failureThreshold` | Failure threshold for the liveness probe. | `3` |
| `readinessProbe.enabled` | Enable the readiness probe. | `true` |
| `readinessProbe.path` | Readiness probe endpoint path. | `/v1/health/ready` |
| `readinessProbe.initialDelaySeconds` | Initial delay seconds for the readiness probe. | `15` |
| `readinessProbe.timeoutSeconds` | Timeout seconds for the readiness probe. | `1` |
| `readinessProbe.periodSeconds` | Period seconds for the readiness probe. | `10` |
| `readinessProbe.successThreshold` | Success threshold for the readiness probe. | `1` |
| `readinessProbe.failureThreshold` | Failure threshold for the readiness probe. | `3` |
| `startupProbe.enabled` | Enable the startup probe. | `true` |
| `startupProbe.path` | Startup probe endpoint path. | `/v1/health/ready` |
| `startupProbe.initialDelaySeconds` | Initial delay seconds for the startup probe. | `40` |
| `startupProbe.timeoutSeconds` | Timeout seconds for the startup probe. | `1` |
| `startupProbe.periodSeconds` | Period seconds for the startup probe. | `10` |
| `startupProbe.successThreshold` | Success threshold for the startup probe. | `1` |
| `startupProbe.failureThreshold` | Failure threshold for the startup probe. | `180` |
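
For models that are slow to load, the startup probe's failure threshold is the main knob: the pod gets roughly `initialDelaySeconds + periodSeconds * failureThreshold` to become ready. A sketch that doubles the default threshold to allow roughly an hour:

```yaml
startupProbe:
  enabled: true
  path: /v1/health/ready
  initialDelaySeconds: 40
  periodSeconds: 10
  failureThreshold: 360   # 40s + 10s * 360 = ~61 minutes before the pod is restarted
livenessProbe:
  enabled: true
  method: http            # "script" is accepted but no script is currently provided
  path: /v1/health/live
```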

| Name | Description | Value |
|------|-------------|-------|
| `persistence` | Settings for the volume the model is served from: `/model-store` if `model.legacyCompat` is enabled, otherwise `/.cache`. | |
| `persistence.enabled` | Enable persistent volumes. | `false` |
| `persistence.existingClaimName` | Specify an existing claim. If using `existingClaim`, run only one replica or use a `ReadWriteMany` storage setup. | `""` |
| `persistence.class` | Persistent volume storage class. If `null` (the default), no `storageClassName` spec is set, selecting the default provisioner. | `""` |
| `persistence.retain` | Whether the persistent volume should survive when the helm chart is upgraded or deleted. | `""` |
| `persistence.createPV` | Set to `true` if you need the chart to create a PV for `hostPath` use cases. | `false` |
| `persistence.accessMode` | Access modes. If using an NFS or similar setup, you can use `ReadWriteMany`. | `ReadWriteOnce` |
| `persistence.size` | Size of the claim (e.g. `8Gi`). | `50Gi` |
| `hostPath` | Configures the model cache on local disk on the nodes using `hostPath`; for special cases only. Investigate and understand the security implications before using this option. | `""` |
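
A sketch of a persistence setup for running more than one replica, assuming a hypothetical `ReadWriteMany`-capable storage class named `nfs-client`:

```yaml
persistence:
  enabled: true
  existingClaimName: ""     # empty: let the chart create the claim
  class: "nfs-client"       # hypothetical RWX-capable storage class
  accessMode: ReadWriteMany # needed when several replicas share one claim
  size: 50Gi
  retain: "true"            # keep the volume across chart upgrades and deletes
replicaCount: 2
```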

| Name | Description | Value |
|------|-------------|-------|
| `service.type` | Service type for the deployment. | `ClusterIP` |
| `service.name` | Override the default service name. | `""` |
| `service.http_port` | HTTP port for the service. | `8080` |
| `service.annotations` | Additional annotations to add to the service. | `{}` |
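
A sketch that keeps the in-cluster default but moves the HTTP port; all keys come from the table above:

```yaml
service:
  type: ClusterIP
  name: ""           # empty uses the chart's default service name
  http_port: 8000    # any free port; 8080 is the chart default
  annotations: {}
```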

| Name | Description | Value |
|------|-------------|-------|
| `zipkinDeployed` | Whether this chart should deploy Zipkin for metrics. | `false` |
| `otelDeployed` | Whether this chart should deploy OpenTelemetry for metrics. | `false` |
| `otelEnabled` | Whether this chart should sink metrics to OpenTelemetry. | `false` |
| `otelEnvVars` | Environment variables to configure OTEL in the container; the chart provides sane defaults. | `{}` |
| `logLevel` | Log level to set for the container and metrics collection. | `{}` |
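
A sketch that turns on the bundled telemetry components. `OTEL_SERVICE_NAME` is a standard OpenTelemetry environment variable; the value given for it is a placeholder:

```yaml
zipkinDeployed: true    # deploy Zipkin for traces
otelDeployed: true      # deploy the OpenTelemetry collector
otelEnabled: true       # sink metrics to OpenTelemetry
otelEnvVars:
  OTEL_SERVICE_NAME: "text-embedding-nim"  # placeholder service name
```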

The available OpenTelemetry configuration options can be found in the values section of the OpenTelemetry repository.

Note

You should configure the OpenTelemetry exporters according to your needs. The helm chart includes sample configurations for exporting traces to Zipkin and metrics to an OTLP-compatible receiver, stored at opentelemetry-collector.config.exporters.zipkin and opentelemetry-collector.config.exporters.otlp, respectively.

For example, if your metrics setup operates in a pull-based fashion and you want to expose NIM metrics in Prometheus format, you can do so by replacing the OTLP exporter with a Prometheus exporter.
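
As a sketch of that swap, the exporter and pipeline below follow the standard OpenTelemetry Collector configuration schema; the listen port is an assumption:

```yaml
opentelemetry-collector:
  config:
    exporters:
      # Replace the sample OTLP exporter with a pull-based Prometheus endpoint.
      prometheus:
        endpoint: "0.0.0.0:8889"   # assumed port; scrape it from your Prometheus setup
    service:
      pipelines:
        metrics:
          exporters: [prometheus]  # route the metrics pipeline to the new exporter
```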

© Copyright 2024, NVIDIA Corporation. Last updated on Jul 23, 2024.