Managing NeMo Customizer#
About NeMo Customizer#
NeMo Customizer is a lightweight API server that runs managed training jobs on GPU nodes with the Volcano scheduler. Using this microservice, you can take LLMs, either from NVIDIA or open source, and customize them to fit your specific use cases. NeMo Customizer lets you provide examples of desired responses to prompts, and tailors the model so that future responses match your examples.
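The prompt/completion pairs that drive fine-tuning are commonly expressed as JSON Lines records, one example per line. The sketch below is illustrative only; the field names and exact dataset schema NeMo Customizer expects are described in the NeMo Customizer documentation:

```python
import json

# Illustrative prompt/completion training examples. The "prompt" and
# "completion" field names here are assumptions for illustration; check
# the NeMo Customizer documentation for the exact dataset schema.
examples = [
    {"prompt": "Summarize: NeMo Customizer runs managed training jobs.",
     "completion": "It fine-tunes LLMs on Kubernetes GPU nodes."},
    {"prompt": "What scheduler does it use?",
     "completion": "The Volcano scheduler."},
]

# Serialize to JSON Lines: one JSON object per line.
jsonl = "\n".join(json.dumps(rec) for rec in examples)
print(jsonl)
```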
Read the NeMo Customizer documentation for details on customizing your LLM models.
Prerequisites#
All the common NeMo microservice prerequisites.
Minimum system requirements
A single-node Kubernetes cluster on a Linux host and cluster-admin level permissions.
At least 200 GB of free disk space.
At least one dedicated GPU (A100 80 GB or H100 80 GB).
A NeMo Data Store and a NeMo Entity Store deployed on your cluster. The NeMo Entity Store and NeMo Data Store work closely together to hold information about the model entities on your cluster.
The NeMo Operator deployed on your cluster. This manages several training custom resources that are required to run customization jobs.
Note
You can use the NeMo Dependencies Ansible Playbook to deploy all the following NeMo Customizer microservice dependencies.
A Weights & Biases API key. The NeMo Customizer microservice uses the provided API key to send telemetry data, including the job ID, training loss, validation loss, and more, to Weights & Biases to create training and validation loss curves. Sign up for a W&B API key.
Volcano scheduler installed. Read the Volcano install documentation for details on installing with Helm.
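As a sketch, Volcano is typically installed with Helm along these lines; the repository URL, release name, and namespace below are assumptions, so confirm them against the Volcano install documentation:

```shell
# Add the Volcano Helm chart repository (URL assumed; verify in the
# Volcano install documentation) and install into its own namespace.
helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
helm repo update
helm install volcano volcano-sh/volcano -n volcano-system --create-namespace
```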
OpenTelemetry Collector installed on your cluster. Read the OpenTelemetry documentation for details on installing the OpenTelemetry Collector with Helm.
Storage
Access to an external PostgreSQL database to store model customization objects.
Access to an NFS-backed Persistent Volume that supports the
ReadWriteMany
access mode to enable fast checkpointing and minimize network traffic. The NeMo Customizer microservice creates PVCs to hold data while completing fine-tuning jobs. The NeMo Customizer custom resource configures two persistent volumes that are created and used to hold training job and model data.
spec.trainingConfig.modelPVC
: A PVC used to hold model data while completing fine-tuning jobs. You can provide an existing PVC, or have the NIM Operator create the PVC for you. If you delete a NeMo Customizer resource that was created with spec.trainingConfig.modelPVC.create: true, the NIM Operator also deletes the persistent volume (PV) and persistent volume claim (PVC).
spec.trainingConfig.workspacePVC
: A PVC configuration for the NeMo Operator NemoTrainingJob custom resource. This object defines how the NeMo Operator automatically creates a PVC for each job.
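In the NemoCustomizer custom resource, these two volumes sit side by side under spec.trainingConfig; a minimal sketch (names and sizes are placeholders to adjust for your cluster):

```yaml
spec:
  trainingConfig:
    # PVC that holds model artifacts; created by the NIM Operator when
    # create: true, and deleted along with the NemoCustomizer resource.
    modelPVC:
      create: true
      name: finetuning-ms-models-pvc
      storageClass: ""              # empty selects the cluster default
      volumeAccessMode: ReadWriteMany
      size: 50Gi
    # Template for the per-job workspace PVC the NeMo Operator creates.
    workspacePVC:
      storageClass: ""
      volumeAccessMode: ReadWriteMany
      size: 10Gi
      mountPath: /pvc/workspace
```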
Kubernetes
Create the required secrets: a database credentials secret and a W&B API key secret.
Create a secret file, such as
nemo-customizer-secrets.yaml
, with contents like the following example:

---
apiVersion: v1
kind: Secret
metadata:
  name: <customizer-pg-existing-secret>
  namespace: nemo
type: Opaque
stringData:
  password: <ncspassword>
---
apiVersion: v1
kind: Secret
metadata:
  name: <wandb-secret>
  namespace: nemo
type: Opaque
stringData:
  wandb_api_key: <API-key>
Apply the secret file.
$ kubectl apply -n nemo -f nemo-customizer-secrets.yaml
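Equivalently, assuming the same secret and key names as in the file above, you could create the secrets imperatively (the values shown are placeholders):

```shell
# Create the database password secret and the W&B API key secret
# directly, without a manifest file. Names mirror the example above.
kubectl create secret generic customizer-pg-existing-secret \
  -n nemo --from-literal=password='<ncspassword>'
kubectl create secret generic wandb-secret \
  -n nemo --from-literal=wandb_api_key='<API-key>'
```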
Deploying a NeMo Customizer#
Update the <inputs> placeholders in the following sample manifests with values for your cluster configuration.
Refer to the Configure NeMo Customizer section for more details on configuration options.
Create a ConfigMap with your training configurations in a file, such as
nemo-customizer-training-config.yaml
, with contents like the following example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nemo-training-config
  namespace: nemo
data:
  training: |
    # Optional additional configuration for training jobs
    container_defaults:
      imagePullPolicy: IfNotPresent
Apply the training ConfigMap file.
$ kubectl apply -n nemo -f nemo-customizer-training-config.yaml
Create a ConfigMap with your model configurations in a file, such as
nemo-customizer-model-config.yaml
, with contents like the following example. Refer to the Model Configurations section in the NeMo microservices documentation for details on configuring models.
Note
The default configuration in the sample below lists all the supported models and enables only the
meta/llama-3.1-8b-instruct
model to be downloaded. Update the configuration to enable the models you want to use in your customization jobs. Each enabled model is downloaded to a PVC by default, and downloading several models increases the storage requirements and startup time for NeMo Customizer.

apiVersion: v1
kind: ConfigMap
metadata:
  name: nemo-model-config
  namespace: nemo
data:
  models: |
    # -- Llama 3.2 3B Instruct model configuration.
    # @default -- This object has the following default values for the Llama 3.2 3B Instruct model.
    meta/llama-3.2-3b-instruct:
      # -- Whether to enable the model.
      enabled: false
      # -- NGC model URI.
      model_uri: ngc://nvidia/nemo/llama-3_2-3b-instruct:2.0
      # -- Path where model files are stored.
      model_path: llama32_3b-instruct
      # -- Training options for different fine-tuning methods.
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 1
          num_nodes: 1
          tensor_parallel_size: 1
      # -- Micro batch size for training.
      micro_batch_size: 1
      # -- Maximum sequence length for input tokens.
      max_seq_length: 4096
      # -- Number of model parameters.
      num_parameters: 3000000000
      # -- Model precision format.
      precision: bf16-mixed
      # -- Template for formatting prompts.
      prompt_template: "{prompt} {completion}"
    # -- Llama 3.2 1B model configuration.
    # @default -- This object has the following default values for the Llama 3.2 1B model.
    meta/llama-3.2-1b:
      # -- Whether to enable the model.
      enabled: false
      # -- NGC model URI for Llama 3.2 1B model.
      model_uri: ngc://nvidia/nemo/llama-3_2-1b:2.0
      # -- Path where model files are stored.
      model_path: llama32_1b
      # -- Training options for different fine-tuning methods.
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 1
          num_nodes: 1
          tensor_parallel_size: 1
        - training_type: sft
          finetuning_type: all_weights
          num_gpus: 1
          num_nodes: 1
          tensor_parallel_size: 1
      # -- Micro batch size for training.
      micro_batch_size: 1
      # -- Maximum sequence length for input tokens.
      max_seq_length: 4096
      # -- Number of model parameters.
      num_parameters: 1000000000
      # -- Model precision format.
      precision: bf16-mixed
      # -- Template for formatting prompts.
      prompt_template: "{prompt} {completion}"
    # -- Llama 3.2 1B Instruct model configuration.
    # @default -- This object has the following default values for the Llama 3.2 1B Instruct model.
    meta/llama-3.2-1b-instruct:
      # -- Whether to enable the model.
      enabled: false
      # -- NGC model URI for Llama 3.2 1B Instruct model.
      model_uri: ngc://nvidia/nemo/llama-3_2-1b-instruct:2.0
      # -- Path where model files are stored.
      model_path: llama32_1b-instruct
      # -- Training options for different fine-tuning methods.
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 1
          num_nodes: 1
          tensor_parallel_size: 1
        - training_type: sft
          finetuning_type: all_weights
          num_gpus: 1
          num_nodes: 1
          tensor_parallel_size: 1
      # -- Micro batch size for training.
      micro_batch_size: 1
      # -- Maximum sequence length for input tokens.
      max_seq_length: 4096
      # -- Number of model parameters.
      num_parameters: 1000000000
      # -- Model precision format.
      precision: bf16-mixed
      # -- Template for formatting prompts.
      prompt_template: "{prompt} {completion}"
    # -- Llama 3 70B Instruct model configuration.
    # @default -- This object has the following default values for the Llama 3 70B Instruct model.
    meta/llama3-70b-instruct:
      # -- Whether to enable the model.
      enabled: false
      # -- NGC model URI for Llama 3 70B Instruct model.
      model_uri: ngc://nvidia/nemo/llama-3-70b-instruct-nemo:2.0
      # -- Path where model files are stored.
      model_path: llama-3-70b-bf16
      # -- Training options for different fine-tuning methods.
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 4
          num_nodes: 1
          tensor_parallel_size: 4
      # -- Maximum sequence length for input tokens.
      max_seq_length: 4096
      # -- Number of model parameters.
      num_parameters: 70000000000
      # -- Micro batch size for training.
      micro_batch_size: 1
      # -- Model precision format.
      precision: bf16-mixed
      # -- Template for formatting prompts.
      prompt_template: "{prompt} {completion}"
    # -- Llama 3.1 8B Instruct model configuration.
    # @default -- This object has the following default values for the Llama 3.1 8B Instruct model.
    meta/llama-3.1-8b-instruct:
      # -- Whether to enable the model.
      enabled: true
      # -- NGC model URI for Llama 3.1 8B Instruct model.
      model_uri: ngc://nvidia/nemo/llama-3_1-8b-instruct-nemo:2.0
      # -- Path where model files are stored.
      model_path: llama-3_1-8b-instruct_0_0_1
      # -- Training options for different fine-tuning methods.
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 1
        - training_type: sft
          finetuning_type: all_weights
          num_gpus: 8
          num_nodes: 1
          tensor_parallel_size: 4
      # -- Micro batch size for training.
      micro_batch_size: 1
      # -- Maximum sequence length for input tokens.
      max_seq_length: 4096
      # -- Number of model parameters.
      num_parameters: 8000000000
      # -- Model precision format.
      precision: bf16-mixed
      # -- Template for formatting prompts.
      prompt_template: "{prompt} {completion}"
    # -- Llama 3.1 70B Instruct model configuration.
    # @default -- This object has the following default values for the Llama 3.1 70B Instruct model.
    meta/llama-3.1-70b-instruct:
      # -- Whether to enable the model.
      enabled: false
      # -- NGC model URI for Llama 3.1 70B Instruct model.
      model_uri: ngc://nvidia/nemo/llama-3_1-70b-instruct-nemo:2.0
      # -- Path where model files are stored.
      model_path: llama-3_1-70b-instruct_0_0_1
      # -- Training options for different fine-tuning methods.
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 4
          num_nodes: 1
          tensor_parallel_size: 4
      # -- Micro batch size for training.
      micro_batch_size: 1
      # -- Maximum sequence length for input tokens.
      max_seq_length: 4096
      # -- Number of model parameters.
      num_parameters: 70000000000
      # -- Model precision format.
      precision: bf16-mixed
      # -- Template for formatting prompts.
      prompt_template: "{prompt} {completion}"
    # -- Phi-4 model configuration.
    # @default -- This object has the following default values for the Phi-4 model.
    microsoft/phi-4:
      # -- Whether to enable the model.
      enabled: false
      # -- NGC model URI for Phi-4 model.
      model_uri: ngc://nvidia/nemo/phi-4:1.0
      # -- Path where model files are stored.
      model_path: phi-4
      # -- Training options for different fine-tuning methods.
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 1
          num_nodes: 1
      # -- Micro batch size for training.
      micro_batch_size: 1
      # -- Maximum sequence length for input tokens.
      max_seq_length: 4096
      # -- Number of model parameters.
      num_parameters: 14659507200
      # -- Model precision format.
      precision: bf16
      # -- Template for formatting prompts.
      prompt_template: "{prompt} {completion}"
    # -- Llama 3.3 70B Instruct model configuration.
    # @default -- This object has the following default values for the Llama 3.3 70B Instruct model.
    meta/llama-3.3-70b-instruct:
      # -- Whether to enable the model.
      enabled: false
      # -- NGC model URI for Llama 3.3 70B Instruct model.
      model_uri: ngc://nvidia/nemo/llama-3_3-70b-instruct:2.0
      # -- Path where model files are stored.
      model_path: llama-3_3-70b-instruct_0_0_1
      # -- Training options for different fine-tuning methods.
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 4
          num_nodes: 1
          tensor_parallel_size: 4
      # -- Micro batch size for training.
      micro_batch_size: 1
      # -- Maximum sequence length for input tokens.
      max_seq_length: 4096
      # -- Number of model parameters.
      num_parameters: 70000000000
      # -- Model precision format.
      precision: bf16-mixed
      # -- Template for formatting prompts.
      prompt_template: "{prompt} {completion}"
Apply the ConfigMap file.
$ kubectl apply -n nemo -f nemo-customizer-model-config.yaml
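Each model entry in the ConfigMap above carries a prompt_template that controls how a training record's fields are concatenated before training. As an illustration only, the "{prompt} {completion}" template behaves like Python's str.format, whose placeholder syntax it mirrors:

```python
# Illustration of the prompt_template "{prompt} {completion}" from the
# model config, rendered with Python's str.format. This only mimics the
# template's substitution; NeMo Customizer applies it server-side.
template = "{prompt} {completion}"
record = {"prompt": "Translate 'hello' to French:", "completion": "bonjour"}
rendered = template.format(**record)
print(rendered)  # Translate 'hello' to French: bonjour
```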
Create a file, such as
nemo-customizer.yaml
, with contents like the following example:

apiVersion: apps.nvidia.com/v1alpha1
kind: NemoCustomizer
metadata:
  name: nemocustomizer-sample
  namespace: nemo
spec:
  # Scheduler configuration for training jobs. Currently, only volcano is supported.
  scheduler:
    type: "volcano"
  # Weights & Biases configuration for experiment tracking
  wandb:
    secretName: <wandb-secret>      # Kubernetes secret that stores WANDB_API_KEY and optionally an encryption key
    apiKeyKey: <apiKey>             # Key in the secret that holds the W&B API key
    encryptionKey: <encryptionKey>  # Key in the secret that holds the optional encryption key
  # OpenTelemetry tracing configuration
  otel:
    enabled: true
    exporterOtlpEndpoint: http://<customizer-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
  # PostgreSQL database connection configuration
  databaseConfig:
    credentials:
      user: <ncsuser>                              # Database username
      secretName: <customizer-pg-existing-secret>  # Secret containing the password
      passwordKey: <password>                      # Key inside the secret that contains the password
    host: <customizer-pg-postgresql>.<nemo>.svc.cluster.local
    port: 5432
    databaseName: <ncsdb>
  # Customizer API service exposure settings
  expose:
    service:
      type: ClusterIP
      port: 8000
  # Global image pull settings used in various subcomponents
  image:
    repository: nvcr.io/nvidia/nemo-microservices/customizer-api
    tag: "25.04"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  # URL of the NeMo Entity Store microservice
  entitystore:
    endpoint: http://<nemoentitystore-sample>.<nemo>.svc.cluster.local:8000
  # URL of the NeMo Data Store microservice
  datastore:
    endpoint: http://<nemodatastore-sample>.<nemo>.svc.cluster.local:8000
  # URL of the MLflow tracking server
  mlflow:
    endpoint: http://<mlflow-tracking>.<nemo>.svc.cluster.local:80
  # Configuration for the data store CLI tools
  nemoDatastoreTools:
    image: nvcr.io/nvidia/nemo-microservices/nds-v2-huggingface-cli:25.04
  # Configuration for model download jobs
  modelDownloadJobs:
    image: "nvcr.io/nvidia/nemo-microservices/customizer-api:25.04"
    ngcAPISecret:
      name: ngc-api-secret  # Secret that stores the NGC API key
      key: "NGC_API_KEY"    # Key inside the secret
    securityContext:
      fsGroup: 1000
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
    # Time (in seconds) to retain a job after completion
    ttlSecondsAfterFinished: 600
    # Polling frequency to check job status
    pollIntervalSeconds: 15
  # Name of the ConfigMap containing model definitions
  modelConfig:
    name: <nemo-model-config>
  # Training configuration
  trainingConfig:
    configMap:
      # Optional: Additional configuration to merge into the training config
      name: <nemo-training-config>
    # PVC where model artifacts are cached or used during training
    modelPVC:
      create: true
      name: <finetuning-ms-models-pvc>
      storageClass: ""  # StorageClass for the PVC (leave empty to use the default)
      volumeAccessMode: ReadWriteMany
      size: 50Gi
    # Workspace PVC automatically created per job
    workspacePVC:
      storageClass: ""
      volumeAccessMode: ReadWriteMany
      size: 10Gi
      mountPath: /pvc/workspace  # Mount path for the workspace inside the container
    image:
      repository: nvcr.io/nvidia/nemo-microservices/customizer
      tag: "25.04"
    env:
      - name: LOG_LEVEL
        value: INFO
    # Multi-node networking environment variables for training (CSPs)
    networkConfig:
      - name: NCCL_IB_SL
        value: "0"
      - name: NCCL_IB_TC
        value: "41"
      - name: NCCL_IB_QPS_PER_CONNECTION
        value: "4"
      - name: UCX_TLS
        value: TCP
      - name: UCX_NET_DEVICES
        value: eth0
      - name: HCOLL_ENABLE_MCAST_ALL
        value: "0"
      - name: NCCL_IB_GID_INDEX
        value: "3"
    # TTL for a training job after it completes
    ttlSecondsAfterFinished: 3600
    # Timeout duration (in seconds) for a training job
    timeout: 3600
  # Node tolerations
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
Apply the manifest:
$ kubectl apply -n nemo -f nemo-customizer.yaml
Note
The NeMo Customizer image is large, and it can take a few minutes to download from the registry.
Verify NeMo Customizer#
View NeMo Customizer status:
$ kubectl get nemocustomizer.apps.nvidia.com -n nemo
Partial Output
NAME                    STATUS   AGE
nemocustomizer-sample   Ready    7s
View information about the NeMo Customizer:
$ kubectl describe nemocustomizer.apps.nvidia.com nemocustomizer-sample -n nemo
Partial Output
...
Status:
  Conditions:
    Last Transition Time:  2025-04-24T17:40:04Z
    Message:               deployment "nemocustomizer-sample" successfully rolled out
    Reason:                Ready
    Status:                True
    Type:                  Ready
    Last Transition Time:  2025-04-24T17:39:34Z
    Message:
    Reason:                Ready
    Status:                False
    Type:                  Failed
  State:                   Ready
Check NeMo Customizer Service is Reachable#
Once you have NeMo Customizer deployed on your cluster, use the following steps to verify that the service is up and running.
Start a pod that has access to the
curl
command. Substitute any pod that has this command and meets your organization’s security requirements.

$ kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
After the pod starts, you are connected to the
ash
shell in the pod.
Connect to the NeMo Customizer service:
$ curl -X GET "http://nemocustomizer-sample.nemo:8000/v1/customization/configs"
Press Ctrl+D to exit and delete the pod.
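Alternatively, assuming the service name and port from the sample manifest above, you can port-forward from your workstation instead of starting a pod:

```shell
# Forward local port 8000 to the NeMo Customizer service, then query
# the customization configs endpoint from your workstation.
kubectl port-forward -n nemo svc/nemocustomizer-sample 8000:8000 &
curl -X GET "http://localhost:8000/v1/customization/configs"
```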
Configure NeMo Customizer#
The following table describes the commonly modified fields for the NeMo Customizer custom resource.
Field |
Description |
Default Value |
---|---|---|
|
Specifies to add the user-supplied annotations to the pod. |
None |
|
Specifies the external PostgreSQL configuration details. |
None |
|
Specifies the password key used in the database credentials secret. |
|
|
Specifies the secret name for the database credentials. |
None |
|
Specifies the user for the database. |
None |
|
Specifies the name for the database. |
None |
|
Specifies the endpoint for the database. |
None |
|
Specifies the port for the database. |
|
|
Specifies the endpoint for the NeMo Data Store to use for customization jobs. |
none |
|
Specifies the endpoint for the NeMo Entity Store to use for customization jobs. |
none |
|
Specifies attributes to expose a service for this NeMo microservice. Use an expose object to specify Kubernetes Ingress and Service information. |
None |
|
When set to true, enables an ingress for the NeMo Customizer service. If you have an ingress controller, values like the following sample configure the ingress: ingress:
enabled: true
spec:
ingressClassName: nginx
host: nemo-customizer.example.com
paths:
- path: /
pathType: Prefix
|
|
|
Specifies the network port number for the NeMo Customizer microservice. |
|
|
Specifies the Kubernetes service type to create for the NeMo Customizer microservice. |
|
|
Specifies the group for the pods.
This value is used to set the security context of the pod in the |
|
|
Specifies repository, tag, pull policy, and pull secret for the container image. You must specify the repository and tag for the NeMo microservice image you are using. |
None |
|
Specifies the user-supplied labels to add to the pod. |
None |
|
When set to |
|
|
Specifies the MLFlow tracking endpoint deployed on your cluster. |
None |
|
Specifies the name of the ConfigMap containing your model definitions. |
None |
|
Specifies the image to use for model downloader jobs. |
None |
|
Specifies the image pull policy to use for the model downloader image. |
None |
|
Specifies the name of the key in your NGC secret that contains your NGC API Key. Refer to the Image Pull Secrets page for more details on creating this secret. |
None |
|
Specifies the secret name with your NGC API Key. Refer to the Image Pull Secrets page for more details on creating this secret. |
None |
|
Specifies the polling interval for model download status. |
None |
|
Specifies the Kubernetes security context for the model downloader. |
None |
|
Specifies the time to live after the model downloader job finishes in seconds. |
None |
|
Specifies the image to use for the NeMo Datastore CLI tools. |
None |
|
When set to |
None |
|
When set to |
None |
|
Specifies URLs to be excluded from tracing. |
None |
|
Specifies the log exporter. Values include |
None |
|
Specifies the metrics exporter. Values include |
None |
|
Specifies the trace exporter. Values include |
None |
|
Specifies the OpenTelemetry Protocol endpoint. |
None |
|
Specifies the log level for OpenTelemetry. Values include |
None |
|
Specifies the number of replicas to have on the cluster. |
None |
|
Specifies the memory and CPU request. |
None |
|
Specifies the memory and CPU limits. |
None |
|
Specifies the scheduler type to use for customization jobs.
Available values are |
None |
|
Specifies the tolerations for the pods. |
None |
|
Specifies a ConfigMap of your training configuration. It’s recommended that you create the ConfigMap with your training configurations before creating a NeMo Customizer. Note that if you adjust your training configurations after deploying NeMo Customizer, the service must be restarted. Refer to the NeMo Customizer configuration documentation for details on setting up your training configuration. |
None |
|
Specifies environment variables passed to training jobs. |
None |
|
Specifies the repository, tag, pull policy, and pull secret for the NeMo Customizer image used for training. You must specify the repository and tag image you are using. |
None |
|
When set to |
|
|
Specifies the PVC name.
This field is required if you specify |
The NeMo Customizer resource name with a |
|
Specifies the size, in Gi, for the PVC to create. This field is required if you specify |
None |
|
Specifies the Kubernetes StorageClass for the PVC. Leave this empty to use your cluster’s default StorageClass. |
None |
|
Specifies the subpath inside the PVC to mount. |
None |
|
Specifies the access mode for the PVC to create. NeMo Customizer requires a volume access mode of ReadWriteMany. |
None |
|
Specifies the network configuration for multi-node training.
Use - name: NCCL_IB_SL
value: "0"
- name: NCCL_IB_TC
value: "41"
- name: NCCL_IB_QPS_PER_CONNECTION
value: "4"
- name: UCX_TLS
value: TCP
- name: UCX_NET_DEVICES
value: eth0
- name: HCOLL_ENABLE_MCAST_ALL
value: "0"
- name: NCCL_IB_GID_INDEX
value: "3"
|
None |
|
Specifies the node selector labels for where to run training jobs. |
None |
|
Specifies the PodAffinity for the training jobs. |
None |
|
Specifies the resources for the training jobs. |
None |
|
Specifies the timeout limit for the training jobs to complete. |
None |
|
Specifies the time to live after the training job finishes in seconds. |
None |
|
Specifies PVC configuration for the NeMo Operator NemoTrainingJob custom resource.
A PVC is automatically created for each job.
Use the |
None |
|
Specifies the path where the workspace PVC is mounted within the training job. |
|
|
Specifies the size, in Gi, for the PVC to create. |
None |
|
Specifies the Kubernetes StorageClass for the PVC. Leave this empty to use your cluster’s default StorageClass. |
None |
|
Specifies the access mode for the PVC to create. NeMo Customizer requires a volume access mode of ReadWriteMany. |
None |
|
Specifies the user ID for the pod.
This value is used to set the security context of the pod in the |
|
|
Specifies the key in the secret that holds the Weights & Biases API key. |
None |
|
Specifies an optional key in the secret used for encrypting Weights & Biases credentials. You can use this for an additional layer of security if required. |
|
|
Specifies the name of the Kubernetes Secret containing the Weights & Biases API key. |
None |