Managing NeMo Customizer#

About NeMo Customizer#

NeMo Customizer is a lightweight API server that runs managed training jobs on GPU nodes with the Volcano scheduler. Using this microservice, you can take LLMs, either from NVIDIA or open source, and customize them to fit your specific use cases. NeMo Customizer lets you provide examples of desired responses to prompts and tailors the model so that future responses match your examples.

For more details on customizing your LLMs, refer to the NeMo Customizer fine-tuning documentation.

Prerequisites#

  • All the common NeMo microservice prerequisites.

  • Minimum system requirements

    • A single-node Kubernetes cluster on a Linux host and cluster-admin level permissions.

    • At least 200 GB of free disk space.

    • At least one dedicated GPU (A100 80 GB or H100 80 GB).

  • A NeMo Data Store and a NeMo Entity Store deployed on your cluster. The NeMo Entity Store and NeMo Data Store work closely together to hold information about the model entities on your cluster.

  • The NeMo Operator deployed on your cluster. This manages several training custom resources that are required to run customization jobs.

Note

You can use the NeMo Dependencies Ansible Playbook to deploy all the following NeMo Customizer microservice dependencies.
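
You can confirm that the NeMo Operator and the other NeMo custom resource definitions are installed before you continue. The following check only filters CRDs by name and does not assume specific CRD names:

  $ kubectl get crds | grep -i nemo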

Storage

  • Access to an external PostgreSQL database to store model customization objects.

  • Access to an NFS-backed Persistent Volume that supports ReadWriteMany access mode to enable fast checkpointing and minimize network traffic.

    The NeMo Customizer microservice creates PVCs to hold data while completing fine-tuning jobs. The NeMo Customizer custom resource configures two persistent volumes that hold training job and model data.

    • spec.trainingConfig.modelPVC: A PVC that holds model data during fine-tuning jobs. You can provide an existing PVC, or have the NIM Operator create the PVC for you (a sketch of a pre-created PVC follows this list). If you delete a NeMo Customizer resource that was created with spec.trainingConfig.modelPVC.create: true, the NIM Operator also deletes the persistent volume (PV) and persistent volume claim (PVC).

    • spec.trainingConfig.workspacePVC: A PVC configuration for the NeMo Operator NemoTrainingJob custom resource. This object defines how the NeMo Operator automatically creates a PVC for each job.
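
    If you pre-create the model PVC instead of letting the NIM Operator create it, a manifest like the following minimal sketch works. The StorageClass name and size are assumptions; substitute an NFS-backed StorageClass that exists on your cluster and a size that fits the models you enable.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: finetuning-ms-models-pvc
      namespace: nemo
    spec:
      accessModes:
        - ReadWriteMany
      # Assumption: an NFS-backed StorageClass named nfs-client exists on the cluster.
      storageClassName: nfs-client
      resources:
        requests:
          storage: 200Gi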

Kubernetes

  • Create the required secrets: a database password secret and a Weights & Biases (W&B) API key secret.

    Create a secret file, such as nemo-customizer-secrets.yaml, with contents like the following example:

    ---
    apiVersion: v1
    stringData:
      password: <ncspassword>
    kind: Secret
    metadata:
      name: <customizer-pg-existing-secret>
      namespace: nemo
    type: Opaque
    ---
    apiVersion: v1
    stringData:
      wandb_api_key: <API-key>
    kind: Secret
    metadata:
      name: <wandb-secret>
      namespace: nemo
    type: Opaque
    

    Apply the secret file.

    $ kubectl apply -n nemo -f nemo-customizer-secrets.yaml
    
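    Optionally, confirm that both secrets exist. Substitute the secret names that you used in your manifest:

    $ kubectl get secret -n nemo <customizer-pg-existing-secret> <wandb-secret>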

Deploying a NeMo Customizer#

Update the <inputs> in the following sample manifests with values for your cluster configuration.

Refer to the Configure NeMo Customizer section for more details on configuration options.

  1. Create a ConfigMap with your training configurations in a file such as nemo-customizer-training-config.yaml, with contents like the following example:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: nemo-training-config
      namespace: nemo
    data:
      training: |
        # Optional additional configuration for training jobs
        container_defaults:
          imagePullPolicy: IfNotPresent
    
  2. Apply the training ConfigMap file.

      $ kubectl apply -n nemo -f nemo-customizer-training-config.yaml
    
  3. Create a ConfigMap with your model configurations in a file, such as nemo-customizer-model-config.yaml, with contents like the following example. Refer to Model Configurations in the NeMo microservices documentation for details on configuring models and customization targets.

    Note

    The default configuration in the sample below lists all the supported models and enables the meta/llama-3.1-8b-instruct and meta/llama-3.2-1b-instruct models for download. Update the configuration to enable the models that you want to use in your customization jobs. Each enabled model is downloaded to a PVC by default, so enabling several models increases the storage requirements and startup time for NeMo Customizer.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: nemo-model-config
      namespace: nemo
    data:
      customizationTargets: |
        overrideExistingTargets: true
        targets:
          meta/llama-3.1-8b-instruct@2.0:
            base_model: meta/llama-3.1-8b-instruct
            enabled: true
            model_path: llama-3_1-8b-instruct_2_0
            model_uri: ngc://nvidia/nemo/llama-3_1-8b-instruct-nemo:2.0
            name: llama-3.1-8b-instruct@2.0
            namespace: meta
            num_parameters: 8000000000
            precision: bf16-mixed
          meta/llama-3.1-70b-instruct@2.0:
            base_model: meta/llama-3.1-70b-instruct
            enabled: false
            model_path: llama-3_1-70b-instruct_2_0
            model_uri: ngc://nvidia/nemo/llama-3_1-70b-instruct-nemo:2.0
            name: llama-3.1-70b-instruct@2.0
            namespace: meta
            num_parameters: 70000000000
            precision: bf16-mixed
          meta/llama-3.2-1b-embedding@0.0.1:
            base_model: meta/llama-3.2-1b-embedding
            enabled: false
            model_path: llama32_1b-embedding
            model_uri: ngc://nvidia/nemo/llama-3_2-1b-embedding-base:0.0.1
            name: llama-3.2-1b-embedding@0.0.1
            namespace: meta
            num_parameters: 1000000000
            precision: bf16-mixed
          meta/llama-3.2-1b-instruct@2.0:
            base_model: meta/llama-3.2-1b-instruct
            enabled: true
            model_path: llama32_1b-instruct_2_0
            model_uri: ngc://nvidia/nemo/llama-3_2-1b-instruct:2.0
            name: llama-3.2-1b-instruct@2.0
            namespace: meta
            num_parameters: 1000000000
            precision: bf16-mixed
          meta/llama-3.2-1b@2.0:
            base_model: meta/llama-3.2-1b
            enabled: false
            model_path: llama32_1b_2_0
            model_uri: ngc://nvidia/nemo/llama-3_2-1b:2.0
            name: llama-3.2-1b@2.0
            namespace: meta
            num_parameters: 1000000000
            precision: bf16-mixed
          meta/llama-3.2-3b-instruct@2.0:
            base_model: meta/llama-3.2-3b-instruct
            enabled: false
            model_path: llama32_3b-instruct_2_0
            model_uri: ngc://nvidia/nemo/llama-3_2-3b-instruct:2.0
            name: llama-3.2-3b-instruct@2.0
            namespace: meta
            num_parameters: 3000000000
            precision: bf16-mixed
          meta/llama-3.3-70b-instruct@2.0:
            base_model: meta/llama-3.3-70b-instruct
            enabled: false
            model_path: llama-3_3-70b-instruct_2_0
            model_uri: ngc://nvidia/nemo/llama-3_3-70b-instruct:2.0
            name: llama-3.3-70b-instruct@2.0
            namespace: meta
            num_parameters: 70000000000
            precision: bf16-mixed
          meta/llama3-70b-instruct@2.0:
            base_model: meta/llama3-70b-instruct
            enabled: false
            model_path: llama-3-70b-bf16_2_0
            model_uri: ngc://nvidia/nemo/llama-3-70b-instruct-nemo:2.0
            name: llama3-70b-instruct@2.0
            namespace: meta
            num_parameters: 70000000000
            precision: bf16-mixed
          microsoft/phi-4@1.0:
            base_model: microsoft/phi-4
            enabled: false
            model_path: phi-4_1_0
            model_uri: ngc://nvidia/nemo/phi-4:1.0
            name: phi-4@1.0
            namespace: microsoft
            num_parameters: 14659507200
            precision: bf16
            version: "1.0"
          nvidia/nemotron-nano-llama-3.1-8b@1.0:
            base_model: nvidia/nemotron-nano-llama-3.1-8b
            enabled: false
            model_path: nemotron-nano-3_1-8b_0_0_1
            model_uri: ngc://nvidia/nemo/nemotron-nano-3_1-8b:0.0.1
            name: nemotron-nano-llama-3.1-8b@1.0
            namespace: nvidia
            num_parameters: 8000000000
            precision: bf16-mixed
          nvidia/nemotron-super-llama-3.3-49b@1.0:
            base_model: nvidia/nemotron-super-llama-3.3-49b
            enabled: false
            model_path: nemotron-super-3_3-49b_v1
            model_uri: ngc://nvidia/nemo/nemotron-super-3_3-49b:v1
            name: nemotron-super-llama-3.3-49b@1.0
            namespace: nvidia
            num_parameters: 49000000000
            precision: bf16-mixed
    
      customizationConfigTemplates: |
        overrideExistingTemplates: true
        templates:
          meta/llama-3.1-8b-instruct@v1.0.0+A100:
            max_seq_length: 4096
            name: llama-3.1-8b-instruct@v1.0.0+A100
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.1-8b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 1
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 8
              num_nodes: 1
              tensor_parallel_size: 4
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 1
              tensor_parallel_size: 4
              training_type: distillation
          meta/llama-3.1-8b-instruct@v1.0.0+L40:
            max_seq_length: 4096
            name: llama-3.1-8b-instruct@v1.0.0+L40
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.1-8b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 2
              tensor_parallel_size: 2
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 2
              tensor_parallel_size: 4
              training_type: sft
          meta/llama-3.1-70b-instruct@v1.0.0+A100:
            max_seq_length: 4096
            name: llama-3.1-70b-instruct@v1.0.0+A100
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.1-70b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 1
              tensor_parallel_size: 4
              training_type: sft
          meta/llama-3.1-70b-instruct@v1.0.0+L40:
            max_seq_length: 4096
            name: llama-3.1-70b-instruct@v1.0.0+L40
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.1-70b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 2
              pipeline_parallel_size: 2
              tensor_parallel_size: 4
              training_type: sft
          meta/llama-3.2-1b-embedding@0.0.1+A100:
            max_seq_length: 2048
            name: llama-3.2-1b-embedding@0.0.1+A100
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.2-1b-embedding@0.0.1
            training_options:
            - finetuning_type: all_weights
              micro_batch_size: 8
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
          meta/llama-3.2-1b-embedding@0.0.1+L40:
            max_seq_length: 2048
            name: llama-3.2-1b-embedding@0.0.1+L40
            namespace: meta
            target: meta/llama-3.2-1b-embedding@0.0.1
            training_options:
            - finetuning_type: all_weights
              micro_batch_size: 4
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
          meta/llama-3.2-1b-instruct@v1.0.0+A100:
            max_seq_length: 4096
            name: llama-3.2-1b-instruct@v1.0.0+A100
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.2-1b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: distillation
          meta/llama-3.2-1b-instruct@v1.0.0+L40:
            max_seq_length: 4096
            name: llama-3.2-1b-instruct@v1.0.0+L40
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.2-1b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
          meta/llama-3.2-1b@v1.0.0+A100:
            max_seq_length: 4096
            name: llama-3.2-1b@v1.0.0+A100
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.2-1b@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: distillation
          meta/llama-3.2-1b@v1.0.0+L40:
            max_seq_length: 4096
            name: llama-3.2-1b@v1.0.0+L40
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.2-1b@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
          meta/llama-3.2-3b-instruct@v1.0.0+A100:
            max_seq_length: 4096
            name: llama-3.2-3b-instruct@v1.0.0+A100
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.2-3b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 2
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: distillation
          meta/llama-3.2-3b-instruct@v1.0.0+L40:
            max_seq_length: 4096
            name: llama-3.2-3b-instruct@v1.0.0+L40
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.2-3b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
          meta/llama-3.3-70b-instruct@v1.0.0+A100:
            max_seq_length: 4096
            name: llama-3.3-70b-instruct@v1.0.0+A100
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.3-70b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 1
              tensor_parallel_size: 4
              training_type: sft
          meta/llama-3.3-70b-instruct@v1.0.0+L40:
            max_seq_length: 4096
            name: llama-3.3-70b-instruct@v1.0.0+L40
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama-3.3-70b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 2
              pipeline_parallel_size: 2
              tensor_parallel_size: 4
              training_type: sft
          meta/llama3-70b-instruct@v1.0.0+A100:
            max_seq_length: 4096
            name: llama3-70b-instruct@v1.0.0+A100
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama3-70b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 1
              tensor_parallel_size: 4
              training_type: sft
          meta/llama3-70b-instruct@v1.0.0+L40:
            max_seq_length: 4096
            name: llama3-70b-instruct@v1.0.0+L40
            namespace: meta
            prompt_template: '{prompt} {completion}'
            target: meta/llama3-70b-instruct@2.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 2
              pipeline_parallel_size: 2
              tensor_parallel_size: 4
              training_type: sft
          microsoft/phi-4@v1.0.0+A100:
            max_seq_length: 4096
            name: phi-4@v1.0.0+A100
            namespace: microsoft
            prompt_template: '{prompt} {completion}'
            target: microsoft/phi-4@1.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              training_type: sft
          microsoft/phi-4@v1.0.0+L40:
            max_seq_length: 4096
            name: phi-4@v1.0.0+L40
            namespace: microsoft
            prompt_template: '{prompt} {completion}'
            target: microsoft/phi-4@1.0
            training_options:
            - data_parallel_size: 2
              finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 2
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
          nvidia/nemotron-nano-llama-3.1-8b@v1.0.0+A100:
            max_seq_length: 4096
            name: nemotron-nano-llama-3.1-8b@v1.0.0+A100
            namespace: nvidia
            prompt_template: '{prompt} {completion}'
            target: nvidia/nemotron-nano-llama-3.1-8b@1.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 1
              num_nodes: 1
              tensor_parallel_size: 1
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 8
              num_nodes: 1
              tensor_parallel_size: 4
              training_type: sft
          nvidia/nemotron-nano-llama-3.1-8b@v1.0.0+L40:
            max_seq_length: 4096
            name: nemotron-nano-llama-3.1-8b@v1.0.0+L40
            namespace: nvidia
            prompt_template: '{prompt} {completion}'
            target: nvidia/nemotron-nano-llama-3.1-8b@1.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 2
              num_nodes: 1
              tensor_parallel_size: 2
              training_type: sft
            - finetuning_type: all_weights
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 2
              pipeline_parallel_size: 2
              tensor_parallel_size: 4
              training_type: sft
          nvidia/nemotron-super-llama-3.3-49b@v1.0.0+A100:
            max_seq_length: 4096
            name: nemotron-super-llama-3.3-49b@v1.0.0+A100
            namespace: nvidia
            prompt_template: '{prompt} {completion}'
            target: nvidia/nemotron-super-llama-3.3-49b@1.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 1
              tensor_parallel_size: 4
              training_type: sft
          nvidia/nemotron-super-llama-3.3-49b@v1.0.0+L40:
            max_seq_length: 4096
            name: nemotron-super-llama-3.3-49b@v1.0.0+L40
            namespace: nvidia
            prompt_template: '{prompt} {completion}'
            target: nvidia/nemotron-super-llama-3.3-49b@1.0
            training_options:
            - finetuning_type: lora
              micro_batch_size: 1
              num_gpus: 4
              num_nodes: 2
              pipeline_parallel_size: 2
              tensor_parallel_size: 4
              training_type: sft
    
  4. Apply the ConfigMap file.

    $ kubectl apply -n nemo -f nemo-customizer-model-config.yaml
    
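    Optionally, confirm that both ConfigMaps are present:

    $ kubectl get configmap -n nemo nemo-training-config nemo-model-config
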
  5. Create a file, such as nemo-customizer.yaml, with contents like the following example:

    apiVersion: apps.nvidia.com/v1alpha1
    kind: NemoCustomizer
    metadata:
      name: nemocustomizer-sample
      namespace: nemo
    spec:
      # Scheduler configuration for training jobs (volcano (default))
      scheduler:
        type: "<volcano>"
      # Weights & Biases configuration for experiment tracking
      wandb:
        secretName: <wandb-secret>       # Kubernetes secret that stores WANDB_API_KEY and optionally encryption key
        apiKeyKey: <apiKey>                 # Key in the secret that holds the W&B API key
        encryptionKey: <encryptionKey>   # Key in the secret that holds optional encryption key
      # OpenTelemetry tracing configuration
      otel:
        enabled: true
        exporterOtlpEndpoint: http://<customizer-otel-opentelemetry-collector>.<nemo>.svc.cluster.local:4317
      # PostgreSQL database connection configuration
      databaseConfig:
        credentials:
          user: <ncsuser>                        # Database username
          secretName: <customizer-pg-existing-secret>  # Secret containing password
          passwordKey: <password>               # Key inside secret that contains the password
        host: <customizer-pg-postgresql>.<nemo>.svc.cluster.local
        port: 5432
        databaseName: <ncsdb>
      # Customizer API service exposure settings
      expose:
        service:
          type: ClusterIP
          port: 8000
      # Global image pull settings used in various subcomponents
      image:
        repository: nvcr.io/nvidia/nemo-microservices/customizer-api
        tag: "25.06"
        pullPolicy: IfNotPresent
        pullSecrets:
          - ngc-secret
      # URL to the NeMo Entity Store microservice
      entitystore:
        endpoint: http://<nemoentitystore-sample>.<nemo>.svc.cluster.local:8000
      # URL to the NeMo Data Store microservice
      datastore:
        endpoint: http://<nemodatastore-sample>.<nemo>.svc.cluster.local:8000
      # URL for MLflow tracking server
      mlflow: 
        endpoint: http://<mlflow-tracking>.<nemo>.svc.cluster.local:80
      # Configuration for the data store CLI tools
      nemoDatastoreTools:
        image: nvcr.io/nvidia/nemo-microservices/nds-v2-huggingface-cli:25.06
      # Configuration for model download jobs
      modelDownloadJobs:
        image: "nvcr.io/nvidia/nemo-microservices/customizer-api:25.06"
        ngcAPISecret:
          # Secret that stores NGC API key
          name: ngc-api-secret
          # Key inside secret         
          key: "NGC_API_KEY"                 
        securityContext:
          fsGroup: 1000
          runAsNonRoot: true
          runAsUser: 1000
          runAsGroup: 1000
         # Time (in seconds) to retain job after completion
        ttlSecondsAfterFinished: 600   
        # Polling frequency to check job status     
        pollIntervalSeconds: 15              
      # Name to the ConfigMap containing model definitions
      modelConfig:
        name: <nemo-model-config>
      # Training configuration
      trainingConfig:
        configMap:
          # Optional: Additional configuration to merge into training config
          name: <nemo-training-config>         
        # PVC where model artifacts are cached or used during training
        modelPVC:
          create: true
          name: <finetuning-ms-models-pvc>
          # StorageClass for the PVC (can be empty to use default)
          storageClass: ""
          volumeAccessMode: ReadWriteMany
          size: 50Gi
        # Workspace PVC automatically created per job
        workspacePVC:
          storageClass: ""
          volumeAccessMode: ReadWriteMany
          size: 10Gi
          # Mount path for workspace inside container
          mountPath: /pvc/workspace          
        image:
          repository: nvcr.io/nvidia/nemo-microservices/customizer
          tag: "25.06"
        env:
          - name: LOG_LEVEL
            value: INFO                    
        # Multi-node networking environment variables for training (CSPs)
        networkConfig:
          - name: NCCL_IB_SL
            value: "0"
          - name: NCCL_IB_TC
            value: "41"
          - name: NCCL_IB_QPS_PER_CONNECTION
            value: "4"
          - name: UCX_TLS
            value: TCP
          - name: UCX_NET_DEVICES
            value: eth0
          - name: HCOLL_ENABLE_MCAST_ALL
            value: "0"
          - name: NCCL_IB_GID_INDEX
            value: "3"
        # TTL for training job after it completes
        ttlSecondsAfterFinished: 3600       
        # Timeout duration (in seconds) for training job
        timeout: 3600                       
        # Node tolerations
        tolerations:
          - key: "nvidia.com/gpu"
            operator: "Exists"
            effect: "NoSchedule"
    
  6. Apply the manifest:

    $ kubectl apply -n nemo -f nemo-customizer.yaml
    

    Note

    The NeMo Customizer image is large and can take a few minutes to download from the registry.
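
    You can watch the namespace while the controller pulls the image and runs model download jobs for any enabled models:

    $ kubectl get pods,jobs -n nemo -w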

Verify NeMo Customizer#

  1. View NeMo Customizer status:

    $ kubectl get nemocustomizer.apps.nvidia.com -n nemo
    

    Partial Output

     NAME                    STATUS   AGE
     nemocustomizer-sample   Ready    7s
    
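    Optionally, block until the resource reports the Ready condition shown in the next step. The timeout value below is an arbitrary example:

    $ kubectl wait --for=condition=Ready nemocustomizer.apps.nvidia.com/nemocustomizer-sample -n nemo --timeout=30m
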
  2. View information about the NeMo Customizer:

    $ kubectl describe nemocustomizer.apps.nvidia.com  nemocustomizer-sample -n nemo
    

    Partial Output

    ...
    Status:
     Conditions:
       Last Transition Time:  2025-04-24T17:40:04Z
       Message:               deployment "nemocustomizer-sample" successfully rolled out
    
       Reason:                Ready
       Status:                True
       Type:                  Ready
       Last Transition Time:  2025-04-24T17:39:34Z
       Message:
       Reason:                Ready
       Status:                False
       Type:                  Failed
     State:                   Ready
    

Check NeMo Customizer Service is Reachable#

Once you have NeMo Customizer deployed on your cluster, use the following steps to verify that the service is up and running.

  1. Start a pod that has access to the curl command. Substitute any pod that has this command and meets your organization’s security requirements.

    $ kubectl run --rm -it -n default curl --image=curlimages/curl:latest -- ash
    

    After the pod starts, you are connected to the ash shell in the pod.

  2. Connect to the NeMo Customizer service:

    $ curl -X GET "http://nemocustomizer-sample.nemo:8000/v1/customization/configs"
    
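    If you only want the HTTP status code, a variant like the following prints it; a healthy service typically returns 200:

    $ curl -s -o /dev/null -w "%{http_code}\n" "http://nemocustomizer-sample.nemo:8000/v1/customization/configs"
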
  3. Press Ctrl+D to exit and delete the pod.

Configure NeMo Customizer#

The following table shows more information about the commonly modified fields for the NeMo Customizer custom resource.

Field

Description

Default Value

spec.annotations

Specifies to add the user-supplied annotations to the pod.

None

spec.databaseConfig (required)

Specifies the external PostgreSQL configuration details.

None

spec.databaseConfig.credentials.passwordKey

Specifies the password key used in the database credentials secret.

password

spec.databaseConfig.credentials.secretName (required)

Specifies the secret name for the database credentials.

None

spec.databaseConfig.credentials.user (required)

Specifies the user for the database.

None

spec.databaseConfig.databaseName (required)

Specifies the name for the database.

None

spec.databaseConfig.host (required)

Specifies the endpoint for the database.

None

spec.databaseConfig.port

Specifies the port for the database.

5432

spec.datastore.endpoint (required)

Specifies the endpoint for the NeMo Data Store to use for customization jobs.

none

spec.entitystore.endpoint (required)

Specifies the endpoint for the NeMo Entity Store to use for customization jobs.

none

spec.expose

Specifies attributes to expose a service for this NeMo microservice. Use an expose object to specify Kubernetes Ingress and Service information.

None

spec.expose.ingress.enabled

When set to true, the Operator creates an Ingress resource for the NeMo Customizer. Specify the ingress specification in the spec.expose.ingress.spec field.

If you have an ingress controller, values like the following sample configure an ingress for the / endpoint.

ingress:
  enabled: true
  spec:
    ingressClassName: nginx
    host: nemo-customizer.example.com
    paths:
      - path: /
        pathType: Prefix

false

spec.expose.service.port

Specifies the network port number for the NeMo Customizer microservice.

8000

spec.expose.service.type

Specifies the Kubernetes service type to create for the NeMo Customizer microservice.

ClusterIP

spec.groupID

Specifies the group for the pods. This value is used to set the security context of the pod in the runAsGroup and fsGroup fields.

2000

spec.image (required)

Specifies repository, tag, pull policy, and pull secret for the container image. You must specify the repository and tag for the NeMo microservice image you are using.

None

spec.labels

Specifies the user-supplied labels to add to the pod.

None

spec.metrics.enabled

When set to true, the Operator configures a Prometheus service monitor for the service. Specify the service monitor specification in the spec.metrics.serviceMonitor field. Refer to the Observability page for more details.
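
If you run a Prometheus Operator in your cluster, values like the following sketch enable a service monitor. The serviceMonitor fields shown here (interval and additionalLabels) are assumptions; refer to the Observability page for the supported fields.

metrics:
  enabled: true
  serviceMonitor:
    interval: 30s
    additionalLabels:
      release: prometheus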

false

spec.mlflow (required)

Specifies the MLflow tracking endpoint deployed on your cluster.

None

spec.modelConfig.name (required)

Specifies the name of the ConfigMap containing your model definitions.

None

spec.modelDownloadJobs.hfSecret.key (required)

Specifies the key in the secret that contains your HuggingFace Hub API token. Required if you include the spec.modelDownloadJobs.hfSecret object.

None

spec.modelDownloadJobs.hfSecret.name (required)

Specifies the name of the secret that contains your HuggingFace Hub API token. Required if you include the spec.modelDownloadJobs.hfSecret object.

None

spec.modelDownloadJobs.image (required)

Specifies the image to use for model downloader jobs.

None

spec.modelDownloadJobs.imagePullPolicy

Specifies the image pull policy to use for the model downloader image.

None

spec.modelDownloadJobs.ngcAPISecret.key

Specifies the name of the key in your NGC secret that contains your NGC API Key. Refer to the Image Pull Secrets page for more details on creating this secret. Required if you include the spec.modelDownloadJobs.ngcAPISecret object.

None

spec.modelDownloadJobs.ngcAPISecret.name

Specifies the secret name with your NGC API Key. Refer to the Image Pull Secrets page for more details on creating this secret. Required if you include the spec.modelDownloadJobs.ngcAPISecret object.

None

spec.modelDownloadJobs.pollIntervalSeconds (required)

Specifies the polling interval for model download status.

None

spec.modelDownloadJobs.securityContext

Specifies the Kubernetes security context for the model downloader.

None

spec.modelDownloadJobs.ttlSecondsAfterFinished (required)

Specifies the time to live, in seconds, after the model downloader job finishes.

None

spec.nemoDatastoreTools.image (required)

Specifies the image to use for the NeMo Datastore CLI tools.

None

spec.otel.disableLogging

When set to true, Python logging auto-instrumentation is disabled.

None

spec.otel.excludeUrls

Specifies URLs to be excluded from tracing.

None

spec.otel.exporterConfig.logsExporter

Specifies the log exporter. Values include otlp, console, none.

None

spec.otel.exporterConfig.metricsExporter

Specifies the metrics exporter. Values include otlp, console, none.

None

spec.otel.exporterConfig.traceExporter

Specifies the trace exporter. Values include otlp, console, none.

None

spec.otel.exporterOtlpEndpoint

Specifies the OpenTelemetry Protocol endpoint.

None

spec.otel.logLevel

Specifies the log level for OpenTelemetry. Values include INFO and DEBUG.

None

spec.replicas

Specifies the number of replicas to have on the cluster.

None

spec.resources.requests

Specifies the memory and CPU request.

None

spec.resources.limits

Specifies the memory and CPU limits.
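
Requests and limits use the standard Kubernetes resource format. The sizes below are placeholder examples, not sizing recommendations:

resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "4"
    memory: 16Gi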

None

spec.scheduler

Specifies the scheduler type to use for customization jobs. The only available value is volcano.

None

spec.tolerations

Specifies the tolerations for the pods.
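
Tolerations use the standard Kubernetes format. For example, to tolerate the GPU taint shown in the sample manifest earlier on this page:

tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"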

None

spec.trainingConfig.configMap.name (required)

Specifies a ConfigMap with your training configuration. It's recommended that you create the ConfigMap with your training configurations before you create a NeMo Customizer. Note that if you adjust your training configurations after deploying NeMo Customizer, you must restart the service. Refer to the NeMo Customizer configuration documentation for details on setting up your training configuration.

None

spec.trainingConfig.env

Specifies environment variables passed to training jobs.
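
Use name and value pairs, as in the sample manifest earlier on this page:

env:
  - name: LOG_LEVEL
    value: INFO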

None

spec.trainingConfig.image (required)

Specifies the repository, tag, pull policy, and pull secret for the NeMo Customizer image used for training. You must specify the repository and tag image you are using.

None

spec.trainingConfig.modelPVC.create (required)

When set to true, the Operator creates the PVC where model artifacts are cached or used during training. If you delete a NeMo Customizer resource and this field was set to true, the Operator also deletes the PVC and the cached models.

false

spec.trainingConfig.modelPVC.name (required)

Specifies the PVC name. This field is required if you specify create: false.

The NeMo Customizer resource name with a -pvc suffix.

spec.trainingConfig.modelPVC.size (required)

Specifies the size, in Gi, for the PVC to create. This field is required if you specify create: true.

None

spec.trainingConfig.modelPVC.storageClass (required)

Specifies the Kubernetes StorageClass for the PVC. Leave this empty to use your cluster’s default StorageClass.

None

spec.trainingConfig.modelPVC.subPath

Specifies the subpath inside the PVC to mount.

None

spec.trainingConfig.modelPVC.volumeAccessMode (required)

Specifies the access mode for the PVC to create. NeMo Customizer requires a volume access mode of ReadWriteMany.

None

spec.trainingConfig.networkConfig

Specifies the network configuration for multi-node training. Use name and value pairs to define your network. For example,

  - name: NCCL_IB_SL
    value: "0"
  - name: NCCL_IB_TC
    value: "41"
  - name: NCCL_IB_QPS_PER_CONNECTION
    value: "4"
  - name: UCX_TLS
    value: TCP
  - name: UCX_NET_DEVICES
    value: eth0
  - name: HCOLL_ENABLE_MCAST_ALL
    value: "0"
  - name: NCCL_IB_GID_INDEX
    value: "3"

None

spec.trainingConfig.nodeSelector

Specifies the node selector labels for where to run training jobs.

None

spec.trainingConfig.podAffinity

Specifies the PodAffinity for the training jobs.

None

spec.trainingConfig.resources

Specifies the resources for the training jobs.

None

spec.trainingConfig.sharedMemorySizeLimit

Specifies the max size of the shared memory volume (emptyDir) used by training jobs for fast model runtime read and write operations. If not specified, the NIM Operator will create an emptyDir with no resource limit.

None

spec.trainingConfig.timeout

Specifies the timeout, in seconds, for training jobs to complete.

None

spec.trainingConfig.ttlSecondsAfterFinished

Specifies the time to live, in seconds, after the training job finishes.

None

spec.trainingConfig.workspacePVC (required)

Specifies PVC configuration for the NeMo Operator NemoTrainingJob custom resource. A PVC is automatically created for each job. Use the workspacePVC object to define how to deploy these PVCs.
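
For example, the workspace PVC settings from the sample manifest earlier on this page:

workspacePVC:
  storageClass: ""
  volumeAccessMode: ReadWriteMany
  size: 10Gi
  mountPath: /pvc/workspace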

None

spec.trainingConfig.workspacePVC.mountPath

Specifies the path where the workspace PVC is mounted within the training job.

/pvc/workspace

spec.trainingConfig.workspacePVC.size (required)

Specifies the size, in Gi, for the PVC to create.

None

spec.trainingConfig.workspacePVC.storageClass (required)

Specifies the Kubernetes StorageClass for the PVC. Leave this empty to use your cluster’s default StorageClass.

None

spec.trainingConfig.workspacePVC.volumeAccessMode (required)

Specifies the access mode for the PVC to create. NeMo Customizer requires a volume access mode of ReadWriteMany.

None

spec.userID

Specifies the user ID for the pod. This value is used to set the security context of the pod in the runAsUser field.

1000

spec.wandbSecret.apiKeyKey (required)

Specifies the key in the secret that holds the Weights & Biases API key.

None

spec.wandbSecret.encryptionKey

Specifies an optional key in the secret used for encrypting Weights & Biases credentials. This can be used as an additional security layer if required.

encryptionKey

spec.wandbSecret.name (required)

Specifies the name of the Kubernetes Secret containing the Weights & Biases API key.

None