Model Configurations#

The models available for training with NeMo Customizer are configured through the Helm chart. Refer to the Model Catalog for the list of supported models.

Configure Models#

To make specific models available in your deployment, you’ll need to enable them in your Helm chart’s values. While the values.yaml file includes default configurations for all supported models, you can choose which ones to activate.

For example, to enable just two models - meta/llama-3.1-8b-instruct and mistralai/mistral-7b-instruct-v0.3 - you would add the following configuration to your values file. This will use their default settings while keeping all other models hidden from the API:

customizerConfig:
  models:
    meta/llama-3.1-8b-instruct:
      enabled: true
    mistralai/mistral-7b-instruct-v0.3:
      enabled: true
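
The enabled flag works in both directions. The following sketch (reusing the two model names from the example above) exposes one model and explicitly hides the other; any model you do not mention keeps the default settings from the chart's values.yaml:

customizerConfig:
  models:
    meta/llama-3.1-8b-instruct:
      enabled: true                        # exposed for training through the API
    mistralai/mistral-7b-instruct-v0.3:
      enabled: false                       # explicitly hidden from the API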

Configure Training Methods#

Each model can be configured with specific training methods and resource allocations. You can customize these settings by configuring the training_options in your values file.

Here’s an example that configures a model with both a PEFT method (LoRA) and full SFT (all-weights):

customizerConfig:
  models:
    meta/llama-3.1-8b-instruct:
      enabled: true
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 1
        - training_type: sft
          finetuning_type: all-weights
          num_gpus: 8
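
If you only want to offer parameter-efficient fine-tuning for a model, list just the LoRA entry. The following sketch assumes the chart exposes exactly the training_options you configure, so omitting the all-weights entry leaves full SFT unavailable for that model:

customizerConfig:
  models:
    meta/llama-3.1-8b-instruct:
      enabled: true
      training_options:
        # Only LoRA is listed; under the assumption above, all-weights SFT
        # is not offered for this model.
        - training_type: sft
          finetuning_type: lora
          num_gpus: 1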

Configure Resources#

The Helm chart ships with default GPU configurations that have been tested for PEFT training on typical dataset sizes. The defaults assume 8 GPUs per node, but you can adjust these settings to match your cluster’s capabilities and your specific needs. For example, if you need more computational power, you can scale up the GPU allocation. Here’s how to increase the resources for LoRA training from the default of 4 GPUs to 16 GPUs, using 8 GPUs on each of 2 nodes:

customizerConfig:
  models:
    meta/llama-3.1-8b-instruct:
      enabled: true
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 8
          num_nodes: 2

If your cluster has more resources, you can also increase training parallelism by raising the number of GPUs.

A common reason to do so is that training jobs fail with an out-of-memory error, such as the following capture from a job log:

[rank12]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 4 has a total capacity of 79.25 GiB of which 134.75 MiB is free. Process 1538798 has 79.10 GiB memory in use. Of the allocated memory 76.27 GiB is allocated by PyTorch, and 245.30 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
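
If you hit errors like this, two levers from this page can help: spreading the job across more GPUs and nodes, and lowering the per-model micro_batch_size described in the table below. The numbers in the following sketch are illustrative; the right values depend on the model and your cluster:

customizerConfig:
  models:
    meta/llama-3.1-8b-instruct:
      micro_batch_size: 1                  # illustrative: smaller micro batches reduce per-GPU memory use
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 8                      # illustrative: spread the job across more GPUs...
          num_nodes: 2                     # ...and more nodes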

Configuration Options#

Configuration Parameters#

run_local_jobs
  Set to true to run Customizer in standalone mode.

entity_store_url
  URL of the Entity Store microservice for managing dataset and model entities.

nemo_data_store_url
  URL of the NeMo Data Store that Customizer connects to for dataset and model files.

models
  Map of models to expose for training with the NeMo Customizer microservice.

models.<name>
  Model name used as the map key. Must match the model name in NeMo Data Store and NVIDIA Inference Microservices (NIM) for Large Language Models (LLMs).

models.<name>.enabled
  Set to true to expose the model for training.

models.<name>.model_uri
  URI to download the model from. Formats:
  • NGC: ngc://org/[team]/model-name[:version]
  • NeMo Data Store: hf://namespace/model-name[@version]

models.<name>.model_path
  Directory for the model download. Can be an absolute path or a path relative to MODELS_CACHE_PATH or the PVC mount.

models.<name>.micro_batch_size
  Micro batch size for training. Larger sizes improve efficiency but risk out-of-memory errors. Use 1 for local mode.

models.<name>.training_options
  List of training configurations for the model.

models.<name>.training_options.training_type
  Training objective. Currently supports Supervised Fine-Tuning (sft).

models.<name>.training_options.finetuning_type
  Fine-tuning method. The examples in this section use lora and all-weights.

models.<name>.training_options.num_gpus
  Number of GPUs per node to use for training. Must not exceed the number of GPUs available on a node.

models.<name>.training_options.num_nodes
  Number of nodes to distribute training across, as shown in the Configure Resources example above.
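
Putting several of these parameters together, a values override might look like the following sketch. The URLs, NGC organization and team, version tag, and path are placeholders, and the placement of run_local_jobs, entity_store_url, and nemo_data_store_url under customizerConfig mirrors the parameter names above; check the values.yaml shipped with your chart version for the authoritative layout.

customizerConfig:
  run_local_jobs: false                                  # set to true only for standalone mode
  entity_store_url: http://nemo-entity-store:8000        # placeholder URL
  nemo_data_store_url: http://nemo-data-store:3000       # placeholder URL
  models:
    meta/llama-3.1-8b-instruct:
      enabled: true
      model_uri: ngc://org-name/team-name/llama-3_1-8b-instruct:1.0   # placeholder NGC URI
      model_path: llama-3_1-8b-instruct                               # relative to MODELS_CACHE_PATH
      micro_batch_size: 1
      training_options:
        - training_type: sft
          finetuning_type: lora
          num_gpus: 1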