Important

NeMo 2.0 is an experimental feature and currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to NeMo 2.0 overview for information on getting started.

Understand the Configurations

To use the NeMo Launcher effectively, you need to understand how the preloaded configurations are structured and how to modify them.

Overall Hierarchy:

The launcher utilizes hierarchical configurations, with the primary file located at conf/config.yaml. All preloaded configurations, optimized and rigorously tested by NVIDIA, are found in the conf folder, organized by stage. The structure of these configuration files is conf/(stage_name)/(model_type)/(model_name).yaml.
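
The path pattern can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not launcher code: the helper name stage_config_path and the conf_root argument are hypothetical, and the function only assembles a path string without reading any file.

```python
from pathlib import PurePosixPath


def stage_config_path(stage_name, model_type, model_name, conf_root="conf"):
    """Assemble conf/(stage_name)/(model_type)/(model_name).yaml.

    Hypothetical helper for illustration; it does not check that the file exists.
    """
    return PurePosixPath(conf_root) / stage_name / model_type / f"{model_name}.yaml"


# The training entry from the defaults list (gpt3/5b) maps to this file:
print(stage_config_path("training", "gpt3", "5b"))
```

With the values from the defaults list, training: gpt3/5b corresponds to conf/training/gpt3/5b.yaml.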

Pipeline Configurations

The conf/config.yaml file contains default configuration settings for various stages of your pipeline, including data preparation, training, fine-tuning, evaluation, and more. The stages field specifies the stages that will be executed during the pipeline run.

defaults:
  - _self_
  - cluster: bcm  # Leave it as bcm even if using bcp. It will be ignored for bcp.
  - data_preparation: gpt3/download_gpt3_pile
  - training: gpt3/5b
  - conversion: null
  - fine_tuning: null
  - evaluation: gpt3/evaluate_all
  - fw_inference: null
  - export: null
  - override hydra/job_logging: stdout

stages:
  - data_preparation
  - training
  - fw_inference

Customize the Pipeline for Your Needs:

  1. Include or exclude a stage: To include or exclude a stage in the pipeline, add or remove the stage name from the stages list.

  2. Modify stage configuration settings: To modify the configuration settings for a specific stage, navigate to the appropriate folder in the conf directory (e.g., conf/training for training options) and edit the relevant fields.

  3. Use a different configuration file: To use a different configuration file for a stage, update the corresponding field in the defaults section (e.g., change training: gpt3/5b to training: (model_type)/(model_name)).

  4. Update specific model configurations: Modify the YAML files in conf/(stage_name)/(model_type)/(model_name).yaml to update specific stage configurations, such as the number of nodes, precision, and model configurations.
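
The relationship between the defaults section and the stages list can be sketched as follows. This is a simplified model under stated assumptions, not the launcher's actual composition logic: the values are copied from the conf/config.yaml excerpt above into plain Python data, and resolve_stage_configs is a hypothetical helper.

```python
# Values copied from the conf/config.yaml excerpt, written as plain Python data.
defaults = {
    "data_preparation": "gpt3/download_gpt3_pile",
    "training": "gpt3/5b",
    "conversion": None,
    "fine_tuning": None,
    "evaluation": "gpt3/evaluate_all",
    "fw_inference": None,
    "export": None,
}
stages = ["data_preparation", "training", "fw_inference"]


def resolve_stage_configs(defaults, stages, conf_root="conf"):
    """Map each stage in `stages` to its config file path, or None if unset.

    Hypothetical helper: only stages named in `stages` are considered,
    mirroring the rule that the stages list decides what runs.
    """
    resolved = {}
    for stage in stages:
        selection = defaults.get(stage)
        resolved[stage] = f"{conf_root}/{stage}/{selection}.yaml" if selection else None
    return resolved


for stage, path in resolve_stage_configs(defaults, stages).items():
    print(f"{stage}: {path or 'no configuration selected in defaults'}")
```

In this sketch, evaluation has a default configuration but is not resolved because it is absent from stages, while fw_inference is listed in stages but has no configuration selected. A check like this can help catch a stage that was added to stages without a matching entry in defaults.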

Cluster Configurations

The first parameter to set is launcher_scripts_path in the conf/config.yaml file. It must point to the absolute path where the launcher_scripts folder (pulled from the container) is stored in the file system. Additionally, if you are using a Slurm-based cluster, the conf/cluster/bcm.yaml file contains the generic cluster parameters, such as partition and account. Tailor the cluster configuration below to match your cluster setup.

partition: null
account: null
exclusive: True
gpus_per_task: null
gpus_per_node: 8
mem: 0
overcommit: False
job_name_prefix: "nemo-multimodal-"
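
A quick pre-flight check of these settings might look like the sketch below. It is not part of the launcher: find_config_problems is a hypothetical helper, the example paths and account name are placeholders, and treating a null partition or account as something to review is an assumption (some clusters may accept the scheduler defaults).

```python
from pathlib import PurePosixPath


def find_config_problems(launcher_scripts_path, cluster):
    """Return human-readable notes about settings that likely need attention.

    Hypothetical helper. The only rule taken from the documentation is that
    launcher_scripts_path must be an absolute path; the null checks are an
    assumption added for illustration.
    """
    problems = []
    if not launcher_scripts_path or not PurePosixPath(launcher_scripts_path).is_absolute():
        problems.append("launcher_scripts_path must be an absolute path")
    for key in ("partition", "account"):
        if cluster.get(key) is None:
            problems.append(f"cluster.{key} is null; review it for your Slurm cluster")
    return problems


# Placeholder values for illustration only.
cluster = {"partition": None, "account": "my_account", "gpus_per_node": 8}
for note in find_config_problems("launcher_scripts", cluster):
    print(note)
```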

Environment Variables Configurations

To set or add environment variables for pipeline runs, modify or add fields under the env_vars section in the conf/config.yaml file. Any variable set to null is ignored.

env_vars:
  NCCL_TOPO_FILE: null # Should be a path to an XML file describing the topology
  UCX_IB_PCI_RELAXED_ORDERING: null # Needed to improve Azure performance
  ...
  TRANSFORMERS_OFFLINE: 1
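
The rule that null variables are ignored can be illustrated with a short sketch. This is not the launcher's implementation: active_env_vars and export_lines are hypothetical helpers, the topology path is a placeholder, and rendering the result as shell export lines is an assumption made for illustration.

```python
import shlex

# Example env_vars section as plain Python data; None stands for YAML null.
env_vars = {
    "NCCL_TOPO_FILE": "/path/to/topology.xml",  # placeholder path
    "UCX_IB_PCI_RELAXED_ORDERING": None,        # null, so it is ignored
}


def active_env_vars(env_vars):
    """Keep only variables with a value; a value of 0 is kept, None is dropped."""
    return {name: str(value) for name, value in env_vars.items() if value is not None}


def export_lines(env_vars):
    """Render the active variables as shell export statements (an assumed format)."""
    return [
        f"export {name}={shlex.quote(value)}"
        for name, value in active_env_vars(env_vars).items()
    ]


print("\n".join(export_lines(env_vars)))
```

The check uses "is not None" rather than truthiness so that a variable deliberately set to 0 is still passed through.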

NUMA Mapping Configurations

NUMA (non-uniform memory access) mapping is a technique for multiprocessor systems, where memory access time varies depending on which processor accesses the memory. Its goal is to assign memory to processors so that access times are minimized and each processor reaches the memory it needs with minimal delay. This matters for maximizing performance in high-performance computing environments.

NUMA mapping can also be configured in the conf/config.yaml file. The mapping is automatic: the code reads the number of CPU cores available in your cluster and provides the best possible mapping to maximize performance. Mapping is enabled by default; to disable it, set enable: False in the numa_mapping section of conf/config.yaml. The mapping type can be configured in the same section. The full set of parameters is shown below:

numa_mapping:
  enable: True  # Set to False to disable all mapping (performance will suffer).
  mode: unique_contiguous  # One of: all, single, single_unique, unique_interleaved or unique_contiguous.
  scope: node  # Either node or socket.
  cores: all_logical  # Either all_logical or single_logical.
  balanced: True  # Whether to assign an equal number of physical cores to each process.
  min_cores: 1  # Minimum number of physical cores per process.
  max_cores: 8  # Maximum number of physical cores per process. Can be null to use all available cores.
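
Before launching a job, the numa_mapping values can be checked against the options listed in the comments above. The sketch below is a hypothetical validator, not launcher code: the allowed values are taken from those comments, while the numeric rules (min_cores at least 1, max_cores not below min_cores) are assumptions.

```python
# Allowed values as listed in the comments of the numa_mapping section.
ALLOWED = {
    "mode": {"all", "single", "single_unique", "unique_interleaved", "unique_contiguous"},
    "scope": {"node", "socket"},
    "cores": {"all_logical", "single_logical"},
}


def validate_numa_mapping(cfg):
    """Return a list of error messages; an empty list means no issue was found."""
    errors = []
    for key, allowed in ALLOWED.items():
        if cfg.get(key) not in allowed:
            errors.append(f"{key}={cfg.get(key)!r} is not one of {sorted(allowed)}")
    min_cores, max_cores = cfg.get("min_cores"), cfg.get("max_cores")
    if not isinstance(min_cores, int) or min_cores < 1:
        errors.append("min_cores must be a positive integer")  # assumed rule
    elif isinstance(max_cores, int) and max_cores < min_cores:
        errors.append("max_cores must be null or >= min_cores")  # assumed rule
    return errors


# Default values from the section above; None stands for YAML null.
numa_mapping = {
    "enable": True,
    "mode": "unique_contiguous",
    "scope": "node",
    "cores": "all_logical",
    "balanced": True,
    "min_cores": 1,
    "max_cores": 8,
}
print(validate_numa_mapping(numa_mapping) or "numa_mapping looks consistent")
```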