Common Configuration Files
This section describes the NeMo configuration file setup that is specific to models in the MM Text2Img collection. For general information about how to set up and run experiments that is common to all NeMo models (e.g. Experiment Manager and PyTorch Lightning trainer parameters), see the Core Documentation section.
The model section of the NeMo Multimodal Text2Img configuration files generally requires information about the dataset(s) being used, the text and image encoder, parameters for any augmentation being performed, as well as the model architecture specification. The sections on this page cover each of these in more detail.
Example configuration files for all of the NeMo Multimodal Text2Img scripts can be found in the config directory of the examples.
Dataset Configuration
Training, validation, and test parameters are specified using the train, validation, and test sections of the configuration file, respectively. Depending on the task, there may be arguments specifying the augmentations for the dataset, a resolution filter for filtering out images, and so on.
Any initialization parameter that is accepted for the Dataset class used in the experiment can be set in the config file. Refer to the Datasets section of the API for a list of Datasets and their respective parameters.
An example Text2Img train configuration should look similar to the following:
model:
  data:
    num_workers: 16 # Number of workers for the dataloader process
    train:
      dataset_path: # List of wdinfo files for the datasets to train on
        - dataset1.pkl
        - dataset2.pkl
      augmentations:
        resize_smallest_side: 64 # Resize the smallest side of the image to the specified resolution
        center_crop_h_w: 64, 64 # Center cropping
        horizontal_flip: False # Whether to perform a horizontal flip
      filterings:
        resolution:
          method: larger
          value: 64
    webdataset:
      use_webdataset: True
      infinite_sampler: False
      local_root_path: ??? # Path that stores the dataset
      verbose: False # Whether to print detailed debugging information
Currently, our diffusion-based Text2Img models do not require validation steps, so they are skipped for faster convergence.
As discussed in Datasets, storing the training dataset in the webdataset format is a requirement for all Text2Img training pipelines. Setting webdataset.infinite_sampler=True is the preferred way to train, especially when the dataset is large, as suggested by the Webdataset Multinode Training Guideline.
Enabling train.filterings allows you to filter out images (and their corresponding text pairs) based on common criteria (e.g., a minimum resolution) without having to create a redundant subset of the webdataset on disk prior to training. The example above shows how to filter the dataset so that only images with a resolution larger than 64x64 are used for training. Concatenating multiple webdatasets is as simple as listing all of their wdinfo files in train.dataset_path.
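For illustration, the dataset section can also be adjusted programmatically before launching a run. The following is a minimal sketch using OmegaConf; the config file name and the extra wdinfo file are hypothetical and not part of the NeMo examples:

from omegaconf import OmegaConf

# Load a Text2Img config (hypothetical file name).
cfg = OmegaConf.load("text2img_config.yaml")

# Concatenate several webdatasets by listing all of their wdinfo files.
cfg.model.data.train.dataset_path = ["dataset1.pkl", "dataset2.pkl", "dataset3.pkl"]

# Keep only images whose resolution is larger than 256x256.
cfg.model.data.train.filterings.resolution.method = "larger"
cfg.model.data.train.filterings.resolution.value = 256

# Prefer the infinite sampler for large webdatasets.
cfg.model.data.webdataset.infinite_sampler = True

print(OmegaConf.to_yaml(cfg.model.data))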
Trainer Configuration
The trainer configuration specifies the arguments for the PyTorch Lightning Trainer object.
trainer:
  devices: 1 # Number of GPUs (0 for CPU), or a list of the GPUs to use, e.g. [0, 1]
  num_nodes: 1
  max_epochs: -1
  max_steps: 2500000 # Takes precedence over max_epochs
  logger: False # Provided by exp_manager
  precision: bf16 # Should be set to 16 for O1 and O2 to enable AMP
  accelerator: gpu
  log_every_n_steps: 5 # Logging interval
  resume_from_checkpoint: null # Path to a checkpoint file to continue training from; restores the whole state, including the epoch, step, LR schedulers, apex, etc.
  num_sanity_val_steps: 10 # Number of validation steps to run as a sanity check of the validation process before training starts; setting this to 0 disables it
  enable_checkpointing: False # Provided by exp_manager
  accumulate_grad_batches: 1 # Do not modify; gradient accumulation is automatic when training Megatron models
  gradient_clip_val: 1.0
  benchmark: False
  enable_model_summary: True
Refer to the PyTorch Lightning Trainer API section for all possible arguments.
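As a minimal sketch (not NeMo's actual training script, and with a hypothetical config file name), the trainer section above maps onto PyTorch Lightning Trainer arguments roughly as follows; in practice NeMo constructs the trainer for you:

import pytorch_lightning as pl
from omegaconf import OmegaConf

cfg = OmegaConf.load("text2img_config.yaml")  # hypothetical file name

trainer = pl.Trainer(
    devices=cfg.trainer.devices,
    num_nodes=cfg.trainer.num_nodes,
    max_epochs=cfg.trainer.max_epochs,
    max_steps=cfg.trainer.max_steps,
    precision=cfg.trainer.precision,
    accelerator=cfg.trainer.accelerator,
    log_every_n_steps=cfg.trainer.log_every_n_steps,
    num_sanity_val_steps=cfg.trainer.num_sanity_val_steps,
    # Both are False in the config; NeMo's exp_manager attaches its own loggers
    # and checkpoint callback instead.
    logger=cfg.trainer.logger,
    enable_checkpointing=cfg.trainer.enable_checkpointing,
    accumulate_grad_batches=cfg.trainer.accumulate_grad_batches,
    gradient_clip_val=cfg.trainer.gradient_clip_val,
    benchmark=cfg.trainer.benchmark,
    enable_model_summary=cfg.trainer.enable_model_summary,
)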
Experiment Manager Configurations
The NeMo Experiment Manager provides a convenient way to configure logging, saving, resuming options, and more.
exp_manager:
  exp_dir: null # Directory for your experiment; if None, defaults to "./nemo_experiments"
  name: ${name}
  create_wandb_logger: True # Whether you want exp_manager to create a Weights & Biases logger
  wandb_logger_kwargs: # Keyword arguments passed to the Weights & Biases logger
    name: training-session
    project: text2img
    group: nemo
    resume: True
  create_tensorboard_logger: True # Whether you want exp_manager to create a TensorBoard logger
  create_checkpoint_callback: True # Whether you want exp_manager to create a model checkpoint callback
  checkpoint_callback_params:
    monitor: reduced_train_loss
    save_top_k: 5
    every_n_epochs: 0 # Checkpoint saving frequency in epochs
    every_n_train_steps: 1000 # Mutually exclusive with every_n_epochs; recommended when training on a large-scale dataset
    filename: '${name}--{reduced_train_loss:.2f}-{step}-{consumed_samples}'
  resume_if_exists: True
  resume_ignore_no_checkpoint: True
  resume_from_checkpoint: ${model.resume_from_checkpoint}
  ema:
    enable: True
    decay: 0.9999
    validate_original_weights: False
    every_n_steps: 1
    cpu_offload: False
The EMA (exponential moving average) feature can be enabled by setting exp_manager.ema.enable=True.
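Conceptually, the decay value controls how quickly the EMA copy of the weights tracks the training weights. The following is an illustrative sketch of the update rule, not NeMo's implementation:

import torch

def ema_update(ema_params, model_params, decay=0.9999):
    """In-place EMA update: ema = decay * ema + (1 - decay) * current weights."""
    with torch.no_grad():
        for ema_p, p in zip(ema_params, model_params):
            ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

With decay: 0.9999, each EMA weight retains 99.99% of its previous value per step, so the averaged weights change slowly; EMA weights are commonly preferred for sampling with diffusion models.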
Optimizer Configurations
optim:
  name: fused_adam
  lr: 0.0001
  eps: 1e-8
  betas: [ 0.9, 0.999 ]
  weight_decay: 0.01
  sched:
    name: WarmupPolicy
    warmup_steps: 10000
    warmup_ratio: null
By default, we use fused_adam as the optimizer; refer to the NeMo user guide for all supported optimizers. The learning rate scheduler can be specified in the optim.sched section.
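As a rough illustration (this is not NeMo's WarmupPolicy implementation), a warmup schedule like the one configured above ramps the learning rate linearly from zero to the base lr over warmup_steps and then holds it:

def warmup_lr(step, base_lr=1e-4, warmup_steps=10_000):
    """Linearly ramp the learning rate up to base_lr during warmup, then hold it."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr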
Model Architecture Configurations
Each configuration file should describe the model architecture being used for the experiment.
The following table lists the parameters in the model section that are shared among most of the MM Text2Img models:
| Parameter | Datatype | Description |
|---|---|---|
| micro_batch_size | int | Micro batch size that fits on each GPU |
| global_batch_size | int | Global batch size that takes gradient accumulation and data parallelism into account |
| inductor | bool | Enable TorchInductor optimization |
| channels_last | bool | Enable NHWC (channels-last) training format |
| seed | int | Seed used in training |
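As a sanity check for these values, the global and micro batch sizes are conventionally related through data parallelism and gradient accumulation. The helper below is an illustrative sketch of that convention, not a NeMo API:

def expected_global_batch_size(micro_batch_size: int,
                               data_parallel_size: int,
                               grad_accum_steps: int = 1) -> int:
    """global_batch_size = micro_batch_size * data_parallel_size * grad_accum_steps."""
    return micro_batch_size * data_parallel_size * grad_accum_steps

# Example: 16 images per GPU, 8 data-parallel ranks, 2 accumulation steps -> 256.
assert expected_global_batch_size(16, 8, 2) == 256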