Batching#
Batch size is one of the first parameters you should adjust. For efficiency and convergence, we recommend first maximizing your batch size per GPU to fully utilize your GPU RAM.
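One way to gauge how close a candidate micro batch size is to the memory limit is to watch peak GPU memory with PyTorch's CUDA utilities. The sketch below is illustrative only; `run_training_step` is a hypothetical stand-in for a single forward/backward pass and is not part of the NeMo API.

import torch

def fits_in_memory(run_training_step, micro_batch_size: int, device: int = 0) -> bool:
    """Hypothetical helper: run one step and report peak GPU memory for a candidate micro batch size."""
    torch.cuda.reset_peak_memory_stats(device)
    try:
        run_training_step(micro_batch_size)  # stand-in for one forward/backward pass
    except torch.cuda.OutOfMemoryError:
        return False
    peak_gb = torch.cuda.max_memory_allocated(device) / 1e9
    total_gb = torch.cuda.get_device_properties(device).total_memory / 1e9
    print(f"micro_batch_size={micro_batch_size}: peak {peak_gb:.1f} GB of {total_gb:.1f} GB")
    return True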
NeMo Framework uses the following parameters:
Parameter | Description |
---|---|
Micro Batch Size | The number of examples per data parallel rank. |
Global Batch Size | The total number of examples consumed per optimizer step, calculated as `global_batch_size = micro_batch_size * data_parallel_size * gradient_accumulation_steps`, where `data_parallel_size` is the number of model replicas (see the worked example after this table). |
Gradient Accumulation | This parameter supports training with large batch sizes while maintaining a fixed memory footprint, though it requires additional compute. The NeMo Framework handles gradient accumulation implicitly through the global batch size: keep `accumulate_grad_batches` set to 1 and control accumulation by choosing a `global_batch_size` larger than `micro_batch_size * data_parallel_size`. |
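As a quick sanity check of the formula above, the following arithmetic uses illustrative placeholder values rather than recommended settings:

# Illustrative values only
micro_batch_size = 4             # examples per data parallel rank per micro batch
data_parallel_size = 8           # number of model replicas
gradient_accumulation_steps = 2  # micro batches accumulated before each optimizer step

global_batch_size = micro_batch_size * data_parallel_size * gradient_accumulation_steps
print(global_batch_size)  # 64 examples consumed per optimizer step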
Set the Batching Parameters#
The following example shows how to set up a pretraining recipe and batching parameters for a LLaMA-3 8B model:
from nemo.collections import llm
from functools import partial

# Load train recipe
recipe = partial(llm.llama3_8b.pretrain_recipe)()

# Set micro and global batch size
recipe.data.micro_batch_size = 4
recipe.data.global_batch_size = 16

# Set accumulate_grad_batches
recipe.trainer.accumulate_grad_batches = 1
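Given these values, you can reason about how many micro batches will be accumulated per optimizer step once the data-parallel size is known. The check below is a hypothetical sketch, not a NeMo API, and assumes a data-parallel size of 2 purely for illustration:

# Hypothetical consistency check; data_parallel_size depends on your GPU count and parallelism settings.
micro_batch_size = 4
global_batch_size = 16
data_parallel_size = 2  # assumed for illustration

assert global_batch_size % (micro_batch_size * data_parallel_size) == 0, (
    "global_batch_size must be divisible by micro_batch_size * data_parallel_size"
)
gradient_accumulation_steps = global_batch_size // (micro_batch_size * data_parallel_size)
print(gradient_accumulation_steps)  # 2 micro batches accumulated per optimizer step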
Alternatively, set the batching parameters directly from the CLI:
nemo llm pretrain --factory llama3_8b data.micro_batch_size=4 data.global_batch_size=16 trainer.accumulate_grad_batches=1