Microbatches Calculator

This API calculates the number of microbatches that each data-parallel rank processes per training step, given the global batch size, the micro batch size, and the data parallel size.

Module contents

Megatron Core number of microbatches calculators.

class core.num_microbatches_calculator.ConstantNumMicroBatchesCalculator(
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool,
rank: int,
)

Bases: NumMicroBatchesCalculator

Calculator of number of microbatches with constant global batch size.

Parameters:
  • global_batch_size (int) – Global batch size.

  • micro_batch_size (int) – Micro batch size.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool) – If true, decrease batch size to ensure divisibility by DP size * microbatch size (if needed).

  • rank (int) – Rank (to determine whether logging should be performed).

update(
consumed_samples,
consistency_check,
verbose=False,
) -> None

Update number of microbatches depending on batch size rampup.
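The constructor parameters of the constant calculator suggest how the microbatch count follows from the three sizes. The block below is a minimal sketch of that arithmetic, not the library's implementation: `constant_num_microbatches` is a hypothetical helper, and raising `ValueError` for a non-divisible batch size is an assumption made for illustration.

```python
from typing import Tuple


def constant_num_microbatches(
    global_batch_size: int,
    micro_batch_size: int,
    data_parallel_size: int,
    decrease_batch_size_if_needed: bool = False,
) -> Tuple[int, int]:
    """Return (num_microbatches, running_global_batch_size).

    Hypothetical helper: a sketch of the divisibility rule described in the
    parameter list, not code taken from Megatron Core.
    """
    # Samples consumed by one microbatch across all data-parallel replicas.
    samples_per_microbatch_step = micro_batch_size * data_parallel_size

    if decrease_batch_size_if_needed:
        # Round the global batch size down to the nearest multiple.
        running_global_batch_size = (
            global_batch_size // samples_per_microbatch_step
        ) * samples_per_microbatch_step
    else:
        # Assumption: a non-divisible batch size is treated as an error.
        if global_batch_size % samples_per_microbatch_step != 0:
            raise ValueError(
                "global_batch_size must be divisible by "
                "micro_batch_size * data_parallel_size"
            )
        running_global_batch_size = global_batch_size

    num_microbatches = running_global_batch_size // samples_per_microbatch_step
    if num_microbatches < 1:
        raise ValueError("global_batch_size is too small for this configuration")
    return num_microbatches, running_global_batch_size


print(constant_num_microbatches(512, 4, 8))        # (16, 512)
print(constant_num_microbatches(500, 4, 8, True))  # (15, 480)
```

In the second call, 500 is not a multiple of 4 * 8 = 32, so the running global batch size drops to 480 while the requested global batch size stays 500.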

class core.num_microbatches_calculator.NumMicroBatchesCalculator

Bases: ABC

Base class for number of microbatches calculator.

get() -> int

Get number of microbatches.

get_current_global_batch_size() -> int

Get current global batch size.

get_current_running_global_batch_size() -> int

Get current running global batch size. If decrease_batch_size_if_needed is False, this just equals global batch size.

get_micro_batch_size() -> int

Get micro batch size.

abstract update(
consumed_samples,
consistency_check,
verbose=False,
) -> None

Update number of microbatches depending on batch size rampup.

class core.num_microbatches_calculator.RampupBatchsizeNumMicroBatchesCalculator(
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool,
rank: int,
start_global_batch_size: int,
batch_size_increment: int,
ramup_samples: int,
)

Bases: NumMicroBatchesCalculator

Calculator of number of microbatches with batch size rampup. The global batch size grows from start_global_batch_size to global_batch_size in steps = (global_batch_size - start_global_batch_size) / batch_size_increment increments, with ramup_samples / steps samples consumed at each intermediate batch size.

Parameters:
  • global_batch_size (int) – Global batch size post rampup.

  • micro_batch_size (int) – Micro batch size.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool) – If true, decrease batch size to ensure divisibility by DP size * microbatch size (if needed).

  • rank (int) – Rank (to determine whether logging should be performed).

  • start_global_batch_size (int) – Global batch size to start with.

  • batch_size_increment (int) – Global batch size increments.

  • ramup_samples (int) – Number of samples over which to ramp up the global batch size from start_global_batch_size to global_batch_size.

update(
consumed_samples: int,
consistency_check: bool,
verbose: bool = False,
) -> None

Update number of microbatches.

Parameters:
  • consumed_samples (int) – Number of samples consumed.

  • consistency_check (bool) – Option to check current schedule’s consistency.

  • verbose (bool, optional) – Option to control logging. Defaults to False.
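The rampup schedule described for RampupBatchsizeNumMicroBatchesCalculator can be worked through in a few lines. This is a sketch under stated assumptions rather than the library's code: `rampup_global_batch_size` is a hypothetical function, and the handling of edge cases (no increments needed, or `consumed_samples` at or past `ramup_samples`) is chosen here for illustration.

```python
def rampup_global_batch_size(
    consumed_samples: int,
    global_batch_size: int,
    start_global_batch_size: int,
    batch_size_increment: int,
    ramup_samples: int,
) -> int:
    """Return the global batch size in effect after `consumed_samples`.

    Hypothetical helper illustrating the documented schedule:
    steps = (global - start) / increment, with ramup_samples / steps
    samples spent at each intermediate batch size.
    """
    diff = global_batch_size - start_global_batch_size
    if diff < 0 or diff % batch_size_increment != 0:
        raise ValueError(
            "global_batch_size - start_global_batch_size must be a "
            "non-negative multiple of batch_size_increment"
        )

    num_increments = diff // batch_size_increment
    # Assumption: once the ramp is finished (or trivial), use the final size.
    if num_increments == 0 or consumed_samples >= ramup_samples:
        return global_batch_size

    samples_per_increment = ramup_samples / num_increments
    completed_increments = int(consumed_samples / samples_per_increment)
    current = start_global_batch_size + completed_increments * batch_size_increment
    return min(current, global_batch_size)


# Ramp from 16 to 64 in increments of 16 over 3000 samples:
# 3 increments, 1000 samples per increment.
for consumed in (0, 999, 1000, 2500, 3000, 5000):
    print(consumed, rampup_global_batch_size(consumed, 64, 16, 16, 3000))
```

With these values the batch size is 16 for the first 1000 samples, 32 for the next 1000, 48 for the next 1000, and 64 from sample 3000 onward. The number of microbatches at any point would then follow by dividing the current batch size by micro_batch_size * data_parallel_size.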

core.num_microbatches_calculator.destroy_num_microbatches_calculator()

Destroy number of microbatches calculator.

core.num_microbatches_calculator.get_current_global_batch_size() -> int

Get current global batch size.

core.num_microbatches_calculator.get_current_running_global_batch_size() -> int

Get current running global batch size. If decrease_batch_size_if_needed is True, this can be smaller than the true global batch size, because the number of DP replicas might be incompatible with the true global batch size.

core.num_microbatches_calculator.get_micro_batch_size() -> int

Get micro batch size.

core.num_microbatches_calculator.get_num_microbatches() -> int

Get number of microbatches.

core.num_microbatches_calculator.init_num_microbatches_calculator(
rank: int,
rampup_batch_size: List[int] | None,
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool = False,
) -> None

Initialize number of microbatches calculator. Supports backward compatibility.

Parameters:
  • rank (int) – Rank of the GPU; only rank 0 logs the information.

  • rampup_batch_size (Optional[List[int]]) – Rampup batch size, in the format [start_global_batch_size, batch_size_increment, ramup_samples].

  • global_batch_size (int) – Global batch size for the model.

  • micro_batch_size (int) – Micro batch size at initialization.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool, optional) – If true, scale down batch size to ensure divisibility by DP size * microbatch size. Defaults to False.
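The three-element format of rampup_batch_size is easy to get wrong, so it can help to validate and name the fields before passing the list on. The sketch below uses two hypothetical names, `RampupConfig` and `parse_rampup_batch_size`, which are not part of the documented module. It also assumes that `None` means no rampup is requested and that the length check and integer conversion are appropriate.

```python
from typing import List, NamedTuple, Optional


class RampupConfig(NamedTuple):
    """Hypothetical container naming the three rampup_batch_size fields."""

    start_global_batch_size: int
    batch_size_increment: int
    ramup_samples: int  # spelling follows the documented parameter name


def parse_rampup_batch_size(
    rampup_batch_size: Optional[List[int]],
) -> Optional[RampupConfig]:
    """Validate a rampup_batch_size list and unpack it into named fields.

    Assumption: None means "no rampup requested" and is passed through.
    """
    if rampup_batch_size is None:
        return None
    if len(rampup_batch_size) != 3:
        raise ValueError(
            "rampup_batch_size must be [start_global_batch_size, "
            "batch_size_increment, ramup_samples]"
        )
    # int() also accepts numeric strings, e.g. values read from a command line.
    start, increment, samples = (int(value) for value in rampup_batch_size)
    return RampupConfig(start, increment, samples)


config = parse_rampup_batch_size([16, 16, 3000])
print(config.start_global_batch_size, config.batch_size_increment, config.ramup_samples)
# 16 16 3000
```

A named tuple keeps the positional order required by the list format while letting calling code refer to each field by name.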

core.num_microbatches_calculator.reconfigure_num_microbatches_calculator(
rank: int,
rampup_batch_size: List[int] | None,
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool = False,
) -> None

Reconfigure number of microbatches calculator. Supports backward compatibility.

Parameters:
  • rank (int) – Rank of the GPU; only rank 0 logs the information.

  • rampup_batch_size (Optional[List[int]]) – Rampup batch size, in the format [start_global_batch_size, batch_size_increment, ramup_samples].

  • global_batch_size (int) – Global batch size for the model.

  • micro_batch_size (int) – Micro batch size at initialization.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool, optional) – If true, scale down batch size to ensure divisibility by DP size * microbatch size. Defaults to False.

core.num_microbatches_calculator.unset_num_microbatches_calculator()

Unset microbatches calculator.

Useful for multiple runs. See tests/unit_tests/ckpt_converter/test_ckpt_converter.py for an example.
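The note about multiple runs makes more sense when the calculator is pictured as a single module-level object shared by the getter functions. The toy below only illustrates that pattern. All names (`_ToyCalculator`, `toy_init`, `toy_get_num_microbatches`, `toy_unset`) are hypothetical, and the rule that a second initialization is rejected until the calculator is unset is an assumption, not a statement about the library's behavior.

```python
from typing import Optional


class _ToyCalculator:
    """Stand-in calculator holding a precomputed microbatch count."""

    def __init__(
        self, global_batch_size: int, micro_batch_size: int, data_parallel_size: int
    ) -> None:
        self.num_micro_batches = global_batch_size // (
            micro_batch_size * data_parallel_size
        )

    def get(self) -> int:
        return self.num_micro_batches


# Hypothetical module-level slot for the single active calculator.
_ACTIVE_CALCULATOR: Optional[_ToyCalculator] = None


def toy_init(
    global_batch_size: int, micro_batch_size: int, data_parallel_size: int
) -> None:
    """Install the shared calculator; assumed to fail if one already exists."""
    global _ACTIVE_CALCULATOR
    if _ACTIVE_CALCULATOR is not None:
        raise RuntimeError("calculator is already initialized")
    _ACTIVE_CALCULATOR = _ToyCalculator(
        global_batch_size, micro_batch_size, data_parallel_size
    )


def toy_get_num_microbatches() -> int:
    """Read the microbatch count from the shared calculator."""
    if _ACTIVE_CALCULATOR is None:
        raise RuntimeError("calculator is not initialized")
    return _ACTIVE_CALCULATOR.get()


def toy_unset() -> None:
    """Clear the shared calculator so a later run can initialize a new one."""
    global _ACTIVE_CALCULATOR
    _ACTIVE_CALCULATOR = None


# Two back-to-back "runs" in one process, e.g. two test cases.
toy_init(512, 4, 8)
first_run = toy_get_num_microbatches()   # 16
toy_unset()

toy_init(256, 4, 8)
second_run = toy_get_num_microbatches()  # 8
toy_unset()
```

Without the `toy_unset()` call between the two runs, the second `toy_init` would raise in this sketch, which is the situation an unset function is meant to avoid in test suites.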

core.num_microbatches_calculator.update_num_microbatches(
consumed_samples: int,
consistency_check: bool = True,
verbose: bool = False,
) -> None

Update number of microbatches.

Parameters:
  • consumed_samples (int) – Number of samples consumed.

  • consistency_check (bool, optional) – Option to check current schedule’s consistency. Defaults to True.

  • verbose (bool, optional) – Option to control logging. Defaults to False.