core.num_microbatches_calculator#

Megatron Core number of microbatches calculators.

Module Contents#

Classes#

NumMicroBatchesCalculator

Base class for number of microbatches calculator.

ConstantNumMicroBatchesCalculator

Calculator of number of microbatches with constant global batch size.

RampupBatchsizeNumMicroBatchesCalculator

Calculator of number of microbatches with batch size rampup. Over steps = (global_batch_size - start_global_batch_size) / batch_size_increment rampup steps, the global batch size is increased from start_global_batch_size to global_batch_size, with each step consuming ramup_samples / steps samples.

Functions#

get_num_microbatches

Get number of microbatches.

get_current_global_batch_size

Get current global batch size.

get_micro_batch_size

Get micro batch size.

get_current_running_global_batch_size

Get current running global batch size, accounting for the fact that when decrease_batch_size_if_needed is True, the number of DP replicas may be incompatible with the true global batch size, so the running batch size may be smaller.

update_num_microbatches

Update number of microbatches.

unset_num_microbatches_calculator

Unset microbatches calculator.

init_num_microbatches_calculator

Initialize number of microbatches calculator. Retained for backward compatibility.

destroy_num_microbatches_calculator

Destroy number of microbatches calculator.

reconfigure_num_microbatches_calculator

Reconfigure number of microbatches calculator. Retained for backward compatibility.

_configure_global_num_microbatches_calculator

Configure number of microbatches calculator. Can be used for initialization and reconfiguration.

_build_num_microbatches_calculator

Build number of microbatches calculator. Internal helper method.

_round

Round batch_size down to the nearest multiple of divisor.

Data#

API#

core.num_microbatches_calculator.logger#

‘getLogger(…)’

core.num_microbatches_calculator._GLOBAL_NUM_MICROBATCHES_CALCULATOR: Union[core.num_microbatches_calculator.ConstantNumMicroBatchesCalculator, core.num_microbatches_calculator.RampupBatchsizeNumMicroBatchesCalculator]#

None

core.num_microbatches_calculator.get_num_microbatches() int#

Get number of microbatches.

core.num_microbatches_calculator.get_current_global_batch_size() int#

Get current global batch size.

core.num_microbatches_calculator.get_micro_batch_size() int#

Get micro batch size.

core.num_microbatches_calculator.get_current_running_global_batch_size() int#

Get current running global batch size, accounting for the fact that when decrease_batch_size_if_needed is True, the number of DP replicas may be incompatible with the true global batch size, so the running batch size may be smaller.
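The getters above read from the global calculator; as a rough guide to what `get_num_microbatches()` returns in the constant-batch-size case, the arithmetic can be sketched as follows (a hypothetical helper, not the Megatron implementation):

```python
# Sketch of the constant-batch-size arithmetic: each data-parallel replica
# processes global_batch_size / data_parallel_size samples per step, split
# into microbatches of micro_batch_size each.
def num_microbatches(global_batch_size: int, micro_batch_size: int,
                     data_parallel_size: int) -> int:
    micro_batch_times_dp = micro_batch_size * data_parallel_size
    assert global_batch_size % micro_batch_times_dp == 0, (
        "global batch size must be divisible by micro batch size * DP size"
    )
    return global_batch_size // micro_batch_times_dp

print(num_microbatches(global_batch_size=512, micro_batch_size=4,
                       data_parallel_size=8))  # 512 / (4 * 8) = 16
```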

core.num_microbatches_calculator.update_num_microbatches(
consumed_samples: int,
consistency_check: bool = True,
verbose: bool = False,
) None#

Update number of microbatches.

Parameters:
  • consumed_samples (int) – Number of samples consumed.

  • consistency_check (bool, optional) – Option to check current schedule’s consistency. Defaults to True.

  • verbose (bool, optional) – Option to control logging. Defaults to False.

core.num_microbatches_calculator.unset_num_microbatches_calculator()#

Unset microbatches calculator.

Useful for multiple runs. See tests/unit_tests/ckpt_converter/test_ckpt_converter.py for an example.

core.num_microbatches_calculator.init_num_microbatches_calculator(
rank: int,
rampup_batch_size: Optional[List[int]],
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool = False,
) None#

Initialize number of microbatches calculator. Retained for backward compatibility.

Parameters:
  • rank (int) – Rank of the GPU, only rank 0 will log the information.

  • rampup_batch_size (Optional[List[int]]) – Rampup batch size, given in the format [start_global_batch_size, batch_size_increment, ramup_samples].

  • global_batch_size (int) – Global batch size for the model.

  • micro_batch_size (int) – Micro batch size at initialization.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool, optional) – If true, scale down batch size to ensure divisibility by DP size * microbatch size. Defaults to False.
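For illustration, the three-element rampup_batch_size list can be validated and unpacked as sketched below (a hypothetical helper with made-up values, not part of the Megatron API):

```python
# Sketch of how a rampup_batch_size argument is interpreted. The list must
# contain exactly three integers: the starting global batch size, the
# per-step increment, and the total number of rampup samples.
def parse_rampup(rampup_batch_size, global_batch_size):
    assert len(rampup_batch_size) == 3, "expected [start, increment, samples]"
    start_global_batch_size, batch_size_increment, ramup_samples = rampup_batch_size
    # The rampup must start at or below the target and reach it exactly.
    assert start_global_batch_size <= global_batch_size
    assert (global_batch_size - start_global_batch_size) % batch_size_increment == 0
    return start_global_batch_size, batch_size_increment, ramup_samples

print(parse_rampup([32, 16, 500_000], global_batch_size=256))  # (32, 16, 500000)
```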

core.num_microbatches_calculator.destroy_num_microbatches_calculator()#

Destroy number of microbatches calculator.

core.num_microbatches_calculator.reconfigure_num_microbatches_calculator(
rank: int,
rampup_batch_size: Optional[List[int]],
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool = False,
) None#

Reconfigure number of microbatches calculator. Retained for backward compatibility.

Parameters:
  • rank (int) – Rank of the GPU, only rank 0 will log the information.

  • rampup_batch_size (Optional[List[int]]) – Rampup batch size, given in the format [start_global_batch_size, batch_size_increment, ramup_samples].

  • global_batch_size (int) – Global batch size for the model.

  • micro_batch_size (int) – Micro batch size at initialization.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool, optional) – If true, scale down batch size to ensure divisibility by DP size * microbatch size. Defaults to False.

core.num_microbatches_calculator._configure_global_num_microbatches_calculator(
rank: int,
rampup_batch_size: Optional[List[int]],
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool = False,
init: bool = False,
) None#

Configure number of microbatches calculator. Can be used for initialization and reconfiguration.

Parameters:
  • rank (int) – Rank of the GPU, only rank 0 will log the information.

  • rampup_batch_size (Optional[List[int]]) – Rampup batch size, given in the format [start_global_batch_size, batch_size_increment, ramup_samples].

  • global_batch_size (int) – Global batch size for the model.

  • micro_batch_size (int) – Micro batch size at initialization.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool, optional) – If true, scale down batch size to ensure divisibility by DP size * microbatch size. Defaults to False.

  • init (bool, optional) – If true, initialize the calculator. Defaults to False.

core.num_microbatches_calculator._build_num_microbatches_calculator(
rank: int,
rampup_batch_size: Optional[List[int]],
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool,
) Union[ConstantNumMicroBatchesCalculator, RampupBatchsizeNumMicroBatchesCalculator]#

Build number of microbatches calculator. Internal helper method.

Parameters:
  • rank (int) – Rank of the GPU, only rank 0 will log the information.

  • rampup_batch_size (Optional[List[int]]) – Rampup batch size, given in the format [start_global_batch_size, batch_size_increment, ramup_samples].

  • global_batch_size (int) – Global batch size for the model.

  • micro_batch_size (int) – Micro batch size at initialization.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool) – If true, scale down batch size to ensure divisibility by DP size * microbatch size.

core.num_microbatches_calculator._round(batch_size: int, divisor: int) int#

Round batch_size down to the nearest multiple of divisor.
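The rounding this helper performs amounts to integer floor division followed by a multiply; a minimal sketch of the documented behavior:

```python
# Round batch_size down to the nearest multiple of divisor.
def round_down(batch_size: int, divisor: int) -> int:
    return (batch_size // divisor) * divisor

print(round_down(100, 32))  # 96: the largest multiple of 32 not exceeding 100
```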

class core.num_microbatches_calculator.NumMicroBatchesCalculator#

Bases: abc.ABC

Base class for number of microbatches calculator.

Initialization

get() int#

Get number of microbatches.

get_current_global_batch_size() int#

Get current global batch size.

get_micro_batch_size() int#

Get micro batch size.

get_current_running_global_batch_size() int#

Get current running global batch size. If decrease_batch_size_if_needed is False, this simply equals the global batch size.

abstractmethod update(consumed_samples, consistency_check, verbose=False) None#

Update number of microbatches depending on batch size rampup.

class core.num_microbatches_calculator.ConstantNumMicroBatchesCalculator(
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool,
rank: int,
)#

Bases: core.num_microbatches_calculator.NumMicroBatchesCalculator

Calculator of number of microbatches with constant global batch size.

Parameters:
  • global_batch_size (int) – Global batch size.

  • micro_batch_size (int) – Micro batch size.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool) – If true, decrease batch size to ensure divisibility by DP size * microbatch size (if needed).

  • rank (int) – Rank (to determine whether logging should be performed).

Initialization

update(consumed_samples, consistency_check, verbose=False) None#

class core.num_microbatches_calculator.RampupBatchsizeNumMicroBatchesCalculator(
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool,
rank: int,
start_global_batch_size: int,
batch_size_increment: int,
ramup_samples: int,
)#

Bases: core.num_microbatches_calculator.NumMicroBatchesCalculator

Calculator of number of microbatches with batch size rampup. Over steps = (global_batch_size - start_global_batch_size) / batch_size_increment rampup steps, the global batch size is increased from start_global_batch_size to global_batch_size, with each step consuming ramup_samples / steps samples.

Parameters:
  • global_batch_size (int) – Global batch size post rampup.

  • micro_batch_size (int) – Micro batch size.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool) – If true, decrease batch size to ensure divisibility by DP size * microbatch size (if needed).

  • rank (int) – Rank (to determine whether logging should be performed).

  • start_global_batch_size (int) – Global batch size to start with.

  • batch_size_increment (int) – Increment added to the global batch size at each rampup step.

  • ramup_samples (int) – Number of samples over which to ramp up the global batch size from start_global_batch_size to global_batch_size.

Initialization

update(
consumed_samples: int,
consistency_check: bool,
verbose: bool = False,
) None#

Update number of microbatches.

Parameters:
  • consumed_samples (int) – Number of samples consumed.

  • consistency_check (bool) – Option to check current schedule’s consistency.

  • verbose (bool, optional) – Option to control logging. Defaults to False.
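The rampup schedule described for this class can be sketched as follows (a hypothetical helper with illustrative values, not the Megatron implementation): the current global batch size is a step function of consumed samples, capped at the final global batch size.

```python
# Sketch of the rampup arithmetic: the schedule has
#   steps = (global_batch_size - start_global_batch_size) / batch_size_increment
# rampup steps, each consuming ramup_samples / steps samples.
def rampup_global_batch_size(consumed_samples: int,
                             start_global_batch_size: int,
                             batch_size_increment: int,
                             global_batch_size: int,
                             ramup_samples: int) -> int:
    steps = (global_batch_size - start_global_batch_size) // batch_size_increment
    samples_per_step = ramup_samples // steps
    # Completed rampup steps so far, capped at the total number of steps.
    completed = min(consumed_samples // samples_per_step, steps)
    return start_global_batch_size + completed * batch_size_increment

# Ramp from 32 to 256 in increments of 16 over 1,400,000 samples:
# 14 steps of 100,000 samples each.
print(rampup_global_batch_size(0, 32, 16, 256, 1_400_000))         # 32
print(rampup_global_batch_size(250_000, 32, 16, 256, 1_400_000))   # 64
print(rampup_global_batch_size(2_000_000, 32, 16, 256, 1_400_000)) # 256
```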