core.num_microbatches_calculator#

Megatron Core number of microbatches calculators.

Module Contents#

Classes#

NumMicroBatchesCalculator

Base class for number of microbatches calculator.

ConstantNumMicroBatchesCalculator

Calculator of number of microbatches with constant global batch size.

StepBatchsizeNumMicroBatchesCalculator

Calculator of number of microbatches with arbitrary step-wise batch size schedule.

Functions#

get_num_microbatches

Get number of microbatches.

get_current_global_batch_size

Get current global batch size.

get_micro_batch_size

Get micro batch size.

get_current_running_global_batch_size

Get current running global batch size, which may differ from the configured global batch size when decrease_batch_size_if_needed is True and the number of DP replicas is incompatible with the true global batch size.

update_num_microbatches

Update number of microbatches.

unset_num_microbatches_calculator

Unset microbatches calculator.

init_num_microbatches_calculator

Initialize number of microbatches calculator. Supports backward compatibility.

destroy_num_microbatches_calculator

Destroy number of microbatches calculator.

reconfigure_num_microbatches_calculator

Reconfigure number of microbatches calculator. Supports backward compatibility.

_configure_global_num_microbatches_calculator

Configure number of microbatches calculator. Can be used for initialization and reconfiguration.

_build_num_microbatches_calculator

Build number of microbatches calculator. Internal helper method.

_round

Round batch_size down to the nearest multiple of divisor.

Data#

API#

core.num_microbatches_calculator.logger#

‘getLogger(…)’

core.num_microbatches_calculator._GLOBAL_NUM_MICROBATCHES_CALCULATOR: Union[core.num_microbatches_calculator.ConstantNumMicroBatchesCalculator, core.num_microbatches_calculator.StepBatchsizeNumMicroBatchesCalculator]#

None

core.num_microbatches_calculator.get_num_microbatches() int#

Get number of microbatches.

core.num_microbatches_calculator.get_current_global_batch_size() int#

Get current global batch size.

core.num_microbatches_calculator.get_micro_batch_size() int#

Get micro batch size.

core.num_microbatches_calculator.get_current_running_global_batch_size() int#

Get current running global batch size, which may differ from the configured global batch size when decrease_batch_size_if_needed is True and the number of DP replicas is incompatible with the true global batch size.

core.num_microbatches_calculator.update_num_microbatches(
consumed_samples: int,
consistency_check: bool = True,
verbose: bool = False,
) None#

Update number of microbatches.

Parameters:
  • consumed_samples (int) – Number of samples consumed.

  • consistency_check (bool, optional) – Option to check current schedule’s consistency. Defaults to True.

  • verbose (bool, optional) – Option to control logging. Defaults to False.

core.num_microbatches_calculator.unset_num_microbatches_calculator()#

Unset microbatches calculator.

Useful for multiple runs. See tests/unit_tests/ckpt_converter/test_ckpt_converter.py for an example.

core.num_microbatches_calculator.init_num_microbatches_calculator(
rank: int,
rampup_batch_size: Optional[List[int]] = None,
global_batch_size: Optional[int] = None,
micro_batch_size: Optional[int] = None,
data_parallel_size: Optional[int] = None,
decrease_batch_size_if_needed: bool = False,
step_batch_size_schedule: Optional[str] = None,
seq_length: Optional[int] = None,
) None#

Initialize number of microbatches calculator. Supports backward compatibility.

Parameters:
  • rank (int) – Rank of the GPU, only rank 0 will log the information.

  • rampup_batch_size (Optional[List[int]]) – Deprecated. This argument is ignored. Use step_batch_size_schedule instead.

  • global_batch_size (Optional[int]) – Global batch size for the model.

  • micro_batch_size (Optional[int]) – Micro batch size at initialization.

  • data_parallel_size (Optional[int]) – Data parallel size.

  • decrease_batch_size_if_needed (bool, optional) – If true, scale down batch size to ensure divisibility by DP size * microbatch size. Defaults to False.

  • step_batch_size_schedule (Optional[str]) – Step batch size schedule string in format “THRESHOLD:BS THRESHOLD:BS …”. Thresholds are interpreted as samples unless seq_length is provided, in which case thresholds are interpreted as tokens and converted to samples. Thresholds support suffixes: K (1e3), M (1e6), B (1e9), T (1e12). Example: “0:768 250B:1536 500B:3072 750B:6144”

  • seq_length (Optional[int]) – Sequence length for token-to-sample conversion when using step_batch_size_schedule. If provided, thresholds are interpreted as tokens. If None, thresholds are samples.
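The real calculator lives inside Megatron Core, but the underlying arithmetic can be sketched standalone: with a constant global batch size, the number of microbatches is the global batch size divided by (micro batch size × data-parallel size). The function name and the assertion below are illustrative, not Megatron's actual code.

```python
# Standalone sketch (not Megatron's implementation) of the core arithmetic
# behind the constant-batch-size calculator.
def num_microbatches(global_batch_size: int, micro_batch_size: int,
                     data_parallel_size: int) -> int:
    per_step = micro_batch_size * data_parallel_size
    # The real calculator errors (or shrinks the batch when
    # decrease_batch_size_if_needed is set) on non-divisibility.
    assert global_batch_size % per_step == 0, "global batch size not divisible"
    return global_batch_size // per_step
```

For example, `num_microbatches(768, 4, 8)` yields 24, since 768 / (4 × 8) = 24.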

core.num_microbatches_calculator.destroy_num_microbatches_calculator()#

Destroy number of microbatches calculator.

core.num_microbatches_calculator.reconfigure_num_microbatches_calculator(
rank: int,
rampup_batch_size: Optional[List[int]] = None,
global_batch_size: Optional[int] = None,
micro_batch_size: Optional[int] = None,
data_parallel_size: Optional[int] = None,
decrease_batch_size_if_needed: bool = False,
step_batch_size_schedule: Optional[str] = None,
seq_length: Optional[int] = None,
) None#

Reconfigure number of microbatches calculator. Supports backward compatibility.

Parameters:
  • rank (int) – Rank of the GPU, only rank 0 will log the information.

  • rampup_batch_size (Optional[List[int]]) – Deprecated. This argument is ignored. Use step_batch_size_schedule instead.

  • global_batch_size (Optional[int]) – Global batch size for the model.

  • micro_batch_size (Optional[int]) – Micro batch size at initialization.

  • data_parallel_size (Optional[int]) – Data parallel size.

  • decrease_batch_size_if_needed (bool, optional) – If true, scale down batch size to ensure divisibility by DP size * microbatch size. Defaults to False.

  • step_batch_size_schedule (Optional[str]) – Step batch size schedule string in format “THRESHOLD:BS THRESHOLD:BS …”. Thresholds support suffixes: K (1e3), M (1e6), B (1e9), T (1e12). Example: “0:768 250B:1536 500B:3072 750B:6144”

  • seq_length (Optional[int]) – Sequence length for token-to-sample conversion when using step_batch_size_schedule. If provided, thresholds are interpreted as tokens. If None, thresholds are samples.

core.num_microbatches_calculator._configure_global_num_microbatches_calculator(
rank: int,
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool = False,
step_batch_size_schedule: Optional[str] = None,
seq_length: Optional[int] = None,
init: bool = False,
) None#

Configure number of microbatches calculator. Can be used for initialization and reconfiguration.

Parameters:
  • rank (int) – Rank of the GPU, only rank 0 will log the information.

  • global_batch_size (int) – Global batch size for the model.

  • micro_batch_size (int) – Micro batch size at initialization.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool, optional) – If true, scale down batch size to ensure divisibility by DP size * microbatch size. Defaults to False.

  • step_batch_size_schedule (Optional[str]) – Step batch size schedule string in format “THRESHOLD:BS THRESHOLD:BS …”. Thresholds support suffixes: K (1e3), M (1e6), B (1e9), T (1e12). Example: “0:768 250B:1536 500B:3072 750B:6144”

  • seq_length (Optional[int]) – Sequence length for token-to-sample conversion when using step_batch_size_schedule. If provided, thresholds are interpreted as tokens. If None, thresholds are samples.

  • init (bool, optional) – If true, initialize the calculator. Defaults to False.

core.num_microbatches_calculator._build_num_microbatches_calculator(
rank: int,
global_batch_size: Optional[int],
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool,
step_batch_size_schedule: Optional[str] = None,
seq_length: Optional[int] = None,
) Union[ConstantNumMicroBatchesCalculator, StepBatchsizeNumMicroBatchesCalculator]#

Build number of microbatches calculator. Internal helper method.

Parameters:
  • rank (int) – Rank of the GPU, only rank 0 will log the information.

  • global_batch_size (Optional[int]) – Global batch size for the model. Required for constant mode. Ignored when step_batch_size_schedule is provided.

  • micro_batch_size (int) – Micro batch size at initialization.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool) – If true, scale down batch size to ensure divisibility by DP size * microbatch size.

  • step_batch_size_schedule (Optional[str]) – Step batch size schedule string in format “THRESHOLD:BS THRESHOLD:BS …”. Thresholds support suffixes: K (1e3), M (1e6), B (1e9), T (1e12). Example: “0:768 250B:1536 500B:3072 750B:6144”

  • seq_length (Optional[int]) – Sequence length for token-to-sample conversion when using step_batch_size_schedule. If provided, thresholds are interpreted as tokens. If None, thresholds are samples.

core.num_microbatches_calculator._round(batch_size: int, divisor: int) int#

Round batch_size down to the nearest multiple of divisor.
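A minimal sketch of this helper's assumed behavior (floor to the nearest multiple of the divisor; the name `round_down` is illustrative):

```python
# Floor batch_size to the nearest multiple of divisor.
def round_down(batch_size: int, divisor: int) -> int:
    return (batch_size // divisor) * divisor
```

For example, `round_down(100, 32)` yields 96.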

class core.num_microbatches_calculator.NumMicroBatchesCalculator#

Bases: abc.ABC

Base class for number of microbatches calculator.

Initialization

get() int#

Get number of microbatches.

get_current_global_batch_size() int#

Get current global batch size.

get_micro_batch_size() int#

Get micro batch size.

get_current_running_global_batch_size() int#

Get current running global batch size. If decrease_batch_size_if_needed is False, this just equals global batch size.

abstractmethod update(consumed_samples, consistency_check, verbose=False) None#

Update number of microbatches.

class core.num_microbatches_calculator.ConstantNumMicroBatchesCalculator(
global_batch_size: int,
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool,
rank: int,
)#

Bases: core.num_microbatches_calculator.NumMicroBatchesCalculator

Calculator of number of microbatches with constant global batch size.

Parameters:
  • global_batch_size (int) – Global batch size.

  • micro_batch_size (int) – Micro batch size.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool) – If true, decrease batch size to ensure divisibility by DP size * microbatch size (if needed).

  • rank (int) – Rank (to determine whether logging should be performed).

Initialization

update(consumed_samples, consistency_check, verbose=False) None#

Update number of microbatches.

class core.num_microbatches_calculator.StepBatchsizeNumMicroBatchesCalculator(
micro_batch_size: int,
data_parallel_size: int,
decrease_batch_size_if_needed: bool,
rank: int,
schedule: str,
seq_length: Optional[int] = None,
)#

Bases: core.num_microbatches_calculator.NumMicroBatchesCalculator

Calculator of number of microbatches with arbitrary step-wise batch size schedule.

Parameters:
  • micro_batch_size (int) – Micro batch size.

  • data_parallel_size (int) – Data parallel size.

  • decrease_batch_size_if_needed (bool) – Must be False. Step schedules do not support decreasing batch size for divisibility.

  • rank (int) – Rank for logging.

  • schedule (str) – Schedule string in format “THRESHOLD:BS THRESHOLD:BS …”. Thresholds support suffixes: K (1e3), M (1e6), B (1e9), T (1e12).

    Examples:

    “0:768 250B:1536 500B:3072 750B:6144” (thresholds in tokens)

    “0:768 61035156250:1536” (thresholds in samples)

  • seq_length (int, optional) – Sequence length for token-to-sample conversion. If provided, thresholds are interpreted as tokens and converted to samples. If None, thresholds are interpreted as samples directly.

Initialization

static _parse_numeric_value(value_str: str) int#

Parse numeric value with optional suffix (K, M, B, T).
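A hedged sketch of the documented suffix handling (K = 1e3, M = 1e6, B = 1e9, T = 1e12); the real method's treatment of fractional values and whitespace may differ.

```python
# Parse a numeric string with an optional K/M/B/T suffix into an int.
def parse_numeric_value(value_str: str) -> int:
    suffixes = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}
    s = value_str.strip()
    if s and s[-1].upper() in suffixes:
        # Allow fractional prefixes like "1.5K" (assumed behavior).
        return int(float(s[:-1]) * suffixes[s[-1].upper()])
    return int(s)
```

For example, `parse_numeric_value("250B")` yields 250000000000.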

classmethod _parse_schedule(
schedule_str: str,
seq_length: Optional[int],
) List[Tuple[int, int]]#

Parse schedule string into list of (threshold_samples, batch_size) tuples.

Parameters:
  • schedule_str – Space-separated “THRESHOLD:BATCH_SIZE” pairs.

  • seq_length – If provided, convert thresholds from tokens to samples.

Returns:

List of (threshold_samples, batch_size) tuples, sorted by threshold.
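An illustrative sketch of the parsing described above. Assumptions not confirmed by the source: pairs are whitespace-separated, and token thresholds are converted to samples by integer division (the real rounding behavior may differ).

```python
from typing import List, Optional, Tuple

# Sketch: turn "THRESHOLD:BS THRESHOLD:BS ..." into sorted
# (threshold_samples, batch_size) tuples.
def parse_schedule(schedule_str: str,
                   seq_length: Optional[int] = None) -> List[Tuple[int, int]]:
    pairs = []
    for item in schedule_str.split():
        threshold_str, bs_str = item.split(":")
        threshold = _to_int(threshold_str)
        if seq_length is not None:
            threshold //= seq_length  # tokens -> samples (assumed floor)
        pairs.append((threshold, int(bs_str)))
    return sorted(pairs)

def _to_int(s: str) -> int:
    # K/M/B/T suffix handling, as documented for the schedule format.
    suffixes = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}
    if s and s[-1].upper() in suffixes:
        return int(float(s[:-1]) * suffixes[s[-1].upper()])
    return int(s)
```

For example, `parse_schedule("500K:1536 0:768")` yields `[(0, 768), (500000, 1536)]`.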

_validate_schedule() None#

Validate the parsed schedule.

_get_batch_size_for_samples(consumed_samples: int) int#

Get the batch size for the given number of consumed samples.
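A sketch of the lookup this method presumably performs (assumption: the schedule is sorted by threshold, and the batch size of the highest threshold not exceeding consumed_samples wins):

```python
from typing import List, Tuple

# Sketch: pick the batch size whose threshold is the largest one
# <= consumed_samples in a sorted (threshold, batch_size) schedule.
def batch_size_for_samples(schedule: List[Tuple[int, int]],
                           consumed_samples: int) -> int:
    batch_size = schedule[0][1]
    for threshold, bs in schedule:
        if consumed_samples >= threshold:
            batch_size = bs
        else:
            break
    return batch_size
```

For example, with schedule `[(0, 768), (100, 1536)]`, 50 consumed samples give 768 and 150 give 1536.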

update(
consumed_samples: int,
consistency_check: bool,
verbose: bool = False,
) None#

Update number of microbatches based on consumed samples.

Parameters:
  • consumed_samples (int) – Number of samples consumed.

  • consistency_check (bool) – Check divisibility constraints.

  • verbose (bool) – Enable logging.
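Putting the pieces together, the conceptual flow of `update` can be sketched as: look up the current global batch size from the step schedule, then derive the microbatch count. This is a hedged illustration, not Megatron's code; the real method also handles logging and ramp transitions.

```python
from typing import List, Tuple

# Sketch of the update flow: schedule lookup, then microbatch derivation.
def update_sketch(schedule: List[Tuple[int, int]], consumed_samples: int,
                  micro_batch_size: int, data_parallel_size: int) -> Tuple[int, int]:
    # Current global batch size from the sorted (threshold, bs) schedule.
    gbs = schedule[0][1]
    for threshold, bs in schedule:
        if consumed_samples >= threshold:
            gbs = bs
    per_step = micro_batch_size * data_parallel_size
    # Analogue of the consistency_check: divisibility must hold.
    assert gbs % per_step == 0, "global batch size not divisible"
    return gbs, gbs // per_step
```

For example, `update_sketch([(0, 768), (1000, 1536)], 1200, 4, 8)` yields `(1536, 48)`.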