bridge.training.gpt_step#

Module Contents#

Functions#

get_packed_seq_params – Extract packed sequence parameters from the batch.

get_batch_from_iterator – Get a batch of data from the iterator.

get_batch_on_this_tp_rank – Get a batch from the data iterator, handling TP broadcasting.

get_batch – Generate a batch.

forward_step – Forward training step.

Data#

API#

bridge.training.gpt_step.logger#

getLogger(…)

bridge.training.gpt_step.get_packed_seq_params(
batch: dict[str, torch.Tensor],
) → megatron.core.packed_seq_params.PackedSeqParams#

Extract packed sequence parameters from the batch.

Creates and returns a PackedSeqParams object with appropriate parameters for packed sequence processing.

Parameters:

batch – Input batch containing packed sequence information

Returns:

Parameters for packed sequence processing

Return type:

PackedSeqParams
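A minimal usage sketch, assuming megatron.bridge is installed. The batch keys shown (cu_seqlens, cu_seqlens_argmin, max_seqlen) are taken from the tensor names in get_batch's return description below; the values are purely illustrative:

```python
import torch

from megatron.bridge.training.gpt_step import get_packed_seq_params

# Illustrative packed batch: two sequences of lengths 3 and 5 packed
# along one dimension (cumulative boundaries 0, 3, 8). In a real run
# these tensors come from the data pipeline via get_batch.
batch = {
    "cu_seqlens": torch.tensor([0, 3, 8], dtype=torch.int32),
    "cu_seqlens_argmin": torch.tensor([2]),
    "max_seqlen": torch.tensor([5]),
}

packed_seq_params = get_packed_seq_params(batch)
# packed_seq_params can then be forwarded to the model so attention
# kernels respect the per-sequence boundaries encoded in cu_seqlens.
```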

bridge.training.gpt_step.get_batch_from_iterator(
data_iterator: Iterable,
use_mtp: bool = False,
) → dict[str, torch.Tensor]#

Get a batch of data from the iterator.

Parameters:
  • data_iterator – The data iterator to get the batch from.

  • use_mtp – Whether Multi-Token Prediction layers are enabled.

Returns:

A dictionary containing the batch data.

Return type:

dict[str, torch.Tensor]
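For illustration, any iterable of batch dictionaries works; the keys in this sketch are assumed stand-ins for what the GPT data pipeline actually yields:

```python
import torch

from megatron.bridge.training.gpt_step import get_batch_from_iterator

# Toy micro-batch with assumed keys; real batches come from the dataset.
dummy_batch = {
    "tokens": torch.randint(0, 1000, (2, 16)),
    "labels": torch.randint(0, 1000, (2, 16)),
    "loss_mask": torch.ones(2, 16),
    "position_ids": torch.arange(16).unsqueeze(0).repeat(2, 1),
}
data_iterator = iter([dummy_batch])

batch = get_batch_from_iterator(data_iterator, use_mtp=False)
```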

bridge.training.gpt_step.get_batch_on_this_tp_rank(
data_iterator: Iterable,
cfg: megatron.bridge.training.config.ConfigContainer,
use_mtp: bool = False,
) → dict[str, torch.Tensor]#

Get a batch from the data iterator, handling TP broadcasting.

On TP rank 0, it fetches the next batch from the iterator and broadcasts the tensors required by the current pipeline stage to the other TP ranks. On the other TP ranks, it allocates matching tensors and receives the broadcast data.

Parameters:
  • data_iterator – The data iterator.

  • cfg – The configuration container.

  • use_mtp – Whether Multi-Token Prediction layers are enabled.

Returns:

A dictionary containing the batch data for the current rank.
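A sketch of calling this inside a training loop; it assumes torch.distributed and Megatron's tensor-parallel state are already initialized (e.g. in a launched job) and that cfg is the run's ConfigContainer:

```python
from megatron.bridge.training.gpt_step import get_batch_on_this_tp_rank

def fetch_batch_for_rank(data_iterator, cfg):
    # TP rank 0 reads from data_iterator and broadcasts; the other TP
    # ranks allocate receive buffers and obtain the same tensors via
    # broadcast, so every rank returns an equivalent batch dictionary.
    return get_batch_on_this_tp_rank(data_iterator, cfg, use_mtp=False)
```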

bridge.training.gpt_step.get_batch(
data_iterator: Iterable,
cfg: megatron.bridge.training.config.ConfigContainer,
use_mtp: bool = False,
) → tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]#

Generate a batch.

Parameters:
  • data_iterator – Input data iterator

  • cfg – Configuration container

  • use_mtp – Whether Multi-Token Prediction layers are enabled

Returns:

tuple of tensors containing tokens, labels, loss_mask, attention_mask, position_ids, cu_seqlens, cu_seqlens_argmin, and max_seqlen
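A sketch of unpacking the returned tuple, with names following the order above; data_iterator and cfg are assumed to come from a live training setup:

```python
from megatron.bridge.training.gpt_step import get_batch

def next_training_batch(data_iterator, cfg):
    (
        tokens,
        labels,
        loss_mask,
        attention_mask,
        position_ids,
        cu_seqlens,         # packed-sequence bookkeeping; see
        cu_seqlens_argmin,  # get_packed_seq_params above
        max_seqlen,
    ) = get_batch(data_iterator, cfg, use_mtp=False)
    return tokens, labels, loss_mask, attention_mask, position_ids
```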

bridge.training.gpt_step.forward_step(
state: megatron.bridge.training.state.GlobalState,
data_iterator: Iterable,
model: megatron.core.models.gpt.GPTModel,
) → tuple[torch.Tensor, functools.partial]#

Forward training step.

Parameters:
  • state – Global state for the run

  • data_iterator – Input data iterator

  • model – The GPT Model

Returns:

tuple containing the output tensor and the loss function
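A sketch of the call pattern, assuming state, data_iterator, and model come from a live run; in typical training, forward_step is handed to Megatron Core's forward-backward scheduling rather than called directly:

```python
from megatron.bridge.training.gpt_step import forward_step

def run_forward(state, data_iterator, model):
    # forward_step returns the model output together with a
    # functools.partial loss closure; the training loop (or the
    # pipeline scheduler) then applies the closure to the output.
    output_tensor, loss_func = forward_step(state, data_iterator, model)
    return loss_func(output_tensor)
```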