bridge.training.audio_lm_step#
Audio-language model training step, independent of vlm_step.py.
Module Contents#
Functions#
- `get_batch_from_iterator`: Get a batch of data from the iterator for audio-language models.
- `get_batch`: Generate a batch for audio-language models.
- `forward_step`: Forward training step for audio-language models.
Data#
API#
- bridge.training.audio_lm_step.logger#
'getLogger(…)'
- bridge.training.audio_lm_step.get_batch_from_iterator(
- data_iterator: Iterable,
- use_mtp: bool = False,
- skip_getting_attention_mask_from_dataset: bool = True,
- *,
- is_first_pp_stage: bool,
- is_last_pp_stage: bool,
- )
Get a batch of data from the iterator for audio-language models.
Uses the `audio_inputs` batch key instead of `visual_inputs`.
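A minimal sketch of what this helper might do, given the `is_first_pp_stage`/`is_last_pp_stage` keyword flags in the signature above. The batch keys other than `audio_inputs` (`tokens`, `position_ids`, `labels`, `loss_mask`) are illustrative assumptions, not the module's actual batch schema:

```python
from typing import Any, Dict, Iterator


def get_batch_from_iterator_sketch(
    data_iterator: Iterator[Dict[str, Any]],
    *,
    is_first_pp_stage: bool,
    is_last_pp_stage: bool,
) -> Dict[str, Any]:
    """Pull one batch and keep only what this pipeline stage needs,
    reading 'audio_inputs' where a VLM step would read 'visual_inputs'."""
    batch = next(data_iterator)
    out: Dict[str, Any] = {}
    if is_first_pp_stage:
        # The first stage embeds the tokens and the audio features.
        out["tokens"] = batch.get("tokens")
        out["position_ids"] = batch.get("position_ids")
        out["audio_inputs"] = batch.get("audio_inputs")  # key swap vs. vlm_step
    if is_last_pp_stage:
        # The last stage computes the loss, so it needs labels and a loss mask.
        out["labels"] = batch.get("labels")
        out["loss_mask"] = batch.get("loss_mask")
    return out
```

Intermediate pipeline stages (neither first nor last) would receive an empty dict, since they consume activations rather than raw data.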
- bridge.training.audio_lm_step.get_batch(
- data_iterator: Iterable,
- cfg: megatron.bridge.training.config.ConfigContainer,
- use_mtp: bool = False,
- *,
- pg_collection,
- )
Generate a batch for audio-language models.
Adapted from `vlm_step.get_batch` but uses the `audio_inputs` key.
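A hedged sketch of the wrapper relationship: `get_batch` derives options from the config and delegates to the iterator-level helper. The config is modeled here as a plain dict, the `pg_collection` argument is omitted, and the `skip_attention_mask` key plus the mask-dropping behavior are assumptions for illustration:

```python
from typing import Any, Dict, Iterator


def _get_batch_from_iterator(
    data_iterator: Iterator[Dict[str, Any]],
    use_mtp: bool = False,
    skip_getting_attention_mask_from_dataset: bool = True,
) -> Dict[str, Any]:
    """Stand-in for the iterator-level helper documented above."""
    batch = next(data_iterator)
    if skip_getting_attention_mask_from_dataset:
        # Drop any dataset-provided mask; a causal mask can be built on-device.
        batch.pop("attention_mask", None)
    return batch


def get_batch(
    data_iterator: Iterator[Dict[str, Any]],
    cfg: Dict[str, Any],
    use_mtp: bool = False,
) -> Dict[str, Any]:
    """Forward config-derived flags to the helper (pg_collection omitted)."""
    return _get_batch_from_iterator(
        data_iterator,
        use_mtp=use_mtp,
        skip_getting_attention_mask_from_dataset=cfg.get("skip_attention_mask", True),
    )
```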
- bridge.training.audio_lm_step.forward_step(
- state: megatron.bridge.training.state.GlobalState,
- data_iterator: Iterable,
- model: megatron.core.models.gpt.GPTModel,
- return_schedule_plan: bool = False,
- )
Forward training step for audio-language models.
Uses a local `get_batch` that extracts `audio_inputs` instead of `visual_inputs`.
- Parameters:
state – Global state for the run
data_iterator – Input data iterator
model – The audio-language model
return_schedule_plan (bool) – Whether to return the schedule plan instead of the output tensor
- Returns:
tuple containing the output tensor and the loss function
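The return contract can be illustrated with a toy Megatron-style forward step: run the model, then hand back the output together with a loss closure that the schedule invokes later. The model call signature and the toy mean-squared loss below are assumptions for illustration, not the module's actual implementation:

```python
from typing import Any, Callable, Dict, Iterator, Tuple


def forward_step_sketch(
    data_iterator: Iterator[Dict[str, Any]],
    model: Callable[..., Any],
) -> Tuple[Any, Callable[[Any], Tuple[float, Dict[str, float]]]]:
    """Run the model on one batch and return (output, loss_func)."""
    batch = next(data_iterator)
    # Hypothetical call signature: the audio payload rides along as a kwarg.
    output = model(batch["tokens"], audio_inputs=batch.get("audio_inputs"))

    def loss_func(output_tensor: Any) -> Tuple[float, Dict[str, float]]:
        # Toy loss: mean squared difference against the labels.
        labels = batch["labels"]
        loss = sum((o - l) ** 2 for o, l in zip(output_tensor, labels)) / len(labels)
        return loss, {"lm loss": loss}

    return output, loss_func
```

Deferring the loss into a closure lets the pipeline schedule run the forward pass and the loss computation at different points, which is why the function returns the pair rather than a scalar loss.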