bridge.models.nemotron_omni.nemotron_omni_sound#

Module Contents#

Classes#

BridgeSoundEncoder

Sound encoder wrapper for Bridge that wraps HF transformers’ ParakeetEncoder.

API#

class bridge.models.nemotron_omni.nemotron_omni_sound.BridgeSoundEncoder(config)#

Bases: megatron.core.transformer.module.MegatronModule

Sound encoder wrapper for Bridge that wraps HF transformers’ ParakeetEncoder.

Uses the public ParakeetEncoder from transformers so that Megatron-side parameter names line up 1:1 with the Nemotron-Omni HF checkpoint’s sound_encoder.encoder.* state dict.
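As an illustration of what a 1:1 name correspondence allows, the sketch below selects the entries under sound_encoder.encoder.* from a checkpoint-style dict and strips that prefix, leaving names an inner encoder could consume unchanged. The helper strip_prefix and the example keys are hypothetical and are not part of Bridge or transformers; strings stand in for tensors.

```python
# Hypothetical sketch: the key names below are invented for illustration and
# do not reflect the real ParakeetEncoder parameter names.
HF_PREFIX = "sound_encoder.encoder."


def strip_prefix(state_dict, prefix=HF_PREFIX):
    """Return the entries under `prefix`, with the prefix removed from each key."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Stand-in for a checkpoint state dict (values would normally be tensors).
hf_state_dict = {
    "sound_encoder.encoder.layers.0.example.weight": "tensor-0",
    "sound_encoder.encoder.layers.1.example.weight": "tensor-1",
    "language_model.example.weight": "tensor-2",
}

encoder_state = strip_prefix(hf_state_dict)
print(sorted(encoder_state))
```

Because the names match 1:1, no per-parameter renaming table is needed in a sketch like this; only the shared prefix is removed.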

The outer config carries fields required by LLaVAModel’s sound interface (sound_model_type, sound_pad_to_clip_duration, sound_batch_split) plus the ParakeetEncoderConfig fields needed to build the inner encoder.

Does NOT include a feature extractor: the input must be pre-processed mel spectrograms of shape (batch, frames, mel_bins), not raw audio waveforms.
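To make the (batch, frames, mel_bins) layout concrete, the sketch below pads variable-length clips to a common frame count and records each clip's original length. It uses nested Python lists as stand-ins for tensors, and the helper pad_mel_batch, the pad value, and the toy size of 4 mel bins are all assumptions made for illustration. Whether forward expects padded input, and whether sound_length carries per-clip frame counts, is not confirmed by this page.

```python
def pad_mel_batch(clips, pad_value=0.0):
    """Pad non-empty clips of shape (frames_i, mel_bins) to (batch, max_frames, mel_bins).

    Hypothetical helper, not part of Bridge. Returns the padded batch and the
    original frame count of each clip.
    """
    if not clips:
        raise ValueError("expected at least one clip")
    mel_bins = len(clips[0][0])
    for clip in clips:
        for frame in clip:
            if len(frame) != mel_bins:
                raise ValueError("all frames must have the same number of mel bins")
    lengths = [len(clip) for clip in clips]
    max_frames = max(lengths)
    batch = [
        [list(frame) for frame in clip]
        + [[pad_value] * mel_bins for _ in range(max_frames - len(clip))]
        for clip in clips
    ]
    return batch, lengths


# Two toy clips with 3 and 5 frames, 4 mel bins each (toy sizes, not real ones).
clip_a = [[0.1] * 4 for _ in range(3)]
clip_b = [[0.2] * 4 for _ in range(5)]

batch, lengths = pad_mel_batch([clip_a, clip_b])
shape = (len(batch), len(batch[0]), len(batch[0][0]))
print(shape, lengths)
```

In practice the mel spectrograms would come from a separate feature-extraction step run before the encoder, since the wrapper itself does not include one.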

Initialization

__setattr__(name, value)#

set_input_tensor(input_tensor)#

Placeholder implementation of the pipeline-parallel set_input_tensor hook.

forward(sound_clips, sound_length)#