`bridge.models.nemotron_omni.nemotron_omni_utils`#

Module Contents#

Functions#

`load_audio`	Load an audio file and resample it to `target_sr` Hz.
`compute_mel_features`	Convert a raw waveform to a mel spectrogram tensor.
`compute_audio_token_count`	Compute the expected number of audio tokens for a waveform.

API#

bridge.models.nemotron_omni.nemotron_omni_utils.load_audio(path: str, target_sr: int = 16000) → numpy.ndarray#

Load an audio file and resample it to target_sr Hz.

Supports WAV, MP3, FLAC, and other formats handled by soundfile (with librosa as a fallback for MP3 and other FFmpeg-decoded formats).

Parameters:

path – Path to the audio file.
target_sr – Target sampling rate in Hz.

Returns:

1-D float32 NumPy array containing the mono waveform sampled at target_sr.
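The return contract above (1-D, float32, mono, at target_sr) can be illustrated with a small NumPy-only sketch. This is not the implementation of load_audio: the helper names `to_mono_float32` and `resample_linear` are hypothetical, channel averaging is an assumed downmix strategy, and linear interpolation stands in for whatever resampler the real function uses.

```python
import numpy as np


def to_mono_float32(samples: np.ndarray) -> np.ndarray:
    """Downmix (frames, channels) audio to a 1-D float32 array.

    Assumption: channels are averaged. The real load_audio may differ.
    """
    samples = np.asarray(samples, dtype=np.float32)
    if samples.ndim == 2:
        samples = samples.mean(axis=1)
    return samples


def resample_linear(samples: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Naive linear-interpolation resampler (illustrative only, no anti-aliasing)."""
    if orig_sr == target_sr:
        return samples.astype(np.float32)
    n_out = int(round(len(samples) * target_sr / orig_sr))
    old_t = np.arange(len(samples)) / orig_sr   # timestamps of input samples
    new_t = np.arange(n_out) / target_sr        # timestamps of output samples
    return np.interp(new_t, old_t, samples).astype(np.float32)


# One second of silent stereo audio at 44.1 kHz, shaped (frames, channels).
stereo = np.zeros((44100, 2), dtype=np.float32)
mono = resample_linear(to_mono_float32(stereo), orig_sr=44100)
print(mono.shape, mono.dtype)
```

For real audio, a proper band-limited resampler is preferable to linear interpolation, which can introduce aliasing when downsampling.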

bridge.models.nemotron_omni.nemotron_omni_utils.compute_mel_features(waveform: Union[numpy.ndarray, list], sampling_rate: int = 16000, num_mel_bins: int = 128) → torch.Tensor#

Convert a raw waveform to a mel spectrogram tensor.

Uses HF ParakeetFeatureExtractor (from transformers) to produce mel features compatible with BridgeSoundEncoder / ParakeetEncoder.

Parameters:

waveform – 1-D float32 numpy array (or list) of the mono waveform.
sampling_rate – Sampling rate of waveform in Hz (must match the rate the feature extractor expects).
num_mel_bins – Number of mel frequency bins.

Returns:

Float tensor of shape (frames, num_mel_bins): a single clip, ready to be batched and passed to the model as sound_clips.
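A minimal usage sketch, assuming the module is importable under the path shown on this page (your installation may need a package prefix). The wrapper name `prepare_sound_clip` is hypothetical, and adding a leading batch dimension with `unsqueeze(0)` is an assumption about how a single clip is batched, not something this page specifies.

```python
def prepare_sound_clip(path: str):
    """Load one audio file and turn it into a batched mel tensor.

    Hypothetical helper: only the two library calls and their parameters
    come from the documentation above.
    """
    # Imported lazily so this sketch can be defined without the package installed.
    from bridge.models.nemotron_omni.nemotron_omni_utils import (
        compute_mel_features,
        load_audio,
    )

    # 1-D float32 mono waveform at 16 kHz.
    waveform = load_audio(path, target_sr=16000)

    # Tensor of shape (frames, num_mel_bins); the sampling rate must match
    # the one used when loading.
    mel = compute_mel_features(waveform, sampling_rate=16000, num_mel_bins=128)

    # Assumed batching convention: (1, frames, num_mel_bins).
    return mel.unsqueeze(0)
```

Keeping `target_sr` and `sampling_rate` identical avoids feeding the extractor a waveform at a rate it was not configured for.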

bridge.models.nemotron_omni.nemotron_omni_utils.compute_audio_token_count(waveform: Union[numpy.ndarray, list], hop_length: int = 160, subsampling_factor: int = 8) → int#

Compute the expected number of audio tokens for a waveform.

Uses the same Conv2D subsampling math as ParakeetEncoder / ParakeetEncoderSubsamplingConv2D: a convolution with kernel_size=3, stride=2, padding=1 is applied log2(subsampling_factor) times to the mel frame count (3 times for the default factor of 8).

Parameters:

waveform – 1-D waveform array (only its length is used).
hop_length – Hop length in samples for mel feature extraction.
subsampling_factor – Subsampling factor of the conformer encoder.

Returns:

Number of audio tokens (at least 1).
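The subsampling arithmetic described above can be sketched in plain Python. The function names are hypothetical, and the sketch starts from a mel frame count rather than a waveform, because this page does not state how frames are derived from the sample count (roughly `len(waveform) / hop_length`, with the exact rounding depending on the extractor's padding).

```python
import math


def conv_out_len(length: int, kernel_size: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Standard output-length formula for one strided convolution."""
    return (length + 2 * padding - kernel_size) // stride + 1


def tokens_from_frames(num_frames: int, subsampling_factor: int = 8) -> int:
    """Apply the stride-2 convolution log2(subsampling_factor) times."""
    num_layers = round(math.log2(subsampling_factor))
    length = num_frames
    for _ in range(num_layers):
        length = conv_out_len(length)
    # The documented function returns at least 1 token.
    return max(length, 1)


# About one second of 16 kHz audio with hop_length=160 gives roughly 100 frames.
print(tokens_from_frames(100))  # 100 -> 50 -> 25 -> 13
```

With the defaults, each second of 16 kHz audio yields about 100 mel frames and therefore about 12.5 tokens, rounded up at each layer by the padding.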
