nvidia.dali.fn.decoders.audio

nvidia.dali.fn.decoders.audio(__input, /, *, bytes_per_sample_hint=[0], downmix=False, dtype=DALIDataType.FLOAT, preserve=False, quality=50.0, sample_rate=0.0, device=None, name=None)

Decodes waveforms from encoded audio data.

It supports the following audio formats: WAV, FLAC, and OGG (including both OGG Vorbis and OGG Opus).

This operator produces the following outputs:

output[0]: A batch of decoded data.
output[1]: A batch of sampling rates [Hz].
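As a rough, DALI-free illustration of what the two outputs hold for a single encoded sample, the sketch below uses Python's standard `wave` module (WAV only, 16-bit PCM only) to produce a frames-by-channels array with interleaved channels plus the sampling rate. The helpers `make_wav` and `decode_wav` are hypothetical and are not part of DALI; they only mirror the shape of `output[0]` and `output[1]` for one batch element.

```python
import io
import struct
import wave


def make_wav(interleaved, channels=2, rate=16000):
    """Encode interleaved 16-bit PCM values as an in-memory WAV file (test data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(interleaved), *interleaved))
    return buf.getvalue()


def decode_wav(encoded):
    """Decode a 16-bit PCM WAV into (samples, rate).

    `samples` is a frames x channels list of tuples (each row is one frame
    with its channels interleaved); `rate` is the sampling rate in Hz.
    """
    with wave.open(io.BytesIO(encoded), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = w.getnchannels()
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    flat = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Split the flat interleaved stream into one row per frame.
    samples = [flat[i:i + channels] for i in range(0, len(flat), channels)]
    return samples, rate


encoded = make_wav([1, -1, 2, -2, 3, -3])
samples, rate = decode_wav(encoded)
print(samples, rate)  # [(1, -1), (2, -2), (3, -3)] 16000
```

In the operator itself, these two values arrive as two separate batches, one entry per input sample.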

Supported backends

‘cpu’

Parameters:

__input (TensorList) – Input to the operator.

Keyword Arguments:

bytes_per_sample_hint (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.

If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
downmix (bool, optional, default = False) –
If set to True, downmix all input channels to mono.

If downmixing is turned on, the decoder output is 1D. If downmixing is turned off, it produces 2D output with interleaved channels.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –
Output data type.

Supported types: INT16, INT32, FLOAT.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
quality (float, optional, default = 50.0) –
Resampling quality, where 0 is the lowest and 100 is the highest.

0 gives 3 lobes of the sinc filter, 50 gives 16 lobes, and 100 gives 64 lobes.
sample_rate (float or TensorList of float, optional, default = 0.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.
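For `bytes_per_sample_hint`, "sample" means one element of the batch (one audio clip), so a reasonable hint can be estimated from the longest clip you expect. The helper `audio_bytes_hint` below is hypothetical and assumes the decoded output is a dense frames × channels array with no padding. With `downmix=True`, pass `channels=1`. When `sample_rate` is set, pass the target rate rather than the source rate.

```python
import math

# Bytes per element for the dtypes listed as supported by the decoder.
ITEMSIZE = {"INT16": 2, "INT32": 4, "FLOAT": 4}


def audio_bytes_hint(max_seconds, sample_rate, channels, dtype="FLOAT"):
    """Hypothetical helper: decoded size in bytes of one clip lasting at most
    `max_seconds`, assuming a dense frames x channels output array."""
    frames = math.ceil(max_seconds * sample_rate)
    return frames * channels * ITEMSIZE[dtype]


# 10 s stereo clips decoded to FLOAT at 48 kHz:
print(audio_bytes_hint(10, 48000, 2))  # 3840000
```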
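The effect of `downmix` on the output shape can be sketched as collapsing each row of the 2D frames × channels output into a single value, which leaves a 1D array. This sketch assumes an equal-weight average of the channels; DALI's exact weighting is not stated here and may differ. The function name `downmix` is illustrative only.

```python
def downmix(frames):
    """Collapse a frames x channels array (list of rows) into a 1D mono list.

    Assumption: every channel gets the same weight (a plain average).
    """
    return [sum(row) / len(row) for row in frames]


stereo = [(0.5, -0.5), (1.0, 0.0), (0.25, 0.75)]  # 3 frames, 2 channels
mono = downmix(stereo)                             # 3 values, 1D
print(mono)  # [0.0, 0.5, 0.5]
```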
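To show how the `dtype` choices relate to one another, the sketch below converts a normalized float sample to each supported type. It assumes that FLOAT output lies in [-1, 1] and that the integer types are scaled to their full range with saturation. This is a common convention for audio, not a statement of DALI's exact conversion, and `to_dtype` is a hypothetical name.

```python
def to_dtype(x, dtype):
    """Convert a normalized float sample to "FLOAT", "INT16" or "INT32".

    Assumption: integer outputs are scaled to the full range of the type and
    clamped (saturated) at its limits.
    """
    if dtype == "FLOAT":
        return float(x)
    bits = {"INT16": 16, "INT32": 32}[dtype]
    hi = 2 ** (bits - 1) - 1      # e.g. 32767 for INT16
    lo = -(2 ** (bits - 1))       # e.g. -32768 for INT16
    return max(lo, min(hi, round(x * hi)))


print(to_dtype(1.0, "INT16"), to_dtype(2.0, "INT16"), to_dtype(-2.0, "INT16"))
```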
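The `quality` parameter trades speed for accuracy by changing how many lobes of the sinc filter are kept. The sketch below evaluates a truncated sinc kernel for a given lobe count. The mapping in `DOCUMENTED_LOBES` comes from the parameter description above. The Hann taper and the reading of "lobes" as zero crossings on each side of the center are assumptions, because the description does not define either. A wider kernel means more input samples contribute to each output sample, which is why higher quality costs more compute.

```python
import math

# Lobe counts stated in the parameter description for three quality values.
DOCUMENTED_LOBES = {0: 3, 50: 16, 100: 64}


def windowed_sinc(x, lobes):
    """Sinc kernel cut off after `lobes` zero crossings per side and tapered
    with a Hann window (the window choice is an assumption)."""
    if abs(x) >= lobes:
        return 0.0
    if x == 0:
        return 1.0
    sinc = math.sin(math.pi * x) / (math.pi * x)
    hann = 0.5 * (1.0 + math.cos(math.pi * x / lobes))
    return sinc * hann


def support(lobes):
    """Kernel width in input samples under the per-side lobe convention."""
    return 2 * lobes


for quality, lobes in DOCUMENTED_LOBES.items():
    print(quality, lobes, support(lobes))
```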
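For `sample_rate`, the sketch below resamples a 1D signal by band-limited (windowed-sinc) interpolation. It shows why the output length scales by `out_rate / in_rate` and why downsampling needs a lower cutoff to avoid aliasing. It is a generic textbook resampler written for illustration, not DALI's implementation. The Hann-tapered kernel, the name `resample`, and the default `lobes=16`, which matches the documented value for `quality=50`, are assumptions. Near the ends of the signal the kernel is simply truncated, so edge samples come out slightly attenuated.

```python
import math


def _kernel(x, lobes):
    """Hann-tapered sinc, zero outside |x| < lobes (assumed kernel shape)."""
    if abs(x) >= lobes:
        return 0.0
    if x == 0:
        return 1.0
    sinc = math.sin(math.pi * x) / (math.pi * x)
    return sinc * 0.5 * (1.0 + math.cos(math.pi * x / lobes))


def resample(signal, in_rate, out_rate, lobes=16):
    """Resample a 1D list of samples from in_rate to out_rate (both in Hz)."""
    if out_rate == in_rate:
        return list(signal)
    scale = out_rate / in_rate
    n_out = int(round(len(signal) * scale))
    # When downsampling, lower the cutoff (and widen the kernel) to low-pass filter.
    cutoff = min(1.0, scale)
    reach = lobes / cutoff  # kernel half-width measured in input samples
    out = []
    for j in range(n_out):
        t = j / scale  # position of output sample j on the input time axis
        first = max(int(math.floor(t - reach)), 0)
        last = min(int(math.ceil(t + reach)), len(signal) - 1)
        acc = 0.0
        for i in range(first, last + 1):
            acc += signal[i] * cutoff * _kernel((t - i) * cutoff, lobes)
        out.append(acc)
    return out


up = resample([1.0, 2.0, 3.0, 4.0], 8000, 16000)  # 4 samples -> 8 samples
down = resample([1.0] * 200, 16000, 8000)          # 200 samples -> 100 samples
print(len(up), len(down))
```

When upsampling by an integer factor, every output sample that falls exactly on an input sample reproduces that input value, because the kernel is 1 at zero and 0 at every other integer offset.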