nvidia.dali.fn.decoders.audio

nvidia.dali.fn.decoders.audio(__input, /, *, bytes_per_sample_hint=[0], downmix=False, dtype=DALIDataType.FLOAT, preserve=False, quality=50.0, sample_rate=0.0, seed=-1, device=None, name=None)

Decodes waveforms from encoded audio data.

It supports the following audio formats: wav, flac and ogg. This operator produces the following outputs:

  • output[0]: A batch of decoded data

  • output[1]: A batch of sampling rates [Hz].

Supported backends
  • ‘cpu’

Parameters:

__input (TensorList) – Input to the operator.

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • downmix (bool, optional, default = False) –

    If set to True, downmix all input channels to mono.

    If downmixing is turned on, the decoder output is 1D. If downmixing is turned off, it produces 2D output with interleaved channels.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –

    Output data type.

    Supported types: INT16, INT32, FLOAT.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • quality (float, optional, default = 50.0) –

    Resampling quality, where 0 is the lowest, and 100 is the highest.

    0 gives 3 lobes of the sinc filter, 50 gives 16 lobes, and 100 gives 64 lobes.

  • sample_rate (float or TensorList of float, optional, default = 0.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.