nvidia.dali.experimental.dynamic.mel_filter_bank

nvidia.dali.experimental.dynamic.mel_filter_bank(input, /, *, batch_size=None, device=None, freq_high=None, freq_low=None, mel_formula=None, nfilter=None, normalize=None, sample_rate=None)

Converts a spectrogram to a mel spectrogram by applying a bank of triangular filters.

The frequency (‘f’) dimension is selected from the input layout. If the input has no layout, “f”, “ft”, or “*ft” is assumed, depending on the number of dimensions.
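As a rough illustration of the two paragraphs above, the following pure-Python sketch picks the frequency axis and applies a filter matrix to a 2-D “ft” spectrogram. The helper names `frequency_axis` and `apply_filter_bank` are hypothetical and are not part of DALI. The sketch also assumes that “*ft” means any number of leading dimensions followed by frequency and time.

```python
def frequency_axis(ndim, layout=None):
    """Return the index of the frequency axis (hypothetical helper, not DALI API)."""
    if layout:
        if len(layout) != ndim:
            raise ValueError("layout length must match the number of dimensions")
        axis = layout.find("f")
        if axis < 0:
            raise ValueError("layout has no 'f' dimension")
        return axis
    # No layout: assume "f" (1-D), "ft" (2-D), or "*ft" (leading dims, then f, t).
    return max(0, ndim - 2)


def apply_filter_bank(weights, spectrogram_ft):
    """Apply an (nfilter x nfreq) weight matrix to an (nfreq x ntime) spectrogram.

    Each output row is a weighted sum of the input frequency bins, so the
    result has shape (nfilter x ntime).
    """
    columns = list(zip(*spectrogram_ft))  # one tuple of frequency bins per time step
    return [
        [sum(w * s for w, s in zip(row, col)) for col in columns]
        for row in weights
    ]


# Usage: two illustrative filters over three frequency bins and two time steps.
weights = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
spectrogram = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mel = apply_filter_bank(weights, spectrogram)
```

The weight matrix here is made up for the example; in the operator, the weights are triangular filters derived from the keyword arguments described below.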

Supported backends
  • ‘cpu’

  • ‘gpu’

Parameters:

input (Tensor/Batch) – Input to the operator.

Keyword Arguments:
  • freq_high (float, optional, default = 0.0) –

    The maximum frequency.

    If this value is not provided, sample_rate/2 is used.

  • freq_low (float, optional, default = 0.0) – The minimum frequency.

  • mel_formula (str, optional, default = ‘slaney’) –

    Determines the formula that will be used to convert frequencies from hertz to mel and from mel to hertz.

    The mel scale is a perceptual scale of pitches, so there is no single formula.

    The supported values are:

    • slaney, which follows the behavior of Slaney’s MATLAB Auditory Modelling Work.
      This formula is linear below 1 kHz and logarithmic above it. The implementation is consistent with Librosa’s default implementation.
    • htk, which follows O’Shaughnessy’s book formula, m = 2595 * log10(1 + (f/700)).
      This formula is consistent with the implementation in the Hidden Markov Model Toolkit (HTK).

  • nfilter (int, optional, default = 128) – Number of mel filters.

  • normalize (bool, optional, default = True) –

    Determines whether to normalize the triangular filter weights by the width of their frequency bands.

    • If set to True, the integral of the filter function is 1.

    • If set to False, the peak of the filter function is 1.

  • sample_rate (float, optional, default = 44100.0) – Sampling rate of the audio signal.
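
The keyword arguments above can be illustrated with a minimal pure-Python sketch. It is not DALI's implementation, and all function names are hypothetical. It makes three assumptions: the slaney constants are those commonly used by Librosa (200/3 Hz per mel below 1 kHz, and a logarithmic step of ln(6.4)/27 above it), filter edges are spaced evenly on the mel scale between freq_low and freq_high, and an unset freq_high is modelled as None, which falls back to sample_rate/2.

```python
import math

# Slaney-style constants, assumed to match Librosa's defaults.
F_SP = 200.0 / 3.0                 # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0                # the logarithmic region starts at 1 kHz
MIN_LOG_MEL = MIN_LOG_HZ / F_SP    # mel value at 1 kHz (about 15)
LOG_STEP = math.log(6.4) / 27.0    # step size in the logarithmic region


def hz_to_mel(f, mel_formula="slaney"):
    """Convert a frequency in hertz to mel."""
    if mel_formula == "htk":
        return 2595.0 * math.log10(1.0 + f / 700.0)
    if mel_formula == "slaney":
        if f < MIN_LOG_HZ:
            return f / F_SP
        return MIN_LOG_MEL + math.log(f / MIN_LOG_HZ) / LOG_STEP
    raise ValueError(f"unsupported mel_formula: {mel_formula!r}")


def mel_to_hz(m, mel_formula="slaney"):
    """Convert a mel value back to hertz (inverse of hz_to_mel)."""
    if mel_formula == "htk":
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    if mel_formula == "slaney":
        if m < MIN_LOG_MEL:
            return m * F_SP
        return MIN_LOG_HZ * math.exp(LOG_STEP * (m - MIN_LOG_MEL))
    raise ValueError(f"unsupported mel_formula: {mel_formula!r}")


def band_edges(nfilter=128, sample_rate=44100.0, freq_low=0.0,
               freq_high=None, mel_formula="slaney"):
    """Return nfilter + 2 edge frequencies (Hz), evenly spaced in mel."""
    if freq_high is None:
        freq_high = sample_rate / 2.0  # fall back to the Nyquist frequency
    m_lo = hz_to_mel(freq_low, mel_formula)
    m_hi = hz_to_mel(freq_high, mel_formula)
    step = (m_hi - m_lo) / (nfilter + 1)
    return [mel_to_hz(m_lo + i * step, mel_formula) for i in range(nfilter + 2)]


def triangle_weight(f, lo, center, hi, normalize=True):
    """Weight of one triangular filter spanning [lo, hi] at frequency f (Hz)."""
    if f <= lo or f >= hi:
        w = 0.0
    elif f <= center:
        w = (f - lo) / (center - lo)   # rising slope, reaches 1 at the center
    else:
        w = (hi - f) / (hi - center)   # falling slope
    if normalize:
        # The unnormalized triangle has area (hi - lo) / 2; rescale to area 1.
        w *= 2.0 / (hi - lo)
    return w


# Usage: filter k spans edges[k] .. edges[k + 2], with its peak at edges[k + 1].
edges = band_edges(nfilter=4, sample_rate=16000.0)
peak = triangle_weight(edges[1], edges[0], edges[1], edges[2], normalize=False)
```

With normalize=False the peak of each triangle is 1, so wider filters at high frequencies collect more energy. With normalize=True each triangle is scaled by 2 / (hi - lo), which gives every filter unit area.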