# nvidia.dali.fn.mel_filter_bank

nvidia.dali.fn.mel_filter_bank(*inputs, **kwargs)

Converts a spectrogram to a mel spectrogram by applying a bank of triangular filters.

The frequency (‘f’) dimension is selected from the input layout. If the input has no layout, ‘f’, ‘ft’, or ‘*ft’ is assumed, depending on the number of dimensions.
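As a rough illustration of the default-layout rule, the helper below returns the index of the frequency axis for an input that carries no layout. The name `default_freq_axis` is hypothetical and not part of DALI. The sketch also assumes that ‘*ft’ means any number of leading dimensions followed by frequency and time.

```python
def default_freq_axis(ndim: int) -> int:
    """Index of the 'f' axis under the assumed default layouts.

    1 dimension  -> "f"   (axis 0)
    2 dimensions -> "ft"  (axis 0)
    3 or more    -> "*ft" (second-to-last axis, by assumption)
    """
    if ndim < 1:
        raise ValueError("a spectrogram needs at least one dimension")
    return 0 if ndim == 1 else ndim - 2
```

Under these assumptions, a batched input with three dimensions would be filtered along axis 1.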

Supported backends
• ‘cpu’

• ‘gpu’

Parameters

input (TensorList) – Input to the operator.

Keyword Arguments
• bytes_per_sample_hint (int or list of int, optional, default = [0]) –

Output size hint, in bytes per sample.

If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

• freq_high (float, optional, default = 0.0) –

The maximum frequency.

If this value is not provided, sample_rate/2 is used.

• freq_low (float, optional, default = 0.0) – The minimum frequency.

• mel_formula (str, optional, default = ‘slaney’) –

Determines the formula that will be used to convert frequencies from hertz to mel and from mel to hertz.

The mel scale is a perceptual scale of pitches, so there is no single canonical conversion formula.

The supported values are:

• slaney, which follows Slaney’s MATLAB Auditory Modelling Work behavior.
This formula is linear below 1 kHz and logarithmic above this value. The implementation is consistent with Librosa’s default implementation.
• htk, which follows O’Shaughnessy’s book formula, m = 2595 * log10(1 + (f/700)).
This value is consistent with the implementation of the Hidden Markov Model Toolkit (HTK).
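The two conversions can be sketched in plain Python as follows. The htk functions implement the formula quoted above. The slaney constants (200/3 Hz per mel below 1 kHz, and a logarithmic step of ln(6.4)/27 above it) are the ones used by Librosa's Slaney-style conversion; that DALI uses exactly the same constants internally is an assumption. All function names here are illustrative, not DALI APIs.

```python
import math


def hz_to_mel_htk(f: float) -> float:
    """HTK formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz_htk(m: float) -> float:
    """Inverse of the HTK formula."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


# Slaney-style constants as used by Librosa (assumed to match DALI).
F_SP = 200.0 / 3.0                # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0               # start of the logarithmic region
MIN_LOG_MEL = MIN_LOG_HZ / F_SP   # mel value at 1 kHz (15)
LOG_STEP = math.log(6.4) / 27.0   # step size in the logarithmic region


def hz_to_mel_slaney(f: float) -> float:
    """Linear below 1 kHz, logarithmic above."""
    if f < MIN_LOG_HZ:
        return f / F_SP
    return MIN_LOG_MEL + math.log(f / MIN_LOG_HZ) / LOG_STEP


def mel_to_hz_slaney(m: float) -> float:
    """Inverse of the Slaney-style conversion."""
    if m < MIN_LOG_MEL:
        return m * F_SP
    return MIN_LOG_HZ * math.exp(LOG_STEP * (m - MIN_LOG_MEL))
```

Note that the two formulas produce values on very different numeric ranges, so filter banks built with one formula are not interchangeable with those built with the other.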

• nfilter (int, optional, default = 128) – Number of mel filters.

• normalize (bool, optional, default = True) –

Determines whether to normalize the triangular filter weights by the width of their frequency bands.

• If set to True, the integral of the filter function is 1.

• If set to False, the peak of the filter function will be 1.
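The effect of `normalize` on a single filter can be sketched as below. This is a minimal illustration, not DALI's implementation: the function names and the band edges are hypothetical, and the scaling factor 2 / (f_right - f_left) is simply the value that makes the area of a unit-height triangle equal to 1.

```python
def triangular_weight(f, f_left, f_center, f_right, normalize=True):
    """Weight of one triangular filter at frequency f (in Hz)."""
    if f <= f_left or f >= f_right:
        return 0.0
    if f <= f_center:
        w = (f - f_left) / (f_center - f_left)    # rising slope
    else:
        w = (f_right - f) / (f_right - f_center)  # falling slope
    if normalize:
        # A unit-height triangle has area (f_right - f_left) / 2,
        # so this factor scales the area to 1.
        w *= 2.0 / (f_right - f_left)
    return w


def filter_area(f_left, f_center, f_right, normalize, steps=3000):
    """Midpoint-rule integral of the filter over its band."""
    df = (f_right - f_left) / steps
    return sum(
        triangular_weight(f_left + (i + 0.5) * df,
                          f_left, f_center, f_right, normalize) * df
        for i in range(steps)
    )
```

For example, with band edges 100, 200, and 400 Hz, the unnormalized filter peaks at 1 and has an area of 150, while the normalized filter has an area of 1 and a peak of 2/300.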

• preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

• sample_rate (float, optional, default = 44100.0) – Sampling rate of the audio signal.

• seed (int, optional, default = -1) –

Random seed.

If not provided, it will be populated based on the global seed of the pipeline.