nvidia.dali.fn.mel_filter_bank

nvidia.dali.fn.mel_filter_bank(__input, /, *, bytes_per_sample_hint=[0], freq_high=0.0, freq_low=0.0, mel_formula='slaney', nfilter=128, normalize=True, preserve=False, sample_rate=44100.0, device=None, name=None)

Converts a spectrogram to a mel spectrogram by applying a bank of triangular filters.

The frequency (‘f’) dimension is selected from the input layout. If the input has no layout, “f”, “ft”, or “*ft” is assumed, depending on the number of dimensions.
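Conceptually, applying a filter bank is a weighted sum over the frequency axis: each output mel band is the dot product of one filter's weights with the spectrum at a given time step. The pure-Python sketch below illustrates this for an “ft” layout. The helper name `apply_filter_bank` and the toy weights are illustrative assumptions, not part of DALI.

```python
def apply_filter_bank(weights, spectrogram):
    """Apply a filter bank along the frequency axis of an "ft" spectrogram.

    weights:     nfilter rows, each holding one weight per frequency bin
    spectrogram: one row per frequency bin, each holding one value per time step
    returns:     nfilter rows, each holding one value per time step
    """
    n_frames = len(spectrogram[0])
    return [
        [sum(w * spectrogram[f][t] for f, w in enumerate(row))
         for t in range(n_frames)]
        for row in weights
    ]


# Toy data: 4 frequency bins, 2 time steps, 2 filters (hypothetical weights).
spectrogram = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
weights = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
mel = apply_filter_bank(weights, spectrogram)
```

The output keeps the time axis and replaces the frequency axis with one row per filter, which is why the operator needs to know which dimension is ‘f’.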

Supported backends

‘cpu’
‘gpu’

Parameters:

__input (TensorList) – Input to the operator.

Keyword Arguments:

bytes_per_sample_hint (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.

If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
freq_high (float, optional, default = 0.0) –
The maximum frequency.

If this value is not provided, sample_rate/2 is used.
freq_low (float, optional, default = 0.0) – The minimum frequency.
mel_formula (str, optional, default = ‘slaney’) –
Determines the formula that will be used to convert frequencies from hertz to mel and from mel to hertz.

The mel scale is a perceptual scale of pitches, so there is no single formula.

The supported values are:
- slaney, which follows the behavior of Slaney’s MATLAB Auditory Modelling Work.
  
  This formula is linear below 1 kHz and logarithmic above this value. The implementation is consistent with Librosa’s default implementation.
- htk, which follows O’Shaughnessy’s book formula, m = 2595 * log10(1 + (f/700)).
  
  This value is consistent with the implementation of the Hidden Markov Model Toolkit (HTK).
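The two conversions can be sketched in plain Python as follows. The function names are illustrative, and the Slaney constants (200/3 Hz per mel in the linear region, a break at 1000 Hz, and a logarithmic step of ln(6.4)/27) are the ones used by Librosa's Slaney-style conversion; this is an assumption about the formula's usual form, and DALI's internal code may differ in detail.

```python
import math


def hz_to_mel_htk(f):
    """HTK formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def hz_to_mel_slaney(f):
    """Slaney-style formula: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0               # Hz per mel in the linear region
    break_hz = 1000.0                # start of the logarithmic region
    break_mel = break_hz / f_sp      # mel value at the break (about 15)
    logstep = math.log(6.4) / 27.0   # log-frequency step per mel above the break
    if f < break_hz:
        return f / f_sp
    return break_mel + math.log(f / break_hz) / logstep
```

Note that the two formulas produce mel values on different numeric scales, so filter banks built with `slaney` and `htk` are not interchangeable.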
nfilter (int, optional, default = 128) – Number of mel filters.
normalize (bool, optional, default = True) –
Determines whether to normalize the triangular filter weights by the width of their frequency bands.
- If set to True, the integral of the filter function is 1.
- If set to False, the peak of the filter function is 1.
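The effect of the two settings can be illustrated with a single triangular filter. A triangle with peak 1 and base width W has area W/2, so scaling it by 2/W gives unit area. The sketch below assumes this common form of width normalization; the names `triangle_weight` and `integrate` and the band edges are hypothetical and not taken from DALI.

```python
def triangle_weight(f, f_left, f_center, f_right, normalize=True):
    """Weight of one triangular filter at frequency f (all values in Hz)."""
    if f <= f_left or f >= f_right:
        return 0.0
    if f <= f_center:
        w = (f - f_left) / (f_center - f_left)    # rising edge
    else:
        w = (f_right - f) / (f_right - f_center)  # falling edge
    if normalize:
        w *= 2.0 / (f_right - f_left)             # scale so the area is 1
    return w


def integrate(f_left, f_center, f_right, normalize, step=1.0):
    """Midpoint Riemann sum of the filter over its frequency band."""
    n = int(round((f_right - f_left) / step))
    return sum(
        triangle_weight(f_left + (i + 0.5) * step,
                        f_left, f_center, f_right, normalize) * step
        for i in range(n)
    )


# Hypothetical band: 100 Hz to 400 Hz with its center at 200 Hz.
peak_unnormalized = triangle_weight(200.0, 100.0, 200.0, 400.0, normalize=False)
area_normalized = integrate(100.0, 200.0, 400.0, normalize=True)
area_unnormalized = integrate(100.0, 200.0, 400.0, normalize=False)
```

With normalization, wide high-frequency filters get proportionally smaller weights, so each band contributes comparable energy; without it, every filter peaks at 1 and wider bands accumulate more energy.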
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
sample_rate (float, optional, default = 44100.0) – Sampling rate of the audio signal.