nvidia.dali.fn.mel_filter_bank
nvidia.dali.fn.mel_filter_bank(*inputs, **kwargs)
Converts a spectrogram to a mel spectrogram by applying a bank of triangular filters.
The frequency (‘f’) dimension is selected from the input layout. If the input has no layout, “f”, “ft”, or “*ft” is assumed, depending on the number of dimensions.
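The axis selection rule can be sketched in plain Python. The helper name `frequency_axis` is hypothetical and not part of DALI. It assumes layouts are strings with one character per dimension, and it reads “*ft” as "any number of leading dimensions, then frequency, then time".

```python
def frequency_axis(ndim, layout=""):
    """Return the index of the frequency ('f') dimension.

    Hypothetical helper for illustration only, not DALI's implementation.
    """
    if ndim < 1:
        raise ValueError("at least one dimension is required")
    if layout:
        if len(layout) != ndim:
            raise ValueError("layout length must match the number of dimensions")
        if "f" not in layout:
            raise ValueError("layout has no 'f' dimension")
        return layout.index("f")
    # No layout: assume "f" (1D), "ft" (2D), or "*ft" (3D and higher),
    # so frequency is the second-to-last dimension whenever ndim >= 2.
    return 0 if ndim == 1 else ndim - 2
```

Under these assumptions, a 2D input without a layout is treated as “ft”, while an explicit “tf” layout moves the frequency axis to index 1.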
- Supported backends
‘cpu’
‘gpu’
- Parameters
input (TensorList) – Input to the operator.
- Keyword Arguments
bytes_per_sample_hint (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
freq_high (float, optional, default = 0.0) –
The maximum frequency.
If this value is not provided, sample_rate/2 is used.
freq_low (float, optional, default = 0.0) – The minimum frequency.
mel_formula (str, optional, default = ‘slaney’) –
Determines the formula that will be used to convert frequencies from hertz to mel and from mel to hertz.
The mel scale is a perceptual scale of pitches, so there is no single formula.
The supported values are:
slaney, which follows the behavior of Slaney’s MATLAB Auditory Modelling Work. This formula is linear below 1 kHz and logarithmic above this value. The implementation is consistent with Librosa’s default implementation.
htk, which follows the formula from O’Shaughnessy’s book, m = 2595 * log10(1 + (f/700)). This value is consistent with the implementation of the Hidden Markov Toolkit (HTK).
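The two conversions can be compared with a short Python sketch. The htk functions use the formula given above and its algebraic inverse. The slaney constants (a linear slope of 200/3 Hz per mel below 1 kHz and a logarithmic step of ln(6.4)/27 above it) are taken from Librosa's Slaney-style conversion and are an assumption about what DALI does, based only on the stated consistency with Librosa. All function names are hypothetical.

```python
import math

def hz_to_mel_htk(f):
    # Formula quoted in the documentation: m = 2595 * log10(1 + f/700).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz_htk(m):
    # Algebraic inverse of hz_to_mel_htk.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Assumed constants, following Librosa's Slaney-style conversion.
F_SP = 200.0 / 3.0                # Hz per mel in the linear region
MIN_LOG_HZ = 1000.0               # boundary between linear and log regions
MIN_LOG_MEL = MIN_LOG_HZ / F_SP   # mel value at the boundary (about 15)
LOG_STEP = math.log(6.4) / 27.0   # step size in the logarithmic region

def hz_to_mel_slaney(f):
    if f < MIN_LOG_HZ:
        return f / F_SP
    return MIN_LOG_MEL + math.log(f / MIN_LOG_HZ) / LOG_STEP

def mel_to_hz_slaney(m):
    if m < MIN_LOG_MEL:
        return m * F_SP
    return MIN_LOG_HZ * math.exp(LOG_STEP * (m - MIN_LOG_MEL))
```

The two scales assign different mel values to the same frequency, so filter banks built with different `mel_formula` settings are not interchangeable.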
nfilter (int, optional, default = 128) – Number of mel filters.
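As a rough illustration of how `nfilter`, `freq_low`, `freq_high`, and `sample_rate` interact, the sketch below places `nfilter + 2` edge frequencies evenly on the mel scale, which is the common textbook construction for a triangular filter bank. It is not confirmed to be DALI's exact procedure. The helper `mel_band_edges` is hypothetical, it uses the htk formula for brevity, and it assumes that a non-positive `freq_high` means "not provided".

```python
import math

def mel_band_edges(nfilter=128, sample_rate=44100.0, freq_low=0.0, freq_high=0.0):
    """Return nfilter + 2 edge frequencies in Hz (hypothetical helper)."""
    # Assumption: a non-positive freq_high stands for "not provided",
    # in which case sample_rate / 2 is used.
    if freq_high <= 0.0:
        freq_high = sample_rate / 2.0
    if not 0.0 <= freq_low < freq_high:
        raise ValueError("expected 0 <= freq_low < freq_high")

    def to_mel(f):
        return 2595.0 * math.log10(1.0 + f / 700.0)

    def to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_low, mel_high = to_mel(freq_low), to_mel(freq_high)
    step = (mel_high - mel_low) / (nfilter + 1)
    # Filter k rises from edges[k] to edges[k + 1] and falls to edges[k + 2].
    return [to_hz(mel_low + i * step) for i in range(nfilter + 2)]

edges = mel_band_edges(nfilter=128, sample_rate=44100.0)
```

In this construction the edges are evenly spaced in mel, so the filters get wider in hertz as frequency increases.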
normalize (bool, optional, default = True) –
Determines whether to normalize the triangular filter weights by the width of their frequency bands.
If set to True, the integral of the filter function is 1.
If set to False, the peak of the filter function will be 1.
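The effect of `normalize` can be illustrated with a single triangular filter. This is a minimal sketch, not DALI's implementation: the helper `triangular_filter` and the edge frequencies are made up for illustration. A unit-peak triangle spanning `f_left` to `f_right` has area `(f_right - f_left) / 2`, so scaling by `2 / (f_right - f_left)` gives unit area.

```python
def triangular_filter(freqs, f_left, f_center, f_right, normalize=True):
    """Weights of one triangular filter at the given frequencies in Hz.

    Hypothetical helper for illustration only.
    """
    # With normalize=True the continuous integral of the filter is 1;
    # with normalize=False the peak of the filter is 1.
    scale = 2.0 / (f_right - f_left) if normalize else 1.0
    weights = []
    for f in freqs:
        if f_left < f <= f_center:
            w = (f - f_left) / (f_center - f_left)      # rising slope
        elif f_center < f < f_right:
            w = (f_right - f) / (f_right - f_center)    # falling slope
        else:
            w = 0.0
        weights.append(scale * w)
    return weights

# Frequencies sampled every 1 Hz, so a plain sum approximates the integral.
freqs = [float(f) for f in range(0, 401)]
peak_weights = triangular_filter(freqs, 100.0, 200.0, 300.0, normalize=False)
area_weights = triangular_filter(freqs, 100.0, 200.0, 300.0, normalize=True)
```

With width normalization, wide high-frequency filters get proportionally smaller weights, so they do not accumulate more energy simply because they cover more spectrogram bins.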
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
sample_rate (float, optional, default = 44100.0) – Sampling rate of the audio signal.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.