nvidia.dali.fn.nonsilent_region

nvidia.dali.fn.nonsilent_region(__input, /, *, bytes_per_sample_hint=[0], cutoff_db=-60.0, preserve=False, reference_power=0.0, reset_interval=8192, seed=-1, window_length=2048, device=None, name=None)

Performs leading and trailing silence detection in an audio buffer.

The operator returns the beginning and length of the non-silent region by comparing the short term power calculated for window_length of the signal with a silence cut-off threshold. The signal is considered to be silent when the short_term_power_db is less than the cutoff_db. where:

short_term_power_db = 10 * log10( short_term_power / reference_power )

Unless specified otherwise, reference_power is the maximum power of the signal.

Inputs and outputs:

  • Input 0 - 1D audio buffer.

  • Output 0 - Index of the first sample in the nonsilent region.

  • Output 1 - Length of nonsilent region.

Note

If Outputs[1] == 0, the value in Outputs[0] is undefined.

Warning

At this moment, the ‘gpu’ backend of this operator is implemented in terms of the ‘cpu’ implementation. This results in a device-to-host copy of the inputs and a host-to-device copy of the outputs. While using the ‘gpu’ implementation of this operator doesn’t add any performance benefit on its own, using it might make sense in order to enable moving preceding operations in the pipeline to the GPU.

Supported backends
  • ‘cpu’

  • ‘gpu’

Parameters:

__input (TensorList) – Input to the operator.

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • cutoff_db (float, optional, default = -60.0) – The threshold, in dB, below which the signal is considered silent.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reference_power (float, optional, default = 0.0) –

    The reference power that is used to convert the signal to dB.

    If a value is not provided, the maximum power of the signal will be used as the reference.

  • reset_interval (int, optional, default = 8192) –

    The number of samples after which the moving mean average is recalculated to avoid loss of precision.

    If reset_interval == -1, or the input type allows exact calculation, the average will not be reset. The default value can be used for most of the use cases.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • window_length (int, optional, default = 2048) – Size of the sliding window used to calculate of the short-term power of the signal.