Sound Source Localization

The Sound Source Localization feature analyzes audio packets and computes the direction of the dominant sound source with respect to a reference axis. The direction is computed as an azimuth angle in radians, measured counter-clockwise from the reference axis. In this release, the feature supports only circular microphone arrays with at least 4 microphones.

This feature is available as the Sound Source Localization codelet.

Parameter | Description
audio_duration | The duration of audio, in seconds, used for computing sound source localization. This duration should be an integer multiple of the input audio duration.
microphone_distance | The distance, in meters, between two diagonally opposite microphones that form one pair in the microphone_pairs list. For example, for a ReSpeaker 4-mic array, the microphone distance is 0.064 m.
microphone_pairs | List of pairs of raw audio channel indices of diagonally opposite microphones. The pairs should be listed with the microphones in counter-clockwise order. For example, microphone_pairs: [[1,3], [2,4]] corresponds to microphones 1-4 arranged in counter-clockwise direction.
reference_offset_angle | The angle, in degrees, from the reference axis to the first microphone pair, measured in counter-clockwise direction. Only integer values are accepted.
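The constraints stated in the parameter descriptions can be checked before launching the application. The following is a minimal sketch, not part of the SDK: the function name validate_ssl_config, the EXAMPLE_CONFIG dictionary, the audio_duration of 0.1 seconds, and the reference_offset_angle of 0 are illustrative assumptions. Only the microphone_distance and microphone_pairs values come from the examples above.

```python
import math

# Hypothetical example values. microphone_distance and microphone_pairs follow
# the examples in the parameter descriptions; the other two are placeholders.
EXAMPLE_CONFIG = {
    "audio_duration": 0.1,            # seconds (assumed value)
    "microphone_distance": 0.064,     # meters, ReSpeaker 4-mic array
    "microphone_pairs": [[1, 3], [2, 4]],
    "reference_offset_angle": 0,      # degrees (assumed value)
}


def validate_ssl_config(config, input_audio_duration):
    """Return a list of problems found; an empty list means no problem was found."""
    problems = []

    # audio_duration should be an integer multiple of the input audio duration.
    ratio = config["audio_duration"] / input_audio_duration
    if ratio < 1 or not math.isclose(ratio, round(ratio), abs_tol=1e-9):
        problems.append("audio_duration is not an integer multiple of the input audio duration")

    # microphone_distance is a physical length in meters, so it must be positive.
    if config["microphone_distance"] <= 0:
        problems.append("microphone_distance must be positive")

    # Each pair holds two channel indices, and at least 4 microphones are required.
    pairs = config["microphone_pairs"]
    channels = [channel for pair in pairs for channel in pair]
    if any(len(pair) != 2 for pair in pairs) or len(set(channels)) != len(channels):
        problems.append("microphone_pairs must contain pairs of distinct channel indices")
    elif len(channels) < 4:
        problems.append("at least 4 microphones (2 pairs) are required")

    # Only integer values are accepted for the offset angle.
    if not isinstance(config["reference_offset_angle"], int):
        problems.append("reference_offset_angle must be an integer number of degrees")

    return problems
```

Calling validate_ssl_config(EXAMPLE_CONFIG, 0.1) returns an empty list, while a fractional offset angle or an audio_duration of 0.25 seconds with 0.1-second input packets would each be reported.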

Message | Proto Type | Name
Input | AudioDataProto | audio_packets
Output | SourceAngleState | audio_angle

The computed azimuth angle is converted from radians to degrees. The angle in degrees is available in Sight as angle.
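The conversion itself is simple arithmetic. The sketch below shows it in Python; the function name azimuth_to_degrees is hypothetical, and wrapping the result into the 0 to 360 range is an assumption, since the text does not say whether the codelet normalizes the angle.

```python
import math


def azimuth_to_degrees(azimuth_rad):
    """Convert an azimuth angle from radians to degrees.

    The modulo wraps negative or large angles into the 0 to 360 range.
    This wrapping is an assumption, not documented codelet behavior.
    """
    return math.degrees(azimuth_rad) % 360.0
```

For example, an azimuth of pi/2 radians maps to approximately 90 degrees, and -pi/2 radians maps to approximately 270 degrees.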

The Sound Source Localization sample application demonstrates both the sound source localization and the audio energy calculation features. This application contains four codelets:

  1. Audio Capture codelet: Captures multi-channel audio data from the microphone array.

  2. Sound Source Localization codelet: Computes the angle, in radians, of the direction of the dominant sound source for each 100-millisecond audio packet. A dominant sound is one that spans the frequencies with higher energy when captured by the microphone array. This codelet receives audio packets from the Audio Capture codelet.

  3. Audio Energy Calculation codelet: Computes the average energy of the raw audio channels in the audio packets captured from the microphone array. This codelet receives audio packets from the Audio Capture codelet.

  4. Direction Of Audio Event codelet: A custom codelet that receives the azimuth angles of dominant sound sources from the Sound Source Localization codelet and the audio energies of the audio packets from the Audio Energy Calculation codelet. The two input messages are synchronized by timestamp. The azimuth angles are plotted in Sight only for audio packets whose energy is higher than the configured energy_threshold.
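The filtering done by the Direction Of Audio Event codelet can be sketched as follows. This is an illustration of the logic described above, not the codelet's actual implementation: the function name angles_of_audio_events is hypothetical, the inputs are plain dictionaries rather than Isaac messages, and matching on exactly equal timestamps is a simplifying assumption about how synchronization works.

```python
def angles_of_audio_events(angles_deg, energies_db, energy_threshold):
    """Keep the angles of packets whose energy is above the threshold.

    angles_deg:  dict mapping packet timestamp -> azimuth angle in degrees
    energies_db: dict mapping packet timestamp -> average energy in dB
    Returns a list of (timestamp, angle) tuples in timestamp order.
    """
    events = []
    for timestamp in sorted(angles_deg):
        # Only packets present in both inputs can be compared (assumed
        # stand-in for timestamp synchronization).
        energy = energies_db.get(timestamp)
        if energy is not None and energy > energy_threshold:
            events.append((timestamp, angles_deg[timestamp]))
    return events


# Made-up sample data: three packets, 100 ms apart, threshold of 80 dB.
sample_angles = {0: 10.0, 100: 95.0, 200: 180.0}
sample_energies = {0: 60.0, 100: 85.0, 200: 80.0}
loud_events = angles_of_audio_events(sample_angles, sample_energies, 80.0)
```

With this made-up data, only the packet at timestamp 100 is kept, because the comparison is strict and the packet at timestamp 200 sits exactly at the threshold.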

This application requires a microphone array connected to the host or device and set as the default audio input device in the system settings. The capture volume of the microphone should be set to 100%. Use the specifications of the connected microphone array to configure the audio capture component (num_channels and sample_rate), the sound source localization component (microphone_distance and microphone_pairs), and the audio energy calculation component (channel_indices and reference_energy).

The application is configured for a ReSpeaker 4-mic array v2.0, with microphone_distance set to 0.064 meters and reference_energy set to 120 dB. The reference_offset_angle is configured for the reference axis shown in the image for the ReSpeaker 4-mic array.

respeaker_ssl_reference.png

The energy_threshold configuration parameter of the direction of audio event component defines the volume threshold that an audio packet must exceed for its angle to be published. It is measured in dB and is configured to 80 dB. To find an appropriate value, observe the average_energy_per_audio_packet value in the Audio Energy Calculator window in Websight (as shown in the image below). The threshold can be set to the highest value this plot reaches with ambient noise present in the environment.

audio_energy.png
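The threshold choice described above amounts to taking the maximum of the energy values observed while only ambient noise is present. A minimal sketch, assuming the values have been read off the plot by hand; the function name suggest_energy_threshold and the sample readings are hypothetical:

```python
def suggest_energy_threshold(ambient_energies_db):
    """Return the highest average energy (in dB) seen under ambient noise."""
    if not ambient_energies_db:
        raise ValueError("at least one ambient energy reading is required")
    return max(ambient_energies_db)


# Made-up readings of average_energy_per_audio_packet in a quiet room.
ambient_readings = [62.5, 71.0, 68.3]
suggested_threshold = suggest_energy_threshold(ambient_readings)
```

Because the Direction Of Audio Event codelet keeps only packets whose energy is higher than the threshold, a value chosen this way excludes every packet at or below the loudest ambient reading.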

The application can be launched with the following command:

bob@desktop:~/isaac$ bazel run apps/samples/sound_source_localization

For every audio packet published by the audio capture component, the application plots the angle in degrees and the average energy in dB. It also plots the angle of direction of audio event, which is the angle in degrees for those audio packets whose energy is above the threshold. These plots are accessible in the Sight UI at http://localhost:3000 for the desktop or http://ROBOTIP:3000 for Jetson.

Platforms: Desktop, Jetson TX2, Jetson Xavier, Jetson Nano

Hardware: The default configuration requires a ReSpeaker 4-mic array. Any circular microphone array with at least 4 channels can be used by updating the configuration accordingly.

© Copyright 2018-2020, NVIDIA Corporation. Last updated on Feb 1, 2023.