Sound Source Localization
The Sound Source Localization feature analyzes audio packets and computes the direction of the dominant sound source with respect to the reference axis. The direction is reported as an azimuth angle in radians, measured counter-clockwise from the reference axis. In this release, the feature supports only circular microphone arrays with at least 4 microphones.
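The following minimal sketch illustrates this angle convention for a source at position (x, y) in a 2-D frame centered on the array. It assumes that the reference axis is the +x axis and that angles are reported in the range [0, 2π); this page does not specify either, and azimuth_rad is a hypothetical helper, not part of the codelet.

```python
import math


def azimuth_rad(x: float, y: float) -> float:
    """Azimuth of a source at (x, y), in radians, counter-clockwise from +x.

    Assumption: the reference axis is the +x axis and results lie in [0, 2*pi).
    """
    # math.atan2 returns a value in [-pi, pi]; the modulo maps it into [0, 2*pi).
    return math.atan2(y, x) % (2 * math.pi)


# A source on the reference axis, a quarter turn CCW, and a quarter turn CW.
print(azimuth_rad(1.0, 0.0))   # 0.0
print(azimuth_rad(0.0, 1.0))   # pi/2, about 1.5708
print(azimuth_rad(0.0, -1.0))  # 3*pi/2, about 4.7124
```

A source a quarter turn clockwise from the reference axis therefore appears as 3π/2 rather than -π/2 under this assumed range.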
This feature is available as the Sound Source Localization codelet.
The duration of audio, in seconds, over which sound source localization is computed. This duration should be an integer multiple of the input audio packet duration.
The distance, in meters, between the two diagonally opposite microphones that form one pair in the microphone_pairs list. For example, the microphone distance for a ReSpeaker 4-mic array is 0.064 m.
List of pairs of raw audio channel indices, where each pair consists of two diagonally opposite microphones. The pairs should be listed in counter-clockwise order. For example, microphone_pairs: [[1,3], [2,4]] describes microphones 1 to 4 arranged counter-clockwise.
The angle, in degrees, from the reference axis to the first microphone pair, measured counter-clockwise. Only integer values are accepted for this offset angle.
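The constraints on these parameters can be checked before launching the application. The minimal sketch below uses hypothetical helper names (packets_per_window, diagonal_pairs, max_delay_samples) that are not part of the codelet's API. It assumes 100 ms input packets, an even number of microphones numbered counter-clockwise, and a speed of sound of 343 m/s. The last helper only illustrates why the microphone distance matters for delay-based localization in general; this page does not document the codelet's internal algorithm.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def packets_per_window(window_s: float, packet_s: float) -> int:
    """Number of input packets per localization window.

    Raises ValueError unless window_s is a positive integer multiple of packet_s.
    """
    ratio = window_s / packet_s
    n = round(ratio)
    # Compare with a tolerance because values such as 0.3 / 0.1 are not exact in binary.
    if n < 1 or not math.isclose(ratio, n, rel_tol=1e-9):
        raise ValueError(f"{window_s} s is not an integer multiple of {packet_s} s")
    return n


def diagonal_pairs(num_mics: int, first_index: int = 1) -> list[list[int]]:
    """Diagonally opposite channel pairs for microphones numbered counter-clockwise."""
    if num_mics < 4 or num_mics % 2 != 0:
        raise ValueError("expected an even number of at least 4 microphones")
    half = num_mics // 2
    # Microphone i faces microphone i + half across the circle.
    return [[first_index + i, first_index + i + half] for i in range(half)]


def max_delay_samples(mic_distance_m: float, sample_rate_hz: float) -> float:
    """Largest arrival-time difference between the two microphones of a pair, in samples."""
    return mic_distance_m / SPEED_OF_SOUND_M_S * sample_rate_hz


print(packets_per_window(0.5, 0.1))               # 5
print(diagonal_pairs(4))                          # [[1, 3], [2, 4]]
print(round(max_delay_samples(0.064, 16000), 2))  # 2.99 (16 kHz is an assumed rate)
```

For four microphones, diagonal_pairs reproduces the microphone_pairs example [[1,3], [2,4]]; pass first_index=0 if the channel indices start at zero.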
The computed azimuth angle (in radians) is converted to degrees, and the angle in degrees is available in Sight.
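The relationship between the offset angle and the degree conversion can be sketched as follows. This assumes, without confirmation from this page, that a raw estimate is measured counter-clockwise from the axis of the first microphone pair and that plotted angles are wrapped to [0, 360). The function to_reference_degrees is hypothetical and only illustrates the geometry, not where in the pipeline the codelet applies the offset.

```python
import math


def to_reference_degrees(angle_from_first_pair_rad: float, offset_deg: int) -> float:
    """Rotate an angle measured from the first microphone pair into the
    reference-axis frame and express it in degrees in [0, 360).

    offset_deg is the integer counter-clockwise angle from the reference axis
    to the first microphone pair, so the two angles simply add.
    """
    if not isinstance(offset_deg, int):
        raise TypeError("the offset angle must be an integer number of degrees")
    return (math.degrees(angle_from_first_pair_rad) + offset_deg) % 360.0


print(to_reference_degrees(math.pi / 2, 45))       # about 135.0
print(to_reference_degrees(3 * math.pi / 2, 135))  # about 45.0
```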
The Sound Source Localization sample application demonstrates both the sound source localization and the audio energy calculation features. It contains four codelets:
Audio Capture codelet: Captures multi-channel audio data from the microphone array.
Sound Source Localization codelet: Computes the direction of the dominant sound source, as an angle in radians, for every 100-millisecond audio packet. A dominant sound is one that spans the frequencies with the highest energy in the audio captured by the microphone array. This codelet receives audio packets from the Audio Capture codelet.
Audio Energy Calculation codelet: Computes the average energy of the raw audio channels in each audio packet captured from the microphone array. This codelet receives audio packets from the Audio Capture codelet.
Direction Of Audio Event codelet: A custom codelet that receives the azimuth angles of dominant sound sources from the Sound Source Localization codelet and the audio energies of the audio packets from the Audio Energy Calculation codelet. The two input messages are synchronized by timestamp. The azimuth angles of dominant sound sources whose corresponding audio packets have energy higher than the configured energy_threshold are plotted in Sight.
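The filtering done by the Direction Of Audio Event codelet can be sketched as below. This is a minimal illustration, not the codelet's implementation: the Stamped type and audio_event_angles function are hypothetical, and it assumes that "synchronized by timestamp" means an angle and an energy belong to the same packet when their timestamps match exactly.

```python
import math
from dataclasses import dataclass


@dataclass
class Stamped:
    timestamp_ns: int  # acquisition time shared by both message streams (assumed)
    value: float       # azimuth in radians, or packet energy in dB


def audio_event_angles(angles: list[Stamped], energies: list[Stamped],
                       energy_threshold_db: float) -> list[Stamped]:
    """Return azimuths, in degrees, of packets louder than the threshold.

    An angle and an energy belong to the same packet when their timestamps match.
    """
    energy_by_time = {e.timestamp_ns: e.value for e in energies}
    events = []
    for a in angles:
        energy = energy_by_time.get(a.timestamp_ns)
        # Skip angles with no matching energy message or a quiet packet.
        if energy is not None and energy > energy_threshold_db:
            events.append(Stamped(a.timestamp_ns, math.degrees(a.value)))
    return events


# Three 100 ms packets; all values are illustrative, not measurements.
angles = [Stamped(0, math.pi / 2), Stamped(100_000_000, math.pi), Stamped(200_000_000, 0.0)]
energies = [Stamped(0, 85.0), Stamped(100_000_000, 70.0), Stamped(200_000_000, 90.0)]
for event in audio_event_angles(angles, energies, energy_threshold_db=80.0):
    print(event.timestamp_ns, round(event.value, 1))
# 0 90.0
# 200000000 0.0
```

The packet at 100 ms is dropped because its energy (70 dB) does not exceed the 80 dB threshold.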
This application requires a microphone array connected to the host or device and set as the default
audio input device in the system settings. The capture volume of the microphone should be set to 100%.
The specifications of the connected microphone array should be used to configure the audio capture
component (including sample_rate), the sound source localization component (including
microphone_pairs), and the audio energy calculation component.
The application is configured for a ReSpeaker 4-mic array v2.0 with a microphone distance of
0.064 meters and reference_energy set to 120 dB. The offset angle is
configured for the reference axis as shown in the image for
the ReSpeaker 4-mic array.
The energy_threshold configuration parameter of the Direction Of Audio Event component
defines the energy an audio packet must exceed for its angle to be published. It is measured
in dB and is set to 80 dB. An appropriate value for this parameter can be determined by observing
the average_energy_per_audio_packet value in the Audio Energy Calculator window in Sight
(as shown in the image below). Set the threshold to the highest value shown in this plot
while only ambient noise is present in the environment.
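The threshold rule above amounts to taking the maximum of the energies observed under ambient noise alone. The minimal sketch below uses a hypothetical helper, threshold_from_ambient, and made-up readings; in practice the values would be read from the average_energy_per_audio_packet plot.

```python
def threshold_from_ambient(ambient_energies_db: list[float]) -> float:
    """Highest average energy per packet observed while only ambient noise is present."""
    if not ambient_energies_db:
        raise ValueError("need at least one ambient reading")
    return max(ambient_energies_db)


# Illustrative readings, not measurements from a real environment.
ambient = [71.5, 74.2, 73.0, 78.9, 72.4]
threshold = threshold_from_ambient(ambient)
print(threshold)  # 78.9

# Packets from a later recording that would be treated as audio events.
later = [75.0, 82.3, 79.5, 78.9]
print([e for e in later if e > threshold])  # [82.3, 79.5]
```

A packet exactly at the threshold (78.9 dB here) is not published, because only energies higher than energy_threshold pass.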
The application can be launched with the following command:
bob@desktop:~/isaac$ bazel run apps/samples/sound_source_localization
The application plots the angle in degrees and the average energy in dB for every audio packet
published by the audio capture component. It also plots the direction of audio event, which is the
angle in degrees for only those audio packets whose energy is above the threshold. These plots are
accessible in the Sight UI at
http://localhost:3000 on the desktop or
http://ROBOTIP:3000 on Jetson.
Platforms: Desktop, Jetson TX/2, Jetson Xavier, Jetson Nano
Hardware: Requires a ReSpeaker 4 microphone array with the default configuration. Any circular microphone array with at least 4 channels can be used by updating the configuration accordingly.