{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Audio Decoder in DALI\n", "\n", "This tutorial shows how to set up a simple pipeline that loads and decodes audio data using DALI. We will use a sample from the Speech Commands Data Set. Although this dataset consists of samples in .wav format, the same procedure can be used for most well-known digital audio coding formats.\n", "\n", "## Step-by-step guide\n", "1. Let's start by importing DALI and a handful of utils." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from nvidia.dali.pipeline import Pipeline\n", "import nvidia.dali.ops as ops\n", "import nvidia.dali.types as types\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "batch_size = 1\n", "audio_files = \"../data/audio\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `batch_size` is set to `1` to keep things simple.\n", "\n", "2. Next, let's implement the pipeline. First, we need to load data from disk (or any other source). FileReader can load the data as well as its labels. For more information, refer to the FileReader docs. Furthermore, as with image data, you can use Reader operators that are specific to a given dataset or dataset format (see [CaffeReader](https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/supported_ops.html#nvidia.dali.ops.CaffeReader)). After loading the input data, the pipeline decodes the audio data. As stated above, the AudioDecoder operator can decode most well-known audio formats.\n", " \n", " Note: Remember to pass the proper data type (argument `dtype`) to the operator. Supported data types are listed in the documentation. If you have 24-bit audio data and you set `dtype=INT16`, some information from the samples will be lost. 
The default `dtype` for this operator is `INT16`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "class AudioDecoderExample(Pipeline):\n", "    def __init__(self, batch_size, num_threads, device_id):\n", "        super(AudioDecoderExample, self).__init__(batch_size, num_threads, device_id)\n", "        # Reads raw file contents (and labels) from the given directory\n", "        self.input = ops.FileReader(device=\"cpu\", file_root=audio_files)\n", "        # Decodes the raw file contents into 16-bit integer samples\n", "        self.decode = ops.AudioDecoder(device=\"cpu\", dtype=types.INT16)\n", "\n", "    def define_graph(self):\n", "        # The labels returned by the reader are not needed here\n", "        read, _ = self.input()\n", "        audio, rate = self.decode(read)\n", "        return audio, rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Now let's build and run the pipeline." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "pipecpu = AudioDecoderExample(batch_size=batch_size, num_threads=1, device_id=0)\n", "pipecpu.build()\n", "cpu_output = pipecpu.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The outputs of `AudioDecoder` consist of a tensor with the decoded data and some metadata (e.g. the sampling rate). The metadata is returned as a separate, second output. In addition, AudioDecoder returns data in interleaved format, so we need to flatten the output tensor to display it properly. Here's how to do that:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sampling rate: 16000 [Hz]\n", "Audio data: [[ -5]\n", " [ -95]\n", " [-156]\n", " ...\n", " [ 116]\n", " [ 102]\n", " [ 82]]\n", "Audio data flattened: [ -5 -95 -156 ... 116 102 82]\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# First output: decoded samples; second output: sampling rate\n", "audio_data = cpu_output[0].at(0)\n", "sampling_rate = cpu_output[1].at(0)[0]\n", "print(\"Sampling rate:\", sampling_rate, \"[Hz]\")\n", "print(\"Audio data:\", audio_data)\n", "# Flatten the interleaved samples into a 1-D array for plotting\n", "audio_data = audio_data.flatten()\n", "print(\"Audio data flattened:\", audio_data)\n", "plt.plot(audio_data)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Verification\n", "\n", "Let's verify that the AudioDecoder actually works. The method presented here can also come in handy for debugging a DALI pipeline in case something doesn't go as planned.\n", "\n", "We will use an external tool to decode the same data and compare the results against the data decoded by DALI.\n", "\n", "### Important!\n", "\n", "The following snippet installs the external dependency (`simpleaudio`). If you already have it, or don't want to install it, skip this cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "!{sys.executable} -m pip install simpleaudio" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is a side-by-side comparison of the decoded data. If you have the `simpleaudio` module installed, you can run the snippet and see it for yourself." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "src: simpleaudio\n", "shape: (16000,)\n", "data: [ -5 -95 -156 ... 116 102 82]\n", "\n", "\n", "src: DALI\n", "shape: (16000,)\n", "data: [ -5 -95 -156 ... 116 102 82]\n", "\n", "Are the arrays equal? YES\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import simpleaudio as sa\n", "\n", "# Decode the same file with simpleaudio to obtain reference samples\n", "wav = sa.WaveObject.from_wave_file(\"../data/audio/wav/three.wav\")\n", "three_audio = np.frombuffer(wav.audio_data, dtype=np.int16)\n", "\n", "print(\"src: simpleaudio\")\n", "print(\"shape: \", three_audio.shape)\n", "print(\"data: \", three_audio)\n", "print(\"\\n\")\n", "print(\"src: DALI\")\n", "print(\"shape: \", audio_data.shape)\n", "print(\"data: \", audio_data)\n", "# array_equal also handles a shape mismatch by returning False\n", "print(\"\\nAre the arrays equal?\", \"YES\" if np.array_equal(audio_data, three_audio) else \"NO\")\n", "\n", "# Plot the reference (left) and the DALI output (right) side by side\n", "fig, ax = plt.subplots(1, 2)\n", "ax[0].plot(three_audio)\n", "ax[1].plot(audio_data)\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }