{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Tensorflow DALI plugin: DALI and tf.data\n", "\n", "### Overview\n", "\n", "DALI offers integration with [tf.data API](https://www.tensorflow.org/guide/data). Using this approach you can easily connect DALI pipeline with various TensorFlow APIs and use it as a data source for your model. This tutorial shows how to do it using well known [MNIST](http://yann.lecun.com/exdb/mnist/) converted to LMDB format. You can find it in [DALI_extra](https://github.com/NVIDIA/DALI_extra) - DALI test data repository.\n", "\n", "We start with creating a DALI pipeline to read, decode and normalize MNIST images and read corresponding labels.\n", "\n", "`DALI_EXTRA_PATH` environment variable should point to the place where data from [DALI extra repository](https://github.com/NVIDIA/DALI_extra) is downloaded. Please make sure that the proper release tag is checked out." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from nvidia.dali import pipeline_def, Pipeline\n", "import nvidia.dali.fn as fn\n", "import nvidia.dali.types as types\n", "import os\n", "\n", "BATCH_SIZE = 64\n", "DROPOUT = 0.2\n", "IMAGE_SIZE = 28\n", "NUM_CLASSES = 10\n", "HIDDEN_SIZE = 128\n", "EPOCHS = 5\n", "ITERATIONS_PER_EPOCH = 100\n", "\n", "\n", "# Path to MNIST dataset\n", "data_path = os.path.join(os.environ[\"DALI_EXTRA_PATH\"], \"db/MNIST/training/\")\n", "\n", "\n", "@pipeline_def(device_id=0, batch_size=BATCH_SIZE)\n", "def mnist_pipeline(device):\n", " jpegs, labels = fn.readers.caffe2(path=data_path, random_shuffle=True)\n", " images = fn.decoders.image(\n", " jpegs, device=\"mixed\" if device == \"gpu\" else \"cpu\", output_type=types.GRAY\n", " )\n", " images = fn.crop_mirror_normalize(\n", " images, device=device, dtype=types.FLOAT, std=[255.0], output_layout=\"CHW\"\n", " )\n", "\n", " if device == \"gpu\":\n", " labels = labels.gpu()\n", "\n", " return images, labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next step is to wrap an instance of `MnistPipeline` with a `DALIDataset` object from DALI TensorFlow plugin. This class is compatible with `tf.data.Dataset`. Other parameters are shapes and types of the outputs of the pipeline. Here we return images and labels. It means we have two outputs one of type `tf.float32` for images and on of type `tf.int32` for labels." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import nvidia.dali.plugin.tf as dali_tf\n", "import tensorflow as tf\n", "import tensorflow.compat.v1 as tf_v1\n", "import logging\n", "\n", "tf.get_logger().setLevel(logging.ERROR)\n", "\n", "# Create pipeline\n", "pipeline = mnist_pipeline(device=\"cpu\")\n", "\n", "# Define shapes and types of the outputs\n", "shapes = ((BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE), (BATCH_SIZE))\n", "dtypes = (tf.float32, tf.int32)\n", "\n", "# Create dataset\n", "with tf.device(\"/cpu:0\"):\n", " mnist_set = dali_tf.DALIDataset(\n", " pipeline=pipeline,\n", " batch_size=BATCH_SIZE,\n", " output_shapes=shapes,\n", " output_dtypes=dtypes,\n", " device_id=0,\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are ready to start the training. Following sections show how to do it with different APIs availible in TensorFlow.\n", "\n", "### Keras\n", "\n", "First, we pass `mnist_set` to model created with `tf.keras` and use `model.fit` method to train it." 
{ "cell_type": "markdown", "metadata": {}, "source": [ "We are ready to start the training. The following sections show how to do it with different APIs available in TensorFlow.\n", "\n", "### Keras\n", "\n", "First, we pass `mnist_set` to a model created with `tf.keras` and use the `model.fit` method to train it." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/5\n", "100/100 [==============================] - 1s 3ms/step - loss: 1.3511 - accuracy: 0.5834\n", "Epoch 2/5\n", "100/100 [==============================] - 0s 3ms/step - loss: 0.4534 - accuracy: 0.8690\n", "Epoch 3/5\n", "100/100 [==============================] - 0s 4ms/step - loss: 0.3380 - accuracy: 0.9003\n", "Epoch 4/5\n", "100/100 [==============================] - 0s 3ms/step - loss: 0.2927 - accuracy: 0.9218\n", "Epoch 5/5\n", "100/100 [==============================] - 0s 4ms/step - loss: 0.2736 - accuracy: 0.9217\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create the model\n", "model = tf.keras.models.Sequential(\n", "    [\n", "        tf.keras.layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE), name=\"images\"),\n", "        tf.keras.layers.Flatten(input_shape=(IMAGE_SIZE, IMAGE_SIZE)),\n", "        tf.keras.layers.Dense(HIDDEN_SIZE, activation=\"relu\"),\n", "        tf.keras.layers.Dropout(DROPOUT),\n", "        tf.keras.layers.Dense(NUM_CLASSES, activation=\"softmax\"),\n", "    ]\n", ")\n", "model.compile(optimizer=\"adam\", loss=\"sparse_categorical_crossentropy\", metrics=[\"accuracy\"])\n", "\n", "# Train using DALI dataset\n", "model.fit(mnist_set, epochs=EPOCHS, steps_per_epoch=ITERATIONS_PER_EPOCH)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, it was very easy to integrate the DALI pipeline with the `tf.keras` API.\n", "\n", "The code above performed the training on the CPU: both the DALI pipeline and the model ran on the CPU.\n", "\n", "We can easily move the whole processing to the GPU. First, we create a pipeline that uses the GPU with ID = 0. Next, we place both the DALI dataset and the model on the same GPU." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Define the DALI dataset and the model, and place both on the GPU\n", "with tf.device(\"/gpu:0\"):\n", "    mnist_set = dali_tf.DALIDataset(\n", "        pipeline=mnist_pipeline(device=\"gpu\"),\n", "        batch_size=BATCH_SIZE,\n", "        output_shapes=shapes,\n", "        output_dtypes=dtypes,\n", "        device_id=0,\n", "    )\n", "    model = tf.keras.models.Sequential(\n", "        [\n", "            tf.keras.layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE), name=\"images\"),\n", "            tf.keras.layers.Flatten(input_shape=(IMAGE_SIZE, IMAGE_SIZE)),\n", "            tf.keras.layers.Dense(HIDDEN_SIZE, activation=\"relu\"),\n", "            tf.keras.layers.Dropout(DROPOUT),\n", "            tf.keras.layers.Dense(NUM_CLASSES, activation=\"softmax\"),\n", "        ]\n", "    )\n", "    model.compile(optimizer=\"adam\", loss=\"sparse_categorical_crossentropy\", metrics=[\"accuracy\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We move the training to the GPU as well. This allows TensorFlow to pick up the GPU instance of the DALI dataset." ] },
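{ "cell_type": "markdown", "metadata": {}, "source": [ "If you are not sure whether a GPU is actually visible to TensorFlow in your environment, you can list the physical devices before launching the training. This is an optional check added to this tutorial, not part of the training flow." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: verify that TensorFlow sees at least one GPU.\n", "print(tf.config.list_physical_devices(\"GPU\"))" ] },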
{ "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/5\n", "100/100 [==============================] - 1s 4ms/step - loss: 1.3734 - accuracy: 0.5844\n", "Epoch 2/5\n", "100/100 [==============================] - 0s 4ms/step - loss: 0.4566 - accuracy: 0.8690\n", "Epoch 3/5\n", "100/100 [==============================] - 0s 4ms/step - loss: 0.3375 - accuracy: 0.8991\n", "Epoch 4/5\n", "100/100 [==============================] - 0s 4ms/step - loss: 0.3017 - accuracy: 0.9156\n", "Epoch 5/5\n", "100/100 [==============================] - 0s 4ms/step - loss: 0.2925 - accuracy: 0.9167\n" ] } ], "source": [ "# Train on the GPU\n", "with tf.device(\"/gpu:0\"):\n", "    model.fit(mnist_set, epochs=EPOCHS, steps_per_epoch=ITERATIONS_PER_EPOCH)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is important to note that there is no intermediate CPU buffer between DALI and TensorFlow in the execution above: DALI GPU outputs are copied straight to the TF GPU tensors used by the model.\n", "\n", "In this particular toy example, the performance of the GPU variant is lower than that of the CPU one. The MNIST images are small, and the nvJPEG decoder used in the GPU DALI pipeline is not well suited for such a workload. We use it here to show how to integrate GPU decoding properly in a real-life case." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Estimators\n", "\n", "Another popular TensorFlow API is the `tf.estimator` API. This section shows how to use a DALI dataset as a data source for a model based on this API.\n", "\n", "First, we create the model." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Define the feature columns for Estimator\n", "feature_columns = [tf.feature_column.numeric_column(\"images\", shape=[IMAGE_SIZE, IMAGE_SIZE])]\n", "\n", "# And the run config\n", "run_config = tf.estimator.RunConfig(\n", "    model_dir=\"/tmp/tensorflow-checkpoints\", device_fn=lambda op: \"/gpu:0\"\n", ")\n", "\n", "# Finally create the model based on `DNNClassifier`\n", "model = tf.estimator.DNNClassifier(\n", "    feature_columns=feature_columns,\n", "    hidden_units=[HIDDEN_SIZE],\n", "    n_classes=NUM_CLASSES,\n", "    dropout=DROPOUT,\n", "    config=run_config,\n", "    optimizer=\"Adam\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the `tf.estimator` API, data is passed to the model with a function that returns the dataset. We define this function to return a DALI dataset placed on the GPU." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def train_data_fn():\n", "    with tf.device(\"/gpu:0\"):\n", "        mnist_set = dali_tf.DALIDataset(\n", "            fail_on_device_mismatch=False,\n", "            pipeline=mnist_pipeline(device=\"gpu\"),\n", "            batch_size=BATCH_SIZE,\n", "            output_shapes=shapes,\n", "            output_dtypes=dtypes,\n", "            device_id=0,\n", "        )\n", "        mnist_set = mnist_set.map(lambda features, labels: ({\"images\": features}, labels))\n", "\n", "    return mnist_set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With everything set up, we are ready to run the training." ] },
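{ "cell_type": "markdown", "metadata": {}, "source": [ "Optionally, you can call `train_data_fn` once outside of the estimator to inspect the structure of the elements it produces: the estimator expects a `(features_dict, labels)` pair. This quick check is an addition to the original tutorial and assumes eager execution is still enabled at this point." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: inspect the element structure produced by train_data_fn.\n", "# The map above wraps the images in a dict keyed by the feature column name.\n", "print(train_data_fn().element_spec)" ] },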
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Run the training on the GPU\n", "model.train(input_fn=train_data_fn, steps=EPOCHS * ITERATIONS_PER_EPOCH)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'accuracy': 0.9915625,\n", " 'average_loss': 0.030411616,\n", " 'loss': 0.030411616,\n", " 'global_step': 5500}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def test_data_fn():\n", "    with tf.device(\"/cpu:0\"):\n", "        mnist_set = dali_tf.DALIDataset(\n", "            fail_on_device_mismatch=False,\n", "            pipeline=mnist_pipeline(device=\"cpu\"),\n", "            batch_size=BATCH_SIZE,\n", "            output_shapes=shapes,\n", "            output_dtypes=dtypes,\n", "            device_id=0,\n", "        )\n", "        mnist_set = mnist_set.map(lambda features, labels: ({\"images\": features}, labels))\n", "\n", "    return mnist_set\n", "\n", "\n", "model.evaluate(input_fn=test_data_fn, steps=ITERATIONS_PER_EPOCH)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Custom Models and Training Loops\n", "\n", "Finally, the last part of this tutorial focuses on integrating the DALI dataset with custom models and training loops. The complete example below shows, from start to finish, how to use a DALI dataset with a native TensorFlow model and run training using `tf.Session`." ] },
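{ "cell_type": "markdown", "metadata": {}, "source": [ "As an aside, the same DALI dataset can also drive a TensorFlow 2 style eager training loop with `tf.GradientTape`. The sketch below is an addition to the original tutorial: it defines a fresh Keras model (named `loop_model` here purely for illustration) and reuses the GPU `mnist_set` created in the Keras section. If you want to try it, run it before the next cell, which disables eager execution." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional TF2-style custom training loop over the DALI dataset (sketch).\n", "# Assumes eager execution is still enabled at this point.\n", "loop_model = tf.keras.models.Sequential(\n", "    [\n", "        tf.keras.layers.Flatten(input_shape=(IMAGE_SIZE, IMAGE_SIZE)),\n", "        tf.keras.layers.Dense(HIDDEN_SIZE, activation=\"relu\"),\n", "        tf.keras.layers.Dropout(DROPOUT),\n", "        tf.keras.layers.Dense(NUM_CLASSES, activation=\"softmax\"),\n", "    ]\n", ")\n", "optimizer = tf.keras.optimizers.Adam()\n", "loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()\n", "\n", "# Keep the iteration on the same device as the GPU DALI dataset.\n", "with tf.device(\"/gpu:0\"):\n", "    for step, (images, labels) in enumerate(mnist_set.take(ITERATIONS_PER_EPOCH)):\n", "        with tf.GradientTape() as tape:\n", "            predictions = loop_model(images, training=True)\n", "            loss = loss_fn(labels, predictions)\n", "        gradients = tape.gradient(loss, loop_model.trainable_variables)\n", "        optimizer.apply_gradients(zip(gradients, loop_model.trainable_variables))\n", "        if step % 50 == 0:\n", "            print(\"Step\", step, \"loss:\", float(loss))" ] },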
{ "cell_type": "markdown", "metadata": {}, "source": [ "The first step is to define the model and the dataset and to place both on the GPU." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/awolant/.local/lib/python3.8/site-packages/tensorflow/python/keras/legacy_tf_layers/core.py:329: UserWarning: `tf.layers.flatten` is deprecated and will be removed in a future version. Please use `tf.keras.layers.Flatten` instead.\n", "  warnings.warn('`tf.layers.flatten` is deprecated and '\n", "/home/awolant/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer_v1.py:1693: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.\n", "  warnings.warn('`layer.apply` is deprecated and '\n", "/home/awolant/.local/lib/python3.8/site-packages/tensorflow/python/keras/legacy_tf_layers/core.py:171: UserWarning: `tf.layers.dense` is deprecated and will be removed in a future version. Please use `tf.keras.layers.Dense` instead.\n", "  warnings.warn('`tf.layers.dense` is deprecated and '\n", "/home/awolant/.local/lib/python3.8/site-packages/tensorflow/python/keras/legacy_tf_layers/core.py:268: UserWarning: `tf.layers.dropout` is deprecated and will be removed in a future version. Please use `tf.keras.layers.Dropout` instead.\n", "  warnings.warn('`tf.layers.dropout` is deprecated and '\n" ] } ], "source": [ "tf.compat.v1.disable_eager_execution()\n", "tf_v1.reset_default_graph()\n", "\n", "with tf.device(\"/gpu:0\"):\n", "    mnist_set = dali_tf.DALIDataset(\n", "        pipeline=mnist_pipeline(device=\"gpu\"),\n", "        batch_size=BATCH_SIZE,\n", "        output_shapes=shapes,\n", "        output_dtypes=dtypes,\n", "        device_id=0,\n", "    )\n", "\n", "    iterator = tf_v1.data.make_initializable_iterator(mnist_set)\n", "    images, labels = iterator.get_next()\n", "\n", "    labels = tf_v1.reshape(tf_v1.one_hot(labels, NUM_CLASSES), [BATCH_SIZE, NUM_CLASSES])\n", "\n", "    with tf_v1.variable_scope(\"mnist_net\", reuse=False):\n", "        images = tf_v1.layers.flatten(images)\n", "        images = tf_v1.layers.dense(images, HIDDEN_SIZE, activation=tf_v1.nn.relu)\n", "        images = tf_v1.layers.dropout(images, rate=DROPOUT, training=True)\n", "        # No activation here: softmax_cross_entropy_with_logits expects raw logits.\n", "        images = tf_v1.layers.dense(images, NUM_CLASSES)\n", "\n", "    logits_train = images\n", "    loss_op = tf_v1.reduce_mean(\n", "        tf.nn.softmax_cross_entropy_with_logits(logits=logits_train, labels=labels)\n", "    )\n", "    train_step = tf_v1.train.AdamOptimizer().minimize(loss_op)\n", "\n", "    correct_pred = tf_v1.equal(tf_v1.argmax(logits_train, 1), tf_v1.argmax(labels, 1))\n", "    accuracy = tf_v1.reduce_mean(tf_v1.cast(correct_pred, tf_v1.float32))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With `tf.Session` we can run this model and train it on the GPU." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Step 0, accuracy: 0.140625\n", "Step 100, accuracy: 0.84375\n", "Step 200, accuracy: 0.9375\n", "Step 300, accuracy: 0.875\n", "Step 400, accuracy: 0.90625\n", "Final accuracy: 0.90640625\n" ] } ], "source": [ "with tf_v1.Session() as sess:\n", "    sess.run(tf_v1.global_variables_initializer())\n", "    sess.run(iterator.initializer)\n", "\n", "    for i in range(EPOCHS * ITERATIONS_PER_EPOCH):\n", "        sess.run(train_step)\n", "        if i % ITERATIONS_PER_EPOCH == 0:\n", "            train_accuracy = sess.run(accuracy)\n", "            print(\"Step %d, accuracy: %g\" % (i, train_accuracy))\n", "\n", "    final_accuracy = 0\n", "    for _ in range(ITERATIONS_PER_EPOCH):\n", "        final_accuracy = final_accuracy + sess.run(accuracy)\n", "    final_accuracy = final_accuracy / ITERATIONS_PER_EPOCH\n", "\n", "    print(\"Final accuracy: \", final_accuracy)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }