{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create a custom operator\n",
    "\n",
    "DALI allows you to create a custom operator in C++ and load it at runtime. There are several reasons you might need to write your custom operator, for instance\n",
    "\n",
    "- DALI doesn't support the operation you want to perform and it cannot be expressed by a composition of other operators\n",
    "- You want to write an operator that depends on a third party library\n",
    "- You want to optimize your pipeline by providing a manually fused operation in C++\n",
    "\n",
    "In this tutorial we will walk you through the process of writing, compiling and loading a plugin with a DALI custom operator. For demonstration purposes we will provide a CPU and a GPU implementation for the 'CustomDummy' operator. The implementation will just copy the input data to the output without any modifications.\n",
    "\n",
    "PREREQUISITES:\n",
    "\n",
    "- Knowledge of C++\n",
    "- DALI installed from the binary distribution or compiled from source\n",
    "- Basic knowledge of CMake"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Operator definition\n",
    "\n",
    "First we define the operator in a header file "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#ifndef EXAMPLE_DUMMY_H_\r\n",
      "#define EXAMPLE_DUMMY_H_\r\n",
      "\r\n",
      "#include \"dali/pipeline/operators/operator.h\"\r\n",
      "\r\n",
      "namespace other_ns {\r\n",
      "\r\n",
      "template <typename Backend>\r\n",
      "class Dummy : public ::dali::Operator<Backend> {\r\n",
      " public:\r\n",
      "  inline explicit Dummy(const ::dali::OpSpec &spec) :\r\n",
      "    ::dali::Operator<Backend>(spec) {}\r\n",
      "\r\n",
      "  virtual inline ~Dummy() = default;\r\n",
      "\r\n",
      "  Dummy(const Dummy&) = delete;\r\n",
      "  Dummy& operator=(const Dummy&) = delete;\r\n",
      "  Dummy(Dummy&&) = delete;\r\n",
      "  Dummy& operator=(Dummy&&) = delete;\r\n",
      "\r\n",
      " protected:\r\n",
      "  void RunImpl(::dali::Workspace<Backend> *ws, const int idx) override;\r\n",
      "};\r\n",
      "\r\n",
      "}  // namespace other_ns\r\n",
      "\r\n",
      "#endif  // EXAMPLE_DUMMY_H_\r\n"
     ]
    }
   ],
   "source": [
    "! cat customdummy/dummy.h"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CPU operator implementation\n",
    "\n",
    "Next, we provide the CPU implementation in a C++ implementation file. We register the schema for the custom operator with DALI_REGISTER_SCHEMA macro and define the CPU version of the operator with DALI_REGISTER_OPERATOR."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#include \"dummy.h\"\r\n",
      "\r\n",
      "namespace other_ns {\r\n",
      "\r\n",
      "template<>\r\n",
      "void Dummy<::dali::CPUBackend>::RunImpl(::dali::SampleWorkspace *ws, const int idx) {\r\n",
      "  auto &input = ws->Input<::dali::CPUBackend>(idx);\r\n",
      "  auto output = ws->Output<::dali::CPUBackend>(idx);\r\n",
      "  output->set_type(input.type());\r\n",
      "  output->ResizeLike(input);\r\n",
      "\r\n",
      "  ::dali::TypeInfo type = input.type();\r\n",
      "  type.Copy<::dali::CPUBackend, ::dali::CPUBackend>(\r\n",
      "      output->raw_mutable_data(),\r\n",
      "      input.raw_data(), input.size(), 0);\r\n",
      "}\r\n",
      "\r\n",
      "}  // namespace other_ns\r\n",
      "\r\n",
      "DALI_REGISTER_OPERATOR(CustomDummy, ::other_ns::Dummy<::dali::CPUBackend>, ::dali::CPU);\r\n",
      "\r\n",
      "DALI_SCHEMA(CustomDummy)\r\n",
      "  .DocStr(\"Make a copy of the input tensor\")\r\n",
      "  .NumInput(1)\r\n",
      "  .NumOutput(1);\r\n"
     ]
    }
   ],
   "source": [
    "! cat customdummy/dummy.cc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## GPU operator implementation\n",
    "\n",
    "Similarly, we provide a GPU implementation in a CUDA implementation file and register it with DALI_REGISTER_OPERATOR."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#include <cuda_runtime_api.h>\r\n",
      "#include \"dummy.h\"\r\n",
      "\r\n",
      "namespace other_ns {\r\n",
      "\r\n",
      "template<>\r\n",
      "void Dummy<::dali::GPUBackend>::RunImpl(::dali::DeviceWorkspace *ws, const int idx) {\r\n",
      "  auto &input = ws->Input<::dali::GPUBackend>(idx);\r\n",
      "  auto output = ws->Output<::dali::GPUBackend>(idx);\r\n",
      "  output->set_type(input.type());\r\n",
      "  output->ResizeLike(input);\r\n",
      "  CUDA_CALL(cudaMemcpyAsync(\r\n",
      "          output->raw_mutable_data(),\r\n",
      "          input.raw_data(),\r\n",
      "          input.nbytes(),\r\n",
      "          cudaMemcpyDeviceToDevice,\r\n",
      "          ws->stream()));\r\n",
      "}\r\n",
      "\r\n",
      "}  // namespace other_ns\r\n",
      "\r\n",
      "DALI_REGISTER_OPERATOR(CustomDummy, ::other_ns::Dummy<::dali::GPUBackend>, ::dali::GPU);\r\n"
     ]
    }
   ],
   "source": [
    "! cat customdummy/dummy.cu"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the plugin\n",
    "The last step is to specify the build configuration.\n",
    "\n",
    "We can use nvidia.dali.sysconfig to retrieve the build configuration parameters\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import nvidia.dali.sysconfig as sysconfig"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python2.7/dist-packages/nvidia/dali/include\n"
     ]
    }
   ],
   "source": [
    "print(sysconfig.get_include_dir())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python2.7/dist-packages/nvidia/dali\n"
     ]
    }
   ],
   "source": [
    "print(sysconfig.get_lib_dir())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['-I/usr/local/lib/python2.7/dist-packages/nvidia/dali/include', '-D_GLIBCXX_USE_CXX11_ABI=0']\n"
     ]
    }
   ],
   "source": [
    "print(sysconfig.get_compile_flags())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['-L/usr/local/lib/python2.7/dist-packages/nvidia/dali', '-ldali']\n"
     ]
    }
   ],
   "source": [
    "print(sysconfig.get_link_flags())"
   ]
  },
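  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The flag lists returned by `sysconfig` can also be passed to a compiler by hand. The cell below is a minimal sketch, not part of DALI: `build_command` is a hypothetical helper, the output name `libcustomdummy_cpu.so` is made up for illustration, and the flag values are the ones printed above, used here as sample inputs. It only assembles and prints a `g++` command line for the CPU source file. A real build may need further options (for example the CUDA include path), and the GPU source would have to be compiled with `nvcc`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper, not part of DALI: assemble a compiler command line\n",
    "# for building a shared library from the given flag lists.\n",
    "def build_command(sources, output, compile_flags, link_flags, compiler='g++'):\n",
    "    cmd = [compiler, '-std=c++11', '-shared', '-fPIC']\n",
    "    cmd += list(compile_flags)  # include path and ABI define\n",
    "    cmd += list(sources)\n",
    "    cmd += ['-o', output]\n",
    "    cmd += list(link_flags)     # library search path and -ldali\n",
    "    return cmd\n",
    "\n",
    "# Sample values copied from the outputs printed above (an assumption about\n",
    "# your installation); in this notebook you could pass\n",
    "# sysconfig.get_compile_flags() and sysconfig.get_link_flags() instead.\n",
    "compile_flags = ['-I/usr/local/lib/python2.7/dist-packages/nvidia/dali/include',\n",
    "                 '-D_GLIBCXX_USE_CXX11_ABI=0']\n",
    "link_flags = ['-L/usr/local/lib/python2.7/dist-packages/nvidia/dali', '-ldali']\n",
    "\n",
    "cmd = build_command(['customdummy/dummy.cc'], 'libcustomdummy_cpu.so',\n",
    "                    compile_flags, link_flags)\n",
    "print(' '.join(cmd))"
   ]
  },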
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "**Important:** *Only one version of libdali.so should be loaded in the process at the same time. Therefore, a plugin must be linked against the exact same library in the python package directory of DALI that you intend to use to load your plugin. Due to this limitation, upgrading your DALI version will require you to relink your plugin against the new library.*\n",
    "\n",
    "In this example we used CMake to build the plugin"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cmake_minimum_required(VERSION 3.5)\r\n",
      "find_package(CUDA 8.0 REQUIRED)\r\n",
      "\r\n",
      "execute_process(\r\n",
      "        COMMAND python -c \"import nvidia.dali as dali; print(dali.sysconfig.get_lib_dir())\"\r\n",
      "        OUTPUT_VARIABLE DALI_LIB_DIR)\r\n",
      "string(STRIP ${DALI_LIB_DIR} DALI_LIB_DIR)\r\n",
      "\r\n",
      "execute_process(\r\n",
      "        COMMAND python -c \"import nvidia.dali as dali; print(\\\" \\\".join(dali.sysconfig.get_compile_flags()))\"\r\n",
      "        OUTPUT_VARIABLE DALI_COMPILE_FLAGS)\r\n",
      "string(STRIP ${DALI_COMPILE_FLAGS} DALI_COMPILE_FLAGS)\r\n",
      "\r\n",
      "set(CMAKE_CXX_FLAGS \"${CMAKE_CXX_FLAGS} -std=c++11 ${DALI_COMPILE_FLAGS} \")\r\n",
      "link_directories( \"${DALI_LIB_DIR}\" )\r\n",
      "\r\n",
      "cuda_add_library(customdummy SHARED dummy.cc dummy.cu )\r\n",
      "target_link_libraries(customdummy dali)\r\n"
     ]
    }
   ],
   "source": [
    "! cat customdummy/CMakeLists.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we are ready to compile our plugin containing the custom operator 'CustomDummy'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-- The C compiler identification is GNU 5.4.0\n",
      "-- The CXX compiler identification is GNU 5.4.0\n",
      "-- Check for working C compiler: /usr/bin/cc\n",
      "-- Check for working C compiler: /usr/bin/cc -- works\n",
      "-- Detecting C compiler ABI info\n",
      "-- Detecting C compiler ABI info - done\n",
      "-- Detecting C compile features\n",
      "-- Detecting C compile features - done\n",
      "-- Check for working CXX compiler: /usr/bin/c++\n",
      "-- Check for working CXX compiler: /usr/bin/c++ -- works\n",
      "-- Detecting CXX compiler ABI info\n",
      "-- Detecting CXX compiler ABI info - done\n",
      "-- Detecting CXX compile features\n",
      "-- Detecting CXX compile features - done\n",
      "-- Looking for pthread.h\n",
      "-- Looking for pthread.h - found\n",
      "-- Looking for pthread_create\n",
      "-- Looking for pthread_create - not found\n",
      "-- Looking for pthread_create in pthreads\n",
      "-- Looking for pthread_create in pthreads - not found\n",
      "-- Looking for pthread_create in pthread\n",
      "-- Looking for pthread_create in pthread - found\n",
      "-- Found Threads: TRUE  \n",
      "-- Found CUDA: /usr/local/cuda (found suitable version \"9.0\", minimum required is \"8.0\") \n",
      "-- Configuring done\n",
      "-- Generating done\n",
      "-- Build files have been written to: /home/dali/git/dali/docs/examples/extend/customdummy/build\n",
      "[ 33%] \u001b[34m\u001b[1mBuilding NVCC (Device) object CMakeFiles/customdummy.dir/customdummy_generated_dummy.cu.o\u001b[0m\n",
      "\u001b[35m\u001b[1mScanning dependencies of target customdummy\u001b[0m\n",
      "[ 66%] \u001b[32mBuilding CXX object CMakeFiles/customdummy.dir/dummy.cc.o\u001b[0m\n",
      "[100%] \u001b[32m\u001b[1mLinking CXX shared library libcustomdummy.so\u001b[0m\n",
      "[100%] Built target customdummy\n"
     ]
    }
   ],
   "source": [
    "! rm -rf customdummy/build\n",
    "! mkdir -p customdummy/build\n",
    "! cd customdummy/build && \\\n",
    "  cmake .. && \\\n",
    "  make"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the build steps we should have a dynamic library file created and ready to use."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "customdummy/build/libcustomdummy.so\r\n"
     ]
    }
   ],
   "source": [
    "! ls customdummy/build/*.so"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Importing the plugin\n",
    "\n",
    "First we can see that there is no such plugin named 'CustomDummy'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Error: 'module' object has no attribute 'CustomDummy'\n"
     ]
    }
   ],
   "source": [
    "import nvidia.dali.ops as ops\n",
    "try:\n",
    "    help(ops.CustomDummy)\n",
    "except Exception as e:\n",
    "    print('Error: ' + str(e))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now load the plugin"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import nvidia.dali.plugin_manager as plugin_manager\n",
    "plugin_manager.load_library('./customdummy/build/libcustomdummy.so')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and verify that the new operator is available"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on class CustomDummy in module nvidia.dali.ops:\n",
      "\n",
      "class CustomDummy(__builtin__.object)\n",
      " |  This is 'CPU', 'GPU' operator\n",
      " |  \n",
      " |  Make a copy of the input tensor\n",
      " |  \n",
      " |  Parameters\n",
      " |  ----------\n",
      " |  \n",
      " |  Methods defined here:\n",
      " |  \n",
      " |  __call__(self, *inputs, **kwargs)\n",
      " |  \n",
      " |  __init__(self, **kwargs)\n",
      " |  \n",
      " |  ----------------------------------------------------------------------\n",
      " |  Data descriptors defined here:\n",
      " |  \n",
      " |  __dict__\n",
      " |      dictionary for instance variables (if defined)\n",
      " |  \n",
      " |  __weakref__\n",
      " |      list of weak references to the object (if defined)\n",
      " |  \n",
      " |  device\n",
      " |  \n",
      " |  schema\n",
      " |  \n",
      " |  spec\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(ops.CustomDummy)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}