{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Create a custom operator\n", "\n", "DALI allows you to create a custom operator in C++ and load it at runtime. There are several reasons you might need to write your custom operator, for instance\n", "\n", "- DALI doesn't support the operation you want to perform and it cannot be expressed by a composition of other operators\n", "- You want to write an operator that depends on a third party library\n", "- You want to optimize your pipeline by providing a manually fused operation in C++\n", "\n", "In this tutorial we will walk you through the process of writing, compiling and loading a plugin with a DALI custom operator. For demonstration purposes we will provide a CPU and a GPU implementation for the 'CustomDummy' operator. The implementation will just copy the input data to the output without any modifications.\n", "\n", "PREREQUISITES:\n", "\n", "- Knowledge of C++\n", "- DALI installed from the binary distribution or compiled from source\n", "- Basic knowledge of CMake" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operator definition\n", "\n", "First we define the operator in a header file. We will create an Operator that copies its only input to the output. The implementation of `CanInferOutputs` and `SetupImpl` can be shared across backends. `SetupImpl` provides shape and type description of the output based on the input and `CanInferOutputs` informs the executor that the Operator can provide that output description for the whole batch before executing `RunImpl`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#ifndef EXAMPLE_DUMMY_H_\r\n", "#define EXAMPLE_DUMMY_H_\r\n", "\r\n", "#include \r\n", "\r\n", "#include \"dali/pipeline/operator/operator.h\"\r\n", "\r\n", "namespace other_ns {\r\n", "\r\n", "template \r\n", "class Dummy : public ::dali::Operator {\r\n", " public:\r\n", " inline explicit Dummy(const ::dali::OpSpec &spec) :\r\n", " ::dali::Operator(spec) {}\r\n", "\r\n", " virtual inline ~Dummy() = default;\r\n", "\r\n", " Dummy(const Dummy&) = delete;\r\n", " Dummy& operator=(const Dummy&) = delete;\r\n", " Dummy(Dummy&&) = delete;\r\n", " Dummy& operator=(Dummy&&) = delete;\r\n", "\r\n", " protected:\r\n", " bool CanInferOutputs() const override {\r\n", " return true;\r\n", " }\r\n", "\r\n", " bool SetupImpl(std::vector<::dali::OutputDesc> &output_desc,\r\n", " const ::dali::workspace_t &ws) override {\r\n", " const auto &input = ws.template InputRef(0);\r\n", " output_desc.resize(1);\r\n", " output_desc[0] = {input.shape(), input.type()};\r\n", " return true;\r\n", " }\r\n", "\r\n", " void RunImpl(::dali::Workspace &ws) override;\r\n", "};\r\n", "\r\n", "} // namespace other_ns\r\n", "\r\n", "#endif // EXAMPLE_DUMMY_H_\r\n" ] } ], "source": [ "! cat customdummy/dummy.h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CPU operator implementation\n", "\n", "Next, we provide the CPU implementation in a C++ implementation file. We register the schema for the custom operator with DALI_REGISTER_SCHEMA macro and define the CPU version of the operator with DALI_REGISTER_OPERATOR." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#include \"dummy.h\"\r\n", "\r\n", "namespace other_ns {\r\n", "\r\n", "template<>\r\n", "void Dummy<::dali::CPUBackend>::RunImpl(::dali::SampleWorkspace &ws) {\r\n", " const auto &input = ws.Input<::dali::CPUBackend>(0);\r\n", " auto &output = ws.Output<::dali::CPUBackend>(0);\r\n", "\r\n", " ::dali::TypeInfo type = input.type();\r\n", " type.Copy<::dali::CPUBackend, ::dali::CPUBackend>(\r\n", " output.raw_mutable_data(),\r\n", " input.raw_data(), input.size(), 0);\r\n", "}\r\n", "\r\n", "} // namespace other_ns\r\n", "\r\n", "DALI_REGISTER_OPERATOR(CustomDummy, ::other_ns::Dummy<::dali::CPUBackend>, ::dali::CPU);\r\n", "\r\n", "DALI_SCHEMA(CustomDummy)\r\n", " .DocStr(\"Make a copy of the input tensor\")\r\n", " .NumInput(1)\r\n", " .NumOutput(1);\r\n" ] } ], "source": [ "! cat customdummy/dummy.cc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## GPU operator implementation\n", "\n", "Similarly, we provide a GPU implementation in a CUDA implementation file and register it with DALI_REGISTER_OPERATOR." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#include \r\n", "#include \"dummy.h\"\r\n", "\r\n", "namespace other_ns {\r\n", "\r\n", "template<>\r\n", "void Dummy<::dali::GPUBackend>::RunImpl(::dali::DeviceWorkspace &ws) {\r\n", " const auto &input = ws.Input<::dali::GPUBackend>(0);\r\n", " auto &output = ws.Output<::dali::GPUBackend>(0);\r\n", " CUDA_CALL(cudaMemcpyAsync(\r\n", " output.raw_mutable_data(),\r\n", " input.raw_data(),\r\n", " input.nbytes(),\r\n", " cudaMemcpyDeviceToDevice,\r\n", " ws.stream()));\r\n", "}\r\n", "\r\n", "} // namespace other_ns\r\n", "\r\n", "DALI_REGISTER_OPERATOR(CustomDummy, ::other_ns::Dummy<::dali::GPUBackend>, ::dali::GPU);\r\n" ] } ], "source": [ "! 
cat customdummy/dummy.cu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Building the plugin\n", "The last step is to specify the build configuration.\n", "\n", "We can use nvidia.dali.sysconfig to retrieve the build configuration parameters\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from __future__ import print_function\n", "import nvidia.dali.sysconfig as sysconfig" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/usr/local/lib/python3.6/dist-packages/nvidia/dali/include\n" ] } ], "source": [ "print(sysconfig.get_include_dir())" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/usr/local/lib/python3.6/dist-packages/nvidia/dali\n" ] } ], "source": [ "print(sysconfig.get_lib_dir())" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['-I/usr/local/lib/python3.6/dist-packages/nvidia/dali/include', '-D_GLIBCXX_USE_CXX11_ABI=1']\n" ] } ], "source": [ "print(sysconfig.get_compile_flags())" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['-L/usr/local/lib/python3.6/dist-packages/nvidia/dali', '-ldali']\n" ] } ], "source": [ "print(sysconfig.get_link_flags())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Important:** *Only one version of libdali.so should be loaded in the process at the same time. Therefore, a plugin must be linked against the exact same library in the python package directory of DALI that you intend to use to load your plugin. Due to this limitation, upgrading your DALI version will require you to relink your plugin against the new library.*\n", "\n", "In this example we used CMake to build the plugin" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cmake_minimum_required(VERSION 3.5)\r\n", "find_package(CUDA 8.0 REQUIRED)\r\n", "\r\n", "execute_process(\r\n", " COMMAND python -c \"import nvidia.dali as dali; print(dali.sysconfig.get_lib_dir())\"\r\n", " OUTPUT_VARIABLE DALI_LIB_DIR)\r\n", "string(STRIP ${DALI_LIB_DIR} DALI_LIB_DIR)\r\n", "\r\n", "execute_process(\r\n", " COMMAND python -c \"import nvidia.dali as dali; print(\\\" \\\".join(dali.sysconfig.get_compile_flags()))\"\r\n", " OUTPUT_VARIABLE DALI_COMPILE_FLAGS)\r\n", "string(STRIP ${DALI_COMPILE_FLAGS} DALI_COMPILE_FLAGS)\r\n", "\r\n", "set(CMAKE_CXX_FLAGS \"${CMAKE_CXX_FLAGS} -std=c++14 ${DALI_COMPILE_FLAGS} \")\r\n", "set(CUDA_NVCC_FLAGS \"${CUDA_NVCC_FLAGS} -std=c++14 ${DALI_COMPILE_FLAGS} \")\r\n", "link_directories( \"${DALI_LIB_DIR}\" )\r\n", "\r\n", "cuda_add_library(customdummy SHARED dummy.cc dummy.cu )\r\n", "target_link_libraries(customdummy dali)\r\n" ] } ], "source": [ "! 
cat customdummy/CMakeLists.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are ready to compile our plugin containing the custom operator 'CustomDummy'" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-- The C compiler identification is GNU 7.4.0\n", "-- The CXX compiler identification is GNU 7.4.0\n", "-- Check for working C compiler: /usr/bin/cc\n", "-- Check for working C compiler: /usr/bin/cc -- works\n", "-- Detecting C compiler ABI info\n", "-- Detecting C compiler ABI info - done\n", "-- Detecting C compile features\n", "-- Detecting C compile features - done\n", "-- Check for working CXX compiler: /usr/bin/c++\n", "-- Check for working CXX compiler: /usr/bin/c++ -- works\n", "-- Detecting CXX compiler ABI info\n", "-- Detecting CXX compiler ABI info - done\n", "-- Detecting CXX compile features\n", "-- Detecting CXX compile features - done\n", "-- Looking for pthread.h\n", "-- Looking for pthread.h - found\n", "-- Looking for pthread_create\n", "-- Looking for pthread_create - not found\n", "-- Looking for pthread_create in pthreads\n", "-- Looking for pthread_create in pthreads - not found\n", "-- Looking for pthread_create in pthread\n", "-- Looking for pthread_create in pthread - found\n", "-- Found Threads: TRUE \n", "-- Found CUDA: /usr/local/cuda (found suitable version \"10.0\", minimum required is \"8.0\") \n", "-- Configuring done\n", "-- Generating done\n", "-- Build files have been written to: /home/git/dali/docs/examples/custom_operations/custom_operator/customdummy/build\n", "[ 33%] \u001b[34m\u001b[1mBuilding NVCC (Device) object CMakeFiles/customdummy.dir/customdummy_generated_dummy.cu.o\u001b[0m\n", "\u001b[35m\u001b[1mScanning dependencies of target customdummy\u001b[0m\n", "[ 66%] \u001b[32mBuilding CXX object CMakeFiles/customdummy.dir/dummy.cc.o\u001b[0m\n", "[100%] \u001b[32m\u001b[1mLinking CXX shared library libcustomdummy.so\u001b[0m\n", "[100%] Built target customdummy\n" ] } ], "source": [ "! rm -rf customdummy/build\n", "! mkdir -p customdummy/build\n", "! cd customdummy/build && \\\n", " cmake .. && \\\n", " make" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After the build steps we should have a dynamic library file created and ready to use." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "customdummy/build/libcustomdummy.so\r\n" ] } ], "source": [ "! 
ls customdummy/build/*.so" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing the plugin\n", "\n", "First we can see that there is no such plugin named 'CustomDummy'" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Error: module 'nvidia.dali.ops' has no attribute 'CustomDummy'\n" ] } ], "source": [ "import nvidia.dali.ops as ops\n", "try:\n", " help(ops.CustomDummy)\n", "except Exception as e:\n", " print('Error: ' + str(e))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now load the plugin" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "import nvidia.dali.plugin_manager as plugin_manager\n", "plugin_manager.load_library('./customdummy/build/libcustomdummy.so')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and verify that the new operator is available" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on class CustomDummy in module nvidia.dali.ops:\n", "\n", "class CustomDummy(builtins.object)\n", " | Make a copy of the input tensor\n", " | \n", " | Supported backends\n", " | * 'cpu'\n", " | * 'gpu'\n", " | \n", " | \n", " | Keyword args\n", " | ------------\n", " | `bytes_per_sample_hint` : int, optional, default = 0\n", " | Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory\n", " | `preserve` : bool, optional, default = False\n", " | Do not remove the Op from the graph even if its outputs are unused.\n", " | `seed` : int, optional, default = -1\n", " | Random seed (If not provided it will be populated based on the global seed of the pipeline)\n", " | \n", " | Methods defined here:\n", " | \n", " | __call__(self, *inputs, **kwargs)\n", " | __call__(data, **kwargs)\n", " | \n", " | Operator call to be used in `define_graph` step.\n", " | \n", " | Args\n", " | ----\n", " | `data`: TensorList\n", " | Input to the operator.\n", " | \n", " | __init__(self, **kwargs)\n", " | \n", " | ----------------------------------------------------------------------\n", " | Data descriptors defined here:\n", " | \n", " | __dict__\n", " | dictionary for instance variables (if defined)\n", " | \n", " | __weakref__\n", " | list of weak references to the object (if defined)\n", " | \n", " | device\n", " | \n", " | preserve\n", " | \n", " | schema\n", " | \n", " | spec\n", "\n" ] } ], "source": [ "help(ops.CustomDummy)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }