{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating TensorRT Engines from TensorFlow\n", "\n", "The UFF Toolkit allows you to convert TensorFlow models to UFF. The UFF parser can build TensorRT engines from these UFF models.\n", "\n", "For this example, we train a LeNet5 model to classify handwritten digits and then build a TensorRT Engine for inference. Please make sure you have TensorFlow installed.\n", "\n", "First, we need to import TensorFlow (Note: there is a know bug where TensorFlow needs to be imported before TensorRT. This will addressed in the future). " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then import TensorRT and the UFF parser." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import tensorrt as trt\n", "from tensorrt.parsers import uffparser" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use PyCUDA to transfer data to/from the GPU and NumPy to store data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import pycuda.driver as cuda\n", "import pycuda.autoinit\n", "import numpy as np\n", "from random import randint # generate a random test case\n", "from PIL import Image\n", "from matplotlib.pyplot import imshow # To show test case\n", "import time\n", "import os" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we need to import the UFF toolkit to convert the graph from a serialized frozen TensorFlow model to UFF." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import uff" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training a Model in TensorFlow \n", "\n", "For more detailed information about training models in TensorFlow, see https://www.tensorflow.org/get_started/get_started.\n", "\n", "First, we define hyper-parameters and helper functions for convenience. We will then define a network, our loss metrics, training/test steps, our input nodes, and a data loader." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "STARTER_LEARNING_RATE = 1e-4\n", "BATCH_SIZE = 10\n", "NUM_CLASSES = 10\n", "MAX_STEPS = 3000\n", "IMAGE_SIZE = 28\n", "IMAGE_PIXELS = IMAGE_SIZE ** 2\n", "OUTPUT_NAMES = [\"fc2/Relu\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Notice that we are padding our Conv2d layer. 
Notice that we pad our Conv2d layers manually: TensorRT expects symmetric padding, so `Conv2d` below pads its input with `tf.pad` and then convolves with `padding='VALID'`.*" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def WeightsVariable(shape):\n", "    return tf.Variable(tf.truncated_normal(shape, stddev=0.1), name='weights')\n", "\n", "def BiasVariable(shape):\n", "    return tf.Variable(tf.constant(0.1, shape=shape), name='biases')\n", "\n", "def Conv2d(x, W, b, strides=1):\n", "    # Conv2D wrapper: symmetric padding, convolution, bias and ReLU activation\n", "    filter_size = W.get_shape().as_list()\n", "    pad_size = filter_size[0] // 2\n", "    pad_mat = np.array([[0, 0], [pad_size, pad_size], [pad_size, pad_size], [0, 0]])\n", "    x = tf.pad(x, pad_mat)\n", "    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='VALID')\n", "    x = tf.nn.bias_add(x, b)\n", "    return tf.nn.relu(x)\n", "\n", "def MaxPool2x2(x, k=2):\n", "    # MaxPool2D wrapper\n", "    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1], padding='VALID')\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def network(images):\n", "    # Convolution 1\n", "    with tf.name_scope('conv1'):\n", "        weights = WeightsVariable([5, 5, 1, 32])\n", "        biases = BiasVariable([32])\n", "        conv1 = Conv2d(images, weights, biases)  # Conv2d already applies ReLU\n", "        pool1 = MaxPool2x2(conv1)\n", "\n", "    # Convolution 2\n", "    with tf.name_scope('conv2'):\n", "        weights = WeightsVariable([5, 5, 32, 64])\n", "        biases = BiasVariable([64])\n", "        conv2 = Conv2d(pool1, weights, biases)  # Conv2d already applies ReLU\n", "        pool2 = MaxPool2x2(conv2)\n", "        pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])\n", "\n", "    # Fully Connected 1\n", "    with tf.name_scope('fc1'):\n", "        weights = WeightsVariable([7 * 7 * 64, 1024])\n", "        biases = BiasVariable([1024])\n", "        fc1 = tf.nn.relu(tf.matmul(pool2_flat, weights) + biases)\n", "\n", "    # Fully Connected 2\n", "    with tf.name_scope('fc2'):\n", "        weights = WeightsVariable([1024, 10])\n", "        biases = BiasVariable([10])\n", "        fc2 = tf.nn.relu(tf.matmul(fc1, weights) + biases)\n", "\n", "    return fc2\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def loss_metrics(logits, labels):\n", "    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(\n", "        labels=labels, logits=logits, name='softmax')\n", "    return tf.reduce_mean(cross_entropy, name='softmax_mean')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def training(loss):\n", "    tf.summary.scalar('loss', loss)\n", "    global_step = tf.Variable(0, name='global_step', trainable=False)\n", "    learning_rate = tf.train.exponential_decay(\n", "        STARTER_LEARNING_RATE, global_step, 100000, 0.75, staircase=True)\n", "    tf.summary.scalar('learning_rate', learning_rate)\n", "    optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)\n", "    train_op = optimizer.minimize(loss, global_step=global_step)\n", "    return train_op" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def evaluation(logits, labels):\n", "    correct = tf.nn.in_top_k(logits, labels, 1)\n", "    return tf.reduce_sum(tf.cast(correct, tf.int32))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def do_eval(sess, eval_correct, images_placeholder, labels_placeholder,\n", "            data_set, summary):\n", "\n", "    true_count = 0\n", "    steps_per_epoch = data_set.num_examples // BATCH_SIZE\n",
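"    # Evaluate in whole batches; any leftover examples are dropped\n",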
"    num_examples = steps_per_epoch * BATCH_SIZE\n", "    for step in range(steps_per_epoch):\n", "        feed_dict = fill_feed_dict(data_set, images_placeholder, labels_placeholder)\n", "        log, correctness = sess.run([summary, eval_correct], feed_dict=feed_dict)\n", "        true_count += correctness\n", "    precision = float(true_count) / num_examples\n", "    tf.summary.scalar('precision', tf.constant(precision))\n", "    print('Num examples %d, Num Correct: %d Precision @ 1: %0.04f' %\n", "          (num_examples, true_count, precision))\n", "    return log" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def placeholder_inputs(batch_size):\n", "    images_placeholder = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))\n", "    labels_placeholder = tf.placeholder(tf.int32, shape=(None))\n", "    return images_placeholder, labels_placeholder\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def fill_feed_dict(data_set, images_pl, labels_pl):\n", "    images_feed, labels_feed = data_set.next_batch(BATCH_SIZE)\n", "    feed_dict = {\n", "        images_pl: np.reshape(images_feed, (-1, 28, 28, 1)),\n", "        labels_pl: labels_feed,\n", "    }\n", "    return feed_dict\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define our training pipeline in a function that returns the trained model as a frozen graph with the training nodes removed." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def run_training(data_sets):\n", "    with tf.Graph().as_default():\n", "        images_placeholder, labels_placeholder = placeholder_inputs(BATCH_SIZE)\n", "        logits = network(images_placeholder)\n", "        loss = loss_metrics(logits, labels_placeholder)\n", "        train_op = training(loss)\n", "        eval_correct = evaluation(logits, labels_placeholder)\n", "        summary = tf.summary.merge_all()\n", "        init = tf.global_variables_initializer()\n", "        saver = tf.train.Saver()\n", "        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)\n", "        sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))\n", "        summary_writer = tf.summary.FileWriter(\n", "            \"/tmp/tensorflow/mnist/log\", graph=tf.get_default_graph())\n", "        test_writer = tf.summary.FileWriter(\n", "            \"/tmp/tensorflow/mnist/log/validation\", graph=tf.get_default_graph())\n", "        sess.run(init)\n", "        for step in range(MAX_STEPS):\n", "            start_time = time.time()\n", "            feed_dict = fill_feed_dict(data_sets.train,\n", "                                       images_placeholder,\n", "                                       labels_placeholder)\n", "            _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)\n", "            duration = time.time() - start_time\n", "            if step % 100 == 0:\n", "                print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))\n", "                summary_str = sess.run(summary, feed_dict=feed_dict)\n", "                summary_writer.add_summary(summary_str, step)\n", "                summary_writer.flush()\n", "            if (step + 1) % 1000 == 0 or (step + 1) == MAX_STEPS:\n", "                checkpoint_file = os.path.join(\"/tmp/tensorflow/mnist/log\", \"model.ckpt\")\n", "                saver.save(sess, checkpoint_file, global_step=step)\n", "                print('Validation Data Eval:')\n", "                log = do_eval(sess, eval_correct, images_placeholder,\n", "                              labels_placeholder, data_sets.validation, summary)\n", "                test_writer.add_summary(log, step)\n", "\n", "        graphdef = tf.get_default_graph().as_graph_def()\n", "        frozen_graph = tf.graph_util.convert_variables_to_constants(\n", "            sess, graphdef, OUTPUT_NAMES)\n", "        return tf.graph_util.remove_training_nodes(frozen_graph)\n" ] }, {
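"cell_type": "markdown", "metadata": {}, "source": [ "As an aside, if you also want a copy of the frozen graph on disk (for example, to convert later with `uff.from_tensorflow_frozen_model`, described below), a small helper like the following can serialize it. This is a minimal sketch assuming the TensorFlow 1.x `tf.gfile` API; the helper name and output path are illustrative." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hedged sketch: persist a frozen GraphDef (such as the one returned by\n", "# run_training) so it can be converted later from a file. The path is illustrative.\n", "def save_frozen_graph(graphdef, path=\"/tmp/tensorflow/mnist/lenet5_frozen.pb\"):\n", "    with tf.gfile.GFile(path, \"wb\") as f:\n", "        f.write(graphdef.SerializeToString())" ] }, {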
"cell_type": "markdown", "metadata": {}, "source": [ "Now we load the TensorFlow MNIST data loader and run training. The model has summaries included, so you can visualize training in TensorBoard." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extracting MNIST-data/train-images-idx3-ubyte.gz\n", "Extracting MNIST-data/train-labels-idx1-ubyte.gz\n", "Extracting MNIST-data/t10k-images-idx3-ubyte.gz\n", "Extracting MNIST-data/t10k-labels-idx1-ubyte.gz\n", "Step 0: loss = 10.72 (0.123 sec)\n", "Step 100: loss = 2.30 (0.019 sec)\n", "Step 200: loss = 2.30 (0.018 sec)\n", "Step 300: loss = 2.30 (0.018 sec)\n", "Step 400: loss = 2.30 (0.017 sec)\n", "Step 500: loss = 2.30 (0.017 sec)\n", "Step 600: loss = 2.30 (0.018 sec)\n", "Step 700: loss = 2.30 (0.017 sec)\n", "Step 800: loss = 2.30 (0.017 sec)\n", "Step 900: loss = 2.30 (0.017 sec)\n", "Validation Data Eval:\n", "Num examples 5000, Num Correct: 4912 Precision @ 1: 0.9824\n", "Step 1000: loss = 2.30 (0.021 sec)\n", "Step 1100: loss = 2.30 (0.018 sec)\n", "Step 1200: loss = 2.30 (0.018 sec)\n", "Step 1300: loss = 2.30 (0.018 sec)\n", "Step 1400: loss = 2.30 (0.018 sec)\n", "Step 1500: loss = 2.23 (0.018 sec)\n", "Step 1600: loss = 2.31 (0.018 sec)\n", "Step 1700: loss = 1.96 (0.017 sec)\n", "Step 1800: loss = 2.31 (0.017 sec)\n", "Step 1900: loss = 2.35 (0.017 sec)\n", "Validation Data Eval:\n", "Num examples 5000, Num Correct: 3884 Precision @ 1: 0.7768\n", "Step 2000: loss = 2.12 (0.022 sec)\n", "Step 2100: loss = 2.46 (0.017 sec)\n", "Step 2200: loss = 2.31 (0.019 sec)\n", "Step 2300: loss = 1.87 (0.018 sec)\n", "Step 2400: loss = 1.94 (0.017 sec)\n", "Step 2500: loss = 2.31 (0.017 sec)\n", "Step 2600: loss = 1.62 (0.018 sec)\n", "Step 2700: loss = 2.17 (0.018 sec)\n", "Step 2800: loss = 1.91 (0.018 sec)\n", "Step 2900: loss = 2.07 (0.018 sec)\n", "Validation Data Eval:\n", "Num examples 5000, Num Correct: 4830 Precision @ 1: 0.9660\n", "INFO:tensorflow:Froze 8 variables.\n", "Converted 8 variables to const ops.\n" ] } ], "source": [ "MNIST_DATASETS = tf.contrib.learn.datasets.load_dataset(\"mnist\")\n", "tf_model = run_training(MNIST_DATASETS)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Converting the TensorFlow Model to UFF\n", "We can now convert the model into a serialized UFF model. To convert a model, we need to provide at least the model stream and the name(s) of the desired output node(s) to the `uff.from_tensorflow` function. The UFF Toolkit also includes a `uff.from_tensorflow_frozen_model` function which can create a UFF model from a TensorFlow frozen protobuf file. These functions have options for:\n", "\n", "\n", "- `quiet` mode to suppress conversion logging\n", "\n", "\n", "- `input_nodes` to allow you to define a set of input nodes in the graph (the defaults are Placeholder nodes)\n", "\n", "\n", "- `text` will let you save a human readable version of UFF model alongside the binary UFF\n", "\n", "\n", "- `list_nodes` will list the nodes in the graph \n", "\n", "\n", "- `output_filename` will write the model out to the filepath specified in addition to returning a serialized model\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using output node fc2/Relu\n", "Converting to UFF graph\n", "No. 
nodes: 28\n" ] } ], "source": [ "uff_model = uff.from_tensorflow(tf_model, [\"fc2/Relu\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing the UFF Model into TensorRT and Building an Engine \n", "\n", "We now have a UFF model stream with which we can build a TensorRT engine. We start by creating a logger for TensorRT." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we create a UFF parser and identify the desired input and output nodes." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parser = uffparser.create_uff_parser()\n", "parser.register_input(\"Placeholder\", (1,28,28), 0)\n", "parser.register_output(\"fc2/Relu\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we pass the logger, parser, uff model stream and some settings (max batch size and max workspace size) to a utility function that will build the engine for us." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1 << 20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now get rid of the parser." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "parser.destroy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we get a test case from the TensorFlow dataloader (converting it to FP32)." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAADIhJREFUeJzt3X+oJfV5x/H3k811TTe2uKZut0bUyBIwkpr2om1ibVKbsIpEA0Gy0LAttptAlBpCW7F/xPYvqU1C+iuw1q2bkhiLRpQiTexSKpJgvFp/bxKtXavLrhtZQVPqelef/nHHcHXvnXM9Z86Zc/d5v+Bw58x3zszDsJ+dH99z5huZiaR63tZ3AZL6Yfilogy/VJThl4oy/FJRhl8qyvBLRRl+qSjDLxX19klu7JhYm8eybpKblEp5mf/llTwUK1l2pPBHxGbgq8Aa4B8y89q25Y9lHefE+aNsUlKLe3PXipcd+rQ/ItYAfwdcAJwBbImIM4Zdn6TJGuWa/2zgycx8KjNfAb4FXNxNWZLGbZTwnwQ8s+j9s828N4iIbRExFxFz8xwaYXOSujT2u/2ZuT0zZzNzdoa1496cpBUaJfx7gZMXvX93M0/SKjBK+O8DNkXEaRFxDPAp4I5uypI0bkN39WXm4Yi4HPgOC119OzLzsc4qkzRWI/XzZ+adwJ0d1SJpgvx6r1SU4ZeKMvxSUYZfKsrwS0UZfqkowy8VZfilogy/VJThl4oy/FJRhl8qyvBLRRl+qSjDLxVl+KWiDL9UlOGXijL8UlGGXyrK8EtFTXSIbmmxbT9+qrX94+teaG2/6KRf67KccjzyS0UZfqkowy8VZfilogy/VJThl4oy/FJRI/XzR8Qe4CXgVeBwZs52UZSOHvs//8Fl28459p7Wz87nMV2Xo0W6+JLPRzLz+Q7WI2mCPO2Xiho1/Al8NyLuj4htXRQkaTJGPe0/NzP3RsSJwF0R8cPMvHvxAs1/CtsAjuXnRtycpK6MdOTPzL3N3wPAbcDZSyyzPTNnM3N2hrWjbE5Sh4YOf0Ssi4jjXp8GPgY82lVhksZrlNP+DcBtEfH6er6Zmf/aSVWSxm7o8GfmU8CvdFiLVqE1G05sbX/fJ3cv27b+bfbj98muPqkowy8VZfilogy/VJThl4oy/FJRPrpbI3nhxuNa22855aYJVaK3yiO/VJThl4oy/FJRhl8qyvBLRRl+qSjDLxVlP79avXzREQ9neoM/3/SPE6pEXfPILxVl+KWiDL9UlOGXijL8UlGGXyrK8EtF2c+vVvt+9+XW9o+8o70d1gy97dnrrmht/yW+N/S65ZFfKsvwS0UZfqkowy8VZfilogy/VJThl4oa2M8fETuAi4ADmXlmM289cDNwKrAHuDQzXxhfmRqX/772N1rbH/vNv21tn89Xh97286+90to+81IOvW4NtpIj/43A5jfNuwrYlZmbgF3Ne0mryMDwZ+bdwME3zb4Y2NlM7wQu6bguSWM27DX/hszc10zvBzZ0VI+kCRn5hl9mJrDsxVlEbIuIuYiYm+fQqJuT1JFhw/9cRGwEaP4eWG7BzNyembOZOTvD2iE3J6lrw4b/DmBrM70VuL2bciRNysDwR8RNwPeB90bEsxFxGXAt8NGIeAL4nea9pFVkYD9/Zm5Zpun8jmvRGKx533tb27dccPeEKjnS5uv/pLX95Bv8vf44+Q0/qSjDLxVl+KWiDL9UlOGXijL8UlE+uvso9/THT2htv+WE/xywhuEfvT3Iqbc+39o+/I+FtRIe+aWiDL9UlOGXijL8UlGGXyrK8EtFGX6pKPv5jwIHf3/5x2/f9tnrBnx6pttitGp45JeKMvxSUYZfKsrwS0UZfqkowy8VZfilouznPwocfP/yQ1mf9vZjR1r3TLT/nv9/Dv9fa/uWL/7xsm3HP/79oWpSNzzyS0UZfqkowy8VZfilogy/VJThl4oy/FJRA/v5I2IHcBFwIDPPbOZdA/wh8JNmsasz885xFakBlu/mZz7H+/T7v3/+vNb242+0L39areTIfyOweYn5X8nMs5qXwZdWmYHhz8y7gYMTqEXSBI1yzX95RDwcETsi4vjOKpI0EcOG/2vA6cBZwD7gS8stGBHbImIuIubmOTTk5iR1bajwZ+ZzmflqZr4GXA+c3bLs9syczczZGdYOW6ekjg0V/ojYuOjtJ4BHuylH0qSspKvvJuDDwLsi4lngi8CHI+IsFjqZ9gCfGWONksZgYPgzc8sSs28YQy1axpoNJ7a2n3POjyZUyZHu+4vZ1vZ38IMJVaK3ym/4SUUZfqkowy8VZfilogy/VJThl4ry0d2rwAu//Z7W9ltO+esJVaKjiUd+qSjDLxVl+KWiDL9UlOGXijL8UlGGXyrKfv5V4PQrftjbtj/7zG+1th/30P7W9sNdFqNOeeSXijL8UlGGXyrK8EtFGX6pKMMvFWX4paLs558C+z//wdb2fznlbwasYU13xbzJD259f2v7L+/53ti2rfHyyC8VZfilogy/VJThl4oy/FJRhl8qyvBLRQ3s54+Ik4GvAxuABLZn5lcjYj1wM3AqsAe4NDNfGF+pdc3nq2Nb95m3XdHavuk6+/GPVis58h8GvpCZZwC/DnwuIs4ArgJ2ZeYmYFfzXtIqMTD8mbkvMx9opl8CdgMnARcDO5vFdgKXjKtISd17S9f8EXEq8AHgXmBDZu5rmvazcFkgaZVYcfgj4p3ArcCVmfni4rbMTBbuByz1uW0RMRcRc/McGqlYSd1ZUfgjYoaF4H8jM7/dzH4uIjY27RuBA0t9NjO3Z+ZsZs7OsLaLmiV1YGD4IyKAG4DdmfnlRU13AFub6a3A7d2XJ2lcVvKT3g8BnwYeiYgHm3lXA9cC/xwRlwFPA5eOp8Sj3y9s3jd4oTE57snx/RxY021g+DPzHiCWaT6/23IkTYrf8JOKMvxSUYZfKsrwS0UZfqkowy8V5aO7p8CuM29pbZ9f8ovTK7N7vr193f7Xhl+5VjWP/FJRhl8qyvBLRRl+qSjDLxVl+KWiDL9UlP38U+CM/7istf2h87YPve4/uO7K1vYTb/bR3FV55JeKMvxSUYZfKsrwS0UZfqkowy8VZfilomJhpK3J+PlYn+eET/uWxuXe3MWLeXC5R+2/gUd+qSjDLxVl+KWiDL9UlOGXijL8UlGGXypqYPgj4uSI+PeIeDwiHouIP2rmXxMReyPiweZ14fjLldSVlTzM4zDwhcx8ICKOA+6PiLuatq9k5l+NrzxJ4zIw/Jm5D9jXTL8UEbuBk8ZdmKTxekvX/BFxKvAB4N5m1uUR8XBE7IiI45f
5zLaImIuIuXkOjVSspO6sOPwR8U7gVuDKzHwR+BpwOnAWC2cGX1rqc5m5PTNnM3N2hrUdlCypCysKf0TMsBD8b2TmtwEy87nMfDUzXwOuB84eX5mSuraSu/0B3ADszswvL5q/cdFinwAe7b48SeOykrv9HwI+DTwSEQ82864GtkTEWUACe4DPjKVCSWOxkrv99wBL/T74zu7LkTQpfsNPKsrwS0UZfqkowy8VZfilogy/VJThl4oy/FJRhl8qyvBLRRl+qSjDLxVl+KWiDL9U1ESH6I6InwBPL5r1LuD5iRXw1kxrbdNaF1jbsLqs7ZTM/MWVLDjR8B+x8Yi5zJztrYAW01rbtNYF1jasvmrztF8qyvBLRfUd/u09b7/NtNY2rXWBtQ2rl9p6veaX1J++j/ySetJL+CNic0T8KCKejIir+qhhORGxJyIeaUYenuu5lh0RcSAiHl00b31E3BURTzR/lxwmrafapmLk5paRpXvdd9M24vXET/sjYg3wY+CjwLPAfcCWzHx8ooUsIyL2ALOZ2XufcEScB/wU+HpmntnM+0vgYGZe2/zHeXxm/umU1HYN8NO+R25uBpTZuHhkaeAS4Pfocd+11HUpPey3Po78ZwNPZuZTmfkK8C3g4h7qmHqZeTdw8E2zLwZ2NtM7WfjHM3HL1DYVMnNfZj7QTL8EvD6ydK/7rqWuXvQR/pOAZxa9f5bpGvI7ge9GxP0Rsa3vYpawoRk2HWA/sKHPYpYwcOTmSXrTyNJTs++GGfG6a97wO9K5mfmrwAXA55rT26mUC9ds09Rds6KRmydliZGlf6bPfTfsiNdd6yP8e4GTF71/dzNvKmTm3ubvAeA2pm/04edeHyS1+Xug53p+ZppGbl5qZGmmYN9N04jXfYT/PmBTRJwWEccAnwLu6KGOI0TEuuZGDBGxDvgY0zf68B3A1mZ6K3B7j7W8wbSM3LzcyNL0vO+mbsTrzJz4C7iQhTv+/wX8WR81LFPXe4CHmtdjfdcG3MTCaeA8C/dGLgNOAHYBTwD/Bqyfotr+CXgEeJiFoG3sqbZzWTilfxh4sHld2Pe+a6mrl/3mN/ykorzhJxVl+KWiDL9UlOGXijL8UlGGXyrK8EtFGX6pqP8HAZbQtNs/yTsAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "img, label = MNIST_DATASETS.test.next_batch(1)\n", "img = img[0]\n", "# Convert input data to Float32\n", "img = img.astype(np.float32)\n", "label = label[0]\n", "%matplotlib inline\n", "imshow(img.reshape(28,28))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we create a runtime and an execution context for the engine." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "runtime = trt.infer.create_infer_runtime(G_LOGGER)\n", "context = engine.create_execution_context()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we allocate memory on the GPU, as well as on the host to hold results after inference. The size of these allocations is the size of the input/expected output * the batch size." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "output = np.empty(10, dtype = np.float32)\n", "\n", "# Alocate device memory\n", "d_input = cuda.mem_alloc(1 * img.nbytes)\n", "d_output = cuda.mem_alloc(1 * output.nbytes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The engine requires bindings (pointers to GPU memory). PyCUDA lets us do this by casting the results of memory allocations to ints." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "bindings = [int(d_input), int(d_output)] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a cuda stream to run inference." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "stream = cuda.Stream()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we transfer the data to the GPU, run inference, then transfer the results to the host." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Transfer input data to device\n", "cuda.memcpy_htod_async(d_input, img, stream)\n", "# Execute model \n", "context.enqueue(1, bindings, stream.handle, None)\n", "# Transfer predictions back\n", "cuda.memcpy_dtoh_async(output, d_output, stream)\n", "# Syncronize threads\n", "stream.synchronize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use `np.argmax` to get a prediction." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test Case: 1\n", "Prediction: 0\n" ] } ], "source": [ "print(\"Test Case: \" + str(label))\n", "print (\"Prediction: \" + str(np.argmax(output)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also save our engine to a file to use later." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trt.utils.write_engine_to_file(\"./tf_mnist.engine\", engine.serialize()) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can load this engine later by using `tensorrt.utils.load_engine`." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "new_engine = trt.utils.load_engine(G_LOGGER, \"./tf_mnist.engine\") " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we clean up our context, engine and runtime." 
] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "context.destroy()\n", "engine.destroy()\n", "new_engine.destroy()\n", "runtime.destroy()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 2 }