DriveWorks SDK Reference
3.0.4260 Release
For Test and Development only

dnn/docs/dnn_usecase1.md
# Copyright (c) 2019-2020 NVIDIA CORPORATION. All rights reserved.

@page dnn_usecase1 DNN Workflow

@note SW Release Applicability: This tutorial is applicable to modules in both **NVIDIA DriveWorks** and **NVIDIA DRIVE Software** releases.

This code snippet demonstrates how the DNN module is typically used. Note that error handling is omitted for clarity.
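In real code, every `dwDNN_*` call returns a `dwStatus` that should be checked. Below is a minimal sketch of such a check; the `dwStatus` enum here is a stand-in for the SDK's definition, and the macro name `CHECK_DW_ERROR` is an assumption, not part of the API:

```{.cpp}
#include <cstdio>
#include <cstdlib>

// Stand-in for the DriveWorks status enum; the real definitions come from the SDK headers.
enum dwStatus { DW_SUCCESS = 0, DW_INVALID_ARGUMENT = 1 };

// Abort with a message if a DriveWorks call does not return DW_SUCCESS.
#define CHECK_DW_ERROR(expr)                                            \
    do {                                                                \
        dwStatus status = (expr);                                       \
        if (status != DW_SUCCESS) {                                     \
            std::fprintf(stderr, "DriveWorks error %d at %s:%d\n",      \
                         static_cast<int>(status), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)
```

Each call in the snippets below could then be wrapped, e.g. `CHECK_DW_ERROR(dwDNN_getInputBlobCount(&numInputs, dnn));`.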

Initialize the network from a file.

If the model has been generated for DLA with the `--useDLA` option of the tensorrt_optimization tool,
the processor type should be either `DW_PROCESSOR_TYPE_DLA_0` or `DW_PROCESSOR_TYPE_DLA_1`, depending on which DLA engine the inference should run. Otherwise, the processor type should always be `DW_PROCESSOR_TYPE_GPU`.
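To make that choice explicit, a small helper can map configuration to the processor type passed at initialization. The enum below is a stand-in that mirrors the names of the SDK's processor-type values, and `selectProcessor` is a hypothetical helper, not part of the DriveWorks API:

```{.cpp}
#include <stdexcept>

// Stand-in mirroring the names of the SDK's processor-type enum values.
enum dwProcessorType { DW_PROCESSOR_TYPE_GPU, DW_PROCESSOR_TYPE_DLA_0, DW_PROCESSOR_TYPE_DLA_1 };

// Models optimized with --useDLA must run on one of the two DLA engines; everything else runs on the GPU.
dwProcessorType selectProcessor(bool modelBuiltForDLA, int dlaEngine)
{
    if (!modelBuiltForDLA)
        return DW_PROCESSOR_TYPE_GPU;
    if (dlaEngine == 0)
        return DW_PROCESSOR_TYPE_DLA_0;
    if (dlaEngine == 1)
        return DW_PROCESSOR_TYPE_DLA_1;
    throw std::invalid_argument("DLA engine index must be 0 or 1");
}
```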

`contextHandle` is assumed to be a previously initialized `::dwContextHandle_t`.

```{.cpp}
// Load the DNN from a file. Note that the DNN model has to be generated with the tensorrt_optimization tool.
dwDNNHandle_t dnn = nullptr;
dwDNN_initializeTensorRTFromFile(&dnn, "network.fp32", nullptr, DW_PROCESSOR_TYPE_GPU, contextHandle);
```

Check that the loaded network has the expected number of inputs and outputs.

```{.cpp}
// Find out the number of input and output blobs in the network.
uint32_t numInputs = 0;
uint32_t numOutputs = 0;
dwDNN_getInputBlobCount(&numInputs, dnn);
dwDNN_getOutputBlobCount(&numOutputs, dnn);

if (numInputs != 1) {
    std::cerr << "Expected a DNN with one input blob." << std::endl;
    return -1;
}
if (numOutputs != 2) {
    std::cerr << "Expected a DNN with two output blobs." << std::endl;
    return -1;
}
```

Ask the DNN about the order of the input and output blobs. The network is assumed to contain the input blob "data_in" and the output blobs "data_out1" and "data_out2".

```{.cpp}
uint32_t inputIndex = 0;
uint32_t output1Index = 0;
uint32_t output2Index = 0;

// Find indices of blobs by their name.
dwDNN_getInputIndex(&inputIndex, "data_in", dnn);
dwDNN_getOutputIndex(&output1Index, "data_out1", dnn);
dwDNN_getOutputIndex(&output2Index, "data_out2", dnn);
```

Initialize host and device memory to hold the inputs and outputs of the network.

```{.cpp}
std::vector<float32_t*> dnnInputs(numInputs, nullptr);
std::vector<float32_t*> dnnOutputs(numOutputs, nullptr);

std::vector<float32_t> dnnInputHost;
std::vector<std::vector<float32_t>> dnnOutputHost(numOutputs);

// Allocate device memory for the DNN input.
dwBlobSize sizeInput;
dwDNN_getInputSize(&sizeInput, inputIndex, dnn);
size_t numInputElements = sizeInput.batchsize * sizeInput.channels * sizeInput.height * sizeInput.width;
cudaMalloc(&dnnInputs[inputIndex], sizeof(float32_t) * numInputElements);
dnnInputHost.resize(numInputElements);

// Allocate device and host memory for the DNN outputs.
dwBlobSize size1, size2;

dwDNN_getOutputSize(&size1, output1Index, dnn);
dwDNN_getOutputSize(&size2, output2Index, dnn);
size_t numElements1 = size1.batchsize * size1.channels * size1.height * size1.width;
size_t numElements2 = size2.batchsize * size2.channels * size2.height * size2.width;

cudaMalloc(&dnnOutputs[output1Index], sizeof(float32_t) * numElements1);
cudaMalloc(&dnnOutputs[output2Index], sizeof(float32_t) * numElements2);
dnnOutputHost[output1Index].resize(numElements1);
dnnOutputHost[output2Index].resize(numElements2);

// Fill dnnInputHost with application data.
```
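The batchsize × channels × height × width product above is computed once per blob; it can be factored into a helper. `dwBlobSize` below is a stand-in struct that mirrors the SDK's field names, and `blobElementCount` is a hypothetical helper, not part of the API:

```{.cpp}
#include <cstddef>
#include <cstdint>

// Stand-in mirroring the field names of the SDK's dwBlobSize struct.
struct dwBlobSize {
    uint32_t batchsize;
    uint32_t channels;
    uint32_t height;
    uint32_t width;
};

// Total number of elements in a blob of the given dimensions.
size_t blobElementCount(const dwBlobSize& size)
{
    return static_cast<size_t>(size.batchsize) * size.channels * size.height * size.width;
}
```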

Copy the DNN input from the host buffer to the device, perform DNN inference, and copy the results back. All operations are performed asynchronously with respect to the host code.

```{.cpp}
// Enqueue an asynchronous copy of the network input data from host to device memory.
cudaMemcpyAsync(dnnInputs[inputIndex], dnnInputHost.data(), sizeof(float32_t) * numInputElements, cudaMemcpyHostToDevice);

// Begin DNN inference in the currently selected CUDA stream.
dwDNN_infer(dnnInputs.data(), dnnOutputs.data(), dnn);

// Enqueue asynchronous copies of the inference results to host memory.
cudaMemcpyAsync(dnnOutputHost[output1Index].data(), dnnOutputs[output1Index], sizeof(float32_t) * numElements1, cudaMemcpyDeviceToHost);
cudaMemcpyAsync(dnnOutputHost[output2Index].data(), dnnOutputs[output2Index], sizeof(float32_t) * numElements2, cudaMemcpyDeviceToHost);

// Do something while the inference results are being calculated.
otherUsefulWork();

// Wait until all pending operations on the CUDA device have finished.
cudaDeviceSynchronize();

// Inference and memory copies are done. Read results from dnnOutputHost[output1Index] and dnnOutputHost[output2Index].
```

Finally, free the previously allocated memory.

```{.cpp}
// Free resources.
cudaFree(dnnInputs[inputIndex]);
cudaFree(dnnOutputs[output1Index]);
cudaFree(dnnOutputs[output2Index]);
dwDNN_release(&dnn);
```

For more information see:
- @ref dwx_object_detector_tracker_sample