DriveWorks SDK Reference
4.0.0 Release
For Test and Development only

src/dw/dnn/docs/dnn_usecase1.md
# Copyright (c) 2019-2020 NVIDIA CORPORATION. All rights reserved.

@page dnn_usecase1 DNN Workflow

This code snippet demonstrates how the DNN module is typically used. Note that error handling is left out for clarity.

Initialize the network from a file.

If the model has been generated for DLA using the `--useDLA` option of the tensorrt_optimization tool,
the processor type should be either `DW_PROCESSOR_TYPE_DLA_0` or `DW_PROCESSOR_TYPE_DLA_1`, depending on which DLA engine should run the inference. Otherwise, the processor type should always be `DW_PROCESSOR_TYPE_GPU`.

`contextHandle` is assumed to be a previously initialized `::dwContextHandle_t`.

```{.cpp}
// Load the DNN from a file. Note that the DNN model has to be generated with the tensorrt_optimization tool.
dwDNNHandle_t dnn = nullptr;
dwDNN_initializeTensorRTFromFile(&dnn, "network.fp32", nullptr, DW_PROCESSOR_TYPE_GPU, contextHandle);
```

Check that the loaded network has the expected number of inputs and outputs.

```{.cpp}
// Find out the number of input and output blobs in the network.
uint32_t numInputs = 0;
uint32_t numOutputs = 0;
dwDNN_getInputBlobCount(&numInputs, dnn);
dwDNN_getOutputBlobCount(&numOutputs, dnn);

if (numInputs != 1) {
    std::cerr << "Expected a DNN with one input blob." << std::endl;
    return -1;
}
if (numOutputs != 2) {
    std::cerr << "Expected a DNN with two output blobs." << std::endl;
    return -1;
}
```

Ask the DNN about the order of the input and output blobs. The network is assumed to contain the input blob "data_in" and output blobs "data_out1" and "data_out2".

```{.cpp}
uint32_t inputIndex = 0;
uint32_t output1Index = 0;
uint32_t output2Index = 0;

// Find indices of blobs by their name.
dwDNN_getInputIndex(&inputIndex, "data_in", dnn);
dwDNN_getOutputIndex(&output1Index, "data_out1", dnn);
dwDNN_getOutputIndex(&output2Index, "data_out2", dnn);
```

Initialize host and device memory to hold the inputs and outputs of the network.

```{.cpp}
std::vector<float32_t*> dnnInputs(numInputs, nullptr);
std::vector<float32_t*> dnnOutputs(numOutputs, nullptr);

std::vector<float32_t> dnnInputHost;
std::vector<std::vector<float32_t>> dnnOutputHost(numOutputs);

// Allocate device memory for the DNN input.
dwBlobSize sizeInput;
dwDNN_getInputSize(&sizeInput, inputIndex, dnn);
size_t numInputElements = sizeInput.batchsize * sizeInput.channels * sizeInput.height * sizeInput.width;
cudaMalloc(&dnnInputs[inputIndex], sizeof(float32_t) * numInputElements);
dnnInputHost.resize(numInputElements);

// Allocate device and host memory for the DNN outputs.
dwBlobSize size1, size2;

dwDNN_getOutputSize(&size1, output1Index, dnn);
dwDNN_getOutputSize(&size2, output2Index, dnn);
size_t numElements1 = size1.batchsize * size1.channels * size1.height * size1.width;
size_t numElements2 = size2.batchsize * size2.channels * size2.height * size2.width;

cudaMalloc(&dnnOutputs[output1Index], sizeof(float32_t) * numElements1);
cudaMalloc(&dnnOutputs[output2Index], sizeof(float32_t) * numElements2);
dnnOutputHost[output1Index].resize(numElements1);
dnnOutputHost[output2Index].resize(numElements2);

// Fill dnnInputHost with application data.
```

Copy the DNN input from host buffers to the device, then perform DNN inference and copy the results back. All operations are performed asynchronously with respect to the host code.

```{.cpp}
// Enqueue asynchronous copy of network input data from host to device memory.
cudaMemcpyAsync(dnnInputs[inputIndex], dnnInputHost.data(), sizeof(float32_t) * numInputElements, cudaMemcpyHostToDevice);

// Begin DNN inference in the currently selected CUDA stream.
dwDNN_infer(dnnInputs.data(), dnnOutputs.data(), dnn);

// Enqueue asynchronous copy of the inference results to host memory.
cudaMemcpyAsync(dnnOutputHost[output1Index].data(), dnnOutputs[output1Index], sizeof(float32_t) * numElements1, cudaMemcpyDeviceToHost);
cudaMemcpyAsync(dnnOutputHost[output2Index].data(), dnnOutputs[output2Index], sizeof(float32_t) * numElements2, cudaMemcpyDeviceToHost);

// Do something while inference results are being calculated.
otherUsefulWork();

// Wait until all pending operations on the CUDA device have finished.
cudaDeviceSynchronize();

// Inference and memory copies are done. Read results from dnnOutputHost[output1Index] and dnnOutputHost[output2Index].
```

Finally, free previously allocated memory.

```{.cpp}
// Free resources.
cudaFree(dnnInputs[inputIndex]);
cudaFree(dnnOutputs[output1Index]);
cudaFree(dnnOutputs[output2Index]);
dwDNN_release(&dnn);
```

For more information see:
- @ref dwx_object_detector_tracker_sample