The following instructions are a shortcut for getting started with benchmarking. The working directory of each benchmark contains a README file (named either README.md or README.txt) that gives more detail on downloading data, pre-processing, and running the code.
For demonstration purposes, we will run Deep Learning inferencing. Please refer to the NVIDIA Multi-node Training Deployment document for additional information regarding running Deep Learning training workflows.
TensorRT RN50 Inference
The container used in this example
Binary needed is included with the container at
The Resnet50 model prototxt and caffemodel files are within the container at
The command may take several minutes to run because NVIDIA® TensorRT™ builds the optimized plan before running. To see what it is doing, add --verbose to the command.
Commands to Run the Test
$ sudo docker pull nvcr.io/nvaie/tensorrt:21.07-py3
$ sudo docker run --gpus all -it --rm -v $(pwd):/work nvcr.io/nvaie/tensorrt:21.07-py3
# cd /workspace/tensorrt/data/resnet50 (to exit container, type "exit")
# /workspace/tensorrt/bin/trtexec --batch=128 --iterations=400 --workspace=1024 --percentile=99 --deploy=ResNet50_N2.prototxt --model=ResNet50_fp32.caffemodel --output=prob --int8
Interpreting the Results
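Results are reported as the time taken to infer one batch of the given size. To convert to images per second, compute BATCH_SIZE/AVERAGE_TIME, with AVERAGE_TIME expressed in seconds (trtexec reports timings in milliseconds, so divide by 1000 first). AVERAGE_TIME is the mean GPU Compute value in the inferencing output of the tensorrt:21.07-py3 container.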
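As a minimal sketch of the conversion described above, the following Python helper turns a mean GPU Compute time into images per second. The function name images_per_second and the sample timing of 20.0 ms are hypothetical and chosen only for illustration; the timing is not a measured result. The sketch assumes the mean is given in milliseconds.

```python
def images_per_second(batch_size: int, mean_gpu_compute_ms: float) -> float:
    """Convert a mean per-batch GPU compute time (in ms) to images per second.

    Hypothetical helper: implements BATCH_SIZE / AVERAGE_TIME with
    AVERAGE_TIME converted from milliseconds to seconds.
    """
    if mean_gpu_compute_ms <= 0:
        raise ValueError("mean_gpu_compute_ms must be positive")
    # Multiplying by 1000 first is equivalent to dividing the time by 1000.
    return batch_size * 1000.0 / mean_gpu_compute_ms


if __name__ == "__main__":
    # Batch size matches --batch=128 from the trtexec command.
    # The 20.0 ms mean is an assumed placeholder, not a benchmark result.
    example_mean_ms = 20.0
    throughput = images_per_second(128, example_mean_ms)
    print(f"{throughput:.1f} images/sec")
```

Substitute the mean GPU Compute value from your own run for the placeholder before drawing any conclusions about throughput.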