PyTorch Release 18.09

The NVIDIA container image of PyTorch, release 18.09, is available.

Contents of PyTorch

This container image contains the complete source of the version of PyTorch in /opt/pytorch. It is pre-built and installed in the pytorch-py3.6 Conda™ environment in the container image.
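As a quick sanity check from inside the container, the layout described above can be inspected with a few lines of Python. This is only a sketch: the function name `describe_container` is hypothetical, and it assumes the Conda environment is activated in the usual way, so that Conda's `CONDA_DEFAULT_ENV` variable holds the environment name.

```python
import os

# Values taken from the release notes above.
EXPECTED_ENV = "pytorch-py3.6"
SOURCE_DIR = "/opt/pytorch"


def describe_container(environ=os.environ, source_dir=SOURCE_DIR):
    """Report whether the expected Conda env is active and the source tree exists.

    Hypothetical helper for illustration. Assumes CONDA_DEFAULT_ENV is set
    by Conda activation inside the container.
    """
    return {
        "conda_env_active": environ.get("CONDA_DEFAULT_ENV") == EXPECTED_ENV,
        "source_present": os.path.isdir(source_dir),
    }


if __name__ == "__main__":
    print(describe_container())
```

Passing the environment mapping and path as arguments keeps the check easy to exercise outside the container as well.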

The container also includes the following:

Ubuntu 16.04 including Python 3.6 environment
NVIDIA CUDA 10.0.130 including CUDA® Basic Linear Algebra Subroutines library™ (cuBLAS) 10.0.130
NVIDIA CUDA® Deep Neural Network library™ (cuDNN) 7.3.0
NCCL 2.3.4 (optimized for NVLink™)
Caffe2
TensorRT 5.0.0 RC
DALI 0.2 Beta
Tensor Core Optimized Examples:
- ResNet50 v1.5
- GNMT v2

Driver Requirements

Release 18.09 is based on CUDA 10, which requires NVIDIA driver release 410.xx. However, if you are running on a Tesla GPU (Tesla V100, Tesla P4, Tesla P40, or Tesla P100), you may use NVIDIA driver release 384. For more information, see CUDA Compatibility and Upgrades.
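The rule above can be expressed as a small check. This is a minimal sketch, not an NVIDIA tool: the function names are hypothetical, GPU names are matched by prefix, and it assumes that driver branches newer than 410 also satisfy the "410.xx" requirement.

```python
# Tesla products for which the release notes allow driver release 384.
TESLA_EXCEPTIONS = ("Tesla V100", "Tesla P4", "Tesla P40", "Tesla P100")


def driver_major(version: str) -> int:
    """Return the major branch of a driver version string such as '410.48'."""
    return int(version.split(".")[0])


def driver_ok_for_18_09(driver_version: str, gpu_name: str) -> bool:
    """Hypothetical check of the 18.09 driver requirement.

    Assumption: any branch >= 410 is accepted; the notes only name 410.xx.
    Assumption: GPU names are matched by prefix, e.g. 'Tesla V100-SXM2-16GB'.
    """
    major = driver_major(driver_version)
    if major >= 410:
        return True
    # Exception from the notes: release 384 is allowed on the listed Tesla GPUs.
    on_listed_tesla = any(gpu_name.startswith(name) for name in TESLA_EXCEPTIONS)
    return major == 384 and on_listed_tesla


if __name__ == "__main__":
    print(driver_ok_for_18_09("384.111", "Tesla V100-SXM2-16GB"))
    print(driver_ok_for_18_09("384.111", "GeForce GTX 1080"))
```

The version strings and GPU names in the usage lines are illustrative inputs; on a real system they would come from the driver tooling.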

Key Features and Enhancements

This PyTorch release includes the following key features and enhancements.

PyTorch container image version 18.09 is based on PyTorch 0.4.1+. PyTorch 0.4.1 is released and included with this container.
Latest version of cuDNN 7.3.0.
Latest version of CUDA 10.0.130 which includes support for DGX-2, Turing, and Jetson Xavier.
Latest version of cuBLAS 10.0.130.
Latest version of NCCL 2.3.4.
Latest version of TensorRT 5.0.0 RC.

Note:

All 18.09 containers inherit TensorRT 5.0.0 RC from the base container; however, some containers may not use TensorRT if the given framework has no support for it.
An implementation of ResNet50. The ResNet50 v1.5 model is a modified version of the original ResNet50 v1 model.
Stream pool: PyTorch now uses per-GPU stream pools behind the scenes. This means that CUDA streams are created when first used on a GPU and destroyed on exit. As a result, networks that use multiple streams may see the same stream used repeatedly in their profiles, and networks that retain streams for long periods may accidentally schedule parallelizable work to the same stream. It is recommended that streams be acquired, used, and released as needed.
Reliability: Some cases where a dataloader could hang if shut down during its iteration have been fixed.
Fusion: Tensor and constant scalar operations, like add(t, 1), and chunk operations are now fusable.
Performance improvements: dropout, 1x1 convolutions for NCHW, and weightnorm should be faster in a majority of scenarios.
Latest version of DALI 0.2 Beta.
Ubuntu 16.04 with August 2018 updates.

Tensor Core Examples

An implementation of ResNet50. The ResNet50 v1.5 model is a modified version of the original ResNet50 v1 model.
An implementation of GNMT v2. The GNMT v2 model is similar to the one discussed in the paper Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.

Known Issues

The DALI-integrated ResNet-50 samples in the 18.09 NGC TensorFlow and PyTorch containers may show lower than expected performance. We are working to address the issue in the next release.
There is a chance that PyTorch will hang on exit when running multi-GPU training. This hang does not affect any results of the run; however, the process will have to be terminated manually.