Caffe2 Release 18.03
The NVIDIA container image of Caffe2, release 18.03, is available.
Contents of Caffe2
This container image contains the complete source of the version of Caffe2 in
/opt/caffe2. It is pre-built and installed into the
/opt/caffe2/[binaries,lib] directories in the container image.
The container also includes the following:
- Ubuntu 16.04
- NVIDIA CUDA 9.0.176 (see Errata section and 2.1) including CUDA® Basic Linear Algebra Subroutines library™ (cuBLAS) 9.0.333 (see section 2.3.1)
- NVIDIA CUDA® Deep Neural Network library™ (cuDNN) 7.1.1
- NCCL 2.1.2 (optimized for NVLink™)
- OpenMPI™ 1.10.3
Release 18.03 is based on CUDA 9, which requires NVIDIA Driver release 384.xx.
Key Features and Enhancements
- Caffe2 container image version 18.03 is based on Caffe2 0.8.1.
- When using the ImageNet training scripts in nvidia-examples on multiple GPUs, the metrics printed in the log for weak scaling were wrong, as was the number of epochs the model was trained for. Both issues are fixed in this release.
- Gradient clipping used to be done by executing a series of small operators that compute a ratio by which the learning rate is scaled, which has the same effect as gradient clipping for SGD optimizers. However, that method is incorrect for optimizers that use momentum or history, such as AdaGrad and Adam. This release adds a new operator, ClipByGlobalNorm, that explicitly clips the gradient. The operator also supports mixed precision for inputs and outputs.
- Caffe2 already supported cuDNN RNN; however, that integration did not provide enough features and flexibility to use cuDNN RNN in seq2seq. We improved this integration and enabled cuDNN RNN in the seq2seq example.
- Incorporated GitHub Caffe2 code as of February 16, 2018.
- Latest version of cuBLAS 9.0.333
- Latest version of cuDNN 7.1.1
- Ubuntu 16.04 with February 2018 updates
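The difference between learning-rate scaling and explicit global-norm clipping noted in the list above can be illustrated with a small sketch. This is plain Python, not the Caffe2 operator or its API: the function names clip_by_global_norm and adagrad_step are hypothetical, and the AdaGrad update is a simplified scalar version chosen only to show why the two approaches diverge once an optimizer keeps gradient history.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale every gradient by one shared factor so that the joint
    L2 norm over all of them is at most max_norm.
    (Hypothetical helper; not the Caffe2 operator signature.)"""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm == 0.0 or global_norm <= max_norm:
        return [list(grad) for grad in grads], global_norm
    scale = max_norm / global_norm
    return [[g * scale for g in grad] for grad in grads], global_norm

def adagrad_step(grad, history, lr, eps=1e-8):
    """Simplified scalar AdaGrad: accumulate the squared gradient,
    then divide the step by the square root of the accumulated history."""
    history = history + grad * grad
    return lr * grad / (math.sqrt(history) + eps), history

# Two parameter blobs whose joint norm is sqrt(9 + 16 + 144) = 13.
grads = [[3.0, 4.0], [12.0]]
clipped, norm = clip_by_global_norm(grads, max_norm=6.5)
ratio = 6.5 / norm  # the ratio the older approach applied to the learning rate

# Old approach: leave the gradient alone and scale the learning rate.
step_scaled, hist_scaled = adagrad_step(13.0, 0.0, lr=0.1 * ratio)

# Explicit clipping: scale the gradient, keep the learning rate.
step_clipped, hist_clipped = adagrad_step(13.0 * ratio, 0.0, lr=0.1)

print(clipped, step_scaled, step_clipped)
```

In this sketch the two approaches produce different update sizes and different accumulated histories for AdaGrad, because the unclipped gradient still enters the history when only the learning rate is scaled. For plain SGD without history the two would coincide.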
Starting with the next major CUDA release, we will no longer provide Python 2 containers and will maintain only Python 3 containers.
In nvidia-examples/seq2seq, there is a bug that causes training to skip one epoch when loading a snapshot. This bug will be fixed in 18.04.