ResNet-N with TensorFlow and DALI
=================================

This demo implements the residual network model from `the original paper`_
and uses DALI for the data augmentation pipeline.

It implements the ResNet50 v1.5 CNN model and demonstrates efficient
single-node training on multi-GPU systems. The scripts can be used for
benchmarking, or as a starting point for implementing and training your
own network.

Common utilities for defining CNN networks and performing basic training
are located in the nvutils directory inside
:fileref:`docs/examples/use_cases/tensorflow/resnet-n`. The utilities are
written in TensorFlow 2.0. Use of nvutils is demonstrated in the model
script (i.e. resnet.py). The scripts support both Keras Fit/Compile and
Custom Training Loop (CTL) modes with Horovod.

To use the DALI pipeline for data loading and preprocessing, pass
``--dali_mode=GPU`` or ``--dali_mode=CPU``.
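For orientation, here is a minimal sketch of what a GPU-side DALI pipeline
wrapped as a TensorFlow dataset looks like. It is an illustration, not the
code used by the scripts: it reads a hypothetical directory of JPEG files
with ``fn.readers.file`` (the actual scripts consume ImageNet TFRecords
through nvutils), and the operator names assume a recent DALI release::

    import tensorflow as tf
    import nvidia.dali.fn as fn
    import nvidia.dali.types as types
    import nvidia.dali.plugin.tf as dali_tf
    from nvidia.dali import pipeline_def

    BATCH = 64

    @pipeline_def(batch_size=BATCH, num_threads=4, device_id=0)
    def train_pipe(data_dir):
        # Read files and decode JPEGs; device="mixed" decodes on the GPU.
        jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
        images = fn.decoders.image(jpegs, device="mixed")
        # ResNet-style augmentation: random crop/resize, horizontal flip,
        # and per-channel normalization (illustrative ImageNet statistics).
        images = fn.random_resized_crop(images, size=[224, 224])
        images = fn.crop_mirror_normalize(
            images,
            dtype=types.FLOAT,
            output_layout="HWC",
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            mirror=fn.random.coin_flip())
        return images, labels.gpu()

    # Expose the pipeline to TensorFlow as a tf.data-compatible dataset.
    with tf.device('/gpu:0'):
        dataset = dali_tf.DALIDataset(
            pipeline=train_pipe(data_dir="/path/to/jpegs"),  # hypothetical path
            batch_size=BATCH,
            output_dtypes=(tf.float32, tf.int32))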
Training in Keras Fit/Compile mode
----------------------------------

For the full training on 8 GPUs::

    mpiexec --allow-run-as-root --bind-to socket -np 8 \
        python resnet.py --num_iter=90 --iter_unit=epoch \
        --data_dir=/data/imagenet/train-val-tfrecord-480/ \
        --precision=fp16 --display_every=100 \
        --export_dir=/tmp --dali_mode="GPU"

For the benchmark training on 8 GPUs::

    mpiexec --allow-run-as-root --bind-to socket -np 8 \
        python resnet.py --num_iter=400 --iter_unit=batch \
        --data_dir=/data/imagenet/train-val-tfrecord-480/ \
        --precision=fp16 --display_every=100 --dali_mode="GPU"

Predicting in Keras Fit/Compile mode
------------------------------------

For predicting with a previously saved model in `/tmp`::

    python resnet.py --predict --export_dir=/tmp --dali_mode="GPU"

Training in CTL (Custom Training Loop) mode
-------------------------------------------

For the full training on 8 GPUs::

    mpiexec --allow-run-as-root --bind-to socket -np 8 \
        python resnet_ctl.py --num_iter=90 --iter_unit=epoch \
        --data_dir=/data/imagenet/train-val-tfrecord-480/ \
        --precision=fp16 --display_every=100 \
        --export_dir=/tmp --dali_mode="GPU"

For the benchmark training on 8 GPUs::

    mpiexec --allow-run-as-root --bind-to socket -np 8 \
        python resnet_ctl.py --num_iter=400 --iter_unit=batch \
        --data_dir=/data/imagenet/train-val-tfrecord-480/ \
        --precision=fp16 --display_every=100 --dali_mode="GPU"

Predicting in CTL (Custom Training Loop) mode
---------------------------------------------

For predicting with a previously saved model in `/tmp`::

    python resnet_ctl.py --predict --export_dir=/tmp --dali_mode="GPU"

Other useful options
--------------------

To use TensorBoard (note: `/tmp/some_dir` needs to be created by the user)::

    --tensorboard_dir=/tmp/some_dir

To export the saved model at the end of training (note: `/tmp/some_dir` needs to be created by the user)::

    --export_dir=/tmp/some_dir

To store checkpoints at the end of every epoch (note: `/tmp/some_dir` needs to be created by the user)::

    --log_dir=/tmp/some_dir

To enable XLA::

    --use_xla

Requirements
~~~~~~~~~~~~

TensorFlow
^^^^^^^^^^

::

    pip install tensorflow-gpu==2.4.1

OpenMPI
^^^^^^^

::

    wget -q -O - https://www.open-mpi.org/software/ompi/v3.0/downloads/openmpi-3.0.0.tar.gz | tar -xz
    cd openmpi-3.0.0
    ./configure --enable-orterun-prefix-by-default --with-cuda --prefix=/usr/local/mpi --disable-getpwuid
    make -j"$(nproc)" install
    cd .. && rm -rf openmpi-3.0.0
    echo "/usr/local/mpi/lib" >> /etc/ld.so.conf.d/openmpi.conf && ldconfig
    export PATH=/usr/local/mpi/bin:$PATH

The following works around a segfault in OpenMPI 3.0 when it is run within a
single node without ssh being installed. It registers a stub rsh agent that
prints a warning and exits instead of crashing::

    cat > /usr/local/mpi/bin/rsh_warn.sh <<'EOF'
    #!/bin/bash
    echo "ssh/rsh is not installed in this environment." >&2
    echo "Install an ssh client to launch multi-node jobs." >&2
    exit 1
    EOF
    chmod +x /usr/local/mpi/bin/rsh_warn.sh
    echo "plm_rsh_agent = /usr/local/mpi/bin/rsh_warn.sh" >> /usr/local/mpi/etc/openmpi-mca-params.conf

Horovod
^^^^^^^

::

    export HOROVOD_GPU_ALLREDUCE=NCCL
    export HOROVOD_NCCL_INCLUDE=/usr/include
    export HOROVOD_NCCL_LIB=/usr/lib/x86_64-linux-gnu
    export HOROVOD_NCCL_LINK=SHARED
    export HOROVOD_WITHOUT_PYTORCH=1
    pip install horovod==0.21.0

.. _the original paper: https://arxiv.org/pdf/1512.03385.pdf
.. _NGC TensorFlow Container: https://www.nvidia.com/en-us/gpu-cloud/deep-learning-containers/
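To confirm that the resulting Horovod build picked up NCCL and TensorFlow
support, Horovod's bundled self-check can be used (the ``--check-build``
flag is available in Horovod 0.18 and later)::

    horovodrun --check-build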