.. _efficientnet_autoaugment: EfficientNet for PyTorch with DALI and AutoAugment ================================================== This example shows how DALI's implementation of automatic augmentations - most notably `AutoAugment `_ and `TrivialAugment `_ - can be used in training. It shows the training of EfficientNet, an image classification model first described in `EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks `_. The code is based on `NVIDIA Deep Learning Examples `_ - it has been extended with DALI pipeline supporting automatic augmentations, which can be found in :fileref:`here `. Differences to the Deep Learning Examples configuration ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ * The default values of the parameters were adjusted to values used in EfficientNet training. * ``--data-backend`` parameter was changed to accept ``dali``, ``pytorch``, or ``synthetic``. It is set to ``dali`` by default. * ``--dali-device`` was added to control placement of some of DALI operators. * ``--augmentation`` was replaced with ``--automatic-augmentation``, now supporting ``disabled``, ``autoaugment``, and ``trivialaugment`` values. * ``--workers`` defaults were halved to accommodate DALI. The value is automatically doubled when ``pytorch`` data loader is used. Thanks to this the default value performs well with both loaders. * The model is restricted to EfficientNet-B0 architecture. Data backends ^^^^^^^^^^^^^ This model uses the following data augmentation: * For training: * Random resized crop to target images size (in this case 224) * Scale from 8% to 100% * Aspect ratio from 3/4 to 4/3 * Random horizontal flip * [Optional: AutoAugment or TrivialAugment] * Normalization * For inference: * Scale to target image size + additional size margin (in this case it is 224 + 32 = 266) * Center crop to target image size (in this case 224) * Normalization Setup ^^^^^ The EfficientNet script operates on ImageNet 1k, a widely popular image classification dataset from the ILSVRC challenge. 1. Download the dataset from http://image-net.org/download-images 2. Extract the training data: .. code-block:: bash mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done cd .. 3. Extract the validation data and move the images to subfolders: .. code-block:: bash mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash The directory in which the ``train/`` and ``val/`` directories are placed, is referred to as ``$PATH_TO_IMAGENET`` in this document. 4. Make sure you are either using the `NVIDIA PyTorch NGC container `_ or you have `DALI `_ and `PyTorch `_ installed. 5. Install `NVIDIA DLLogger `_ and `pynvml `_. Running the model ^^^^^^^^^^^^^^^^^ Training -------- To run training on a single GPU, use the ``main.py`` entry point: * For FP32: ``python ./main.py --batch-size 64 $PATH_TO_IMAGENET`` * For AMP: ``python ./main.py --batch-size 64 --amp --static-loss-scale 128 $PATH_TO_IMAGENET`` You may need to adjust ``--batch-size`` parameter for your machine. You can change the data loader and automatic augmentation scheme that are used by adding: * ``--data-backend``: ``dali`` | ``pytorch`` | ``synthetic``, * ``--automatic-augmentation``: ``disabled`` | ``autoaugment`` | ``trivialaugment`` (the last one only for DALI), * ``--dali-device``: ``cpu`` | ``gpu`` (only for DALI). By default DALI GPU-variant with AutoAugment is used. For example to run the EfficientNet with AMP on a batch size of 128 with DALI using TrivialAugment you need to invoke: .. code-block:: bash python ./main.py --amp --static-loss-scale 128 --batch-size 128 --data-backend dali --automatic-augmentation trivialaugment $PATH_TO_IMAGENET To run on multiple GPUs, use the ``multiproc.py`` to launch the ``main.py`` entry point script, passing the number of GPUs as ``--nproc_per_node`` argument. For example, to run the model on 8 GPUs using AMP and DALI with AutoAugment you need to invoke: .. code-block:: bash python ./multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --data-backend dali --automatic-augmentation autoaugment $PATH_TO_IMAGENET To see the full list of available options and their descriptions, use the ``-h`` or ``--help`` command-line option, for example: .. code-block:: bash python main.py -h Training with standard configuration ------------------------------------ To run the training in a standard configuration (DGX A100/DGX-1V, AMP, 400 Epochs, DALI with AutoAugment) invoke the following command: * for DGX1V-16G: ``python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 $PATH_TO_IMAGENET`` * for DGX-A100: ``python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 256 $PATH_TO_IMAGENET``` Benchmarking ------------ To run training benchmarks with different data loaders and automatic augmentations, you can use following commands, assuming that they are running on DGX1V-16G with 8 GPUs, 128 batch size and AMP: .. code-block:: bash # Adjust the following variable to control where to store the results of the benchmark runs export RESULT_WORKSPACE=./ # synthetic benchmark python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --epochs 1 --prof 1000 --no-checkpoints --training-only --data-backend synthetic --workspace $RESULT_WORKSPACE --report-file bench_report_synthetic.json $PATH_TO_IMAGENET # DALI without automatic augmentations python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --epochs 4 --no-checkpoints --training-only --data-backend dali --automatic-augmentation disabled --workspace $RESULT_WORKSPACE --report-file bench_report_dali.json $PATH_TO_IMAGENET # DALI with AutoAugment python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --epochs 4 --no-checkpoints --training-only --data-backend dali --automatic-augmentation autoaugment --workspace $RESULT_WORKSPACE --report-file bench_report_dali_aa.json $PATH_TO_IMAGENET # DALI with TrivialAugment python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --epochs 4 --no-checkpoints --training-only --data-backend dali --automatic-augmentation trivialaugment --workspace $RESULT_WORKSPACE --report-file bench_report_dali_ta.json $PATH_TO_IMAGENET # PyTorch without automatic augmentations python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --epochs 4 --no-checkpoints --training-only --data-backend pytorch --automatic-augmentation disabled --workspace $RESULT_WORKSPACE --report-file bench_report_pytorch.json $PATH_TO_IMAGENET # PyTorch with AutoAugment: python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 --epochs 4 --no-checkpoints --training-only --data-backend pytorch --automatic-augmentation autoaugment --workspace $RESULT_WORKSPACE --report-file bench_report_pytorch_aa.json $PATH_TO_IMAGENET Inference --------- Validation is done every epoch, and can be also run separately on a checkpointed model. .. code-block:: bash python ./main.py --evaluate --epochs 1 --resume -b $PATH_TO_IMAGENET To run inference on JPEG image, you have to first extract the model weights from checkpoint: .. code-block:: bash python checkpoint2model.py --checkpoint-path --weight-path Then, run the classification script: .. code-block:: bash python classify.py --pretrained-from-file --precision AMP|FP32 --image