Training the Model
==================

.. _training_the_model:

The NVIDIA Transfer Learning Toolkit (TLT) provides a simple command-line interface, the
:code:`tlt-train` command, to train deep learning models for classification, object detection,
and instance segmentation. To speed up training, :code:`tlt-train` supports multi-GPU training.
Invoke a multi-GPU training session with the :code:`--gpus N` option, where :code:`N` is the
number of GPUs you want to use. :code:`N` must not exceed the number of GPUs available in the
given node.

.. Note:: Currently, only single-node multi-GPU training is supported.

The other optimizations included with :code:`tlt-train` are:

* Quantization Aware Training (QAT)
* Automatic Mixed Precision (AMP)

Quantization Aware Training
---------------------------

TLT now supports Quantization Aware Training (QAT) for its object detection networks, namely
DetectNet_v2, SSD, DSSD, YOLOv3, RetinaNet, and FasterRCNN. QAT emulates inference-time
quantization while training a model, which downstream inference platforms may then use to
generate actual quantized models. The error from quantizing weights and tensors to INT8 is
modeled during training, allowing the model to adapt and mitigate the error. During QAT, the
model constructed in the training graph is modified to:

1. Replace existing nodes with nodes that support fake quantization of their weights.
2. Convert existing activations to ReLU-6 (except the output nodes).
3. Add Quantize and De-Quantize (QDQ) nodes to compute the dynamic ranges of the intermediate
   tensors.
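
To make the idea of fake quantization concrete, the following is a minimal pure-Python sketch
of an INT8 quantize/de-quantize round trip. It is not TLT's implementation: the helper name
:code:`fake_quantize` and the choice of a symmetric, per-tensor scale derived from the largest
absolute value are assumptions made for illustration.

.. code:: python

    def fake_quantize(values, num_bits=8):
        """Quantize floats to signed integers and immediately de-quantize them.

        Hypothetical helper for illustration only; this is not TLT code.
        Assumes symmetric, per-tensor quantization.
        """
        qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
        dynamic_range = max(abs(v) for v in values)
        if dynamic_range == 0.0:
            return list(values), 0.0
        scale = dynamic_range / qmax
        dequantized = []
        for v in values:
            q = round(v / scale)           # snap to the integer grid
            q = max(-qmax, min(qmax, q))   # clamp to the representable range
            dequantized.append(q * scale)  # return to floating point
        return dequantized, dynamic_range


    weights = [0.02, -0.75, 0.31, 1.27]
    approx, dynamic_range = fake_quantize(weights)
    errors = [abs(a - w) for a, w in zip(approx, weights)]
    print("dynamic range:", dynamic_range)
    print("max rounding error:", max(errors))

Because the forward pass sees the de-quantized values, the rounding error becomes part of the
training loss, and the recorded dynamic range is the kind of quantity a QDQ node tracks.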

The dynamic ranges computed during training are serialized to a cache file using
:code:`tlt-export`, which TensorRT may then parse to create an optimized inference engine.
To enable QAT during training, set the :code:`enable_qat` parameter to :code:`true` in the
:code:`training_config` field of the corresponding spec file of each of the supported apps.
The benefit of QAT is usually better accuracy for INT8 inference with TensorRT compared with
traditional calibration-based INT8 TensorRT inference.

.. Note:: The number of scales present in the cache file is smaller than the number generated by
          the Post Training Quantization (PTQ) technique using TensorRT. This is because the QDQ
          nodes are added only after operations that are fused by TensorRT (on the GPU), e.g.,
          operation sequences such as Conv2d -> Bias -> Relu or
          Conv2d -> Bias -> BatchNormalization -> Activation, whereas during PTQ, the scales are
          applied to all the intermediate tensors in the model. Also, the final output regression
          nodes are not quantized in the current training graphs, so these layers currently run
          in FP32.

.. Note:: When deploying a model on platforms that have a DLA, note that using quantization
          cache files generated by peeling the scales from the model is currently not
          supported, since the DLA requires a scale factor for all layers. In order to use a
          QAT-trained model with the DLA, we recommend using post training quantization at export
          (see :ref:`Exporting the Model <exporting_the_model>`). The post training quantization
          method takes the current QAT-trained model and generates scale factors for all
          intermediate tensors in the model, since the DLA doesn't fuse operations as the GPU
          does.

Automatic Mixed Precision
-------------------------

TLT now supports Automatic Mixed Precision (AMP) training. DNN training has traditionally relied
on the IEEE single-precision format for its tensors. With mixed-precision training, however,
one may use a mixture of FP16 and FP32 operations in the training graph to help speed up
training while not compromising accuracy. There are several benefits to using AMP:

* Speed up math-intensive operations, such as linear and convolution layers.
* Speed up memory-limited operations by accessing half the bytes compared to single precision.
* Reduce memory requirements for training models, enabling larger models or larger minibatches.
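
The storage saving behind the last two points can be checked with Python's standard
:code:`struct` module, which supports the IEEE half-precision format. This is only an
illustrative sketch; the helper names :code:`storage_bytes` and :code:`to_fp16` are
hypothetical and are not part of TLT.

.. code:: python

    import struct


    def storage_bytes(num_values, fmt):
        """Bytes needed to store num_values numbers in the given struct format."""
        return num_values * struct.calcsize(fmt)


    def to_fp16(value):
        """Round a Python float to the nearest IEEE half-precision value."""
        return struct.unpack("<e", struct.pack("<e", value))[0]


    num_activations = 1_000_000
    print("FP32 bytes:", storage_bytes(num_activations, "<f"))
    print("FP16 bytes:", storage_bytes(num_activations, "<e"))
    print("0.1 stored as FP16:", to_fp16(0.1))

The FP16 value of 0.1 differs slightly from the original, which is why mixed precision keeps
numerically sensitive operations in FP32 rather than converting the whole graph to FP16.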

In TLT, enabling AMP is as simple as setting the environment variable
:code:`TF_ENABLE_AUTO_MIXED_PRECISION=1` when running :code:`tlt-train`. This helps speed up
training by using FP16 Tensor Cores. Note that AMP is only supported on GPUs with the Volta
architecture or newer.
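
If training is launched from a script rather than typed at a shell prompt, the variable can be
set for the child process only. The sketch below assumes :code:`tlt-train` is on the
:code:`PATH`; the helper name :code:`amp_environment`, the spec path, the results directory,
and the key are placeholders introduced for illustration.

.. code:: python

    import os
    import shutil
    import subprocess


    def amp_environment(base_env=None):
        """Return a copy of the environment with AMP switched on for tlt-train."""
        env = dict(os.environ if base_env is None else base_env)
        env["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"
        return env


    # Placeholder spec path, results directory, and key; replace with your own.
    command = [
        "tlt-train", "detectnet_v2",
        "-e", "/path/to/spec.txt",
        "-r", "/path/to/result",
        "-k", "YOUR_KEY",
    ]

    if shutil.which("tlt-train") is None:
        print("tlt-train not found on PATH; would run:", " ".join(command))
    else:
        subprocess.run(command, env=amp_environment(), check=True)

Copying the environment instead of modifying :code:`os.environ` keeps AMP scoped to this one
training run.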

Training a Classification Model
-------------------------------

Use the :code:`tlt-train` command to tune a pre-trained model:

.. code::

    tlt-train [-h] classification -e <spec file>
                                  -r <result directory>
                                  -k <encoding key>
                                  [--gpus <num GPUs>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-r, --results_dir`: Path to a folder where the experiment outputs should be written.
* :code:`-k, --key`: User-specific encoding key to save or load a :code:`.tlt` model.
* :code:`-e, --experiment_spec_file`: Path to the experiment spec file.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`--gpus`: Number of GPUs to use and processes to launch for training. The default
  value is 1.

.. Note:: See the :ref:`Specification File for Classification <specification_file_for_classification>`
          section for more details.

Here's an example of using the :code:`tlt-train` command:

.. code::

    tlt-train classification -e /workspace/tlt_drive/spec/spec.cfg -r /workspace/output -k $YOUR_KEY

Training a DetectNet_v2 Model
-----------------------------

.. _training_a_detectnet_v2_model:

After following the steps :ref:`here <preparing_the_input_data_structure>` to create TFRecords
ingestible by TLT training and setting up a :ref:`spec file
<creating_an_experiment_spec_file>`, you are ready to start training an object detection
network.

DetectNet_v2 training command:

.. code::

    tlt-train [-h] detectnet_v2 
               -k <key>
               -r <result directory>
               -e <spec_file>
               [--gpus <num GPUs>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-r, --results_dir`: Path to a folder where experiment outputs should be written.
* :code:`-k, --key`: User-specific encoding key to save or load a :code:`.tlt` model.
* :code:`-e, --experiment_spec_file`: Path to the spec file, either absolute or relative to the
  working directory. By default, the spec from :code:`spec_loader.py` is used.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`--gpus`: Number of GPUs to use and processes to launch for training. The default
  value is 1.
* :code:`-h, --help`: Print the help message.

Sample Usage
^^^^^^^^^^^^

Here is an example command for training with 2 GPUs:

.. code::

    tlt-train detectnet_v2 -e <path_to_spec_file> \
                           -r <path_to_experiment_output> \
                           -k <key_to_load_the_model> \
                           -n <name_string_for_the_model> \
                           --gpus 2

.. Note:: The :code:`tlt-train` tool does not support training on images of multiple resolutions,
          or resizing images during training. All of the images must be resized offline to the
          final training size and the corresponding bounding boxes must be scaled accordingly.

.. Note:: DetectNet_v2 now supports resuming training from intermediate checkpoints. If a
          previously running training experiment is stopped prematurely, you can restart
          training from the last checkpoint by re-running the detectnet_v2 training command with
          the same command-line arguments as before. The detectnet_v2 trainer finds the last
          saved checkpoint in the results directory and resumes training from there. The
          interval at which checkpoints are saved is defined by the
          :code:`checkpoint_interval` parameter under :code:`training_config` for detectnet_v2.
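
The resume behavior described in the note above amounts to picking the checkpoint with the
highest step number in a directory. The sketch below illustrates that selection only. It is
not the DetectNet_v2 trainer's code, and both the :code:`model.step-<N>.tlt` file-name pattern
and the :code:`latest_checkpoint` helper are assumptions made for illustration.

.. code:: python

    import re
    import tempfile
    from pathlib import Path

    # Assumed (hypothetical) checkpoint naming scheme: model.step-<N>.tlt
    CHECKPOINT_PATTERN = re.compile(r"^model\.step-(\d+)\.tlt$")


    def latest_checkpoint(directory):
        """Return the checkpoint path with the highest step, or None if absent."""
        best_step, best_path = -1, None
        for path in Path(directory).iterdir():
            match = CHECKPOINT_PATTERN.match(path.name)
            # Compare steps as integers so that step 1000 ranks above step 900.
            if match and int(match.group(1)) > best_step:
                best_step, best_path = int(match.group(1)), path
        return best_path


    with tempfile.TemporaryDirectory() as tmp:
        for step in (900, 1000, 80):
            (Path(tmp) / f"model.step-{step}.tlt").touch()
        print(latest_checkpoint(tmp).name)

A shorter :code:`checkpoint_interval` produces more candidates in the directory, so less work
is lost when a run is interrupted.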

Training a FasterRCNN Model
---------------------------

Use this command to train a FasterRCNN model:

.. code::

    tlt-train [-h] faster_rcnn -e <experiment_spec>
                               [-k <enc_key>]
                               [--gpus <num_gpus>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-e, --experiment_spec_file`: Experiment specification file to set up the training
  experiment.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-h, --help`: Show this help message and exit.
* :code:`-k, --enc_key`: TLT encoding key, can override the one in the spec file.
* :code:`--gpus`: The number of GPUs to be used in the training in a multi-GPU
  scenario (default: 1).

Sample Usage
^^^^^^^^^^^^

Here's an example of using the FasterRCNN training command:

.. code::

   tlt-train faster_rcnn -e <experiment_spec>

Using a Pretrained Weights File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using a pretrained weights file for the initial training of FasterRCNN usually helps achieve
better accuracy. NVIDIA recommends using the pretrained weights provided in NVIDIA GPU Cloud
(NGC). FasterRCNN loads the pretrained weights by name. That is, layer by layer, if TLT finds a
layer in the pretrained weights file whose name and weights (bias) shape match a layer in the
TLT model, it loads that layer's weights (and bias, if any) into the model. If a layer in the
TLT model has no matching layer in the pretrained weights, TLT skips that layer and uses
random initialization for it instead. Likewise, if TLT finds a layer with a matching name in
the pretrained weights but the shape of its weights (or bias, if any) does not match the shape
of the weights (bias) of the corresponding layer in the TLT model, TLT also skips that layer.

Layers that have no weights (bias) have nothing to load and are also skipped. In total, there
are three possible statuses that indicate how the loading of a layer's pretrained weights went:

* "Yes" means the layer has weights (bias) and they were loaded successfully from the
  pretrained weights file for initialization.
* "No" means the layer has weights (bias) but, due to a mismatched weights (bias) shape (or
  possibly something else), they could not be loaded, so the layer uses random initialization
  instead.
* "None" means the layer has no weights (bias) at all and does not load any weights.

In the FasterRCNN training log, there is a table that shows the pretrained weights loading
status for each layer in the model.
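
The by-name matching and the three statuses can be sketched in a few lines of Python. This is
an illustration under stated assumptions, not FasterRCNN's loader: the helper
:code:`load_status`, the layer names, and the representation of a layer as a name mapped to a
weights shape (or to :code:`None` for a layer without weights) are all hypothetical.

.. code:: python

    def load_status(model_layers, pretrained_layers):
        """Map each model layer name to "Yes", "No", or "None".

        Both arguments map a layer name to the shape of its weights, or to
        None when the layer has no weights (hypothetical data layout).
        """
        status = {}
        for name, shape in model_layers.items():
            if shape is None:
                status[name] = "None"  # nothing to load for this layer
            elif pretrained_layers.get(name) == shape:
                status[name] = "Yes"   # same name and same shape: load it
            else:
                status[name] = "No"    # missing or mismatched: random init
        return status


    model = {
        "conv1": (64, 3, 7, 7),
        "relu1": None,
        "dense_head": (21, 512),
        "new_block": (128, 64, 3, 3),
    }
    pretrained = {
        "conv1": (64, 3, 7, 7),
        "dense_head": (1000, 512),
    }
    for name, state in load_status(model, pretrained).items():
        print(f"{name:12s} {state}")

Here :code:`dense_head` is skipped because its shape differs, which is the typical situation
when the number of classes changes between the pretrained model and the new dataset.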

Training an SSD Model
---------------------

Train the SSD model using this command:

.. code::

    tlt-train [-h] ssd -e <experiment_spec>
                       -r <output_dir>
                       -k <key>
                       [-m <pretrained_model>]
                       [--gpus <num_gpus>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-r, --results_dir`: Path to the folder where the experiment output is written.
* :code:`-k, --key`: Provide the encryption key to decrypt the model.
* :code:`-e, --experiment_spec_file`: Experiment specification file to set up the training
  experiment.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`--gpus num_gpus`: Number of GPUs to use and processes to launch for training.
  The default value is 1.
* :code:`-m, --resume_model_weights`: Path to a pre-trained model or model to continue training.
* :code:`--initial_epoch`: Epoch number to resume from.
* :code:`-h, --help`: Show this help message and exit.

Sample Usage
^^^^^^^^^^^^

Here's an example of using the train command on an SSD model:

.. code::

   tlt-train ssd --gpus 2 -e /path/to/spec.txt -r /path/to/result -k $KEY

Training a DSSD Model
---------------------

Train the DSSD model using this command:

.. code::

    tlt-train [-h] dssd -e <experiment_spec>
                        -r <output_dir>
                        -k <key>
                        [-m <pretrained_model>]
                        [--gpus <num_gpus>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-r, --results_dir`: Path to the folder where the experiment output is written.
* :code:`-k, --key`: Provide the encryption key to decrypt the model.
* :code:`-e, --experiment_spec_file`: Experiment specification file to set up the training
  experiment.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`--gpus num_gpus`: Number of GPUs to use and processes to launch for training.
  The default value is 1.
* :code:`-m, --resume_model_weights`: Path to a pre-trained model or model to continue training.
* :code:`--initial_epoch`: Epoch number to resume from.
* :code:`-h, --help`: Show this help message and exit.

Sample Usage
^^^^^^^^^^^^

Here's an example of using the train command on a DSSD model:

.. code::

    tlt-train dssd --gpus 2 -e /path/to/spec.txt -r /path/to/result -k $KEY

Training a YOLOv3 Model
-----------------------

Train the YOLOv3 model using this command:

.. code::

    tlt-train [-h] yolo -e <experiment_spec>
                        -r <output_dir>
                        -k <key>
                        [-m <pretrained_model>]
                        [--gpus <num_gpus>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-r, --results_dir`: Path to the folder where the experiment output is written.
* :code:`-k, --key`: Provide the encryption key to decrypt the model.
* :code:`-e, --experiment_spec_file`: Experiment specification file to set up the training
  experiment.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`--gpus num_gpus`: Number of GPUs to use and processes to launch for training.
  The default value is 1.
* :code:`-m, --resume_model_weights`: Path to a pre-trained model or model to continue training.
* :code:`--initial_epoch`: Epoch number to resume from.
* :code:`-h, --help`: Show this help message and exit.

Sample Usage
^^^^^^^^^^^^

Here's an example of using the train command on a YOLOv3 model:

.. code::

   tlt-train yolo --gpus 2 -e /path/to/spec.txt -r /path/to/result -k $KEY

Training a RetinaNet Model
--------------------------

Train the RetinaNet model using this command:

.. code::

    tlt-train [-h] retinanet -e <experiment_spec>
                             -r <output_dir>
                             -k <key>
                             [-m <pretrained_model>]
                             [--gpus <num_gpus>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-r, --results_dir`: Path to the folder where the experiment output is written.
* :code:`-k, --key`: Provide the encryption key to decrypt the model.
* :code:`-e, --experiment_spec_file`: Experiment specification file to set up the training
  experiment.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`--gpus num_gpus`: Number of GPUs to use and processes to launch for training.
  The default value is 1.
* :code:`-m, --resume_model_weights`: Path to a pre-trained model or model to continue training.
* :code:`--initial_epoch`: Epoch number to resume from.
* :code:`-h, --help`: Show this help message and exit.

Sample Usage
^^^^^^^^^^^^

Here's an example of using the train command on a RetinaNet model:

.. code::

   tlt-train retinanet --gpus 2 -e /path/to/spec.txt -r /path/to/result -k $KEY

Training a MaskRCNN Model
-------------------------

Train the MaskRCNN model using this command:

.. code::

    tlt-train [-h] mask_rcnn -e <experiment_spec>
                             -d <output_dir>
                             -k <key>
                             [--gpus <num_gpus>]

Required Arguments
^^^^^^^^^^^^^^^^^^

* :code:`-d, --model_dir`: Path to the folder where the experiment output is written.
* :code:`-k, --key`: Provide the encryption key to decrypt the model.
* :code:`-e, --experiment_spec_file`: Experiment specification file to set up the training
  experiment.

Optional Arguments
^^^^^^^^^^^^^^^^^^

* :code:`--gpus num_gpus`: Number of GPUs to use and processes to launch for training. The
  default value is 1.
* :code:`-h, --help`: Show this help message and exit.

Sample Usage
^^^^^^^^^^^^

Here's an example of using the train command on a MaskRCNN model:

.. code::

    tlt-train mask_rcnn --gpus 2 -e /path/to/spec.txt -d /path/to/result -k $KEY