TAO v5.5.0
NVIDIA TAO v5.5.0

Checkpointing and Resuming Training

At every train.checkpoint_interval, a PyTorch Lightning checkpoint is saved. It is called model_epoch_<epoch_num>.pth. These are saved in train.results_dir, like so:

Copy
Copied!
            

$ ls /results/train 'model_epoch_000.pth' 'model_epoch_001.pth' 'model_epoch_002.pth' 'model_epoch_003.pth' 'model_epoch_004.pth'

© Copyright 2024, NVIDIA. Last updated on Aug 30, 2024.