NVIDIA DALI Documentation

Warning

You are currently viewing an unstable developer preview of the documentation. To see the documentation for the latest stable release, refer to:

The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provides a collection of highly optimized building blocks for loading and processing image, video, and audio data, and it can be used as a portable, drop-in replacement for the built-in data loaders and data iterators in popular deep learning frameworks.

Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding, cropping, resizing, and many other augmentations. These data processing pipelines, which are currently executed on the CPU, have become a bottleneck, limiting the performance and scalability of training and inference.

DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline. Features such as prefetching, parallel execution, and batch processing are handled transparently for the user.
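The benefit of prefetching can be sketched in plain Python (a conceptual illustration only, not DALI code): a background thread prepares upcoming batches into a bounded buffer while the consumer works on the current one, so data preparation overlaps with training instead of serializing with it.

```python
import threading
import queue

def prefetching_loader(make_batch, num_batches, depth=2):
    """Yield batches while a background thread prepares the next ones."""
    buf = queue.Queue(maxsize=depth)  # bounded buffer = prefetch depth

    def producer():
        for i in range(num_batches):
            buf.put(make_batch(i))  # blocks when the buffer is full
        buf.put(None)               # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()
    while (batch := buf.get()) is not None:
        yield batch

# Toy "preprocessing" stand-in: each batch is built on the producer thread.
batches = list(prefetching_loader(lambda i: [i] * 4, num_batches=3))
print(batches)  # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```

DALI handles this kind of overlap (and more, such as parallel CPU/GPU stages) transparently inside its execution engine; the sketch only shows the underlying idea.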

In addition, deep learning frameworks each ship their own data pre-processing implementations, which hurts the portability of training and inference workflows and complicates code maintenance. Data processing pipelines implemented with DALI are portable because they can easily be retargeted to TensorFlow, PyTorch, MXNet, and PaddlePaddle.

DALI Diagram

Highlights

  • Easy-to-use functional style Python API.

  • Support for multiple data formats - LMDB, RecordIO, TFRecord, COCO, JPEG, JPEG 2000, WAV, FLAC, OGG, H.264, VP9, and HEVC.

  • Portable across popular deep learning frameworks: TensorFlow, PyTorch, MXNet, PaddlePaddle.

  • Supports CPU and GPU execution.

  • Scalable across multiple GPUs.

  • Flexible graphs let developers create custom pipelines.

  • Extensible for user-specific needs with custom operators.

  • Accelerates image classification (ResNet-50) and object detection (SSD) workloads, as well as ASR models (Jasper, RNN-T).

  • Allows direct data path between storage and GPU memory with GPUDirect Storage.

  • Easy integration with NVIDIA Triton Inference Server via the DALI TRITON Backend.

  • Open source.
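
To give a flavor of the functional-style Python API highlighted above, a minimal image pipeline might look as follows. This is a sketch, not an official tutorial: the data directory, image size, and normalization constants are illustrative, and running it requires a DALI installation and an NVIDIA GPU (device_id=0 is assumed).

```python
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def
def image_pipeline(data_dir):
    # Read encoded JPEGs and labels from a directory tree (path is illustrative).
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
    # "mixed" decodes on CPU and GPU jointly; the output lives in GPU memory.
    images = fn.decoders.image(jpegs, device="mixed")
    images = fn.resize(images, resize_x=224, resize_y=224)
    # Typical ImageNet-style normalization (constants are an assumption here).
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
    return images, labels

pipe = image_pipeline("/path/to/images",
                      batch_size=32, num_threads=4, device_id=0)
pipe.build()
images, labels = pipe.run()  # returns one batch of preprocessed data
```

The same pipeline definition can then be wrapped in one of DALI's framework plugins (e.g. for PyTorch or TensorFlow) to feed training directly, which is what makes the pipelines portable across frameworks.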