Release Notes#

This document describes the key features, software enhancements and improvements, and known issues for DALI 1.46.0. For previously released DALI documentation, see DALI Archives.

Overview#

DALI accelerates data pipelines (graphs that can have multiple inputs and outputs) with both performance and flexibility. It ships as a single library that can be easily integrated into different deep learning training and inference applications.

Using DALI#

Note

DALI builds for NVIDIA® CUDA® 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

To upgrade to DALI 1.46.0 from a previous version of DALI, follow the installation and usage information in the DALI User Guide.

Note

The internal DALI C++ API used for operator implementations, and the C++ API that enables using DALI as a library from native code, are not yet officially supported. These APIs may therefore change in the next release without advance notice.

Key Features and Enhancements#

This DALI release includes the following key features and enhancements:

  • Added support for CUDA 12.8 (#5711).

  • Optimized workspace and operator specification (#5740, #5770).

  • Introduced Common Subgraph Elimination for DALI pipeline/graph (#5752, #5755).

  • Added support for nvImageCodec 0.4.1 (#5576, #5774, #5780).

  • Improved documentation for supported environment variables (#5756).

  • Made the pipeline’s build call optional (#5754).
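
As an illustration of the optional build call, the following is a minimal sketch, assuming DALI 1.46.0 or later is installed. The function name `run_without_build` and the pipeline name `uniform_pipeline` are hypothetical, and the CPU-only setting (`device_id=None`) is chosen only so that the sketch does not depend on a GPU.

```python
def run_without_build(batch_size=4):
    """Define a tiny CPU-only pipeline and run it with no explicit build()."""
    # Imported inside the function so the sketch can be loaded on a machine
    # where DALI is not installed.
    from nvidia.dali import fn, pipeline_def

    @pipeline_def(batch_size=batch_size, num_threads=1, device_id=None)
    def uniform_pipeline():
        # Each sample is a 3-element tensor of uniform random values.
        return fn.random.uniform(range=[0.0, 1.0], shape=[3])

    pipe = uniform_pipeline()
    # No pipe.build() call here: per the note above, the build step is
    # optional, so run() is expected to build the pipeline on first use.
    (batch,) = pipe.run()
    return batch
```

Calling `pipe.build()` explicitly should still work, so existing code does not need to change.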

Fixed Issues#

The following issues were fixed in this release:

  • Fixed DALIDataType printing in global namespace (for custom C++ builds) (#5748).

Breaking Changes#

  • There are no breaking changes in this DALI release.

Deprecated Features#

  • Passing a seed argument to non-random operators is deprecated. Passing it has no effect, but it does trigger a warning.

Known Issues#

This DALI release includes the following known issues:

  • The following operators do not currently support checkpointing: experimental.readers.fits, experimental.decoders.video, experimental.inputs.video, and experimental.decoders.image_random_crop.

  • The video loader operator requires a key frame at least once every 10 to 15 frames of the video stream.

    If key frames occur less often than that, the returned frames might be out of sync.

  • The experimental VideoReaderDecoder does not support open GOP.

    When given an open-GOP stream, it does not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.

  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.

    As a workaround, you can manually synchronize the device before returning the data from the callback.

  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when running in Docker with escalated privileges, for example:

    • privileged=yes in Extra Settings for AWS data points

    • --privileged or --security-opt seccomp=unconfined for bare Docker
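
The workaround for the GPU external source issue above (synchronizing the device before the callback returns its data) can be sketched as a small decorator. This is a sketch under assumptions, not part of DALI: the helper name `sync_before_return` is hypothetical, and the synchronization function is passed in by the caller, for example `cupy.cuda.runtime.deviceSynchronize` or `torch.cuda.synchronize`, depending on which library produces the GPU data.

```python
import functools


def sync_before_return(synchronize):
    """Build a decorator that calls `synchronize()` before data is returned.

    `synchronize` is any zero-argument callable that blocks until pending
    GPU work has finished (supplied by the caller; hypothetical helper).
    """
    def decorator(callback):
        @functools.wraps(callback)
        def wrapper(*args, **kwargs):
            data = callback(*args, **kwargs)
            # Wait for outstanding GPU work so the data is complete before
            # it is handed back to the caller.
            synchronize()
            return data
        return wrapper
    return decorator
```

A callback decorated with `@sync_before_return(torch.cuda.synchronize)` would then be passed as the external source callback as usual. Full device synchronization is the simplest option but also the most conservative one, so it may cost some throughput.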