Release Notes#

This document describes the key features, software enhancements and improvements, and known issues for DALI 1.42.0. For previously released DALI documentation, see DALI Archives.

Overview#

DALI combines performance and flexibility for accelerating data pipelines (graphs that can have multiple inputs and outputs) in a single library that can be easily integrated into different deep learning training and inference applications.

Using DALI#

Note

DALI builds for NVIDIA® CUDA® 12 dynamically link the CUDA toolkit. To use DALI, install the latest CUDA toolkit.

To upgrade to DALI 1.42.0 from a previous version of DALI, follow the installation and usage information in the DALI User Guide.

Note

The internal DALI C++ API used for operator implementation and the C++ API that enables using DALI as a library from native code are not yet officially supported. These APIs may change in the next release without advance notice.

Key Features and Enhancements#

This DALI release includes the following key features and enhancements:

  • Introduced more flexible execution in the DALI pipeline with the experimental_exec_dynamic flag (5593), (5528), (5620), (5602), (5529), and (5595).

    • Enabled support for GPU-to-CPU transfers in a pipeline.

    • Added support for accessing CPU metadata of GPU outputs (e.g. shape of GPU decoded images/videos).

  • Added support for CUDA 12.6U1 (5616).

  • Added an option to return the number of frames in the experimental video reader (5628).
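As an illustration of the experimental_exec_dynamic flag and the GPU-to-CPU transfer listed above, here is a minimal sketch rather than a reference implementation. It assumes that the flag is passed as a pipeline argument and that a GPU output can be moved back to the CPU by calling .cpu() on it; the build_pipeline helper, the image_dir parameter, and the 224x224 resize are hypothetical choices made only for this example. Running it requires a CUDA-capable GPU and a directory of images.

```python
def build_pipeline(image_dir, batch_size=8):
    """Build a DALI pipeline that decodes on the GPU and returns CPU data."""
    # Imported inside the function so this sketch can be loaded on machines
    # where DALI is not installed; building the pipeline still needs DALI.
    from nvidia.dali import fn, pipeline_def

    @pipeline_def(
        batch_size=batch_size,
        num_threads=2,
        device_id=0,
        # Assumption: the flag named in these notes is a pipeline argument.
        experimental_exec_dynamic=True,
    )
    def image_pipeline():
        encoded, labels = fn.readers.file(file_root=image_dir)
        # "mixed" takes encoded CPU input and produces decoded GPU output.
        images = fn.decoders.image(encoded, device="mixed")
        images = fn.resize(images, resize_x=224, resize_y=224)
        # Assumption: with dynamic execution enabled, a GPU result can be
        # transferred back to the CPU inside the pipeline.
        images_cpu = images.cpu()
        return images_cpu, labels

    return image_pipeline()
```

A caller would typically build the returned pipeline and fetch a batch with its run method; consult the DALI User Guide for the exact behavior of the experimental flag in this release.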

Fixed Issues#

The following issues were fixed in this release:

  • Fixed handling of optical flow initialization failure (5624).

Breaking Changes#

  • There are no breaking changes in this DALI release.

Deprecated Features#

  • No features were deprecated in this release.

Known Issues#

This DALI release includes the following known issues:

  • The following operators do not currently support checkpointing: experimental.readers.fits, experimental.decoders.video, experimental.inputs.video, and experimental.decoders.image_random_crop.

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream.

    If key frames occur less frequently than that, the returned frames might be out of sync.

  • The experimental VideoReaderDecoder does not support open GOP.

    It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.

  • In experimental debug and eager modes, the GPU external source is not properly synchronized with DALI internal streams.

    As a workaround, you can manually synchronize the device before returning the data from the callback.

  • Due to known issues with Meltdown/Spectre mitigations, DALI shows the best performance when running in Docker with escalated privileges, for example:

    • privileged=yes in Extra Settings for AWS data points

    • --privileged or --security-opt seccomp=unconfined for bare Docker
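For the key-frame spacing requirement of the video loader listed above, a video can be checked before it is fed to DALI. The sketch below is not part of DALI: the function names are hypothetical, and it assumes you already have one flag per frame (1 for a key frame, 0 otherwise), for example exported with a tool such as ffprobe. The default limit of 15 is taken from the upper end of the 10 to 15 frame range stated in the known issue.

```python
def gop_lengths(key_flags):
    """Return the distance, in frames, from each key frame to the next one.

    key_flags holds one truthy/falsy value per frame in stream order.
    The last entry is the distance from the final key frame to the end of
    the stream. Frames before the first key frame are ignored.
    """
    key_positions = [i for i, flag in enumerate(key_flags) if flag]
    if not key_positions:
        raise ValueError("no key frames found")
    boundaries = key_positions + [len(key_flags)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


def meets_keyframe_spacing(key_flags, max_gap=15):
    """Check that a key frame occurs at least once every max_gap frames."""
    return max(gop_lengths(key_flags)) <= max_gap


# Key frames at positions 0, 4, and 7 in a 9-frame stream.
flags = [1, 0, 0, 0, 1, 0, 0, 1, 0]
print(gop_lengths(flags))
print(meets_keyframe_spacing(flags))
```

A video that fails this check may need to be re-encoded with a shorter key-frame interval before it is used with the video loader.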