DALI Release 0.24.0

DALI 0.24.0 is not yet a major release, so its features, functionality, and performance might be limited.

Using DALI 0.24.0

To upgrade to DALI 0.24.0 from an older version of DALI, follow the installation and usage information in the DALI User Guide.

Note: The internal DALI C++ API used for operator implementation and the C++ API that enables using DALI as a library from native code are not yet officially supported. These APIs may therefore change in the next release without advance notice.

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • The DALI package name now carries a -cuda110 or -cuda100 suffix to indicate the CUDA version, which allows all packages to be hosted under one pip index.

    This matters only for installation; the DALI module in Python is still `nvidia.dali` regardless of the CUDA version. Refer to the Installation section in the DALI User Guide for more information.

  • New Operators:
    • Preemphasis (#2025)
    • GaussianBlur CPU (#1987, #2009, and #2038)

  • Operator Improvements:
    • Extended the Slice and Crop family of operators with out-of-bounds policies, which provide support for padding and for trimming to the existing shape (#2000, #2056, #2044).
    • Moved the memory hint allocation in the Resize operator to the build phase (#2033).
    • Optimized the Transpose GPU operator to improve the performance on non-uniform data batches (#2011, #2032).

  • Added support for GPU input data in the ExternalSource operator (#1997).
    • Added built-in support for GPU CuPy and PyTorch tensors in ExternalSource (#2050).
    • Added the ability to provide an external stream, stream 0, or automatic stream selection for GPU data access (#2050).
    • Added DLPack input support to the ExternalSource operator (#2023).

  • Added the ability to dump information about operator output buffer sizes (#2039).
  • Improved error checking with external libraries (#2062, #2063).
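
To illustrate the package naming described in the first item of this list, the following minimal sketch maps a CUDA version string to the suffixed pip package name. The helper `dali_package_name` and the `nvidia-dali` base name are assumptions made for illustration and are not part of DALI; only the -cuda110 and -cuda100 suffixes and the `nvidia.dali` module name are taken from these notes.

```python
# Hypothetical helper, not part of DALI: builds the pip package name
# for a given CUDA version using the suffix scheme described above.
SUPPORTED_CUDA = ("10.0", "11.0")   # CUDA versions named in these notes
BASE_PACKAGE = "nvidia-dali"        # assumed base package name
IMPORT_NAME = "nvidia.dali"         # module name is the same for every CUDA version


def dali_package_name(cuda_version: str) -> str:
    """Return the suffixed package name, e.g. '11.0' -> 'nvidia-dali-cuda110'."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(f"unsupported CUDA version: {cuda_version!r}")
    major, minor = cuda_version.split(".")
    return f"{BASE_PACKAGE}-cuda{major}{minor}"


if __name__ == "__main__":
    for version in SUPPORTED_CUDA:
        print(f"pip package: {dali_package_name(version)}, import as: {IMPORT_NAME}")
```

Whichever package is installed, Python code keeps importing `nvidia.dali`, so switching CUDA versions should not require changes to import statements.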

Fixed Issues

This DALI release includes the following fixes.

  • Fixed the performance regression for a heterogeneous batch in the Transpose GPU operator.

  • Fixed the global state problems when there is more than one Transpose GPU operator (#2032).

Breaking Changes

There are no breaking changes in this release.

Deprecated Features

  • Added a deprecation warning for Python 3.5 (#2021).
  • Deprecated the `output_dtype` argument; use `dtype` instead (#2051).
  • Added an argument deprecation mechanism and deprecated `image_type` in Crop, Slice, and CropMirrorNormalize (#2061).
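
The general idea behind an argument rename with a deprecation warning can be sketched in plain Python. This is a hypothetical illustration, not DALI's actual mechanism: the function `resolve_deprecated_args` and the `RENAMED_ARGS` table are invented for this example, and only the `output_dtype` to `dtype` rename comes from these notes.

```python
import warnings

# Hypothetical table of deprecated argument names and their replacements.
# Only the output_dtype -> dtype rename is taken from these notes.
RENAMED_ARGS = {"output_dtype": "dtype"}


def resolve_deprecated_args(kwargs: dict) -> dict:
    """Return a copy of kwargs with deprecated names replaced, warning for each."""
    resolved = dict(kwargs)
    for old_name, new_name in RENAMED_ARGS.items():
        if old_name not in resolved:
            continue
        if new_name in resolved:
            # Passing both spellings is ambiguous, so reject it outright.
            raise TypeError(
                f"both '{old_name}' and '{new_name}' were given; use '{new_name}' only"
            )
        warnings.warn(
            f"'{old_name}' is deprecated; use '{new_name}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        resolved[new_name] = resolved.pop(old_name)
    return resolved
```

With this sketch, a call that still passes `output_dtype` keeps working but emits a `DeprecationWarning`, while a call that passes `dtype` is left untouched.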

Known Issues

  • The video loader operator requires key frames to occur at least every 10 to 15 frames of the video stream.

    If key frames occur less frequently than that, the returned frames might be out of sync.

  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.

    To use DALI with a TensorFlow version that has no prebuilt plugin binary shipped with DALI, make sure that the compiler used to build TensorFlow is present on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)

  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
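
The key-frame requirement in the first known issue can be checked ahead of time when the key-frame positions are known. The following is a minimal sketch, assuming the frame indices of the key frames have already been extracted with some other tool; the function names and the conservative default of 10 frames are illustrative choices and not part of DALI.

```python
from typing import Sequence


def max_key_frame_gap(key_frame_indices: Sequence[int]) -> int:
    """Return the largest distance, in frames, between consecutive key frames."""
    ordered = sorted(key_frame_indices)
    if len(ordered) < 2:
        raise ValueError("need at least two key frames to measure spacing")
    # Only gaps between key frames are measured, not the tail after the last one.
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def key_frames_dense_enough(key_frame_indices: Sequence[int], max_gap: int = 10) -> bool:
    """True if consecutive key frames are never more than max_gap frames apart."""
    return max_key_frame_gap(key_frame_indices) <= max_gap
```

For example, key frames at frames 0, 10, 20, and 30 pass the default check, whereas key frames at frames 0, 30, and 60 do not and may lead to the out-of-sync frames described above.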