DALI Release 0.28.0

DALI 0.28.0 is not yet a major release, so its features, functionality, and performance might be limited.

Using DALI 0.28.0

To upgrade to DALI 0.28.0 from an older version of DALI, follow the installation and usage information in the DALI User Guide.

Note: The internal DALI C++ API used for operator implementation and the C++ API that enables using DALI as a library from native code are not yet officially supported. Hence, these APIs may change in the next release without advance notice.

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • Affine transform generators, which are operators that generate scale, rotate, shear, translate, and crop transform matrices (#2309).
      • You can use the transform.Combine operator to combine these matrices (#2317).
      • These transformations can be applied to the data by using the CoordTransform operator (#2317).
    • The min, max, and clamp arithmetic operators (#2298).
    • The Cat and Stack operators, which concatenate and stack tensors on the CPU and the GPU (#2301, #2339, #2350).
    • The following reductions for the CPU and the GPU (#2342, #2379, #2395):
      • Min
      • Max
      • Sum
      • Mean
      • MeanSquare
      • RootMeanSquare
      • Std
      • Variance
    • The MFCC operator for the GPU (#2423).
  • The SelectMasks operator (#2381).
  • Added operators for batch reordering:
    • BatchPermutation for generating random reordering of the batch.
    • PermuteBatch, which reorders tensors in a batch, based on a list of provided indices (#2417).
  • Added Compose, a PyTorch-style API for composing operators (#2393).
  • Improvements in existing operators:
    • Added SeekFrames to the audio decoder.

      The redesign allows you to decide the decoded data type at runtime (#2334).

    • Added the ability to handle UTF-8 text to the NemoAsrReader (#2358).
    • Added explicit file list support to the FileReader (#2389).
    • Improvements in the COCO reader API (#2406).
      • The COCOReader API now outputs relative mask polygon coordinates when the option ratio is set to True (#2375).
    • RandomBBoxCrop now optionally outputs the indices of the bounding boxes that passed the centroid filter (#2374).
  • The late initialization of torch_gpu_device in the PyTorch plugin (#2411).
  • The automatic constant-to-input promotion (#2361) and generalized handling of operator arguments (#2393).
  • Added an MNIST example for DALI and PyTorch Lightning (#2360).
  • Added the last_batch_policy to the framework iterator (#2269).
  • New builds:
    • Python 3.9 is now enabled (#2333).
    • The DALI wheels for CUDA 11 are built with CUDA 11.1 and use Enhanced Compatibility to work with CUDA 11.0 (#2302, #2367, #2356, and #2413).
    • Added support for the SM_86 architecture (#2364).
    • Added the ability to cross-build Python wheels for Jetson (#2313).
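
To illustrate what the affine transform generators, transform.Combine, and CoordTransform items above describe conceptually, here is a minimal pure-Python sketch. It does not use the DALI API: the function names (scale, rotation, translation, combine, apply_to_points), the 3x3 homogeneous matrix layout, and the left-to-right application order are all assumptions made for this illustration only.

```python
import math

# Hypothetical helpers for illustration; these are NOT DALI functions.
# Each returns a 3x3 homogeneous matrix for a 2D affine transform.

def scale(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def rotation(angle_deg):
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def matmul(a, b):
    # Plain 3x3 matrix product a @ b.
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def combine(*transforms):
    # Assumed convention: transforms are applied to a point in the
    # order they are listed, so each new matrix multiplies from the left.
    result = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for t in transforms:
        result = matmul(t, result)
    return result

def apply_to_points(matrix, points):
    # Apply the affine part of the matrix to a list of (x, y) points.
    return [(matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
             matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
            for x, y in points]

# Scale by 2 first, then shift 10 units along x.
combined = combine(scale(2.0, 2.0), translation(10.0, 0.0))
moved = apply_to_points(combined, [(1.0, 1.0), (0.0, 3.0)])

# Rotate the unit x vector by 90 degrees.
rotated = apply_to_points(rotation(90.0), [(1.0, 0.0)])
```

Combining the matrices once and applying the single result to every point gives the same coordinates as applying each transform in turn, which is the motivation for composing transforms before applying them to data.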

Fixed Issues

This DALI release includes the following fixes.

  • Preserved the shape of pseudoscalars in the arithmetic operators (#2359).
  • Fixed the CPU-only mode for arithmetic operators (#2400).
  • Fixed the problem of the output outliving the pipeline in Python (#2341).
  • Fixed the lack of a correct layout setting in the VideoReader (#2346).
  • Fixed the uniform generator operator (#2352).

Breaking Changes

  • Python 3.5 is no longer supported by the official DALI wheels.

Deprecated Features

There are no deprecated features in this release.

Known Issues

  • The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.

    If key frames occur less frequently than every 10 to 15 frames, the returned frames might be out of sync.

  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.

    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that was used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)

  • Due to known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
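
For the key frame requirement in the first known issue, a quick pre-check of a video can be sketched as follows. This is an illustration under stated assumptions, not part of DALI: the helper names are hypothetical, the list of key frame indices must be obtained separately with a video inspection tool, and the default limit of 10 frames is simply the conservative end of the 10 to 15 frame range quoted above.

```python
def max_key_frame_gap(key_frame_indices):
    # Largest distance, in frames, between two consecutive key frames.
    ordered = sorted(key_frame_indices)
    if len(ordered) < 2:
        return 0
    return max(b - a for a, b in zip(ordered, ordered[1:]))

def key_frames_dense_enough(key_frame_indices, max_gap=10):
    # max_gap=10 is an assumed, conservative threshold.
    return max_key_frame_gap(key_frame_indices) <= max_gap

# Example key frame positions (made-up values for illustration).
dense = [0, 10, 20, 30]
sparse = [0, 10, 60, 70]

dense_ok = key_frames_dense_enough(dense)
sparse_ok = key_frames_dense_enough(sparse)
```

The sketch only looks at gaps between key frames; it does not account for frames that follow the last key frame in the stream.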