DALI Release 0.11.0 Beta

DALI 0.11.0 is a beta release. The functionality and performance of all features will therefore likely be limited.

Using DALI 0.11.0 Beta

DALI 0.11.0 can be used with the 19.07 NVIDIA GPU Cloud (NGC) optimized containers for MXNet, PyTorch, and TensorFlow. The 19.07 containers will also ship with DALI 0.11.0.

To upgrade to DALI 0.11.0 beta from an older version of DALI, follow the installation instructions in the DALI Quick Start Guide.

Refer to the DALI Developer Guide for usage details.

Note: The internal DALI C++ API used for implementing operators, and the C++ API that enables using DALI as a library from native code, are not yet officially supported. These APIs may therefore change in the next release without advance notice.

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Added the ability to provide more than one input to, and return more than one output from, a Python-based operator.
  • Extended the bounding box encoder for SSD to also return offsets. See nvidia.dali.ops.BoxEncoder.
  • Added the ability to build DALI by mounting the source code into the Docker container, so that consecutive rebuilds are much faster.
  • Added experimental support for the aarch64 (ARM) platform. This support covers only the native part; Python is not supported yet.
  • Re-implemented the flip operator to increase its performance. See nvidia.dali.ops.Flip.
  • Improved the performance of the nvJPEG decoder that uses the new internal API so that it matches the previous implementation. See nvidia.dali.ops.nvJPEGDecoder.
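
As a rough illustration of what "offsets" means for an SSD box encoder, the sketch below uses the parameterization commonly used for SSD: center offsets scaled by the anchor size, and log-ratios of width and height. This is an assumption for illustration only. The function name encode_offsets is hypothetical, and the exact normalization applied by nvidia.dali.ops.BoxEncoder is not stated in these notes.

```python
import math


def encode_offsets(box, anchor):
    """Encode a box relative to an anchor (hypothetical helper, not DALI API).

    Both arguments are (cx, cy, w, h) tuples: center coordinates, width, height.
    """
    cx, cy, w, h = box
    acx, acy, aw, ah = anchor
    return (
        (cx - acx) / aw,    # horizontal center shift, in anchor widths
        (cy - acy) / ah,    # vertical center shift, in anchor heights
        math.log(w / aw),   # log of the width ratio
        math.log(h / ah),   # log of the height ratio
    )


# A box that coincides with its anchor encodes to all-zero offsets.
same = encode_offsets((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2))
print(same)
```

Encoding boxes as offsets rather than absolute coordinates keeps the regression targets small and centered around zero, which is why detection training code often consumes them in this form.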

Fixed Issues

This DALI release includes the following fixes.

  • Fixed an issue wherein loading a plugin could break other operators that were already registered.
  • Fixed an interoperability issue with PyCUDA. Now DALI will not interfere with PyCUDA’s CUDA context management.

Breaking API Changes

  • CPU operators have moved from per-sample processing (each sample is processed all the way through the pipeline before the next sample starts) to batch processing (all samples are processed by the first operator before moving to the next operator). This may result in a small performance degradation for some use cases. In the long term, however, it makes currently unavailable optimizations possible, for example operations that need to view the whole batch during processing (such as random sample blending inside a batch).
  • CropCastPermute is removed. CropMirrorNormalize should be used instead (with the default values for normalization).
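
The difference between the two CPU execution orders described above can be sketched in plain Python. This is a toy model, not DALI code: run_per_sample, run_per_batch, and the two example operators are hypothetical names used only to show the order in which (operator, sample) pairs are visited.

```python
def run_per_sample(batch, operators):
    """Old order: each sample goes through every operator before the next sample."""
    trace, outputs = [], []
    for index, sample in enumerate(batch):
        for name, fn in operators:
            sample = fn(sample)
            trace.append((name, index))
        outputs.append(sample)
    return outputs, trace


def run_per_batch(batch, operators):
    """New order: each operator processes the whole batch before the next operator."""
    trace = []
    current = list(batch)
    for name, fn in operators:
        current = [fn(sample) for sample in current]
        trace.extend((name, index) for index in range(len(current)))
    return current, trace


operators = [("double", lambda x: x * 2), ("increment", lambda x: x + 1)]
batch = [1, 2, 3]

per_sample_out, per_sample_trace = run_per_sample(batch, operators)
per_batch_out, per_batch_trace = run_per_batch(batch, operators)

# Both orders give the same results; only the visiting order differs.
print(per_sample_out == per_batch_out)
print(per_sample_trace)
print(per_batch_trace)
```

In the batch order, an operator has every sample of the batch available at once, which is what makes batch-wide operations such as blending samples within a batch possible.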

Deprecated Features

  • Removed the prebuilt version of the TensorFlow plugin for DALI. It is now always necessary to install the separate nvidia-dali-tf-plugin package. See Binary Installation.

Known Issues

  • The new video reader operator requires NVIDIA VIDEO CODEC SDK support in the platform. Prior to 19.01, the NVIDIA GPU Cloud (NGC) optimized containers lack this functionality in the default configuration. To enable the functionality, run the container with the "video" capability enabled, as shown below:
    -e "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video"
  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin may not be compatible with the TensorFlow 1.14.0 release. The plugin requires that a gcc compiler matching the one used to build TensorFlow (gcc 4.8.4 or gcc 4.8.5, depending on the particular version) be present on the system.
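
For the video reader issue above, a script running inside the container can check whether the "video" capability was requested before building a video pipeline. The sketch below is a minimal example and not part of DALI. The helper name has_video_capability is hypothetical, and it assumes that the value "all" also enables the video capability.

```python
import os


def has_video_capability(environ=None):
    """Return True if NVIDIA_DRIVER_CAPABILITIES requests video support.

    Hypothetical helper: it only inspects the environment variable and
    does not verify that the driver libraries are actually present.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("NVIDIA_DRIVER_CAPABILITIES", "")
    capabilities = {item.strip() for item in raw.split(",") if item.strip()}
    # Assumption: "all" enables every capability, including video.
    return "video" in capabilities or "all" in capabilities


if not has_video_capability():
    print("Video capability not requested; the video reader may not work.")
```

Failing early with a clear message is usually easier to diagnose than an error raised later from inside the video reader.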