DALI 0.11.0 is a beta release. The functionality and performance of all features are
therefore likely to be limited.
Using DALI 0.11.0 Beta
DALI 0.11.0 can be used with the 19.07 NVIDIA GPU Cloud (NGC) optimized containers for
MXNet, PyTorch, and TensorFlow. The 19.07 containers will also ship with DALI
0.11.0.
To upgrade to DALI 0.11.0 beta from an older version of DALI, follow the installation
instructions in the DALI Quick Start Guide.
Refer to the DALI Developer Guide for usage details.
Note: The internal DALI C++ API used for operator implementation, and the C++ API that
enables using DALI as a library from native code, are not yet officially supported.
Hence, these APIs may change in the next release without advance notice.
Key Features and Enhancements
This DALI release includes the following key features and
enhancements.
- Added the ability to provide more than one input to, and return more than one output from,
a Python-based operator.
- Extended the bounding box encoder for SSD to also return offsets. See nvidia.dali.ops.BoxEncoder.
- Added the ability to build DALI by mounting the source code into the Docker container so
that consecutive rebuilds are much faster.
- Added experimental support for the aarch64 (ARM) platform. This support
covers only the native part; Python is not supported yet.
- Re-implemented the flip operator to increase its performance. See nvidia.dali.ops.Flip.
- Improved the performance of the nvJPEG decoder with the new internal API so that it
matches the previous implementation. See nvidia.dali.ops.nvJPEGDecoder.
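To illustrate what "offsets" means for the SSD bounding box encoder mentioned above, the following is a minimal sketch of the offset encoding commonly used with SSD. It is not DALI's implementation. The function name `encode_offsets`, the (left, top, right, bottom) box layout, and the absence of any extra scaling are assumptions made for illustration only. Refer to nvidia.dali.ops.BoxEncoder for the actual parameters and conventions.

```python
import math


def to_center_size(box):
    """Convert (left, top, right, bottom) to (center_x, center_y, width, height)."""
    left, top, right, bottom = box
    return ((left + right) / 2.0, (top + bottom) / 2.0, right - left, bottom - top)


def encode_offsets(box, anchor):
    """Hypothetical helper: encode a ground-truth box relative to a matched anchor.

    Returns (dx, dy, dw, dh), where the center shift is expressed in units of the
    anchor size and the size change is expressed as a log ratio.
    """
    bx, by, bw, bh = to_center_size(box)
    ax, ay, aw, ah = to_center_size(anchor)
    return (
        (bx - ax) / aw,      # horizontal center shift, relative to anchor width
        (by - ay) / ah,      # vertical center shift, relative to anchor height
        math.log(bw / aw),   # log ratio of widths
        math.log(bh / ah),   # log ratio of heights
    )


anchor = (0.2, 0.2, 0.6, 0.6)
same = encode_offsets(anchor, anchor)                    # box identical to the anchor
shifted = encode_offsets((0.3, 0.2, 0.7, 0.6), anchor)   # box moved right by 0.1
```

A box that coincides with its anchor encodes to all-zero offsets, while a box of the same size shifted horizontally changes only the first component. Implementations often additionally normalize these values, which this sketch omits.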
Fixed Issues
This DALI release includes the following fixes.
- Fixed an issue wherein loading a plugin could break other operators that were
already registered.
- Fixed an interoperability issue with PyCUDA. Now DALI will not interfere with
PyCUDA’s CUDA context management.
Breaking API Changes
- CPU operators have moved from per-sample processing (each sample is processed all the
way through the pipeline before the next sample starts) to batch processing (all samples
are processed by the first operator before moving to the next operator). This may
result in a small performance degradation for some use cases. However, in the long
term it will enable optimizations that are currently unavailable, for example
operations that need to view the whole batch during processing (such as random
sample blending inside a batch).
- CropCastPermute has been removed. Use CropMirrorNormalize
instead (with the default values for normalization).
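The difference between per-sample and batch processing of CPU operators can be pictured with a small pure-Python sketch. This is only a conceptual model and not how DALI schedules work internally. The functions `run_per_sample` and `run_per_batch`, the toy operators, and the `trace` list are all hypothetical names introduced for illustration.

```python
def run_per_sample(batch, operators, trace):
    """Old order: push each sample through every operator before the next sample."""
    outputs = []
    for index, sample in enumerate(batch):
        for op in operators:
            trace.append((op.__name__, index))
            sample = op(sample)
        outputs.append(sample)
    return outputs


def run_per_batch(batch, operators, trace):
    """New order: each operator handles the whole batch before the next operator runs."""
    current = list(batch)
    for op in operators:
        processed = []
        for index, sample in enumerate(current):
            trace.append((op.__name__, index))
            processed.append(op(sample))
        current = processed
    return current


def double(x):
    return x * 2


def increment(x):
    return x + 1


sample_trace, batch_trace = [], []
per_sample_out = run_per_sample([1, 2], [double, increment], sample_trace)
per_batch_out = run_per_batch([1, 2], [double, increment], batch_trace)
```

Both orders produce the same results for operators that treat samples independently; only the execution order recorded in the traces differs. In the batch order, an operator has access to every sample of the batch at once, which is what makes whole-batch operations possible.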
Deprecated Features
- Removed the prebuilt version of the TensorFlow plugin for DALI. It is now always necessary to
install the separate nvidia-dali-tf-plugin package. See Binary Installation.