TensorFlow Plugin API reference
- class nvidia.dali.plugin.tf.DALIDataset(pipeline, output_dtypes=None, output_shapes=None, fail_on_device_mismatch=True, *, input_datasets=None, batch_size=1, num_threads=4, device_id=0, exec_separated=False, prefetch_queue_depth=2, cpu_prefetch_queue_depth=2, gpu_prefetch_queue_depth=2, dtypes=None, shapes=None)
- Creates a `DALIDataset` compatible with `tf.data.Dataset` from a DALI pipeline. It supports TensorFlow 1.15 and the 2.x family. `DALIDataset` can be placed on the CPU or the GPU.

  Please keep in mind that TensorFlow allocates almost all available device memory by default. This might cause errors in DALI due to insufficient memory. For how to change this behavior, please refer to the TensorFlow documentation, as it may differ based on your use case.

  Warning: Most TensorFlow Datasets have only a CPU variant. To process a GPU-placed `DALIDataset` with other TensorFlow datasets, you need to first copy it back to the CPU using an explicit `tf.data.experimental.copy_to_device` - a round trip from CPU to GPU and back to CPU would likely degrade performance significantly and is thus discouraged.

  Additionally, it is advised not to use datasets like `repeat()` or similar after `DALIDataset`, as they may interfere with DALI memory allocations and prefetching. A usage sketch follows the parameter list below.

  Parameters:
- pipeline (`nvidia.dali.Pipeline`) – pipeline defining the data processing to be performed.
- output_dtypes (tf.DType or tuple of tf.DType, default = None) – expected output types 
- output_shapes (tuple of shapes, optional, default = None) – expected output shapes. If provided, must match the arity of `output_dtypes`. When set to None, DALI will infer the shapes on its own. Individual shapes can also be set to None or contain None to indicate unknown dimensions. If specified, it must be compatible with the shapes returned by the DALI pipeline and with the `batch_size` argument, which will be the outermost dimension of the returned tensors. In the case of `batch_size = 1` it can be omitted in the shape. DALI Dataset will try to match the requested shape by squeezing 1-sized dimensions from the shape obtained from the pipeline.
- fail_on_device_mismatch (bool, optional, default = True) – When set to `True`, a runtime check is performed to ensure that the DALI device and the TF device are both CPU or both GPU. In some contexts this check might be inaccurate. When set to `False`, the check is skipped, but additional logs are printed so the devices can be verified manually. Keep in mind that skipping the check may allow hidden GPU-to-CPU copies in the workflow and impact performance.
- batch_size (int, optional, default = 1) – batch size of the pipeline. 
- num_threads (int, optional, default = 4) – number of CPU threads used by the pipeline. 
- device_id (int, optional, default = 0) – id of the GPU used by the pipeline. A None value for this parameter means that DALI should use neither the GPU nor the CUDA runtime. This limits the pipeline to CPU operators only, but allows it to run on any CPU-capable machine.
- exec_separated (bool, optional, default = False) – Whether to execute the pipeline in a way that enables overlapping CPU and GPU computation, typically resulting in faster execution speed, but larger memory consumption. 
- prefetch_queue_depth (int, optional, default = 2) – depth of the executor queue. A deeper queue makes DALI more resistant to uneven execution times of batches, but it also consumes more memory for internal buffers. This value is used when `exec_separated` is set to `False`.
- cpu_prefetch_queue_depth (int, optional, default = 2) – depth of the executor CPU queue. A deeper queue makes DALI more resistant to uneven execution times of batches, but it also consumes more memory for internal buffers. This value is used when `exec_separated` is set to `True`.
- gpu_prefetch_queue_depth (int, optional, default = 2) – depth of the executor GPU queue. A deeper queue makes DALI more resistant to uneven execution times of batches, but it also consumes more memory for internal buffers. This value is used when `exec_separated` is set to `True`.
 
- Return type:
- A `DALIDataset` object based on the DALI pipeline and compatible with the `tf.data.Dataset` API.
 
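A minimal sketch of typical usage, assuming an image-classification pipeline; the data path, image sizes, and output shapes are illustrative placeholders, not part of the API:

```python
import tensorflow as tf
import nvidia.dali.fn as fn
import nvidia.dali.plugin.tf as dali_tf
from nvidia.dali import pipeline_def

@pipeline_def
def image_pipeline():
    # "data/images" is a placeholder path to a directory of JPEG files
    jpegs, labels = fn.readers.file(file_root="data/images", random_shuffle=True)
    images = fn.decoders.image(jpegs, device="mixed")  # decode on the GPU
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images, labels

pipe = image_pipeline(batch_size=32, num_threads=4, device_id=0)

# Place the dataset on the GPU so the decoded data stays there.
with tf.device("/gpu:0"):
    dataset = dali_tf.DALIDataset(
        pipeline=pipe,
        batch_size=32,
        output_shapes=((32, 224, 224, 3), (32, 1)),  # placeholder shapes
        output_dtypes=(tf.uint8, tf.int32),
        device_id=0,
    )
```

Leaving `output_shapes` as None lets DALI infer the shapes, at the cost of less static shape information in the TensorFlow graph.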
- nvidia.dali.plugin.tf.DALIIterator()
- TF Plugin Wrapper. This operator works in the same way as the DALI TensorFlow plugin, except that it also accepts Pipeline objects as input, which are serialized internally. For more information, see `nvidia.dali.plugin.tf.DALIRawIterator()`.
- nvidia.dali.plugin.tf.DALIIteratorWrapper(pipeline=None, serialized_pipeline=None, sparse=[], shapes=[], dtypes=[], batch_size=-1, prefetch_queue_depth=2, **kwargs)
- TF Plugin Wrapper. This operator works in the same way as the DALI TensorFlow plugin, except that it also accepts Pipeline objects as input, which are serialized internally. For more information, see `nvidia.dali.plugin.tf.DALIRawIterator()`.
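A sketch of the legacy TF1-style, graph-mode usage, reusing the `pipe` object from the `DALIDataset` example above; the shapes and dtypes are placeholders that must match the pipeline outputs:

```python
import tensorflow as tf
import nvidia.dali.plugin.tf as dali_tf

# Obtain the wrapped DALI op once, then call it inside the graph.
daliop = dali_tf.DALIIterator()
with tf.device("/gpu:0"):
    images, labels = daliop(
        pipeline=pipe,
        shapes=[(32, 224, 224, 3), (32, 1)],  # placeholder shapes
        dtypes=[tf.uint8, tf.int32],
        device_id=0,
    )
```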
- nvidia.dali.plugin.tf.DALIRawIterator()
- DALI TensorFlow plugin

  Creates a DALI pipeline from a serialized pipeline, obtained from the `serialized_pipeline` argument. `shapes` must match the shapes of the corresponding DALI pipeline output tensors, and `dtypes` must match their types. See the sketch after the parameter list below.

  Parameters:
- serialized_pipeline – A string. 
- shapes – A list of shapes (each a tf.TensorShape or list of ints) that has length >= 1. 
- dtypes – A list of tf.DTypes from: tf.half, tf.float32, tf.uint8, tf.int16, tf.int32, tf.int64 that has length >= 1. 
- num_threads – An optional int. Defaults to -1. 
- device_id – An optional int. Defaults to -1. 
- exec_separated – An optional bool. Defaults to False. 
- gpu_prefetch_queue_depth – An optional int. Defaults to 2. 
- cpu_prefetch_queue_depth – An optional int. Defaults to 2. 
- sparse – An optional list of bools. Defaults to []. 
- batch_size – An optional int. Defaults to -1. 
- enable_memory_stats – An optional bool. Defaults to False. 
- name – A name for the operation (optional). 
 
- Returns:
- A list of Tensor objects of type `dtypes`. Please keep in mind that TensorFlow allocates almost all available device memory by default. This might cause errors in DALI due to insufficient memory. For how to change this behavior, please refer to the TensorFlow documentation, as it may differ based on your use case.
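A sketch of calling the raw op directly with a pipeline serialized via `serialize_pipeline()`, using only the parameters documented above; `pipe`, the shapes, and the dtypes are placeholders:

```python
import tensorflow as tf
import nvidia.dali.plugin.tf as dali_tf

# 'pipe' is a DALI pipeline as in the examples above.
serialized = dali_tf.serialize_pipeline(pipe)

daliop = dali_tf.DALIRawIterator()
with tf.device("/gpu:0"):
    images, labels = daliop(
        serialized_pipeline=serialized,
        shapes=[(32, 224, 224, 3), (32, 1)],  # must match pipeline outputs
        dtypes=[tf.uint8, tf.int32],
        batch_size=32,
        device_id=0,
    )
```

In most cases `DALIIterator()` or `DALIDataset` is preferable, since they accept Pipeline objects directly and handle serialization internally.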
 
- nvidia.dali.plugin.tf.dataset_compatible_tensorflow()
- Returns `True` if the current TensorFlow version is compatible with DALIDataset.
- nvidia.dali.plugin.tf.dataset_distributed_compatible_tensorflow()
- Returns `True` if the tf.distribute APIs for the current TensorFlow version are compatible with DALIDataset.
- nvidia.dali.plugin.tf.dataset_inputs_compatible_tensorflow()
- Returns `True` if the current TensorFlow version is compatible with experimental.DALIDatasetWithInputs and input Datasets can be used with DALI.
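These helpers can be used to guard version-dependent code paths, for example:

```python
import nvidia.dali.plugin.tf as dali_tf

# Fail fast if this TensorFlow build cannot run DALIDataset at all.
if not dali_tf.dataset_compatible_tensorflow():
    raise RuntimeError("This TensorFlow version is not compatible with DALIDataset")

# Only take the experimental input-datasets path when it is supported.
use_input_datasets = dali_tf.dataset_inputs_compatible_tensorflow()
```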
- nvidia.dali.plugin.tf.dataset_options()
- nvidia.dali.plugin.tf.serialize_pipeline(pipeline)
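A short sketch of how these two utilities are typically used; it assumes `dataset` is a `DALIDataset` and `pipe` is a DALI pipeline, and that `dataset_options()` returns a `tf.data.Options` object suitable for use with `DALIDataset`:

```python
import nvidia.dali.plugin.tf as dali_tf

# Apply DALI-friendly tf.data options to an existing DALIDataset.
dataset = dataset.with_options(dali_tf.dataset_options())

# Serialize a pipeline into the string form consumed by the
# serialized_pipeline argument of DALIRawIterator / DALIIteratorWrapper.
serialized = dali_tf.serialize_pipeline(pipe)
```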
Experimental
- nvidia.dali.plugin.tf.experimental.DALIDatasetWithInputs(pipeline, output_dtypes=None, output_shapes=None, fail_on_device_mismatch=True, *, input_datasets=None, batch_size=1, num_threads=4, device_id=0, exec_separated=False, prefetch_queue_depth=2, cpu_prefetch_queue_depth=2, gpu_prefetch_queue_depth=2, dtypes=None, shapes=None)
- Experimental variant of `DALIDataset`. This dataset adds support for input `tf.data.Dataset` objects. Support for input datasets is available only for TensorFlow 2.4.1 and newer.

  Input dataset specification

  Each of the input datasets must be mapped to an `external_source()` operator that will represent the input to the DALI pipeline. In the pipeline, the input is identified by the `name` parameter of `external_source()`. Input datasets must be provided as a mapping from that `name` to the dataset object via the `input_datasets` dictionary argument of DALIDatasetWithInputs.

  Per-sample and batch mode

  The input datasets can operate in per-sample mode or in batch mode.

  In per-sample mode, the values produced by the source dataset are interpreted as individual samples; the batch dimension is absent. For example, a 640x480 RGB image would have the shape [480, 640, 3].

  In batch mode, the tensors produced by the source dataset are interpreted as batches, with an additional outer dimension denoting the samples in the batch. For example, a batch of ten 640x480 RGB images would have the shape [10, 480, 640, 3].

  In both cases (per-sample and batch mode), the layout of those inputs should be denoted as "HWC".

  In per-sample mode, DALIDataset queries the input dataset `batch_size` times to build a batch that is fed into the DALI pipeline. In per-sample mode, each sample produced by the input dataset can have a different shape, but the number of dimensions and the layout must remain constant.

  External Source with `source` parameter

  This experimental DALIDataset accepts pipelines with `external_source()` nodes that have the `source` parameter specified. In that case, the `source` is converted automatically into an appropriate `tf.data.Dataset.from_generator` dataset with correct placement and `tf.data.experimental.copy_to_device` directives.

  Those nodes can also work in per-sample or in batch mode. The data in batch mode must be a dense, uniform tensor (each sample has the same dimensions). Only CPU data is accepted.

  This allows the TensorFlow DALIDataset to work with most pipelines that already have the External Source `source` specified.

  Warning: This class is experimental and its API might change without notice.

  Note: External source nodes with `num_outputs` specified to any number are not supported - this means that callbacks with multiple (tuple) outputs are not supported.

  Note: The external source `cycle` policy `'raise'` is not supported - the dataset is not restartable.

  Note: The external source `cuda_stream` parameter is ignored - `source` is expected to return CPU data, and tf.data.Dataset inputs are handled internally.

  Note: The external source `use_copy_kernel` and `blocking` parameters are ignored.

  Note: Setting `no_copy` on the external source nodes when defining the pipeline is considered a no-op when used with DALI Dataset. The `no_copy` option is handled internally and enabled automatically if possible.

  Note: Parallel execution of the external source callback provided via `source` is not supported. The callback is executed via TensorFlow's `tf.data.Dataset.from_generator`; the `parallel` and `prefetch_queue_depth` parameters are ignored.

  The operator adds the following parameters to those supported by `DALIDataset` (a complete usage sketch follows the parameter list):

  Parameters:
- input_datasets (dict[str, tf.data.Dataset] or dict[str, nvidia.dali.plugin.tf.experimental.Input]) – input datasets to the DALI pipeline. It must be provided as a dictionary mapping from the names of the External Source nodes to the dataset objects or to the `Input()` wrapper.

  For example:

  ```python
  {
      'tensor_input': tf.data.Dataset.from_tensors(tensor).repeat(),
      'generator_input': tf.data.Dataset.from_generator(some_generator)
  }
  ```

  can be passed as `input_datasets` for a pipeline like:

  ```python
  @pipeline_def
  def external_source_pipe():
      input_0 = fn.external_source(name='tensor_input')
      input_1 = fn.external_source(name='generator_input')
      return fn.resize(input_1, resize_x=input_0)
  ```

  Entries that use `tf.data.Dataset` directly, like:

  ```python
  {'input': tf.data.Dataset.from_tensors(tensor)}
  ```

  are equivalent to the following specification using `nvidia.dali.plugin.tf.experimental.Input`:

  ```python
  {
      'input': nvidia.dali.plugin.tf.experimental.Input(
          dataset=tf.data.Dataset.from_tensors(tensor),
          layout=None,
          batch=False)
  }
  ```

  This means that inputs specified directly as `tf.data.Dataset` are considered sample inputs.

  Warning: The input dataset must be placed on the same device as DALIDatasetWithInputs. If the input has a different placement (for instance, the input is placed on the CPU, while DALIDatasetWithInputs is placed on the GPU), `tf.data.experimental.copy_to_device` with a GPU argument must first be applied to the input.
 
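A minimal end-to-end sketch of feeding an input dataset into a GPU-placed DALIDatasetWithInputs; `image_generator` and all names, sizes, and shapes are hypothetical placeholders:

```python
import tensorflow as tf
import nvidia.dali.fn as fn
import nvidia.dali.plugin.tf as dali_tf
from nvidia.dali import pipeline_def

@pipeline_def
def processing_pipeline():
    # 'images' matches the key used in input_datasets below
    images = fn.external_source(name='images')
    return fn.resize(images, resize_x=224, resize_y=224)

pipe = processing_pipeline(batch_size=8, num_threads=4, device_id=0)

# Per-sample input: each element is a single HWC image of varying size.
# image_generator is a hypothetical generator yielding uint8 HWC arrays.
input_ds = tf.data.Dataset.from_generator(
    image_generator,
    output_signature=tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8))

with tf.device("/gpu:0"):
    dataset = dali_tf.experimental.DALIDatasetWithInputs(
        pipeline=pipe,
        input_datasets={
            # Match the placement of the DALIDatasetWithInputs (GPU).
            'images': input_ds.apply(
                tf.data.experimental.copy_to_device("/gpu:0"))
        },
        batch_size=8,
        output_shapes=(8, 224, 224, 3),
        output_dtypes=tf.uint8,
        device_id=0,
    )
```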
- nvidia.dali.plugin.tf.experimental.Input(dataset, *, layout=None, batch=False)
- Wrapper for an input passed to DALIDataset. Allows passing additional options that can override some of those specified in the External Source node in the Python Pipeline object. Passing None indicates that the value should be looked up in the pipeline definition. See the example after the parameter list below.

  Parameters:
- dataset (tf.data.Dataset) – The dataset used as an input 
- layout (str, optional, default = None) – Layout of the input. If None, the layout will be taken from the corresponding External Source node in the Python Pipeline object. If both are provided, the layouts must be the same. If neither is provided, an empty layout will be used.
- batch (bool, optional, default = False) – Batch mode of a given input. If None, the batch mode will be taken from the corresponding External Source node in the Python Pipeline object. If `batch = False`, the input dataset is considered a sample input. If `batch = True`, the input dataset is expected to return batches.
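For example, to declare an input that produces whole batches with an explicit "HWC" layout (a sketch; `batch_ds` is a hypothetical dataset yielding dense, uniform [N, H, W, C] tensors):

```python
import nvidia.dali.plugin.tf as dali_tf

# batch_ds is assumed to yield dense, uniform [N, H, W, C] uint8 batches.
input_datasets = {
    'images': dali_tf.experimental.Input(
        dataset=batch_ds,
        layout="HWC",
        batch=True)
}
```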