# Supported operations

• CPU operator means that the operator can be scheduled on the CPU. Outputs of CPU operators may be used as regular inputs and to provide per-sample parameters for other operators through tensor arguments.

• GPU operator means that the operator can be scheduled on the GPU. Its outputs may only be used as regular inputs for other GPU operators and as pipeline outputs.

• Mixed operator means that the operator accepts input on the CPU and produces its output on the GPU.

• Sequences means that the operator can produce or accept sequence (video-like) data.

• Volumetric means that the operator supports 3D data processing.

## How to read this doc

DALI Operators are used in two steps: first, create a parametrized Operator instance using its constructor; then invoke its __call__ operator in the define_graph() method of the Pipeline.

Documentation of every DALI Operator lists Keyword Arguments supported by the class constructor.

The documentation for the __call__ operator lists the positional arguments (Parameters) and additional Keyword Arguments. __call__ should only be used in define_graph(). The inputs to the __call__ operator represent batches of Tensors (TensorLists) processed by DALI, which are returned by other DALI Operators.

The Keyword Arguments listed for the __call__ operator accept TensorList argument inputs. These should be produced by other ‘cpu’ Operators.

Note

The names of the positional arguments for the __call__ operator (Parameters) are for documentation purposes only and should not be used as keyword arguments.

Note

Some Keyword Arguments can be listed twice: once for the class constructor and once for the __call__ operator. This means they can be set at operator construction with Python values or driven by the output of another operator when the pipeline runs.
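The two-step usage described above might look like the following minimal sketch. It assumes the class-based Pipeline API that this page documents and the FileReader, ImageDecoder, CoinFlip and Flip operators from the support table; `make_flip_pipeline` and `image_dir` are hypothetical names introduced for illustration only.

```python
def make_flip_pipeline(image_dir, batch_size=8, num_threads=2, device_id=0):
    """Build a small DALI pipeline; requires the nvidia.dali package."""
    # Imported lazily so this function can be defined without DALI installed.
    import nvidia.dali.ops as ops
    from nvidia.dali.pipeline import Pipeline

    class FlipPipeline(Pipeline):
        def __init__(self):
            super().__init__(batch_size, num_threads, device_id, seed=12)
            # Step 1: create parametrized operator instances (constructor kwargs).
            self.reader = ops.FileReader(file_root=image_dir)
            self.decode = ops.ImageDecoder(device="cpu")
            self.coin = ops.CoinFlip(probability=0.5)
            self.flip = ops.Flip(device="cpu")

        def define_graph(self):
            # Step 2: invoke the operators; positional inputs are TensorLists.
            encoded, labels = self.reader()
            images = self.decode(encoded)
            # A keyword argument driven per sample by the output of a 'cpu' operator.
            flipped = self.flip(images, horizontal=self.coin())
            return flipped, labels

    return FlipPipeline()
```

Here `probability` is fixed at construction with a plain Python value, while `horizontal` is supplied at call time from another operator's output, which is the distinction the second note draws.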

## Support table

The table below lists all available operators and the devices they can operate on.

Operator name

CPU

GPU

Mixed

Sequences

Volumetric

AudioDecoder

BbFlip

BBoxPaste

BoxEncoder

Brightness

BrightnessContrast

Caffe2Reader

CaffeReader

Cast

COCOReader

CoinFlip

ColorSpaceConversion

ColorTwist

Constant

Contrast

CoordFlip

Copy

Crop

CropMirrorNormalize

DLTensorPythonFunction

DumpImage

ElementExtract

Erase

ExternalSource

FastResizeCropMirror

FileReader

Flip

GaussianBlur

Hsv

Hue

ImageDecoder

ImageDecoderCrop

ImageDecoderRandomCrop

ImageDecoderSlice

Jitter

LookupTable

MelFilterBank

MFCC

MXNetReader

NonsilentRegion

NormalDistribution

Normalize

NumpyReader

OldColorTwist

OneHot

OpticalFlow

Pad

Paste

PowerSpectrum

PreemphasisFilter

PythonFunction

RandomBBoxCrop

RandomResizedCrop

Reinterpret

Reshape

Resize

ResizeCropMirror

Rotate

Saturation

SequenceReader

SequenceRearrange

Shapes

Slice

Spectrogram

Sphere

SSDRandomCrop

TFRecordReader

ToDecibels

TorchPythonFunction

Transpose

Uniform

VideoReader

WarpAffine

Water

## Operators documentation

class nvidia.dali.ops.AudioDecoder(**kwargs)

Decode audio data. This operator is a generic way of handling encoded data in DALI. It supports most well-known audio formats (wav, flac, ogg).

This operator produces two outputs:

• output[0]: batch of decoded data

• output[1]: batch of sampling rates [Hz]

Supported backends
• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• downmix (bool, optional, default = False) – If True, downmix all input channels to mono. If downmixing is turned on, the decoder will always produce 1-D output

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Type of the output data. Supports types: INT16, INT32, FLOAT

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• quality (float, optional, default = 50.0) – Resampling quality, 0 is lowest, 100 is highest. 0 corresponds to 3 lobes of the sinc filter; 50 gives 16 lobes and 100 gives 64 lobes.

• sample_rate (float, optional, default = 0.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments

sample_rate (TensorList of float, optional, default = 0.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.

class nvidia.dali.ops.BBoxPaste(**kwargs)

Transforms bounding boxes so that they are in the same place in the image after pasting it onto a larger canvas.

Corner coordinates:

(x', y') = (x/ratio + paste_x', y/ratio + paste_y')


Box sizes:

(w', h') = (w/ratio, h/ratio)


Where:

paste_x' = paste_x * (ratio - 1)/ratio
paste_y' = paste_y * (ratio - 1)/ratio


Paste coordinates are normalized so that (0,0) aligns the image to top-left of the canvas and (1,1) aligns it to bottom-right.
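The formulas above can be checked with a short plain-Python sketch. This is not the DALI implementation; `paste_bbox` is a hypothetical helper that simply applies the corner and size equations to one box in normalized coordinates.

```python
def paste_bbox(box, ratio, paste_x=0.5, paste_y=0.5, ltrb=False):
    """Apply the BBoxPaste formulas to a single box in normalized coordinates."""
    # paste_x' and paste_y' from the "Where:" block above.
    off_x = paste_x * (ratio - 1) / ratio
    off_y = paste_y * (ratio - 1) / ratio
    a, b, c, d = box
    # The first two values are always a corner, so they use the corner formula.
    x, y = a / ratio + off_x, b / ratio + off_y
    if ltrb:
        # Two-point representation: the second pair is also a corner.
        return (x, y, c / ratio + off_x, d / ratio + off_y)
    # Width-height representation: sizes are only scaled.
    return (x, y, c / ratio, d / ratio)


# A box covering the whole image, pasted centered on a canvas twice as large.
print(paste_bbox((0.0, 0.0, 1.0, 1.0), ratio=2.0))
```

With the default paste_x = paste_y = 0.5 the image lands in the middle of the canvas, so the box shrinks to half size and its corner moves to (0.25, 0.25).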

Supported backends
• ‘cpu’

Keyword Arguments
• ratio (float) – Ratio of canvas size to input size, must be > 1.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• ltrb (bool, optional, default = False) – True for two-point (ltrb) representation, False for width-height representation.

• paste_x (float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)

• paste_y (float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• ratio (TensorList of float) – Ratio of canvas size to input size, must be > 1.

• paste_x (TensorList of float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)

• paste_y (TensorList of float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)

class nvidia.dali.ops.BbFlip(**kwargs)

Operator for horizontal or vertical flip (mirror) of bounding boxes. Input: Bounding box coordinates; in either [x, y, w, h] or [left, top, right, bottom] format. All coordinates are in the image coordinate system (i.e. 0.0-1.0)
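As an illustration of what flipping means for the two box formats, here is a minimal plain-Python sketch. It assumes the usual mirror rule for normalized coordinates (a position p maps to 1 - p); `flip_bbox` is a hypothetical helper, not part of DALI.

```python
def flip_bbox(box, horizontal=True, vertical=False, ltrb=False):
    """Mirror one bounding box given in normalized (0.0-1.0) image coordinates."""
    if ltrb:
        left, top, right, bottom = box
        if horizontal:
            # The old right edge becomes the new left edge, and vice versa.
            left, right = 1.0 - right, 1.0 - left
        if vertical:
            top, bottom = 1.0 - bottom, 1.0 - top
        return (left, top, right, bottom)
    x, y, w, h = box
    if horizontal:
        # The anchor corner moves; width and height are unchanged.
        x = 1.0 - (x + w)
    if vertical:
        y = 1.0 - (y + h)
    return (x, y, w, h)


print(flip_bbox((0.0, 0.25, 0.25, 0.5)))             # [x, y, w, h] input
print(flip_bbox((0.0, 0.25, 0.25, 0.75), ltrb=True))  # [l, t, r, b] input
```

Applying the same flip twice returns the original box, which is a quick sanity check for either representation.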

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• horizontal (int, optional, default = 1) – Flip horizontal dimension.

• ltrb (bool, optional, default = False) – True for two-point (ltrb) representation, False for width-height representation.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• vertical (int, optional, default = 0) – Flip vertical dimension.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• horizontal (TensorList of int, optional, default = 1) – Flip horizontal dimension.

• vertical (TensorList of int, optional, default = 0) – Flip vertical dimension.

class nvidia.dali.ops.BoxEncoder(**kwargs)

Encodes input bounding boxes and labels using a set of default boxes (anchors) passed during op construction. Follows the algorithm described in https://arxiv.org/abs/1512.02325 and implemented in https://github.com/mlperf/training/tree/master/single_stage_detector/ssd. Inputs must be supplied as two Tensors: BBoxes containing bounding boxes represented as [l,t,r,b], and Labels containing the corresponding label for each bounding box. Results are two tensors: EncodedBBoxes containing M encoded bounding boxes as [l,t,r,b], where M is the number of anchors, and EncodedLabels containing the corresponding label for each encoded box.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• anchors (float or list of float) – Anchors to be used for encoding. List of floats in ltrb format.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• criteria (float, optional, default = 0.5) – Threshold IOU for matching bounding boxes with anchors. Value between 0 and 1.

• means (float or list of float, optional, default = [0.0, 0.0, 0.0, 0.0]) – [x y w h] means for offset normalization.

• offset (bool, optional, default = False) – Returns normalized offsets ((encoded_bboxes*scale - anchors*scale) - mean) / stds in EncodedBBoxes using std, mean and scale arguments (default values are transparent).

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• scale (float, optional, default = 1.0) – Rescale the box and anchors values before offset calculation (e.g. to get back to absolute values).

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• stds (float or list of float, optional, default = [1.0, 1.0, 1.0, 1.0]) – [x y w h] standard deviations for offset normalization.

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.BoxEncoder() for full documentation.
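The role of the criteria threshold can be sketched in plain Python. The snippet below is a simplified illustration, not the BoxEncoder algorithm itself: it only computes IoU for [l,t,r,b] boxes and assigns each anchor the best-overlapping box whose IoU exceeds the threshold. `iou_ltrb` and `match_anchors` are hypothetical helpers, and the full SSD matching has additional steps that are omitted here.

```python
def iou_ltrb(a, b):
    """Intersection over union of two boxes in [l, t, r, b] format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_anchors(boxes, anchors, criteria=0.5):
    """For each anchor, return the index of the matched box, or -1 if none."""
    matches = []
    for anchor in anchors:
        best_idx, best_iou = -1, criteria
        for i, box in enumerate(boxes):
            overlap = iou_ltrb(box, anchor)
            # Only overlaps strictly above the threshold count as a match.
            if overlap > best_iou:
                best_idx, best_iou = i, overlap
        matches.append(best_idx)
    return matches


anchors = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
print(match_anchors([(0.0, 0.0, 0.5, 0.5)], anchors))
```

Anchors left at -1 correspond to background in this sketch; a label would be copied from the matched box for the others.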

class nvidia.dali.ops.Brightness(**kwargs)

Changes the brightness of an image

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• brightness (float, optional, default = 1.0) –

Brightness change factor. Values >= 0 are accepted. For example:

• 0 - black image,

• 1 - no change

• 2 - increase brightness twice

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) – Output data type; if not set, the input type is used.

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments

brightness (TensorList of float, optional, default = 1.0) –

Brightness change factor. Values >= 0 are accepted. For example:

• 0 - black image,

• 1 - no change

• 2 - increase brightness twice

class nvidia.dali.ops.BrightnessContrast(**kwargs)

Adjust the brightness and contrast of the image according to the formula:

out = brightness_shift * output_range + brightness * (grey + contrast * (in - grey))


where output_range is 1 for float outputs or the maximum positive value for integral types; grey denotes the value of 0.5 for float, 128 for uint8, 16384 for int16, etc.

Additionally, this operator can change the type of data.
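To make the formula concrete, the sketch below evaluates it for a single uint8 value, using output_range = 255 and grey = 128 as stated above. The rounding and clamping to [0, 255] are assumptions made for this illustration; `brightness_contrast_u8` is a hypothetical helper, not a DALI function.

```python
def brightness_contrast_u8(pixel, brightness=1.0, brightness_shift=0.0,
                           contrast=1.0, grey=128.0):
    """Evaluate the BrightnessContrast formula for one uint8 pixel value."""
    output_range = 255.0  # maximum positive value for uint8
    out = brightness_shift * output_range + brightness * (
        grey + contrast * (pixel - grey)
    )
    # Assumption: round, then clamp to the representable uint8 range.
    return int(min(255, max(0, round(out))))


print(brightness_contrast_u8(200))                  # neutral parameters
print(brightness_contrast_u8(200, contrast=0.0))    # collapses to grey
print(brightness_contrast_u8(100, brightness=2.0))  # doubles the value
```

With all parameters at their neutral defaults the value passes through unchanged, and contrast = 0 maps every input to grey.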

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• brightness (float, optional, default = 1.0) – Brightness multiplier; 1.0 is neutral.

• brightness_shift (float, optional, default = 0.0) – Brightness shift; 0 is neutral; for signed types, 1.0 means maximum positive value that can be represented by the type.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• contrast (float, optional, default = 1.0) – Set the contrast multiplier; 1.0 is neutral, 0.0 produces uniform grey.

• contrast_center (float, optional, default = 0.5) – Sets the intensity level that is unaffected by contrast; this is the value which all pixels assume when contrast is zero. When not set, half of the input type’s positive range (or 0.5 for float) is used.

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type; if not set, the input type is used.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• brightness (TensorList of float, optional, default = 1.0) – Brightness multiplier; 1.0 is neutral.

• brightness_shift (TensorList of float, optional, default = 0.0) – Brightness shift; 0 is neutral; for signed types, 1.0 means maximum positive value that can be represented by the type.

• contrast (TensorList of float, optional, default = 1.0) – Set the contrast multiplier; 1.0 is neutral, 0.0 produces uniform grey.

class nvidia.dali.ops.COCOReader(**kwargs)

Read data from a COCO dataset composed of a directory with images and annotation files. For each image with m bboxes, returns its bboxes as an (m,4) Tensor (m * [x, y, w, h] or m * [left, top, right, bottom]) and labels as an (m,1) Tensor (m * category_id).

Supported backends
• ‘cpu’

Keyword Arguments
• file_root (str) – Path to a directory containing data files.

• annotations_file (str, optional, default = '') – List of paths to the JSON annotations files.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dont_use_mmap (bool, optional, default = False) – If set to true, the Loader will not attempt to map the file in memory and will use plain file I/O instead. Mapping provides a small performance benefit when accessing a local file system, but most network file systems, due to their nature, do not provide optimum performance

• dump_meta_files (bool, optional, default = False) – If true, operator will dump meta files in folder provided with dump_meta_files_path.

• dump_meta_files_path (str, optional, default = '') – Path to directory for saving meta files containing preprocessed COCO annotations.

• file_list (str, optional, default = '') – Path to the file with a list of pairs file id (leave empty to traverse the file_root directory to obtain files and labels)

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• ltrb (bool, optional, default = False) – If true, bboxes are returned as [left, top, right, bottom], else [x, y, width, height].

• masks (bool, optional, default = False) –

If true, segmentation masks are read and returned as polygons. Each mask can be one or more polygons. A polygon is a list of points (2 floats). For a given sample, the polygons are represented by two tensors:

• masks_meta -> list of tuples (mask_idx, start_idx, end_idx)

• masks_coords -> list of (x,y) coordinates

One mask can have one or more masks_meta entries with the same mask_idx, which means that the mask for that index consists of several polygons. start_idx indicates the index of the first coords in masks_coords. Currently skips objects with iscrowd=1 annotations (RLE masks, not suitable for instance segmentation).

• meta_files_path (str, optional, default = '') – Path to directory with meta files containing preprocessed COCO annotations.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be artificially added when the data set size is not equally divisible by the number of shards, and the shard is not equally divisible by the batch size. In the end, the shard size will be equalized between shards.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• ratio (bool, optional, default = False) – If true, bboxes are returned as ratios with respect to the image width and height.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• save_img_ids (bool, optional, default = False) – If true, image IDs will also be returned.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• shuffle_after_epoch (bool, optional, default = False) – If true, reader shuffles whole dataset after each epoch.

• size_threshold (float, optional, default = 0.1) – If width or height of a bounding box representing an instance of an object is under this value, object will be skipped during reading. It is represented as absolute value.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• skip_empty (bool, optional, default = False) – If true, reader will skip samples with no object instances in them

• stick_to_shard (bool, optional, default = False) – Whether the reader should stick to the given data shard instead of going through the whole dataset. When decoder caching is used, it significantly reduces the amount of data to be cached, but could affect accuracy in some cases

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.
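The interaction of num_shards, shard_id and pad_last_batch can be sketched in plain Python. The shard boundary formula below (integer division of size * shard_id by num_shards) is an assumption made for illustration, and the sketch covers only the padding of an incomplete last batch, not the equalization of shard sizes. `shard_range` and `make_batches` are hypothetical helpers.

```python
def shard_range(size, shard_id, num_shards):
    """Half-open [start, end) range of sample indices for one shard (assumed rule)."""
    start = size * shard_id // num_shards
    end = size * (shard_id + 1) // num_shards
    return start, end


def make_batches(samples, batch_size, pad_last_batch=False):
    """Split samples into batches, optionally repeating the last sample as padding."""
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    if pad_last_batch and batches and len(batches[-1]) < batch_size:
        missing = batch_size - len(batches[-1])
        batches[-1] = batches[-1] + [batches[-1][-1]] * missing
    return batches


dataset = list(range(10))
for shard_id in range(3):
    start, end = shard_range(len(dataset), shard_id, num_shards=3)
    print(shard_id, make_batches(dataset[start:end], batch_size=2, pad_last_batch=True))
```

Ten samples split three ways give shards of unequal length, so some shards end with a batch padded by a repeated sample.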

class nvidia.dali.ops.Caffe2Reader(**kwargs)

Read sample data from a Caffe2 Lightning Memory-Mapped Database (LMDB).

Supported backends
• ‘cpu’

Keyword Arguments
• path (str or list of str) – List of paths to Caffe2 LMDB directories.

• additional_inputs (int, optional, default = 0) – Additional auxiliary data tensors provided for each sample.

• bbox (bool, optional, default = False) – Denotes if bounding-box information is present.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dont_use_mmap (bool, optional, default = False) – If set to true, the Loader will not attempt to map the file in memory and will use plain file I/O instead. Mapping provides a small performance benefit when accessing a local file system, but most network file systems, due to their nature, do not provide optimum performance

• image_available (bool, optional, default = True) – If image is available at all in this LMDB.

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• label_type (int, optional, default = 0) –

Type of label stored in dataset.

• 0 = SINGLE_LABEL : single integer label for multi-class classification

• 1 = MULTI_LABEL_SPARSE : sparse active label indices for multi-label classification

• 2 = MULTI_LABEL_DENSE : dense label embedding vector for label embedding regression

• 3 = MULTI_LABEL_WEIGHTED_SPARSE : sparse active label indices with per-label weights for multi-label classification.

• 4 = NO_LABEL : no label is available.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_labels (int, optional, default = 1) – Number of classes in dataset. Required when sparse labels are used.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be artificially added when the data set size is not equally divisible by the number of shards, and the shard is not equally divisible by the batch size. In the end, the shard size will be equalized between shards.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• stick_to_shard (bool, optional, default = False) – Whether the reader should stick to the given data shard instead of going through the whole dataset. When decoder caching is used, it significantly reduces the amount of data to be cached, but could affect accuracy in some cases

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.
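The random_shuffle and initial_fill arguments describe a buffer that is filled by sequential reads and then sampled at random. A minimal plain-Python sketch of that idea follows; it is not the DALI Loader, and `buffered_shuffle` is a hypothetical name.

```python
import random


def buffered_shuffle(samples, initial_fill=1024, seed=0):
    """Yield samples in shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    source = iter(samples)
    buffer = []
    # Fill the buffer by reading sequentially.
    for sample in source:
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            break
    # For each further read, emit a random buffered sample and replace it.
    for sample in source:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = sample
    # Drain what is left in random order.
    while buffer:
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()


print(list(buffered_shuffle(range(10), initial_fill=4, seed=1)))
```

A small buffer keeps samples close to their original position, while a buffer as large as the dataset gives a full shuffle; with a buffer of one element the order is unchanged.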

class nvidia.dali.ops.CaffeReader(**kwargs)

Read (Image, label) pairs from a Caffe LMDB.

Supported backends
• ‘cpu’

Keyword Arguments
• path (str or list of str) – List of paths to Caffe LMDB directories.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dont_use_mmap (bool, optional, default = False) – If set to true, the Loader will not attempt to map the file in memory and will use plain file I/O instead. Mapping provides a small performance benefit when accessing a local file system, but most network file systems, due to their nature, do not provide optimum performance

• image_available (bool, optional, default = True) – If image is available at all in this LMDB.

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• label_available (bool, optional, default = True) – If label is available at all.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be artificially added when the data set size is not equally divisible by the number of shards, and the shard is not equally divisible by the batch size. In the end, the shard size will be equalized between shards.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• stick_to_shard (bool, optional, default = False) – Whether the reader should stick to the given data shard instead of going through the whole dataset. When decoder caching is used, it significantly reduces the amount of data to be cached, but could affect accuracy in some cases

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.

class nvidia.dali.ops.Cast(**kwargs)

Cast tensor to a different type.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• dtype (nvidia.dali.types.DALIDataType) – Output data type.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

class nvidia.dali.ops.CoinFlip(**kwargs)

Produces a tensor filled with 0s and 1s (the results of random coin flips), usable as an argument for select ops.

Supported backends
• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• probability (float, optional, default = 0.5) – Probability of returning 1.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.

class nvidia.dali.ops.ColorSpaceConversion(**kwargs)

Converts between various image color models.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• image_type (nvidia.dali.types.DALIImageType) – The color space of the input image

• output_type (nvidia.dali.types.DALIImageType) – The color space of the output image

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

class nvidia.dali.ops.ColorTwist(**kwargs)

Combination of hue, saturation, contrast and brightness.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• brightness (float, optional, default = 1.0) –

Brightness change factor. Values >= 0 are accepted. For example:

• 0 - black image,

• 1 - no change

• 2 - increase brightness twice

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• contrast (float, optional, default = 1.0) –

Contrast change factor. Values >= 0 are accepted. For example:

• 0 - gray image,

• 1 - no change

• 2 - increase contrast twice

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) – Output data type; if not set, the input type is used.

• hue (float, optional, default = 0.0) – Hue change, in degrees.

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• saturation (float, optional, default = 1.0) –

Saturation change factor. Values >= 0 are supported. For example:

• 0 - completely desaturated image

• 1 - no change to image’s saturation

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments
• brightness (TensorList of float, optional, default = 1.0) –

Brightness change factor. Values >= 0 are accepted. For example:

• 0 - black image,

• 1 - no change

• 2 - increase brightness twice

• contrast (TensorList of float, optional, default = 1.0) –

Contrast change factor. Values >= 0 are accepted. For example:

• 0 - gray image,

• 1 - no change

• 2 - increase contrast twice

• hue (TensorList of float, optional, default = 0.0) – Hue change, in degrees.

• saturation (TensorList of float, optional, default = 1.0) –

Saturation change factor. Values >= 0 are supported. For example:

• 0 - completely desaturated image

• 1 - no change to image’s saturation

class nvidia.dali.ops.Constant(**kwargs)

Produces a batch of constant tensors.

The floating point input data should be placed in fdata argument and integer data in idata. The data is a flat vector of values or a single scalar. The data is then reshaped according to the shape argument. If the data is scalar, it will be broadcast to fill the entire shape.

The operator only performs meaningful work at first invocation - subsequent calls will return a reference to the same memory.

The operator can be automatically instantiated in Python with a call to types.Constant(value, dtype, shape, layout). The value can be a scalar, a tuple, a list or a numpy array, in which case the shape and dtype (if not explicitly overridden), will be taken from that array.

64-bit integer and double precision arrays are not supported and will be silently downgraded to 32-bit.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Type of the output data. If not set, the output is float if fdata argument is used and int if idata is used.

• fdata (float or list of float, optional, default = []) – Contents of the constant produced (for floating point types). fdata and idata are mutually exclusive and one of them is required

• idata (int or list of int, optional, default = []) – Contents of the constant produced (for integer types). fdata and idata are mutually exclusive and one of them is required

• layout (layout str, optional, default = ‘’) – Layout info. If set and not empty, the layout must match the dimensionality of the output.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shape (int or list of int, optional, default = []) – The desired shape of the output. If not set, the data is assumed to be 1D

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.
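
As a rough illustration of the reshape and broadcast rules described for Constant, the following pure-Python sketch mimics them on nested lists. The helper make_constant is hypothetical and not part of DALI. It assumes row-major ordering and treats a scalar without a shape as a one-element 1D tensor; both are assumptions made for the example.

```python
from math import prod


def make_constant(data, shape=None):
    """Hypothetical helper modelling Constant's fdata/idata + shape rules."""
    is_scalar = not isinstance(data, (list, tuple))
    if shape is None:
        # Without a shape, the data is treated as 1D.
        shape = [1] if is_scalar else [len(data)]
    elif isinstance(shape, int):
        shape = [shape]
    total = prod(shape)
    # A scalar is broadcast to fill the entire shape.
    flat = [data] * total if is_scalar else list(data)
    if len(flat) != total:
        raise ValueError("data length does not match the requested shape")

    def build(offset, dims):
        # Recursively carve the flat vector into nested lists (row-major).
        if len(dims) == 1:
            return flat[offset:offset + dims[0]]
        step = prod(dims[1:])
        return [build(offset + i * step, dims[1:]) for i in range(dims[0])]

    return build(0, list(shape))


print(make_constant([1, 2, 3, 4, 5, 6], shape=[2, 3]))  # [[1, 2, 3], [4, 5, 6]]
print(make_constant(0.5, shape=[2, 2]))                 # [[0.5, 0.5], [0.5, 0.5]]
```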

class nvidia.dali.ops.Contrast(**kwargs)

Changes the color contrast of the image.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• contrast (float, optional, default = 1.0) –

Contrast change factor. Values >= 0 are accepted. For example:

• 0 - gray image,

• 1 - no change

• 2 - double the contrast

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) – Output data type; if not set, the input type is used.

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments

contrast (TensorList of float, optional, default = 1.0) –

Contrast change factor. Values >= 0 are accepted. For example:

• 0 - gray image,

• 1 - no change

• 2 - double the contrast

class nvidia.dali.ops.CoordFlip(**kwargs)

Transforms coordinates so that they are flipped (point reflected) with respect to a center point.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• center_x (float, optional, default = 0.5) – Flip center on horizontal dimension.

• center_y (float, optional, default = 0.5) – Flip center on vertical dimension.

• center_z (float, optional, default = 0.5) – Flip center on depthwise dimension.

• flip_x (int, optional, default = 1) – Flip horizontal (x) dimension.

• flip_y (int, optional, default = 0) – Flip vertical (y) dimension.

• flip_z (int, optional, default = 0) – Flip depthwise (z) dimension.

• layout (layout str, optional, default = ‘’) –

Determines the layout of the coordinates. Possible values are:

• x (horizontal position),

• y (vertical position),

• z (depthwise position),

Note: If left empty, "x", "xy" or "xyz" will be assumed, depending on the number of dimensions.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• center_x (TensorList of float, optional, default = 0.5) – Flip center on horizontal dimension.

• center_y (TensorList of float, optional, default = 0.5) – Flip center on vertical dimension.

• center_z (TensorList of float, optional, default = 0.5) – Flip center on depthwise dimension.

• flip_x (TensorList of int, optional, default = 1) – Flip horizontal (x) dimension.

• flip_y (TensorList of int, optional, default = 0) – Flip vertical (y) dimension.

• flip_z (TensorList of int, optional, default = 0) – Flip depthwise (z) dimension.
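
The point reflection performed by CoordFlip can be sketched in plain Python. The function coord_flip below is a hypothetical model rather than DALI code: it assumes each point is a tuple ordered as in the layout string and that a flipped coordinate v maps to 2 * center - v, which is the usual definition of reflection about a center.

```python
def coord_flip(points, layout="xy", flip=(1, 0, 0), center=(0.5, 0.5, 0.5)):
    """Hypothetical model of CoordFlip: reflect selected axes about a center.

    flip and center are given in (x, y, z) order, mirroring the
    flip_x/flip_y/flip_z and center_x/center_y/center_z arguments.
    """
    axis_index = {"x": 0, "y": 1, "z": 2}
    out = []
    for point in points:
        flipped = []
        for value, axis in zip(point, layout):
            k = axis_index[axis]
            # Point reflection about c maps v to 2*c - v.
            flipped.append(2 * center[k] - value if flip[k] else value)
        out.append(tuple(flipped))
    return out


# Defaults: flip x only, about 0.5 (normalized coordinates).
print(coord_flip([(0.25, 0.25), (1.0, 0.0)]))  # [(0.75, 0.25), (0.0, 0.0)]
```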

class nvidia.dali.ops.Copy(**kwargs)

Make a copy of the input tensor.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

class nvidia.dali.ops.Crop(**kwargs)

Crops the image to the given window dimensions at the given window position (upper left corner).

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop (float or list of float, optional, default = [0.0, 0.0]) – Shape of the cropped image, specified as a list of values (e.g. (crop_H, crop_W) for a 2D crop, (crop_D, crop_H, crop_W) for a volumetric crop). Providing the crop argument is incompatible with providing separate arguments crop_d, crop_h and crop_w.

• crop_d (float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type. By default same data type as the input will be used. Supported types: FLOAT, FLOAT16, and UINT8

• fill_values (float or list of float, optional, default = [0.0]) – Determines padding values, only relevant if out_of_bounds_policy is set to “pad”. If a scalar is provided, it will be used for all the channels. If multiple values are given, there should be as many values as channels (extent of dimension ‘C’ in the layout) in the output slice.

• out_of_bounds_policy (str, optional, default = 'error') –

Determines the policy when slicing out of bounds of the input. Supported values are:

• “error” (default): Attempting to slice outside of the bounds of the image will produce an error.

• “pad”: The input will be padded as needed with zeros or any other value specified with the fill_values argument.

• “trim_to_shape”: The slice window will be cut to the bounds of the input.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• crop_d (TensorList of float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (TensorList of float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (TensorList of float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (TensorList of float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (TensorList of float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (TensorList of float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).
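
The window placement formula used by crop_pos_x and crop_pos_y, crop_x = crop_x_norm * (W - crop_W), can be illustrated with a small pure-Python sketch. The helpers crop_window and crop are hypothetical and only model the 2D case on nested lists. Rounding the computed offset to the nearest integer is an assumption made here; the text above does not specify the rounding DALI applies.

```python
def crop_window(in_h, in_w, crop_h, crop_w, crop_pos_x=0.5, crop_pos_y=0.5):
    """Return (top, left, bottom, right) using the documented formula."""
    # crop_x = crop_x_norm * (W - crop_W), and likewise for y.
    left = int(round(crop_pos_x * (in_w - crop_w)))  # rounding is an assumption
    top = int(round(crop_pos_y * (in_h - crop_h)))
    return top, left, top + crop_h, left + crop_w


def crop(image, crop_h, crop_w, crop_pos_x=0.5, crop_pos_y=0.5):
    """Crop a 2D image stored as a list of rows (hypothetical helper)."""
    top, left, bottom, right = crop_window(
        len(image), len(image[0]), crop_h, crop_w, crop_pos_x, crop_pos_y)
    return [row[left:right] for row in image[top:bottom]]


# A 4x6 image whose pixel value encodes its position as row * 10 + column.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
print(crop(image, 2, 2))            # centered: [[12, 13], [22, 23]]
print(crop(image, 2, 2, 0.0, 0.0))  # upper left: [[0, 1], [10, 11]]
```

With the default position of 0.5 on both axes the window is centered, while 0.0 and 1.0 push it against the opposite edges of the image.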

class nvidia.dali.ops.CropMirrorNormalize(**kwargs)

Performs fused cropping, normalization, format conversion (NHWC to NCHW) if desired, and type casting. Normalization takes the input image and produces the output using the formula:

output = (input - mean) / std

Note that if no crop argument is provided, only mirroring and normalization are performed.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop (float or list of float, optional, default = [0.0, 0.0]) – Shape of the cropped image, specified as a list of values (e.g. (crop_H, crop_W) for a 2D crop, (crop_D, crop_H, crop_W) for a volumetric crop). Providing the crop argument is incompatible with providing separate arguments crop_d, crop_h and crop_w.

• crop_d (float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type. Supported types: FLOAT and FLOAT16

• fill_values (float or list of float, optional, default = [0.0]) – Determines padding values, only relevant if out_of_bounds_policy is set to “pad”. If a scalar is provided, it will be used for all the channels. If multiple values are given, there should be as many values as channels (extent of dimension ‘C’ in the layout) in the output slice.

• mean (float or list of float, optional, default = [0.0]) – Mean pixel values for image normalization.

• mirror (int, optional, default = 0) –

Mask for horizontal flip. Supported values are:

• 0 - do not perform horizontal flip for this image

• 1 - perform horizontal flip for this image

• out_of_bounds_policy (str, optional, default = 'error') –

Determines the policy when slicing out of bounds of the input. Supported values are:

• “error” (default): Attempting to slice outside of the bounds of the image will produce an error.

• “pad”: The input will be padded as needed with zeros or any other value specified with the fill_values argument.

• “trim_to_shape”: The slice window will be cut to the bounds of the input.

• output_layout (layout str, optional, default = ‘CHW’) – Output tensor data layout

• pad_output (bool, optional, default = False) – Whether to pad the output so that the number of channels is a power of 2.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• std (float or list of float, optional, default = [1.0]) – Standard deviation values for image normalization.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• crop_d (TensorList of float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (TensorList of float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (TensorList of float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (TensorList of float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (TensorList of float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (TensorList of float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• mirror (TensorList of int, optional, default = 0) –

Mask for horizontal flip. Supported values are:

• 0 - do not perform horizontal flip for this image

• 1 - perform horizontal flip for this image
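
To make the CropMirrorNormalize formula output = (input - mean) / std concrete, here is a minimal pure-Python sketch of the mirror, normalize and HWC to CHW steps, with cropping omitted. The function mirror_normalize is hypothetical and not DALI code. Reusing a single mean or std value for every channel is an assumption of this sketch.

```python
def mirror_normalize(image_hwc, mean, std, mirror=0):
    """Hypothetical model: optional horizontal flip, (x - mean) / std, HWC -> CHW."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    # Assumption: a single mean/std value is reused for every channel.
    mean = list(mean) * channels if len(mean) == 1 else list(mean)
    std = list(std) * channels if len(std) == 1 else list(std)
    out = []
    for c in range(channels):
        plane = []
        for y in range(height):
            row = [(image_hwc[y][x][c] - mean[c]) / std[c] for x in range(width)]
            # mirror=1 reverses each row, i.e. a horizontal flip.
            plane.append(row[::-1] if mirror else row)
        out.append(plane)
    return out  # indexed as [channel][y][x]


# One row, two pixels, three channels.
image = [[[10, 20, 30], [40, 50, 60]]]
print(mirror_normalize(image, mean=[10, 20, 30], std=[10, 10, 10], mirror=1))
# [[[3.0, 0.0]], [[3.0, 0.0]], [[3.0, 0.0]]]
```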

class nvidia.dali.ops.DLTensorPythonFunction(function, num_outputs=1, device='cpu', synchronize_stream=True, batch_processing=True, **kwargs)

Execute a python function that operates on DLPack tensors. The function should not modify input tensors.

In the case of the GPU operator, it is the user’s responsibility to synchronize the device code with DALI. This can be accomplished by synchronizing DALI’s work before the operator call with the synchronize_stream flag (True by default) and then making sure the scheduled device tasks are finished within the operator call. Alternatively, the GPU code can be run on DALI’s stream, which can be obtained by calling the current_dali_stream() function. In this case, the synchronize_stream flag can be set to False.

This operator allows sequence inputs.

This operator supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• function (object) – Function object.

• batch_processing (bool, optional, default = True) – Whether the function should get the whole batch as input.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• num_outputs (int, optional, default = 1) – Number of outputs

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• synchronize_stream (bool, optional, default = True) – Make DALI synchronize its CUDA stream before calling the python function. Should be set to false only if the called function schedules the device job to the stream used by DALI.

class nvidia.dali.ops.DumpImage(**kwargs)

Save images in batch to disk in PPM format. Useful for debugging.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• input_layout (layout str, optional, default = ‘HWC’) – Layout of input images.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• suffix (str, optional, default = '') – Suffix to be added to output file names.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

class nvidia.dali.ops.ElementExtract(**kwargs)

Extracts one or more elements from input.

This operator expects sequence inputs.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• element_map (int or list of int) – Indices of extracted elements

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

class nvidia.dali.ops.Erase(**kwargs)

Erases one or multiple regions from the image.

The region is specified by an anchor (starting point) and a shape (dimensions). Only the relevant dimensions are specified. Non-specified dimensions are treated as if the whole range of the axis was provided. To specify multiple regions, anchor and shape represent multiple points consecutively (e.g. anchor = (y0, x0, y1, x1, …) and shape = (h0, w0, h1, w1, …)). The arguments anchor and shape are interpreted according to the value of the argument axis_names (or alternatively the value of the argument axes). If no axis_names/axes arguments are provided, all the dimensions except ‘C’ (channels) must be specified.

Example 1:

anchor = (10, 20), shape = (190, 200), axis_names = “HW”, fill_value = 0

input: layout = “HWC”, shape = (300, 300, 3)

The erase region covers the range from 10 to 200 in the vertical dimension (height) and from 20 to 220 in the horizontal dimension (width). The range for the channel dimension goes from 0 to 3, as it was not specified. That is:

output[y, x, c] = 0               if 20 <= x < 220 and 10 <= y < 200
output[y, x, c] = input[y, x, c]  otherwise


Example 2:

anchor = (10, 250), shape = (20, 30), axis_names = “W”, fill_value = (118, 185, 0)

input: layout = “HWC”, shape = (300, 300, 3)

Two erase regions are provided, covering two vertical bands ranging from x=(10, 30) and x=(250, 280) respectively. Each pixel in the erased regions is filled with a multi-channel value (118, 185, 0). That is:

output[y, x, :] = (118, 185, 0)   if 10 <= x < 30 or 250 <= x < 280
output[y, x, :] = input[y, x, :]  otherwise


Example 3:

anchor = (0.15, 0.15), shape = (0.3, 0.3), axis_names = “HW”, fill_value = 100, normalized = True

input: layout = “HWC”, shape = (300, 300, 3)

One erase region with normalized coordinates in the height and width dimensions is provided. A single fill value is provided for all the channels. The coordinates can be converted to absolute coordinates by multiplying by the input shape. That is:

output[y, x, c] = 100             if 0.15 * 300 <= x < (0.3 + 0.15) * 300 and 0.15 * 300 <= y < (0.3 + 0.15) * 300
output[y, x, c] = input[y, x, c]  otherwise


Example 4:

anchor = (0.15, 0.15), shape = (20, 30), normalized_anchor = True, normalized_shape = False

input: layout = “HWC”, shape = (300, 300, 3)

One erase region with an anchor specified in normalized coordinates and a shape in absolute coordinates. Since no axis_names is provided, the anchor and shape must contain all dimensions except ‘C’ (channels):

output[y, x, c] = 0               if 0.15 * 300 <= x < (0.15 * 300) + 20 and (0.15 * 300) <= y < (0.15 * 300) + 30
output[y, x, c] = input[y, x, c]  otherwise


This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• anchor (float or list of float, optional, default = []) – Coordinates for anchor or starting point of the erase region. Only the coordinates of the relevant dimensions (specified by axis_names or axes) should be provided.

• axes (int or list of int, optional, default = [1, 0]) – Order of dimensions used for anchor and shape arguments, as dimension indexes. For instance, axes=(1, 0) means the coordinates in anchor and shape refer to axes 1 and 0, in that particular order

• axis_names (str, optional, default = 'HW') – Order of dimensions used for anchor and shape arguments, as described in the layout. For instance, axis_names=“HW” means that the coordinates in anchor and shape refer to dimensions H (height) and W (width), in that particular order. If provided, axis_names takes higher priority than axes.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• centered_anchor (bool, optional, default = False) – If True, the anchors refer to the center of the region instead of the top-left corner, resulting in centered erased regions at the specified anchor.

• fill_value (float or list of float, optional, default = [0.0]) – Value to fill the erased region. Might be specified as a single value (e.g. 0) or a multi-channel value (e.g. (200, 210, 220)). If a multi-channel fill value is provided, the input layout should contain a channel dimension ‘C’

• normalized (bool, optional, default = False) – Whether or not the anchor and shape arguments should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates. It is mutually exclusive with providing a value for normalized_shape and normalized_anchor separately.

• normalized_anchor (bool, optional, default = False) – Whether or not the anchor argument should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates. It is mutually exclusive with providing a value for normalized

• normalized_shape (bool, optional, default = False) – Whether or not the shape argument should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates. It is mutually exclusive with providing a value for normalized.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shape (float or list of float, optional, default = []) – Values for shape or dimensions of the erase region. Only the coordinates of the relevant dimensions (specified by axis_names or axes) should be provided.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• anchor (TensorList of float, optional, default = []) – Coordinates for anchor or starting point of the erase region. Only the coordinates of the relevant dimensions (specified by axis_names or axes) should be provided.

• shape (TensorList of float, optional, default = []) – Values for shape or dimensions of the erase region. Only the coordinates of the relevant dimensions (specified by axis_names or axes) should be provided.
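
The region semantics from the Erase examples (flat anchor and shape lists split into one region per group of axes, with unspecified axes spanning their whole range) can be sketched in pure Python. The function erase below is a hypothetical model for HWC images stored as nested lists, using absolute coordinates only. It is not DALI's implementation.

```python
def erase(image_hwc, anchor, shape, axis_names="HW", fill_value=0):
    """Hypothetical model of Erase for an HWC image stored as nested lists."""
    height, width = len(image_hwc), len(image_hwc[0])
    n = len(axis_names)
    # Split the flat anchor/shape lists into one region per group of n values.
    regions = []
    for i in range(0, len(anchor), n):
        # Axes that are not named span their whole range.
        bounds = {"H": (0, height), "W": (0, width)}
        for name, start, extent in zip(axis_names, anchor[i:i + n], shape[i:i + n]):
            bounds[name] = (start, start + extent)
        regions.append(bounds)

    def erased(y, x):
        return any(r["H"][0] <= y < r["H"][1] and r["W"][0] <= x < r["W"][1]
                   for r in regions)

    out = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = list(image_hwc[y][x])  # copy, the input is left untouched
            if erased(y, x):
                # A scalar fill is repeated for every channel.
                if isinstance(fill_value, (list, tuple)):
                    pixel = list(fill_value)
                else:
                    pixel = [fill_value] * len(pixel)
            row.append(pixel)
        out.append(row)
    return out


# Two vertical bands, in the spirit of Example 2: x in [1, 3) and x in [4, 5).
image = [[[1, 1, 1] for _ in range(6)] for _ in range(4)]
result = erase(image, anchor=(1, 4), shape=(2, 1), axis_names="W",
               fill_value=(118, 185, 0))
print(result[0])
```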

class nvidia.dali.ops.ExternalSource(source=None, num_outputs=None, *, cycle=None, layout=None, name=None, device='cpu', cuda_stream=None, **kwargs)

ExternalSource is a special operator which can provide data to DALI pipeline from Python using several methods.

The simplest and preferred way is to specify a source, which may be a callable or iterable.

Note

To return a batch of copies of the same tensor, use nvidia.dali.types.Constant(), which is more performant.

Parameters
• source (callable or iterable) – The source of the data. The source is polled for data (via a call to source() or next(source)) whenever the pipeline needs input for the next iteration. The source can supply one or more data batches, depending on the value of num_outputs. If num_outputs is not set, the source is expected to return a single batch. If it’s specified, the data is expected to be a tuple or list where each element corresponds to the respective return value of the external_source. If the source is a callable and has a positional argument, it is assumed to be the current iteration number and consecutive calls will be source(0), source(1), etc. If the source is a generator function, it is invoked and treated as an iterable - however, unlike a generator, it can be used with cycle, in which case the function will be called again when the generator reaches end of iteration. In the case of GPU input, it is the user’s responsibility to modify the provided GPU memory content only using the provided stream (DALI schedules a copy on it and all work is properly queued). If no stream is provided, DALI will use a default, with a best-effort approach at correctness (see cuda_stream argument documentation for details).

• num_outputs (int, optional) – If specified, denotes the number of TensorLists produced by the source function

Keyword Arguments
• cycle (bool) – If True, the source will be wrapped. Otherwise, StopIteration will be raised when end of data is reached. This flag requires that source is either a collection, i.e. an iterable object where iter(source) will return a fresh iterator on each call or a generator function. In the latter case, the generator function will be called again when more data is requested than was yielded by the function.

• name (str, optional) – The name of the data node - used when feeding the data in iter_setup; can be omitted if the data is provided by source.

• layout (layout str or list/tuple thereof) – If provided, sets the layout of the data. When num_outputs > 1, layout can be a list containing a distinct layout for each output. If the list has fewer elements than num_outputs, only the first outputs have the layout set, the rest have it cleared.

• cuda_stream (optional, cudaStream_t or an object convertible to cudaStream_t, e.g. cupy.cuda.Stream, torch.cuda.Stream) –

The CUDA stream, which is going to be used for copying data to GPU or from a GPU source. If not set, best effort will be taken to maintain correctness - i.e. if the data is provided as a tensor/array from a recognized library (CuPy, PyTorch), the library’s current stream is used. This should work in typical scenarios, but advanced use cases (and code using unsupported libraries) may still need to supply the stream handle explicitly.

Special values:
• 0 - use default CUDA stream

• -1 - use DALI’s internal stream

If internal stream is used, the call to feed_input will block until the copy to internal buffer is complete, since there’s no way to synchronize with this stream to prevent overwriting the array with new data in another stream.

__call__(*, source=None, cycle=None, name=None, layout=None, cuda_stream=None, **kwargs)
Parameters
• source (callable or iterable) – The source of the data. The source is polled for data (via a call to source() or next(source)) whenever the pipeline needs input for the next iteration. The source can supply one or more data batches, depending on the value of num_outputs. If num_outputs is not set, the source is expected to return a single batch. If it’s specified, the data is expected to be a tuple or list where each element corresponds to the respective return value of the external_source. If the source is a callable and has a positional argument, it is assumed to be the current iteration number and consecutive calls will be source(0), source(1), etc. If the source is a generator function, it is invoked and treated as an iterable - however, unlike a generator, it can be used with cycle, in which case the function will be called again when the generator reaches end of iteration. In the case of GPU input, it is the user’s responsibility to modify the provided GPU memory content only using the provided stream (DALI schedules a copy on it and all work is properly queued). If no stream is provided, DALI will use a default, with a best-effort approach at correctness (see cuda_stream argument documentation for details).

• num_outputs (int, optional) – If specified, denotes the number of TensorLists produced by the source function

Keyword Arguments
• cycle (bool) – If True, the source will be wrapped. Otherwise, StopIteration will be raised when the end of data is reached. This flag requires that source is either a collection (i.e. an iterable object where iter(source) returns a fresh iterator on each call) or a generator function. In the latter case, the generator function will be called again when more data is requested than was yielded by the function.

• name (str, optional) – The name of the data node - used when feeding the data in iter_setup; can be omitted if the data is provided by source.

• layout (layout str or list/tuple thereof) – If provided, sets the layout of the data. When num_outputs > 1, layout can be a list containing a distinct layout for each output. If the list has fewer elements than num_outputs, only the first outputs have the layout set; the rest have it cleared.

• cuda_stream (optional, cudaStream_t or an object convertible to cudaStream_t, e.g. cupy.cuda.Stream, torch.cuda.Stream) –

The CUDA stream, which is going to be used for copying data to GPU or from a GPU source. If not set, best effort will be taken to maintain correctness - i.e. if the data is provided as a tensor/array from a recognized library (CuPy, PyTorch), the library’s current stream is used. This should work in typical scenarios, but advanced use cases (and code using unsupported libraries) may still need to supply the stream handle explicitly.

Special values:
• 0 - use default CUDA stream

• -1 - use DALI’s internal stream

If internal stream is used, the call to feed_input will block until the copy to internal buffer is complete, since there’s no way to synchronize with this stream to prevent overwriting the array with new data in another stream.
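
The polling rules for source and cycle described above can be mimicked in plain Python. The following is a minimal sketch, not DALI code: poll_source is a hypothetical helper that only imitates the documented behaviour for a generator function, a collection, and a callable taking the iteration number (a callable without arguments is not covered).

```python
import inspect


def poll_source(source, num_iterations, cycle=False):
    """Illustrative stand-in for how a pipeline might poll `source`.

    This is NOT DALI code; it only mimics the rules described in the text.
    """
    if inspect.isgeneratorfunction(source):
        make_iter = source                    # calling it again restarts the data
    elif callable(source):
        # Callable with one positional argument: pass the iteration number.
        return [source(i) for i in range(num_iterations)]
    else:
        make_iter = lambda: iter(source)      # collection: fresh iterator each time

    batches = []
    it = make_iter()
    for _ in range(num_iterations):
        try:
            batches.append(next(it))
        except StopIteration:
            if not cycle:
                raise                         # end of data without cycle
            it = make_iter()                  # cycle=True: wrap around
            batches.append(next(it))
    return batches


def two_batches():
    yield [1, 2]
    yield [3, 4]


print(poll_source(two_batches, 3, cycle=True))  # [[1, 2], [3, 4], [1, 2]]
print(poll_source(lambda i: [i, i], 2))         # [[0, 0], [1, 1]]
```

Without cycle=True, the third request in the first call would raise StopIteration instead of restarting the generator function.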

class nvidia.dali.ops.FastResizeCropMirror(**kwargs)

Perform a fused resize, crop, mirror operation. Handles both fixed and random resizing and cropping. Backprojects the desired crop through the resize operation to reduce the amount of work performed.

Supported backends
• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop (float or list of float, optional, default = [0.0, 0.0]) – Shape of the cropped image, specified as a list of values (e.g. (crop_H, crop_W) for 2D crop, (crop_D, crop_H, crop_W) for volumetric crop). Providing the crop argument is incompatible with providing the separate arguments crop_d, crop_h and crop_w.

• crop_d (float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type. By default same data type as the input will be used. Supported types: FLOAT, FLOAT16, and UINT8

• fill_values (float or list of float, optional, default = [0.0]) – Determines padding values, only relevant if out_of_bounds_policy is set to “pad”. If a scalar is provided, it will be used for all the channels. If multiple values are given, there should be as many values as channels (extent of dimension ‘C’ in the layout) in the output slice.

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image.

• interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.

• max_size (float or list of float, optional, default = [0.0, 0.0]) –

Maximum size of the longer dimension when resizing with resize_shorter. When set with resize_shorter, the shortest dimension will be resized to resize_shorter if and only if the resulting longest dimension is smaller than or equal to max_size. If not, the shortest dimension is resized to satisfy the constraint longest_dim == max_size. It can also be an array of size 2, where the two elements are the maximum size per dimension (H, W).

Example:

Original image = 400x1200.

Resized with:

• resize_shorter = 200 (max_size not set) => 200x600

• resize_shorter = 200, max_size =  400 => 132x400

• resize_shorter = 200, max_size = 1000 => 200x600

• mirror (int, optional, default = 0) –

• 0 - do not perform horizontal flip for this image

• 1 - perform horizontal flip for this image.

• out_of_bounds_policy (str, optional, default = 'error') –

Determines the policy when slicing out of bounds of the input. Supported values are:

• “error” (default): Attempting to slice outside of the bounds of the image will produce an error.

• “pad”: The input will be padded as needed with zeros or any other value specified with the fill_values argument.

• “trim_to_shape”: The slice window will be cut to the bounds of the input.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• resize_longer (float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
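
The interaction between resize_shorter and max_size shown in the example above can be written out as a few lines of arithmetic. This is a sketch under stated assumptions rather than the operator's implementation: resized_shape is a hypothetical helper, max_size is treated as a single scalar, and results are left unrounded because the rounding rule is not stated here (the second case evaluates to roughly 133.3 x 400 before any rounding).

```python
def resized_shape(height, width, resize_shorter, max_size=None):
    """Return the (height, width) after an aspect-preserving resize, unrounded."""
    # First try to bring the shorter side to resize_shorter.
    scale = resize_shorter / min(height, width)
    if max_size is not None and max(height, width) * scale > max_size:
        # The longer side would exceed max_size, so that side drives the scale.
        scale = max_size / max(height, width)
    return height * scale, width * scale


print(resized_shape(400, 1200, 200))                 # (200.0, 600.0)
print(resized_shape(400, 1200, 200, max_size=1000))  # (200.0, 600.0)
print(resized_shape(400, 1200, 200, max_size=400))   # roughly (133.3, 400.0)
```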

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments
• crop_d (TensorList of float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (TensorList of float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (TensorList of float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (TensorList of float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (TensorList of float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (TensorList of float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• mirror (TensorList of int, optional, default = 0) –

• 0 - do not perform horizontal flip for this image

• 1 - perform horizontal flip for this image.

• resize_longer (TensorList of float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (TensorList of float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (TensorList of float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (TensorList of float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.

class nvidia.dali.ops.FileReader(**kwargs)

Read (Image, label) pairs from a directory

Supported backends
• ‘cpu’

Keyword Arguments
• file_root (str) – Path to a directory containing data files. FileReader supports a flat directory structure: the file_root directory should contain subdirectories with images in them. To obtain labels, FileReader sorts the directories in file_root in alphabetical order and uses the index of each directory in this order as its class label.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dont_use_mmap (bool, optional, default = False) – If set to true, the Loader will not attempt to map the file in memory and will use plain file I/O instead. Mapping provides a small performance benefit when accessing a local file system, but most network file systems, due to their nature, do not perform optimally with it.

• file_list (str, optional, default = '') – Path to a text file containing rows of filename label pairs, where the filenames are relative to file_root. If left empty, file_root is traversed for subdirectories (only those at one level deep from file_root) containing files associated with the same label. When traversing subdirectories, labels are assigned consecutive numbers.

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be artificially added when the data set size is not equally divisible by the number of shards, and the shard is not equally divisible by the batch size. In the end, the shard size will be equalized between shards.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• shuffle_after_epoch (bool, optional, default = False) – If true, reader shuffles whole dataset after each epoch. It is exclusive with stick_to_shard and random_shuffle.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• stick_to_shard (bool, optional, default = False) – Whether the reader should stick to the given data shard instead of going through the whole dataset. When decoder caching is used, this significantly reduces the amount of data to be cached, but could affect accuracy in some cases.

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.
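
The label assignment described for file_root (subdirectories sorted alphabetically, with the index used as the label) can be illustrated with the standard library alone. This is a hedged sketch, not the reader's code: list_labeled_files is a hypothetical helper, and the directory and file names are made up for the demonstration.

```python
import os
import tempfile


def list_labeled_files(file_root):
    """Return (relative_path, label) pairs, one label per sorted subdirectory."""
    class_dirs = sorted(
        d for d in os.listdir(file_root)
        if os.path.isdir(os.path.join(file_root, d))
    )
    pairs = []
    for label, class_dir in enumerate(class_dirs):
        for name in sorted(os.listdir(os.path.join(file_root, class_dir))):
            pairs.append((os.path.join(class_dir, name), label))
    return pairs


# Build a throwaway directory tree to show the resulting labels.
with tempfile.TemporaryDirectory() as root:
    for class_dir, name in [("dog", "a.jpg"), ("cat", "b.jpg"), ("cat", "c.jpg")]:
        os.makedirs(os.path.join(root, class_dir), exist_ok=True)
        open(os.path.join(root, class_dir, name), "w").close()
    example_pairs = list_labeled_files(root)

for path, label in example_pairs:
    print(path, label)  # one "filename label" row per file, as in file_list
```

Here the files under cat receive label 0 and the file under dog receives label 1, because cat sorts before dog.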

class nvidia.dali.ops.Flip(**kwargs)

Flip selected dimensions (horizontal, vertical, depthwise).

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• depthwise (int, optional, default = 0) – Flip depthwise dimension.

• horizontal (int, optional, default = 1) – Flip horizontal dimension.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• vertical (int, optional, default = 0) – Flip vertical dimension.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('FDHWC', 'FHWC', 'DHWC', 'HWC', 'FCDHW', 'FCHW', 'CDHW', 'CHW')) – Input to the operator.

Keyword Arguments
• depthwise (TensorList of int, optional, default = 0) – Flip depthwise dimension.

• horizontal (TensorList of int, optional, default = 1) – Flip horizontal dimension.

• vertical (TensorList of int, optional, default = 0) – Flip vertical dimension.
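
As an illustration of what the horizontal and vertical flags mean for an HWC image, the sketch below flips a small image stored as nested Python lists. flip_hwc is a hypothetical helper written for this example; it does not use DALI and ignores the depthwise dimension.

```python
def flip_hwc(image, horizontal=1, vertical=0):
    """Flip an image stored as nested lists indexed [row][column][channel]."""
    # A vertical flip reverses the order of the rows (H axis).
    rows = image[::-1] if vertical else image
    # A horizontal flip reverses the pixels within each row (W axis).
    if horizontal:
        rows = [row[::-1] for row in rows]
    return [list(row) for row in rows]


image = [[[1], [2], [3]],
         [[4], [5], [6]]]  # H=2, W=3, C=1

print(flip_hwc(image))                            # [[[3], [2], [1]], [[6], [5], [4]]]
print(flip_hwc(image, horizontal=0, vertical=1))  # [[[4], [5], [6]], [[1], [2], [3]]]
```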

class nvidia.dali.ops.GaussianBlur(**kwargs)

Apply Gaussian Blur to the input.

User can specify sigma, kernel window size or both. If only the sigma is provided, the radius of kernel is calculated as ceil(3 * sigma), thus the kernel window size is 2 * ceil(3 * sigma) + 1.

If only the kernel window size is provided, the sigma is calculated using the following formula:

radius = (window_size - 1) / 2
sigma = (radius - 1) * 0.3 + 0.8


Both sigma and kernel window size can be specified as single value for all data axes or per data axis.

When specifying the sigma or window size per axis, the values are given in the same order as the layout: from outermost to innermost. The channel C and frame F dimensions are not considered data axes. If channels are present, only channel-first or channel-last inputs are supported.

For example, with HWC input, user can provide sigma=1.0 or sigma=(1.0, 2.0) as there are two data axes H and W.

The same input can be provided as per-sample tensors.
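
The two formulas above translate directly into Python. The helper names below are hypothetical and exist only for this sketch; they apply the formulas to a single axis.

```python
import math


def window_size_from_sigma(sigma):
    """Kernel window size when only sigma is given: 2 * ceil(3 * sigma) + 1."""
    radius = math.ceil(3 * sigma)
    return 2 * radius + 1


def sigma_from_window_size(window_size):
    """Sigma when only the window size is given."""
    radius = (window_size - 1) / 2
    return (radius - 1) * 0.3 + 0.8


print(window_size_from_sigma(1.0))  # 7
print(window_size_from_sigma(2.0))  # 13
print(sigma_from_window_size(7))    # approximately 1.4
```

For an HWC input with sigma=(1.0, 2.0), applying the first formula per axis gives window sizes of 7 along H and 13 along W.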

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type; if not set, the input type is used. Supported type: FLOAT.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• sigma (float or list of float, optional, default = [0.0]) – Sigma value for Gaussian Kernel.

• window_size (int or list of int, optional, default = [0]) – The diameter of kernel.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• sigma (TensorList of float, optional, default = [0.0]) – Sigma value for Gaussian Kernel.

• window_size (TensorList of int, optional, default = [0]) – The diameter of kernel.

class nvidia.dali.ops.Hsv(**kwargs)

This operator performs HSV manipulation. To change the hue, saturation and/or value of the image, pass the corresponding coefficients. Keep in mind that the hue argument is an additive delta, while the saturation and value arguments are multiplicative.

This operator accepts RGB color space as an input.

For performance reasons, the operation is approximated by a linear transform in RGB space. The color vector is projected along the neutral (gray) axis, rotated (according to the hue delta) and scaled according to the value and saturation multipliers, and then restored to the original color space.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) – Output data type; if not set, the input type is used.

• hue (float, optional, default = 0.0) – Set additive change of hue. 0 denotes no-op

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• saturation (float, optional, default = 1.0) – Set multiplicative change of saturation. 1 denotes no-op

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• value (float, optional, default = 1.0) – Set multiplicative change of value. 1 denotes no-op

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments
• hue (TensorList of float, optional, default = 0.0) – Set additive change of hue. 0 denotes no-op

• saturation (TensorList of float, optional, default = 1.0) – Set multiplicative change of saturation. 1 denotes no-op

• value (TensorList of float, optional, default = 1.0) – Set multiplicative change of value. 1 denotes no-op
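
The linear approximation described above (split off the gray component, rotate the remaining chroma around the gray axis, then scale) can be sketched for a single pixel as follows. This is only an illustration under assumptions that are not taken from this page: hsv_linear is a hypothetical helper, the gray axis is taken as the equal-weight direction (1, 1, 1), and hue is interpreted in degrees. The operator's actual matrices may differ.

```python
import math


def hsv_linear(rgb, hue=0.0, saturation=1.0, value=1.0):
    """Approximate an HSV adjustment of one RGB pixel with a linear transform."""
    r, g, b = rgb
    gray = (r + g + b) / 3.0                 # component along the gray axis
    p = (r - gray, g - gray, b - gray)       # chroma, perpendicular to the gray axis

    # Cross product of the unit gray axis (1, 1, 1) / sqrt(3) with p.
    s = 1.0 / math.sqrt(3.0)
    cross = (s * (p[2] - p[1]), s * (p[0] - p[2]), s * (p[1] - p[0]))

    # Rotate the chroma around the gray axis by the hue delta (assumed degrees).
    theta = math.radians(hue)
    c, sn = math.cos(theta), math.sin(theta)
    rotated = tuple(c * pi + sn * ci for pi, ci in zip(p, cross))

    # Saturation scales the chroma only; value scales the whole color.
    return tuple(value * (gray + saturation * ri) for ri in rotated)


print(hsv_linear((0.2, 0.5, 0.9)))                  # defaults leave the pixel essentially unchanged
print(hsv_linear((1.0, 0.0, 0.0), hue=120.0))       # close to (0.0, 1.0, 0.0)
print(hsv_linear((0.2, 0.5, 0.9), saturation=0.0))  # all three channels equal (gray)
```

With these conventions, a hue delta of 120 degrees maps pure red to pure green, which matches the usual hue wheel.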

class nvidia.dali.ops.Hue(**kwargs)

Changes the hue level of the image.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) – Output data type; if not set, the input type is used.

• hue (float, optional, default = 0.0) – Hue change, in degrees.

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments

hue (TensorList of float, optional, default = 0.0) – Hue change, in degrees.

class nvidia.dali.ops.ImageDecoder(**kwargs)

Decode images

For JPEG images, the implementation will use the nvJPEG library or libjpeg-turbo, depending on the selected backend (mixed and cpu, respectively). Other image formats are decoded with OpenCV or other specific libraries (e.g. libtiff).

If used with mixed device, the operator will use a dedicated hardware decoder if available.

The output of the decoder is in HWC layout.

Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM.

Supported backends
• ‘cpu’

• ‘mixed’

Keyword Arguments
• affine (bool, optional, default = True) – Mixed backend only. Whether internal threads should be affined to CPU cores.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• cache_batch_copy (bool, optional, default = True) – Mixed backend only. If true, multiple images from the cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless the order in the batch is the same as in the cache.

• cache_debug (bool, optional, default = False) – Mixed backend only. Print debug information about the decoder cache.

• cache_size (int, optional, default = 0) – Mixed backend only. Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

• cache_threshold (int, optional, default = 0) – Mixed backend only. Size threshold (in bytes) for images (after decoding) to be cached.

• cache_type (str, optional, default = '') – Mixed backend only. Selects the cache type. threshold: caches every image with a size bigger than cache_threshold until the cache is full; the warm-up time for the threshold policy is 1 epoch. largest: stores the largest images that can fit in the cache; the warm-up time for the largest policy is 2 epochs. To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

• device_memory_padding (int, optional, default = 16777216) – Mixed backend only. Padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and the internal buffer needs to be reallocated to decode it.

• host_memory_padding (int, optional, default = 8388608) – Mixed backend only. Padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and the internal buffer needs to be reallocated to decode it.

• hw_decoder_load (float, optional, default = 0.65) – Mixed backend only. Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload will depend on the number of threads given to the DALI pipeline and should be found empirically.

• hybrid_huffman_threshold (int, optional, default = 1000000) – Mixed backend only. Images with a number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below it will use the nvJPEG full host Huffman decoder. N.B.: The hybrid Huffman decoder still uses mostly the CPU.

• output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• split_stages (bool, optional, default = False) – Mixed backend only. Split into separate CPU stage and GPU stage operators.

• use_chunk_allocator (bool, optional, default = False) – Experimental, mixed backend only. Use a chunk pinned memory allocator, which allocates a chunk of size batch_size*prefetch_queue_depth during construction and suballocates from it at runtime. Ignored when split_stages is false.

• use_fast_idct (bool, optional, default = False) – Enables fast IDCT in the CPU-based decompressor when the GPU implementation cannot handle a given image. According to the libjpeg-turbo documentation, decompression performance is improved by 4-14% with very little loss in quality.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
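
The hybrid_huffman_threshold rule described in the keyword arguments is a simple comparison on the pixel count. The sketch below only restates that rule; pick_huffman_decoder is a hypothetical helper, and the returned strings are labels made up for this example, not DALI or nvJPEG identifiers.

```python
def pick_huffman_decoder(height, width, hybrid_huffman_threshold=1000000):
    """Return which nvJPEG Huffman path the documented rule would select."""
    # Images with more pixels than the threshold use the hybrid decoder.
    if height * width > hybrid_huffman_threshold:
        return "hybrid"
    return "host"


print(pick_huffman_decoder(1080, 1920))  # hybrid (2,073,600 pixels)
print(pick_huffman_decoder(480, 640))    # host (307,200 pixels)
```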

class nvidia.dali.ops.ImageDecoderCrop(**kwargs)

Decode images and extract a fixed region-of-interest (ROI) specified by constant window dimensions and a variable anchor.

When possible, it will make use of region-of-interest decoding APIs (e.g. libjpeg-turbo, nvJPEG) thus optimizing decoding time and memory usage. When not supported, it will decode the whole image and then crop the selected ROI.

Note: ROI decoding is currently not compatible with hardware based decoding. Using ImageDecoderCrop will automatically disable hardware accelerated decoding. To make use of the hardware decoder, use ImageDecoder and Crop operators instead.

The output of the decoder is in HWC layout.

Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM.

Supported backends
• ‘cpu’

• ‘mixed’

Keyword Arguments
• affine (bool, optional, default = True) – Mixed backend only. Whether internal threads should be affined to CPU cores.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop (float or list of float, optional, default = [0.0, 0.0]) – Shape of the cropped image, specified as a list of values (e.g. (crop_H, crop_W) for 2D crop, (crop_D, crop_H, crop_W) for volumetric crop). Providing the crop argument is incompatible with providing the separate arguments crop_d, crop_h and crop_w.

• crop_d (float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• device_memory_padding (int, optional, default = 16777216) – Mixed backend only. Padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and the internal buffer needs to be reallocated to decode it.

• host_memory_padding (int, optional, default = 8388608) – Mixed backend only. Padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and the internal buffer needs to be reallocated to decode it.

• hybrid_huffman_threshold (int, optional, default = 1000000) – Mixed backend only. Images with a number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below it will use the nvJPEG full host Huffman decoder. N.B.: The hybrid Huffman decoder still uses mostly the CPU.

• output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• split_stages (bool, optional, default = False) – Mixed backend only. Split into separate CPU stage and GPU stage operators.

• use_chunk_allocator (bool, optional, default = False) – Experimental, mixed backend only. Use a chunk pinned memory allocator, which allocates a chunk of size batch_size*prefetch_queue_depth during construction and suballocates from it at runtime. Ignored when split_stages is false.

• use_fast_idct (bool, optional, default = False) – Enables fast IDCT in the CPU-based decompressor when the GPU implementation cannot handle a given image. According to the libjpeg-turbo documentation, decompression performance is improved by 4-14% with very little loss in quality.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• crop_d (TensorList of float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (TensorList of float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (TensorList of float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (TensorList of float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (TensorList of float, optional, default = 0.5) – Volumetric inputs only Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (TensorList of float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).
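The position formulas for crop_pos_x, crop_pos_y and crop_pos_z above share the same form, so they can be sketched with one helper. This is an illustrative plain-Python sketch, not DALI code; the function name crop_anchor and its range checks are assumptions made for the example.

```python
def crop_anchor(crop_pos_norm, image_extent, crop_extent):
    """Absolute start of the cropping window along one axis.

    Implements crop_x = crop_x_norm * (W - crop_W) from the text above.
    The function name and the validation are illustrative assumptions.
    """
    if not 0.0 <= crop_pos_norm <= 1.0:
        raise ValueError("normalized position must be in [0.0, 1.0]")
    if crop_extent > image_extent:
        raise ValueError("crop window is larger than the image")
    return crop_pos_norm * (image_extent - crop_extent)


# 640x480 image, 224x224 crop window, default position (0.5, 0.5): centered crop
crop_x = crop_anchor(0.5, 640, 224)  # 0.5 * (640 - 224) = 208.0
crop_y = crop_anchor(0.5, 480, 224)  # 0.5 * (480 - 224) = 128.0
```

A normalized position of 0.0 places the window at the start of the axis, and 1.0 places it flush with the end.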

class nvidia.dali.ops.ImageDecoderRandomCrop(**kwargs)

Decode images and extract a random region-of-interest (ROI) with window dimensions generated from within a range of valid aspect_ratio and area values.

When possible, it will make use of region-of-interest decoding APIs (e.g. libjpeg-turbo, nvJPEG) thus optimizing decoding time and memory usage. When not supported, it will decode the whole image and then crop the selected ROI.

Note: ROI decoding is currently not compatible with hardware based decoding. Using ImageDecoderRandomCrop will automatically disable hardware accelerated decoding. To make use of the hardware decoder, use ImageDecoder and RandomResizedCrop operators instead.

The output of the decoder is in HWC layout.

Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM.

Supported backends
• ‘cpu’

• ‘mixed’

Keyword Arguments
• affine (bool, optional, default = True) – mixed backend only If internal threads should be affined to CPU cores

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• device_memory_padding (int, optional, default = 16777216) – mixed backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

• host_memory_padding (int, optional, default = 8388608) – mixed backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

• hybrid_huffman_threshold (int, optional, default = 1000000) – mixed backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host Huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

• num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

• output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_area (float or list of float, optional, default = [0.08, 1.0]) – Range from which to choose random area factor A. The cropped image’s area will be equal to A * original image’s area.

• random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• split_stages (bool, optional, default = False) – mixed backend only Split into separated CPU stage and GPU stage operators

• use_chunk_allocator (bool, optional, default = False) – Experimental, mixed backend only Use chunk pinned memory allocator, allocating chunk of size batch_size*prefetch_queue_depth during the construction and suballocate them in runtime. Ignored when split_stages is false.

• use_fast_idct (bool, optional, default = False) – Enables fast IDCT in CPU based decompressor when GPU implementation cannot handle given image. According to libjpeg-turbo documentation, decompression performance is improved by 4-14% with very little loss in quality.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
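The interaction of random_area, random_aspect_ratio and num_attempts can be sketched in plain Python as below. This is a hedged illustration only: the uniform sampling of both quantities, the rounding, and the fallback to the full image after num_attempts failures are assumptions, and the function name random_crop_window is hypothetical. The operator's actual sampling strategy may differ.

```python
import math
import random


def random_crop_window(width, height,
                       random_area=(0.08, 1.0),
                       random_aspect_ratio=(0.75, 1.333333),
                       num_attempts=10, rng=None):
    """Pick a random crop window (x, y, crop_w, crop_h) inside the image."""
    rng = rng or random.Random()
    for _ in range(num_attempts):
        # Area factor A: the crop area is A * original image area.
        area = rng.uniform(*random_area) * width * height
        # Aspect ratio is width / height.
        ratio = rng.uniform(*random_aspect_ratio)
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            x = rng.randint(0, width - crop_w)
            y = rng.randint(0, height - crop_h)
            return x, y, crop_w, crop_h
    # Assumed fallback when no attempt fits: use the whole image.
    return 0, 0, width, height


window = random_crop_window(640, 480, rng=random.Random(0))
```

Passing a seeded random.Random instance makes the sketch reproducible, in the same spirit as the seed argument.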

class nvidia.dali.ops.ImageDecoderSlice(**kwargs)

Decode images and extract an externally provided region-of-interest (ROI) specified by an anchor and a shape of the ROI.

Inputs must be supplied as 3 separate tensors in a specific order: data containing input data, anchor containing either normalized or absolute coordinates (depending on the value of normalized_anchor) for the starting point of the slice (x0, x1, x2, …), and shape containing either normalized or absolute coordinates (depending on the value of normalized_shape) for the dimensions of the slice (s0, s1, s2, …). Both anchor and shape coordinates must be within the interval [0.0, 1.0] for normalized coordinates, or within the image shape for absolute coordinates. Both anchor and shape inputs will provide as many dimensions as specified with arguments axis_names or axes.

By default ImageDecoderSlice operator uses normalized coordinates and WH order for the slice arguments.

When possible, it will make use of region-of-interest decoding APIs (e.g. libjpeg-turbo, nvJPEG) thus optimizing decoding time and memory usage. When not supported, it will decode the whole image and then crop the selected ROI.

Note: ROI decoding is currently not compatible with hardware based decoding. Using ImageDecoderSlice will automatically disable hardware accelerated decoding. To make use of the hardware decoder, use ImageDecoder and Slice operators instead.

The output of the decoder is in HWC layout.

Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM.

Supported backends
• ‘cpu’

• ‘mixed’

Keyword Arguments
• affine (bool, optional, default = True) – mixed backend only If internal threads should be affined to CPU cores

• axes (int or list of int, optional, default = [1, 0]) – Order of dimensions used for anchor and shape slice inputs, as dimension indexes

• axis_names (layout str, optional, default = ‘WH’) – Order of dimensions used for anchor and shape slice inputs, as described in layout. If provided, axis_names takes higher priority than axes

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• device_memory_padding (int, optional, default = 16777216) – mixed backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

• host_memory_padding (int, optional, default = 8388608) – mixed backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

• hybrid_huffman_threshold (int, optional, default = 1000000) – mixed backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host Huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

• normalized_anchor (bool, optional, default = True) – Whether or not the anchor input should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates

• normalized_shape (bool, optional, default = True) – Whether or not the shape input should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates

• output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• split_stages (bool, optional, default = False) – mixed backend only Split into separated CPU stage and GPU stage operators

• use_chunk_allocator (bool, optional, default = False) – Experimental, mixed backend only Use chunk pinned memory allocator, allocating chunk of size batch_size*prefetch_queue_depth during the construction and suballocate them in runtime. Ignored when split_stages is false.

• use_fast_idct (bool, optional, default = False) – Enables fast IDCT in CPU based decompressor when GPU implementation cannot handle given image. According to libjpeg-turbo documentation, decompression performance is improved by 4-14% with very little loss in quality.

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.ImageDecoderSlice() for full documentation.
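As an illustration of how the anchor and shape inputs relate to pixel coordinates, the sketch below converts them to an absolute window for the default WH order. It is plain Python rather than DALI code; the function name slice_window and the bounds check are assumptions for the example.

```python
def slice_window(image_width, image_height, anchor, shape,
                 normalized_anchor=True, normalized_shape=True):
    """Return (start, size) in pixels; anchor and shape are in WH order."""
    extents = (image_width, image_height)
    # Normalized values are scaled by the image extent; absolute ones pass through.
    start = [a * e if normalized_anchor else a for a, e in zip(anchor, extents)]
    size = [s * e if normalized_shape else s for s, e in zip(shape, extents)]
    for st, sz, e in zip(start, size, extents):
        if st < 0 or sz < 0 or st + sz > e:
            raise ValueError("slice window falls outside the image")
    return start, size


# 640x480 image: start at (25% of width, 50% of height), take 50% x 25% of the image
start, size = slice_window(640, 480, anchor=(0.25, 0.5), shape=(0.5, 0.25))
# start == [160.0, 240.0], size == [320.0, 120.0]
```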

class nvidia.dali.ops.Jitter(**kwargs)

Perform a random Jitter augmentation. The output image is produced by moving each pixel by a random amount bounded by half of nDegree parameter (in both x and y dimensions).

Supported backends
• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

• interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

• mask (int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

• nDegree (int, optional, default = 2) – Each pixel is moved by a random amount in range [-nDegree/2, nDegree/2].

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments

mask (TensorList of int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

class nvidia.dali.ops.LookupTable(**kwargs)

Maps input to output by using a lookup table specified by keys and values and a default_value for non specified keys.

keys and values are used to define the lookup table:

keys[] =   {0,     2,   3,   4,   5,    3}
values[] = {0.2, 0.4, 0.5, 0.6, 0.7, 0.10}
default_value = 0.99


yielding:

lut[] = {0.2, 0.99, 0.4, 0.10, 0.6, 0.7}  // only last occurrence of a key is considered


producing the output according to the formula:

Output[i] = lut[Input[i]]   if 0 <= Input[i] < len(lut)
Output[i] = default_value   otherwise


Example:

Input[] =  {1,      4,    1,   0,  100,   2,     3,   4}
Output[] = {0.99, 0.6, 0.99, 0.2, 0.99, 0.4,  0.10, 0.6}


Note: Only integer types can be used as input to this operator.
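The mapping described above can be reproduced with a short plain-Python sketch. The helper names build_lut and lookup are hypothetical and only mirror the description; valid table indices run from 0 to len(lut) - 1.

```python
def build_lut(keys, values, default_value):
    """Build a dense table; entries without a key hold default_value."""
    lut = [default_value] * (max(keys) + 1) if keys else []
    for key, value in zip(keys, values):
        lut[key] = value  # a later occurrence of a key overwrites earlier ones
    return lut


def lookup(inputs, lut, default_value):
    """Map each input through the table, falling back to default_value."""
    return [lut[x] if 0 <= x < len(lut) else default_value for x in inputs]


keys = [0, 2, 3, 4, 5, 3]
values = [0.2, 0.4, 0.5, 0.6, 0.7, 0.10]
lut = build_lut(keys, values, default_value=0.99)
out = lookup([1, 4, 1, 0, 100, 2, 3, 4], lut, default_value=0.99)
# lut == [0.2, 0.99, 0.4, 0.1, 0.6, 0.7]
# out == [0.99, 0.6, 0.99, 0.2, 0.99, 0.4, 0.1, 0.6]
```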

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• default_value (float, optional, default = 0.0) – Default output value for keys not present in the table.

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type.

• keys (int or list of int, optional, default = []) – Input values (keys) present in the lookup table. The lengths of the keys and values arguments should match. keys should be in the range [0, 65535].

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• values (float or list of float, optional, default = []) – mapped output values for each keys entry. Length of keys and values argument should match.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

class nvidia.dali.ops.MFCC(**kwargs)

Mel Frequency Cepstral Coefficients (MFCC). Computes MFCCs from a mel spectrogram.

Supported backends
• ‘cpu’

Keyword Arguments
• axis (int, optional, default = 0) – Axis over which the transform will be applied. If not provided, the outer-most dimension will be used.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dct_type (int, optional, default = 2) – Discrete Cosine Transform type. Supported types are: 1, 2, 3, 4. The formulas used to calculate the DCT are equivalent to those described in https://en.wikipedia.org/wiki/Discrete_cosine_transform

• lifter (float, optional, default = 0.0) –

Cepstral filtering (also known as liftering) coefficient. If lifter > 0, the MFCCs will be scaled according to the following formula:

MFCC[i] = MFCC[i] * (1 + sin(pi * (i + 1) / lifter)) * (lifter / 2)


• n_mfcc (int, optional, default = 20) – Number of MFCC coefficients

• normalize (bool, optional, default = False) – If true, the DCT will use an ortho-normal basis. Note: Normalization is not supported for dct_type=1.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
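To make the role of dct_type and n_mfcc more concrete, here is a minimal plain-Python sketch of the unnormalized DCT-II (the default dct_type = 2) as given in the referenced Wikipedia article, applied to a single frame and truncated to n_mfcc coefficients. The function names are hypothetical, normalization and liftering are left out, and applying the transform to one frame at a time is an assumption made for the example.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 1/2) * k)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


def mfcc_from_mel_frame(mel_frame, n_mfcc=20):
    """Keep the first n_mfcc DCT coefficients of one mel-spectrogram frame."""
    return dct_ii(mel_frame)[:n_mfcc]


# A constant frame has all of its energy in coefficient 0.
coeffs = mfcc_from_mel_frame([1.0, 1.0, 1.0, 1.0], n_mfcc=3)
```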

class nvidia.dali.ops.MXNetReader(**kwargs)

Read sample data from a MXNet RecordIO.

Supported backends
• ‘cpu’

Keyword Arguments
• index_path (str or list of str) – List (of length 1) containing a path to index (.idx) file. It is generated by the MXNet’s im2rec.py script together with RecordIO file. It can also be generated using rec2idx script distributed with DALI.

• path (str or list of str) – List of paths to RecordIO files.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dont_use_mmap (bool, optional, default = False) – If set to true, the Loader will not attempt to map the file in memory and will use plain file I/O instead. Mapping provides a small performance benefit when accessing a local file system, but most of the network ones, due to their nature, don’t provide optimum performance

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be artificially added when the data set size is not equally divisible by the number of shards, and the shard is not equally divisible by the batch size. In the end, the shard size will be equalized between shards.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.

class nvidia.dali.ops.MelFilterBank(**kwargs)

Converts a Spectrogram to a mel Spectrogram using triangular filter banks. Expects an input with 2 or more dimensions where the last two dimensions correspond to the fft bin index and the window index respectively.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• freq_high (float, optional, default = 0.0) – Maximum frequency. If not provided, sample_rate / 2 will be used

• freq_low (float, optional, default = 0.0) – Minimum frequency

• mel_formula (str, optional, default = 'slaney') –

Determines the formula used to convert frequencies from Hertz to mel and vice versa. The mel scale is a perceptual scale of pitches, so there is no single formula for it. Supported values are:

• “slaney” - Follows the behavior of Slaney’s MATLAB Auditory Modelling Work. This formula is linear under 1 kHz and logarithmic above. This implementation is consistent with Librosa’s default implementation.

• “htk” - Follows the formula from O’Shaughnessy’s book, m = 2595 * log10(1 + (f/700)). This is consistent with the implementation of the Hidden Markov Toolkit (HTK).

• nfilter (int, optional, default = 128) – Number of mel filters.

• normalize (bool, optional, default = True) – Whether to normalize the triangular filter weights by the width of their mel band. If set to true, the integral of the filter function will amount to 1. If set to false, the peak of the filter function will be 1

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• sample_rate (float, optional, default = 44100.0) – Sampling rate of the audio signal

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
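The “htk” conversion quoted above, together with its inverse, can be sketched in plain Python. The helper mel_band_edges shows one common way to place triangular filters (nfilter + 2 points equally spaced on the mel scale); that layout and all function names are assumptions for illustration and are not taken from the DALI implementation.

```python
import math


def hz_to_mel_htk(freq_hz):
    """m = 2595 * log10(1 + f / 700)"""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(nfilter, freq_low, freq_high):
    """Edge frequencies (Hz) for nfilter triangles, equally spaced in mel."""
    lo, hi = hz_to_mel_htk(freq_low), hz_to_mel_htk(freq_high)
    step = (hi - lo) / (nfilter + 1)
    return [mel_to_hz_htk(lo + i * step) for i in range(nfilter + 2)]


# Defaults from the argument list: sample_rate = 44100, so freq_high = sample_rate / 2
edges = mel_band_edges(nfilter=128, freq_low=0.0, freq_high=44100.0 / 2)
```

Each filter i would then rise from edges[i] to a peak at edges[i + 1] and fall back to zero at edges[i + 2].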

class nvidia.dali.ops.NonsilentRegion(**kwargs)

The operator performs leading and trailing silence detection in an audio buffer. The operator returns the beginning and length of the non-silent region by comparing the short-term power of the signal with a silence cut-off threshold. The signal is considered silent when short_term_power_db < cutoff_db with:

short_term_power_db = 10 * log10( short_term_power / reference_power )


and reference_power being typically the maximum of the signal, unless specified otherwise.

Inputs/Outputs

Input 0 - 1D audio buffer

Output 0 - Begin index of nonsilent region

Output 1 - Length of nonsilent region

Remarks
• If Outputs[1] == 0, Outputs[0] value is undefined

Supported backends
• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• cutoff_db (float, optional, default = -60.0) – The threshold [dB], below which everything is considered as silence

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• reference_power (float, optional, default = 0.0) – The reference power used for converting signal to db. If reference_power is not provided, the maximum of the signal will be used as the reference power

• reset_interval (int, optional, default = 8192) – The number of samples after which the moving mean average is recalculated to avoid loss of precision. If reset_interval == -1 or the input type allows exact calculation, the average won’t be reset. The default value should fit most of the use cases.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• window_length (int, optional, default = 2048) – Size of a sliding window. The sliding window is used to calculate short-term power of the signal.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
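The detection rule can be sketched in plain Python as follows. This is a simplified illustration, not the operator's implementation: the function name is hypothetical, the short-term power is assumed to be the mean of squared samples in each sliding window, the reported region is assumed to be the span covered by all non-silent windows, and returning (0, 0) for a fully silent buffer is a choice made here (the begin index is documented as undefined in that case).

```python
import math


def nonsilent_region(signal, cutoff_db=-60.0, window_length=2048,
                     reference_power=None):
    """Return (begin, length) of the non-silent region of a 1D buffer."""
    n = len(signal)
    if n == 0:
        return 0, 0
    window_length = min(window_length, n)
    # Short-term power of every sliding window (assumed: mean of squares).
    powers = [
        sum(s * s for s in signal[i:i + window_length]) / window_length
        for i in range(n - window_length + 1)
    ]
    # Without an explicit reference, use the maximum short-term power.
    ref = reference_power if reference_power else max(powers)
    if ref <= 0:
        return 0, 0  # the whole buffer is silent

    def power_db(p):
        return 10 * math.log10(p / ref) if p > 0 else float("-inf")

    loud = [i for i, p in enumerate(powers) if power_db(p) >= cutoff_db]
    if not loud:
        return 0, 0
    begin = loud[0]
    end = loud[-1] + window_length  # one past the last non-silent window
    return begin, end - begin


signal = [0.0] * 10 + [1.0] * 5 + [0.0] * 10
begin, length = nonsilent_region(signal, window_length=2)
```

With this window-based definition the reported region can extend up to window_length - 1 samples beyond the audible part on each side.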

class nvidia.dali.ops.NormalDistribution(**kwargs)

Creates a tensor that consists of data distributed normally. This operator can be run in 3 modes, which determine the shape of the output tensor:

1. Providing an input batch to this operator results in a batch of output tensors, which have the same shape as the input tensors.

2. Providing a custom shape as an argument results in an output batch, where every tensor has the same (given) shape.

3. Providing no input arguments results in an output batch of scalars, distributed normally.

Supported backends
• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Data type for the output

• mean (float, optional, default = 0.0) – Mean value of the distribution

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shape (int or list of int, optional, default = []) – Shape of single output tensor in a batch

• stddev (float, optional, default = 1.0) – Standard deviation of the distribution

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• mean (TensorList of float, optional, default = 0.0) – Mean value of the distribution

• stddev (TensorList of float, optional, default = 1.0) – Standard deviation of the distribution
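The three shape modes can be imitated with nested Python lists, as in the hedged sketch below. It is not DALI code: the function normal_batch is hypothetical, and the like argument stands in for an input batch by listing the per-sample shapes directly.

```python
import random


def normal_batch(batch_size, mean=0.0, stddev=1.0, shape=None, like=None,
                 rng=None):
    """Return a batch (list) of normally distributed samples."""
    rng = rng or random.Random()

    def sample(dims):
        # An empty shape yields a scalar; otherwise recurse over the dimensions.
        if not dims:
            return rng.gauss(mean, stddev)
        return [sample(dims[1:]) for _ in range(dims[0])]

    if like is not None:
        # Mode 1: one output per input sample, matching each sample's shape.
        return [sample(tuple(dims)) for dims in like]
    # Mode 2 (shape given) or mode 3 (no shape: scalars).
    dims = tuple(shape) if shape else ()
    return [sample(dims) for _ in range(batch_size)]


rng = random.Random(42)
per_input = normal_batch(0, like=[(2,), (3,)], rng=rng)  # shapes follow the input
fixed = normal_batch(2, shape=[2, 3], rng=rng)           # every sample is 2x3
scalars = normal_batch(4, rng=rng)                       # four scalars
```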

class nvidia.dali.ops.Normalize(**kwargs)

Normalizes the input by removing mean and dividing by standard deviation.

The mean and standard deviation can be calculated internally for specified subset of axes or can be externally provided as mean and stddev arguments.

The normalization is done following the formula:

out = scale * (in - mean) / stddev + shift


The expression assumes that out, in are equally shaped tensors whereas mean and stddev may be either tensors of same shape or scalars or a mix of these. The expression follows numpy broadcasting rules.

Sizes of (non-scalar) mean and stddev must have either extent of 1, if given axis is reduced, or match the corresponding extent of the input. A dimension is considered reduced if it’s listed in axes or axis_names. If neither axes nor axis_names argument is present, the set of reduced axes is inferred by comparing input shape to the shape of mean/stddev arguments, but it is enforced that the set of reduced axes is the same for all tensors in the batch.

Examples of valid argument combinations:

1. Per-sample normalization of dimensions 0 and 2:

axes = 0,2                                        # optional
input.shape = [ [480, 640, 3], [1080, 1920, 4] ]
batch = False
mean.shape =  [ [1, 640, 1], [1, 1920, 1] ]
stddev = (not supplied)


With these shapes, batch normalization is not possible, because the non-reduced dimension has different extent across samples.

2. Batch normalization of dimensions 0 and 1:

axes = 0,1                                        # optional
input.shape = [ [480, 640, 3], [1080, 1920, 3] ]
batch = True
mean = (scalar)
stddev.shape =  [ [1, 1, 3] ]


For color images, this normalizes the 3 color channels separately, but across all samples in the batch.
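For a single flat sample reduced over all of its elements, the formula above can be sketched in plain Python. The function name normalize_values is hypothetical, and the handling of ddof and epsilon follows the argument descriptions below (epsilon is added to the variance); broadcasting over multiple axes is intentionally left out of this illustration.

```python
import math


def normalize_values(values, mean=None, stddev=None, scale=1.0, shift=0.0,
                     ddof=0, epsilon=0.0):
    """out = scale * (in - mean) / stddev + shift for one flat sample."""
    n = len(values)
    if mean is None:
        mean = sum(values) / n
    if stddev is None:
        # ddof and epsilon only matter when stddev is computed internally.
        variance = sum((x - mean) ** 2 for x in values) / (n - ddof)
        stddev = math.sqrt(variance + epsilon)
    return [scale * (x - mean) / stddev + shift for x in values]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, population stddev 2
out = normalize_values(data)
# out == [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# Mapping into an unsigned 8-bit range: the mean lands on shift = 128.
out_u8 = normalize_values(data, scale=64.0, shift=128.0)
# out_u8 == [32.0, 96.0, 96.0, 96.0, 128.0, 128.0, 192.0, 256.0]
```

The second call shows why clamping matters for integral output types: the largest value maps to 256.0, which is out of range for an unsigned 8-bit output.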

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• axes (int or list of int, optional, default = []) – Indices of dimensions along which the input is normalized. By default, all axes are used. Axes can also be specified by name, see axis_names.

• axis_names (layout str, optional, default = ‘’) – Names of the axes in the input - axis indices are taken from the input layout. This argument cannot be used together with axes.

• batch (bool, optional, default = False) – If True, the mean and standard deviation are calculated across tensors in the batch. This also requires that the input sample shapes in the non-averaged axes match.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• ddof (int, optional, default = 0) – Delta Degrees of Freedom for Bessel’s correction. The variance is estimated as sum((Xi - mean)**2) / (N - ddof). This argument is ignored when an externally supplied standard deviation is used.

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output type. When using integral types, use shift and scale to improve usage of the output type’s dynamic range. If dtype is an integral type, out of range values are clamped, and non-integer values are rounded to nearest integer.

• epsilon (float, optional, default = 0.0) – A value added to the variance to avoid division by small numbers

• mean (float, optional, default = 0.0) – Mean value to subtract from the data. It can be either a scalar or a batch of tensors with same dimensionality as the input and the extent in each dimension must either match that of the input or be equal to 1 (in which case the value will be broadcast in this dimension). If not specified, the mean is calculated from the input. Non-scalar mean cannot be used when batch argument is True.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• scale (float, optional, default = 1.0) – The scaling factor applied to the output. Useful for integral output types

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shift (float, optional, default = 0.0) – The value to which the mean will map in the output. Useful for unsigned output types.

• stddev (float, optional, default = 0.0) – Standard deviation value to scale the data. For shape constraints, see mean argument. If not specified, the standard deviation is calculated from the input. Non-scalar stddev cannot be used when batch argument is True.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments
• mean (TensorList of float, optional, default = 0.0) – Mean value to subtract from the data. It can be either a scalar or a batch of tensors with same dimensionality as the input and the extent in each dimension must either match that of the input or be equal to 1 (in which case the value will be broadcast in this dimension). If not specified, the mean is calculated from the input. Non-scalar mean cannot be used when batch argument is True.

• stddev (TensorList of float, optional, default = 0.0) – Standard deviation value to scale the data. For shape constraints, see mean argument. If not specified, the standard deviation is calculated from the input. Non-scalar stddev cannot be used when batch argument is True.

class nvidia.dali.ops.NumpyReader(**kwargs)

Read Numpy arrays from a directory

Supported backends
• ‘cpu’

Keyword Arguments
• file_root (str) – Path to a directory containing data files. NumpyReader supports flat directory structure. file_root directory should contain directories with numpy files in them.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dont_use_mmap (bool, optional, default = False) – If set to true, the Loader will not attempt to map the file in memory and will use plain file I/O instead. Mapping provides a small performance benefit when accessing a local file system, but most of the network ones, due to their nature, don’t provide optimum performance

• file_filter (str, optional, default = ‘*.npy’) – If specified, the string will be interpreted as glob string to filter the list of files in the sub-directories of file_root.

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be artificially added when the data set size is not equally divisible by the number of shards, and the shard is not equally divisible by the batch size. In the end, the shard size will be equalized between shards.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• shuffle_after_epoch (bool, optional, default = False) – If true, reader shuffles whole dataset after each epoch. It is exclusive with stick_to_shard and random_shuffle.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In that case the output of the loader will be empty.

• stick_to_shard (bool, optional, default = False) – Whether the reader should stick to the given data shard instead of going through the whole dataset. When decoder caching is used, it significantly reduces the amount of data to be cached, but could affect accuracy in some cases.

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.
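The random_shuffle and initial_fill arguments above describe a buffer-based shuffle: data is read sequentially into a buffer and samples are drawn from it at random. The following is a minimal pure-Python sketch of that idea, not the reader's actual implementation. The function name shuffled_read and the way the buffer is drained at the end are assumptions made for illustration.

```python
import random


def shuffled_read(samples, initial_fill=1024, seed=0):
    """Yield samples in buffer-shuffled order (illustrative sketch only)."""
    rng = random.Random(seed)
    source = iter(samples)
    buffer = []
    # Sequentially read up to initial_fill samples into the buffer.
    for sample in source:
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            break
    # Emit a random buffered sample, then refill its slot from the source.
    for sample in source:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = sample
    # Assumption: once the source is exhausted, drain the rest in random order.
    rng.shuffle(buffer)
    yield from buffer


if __name__ == "__main__":
    print(list(shuffled_read(range(10), initial_fill=4, seed=1)))
```

A small initial_fill keeps memory use low but only mixes samples that are close together in the dataset; with initial_fill = 1 this sketch degenerates to sequential reading.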

class nvidia.dali.ops.OldColorTwist(**kwargs)

Warning

This operator is now deprecated. Use ColorTwist instead.

Combination of hue, saturation, contrast and brightness. Old implementation using NPP.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• brightness (float, optional, default = 1.0) –

Brightness change factor. Values >= 0 are accepted. For example:

• 0 - black image,

• 1 - no change

• 2 - increase brightness twice

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• contrast (float, optional, default = 1.0) –

Contrast change factor. Values >= 0 are accepted. For example:

• 0 - gray image,

• 1 - no change

• 2 - increase contrast twice

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) – Output data type; if not set, the input type is used.

• hue (float, optional, default = 0.0) – Hue change, in degrees.

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• saturation (float, optional, default = 1.0) –

Saturation change factor. Values >= 0 are supported. For example:

• 0 - completely desaturated image

• 1 - no change to image’s saturation

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments
• brightness (TensorList of float, optional, default = 1.0) –

Brightness change factor. Values >= 0 are accepted. For example:

• 0 - black image,

• 1 - no change

• 2 - increase brightness twice

• contrast (TensorList of float, optional, default = 1.0) –

Contrast change factor. Values >= 0 are accepted. For example:

• 0 - gray image,

• 1 - no change

• 2 - increase contrast twice

• hue (TensorList of float, optional, default = 0.0) – Hue change, in degrees.

• saturation (TensorList of float, optional, default = 1.0) –

Saturation change factor. Values >= 0 are supported. For example:

• 0 - completely desaturated image

• 1 - no change to image’s saturation

class nvidia.dali.ops.OneHot(**kwargs)

Produces a tensor representing the one-hot encoding of the given input. The input must be a scalar; otherwise the operator will fail.

Supported backends
• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Data type for the output

• num_classes (int, optional, default = 0) – Number of all classes in the data

• off_value (float, optional, default = 0.0) –

Value that will be used to fill the output when input[j] != i.

It will be cast to dtype type

• on_value (float, optional, default = 1.0) –

Value that will be used to fill the output when input[j] = i.

It will be cast to dtype type

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
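To make the roles of num_classes, on_value and off_value concrete, here is a small pure-Python sketch of one-hot encoding for scalar labels. It mirrors the argument names documented above, but the helper one_hot itself is hypothetical and does not call DALI.

```python
def one_hot(label, num_classes, on_value=1.0, off_value=0.0):
    """Return a list of length num_classes with on_value at index `label`."""
    return [on_value if i == label else off_value for i in range(num_classes)]


# A batch of scalar labels, one per sample.
batch = [2, 0, 1]
encoded = [one_hot(label, num_classes=3) for label in batch]

if __name__ == "__main__":
    for label, row in zip(batch, encoded):
        print(label, row)
```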

class nvidia.dali.ops.OpticalFlow(**kwargs)

Calculates the optical flow for a sequence of images given as an input. The mandatory input for the operator is a sequence of frames. As an optional input, the operator accepts external hints for the OF calculation. The output format of this operator matches the output format of the OF driver API. DALI uses the Turing optical flow hardware implementation: https://developer.nvidia.com/opticalflow-sdk

This operator allows sequence inputs.

Supported backends
• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• enable_external_hints (bool, optional, default = False) – Enables or disables external hints for the OF calculation. External hints are analogous to temporal hints, except that they come from an external source. When this option is enabled, the operator requires 2 inputs.

• enable_temporal_hints (bool, optional, default = False) – Enables or disables temporal hints for sequences longer than 2 images. They are used to speed up the calculation: the previous OF result in the sequence is used to calculate the current flow. Temporal hints are useful for sequences that do not have many changes in the scene (e.g. only moving objects).

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – Type of input images (RGB, BGR, GRAY)

• output_format (int, optional, default = -1) – Sets the grid size for the output vector. The value defines the width of a grid square (e.g. if the value is 4, a 4x4 grid is used). For values <= 0, the grid size is undefined. Currently only a grid size of 4 is supported.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• preset (float, optional, default = 0.0) –

Speed/quality level of OF calculation:

• 0.0 is the lowest speed and the best quality,

• 0.5 is the medium speed and quality,

• 1.0 is the fastest speed and the lowest quality.

The lower the speed, the more additional pre-/post-processing is used to enhance the quality of the OF result.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.OpticalFlow() for full documentation.

class nvidia.dali.ops.Pad(**kwargs)

Pads all samples with fill_value in the given axes, to match the biggest extent in the batch for those axes, or to match the minimum shape specified.

Supported types are integer and floating point numeric types.

Examples:

• 1-D samples, fill_value = -1, axes = (0,)

input  = [[3,   4,   2,   5,   4],
[2,   2],
[3, 199,   5]];
output = [[3,   4,   2,   5,   4],
[2,   2,  -1,  -1,  -1],
[3, 199,   5,  -1,  -1]]

• 1-D samples, fill_value = -1, axes = (0,), shape = (7,)

input  = [[3,   4,   2,   5,   4],
[2,   2],
[3, 199,   5],
[1,   2,   3,   4,   5,   6,   7,   8]];
output = [[3,   4,   2,   5,   4,  -1,  -1],
[2,   2,  -1,  -1,  -1,  -1,  -1],
[3, 199,   5,  -1,  -1,  -1,  -1],
[1,   2,   3,   4,   5,   6,   7,   8]]

• 1-D samples, fill_value = -1, axes = (0,), align = (4,)

input  = [[3,   4,   2,   5,   4],
[2,   2],
[3, 199,   5]];
output = [[3,   4,   2,   5,   4,  -1,  -1,  -1],
[2,   2,  -1,  -1,  -1,  -1,  -1,  -1],
[3, 199,   5,  -1,  -1,  -1,  -1,  -1]]

• 1-D samples, fill_value = -1, axes = (0,), shape = (1,), align = (2,)

input  = [[3,   4,   2,   5,   4],
[2,   2],
[3, 199,   5]];
output = [[3,   4,   2,   5,   4,  -1],
[2,   2],
[3, 199,   5,  -1]]

• 2-D samples, fill_value = 42, axes = (1,)

input  = [[[1,  2,  3,  4],
[5,  6,  7,  8]],
[[1,  2],
[4,  5]]]
output = [[[1,  2,  3,  4],
[5,  6,  7,  8]],
[[1,  2, 42, 42],
[4,  5, 42, 42]]]

• 2-D samples, fill_value = 0, axes = (0, 1), align = (4, 5)

input  = [[[1,  2,  3,  4],
[5,  6,  7,  8],
[9, 10, 11, 12]],
[[1, 2],
[4, 5]]]
output = [[[1,  2,  3,  4,  0],
[5,  6,  7,  8,  0],
[9, 10, 11, 12,  0],
[0,  0,  0,  0,  0]],
[[1,  2,  0,  0,  0],
[4,  5,  0,  0,  0],
[0,  0,  0,  0,  0],
[0,  0,  0,  0,  0]]]

• 2-D samples, fill_value = 0, axes = (0, 1), align = (1, 2), shape = (4, -1)

input  = [[[1,  2,  3],
[4,  5,  6]],
[[1, 2],
[4, 5],
[6, 7]]]
output = [[[1,  2,  3,  0],
[4,  5,  6,  0],
[0,  0,  0,  0],
[0,  0,  0,  0]],
[[1,  2,  0,  0],
[4,  5,  0,  0],
[6,  7,  0,  0],
[0,  0,  0,  0]]]

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• align (int or list of int, optional, default = []) – If specified, determines the alignment on those dimensions specified by axes or axis_names. That is, the extent on axis = axes[i] will be adjusted to be a multiple of align[i]. If a single integer value is provided, the alignment restrictions are applied to all the padded axes.

• axes (int or list of int, optional, default = []) – Indices of the axes on which the batch samples will be padded. Indexes are zero-based with 0 being the outer-most dimension of the tensor. Arguments axis_names and axes are mutually exclusive. If axes and axis_names are empty or not provided, the output will be padded on all the axes

• axis_names (layout str, optional, default = ‘’) – Names of the axes on which the batch samples will be padded. Dimension names should correspond to dimensions in the input layout. Arguments axis_names and axes are mutually exclusive. If axes and axis_names are empty or not provided, the output will be padded on all the axes

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• fill_value (float, optional, default = 0.0) – The value to pad the batch with

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shape (int or list of int, optional, default = []) – The extents of the output shape in the axes specified by axes or axis_names. Specifying -1 for an axis restores the default behavior of extending the axis to accommodate the (aligned) size of the largest sample in the batch. If the provided extent is smaller than the one of the sample, no padding will be applied, except what is needed to match required alignment.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
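The interaction between shape, align and the largest extent in the batch can be sketched for the 1-D case as follows. This is an illustrative pure-Python helper (pad_batch_1d is a hypothetical name, not a DALI function); its rule, "take the larger of the sample length and the requested extent, then round up to a multiple of align", is an assumption chosen to be consistent with the 1-D examples listed above.

```python
def pad_batch_1d(batch, fill_value=0, shape=-1, align=1):
    """Pad every 1-D sample of a batch along axis 0 (illustrative sketch)."""
    largest = max(len(sample) for sample in batch)
    # shape = -1 means "use the largest extent found in the batch".
    requested = shape if shape > 0 else largest
    padded = []
    for sample in batch:
        # A sample longer than the requested extent is never truncated.
        extent = max(len(sample), requested)
        # Round the extent up to the next multiple of align.
        extent = -(-extent // align) * align
        padded.append(list(sample) + [fill_value] * (extent - len(sample)))
    return padded


if __name__ == "__main__":
    data = [[3, 4, 2, 5, 4], [2, 2], [3, 199, 5]]
    print(pad_batch_1d(data, fill_value=-1))
    print(pad_batch_1d(data, fill_value=-1, align=4))
    print(pad_batch_1d(data, fill_value=-1, shape=1, align=2))
```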

class nvidia.dali.ops.Paste(**kwargs)

Paste the input image on a larger canvas. The canvas size is equal to input size * ratio.

Supported backends
• ‘gpu’

Keyword Arguments
• fill_value (int or list of int) – Tuple of values of the color to fill the canvas. Length of the tuple needs to be equal to n_channels.

• ratio (float) – Ratio of canvas size to input size, must be > 1.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• min_canvas_size (float, optional, default = 0.0) – Enforce minimum paste canvas dimension after scaling input size by ratio.

• n_channels (int, optional, default = 3) – Number of channels in the image.

• paste_x (float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)

• paste_y (float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments
• ratio (TensorList of float) – Ratio of canvas size to input size, must be > 1.

• min_canvas_size (TensorList of float, optional, default = 0.0) – Enforce minimum paste canvas dimension after scaling input size by ratio.

• paste_x (TensorList of float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)

• paste_y (TensorList of float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)
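The relation between ratio, min_canvas_size and the paste position can be illustrated with a small geometry helper. This is a sketch under assumptions: the canvas is taken as the input size multiplied by ratio (clamped from below by min_canvas_size), and paste_x / paste_y are interpreted as a fraction of the free space between the input and the canvas. The function name paste_geometry, the truncating rounding and that interpretation of the paste position are not confirmed by the text above.

```python
def paste_geometry(in_h, in_w, ratio, paste_x=0.5, paste_y=0.5,
                   min_canvas_size=0.0):
    """Return (canvas_h, canvas_w, offset_y, offset_x) for a paste (sketch)."""
    # Canvas size: input size scaled by ratio, at least min_canvas_size.
    canvas_h = int(max(in_h * ratio, min_canvas_size))
    canvas_w = int(max(in_w * ratio, min_canvas_size))
    # Assumption: 0.0 places the input at the left/top edge of the canvas,
    # 1.0 at the right/bottom edge, 0.5 in the centre.
    offset_x = int(paste_x * (canvas_w - in_w))
    offset_y = int(paste_y * (canvas_h - in_h))
    return canvas_h, canvas_w, offset_y, offset_x


if __name__ == "__main__":
    print(paste_geometry(100, 200, ratio=2.0))
```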

class nvidia.dali.ops.PowerSpectrum(**kwargs)

Power spectrum of signal.

Supported backends
• ‘cpu’

Keyword Arguments
• axis (int, optional, default = -1) – Index of the dimension to be transformed to the frequency domain. By default, the last dimension is selected.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• nfft (int, optional, default = -1) – Size of the FFT. By default nfft is selected to match the length of the data in the transformation axis. The number of bins created in the output is nfft // 2 + 1 (positive part of the spectrum only).

• power (int, optional, default = 2) – Exponent of the fft magnitude: Supported values are 2 for power spectrum (real*real + imag*imag) and 1 for complex magnitude (sqrt(real*real + imag*imag)).

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
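The nfft and power arguments can be illustrated with a naive discrete Fourier transform over a 1-D signal. The sketch below is pure Python and is not how the operator is implemented; in particular, zero-padding or truncating the signal to nfft samples is an assumption, and power_spectrum is a hypothetical helper name.

```python
import cmath
import math


def power_spectrum(signal, nfft=-1, power=2):
    """Naive DFT-based spectrum with nfft // 2 + 1 output bins (sketch)."""
    if power not in (1, 2):
        raise ValueError("power must be 1 or 2")
    n = len(signal) if nfft <= 0 else nfft
    # Assumption: zero-pad (or truncate) the signal to exactly n samples.
    x = list(signal[:n]) + [0.0] * max(0, n - len(signal))
    bins = []
    for k in range(n // 2 + 1):  # positive part of the spectrum only
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mag_sq = acc.real * acc.real + acc.imag * acc.imag
        bins.append(mag_sq if power == 2 else math.sqrt(mag_sq))
    return bins


if __name__ == "__main__":
    # A constant signal has all of its energy in bin 0.
    print(power_spectrum([1.0, 1.0, 1.0, 1.0]))
```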

class nvidia.dali.ops.PreemphasisFilter(**kwargs)

This operator applies a preemphasis filter to the input data. In its simple form, the filter can be expressed by the formula:

Y[t] = X[t] - coeff * X[t-1]

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Data type for the output

• preemph_coeff (float, optional, default = 0.97) – Preemphasis coefficient coeff

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments

preemph_coeff (TensorList of float, optional, default = 0.97) – Preemphasis coefficient coeff
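The preemphasis formula can be written out directly for a 1-D signal. The sketch below is pure Python with a hypothetical helper name. How the first sample is treated (it has no predecessor) is not stated above, so passing it through unchanged is an assumption.

```python
def preemphasis(signal, coeff=0.97):
    """Apply Y[t] = X[t] - coeff * X[t-1] to a 1-D sequence (sketch)."""
    if not signal:
        return []
    # Assumption: the first sample has no predecessor and is passed through.
    out = [signal[0]]
    for t in range(1, len(signal)):
        out.append(signal[t] - coeff * signal[t - 1])
    return out


if __name__ == "__main__":
    print(preemphasis([1.0, 2.0, 3.0], coeff=0.5))
```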

class nvidia.dali.ops.PythonFunction(function, num_outputs=1, device='cpu', batch_processing=False, **kwargs)

Executes a Python function. The operator can be used to execute custom Python code within the DALI pipeline. The called function receives the tensors’ data as NumPy arrays for CPU operators or as CuPy arrays for GPU operators and should return results in the same format (for a more universal data format, see DLTensorPythonFunction). The function should not modify the input tensors.

For now, this operator can be used only in pipelines with exec_async=False and exec_pipelined=False specified. Due to inferior performance, it is intended for prototyping and debugging.

This operator allows sequence inputs.

This operator supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• function (object) – Function object.

• batch_processing (bool, optional, default = False) – Whether the function should get the whole batch as input.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• num_outputs (int, optional, default = 1) – Number of outputs

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

static current_stream()

Get DALI’s current CUDA stream.

class nvidia.dali.ops.PythonFunctionBase(impl_name, function, num_outputs=1, device='cpu', **kwargs)

Supported backends

Keyword Arguments
• function (object) – Function object.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• num_outputs (int, optional, default = 1) – Number of outputs

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.

class nvidia.dali.ops.RandomBBoxCrop(**kwargs)

Applies a prospective random crop to an image coordinate space while keeping bounding boxes (and optionally labels) consistent. That is, after applying the random crop operator to the image coordinate space, the bounding boxes will be adjusted or filtered out to match the cropped region of interest. The applied random crop operation is constrained by the arguments provided to the operator.

The cropping window candidates are selected randomly until one matches the overlap restrictions specified by thresholds argument. Thresholds values represent a minimum overlap metric, specified by threshold_type, such as intersection-over-union of the cropping window and the bounding boxes or relative overlap as the ratio of the intersection area and the area of the bounding box.

Alternatively, not cropping can be allowed as one of the valid outcomes of the random process (use allow_no_crop for that).

Two modes of random crop are available:

• Randomly shaped window, randomly placed within the original input space. The random crop window dimensions are selected according to the provided aspect_ratio and relative area restrictions.

• Fixed size window, randomly placed within the original input space. The random crop window dimensions are taken from the crop_shape argument and the anchor is selected randomly. When providing crop_shape, a second argument input_shape should be provided, specifying the original image dimensions, which is required to scale the output bounding boxes.

The argument num_attempts can be used to control the maximum number of attempts that we try to produce a valid crop to match a single minimum overlap metric value from thresholds. Be advised that when allow_no_crop is False and thresholds does not contain 0.0, it is good to increase the num_attempts value as otherwise it may loop for a very long time.

Inputs: 0: bboxes, (1:labels, )

The first input, bboxes, refers to bounding boxes that are provided as a 2-dimensional tensor where the first dimension refers to the index of the bounding box and the second dimension refers to the index of the coordinate. Coordinates are relative to the original image dimensions (i.e. range [0.0, 1.0]), representing either start and end of the region or start and shape depending on the value of bbox_layout. E.g. bbox_layout="xyXY" means the bounding box coordinates are following the order start_x, start_y, end_x, end_y, while bbox_layout="xyWH" indicates the order is start_x, start_y, width, height. See more in bbox_layout argument description.

A second input labels can be optionally provided, representing the labels associated with each of the bounding boxes.

Outputs: 0:anchor, 1:shape, 2:bboxes, (3:labels,)

The resulting crop parameters are provided as two separate outputs, anchor and shape, that can be fed directly to the Slice operator to perform the actual cropping of the original image. anchor and shape contain the starting coordinates and dimensions for the crop, as [x, y, (z,)] and [w, h, (d,)] respectively. The coordinates can be represented in absolute or relative terms, depending on whether the fixed crop_shape was used or not.

The third and fourth output correspond to the adjusted bounding boxes and optionally their corresponding labels. Bounding boxes are always specified in relative coordinates.

Supported backends
• ‘cpu’

Keyword Arguments
• all_boxes_above_threshold (bool, optional, default = True) – If true, all bounding boxes in a sample should overlap with the cropping window as specified by thresholds, otherwise the cropping window is considered invalid. If false, a cropping window will be considered valid if any bounding box overlaps it sufficiently.

• allow_no_crop (bool, optional, default = True) – If true, not cropping will be one of the possible outcomes of the random process, as if it was one more thresholds value to choose from.

• aspect_ratio (float or list of float, optional, default = [1.0, 1.0]) –

Single or multiple valid aspect ratio ranges for cropping windows.

For 2D bounding boxes, a single aspect ratio x/y range should be provided (i.e. [min_xy, max_xy])

For 3D bounding boxes, three aspect ratio ranges are expected (x/y, x/z and y/z), i.e. [min_xy, max_xy, min_xz, max_xz, min_yz, max_yz]. Alternatively, if a single aspect ratio range is provided, the valid aspect ratio range will be the same for the three ratios.

Value for min should be > 0.0 and min <= max.

By default, square windows are generated.

Note: Providing aspect_ratio and scaling is incompatible with specifying crop_shape explicitly.

• bbox_layout (layout str, optional, default = ‘’) –

Determines the meaning of the coordinates of the bounding boxes.

Possible values are:

x (horizontal start anchor), y (vertical start anchor), z (depthwise start anchor),

X (horizontal end anchor), Y (vertical end anchor), Z (depthwise end anchor),

W (width), H (height), D (depth).

Note: If left empty, "xyXY" or "xyzXYZ" will be assumed, depending on the number of dimensions.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop_shape (int or list of int, optional, default = []) –

If provided, the random crop window dimensions will be fixed to this shape.

The order of dimensions is determined by the layout provided in shape_layout.

Note: crop_shape and input_shape should be provided together, and providing those is incompatible with using scaling and aspect_ratio arguments.

• input_shape (int or list of int, optional, default = []) –

Specifies the shape of the original input image.

The order of dimensions is determined by the layout provided in shape_layout.

Note: crop_shape and input_shape should be provided together, and providing those is incompatible with using scaling and aspect_ratio arguments.

• ltrb (bool, optional, default = True) –

If true, bboxes are returned as [left, top, right, bottom], otherwise they are provided as [left, top, width, height].

WARNING: This argument is deprecated. Use bbox_layout instead to specify the bbox encoding. E.g. ltrb=True is equivalent to bbox_layout="xyXY" and ltrb=False corresponds to bbox_layout="xyWH".

• num_attempts (int, optional, default = 1) –

Number of attempts to get a crop window that matches the aspect_ratio and a selected value from thresholds.

After each num_attempts, a different threshold will be picked, until either reaching a maximum of total_num_attempts (if provided), or indefinitely otherwise.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• scaling (float or list of float, optional, default = [1.0, 1.0]) –

Range [min, max] for crop size with respect to original image dimensions.

Value for min should satisfy 0.0 <= min <= max.

Note: Providing aspect_ratio and scaling is incompatible with specifying crop_shape explicitly

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shape_layout (layout str, optional, default = ‘’) –

Determines the meaning of the dimensions provided in crop_shape and input_shape. Possible values are:

W (width), H (height), D (depth).

Note: If left empty, "WH" or "WHD" will be assumed, depending on the number of dimensions.

• threshold_type (str, optional, default = 'iou') – Determines the meaning of thresholds. By default it refers to the intersection-over-union (IoU) of the bounding boxes with respect to the cropping window. Alternatively, it can be set to overlap to specify the fraction (by area) of the bounding box which will fall inside the crop window (e.g. a threshold of 1.0 means the whole bounding box must be contained in the resulting cropping window).

• thresholds (float or list of float, optional, default = [0.0]) – Minimum intersection-over-union (IoU), or a different metric if specified by threshold_type, of the bounding boxes with respect to the cropping window. Each sample will select one of the thresholds randomly, and the operator will try a given number of attempts (see num_attempts) to produce a random crop window that has the metric above the selected threshold.

• total_num_attempts (int, optional, default = -1) –

If provided, it indicates the total maximum number of attempts to get a crop window that matches the aspect_ratio and any selected value from thresholds.

After total_num_attempts the best candidate will be chosen as a default.

If not specified, the crop search will continue indefinitely until finding a valid crop.

WARNING: Not providing total_num_attempts could lead to an infinite loop if the provided arguments can’t be satisfied.

__call__(boxes, labels = None, **kwargs)

Operator call to be used in define_graph step.

Parameters
• boxes (2D TensorList of float) – Relative coordinates of the bounding boxes represented as a 2D tensor where the first dimension refers to the index of the bounding box and the second dimension refers to the index of the coordinate.

• labels (1D TensorList of integers, optional) – Labels associated with each of the bounding boxes.

Keyword Arguments
• crop_shape (TensorList of int, optional, default = []) –

If provided, the random crop window dimensions will be fixed to this shape.

The order of dimensions is determined by the layout provided in shape_layout.

Note: crop_shape and input_shape should be provided together, and providing those is incompatible with using scaling and aspect_ratio arguments.

• input_shape (TensorList of int, optional, default = []) –

Specifies the shape of the original input image.

The order of dimensions is determined by the layout provided in shape_layout.

Note: crop_shape and input_shape should be provided together, and providing those is incompatible with using scaling and aspect_ratio arguments.
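The two overlap metrics selectable through threshold_type, and the effect of all_boxes_above_threshold, can be sketched for 2D boxes in the "xyXY" layout as shown below. This is a pure-Python illustration with hypothetical helper names. It only checks whether a given candidate window satisfies one threshold and does not reproduce the operator's random search.

```python
def _area(box):
    """Area of a box given as (start_x, start_y, end_x, end_y)."""
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def _intersection(a, b):
    """Intersection area of two xyXY boxes (0.0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def iou(window, box):
    """Intersection-over-union of the crop window and a bounding box."""
    inter = _intersection(window, box)
    union = _area(window) + _area(box) - inter
    return inter / union if union > 0 else 0.0


def relative_overlap(window, box):
    """Fraction of the bounding box area that falls inside the window."""
    box_area = _area(box)
    return _intersection(window, box) / box_area if box_area > 0 else 0.0


def xywh_to_xyxy(box):
    """Convert an "xyWH" box to the "xyXY" layout."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def window_is_valid(window, boxes, threshold, threshold_type="iou",
                    all_boxes_above_threshold=True):
    """Check a candidate crop window against one threshold value."""
    metric = iou if threshold_type == "iou" else relative_overlap
    passed = [metric(window, box) >= threshold for box in boxes]
    return all(passed) if all_boxes_above_threshold else any(passed)


if __name__ == "__main__":
    window = (0.0, 0.0, 0.5, 0.5)
    box = xywh_to_xyxy((0.25, 0.25, 0.5, 0.5))
    print(iou(window, box), relative_overlap(window, box))
```

The same window can pass an overlap threshold while failing the equivalent IoU threshold, because IoU also penalizes the part of the window that does not cover the box.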

class nvidia.dali.ops.RandomResizedCrop(**kwargs)

Perform a crop with randomly chosen area and aspect ratio, then resize it to given size. Expects a 3-dimensional input with samples in HWC layout (height, width, channels).

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• size (int or list of int) – Size of resized image.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used. Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

• mag_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up

• min_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down

• minibatch_size (int, optional, default = 32) – Maximum number of images processed in a single kernel call

• num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_area (float or list of float, optional, default = [0.08, 1.0]) – Range from which to choose random area factor A. The cropped image’s area will be equal to A * original image’s area.

• random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• temp_buffer_hint (int, optional, default = 0) – Initial size, in bytes, of a temporary buffer for resampling. Ignored for CPU variant.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.
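How random_area, random_aspect_ratio and num_attempts might interact can be sketched as a rejection loop. The text above does not describe the sampling distributions or the fallback used when every attempt fails, so the uniform sampling and the "use the whole image" fallback below are assumptions, and sample_crop is a hypothetical helper.

```python
import math
import random


def sample_crop(height, width, random_area=(0.08, 1.0),
                random_aspect_ratio=(0.75, 1.333333), num_attempts=10,
                rng=None):
    """Return (x, y, crop_w, crop_h) of a random crop window (sketch)."""
    rng = rng or random.Random()
    for _ in range(num_attempts):
        # Area factor A: the crop covers A * (original image area).
        target_area = rng.uniform(*random_area) * height * width
        ratio = rng.uniform(*random_aspect_ratio)  # width / height
        crop_w = int(round(math.sqrt(target_area * ratio)))
        crop_h = int(round(math.sqrt(target_area / ratio)))
        # Accept the candidate only if it fits inside the image.
        if 0 < crop_w <= width and 0 < crop_h <= height:
            x = rng.randint(0, width - crop_w)
            y = rng.randint(0, height - crop_h)
            return x, y, crop_w, crop_h
    # Assumption: fall back to the whole image when all attempts fail.
    return 0, 0, width, height


if __name__ == "__main__":
    print(sample_crop(480, 640, rng=random.Random(0)))
```

The selected window would then be resized to the size argument, which is the part the operator performs with the configured interpolation filters.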

class nvidia.dali.ops.Reinterpret(**kwargs)

Treats content of the input as if it had a different type, shape and/or layout.

The buffer contents are not copied.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – The desired output type. The total size in bytes of the output must match the input. If no shape is provided, the innermost dimension is adjusted accordingly. If the byte size of the innermost dimension is not divisible by the size of the target type, an error is thrown.

• layout (layout str, optional, default = ‘’) – New layout for the data. If not specified, the output layout is preserved if number of dimension matches existing layout or reset to empty otherwise. If set and not empty, the layout must match the dimensionality of the output.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• rel_shape (float or list of float, optional, default = []) – The relative shape of the output. Number of dimensions must not exceed the number of dimensions of the input. There can be one negative extent which receives the size required to match the input volume, e.g. input of shape [480, 640, 3] and rel_shape = [0.5, -1] would get the shape [240, 3840]. NOTE: rel_shape and shape are mutually exclusive.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shape (int or list of int, optional, default = []) – The desired shape of the output. Number of dimensions must not exceed the number of dimensions of the input. There can be one negative extent which receives the size required to match the input volume, e.g. input of shape [480, 640, 3] and shape = [240, -1] would get the shape [240, 3840]. NOTE: rel_shape and shape are mutually exclusive.

__call__(data, shape_input = None, **kwargs)

Operator call to be used in define_graph step.

Parameters
• data (TensorList) – Data to be reshaped

• shape_input (1D TensorList of integers, optional) – Same as shape keyword argument

Keyword Arguments
• rel_shape (TensorList of float, optional, default = []) – The relative shape of the output. Number of dimensions must not exceed the number of dimensions of the input. There can be one negative extent which receives the size required to match the input volume, e.g. input of shape [480, 640, 3] and rel_shape = [0.5, -1] would get the shape [240, 3840]. NOTE: rel_shape and shape are mutually exclusive.

• shape (TensorList of int, optional, default = []) – The desired shape of the output. Number of dimensions must not exceed the number of dimensions of the input. There can be one negative extent which receives the size required to match the input volume, e.g. input of shape [480, 640, 3] and shape = [240, -1] would get the shape [240, 3840]. NOTE: rel_shape and shape are mutually exclusive.
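The dtype rule described above (the total byte size is preserved and, when no shape is given, only the innermost dimension is adjusted) can be illustrated with a small pure-Python sketch. The helper name `reinterpret_shape` and its element-size parameters are hypothetical and are not part of the DALI API.

```python
def reinterpret_shape(shape, in_itemsize, out_itemsize):
    """Shape after viewing the same bytes with a different element size.

    Hypothetical helper: only the innermost extent changes, so the
    total size in bytes stays the same.
    """
    *outer, inner = shape
    inner_bytes = inner * in_itemsize
    if inner_bytes % out_itemsize != 0:
        raise ValueError(
            f"innermost dimension is {inner_bytes} bytes, which is not "
            f"divisible by the target element size {out_itemsize}"
        )
    return [*outer, inner_bytes // out_itemsize]


# 1-byte elements of shape [480, 640, 4] viewed as 4-byte elements
print(reinterpret_shape([480, 640, 4], in_itemsize=1, out_itemsize=4))
```

With a 3-channel, 1-byte input the innermost dimension is 3 bytes, which is not divisible by 4, so the sketch raises an error, mirroring the documented behaviour.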

class nvidia.dali.ops.Reshape(**kwargs)
Treats content of the input as if it had a different shape and/or layout.

The buffer contents are not copied.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• layout (layout str, optional, default = ‘’) – New layout for the data. If not specified, the output layout is preserved if number of dimension matches existing layout or reset to empty otherwise. If set and not empty, the layout must match the dimensionality of the output.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• rel_shape (float or list of float, optional, default = []) – The relative shape of the output. Number of dimensions must not exceed the number of dimensions of the input. There can be one negative extent which receives the size required to match the input volume, e.g. input of shape [480, 640, 3] and rel_shape = [0.5, -1] would get the shape [240, 3840]. NOTE: rel_shape and shape are mutually exclusive.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shape (int or list of int, optional, default = []) – The desired shape of the output. Number of dimensions must not exceed the number of dimensions of the input. There can be one negative extent which receives the size required to match the input volume, e.g. input of shape [480, 640, 3] and shape = [240, -1] would get the shape [240, 3840]. NOTE: rel_shape and shape are mutually exclusive.

__call__(data, shape_input = None, **kwargs)

Operator call to be used in define_graph step.

Parameters
• data (TensorList) – Data to be reshaped

• shape_input (1D TensorList of integers, optional) – Same as shape keyword argument

Keyword Arguments
• rel_shape (TensorList of float, optional, default = []) – The relative shape of the output. Number of dimensions must not exceed the number of dimensions of the input. There can be one negative extent which receives the size required to match the input volume, e.g. input of shape [480, 640, 3] and rel_shape = [0.5, -1] would get the shape [240, 3840]. NOTE: rel_shape and shape are mutually exclusive.

• shape (TensorList of int, optional, default = []) – The desired shape of the output. Number of dimensions must not exceed the number of dimensions of the input. There can be one negative extent which receives the size required to match the input volume, e.g. input of shape [480, 640, 3] and shape = [240, -1] would get the shape [240, 3840]. NOTE: rel_shape and shape are mutually exclusive.
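As a rough illustration of how one negative extent is resolved for shape and rel_shape, the following pure-Python sketch reproduces the two examples given above. The function `resolve_shape` is hypothetical. It assumes that each rel_shape entry scales the input extent at the same position and that the result is rounded to an integer; the operator's own rounding may differ.

```python
from math import prod


def resolve_shape(input_shape, shape=None, rel_shape=None):
    """Resolve a requested output shape, filling in one negative extent."""
    if (shape is None) == (rel_shape is None):
        raise ValueError("provide exactly one of shape and rel_shape")
    requested = shape if shape is not None else rel_shape
    if len(requested) > len(input_shape):
        raise ValueError("output cannot have more dimensions than the input")

    if rel_shape is not None:
        # Assumption: rel_shape[i] scales input_shape[i]; rounded to int.
        resolved = [-1 if r < 0 else round(r * d)
                    for r, d in zip(rel_shape, input_shape)]
    else:
        resolved = list(shape)

    volume = prod(input_shape)
    free = [i for i, extent in enumerate(resolved) if extent < 0]
    if len(free) > 1:
        raise ValueError("at most one negative extent is allowed")
    if free:
        known = prod(extent for extent in resolved if extent >= 0)
        if known == 0 or volume % known != 0:
            raise ValueError("input volume is not divisible by the given extents")
        resolved[free[0]] = volume // known
    if prod(resolved) != volume:
        raise ValueError("output volume does not match input volume")
    return resolved


print(resolve_shape([480, 640, 3], shape=[240, -1]))      # [240, 3840]
print(resolve_shape([480, 640, 3], rel_shape=[0.5, -1]))  # [240, 3840]
```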

class nvidia.dali.ops.Resize(**kwargs)

Resize images.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image.

• interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used. Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

• mag_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up

• max_size (float or list of float, optional, default = [0.0, 0.0]) –

Maximum size of the longer dimension when resizing with resize_shorter. When set with resize_shorter, the shortest dimension will be resized to resize_shorter iff the longest dimension of the result is smaller than or equal to max_size. If not, the shortest dimension is resized to satisfy the constraint longest_dim == max_size. Can also be an array of size 2, where the two elements are the maximum size per dimension (H, W).

Example:

Original image = 400x1200.

Resized with:

• resize_shorter = 200 (max_size not set) => 200x600

• resize_shorter = 200, max_size =  400 => 132x400

• resize_shorter = 200, max_size = 1000 => 200x600

• min_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down

• minibatch_size (int, optional, default = 32) – Maximum number of images processed in a single kernel call

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• resize_longer (float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.

• save_attrs (bool, optional, default = False) – Save reshape attributes for testing.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• temp_buffer_hint (int, optional, default = 0) – Initial size, in bytes, of a temporary buffer for resampling. Ignored for the CPU variant.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments
• resize_longer (TensorList of float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (TensorList of float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (TensorList of float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (TensorList of float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.
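The interaction between resize_x, resize_y, resize_shorter, resize_longer and max_size can be sketched in plain Python. The function `resized_size` below is a hypothetical helper, not the operator's implementation. It handles only a scalar max_size and returns unrounded sizes, because how the operator rounds to whole pixels is not reproduced here.

```python
def resized_size(height, width, resize_x=0.0, resize_y=0.0,
                 resize_shorter=0.0, resize_longer=0.0, max_size=0.0):
    """Return the (height, width) targeted by the resize arguments."""
    modes = [bool(resize_x or resize_y), bool(resize_shorter), bool(resize_longer)]
    if sum(modes) != 1:
        raise ValueError(
            "use exactly one of resize_x/resize_y, resize_shorter, resize_longer")

    if resize_x or resize_y:
        # An extent left at 0 follows the aspect ratio of the input.
        out_w = resize_x or width * resize_y / height
        out_h = resize_y or height * resize_x / width
        return out_h, out_w

    shorter, longer = min(height, width), max(height, width)
    if resize_shorter:
        scale = resize_shorter / shorter
        # max_size bounds the longer dimension of the result.
        if max_size and longer * scale > max_size:
            scale = max_size / longer
    else:
        scale = resize_longer / longer
    return height * scale, width * scale


print(resized_size(400, 1200, resize_shorter=200))                 # (200.0, 600.0)
print(resized_size(400, 1200, resize_shorter=200, max_size=1000))  # (200.0, 600.0)
print(resized_size(400, 1200, resize_shorter=200, max_size=400))   # longer side capped near 400
```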

class nvidia.dali.ops.ResizeCropMirror(**kwargs)

Perform a fused resize, crop, mirror operation. Handles both fixed and random resizing and cropping.

Supported backends
• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop (float or list of float, optional, default = [0.0, 0.0]) – Shape of the cropped image, specified as a list of values (e.g. (crop_H, crop_W) for 2D crop, (crop_D, crop_H, crop_W) for volumetric crop). Providing the crop argument is incompatible with providing separate arguments crop_d, crop_h and crop_w.

• crop_d (float, optional, default = 0.0) – Volumetric inputs only: cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (float, optional, default = 0.5) – Volumetric inputs only: normalized (0.0 - 1.0) position of the cropping window along the depth axis (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type. By default same data type as the input will be used. Supported types: FLOAT, FLOAT16, and UINT8

• fill_values (float or list of float, optional, default = [0.0]) – Determines padding values, only relevant if out_of_bounds_policy is set to “pad”. If a scalar is provided, it will be used for all the channels. If multiple values are given, there should be as many values as channels (extent of dimension ‘C’ in the layout) in the output slice.

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image.

• interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.

• max_size (float or list of float, optional, default = [0.0, 0.0]) –

Maximum size of the longer dimension when resizing with resize_shorter. When set with resize_shorter, the shortest dimension will be resized to resize_shorter iff the longest dimension of the result is smaller than or equal to max_size. If not, the shortest dimension is resized to satisfy the constraint longest_dim == max_size. Can also be an array of size 2, where the two elements are the maximum size per dimension (H, W).

Example:

Original image = 400x1200.

Resized with:

• resize_shorter = 200 (max_size not set) => 200x600

• resize_shorter = 200, max_size =  400 => 132x400

• resize_shorter = 200, max_size = 1000 => 200x600

• mirror (int, optional, default = 0) –

• 0 - do not perform horizontal flip for this image

• 1 - perform horizontal flip for this image.

• out_of_bounds_policy (str, optional, default = 'error') –

Determines the policy when slicing out of bounds of the input. Supported values are:

• “error” (default): Attempting to slice outside of the bounds of the image will produce an error.

• “pad”: The input will be padded as needed with zeros or any other value specified with the fill_values argument.

• “trim_to_shape”: The slice window will be cut to the bounds of the input.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• resize_longer (float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments
• crop_d (TensorList of float, optional, default = 0.0) – Volumetric inputs only: cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (TensorList of float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (TensorList of float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (TensorList of float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (TensorList of float, optional, default = 0.5) – Volumetric inputs only: normalized (0.0 - 1.0) position of the cropping window along the depth axis (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (TensorList of float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• mirror (TensorList of int, optional, default = 0) –

• 0 - do not perform horizontal flip for this image

• 1 - perform horizontal flip for this image.

• resize_longer (TensorList of float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (TensorList of float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (TensorList of float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (TensorList of float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.
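The crop position formulas above, crop_x = crop_x_norm * (W - crop_W) and crop_y = crop_y_norm * (H - crop_H), translate directly into code. The helper `crop_origin` below is a hypothetical name used for illustration only, and it leaves any rounding to whole pixels to the caller.

```python
def crop_origin(image_hw, crop_hw, crop_pos_x=0.5, crop_pos_y=0.5):
    """Upper-left corner (y, x) of the cropping window, in pixels."""
    H, W = image_hw
    crop_H, crop_W = crop_hw
    crop_x = crop_pos_x * (W - crop_W)
    crop_y = crop_pos_y * (H - crop_H)
    return crop_y, crop_x


# The default position (0.5, 0.5) centers a 224x224 window in a 480x640 image.
print(crop_origin((480, 640), (224, 224)))  # (128.0, 208.0)
```

A position of 0.0 places the window at the top or left edge, and 1.0 places it flush with the bottom or right edge.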

class nvidia.dali.ops.Rotate(**kwargs)

Rotate the image by given angle.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• angle (float) – Angle, in degrees, by which the image is rotated. For 2D data, the rotation is counter-clockwise, assuming top-left corner at (0,0). For 3D data, the angle is a positive rotation around the given axis.

• axis (float or list of float, optional, default = []) – 3D only: axis around which to rotate. The vector does not need to be normalized, but must have non-zero length. Reversing the vector is equivalent to changing the sign of angle.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type. By default, same as input type

• fill_value (float, optional, default = 0.0) – Value used to fill areas that are outside source image. If not specified, source coordinates are clamped and the border pixel is repeated.

• interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.

• keep_size (bool, optional, default = False) – If True, original canvas size is kept. If False (default) and size is not set, then the canvas size is adjusted to accommodate the rotated image with the least padding possible.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• size (float or list of float, optional, default = []) – Output size, in pixels/points. Non-integer sizes are rounded to nearest integer. Channel dimension should be excluded (e.g. for RGB images specify (480,640), not (480,640,3)).

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC', 'DHWC')) – Input to the operator.

Keyword Arguments
• angle (TensorList of float) – Angle, in degrees, by which the image is rotated. For 2D data, the rotation is counter-clockwise, assuming top-left corner at (0,0). For 3D data, the angle is a positive rotation around the given axis.

• axis (TensorList of float, optional, default = []) – 3D only: axis around which to rotate. The vector does not need to be normalized, but must have non-zero length. Reversing the vector is equivalent to changing the sign of angle.

• size (TensorList of float, optional, default = []) – Output size, in pixels/points. Non-integer sizes are rounded to nearest integer. Channel dimension should be excluded (e.g. for RGB images specify (480,640), not (480,640,3)).
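When keep_size is False and size is not set, the canvas grows to fit the rotated image. For 2D data, the smallest axis-aligned canvas that contains a rotated rectangle follows from basic geometry, as sketched below. The function `rotated_canvas_size` is hypothetical, and the size actually chosen by the operator may differ from this geometric bound by rounding.

```python
import math


def rotated_canvas_size(height, width, angle_deg):
    """Bounding box (height, width) of a height x width image after rotation."""
    angle = math.radians(angle_deg)
    c, s = abs(math.cos(angle)), abs(math.sin(angle))
    # Project both image sides onto the output axes.
    out_h = height * c + width * s
    out_w = height * s + width * c
    return out_h, out_w


for angle in (0, 45, 90):
    h, w = rotated_canvas_size(480, 640, angle)
    print(angle, round(h, 2), round(w, 2))
```

A rotation by 90 degrees swaps the two extents (up to floating-point error), while 45 degrees yields the largest canvas for a given image.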

class nvidia.dali.ops.SSDRandomCrop(**kwargs)

Performs a random crop with bounding boxes, where the IoU meets a randomly selected threshold between 0 and 1. When the IoU falls below the threshold, a new random crop is generated, up to num_attempts times. As input, the operator accepts an image, bounding boxes and labels. It outputs the cropped image, the cropped and valid bounding boxes, and the valid labels.

Supported backends
• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• num_attempts (int, optional, default = 1) – Number of attempts.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.SSDRandomCrop() for full documentation.
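The IoU (intersection over union) that the crop is compared against can be computed as in the sketch below. The function `iou` and the (left, top, right, bottom) box layout are assumptions made for illustration; the layout expected by the operator is not specified in this section.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])

    # Width and height of the overlap are clamped at zero for disjoint boxes.
    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


crop = (0.0, 0.0, 2.0, 2.0)
bbox = (1.0, 1.0, 3.0, 3.0)
print(iou(crop, bbox))  # overlap of 1 over a union of 7
```

A candidate crop would then be accepted when this value meets the sampled threshold and regenerated otherwise.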

class nvidia.dali.ops.Saturation(**kwargs)

Changes saturation level of the image.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) – Output data type; if not set, the input type is used.

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• saturation (float, optional, default = 1.0) –

Saturation change factor. Values >= 0 are supported. For example:

• 0 - completely desaturated image

• 1 - no change to image’s saturation

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments

saturation (TensorList of float, optional, default = 1.0) –

Saturation change factor. Values >= 0 are supported. For example:

• 0 - completely desaturated image

• 1 - no change to image’s saturation
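One common way to realise a saturation factor with the properties listed above (0 gives a fully desaturated image, 1 leaves the image unchanged) is to blend each pixel with its gray value. The sketch below uses the Rec. 601 luma weights. Both the function `adjust_saturation` and this formulation are assumptions for illustration, and the operator may compute the result differently.

```python
def adjust_saturation(rgb, saturation):
    """Scale the distance of an RGB pixel from its gray value."""
    if saturation < 0:
        raise ValueError("saturation must be >= 0")
    r, g, b = rgb
    # Assumption: Rec. 601 luma weights define the gray value.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return tuple(gray + saturation * (channel - gray) for channel in rgb)


pixel = (200.0, 100.0, 50.0)
print(adjust_saturation(pixel, 0.0))  # all three channels equal the gray value
print(adjust_saturation(pixel, 1.0))  # approximately the original pixel
print(adjust_saturation(pixel, 2.0))  # colors pushed further from gray
```

Values above 1 can leave the valid range of the output type, so a real implementation would also need to clamp the result.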

class nvidia.dali.ops.SequenceReader(**kwargs)

Reads [Frame] sequences from a directory representing a collection of streams. Expects file_root to contain a set of directories, each of which represents one extracted video stream. Each extracted video stream is represented by one file per frame; sorting the frame paths lexicographically should give the original order of frames. Sequences do not cross stream boundaries, and only full sequences are considered; there is no padding.

Example directory structure:

- file_root
  - 0
    - 00001.png
    - 00002.png
    - 00003.png
    - 00004.png
    - 00005.png
    - 00006.png
    ....

  - 1
    - 00001.png
    - 00002.png
    - 00003.png
    - 00004.png
    - 00005.png
    - 00006.png
    ....


This operator allows sequence inputs.

Supported backends
• ‘cpu’

Keyword Arguments
• file_root (str) – Path to a directory containing streams (directories representing streams).

• sequence_length (int) – Length of sequence to load for each sample

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dont_use_mmap (bool, optional, default = False) – If set to true, the Loader will not attempt to map the file in memory and will use plain file I/O instead. Mapping provides a small performance benefit when accessing a local file system, but most of the network ones, due to their nature, don’t provide optimum performance

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be artificially added when the data set size is not equally divisible by the number of shards, and the shard is not equally divisible by the batch size. In the end, the shard size will be equalized between shards.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• step (int, optional, default = 1) – Distance between first frames of consecutive sequences

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• stride (int, optional, default = 1) – Distance between consecutive frames in sequence

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.
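To make the roles of sequence_length, step and stride concrete, the sketch below enumerates the sequences that one stream would yield under the rules stated above: frames are ordered lexicographically, only full sequences are kept, and nothing is padded. The function `stream_sequences` is a hypothetical illustration and is not how the reader is implemented.

```python
def stream_sequences(frame_paths, sequence_length, step=1, stride=1):
    """List the full frame sequences contained in a single stream."""
    frames = sorted(frame_paths)  # lexicographic order = original frame order
    sequences = []
    for first in range(0, len(frames), step):
        # stride is the distance between consecutive frames of a sequence.
        indices = [first + i * stride for i in range(sequence_length)]
        if indices[-1] >= len(frames):
            break  # incomplete sequence: dropped, never padded
        sequences.append([frames[i] for i in indices])
    return sequences


stream = [f"{i:05d}.png" for i in range(1, 7)]  # 00001.png ... 00006.png
for seq in stream_sequences(stream, sequence_length=3, step=2, stride=1):
    print(seq)
```

Because each stream is processed on its own, no sequence mixes frames from two directories.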

class nvidia.dali.ops.SequenceRearrange(**kwargs)

Rearrange the sequence stored as tensor. Assumes that the outermost dimension represents a sequence and other dimensions of input represent elements of that sequence. If layout is specified, the first dimension should be denoted as F indicating frames of the sequence.

This operator allows sequence inputs.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• new_order (int or list of int) – List describing new order for elements of each sample. Output sequence at position i will contain element new_order[i] from input sequence. Elements can be repeated or dropped, empty output sequences are not allowed. Only indices in [0, input_outermost_dimension) are allowed to be used in new_order. Can be specified per sample as 1D tensors.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

Keyword Arguments

new_order (TensorList of int) – List describing new order for elements of each sample. Output sequence at position i will contain element new_order[i] from input sequence. Elements can be repeated or dropped, empty output sequences are not allowed. Only indices in [0, input_outermost_dimension) are allowed to be used in new_order. Can be specified per sample as 1D tensors.
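The new_order semantics (output position i holds input element new_order[i], with repetition and dropping allowed) can be mimicked on a plain Python list. The function `rearrange` is a hypothetical stand-in that works on lists rather than tensors.

```python
def rearrange(sequence, new_order):
    """Build the output sequence so that output[i] == sequence[new_order[i]]."""
    if len(new_order) == 0:
        raise ValueError("empty output sequences are not allowed")
    length = len(sequence)
    for index in new_order:
        # Only indices in [0, input_outermost_dimension) are valid.
        if not 0 <= index < length:
            raise IndexError(f"index {index} is outside [0, {length})")
    return [sequence[index] for index in new_order]


frames = ["f0", "f1", "f2", "f3"]
print(rearrange(frames, [3, 3, 0]))  # ['f3', 'f3', 'f0']
```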

class nvidia.dali.ops.Shapes(**kwargs)

Returns the shapes of inputs.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• type (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.INT64) – Data type, to which the sizes are converted.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.

class nvidia.dali.ops.Slice(**kwargs)

Extracts a subtensor, or slice, with a given shape and anchor. Inputs must be supplied as 3 separate tensors in a specific order: data, anchor and shape. Both anchor and shape coordinates must be within the interval [0.0, 1.0] for normalized coordinates, or within the image shape for absolute coordinates. Both the anchor and shape inputs must provide as many coordinates as there are dimensions specified with the axis_names or axes arguments. By default, the Slice operator uses normalized coordinates and WH order for the slice arguments.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• axes (int or list of int, optional, default = [1, 0]) – Order of dimensions used for anchor and shape slice inputs, as dimension indexes

• axis_names (layout str, optional, default = ‘WH’) – Order of dimensions used for anchor and shape slice inputs, as described in layout. If provided, axis_names takes higher priority than axes

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type. By default same data type as the input will be used. Supported types: FLOAT, FLOAT16, and UINT8

• fill_values (float or list of float, optional, default = [0.0]) – Determines padding values, only relevant if out_of_bounds_policy is set to “pad”. If a scalar is provided, it will be used for all the channels. If multiple values are given, there should be as many values as channels (extent of dimension ‘C’ in the layout) in the output slice.

• normalized_anchor (bool, optional, default = True) – Whether or not the anchor input should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates

• normalized_shape (bool, optional, default = True) – Whether or not the shape input should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates

• out_of_bounds_policy (str, optional, default = 'error') –

Determines the policy when slicing out of bounds of the input. Supported values are:

• “error” (default): Attempting to slice outside of the bounds of the image will produce an error.

• “pad”: The input will be padded as needed with zeros or any other value specified with the fill_values argument.

• “trim_to_shape”: The slice window will be cut to the bounds of the input.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, anchor, shape, **kwargs)

Operator call to be used in define_graph step.

Parameters
• data (TensorList) – Batch containing input data

• anchor (1D TensorList of float) – Input containing either normalized or absolute coordinates (depending on the value of normalized_anchor) for the starting point of the slice (x0, x1, x2, …).

• shape (1D TensorList of float) – Input containing either normalized or absolute coordinates (depending on the value of normalized_shape) for the dimensions of the slice (s0, s1, s2, …).
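The interaction between normalized_anchor, normalized_shape and out_of_bounds_policy can be sketched for a single axis in plain Python. This is only an illustration of the semantics described above, not DALI's implementation: the helper name resolve_window is hypothetical, and rounding to the nearest integer is an assumption.

```python
def resolve_window(extent, anchor, shape, normalized=True, policy="error"):
    """Resolve a slice window along one axis (hypothetical helper).

    Returns (start, end, pad_before, pad_after), where [start, end) is the
    part of the window that lies inside the input.
    """
    # Assumption: normalized coordinates are scaled by the axis extent and
    # rounded to the nearest integer; DALI's exact rounding may differ.
    if normalized:
        start = round(anchor * extent)
        end = round((anchor + shape) * extent)
    else:
        start = round(anchor)
        end = round(anchor + shape)

    if start >= 0 and end <= extent:
        return start, end, 0, 0  # window fully inside the input

    if policy == "error":
        raise ValueError("slice window is out of bounds")
    if policy == "trim_to_shape":
        # Cut the window to the bounds of the input, no padding.
        return max(start, 0), min(end, extent), 0, 0
    if policy == "pad":
        # Keep the requested size; the missing part is filled with fill_values.
        return max(start, 0), min(end, extent), max(-start, 0), max(end - extent, 0)
    raise ValueError("unknown out_of_bounds_policy: " + repr(policy))
```

For example, a window that starts at sample 90 with length 20 on an axis of extent 100 is cut to [90, 100) with “trim_to_shape”, while “pad” keeps the same range and requests 10 padded elements after it.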

class nvidia.dali.ops.Spectrogram(**kwargs)

Produces a spectrogram from a 1D signal (e.g. audio). Input data is expected to be single channel (shape being (nsamples,), (nsamples, 1) or (1, nsamples)).

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• center_windows (bool, optional, default = True) – Indicates whether extracted windows should be padded so that window function is centered at multiples of window_step. If set to false, the signal will not be padded, that is only windows within the input range will be extracted.

• nfft (int, optional, default = -1) – Size of the FFT. The number of bins created in the output is nfft // 2 + 1 (positive part of the spectrum only).

• power (int, optional, default = 2) – Exponent of the magnitude of the spectrum. Supported values are 1 for magnitude and 2 for power.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• reflect_padding (bool, optional, default = True) – Indicates the padding policy when sampling outside the bounds of the signal. If set to true, the signal is mirrored with respect to the boundary, otherwise the signal is padded with zeros. Note: This option is ignored when center_windows is set to false.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• window_fn (float or list of float, optional, default = []) – Samples of the window function that will be multiplied to each extracted window when calculating the STFT. If provided it should be a list of floating point numbers of size window_length. If not provided, a Hann window will be used.

• window_length (int, optional, default = 512) – Window size (in number of samples)

• window_step (int, optional, default = 256) – Step between the STFT windows (in number of samples)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
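As a rough guide to the output size, the number of frequency bins follows directly from nfft (nfft // 2 + 1), and the number of windows depends on center_windows. The sketch below uses the window-count formulas that are conventional for STFT implementations; they are an assumption here and have not been checked against DALI's source. The function name spectrogram_size is hypothetical.

```python
def spectrogram_size(nsamples, nfft, window_length=512, window_step=256,
                     center_windows=True):
    """Estimate (number of frequency bins, number of windows)."""
    # Stated in the documentation above: positive part of the spectrum only.
    nbins = nfft // 2 + 1

    if center_windows:
        # Assumption: windows are centered at multiples of window_step,
        # with the signal padded at both ends.
        nwindows = nsamples // window_step + 1
    else:
        # Assumption: only windows that fit entirely in the input are used.
        nwindows = (nsamples - window_length) // window_step + 1

    return nbins, max(nwindows, 0)
```

With nsamples = 1024, nfft = 512 and the default window settings, this gives 257 bins and 5 windows when center_windows is true, and 3 windows when it is false.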

class nvidia.dali.ops.Sphere(**kwargs)

Perform a sphere augmentation.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

• interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

• mask (int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments

mask (TensorList of int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

class nvidia.dali.ops.TFRecordReader(path, index_path, features, **kwargs)

Read sample data from a TensorFlow TFRecord file.

Supported backends
• ‘cpu’

Keyword Arguments
• features (dict of (string, nvidia.dali.tfrecord.Feature)) – Dictionary of names and configuration of features existing in TFRecord file. Typically obtained using helper functions dali.tfrecord.FixedLenFeature and dali.tfrecord.VarLenFeature, they are equivalent to TensorFlow’s tf.FixedLenFeature and tf.VarLenFeature respectively. For more flexibility dali.tfrecord.VarLenFeature supports partial_shape parameter. If provided, data will be reshaped to match its value. First dimension will be inferred from the data size.

• index_path (str or list of str) – List of paths to index files (1 index file for every TFRecord file). Index files may be obtained from TFRecord files using tfrecord2idx script distributed with DALI.

• path (str or list of str) – List of paths to TFRecord files.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dont_use_mmap (bool, optional, default = False) – If set to true, the Loader will not attempt to map the file in memory and will use plain file I/O instead. Mapping provides a small performance benefit when accessing a local file system, but most of the network ones, due to their nature, don’t provide optimum performance

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be artificially added when the data set size is not equally divisible by the number of shards, and the shard is not equally divisible by the batch size. In the end, the shard size will be equalized between shards.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.
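To make num_shards and shard_id more concrete, the sketch below splits a dataset of a given size into contiguous index ranges, one per shard. The floor-based formula is an assumption about how the partitioning could be done and is not taken from this page; shard_range is a hypothetical helper.

```python
def shard_range(dataset_size, shard_id, num_shards):
    """Return the [start, end) sample range read by one shard."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    # Assumption: shard boundaries are placed at floor(size * k / num_shards),
    # so shard sizes differ by at most one sample.
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end
```

For 10 samples and 3 shards this yields the ranges [0, 3), [3, 6) and [6, 10). The unequal last shard is the situation that pad_last_batch is meant to equalize.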

class nvidia.dali.ops.ToDecibels(**kwargs)

Converts a magnitude (real, positive) to the decibel scale, according to the formula:

min_ratio = pow(10, cutoff_db / multiplier)
out[i] = multiplier * log10( max(min_ratio, input[i] / reference) )

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• cutoff_db (float, optional, default = -200.0) – Minimum or cut-off ratio in dB. Any value below this value will saturate. Example: A value of cutoff_db=-80 corresponds to a minimum ratio of 1e-8.

• multiplier (float, optional, default = 10.0) – Factor by which the logarithm is multiplied (typically 10.0 or 20.0, depending on whether the input is a squared magnitude or not).

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• reference (float, optional, default = 0.0) – Reference magnitude. If not provided, the maximum of the input will be used as reference. Note: The maximum of the input will be calculated on a per-sample basis.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
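The formula above can be written out for a single sample in plain Python. This is a reference sketch of the documented formula only, not the operator's implementation; it assumes that the default reference = 0.0 means "not provided", in which case the maximum of the sample is used, as described for the reference argument.

```python
import math


def to_decibels(values, multiplier=10.0, reference=0.0, cutoff_db=-200.0):
    """Convert one sample (a list of non-negative magnitudes) to decibels."""
    # Assumption: reference == 0.0 stands for "not provided", so the
    # per-sample maximum is used instead (it must be positive).
    ref = reference if reference != 0.0 else max(values)
    min_ratio = 10.0 ** (cutoff_db / multiplier)
    return [multiplier * math.log10(max(min_ratio, v / ref)) for v in values]


# A value of 0.0 saturates at the cut-off instead of producing -infinity.
print(to_decibels([1.0, 0.1, 0.0], cutoff_db=-80.0))
```

With multiplier = 10.0 and cutoff_db = -80.0, the three inputs map to approximately 0 dB, -10 dB and -80 dB.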

class nvidia.dali.ops.Transpose(**kwargs)

Transpose tensor by reordering the dimensions according to the perm parameter. Destination dimension i is obtained from source dimension perm[i].

For example, for a source image in HWC layout with shape = (100, 200, 3) and perm = [2, 0, 1], the Transpose operator produces a destination image in CHW layout with shape = (3, 100, 200), for which the following equality holds:

$dst(x_2, x_0, x_1) = src(x_0, x_1, x_2)$

which is equivalent to:

$dst(x_{perm[0]}, x_{perm[1]}, x_{perm[2]}) = src(x_0, x_1, x_2)$

for all valid coordinates.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• perm (int or list of int) – Permutation of the dimensions of the input (e.g. [2, 0, 1]).

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• output_layout (layout str, optional, default = ‘’) – If provided, sets output data layout, overriding any transpose_layout setting

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• transpose_layout (bool, optional, default = True) – When set to true, the output data layout will be transposed according to perm. Otherwise, the input layout is copied to the output

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList) – Input to the operator.
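The effect of perm on the shape and layout can be checked with a few lines of plain Python. The helper below is hypothetical and only mirrors the rule stated above (destination dimension i is taken from source dimension perm[i]); it does not move any data.

```python
def transposed_shape_and_layout(shape, layout, perm):
    """Apply a permutation to a shape tuple and a layout string."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of the input dimensions")
    # Destination dimension i comes from source dimension perm[i].
    new_shape = tuple(shape[p] for p in perm)
    new_layout = "".join(layout[p] for p in perm)
    return new_shape, new_layout


print(transposed_shape_and_layout((100, 200, 3), "HWC", [2, 0, 1]))
```

This reproduces the example from the operator description: an HWC image of shape (100, 200, 3) becomes a CHW image of shape (3, 100, 200).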

class nvidia.dali.ops.Uniform(**kwargs)

Produce tensor filled with uniformly distributed random numbers.

Supported backends
• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• range (float or list of float, optional, default = [-1.0, 1.0]) – Range of produced random numbers.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shape (int or list of int, optional, default = [1]) – Shape of the samples

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.

class nvidia.dali.ops.VideoReader(**kwargs)

Loads and decodes H264-encoded video using FFmpeg and NVDECODE, the hardware-accelerated video decoder of NVIDIA GPUs. The video streams can be stored in most container file formats; FFmpeg is used to parse the containers. Returns a batch of sequences of sequence_length frames, of shape [N, F, H, W, C] (N being the batch size and F the number of frames). Only constant frame rate videos are supported.

Supported backends
• ‘gpu’

Keyword Arguments
• sequence_length (int) – Frames to load per sequence.

• additional_decode_surfaces (int, optional, default = 2) – Additional decode surfaces to use beyond the minimum required. This is ignored when the decoder is not able to determine the minimum number of decode surfaces, which may happen when using an older driver. This parameter can be used to trade off memory usage against performance.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• channels (int, optional, default = 3) – Number of channels.

• dont_use_mmap (bool, optional, default = False) – If set to true, the Loader will not attempt to map the file in memory and will use plain file I/O instead. Mapping provides a small performance benefit when accessing a local file system, but most of the network ones, due to their nature, don’t provide optimum performance

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) – The data type of the output frames (supports FLOAT and UINT8).

• enable_frame_num (bool, optional, default = False) – Return frame number output if file_list or file_root argument is passed

• enable_timestamps (bool, optional, default = False) – Return timestamps output if file_list or file_root argument is passed

• file_list (str, optional, default = '') – Path to a file containing a list of (file, label) pairs. This option is mutually exclusive with filenames and file_root.

• file_list_frame_num (bool, optional, default = False) – If start/end timestamps are provided in file_list, interpret them as frame numbers instead of timestamps. If floating point values are given, the start frame number is the ceiling of the value and the end frame number is its floor. Frame numbers start from 0.

• file_root (str, optional, default = '') – Path to a directory containing data files. This option is mutually exclusive with filenames and file_list.

• filenames (str or list of str, optional, default = []) – File names of the video files to load. This option is mutually exclusive with file_root and file_list.

• image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output frames (supports RGB and YCbCr).

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• normalized (bool, optional, default = False) – Get output as normalized data.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be artificially added when the data set size is not equally divisible by the number of shards, and the shard is not equally divisible by the batch size. In the end, the shard size will be equalized between shards.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• skip_vfr_check (bool, optional, default = False) – Skips the check for variable frame rate in videos. This is useful when the heuristic fails.

• step (int, optional, default = -1) – Frame interval between each sequence (if step < 0, step is set to sequence_length).

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• stride (int, optional, default = 1) – Distance between consecutive frames in sequence.

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any TensorList inputs.
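The relationship between sequence_length, step and stride can be illustrated by listing the frame indices that would make up each sequence of a single video. The sketch assumes that only sequences fitting entirely within the video are produced; that boundary rule and the helper name sequence_frame_indices are assumptions, not taken from this page.

```python
def sequence_frame_indices(num_frames, sequence_length, step=-1, stride=1):
    """List the frame indices of every sequence taken from one video."""
    if step < 0:
        step = sequence_length  # documented default behaviour
    if step == 0:
        raise ValueError("step must not be zero")
    # Number of frames spanned by one sequence, including skipped frames.
    span = 1 + (sequence_length - 1) * stride

    sequences = []
    start = 0
    # Assumption: a sequence is emitted only if its last frame exists.
    while start + span <= num_frames:
        sequences.append([start + i * stride for i in range(sequence_length)])
        start += step
    return sequences
```

For a 10-frame video with sequence_length = 3, the defaults produce [0, 1, 2], [3, 4, 5] and [6, 7, 8]; with stride = 2 and step = 4 the result is [0, 2, 4] and [4, 6, 8].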

class nvidia.dali.ops.WarpAffine(**kwargs)

Apply an affine transformation to the image.

This operator supports volumetric data.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type. By default, same as input type

• fill_value (float, optional, default = 0.0) – Value used to fill areas that are outside source image. If not specified, source coordinates are clamped and the border pixel is repeated.

• interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.

• matrix (float or list of float, optional, default = []) –

Transform matrix (dst -> src). Given list of values (M11, M12, M13, M21, M22, M23) this operation will produce a new image using the following formula

dst(x,y) = src(M11 * x + M12 * y + M13, M21 * x + M22 * y + M23)

It is equivalent to OpenCV’s warpAffine operation with a flag WARP_INVERSE_MAP set.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• size (float or list of float, optional, default = []) – Output size, in pixels/points. Non-integer sizes are rounded to the nearest integer. The channel dimension should be excluded (e.g. for RGB images specify (480,640), not (480,640,3)).

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.WarpAffine() for full documentation.

Keyword Arguments
• matrix (TensorList of float, optional, default = []) –

Transform matrix (dst -> src). Given list of values (M11, M12, M13, M21, M22, M23) this operation will produce a new image using the following formula

dst(x,y) = src(M11 * x + M12 * y + M13, M21 * x + M22 * y + M23)

It is equivalent to OpenCV’s warpAffine operation with a flag WARP_INVERSE_MAP set.

• size (TensorList of float, optional, default = []) – Output size, in pixels/points. Non-integer sizes are rounded to the nearest integer. The channel dimension should be excluded (e.g. for RGB images specify (480,640), not (480,640,3)).
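Because the matrix maps destination coordinates to source coordinates, it can be surprising at first that a positive M13 shifts the image content to the left. The following single-channel, nearest-neighbour sketch applies the dst(x,y) formula above to a small list-of-lists image. It is not DALI's implementation: the helper name is hypothetical, the pixel-centre convention is ignored, the size is assumed to be (height, width), and out-of-range source coordinates are always replaced by fill_value.

```python
def warp_affine_nn(src, matrix, out_size, fill_value=0.0):
    """Nearest-neighbour affine warp of a 2D image given as rows of values."""
    m11, m12, m13, m21, m22, m23 = matrix
    out_h, out_w = out_size  # assumption: size is given as (height, width)
    src_h, src_w = len(src), len(src[0])

    dst = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # dst(x, y) = src(M11*x + M12*y + M13, M21*x + M22*y + M23)
            sx = round(m11 * x + m12 * y + m13)
            sy = round(m21 * x + m22 * y + m23)
            if 0 <= sx < src_w and 0 <= sy < src_h:
                row.append(src[sy][sx])
            else:
                row.append(fill_value)
        dst.append(row)
    return dst


# Identity rotation with M13 = 1: every output pixel reads one column
# to the right, so the content moves left by one pixel.
print(warp_affine_nn([[1, 2, 3], [4, 5, 6]], (1, 0, 1, 0, 1, 0), (2, 3), 0))
```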

class nvidia.dali.ops.Water(**kwargs)

Perform a water augmentation (make image appear to be underwater).

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• ampl_x (float, optional, default = 10.0) – Amplitude of the wave in x direction.

• ampl_y (float, optional, default = 10.0) – Amplitude of the wave in y direction.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

• freq_x (float, optional, default = 0.049087) – Frequency of the wave in x direction.

• freq_y (float, optional, default = 0.049087) – Frequency of the wave in y direction.

• interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

• mask (int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

• phase_x (float, optional, default = 0.0) – Phase of the wave in x direction.

• phase_y (float, optional, default = 0.0) – Phase of the wave in y direction.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (TensorList ('HWC')) – Input to the operator.

Keyword Arguments

mask (TensorList of int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

class nvidia.dali.plugin.pytorch.TorchPythonFunction(function, num_outputs=1, device='cpu', batch_processing=False, **kwargs)

Executes a function operating on Torch tensors. Analogous to PythonFunction but tensors’ data is handled as PyTorch tensors.

This operator allows sequence inputs.

This operator supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends
• ‘cpu’

• ‘gpu’

Keyword Arguments
• function (object) – Function object.

• batch_processing (bool, optional, default = False) – Whether the function should get the whole batch as input.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• num_outputs (int, optional, default = 1) – Number of outputs

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

## Arithmetic expressions

DALI allows you to use regular Python arithmetic operations within the define_graph() method on the values returned from invocations of other operators.

The expressions used will be incorporated into the Pipeline without the need to explicitly instantiate operators and will describe element-wise operations on Tensors.

At least one of the inputs must be a TensorList input that is returned by another DALI Operator. The other can be nvidia.dali.types.Constant() or a regular Python value of type bool, int or float.

As the operations performed are element-wise, the shapes of all operands must match.

Note

If one of the operands is a batch of Tensors representing scalars, the scalar values are broadcast to the other operand.

For details and examples see expressions tutorials.

### Supported arithmetic operations

Currently, DALI supports the following operations:

Unary arithmetic operators: +, -

Unary operators implementing __pos__(self) and __neg__(self). The result of a unary arithmetic operation always keeps the input type. Unary operators accept only TensorList inputs from other operators.

Return type

TensorList of the same type

Binary arithmetic operations: +, -, *, /, //

Binary operators implementing __add__, __sub__, __mul__, __truediv__ and __floordiv__ respectively.

The result type of an arithmetic operation between two operands is described below, with the exception of /, the __truediv__ operation, which always returns float32 or float64 types.

| Operand Type | Operand Type | Result Type | Condition |
|---|---|---|---|
| T | T | T | |
| floatX | T | floatX | where T is not a float |
| floatX | floatY | floatZ | where Z = max(X, Y) |
| intX | intY | intZ | where Z = max(X, Y) |
| uintX | uintY | uintZ | where Z = max(X, Y) |
| intX | uintY | int2Y | if X <= Y |
| intX | uintY | intX | if X > Y |

T stands for any one of the supported numerical types: bool, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32, float64.

bool type is considered the smallest unsigned integer type and is treated as uint1 with respect to the table above.

Note

Type promotions are commutative.

Note

The only allowed arithmetic operation between two bool values is multiplication *.

Return type

TensorList of type calculated based on type promotion rules.
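The promotion rules for binary arithmetic operations (excluding /) can be encoded as a small Python function that works on type names. This is a sketch of the rules as listed above, not DALI code. The function name promote is hypothetical, bool is treated as uint1, and capping the result at 64 bits (for example for int64 with uint64) is an assumption, since the rules do not say what happens when int2Y would exceed the supported types.

```python
import re


def _parse(name):
    """Split a type name into (kind, bits); bool is treated as uint1."""
    if name == "bool":
        return "uint", 1
    match = re.fullmatch(r"(uint|int|float)(\d+)", name)
    if match is None:
        raise ValueError("unsupported type: " + name)
    return match.group(1), int(match.group(2))


def _format(kind, bits):
    return "bool" if (kind, bits) == ("uint", 1) else kind + str(bits)


def promote(a, b):
    """Result type of a binary arithmetic operation (other than /)."""
    (kind_a, bits_a), (kind_b, bits_b) = _parse(a), _parse(b)
    if kind_a == kind_b:
        # T/T, floatX/floatY, intX/intY, uintX/uintY: take the wider type.
        return _format(kind_a, max(bits_a, bits_b))
    if kind_a == "float":
        return _format("float", bits_a)
    if kind_b == "float":
        return _format("float", bits_b)
    # Remaining case: one signed (intX) and one unsigned (uintY) operand.
    x, y = (bits_a, bits_b) if kind_a == "int" else (bits_b, bits_a)
    if x <= y:
        return _format("int", min(2 * y, 64))  # int2Y; 64-bit cap is assumed
    return _format("int", x)


print(promote("uint8", "int8"), promote("float32", "int64"))
```

For instance, uint8 with int8 gives int16, int32 with uint8 stays int32, and float32 with int64 gives float32. Swapping the arguments gives the same result, in line with the note that type promotions are commutative.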

Comparison operations: ==, !=, <, <=, >, >=

Comparison operations.

Return type

TensorList of bool type.

Bitwise binary operations: &, |, ^

The bitwise binary operations abide by the same type promotion rules as arithmetic binary operations, but their inputs are restricted to integral types (bool included).

Note

A bitwise operation can be applied to two boolean inputs. Those operations can be used to emulate element-wise logical operations on Tensors.

Return type

TensorList of type calculated based on type promotion rules.