# Supported operations

• CPU operator means that the operator can be scheduled on the CPU. Outputs of CPU operators may be used as regular inputs and to provide per-sample parameters for other operators through tensor arguments.

• GPU operator means that the operator can be scheduled on the GPU. Outputs of GPU operators may only be used as regular inputs for other GPU operators and as pipeline outputs.

• Mixed operator means that the operator accepts input on the CPU while producing the output on the GPU.

• Sequences means that the operator can produce or accept sequence (video-like) input.

• Volumetric means that the operator supports 3D data processing.

## How to read this doc

DALI Operators are used in two steps: first, create a parametrized Operator instance using its constructor; later, invoke its __call__ operator in the define_graph() method of the Pipeline.

Documentation of every DALI Operator lists Keyword Arguments supported by the class constructor.

The documentation for the __call__ operator lists the positional arguments (Parameters) and additional Keyword Arguments. __call__ should only be used in define_graph(). The inputs to the __call__ operator represent Tensors (or rather batches of Tensors) processed by DALI, which are returned by other DALI Operators.

The Keyword Arguments listed for the __call__ operator accept Tensor argument inputs. They should be produced by other ‘cpu’ Operators.

Note

The names of the positional arguments of the __call__ operator (Parameters) are for documentation purposes only and should not be used as keyword arguments.

Note

Some Keyword Arguments can be listed twice: once for the class constructor and once for the __call__ operator. This means they can be set during operator construction with Python values or driven by the output of another operator when the pipeline runs.
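The two-step pattern and the constructor-versus-call distinction can be illustrated with a toy model. This is only a sketch: `ToyBrightness` is a hypothetical plain-Python class, not part of nvidia.dali, and the choice to let the call-time value take the place of the constructor value is an assumption made for the illustration.

```python
class ToyBrightness:
    """Hypothetical stand-in for an operator; not part of nvidia.dali."""

    def __init__(self, brightness=1.0):
        # Constructor keyword argument: one Python value shared by all samples.
        self.brightness = brightness

    def __call__(self, data, brightness=None):
        # data is a batch: a list of samples, each a list of pixel values.
        # A call-time keyword argument supplies one value per sample
        # (assumption: it takes the place of the constructor value).
        if brightness is None:
            factors = [self.brightness] * len(data)
        else:
            factors = list(brightness)
        if len(factors) != len(data):
            raise ValueError("expected one brightness value per sample")
        return [[v * f for v in sample] for sample, f in zip(data, factors)]


op = ToyBrightness(brightness=2.0)              # step 1: construct
batch = [[1.0, 2.0], [3.0, 4.0]]
fixed = op(batch)                               # step 2: call on a batch
per_sample = op(batch, brightness=[0.5, 1.0])   # per-sample values at call time
```

In a real pipeline the per-sample values would come from the output of another ‘cpu’ Operator rather than from a Python list.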

## Support table

The table below lists all available operators and the devices they can operate on.

Operator name

CPU

GPU

Mixed

Sequences

Volumetric

AudioDecoder

BbFlip

BBoxPaste

BoxEncoder

Brightness

BrightnessContrast

Caffe2Reader

CaffeReader

Cast

COCOReader

CoinFlip

ColorSpaceConversion

ColorTwist

Contrast

Copy

Crop

CropMirrorNormalize

DLTensorPythonFunction

DumpImage

ElementExtract

ExternalSource

FastResizeCropMirror

FileReader

Flip

Hsv

Hue

ImageDecoder

ImageDecoderCrop

ImageDecoderRandomCrop

ImageDecoderSlice

Jitter

LookupTable

MelFilterBank

MFCC

MXNetReader

NormalDistribution

OpticalFlow

Pad

Paste

PowerSpectrum

PreemphasisFilter

PythonFunction

RandomBBoxCrop

RandomResizedCrop

Reshape

Resize

ResizeCropMirror

Rotate

Saturation

SequenceReader

Shapes

Slice

Spectrogram

Sphere

SSDRandomCrop

TFRecordReader

ToDecibels

TorchPythonFunction

Transpose

Uniform

VideoReader

WarpAffine

Water

## Operators documentation

class nvidia.dali.ops.AudioDecoder(**kwargs)

Decode audio data. This operator is a generic way of handling encoded data in DALI. It supports most well-known audio formats (wav, flac, ogg).

This operator produces two outputs:

• output[0]: batch of decoded data

• output[1]: batch of sampling rates [Hz]

Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• downmix (bool, optional, default = False) – If True, downmix all input channels to mono.

• dtype (int, optional, default = 9) – Type of the output data. Supports types: INT16, INT32, FLOAT

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• quality (float, optional, default = 50.0) – Resampling quality, 0 is lowest, 100 is highest. 0 corresponds to 3 lobes of the sinc filter; 50 gives 16 lobes and 100 gives 64 lobes.

• sample_rate (float, optional, default = 0.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments

sample_rate (Tensor of float, optional, default = 0.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.

class nvidia.dali.ops.BBoxPaste(**kwargs)

Transforms bounding boxes so that they are in the same place in the image after pasting it onto a larger canvas.

Corner coordinates:

(x', y') = (x/ratio + paste_x', y/ratio + paste_y')


Box sizes:

(w', h') = (w/ratio, h/ratio)


Where:

paste_x' = paste_x * (ratio - 1)/ratio
paste_y' = paste_y * (ratio - 1)/ratio


Paste coordinates are normalized so that (0,0) aligns the image to top-left of the canvas and (1,1) aligns it to bottom-right.

Supported backends

• ‘cpu’

Keyword Arguments
• ratio (float) – Ratio of canvas size to input size, must be > 1.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• ltrb (bool, optional, default = False) – True for two-point (ltrb) representation, False for width-height representation.

• paste_x (float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)

• paste_y (float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• ratio (Tensor of float) – Ratio of canvas size to input size, must be > 1.

• paste_x (Tensor of float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)

• paste_y (Tensor of float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)
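The corner and size formulas above can be checked with a small plain-Python sketch. The function name `paste_bbox` is hypothetical; it only applies the arithmetic from this section to one box and is not the operator's implementation.

```python
def paste_bbox(box, ratio, paste_x=0.5, paste_y=0.5, ltrb=False):
    """Apply the BBoxPaste formulas to one box in normalized coordinates."""
    if ratio <= 1:
        raise ValueError("ratio must be > 1")
    # paste_x' and paste_y' from the formulas above.
    off_x = paste_x * (ratio - 1) / ratio
    off_y = paste_y * (ratio - 1) / ratio
    a, b, c, d = box
    x, y = a / ratio + off_x, b / ratio + off_y
    if ltrb:
        # Two-point form: the second corner is transformed like the first.
        return (x, y, c / ratio + off_x, d / ratio + off_y)
    # Width-height form: sizes are only scaled.
    return (x, y, c / ratio, d / ratio)


# Image pasted in the centre of a canvas twice its size.
centred = paste_bbox((0.2, 0.4, 0.5, 0.5), ratio=2.0)
```

With `paste_x = paste_y = 1` and `ltrb=True`, a box covering the whole image maps to the bottom-right quarter of the canvas, which matches the description of the paste coordinates.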

class nvidia.dali.ops.BbFlip(**kwargs)

Flips (mirrors) bounding boxes horizontally and/or vertically. Input: bounding box coordinates in either [x, y, w, h] or [left, top, right, bottom] format. All coordinates are in the image coordinate system (i.e. 0.0-1.0).

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• horizontal (int, optional, default = 1) – Perform flip along horizontal axis.

• ltrb (bool, optional, default = False) – True for two-point (ltrb) representation, False for width-height representation.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• vertical (int, optional, default = 0) – Perform flip along vertical axis.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• horizontal (Tensor of int, optional, default = 1) – Perform flip along horizontal axis.

• vertical (Tensor of int, optional, default = 0) – Perform flip along vertical axis.
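As an illustration of what flipping a normalized box means in each representation, here is a minimal sketch. The function `flip_bbox` is hypothetical and assumes the usual mirror arithmetic in the 0.0-1.0 coordinate system; the source does not spell out the formulas.

```python
def flip_bbox(box, horizontal=1, vertical=0, ltrb=False):
    """Mirror one normalized bounding box (assumed arithmetic, not DALI code)."""
    a, b, c, d = box
    if ltrb:
        left, top, right, bottom = a, b, c, d
        if horizontal:
            # The old right edge becomes the new left edge, and vice versa.
            left, right = 1.0 - right, 1.0 - left
        if vertical:
            top, bottom = 1.0 - bottom, 1.0 - top
        return (left, top, right, bottom)
    x, y, w, h = a, b, c, d
    if horizontal:
        # Width is unchanged; only the anchor corner moves.
        x = 1.0 - (x + w)
    if vertical:
        y = 1.0 - (y + h)
    return (x, y, w, h)


flipped_xywh = flip_bbox((0.1, 0.2, 0.3, 0.4))
flipped_ltrb = flip_bbox((0.1, 0.2, 0.4, 0.6), ltrb=True)
```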

class nvidia.dali.ops.BoxEncoder(**kwargs)

Encodes input bounding boxes and labels using a set of default boxes (anchors) passed during op construction. Follows the algorithm described in https://arxiv.org/abs/1512.02325 and implemented in https://github.com/mlperf/training/tree/master/single_stage_detector/ssd

Inputs must be supplied as two Tensors: BBoxes containing bounding boxes represented as [l,t,r,b], and Labels containing the corresponding label for each bounding box.

Results are two tensors: EncodedBBoxes containing M encoded bounding boxes as [l,t,r,b], where M is the number of anchors, and EncodedLabels containing the corresponding label for each encoded box.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• anchors (float or list of float) – Anchors to be used for encoding. List of floats in ltrb format.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• criteria (float, optional, default = 0.5) – Threshold IOU for matching bounding boxes with anchors. Value between 0 and 1.

• means (float or list of float, optional, default = [0.0, 0.0, 0.0, 0.0]) – [x y w h] means for offset normalization.

• offset (bool, optional, default = False) – Returns normalized offsets ((encoded_bboxes*scale - anchors*scale) - mean) / stds in EncodedBBoxes using std, mean and scale arguments (default values are transparent).

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• scale (float, optional, default = 1.0) – Rescale the box and anchors values before offset calculation (e.g. to get back to absolute values).

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• stds (float or list of float, optional, default = [1.0, 1.0, 1.0, 1.0]) – [x y w h] standard deviations for offset normalization.

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.BoxEncoder() for full documentation.
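The role of the criteria threshold can be sketched with a simplified IoU matching loop. This is not the operator's implementation: `iou` and `match_anchors` are hypothetical helpers, the extra step in the referenced SSD code that forces every ground-truth box onto its best anchor is omitted, and giving unmatched anchors their own coordinates with a background label of 0 is an assumption.

```python
def iou(a, b):
    """Intersection over union of two [l, t, r, b] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(boxes, labels, anchors, criteria=0.5, background=0):
    """Return one (box, label) pair per anchor, so M outputs for M anchors."""
    encoded_boxes, encoded_labels = [], []
    for anchor in anchors:
        best_iou, best_idx = 0.0, -1
        for i, box in enumerate(boxes):
            value = iou(anchor, box)
            if value > best_iou:
                best_iou, best_idx = value, i
        if best_idx >= 0 and best_iou > criteria:
            # Anchor overlaps a ground-truth box strongly enough: take it over.
            encoded_boxes.append(tuple(boxes[best_idx]))
            encoded_labels.append(labels[best_idx])
        else:
            # Assumption: unmatched anchors keep their coordinates and get
            # the background label.
            encoded_boxes.append(tuple(anchor))
            encoded_labels.append(background)
    return encoded_boxes, encoded_labels


anchors = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
enc_boxes, enc_labels = match_anchors([(0.0, 0.0, 0.5, 0.4)], [3], anchors)
```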

class nvidia.dali.ops.Brightness(**kwargs)

Warning

This operator is now deprecated. Use BrightnessContrast instead.

Changes the brightness of an image

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• brightness (float, optional, default = 1.0) –

Brightness change factor. Values >= 0 are accepted. For example:

• 0 - black image,

• 1 - no change

• 2 - increase brightness twice

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• image_type (int, optional, default = 0) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments

brightness (Tensor of float, optional, default = 1.0) –

Brightness change factor. Values >= 0 are accepted. For example:

• 0 - black image,

• 1 - no change

• 2 - increase brightness twice

class nvidia.dali.ops.BrightnessContrast(**kwargs)

Adjust the brightness and contrast of the image according to the formula:

out = brightness_shift * output_range + brightness * (grey + contrast * (in - grey))


where output_range is 1 for float outputs or the maximum positive value for integral types; grey denotes the value of 0.5 for float, 128 for uint8, 16384 for int16, etc.

Additionally, this operator can change the type of data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• brightness (float, optional, default = 1.0) – Brightness multiplier; 1.0 is neutral.

• brightness_shift (float, optional, default = 0.0) – Brightness shift; 0 is neutral; for signed types, 1.0 means maximum positive value that can be represented by the type.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• contrast (float, optional, default = 1.0) – Set the contrast multiplier; 1.0 is neutral, 0.0 produces uniform grey.

• contrast_center (float, optional, default = 0.5) – Sets the intensity level that is unaffected by contrast; this is the value that all pixels assume when contrast is zero. When not set, half of the input type’s positive range (or 0.5 for float) is used.

• dtype (int, optional, default = -1) – Output data type; if not set, the input type is used.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• brightness (Tensor of float, optional, default = 1.0) – Brightness multiplier; 1.0 is neutral.

• brightness_shift (Tensor of float, optional, default = 0.0) – Brightness shift; 0 is neutral; for signed types, 1.0 means maximum positive value that can be represented by the type.

• contrast (Tensor of float, optional, default = 1.0) – Set the contrast multiplier; 1.0 is neutral, 0.0 produces uniform grey.
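The formula for this operator can be tried on a few uint8 values with a plain-Python sketch. The function name `brightness_contrast_u8` is hypothetical, and rounding followed by clamping to [0, 255] is an assumption about how out-of-range results are handled.

```python
def brightness_contrast_u8(pixels, brightness=1.0, brightness_shift=0.0, contrast=1.0):
    """Evaluate the BrightnessContrast formula for uint8 input and output."""
    output_range = 255.0   # maximum positive value for uint8
    grey = 128.0           # grey level for uint8, as stated above
    out = []
    for p in pixels:
        v = brightness_shift * output_range + brightness * (grey + contrast * (p - grey))
        # Assumption: round, then saturate to the uint8 range.
        out.append(int(min(255.0, max(0.0, round(v)))))
    return out


neutral = brightness_contrast_u8([0, 100, 255])
flat_grey = brightness_contrast_u8([0, 100, 255], contrast=0.0)
brighter = brightness_contrast_u8([100, 200], brightness=2.0)
```

With the neutral defaults the input is returned unchanged, and contrast = 0.0 collapses every pixel to the grey level, as the argument descriptions state.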

class nvidia.dali.ops.COCOReader(**kwargs)

Read data from a COCO dataset composed of a directory with images and annotation files. For each image with m bboxes, returns its bboxes as an (m,4) Tensor (m * [x, y, w, h] or m * [left, top, right, bottom]) and labels as an (m,1) Tensor (m * category_id).

Supported backends

• ‘cpu’

Keyword Arguments
• file_root (str) – Path to a directory containing data files.

• annotations_file (str, optional, default = '') – List of paths to the JSON annotations files.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dump_meta_files (bool, optional, default = False) – If true, operator will dump meta files in folder provided with dump_meta_files_path.

• dump_meta_files_path (str, optional, default = '') – Path to directory for saving meta files containing preprocessed COCO annotations.

• file_list (str, optional, default = '') – Path to the file with a list of pairs file id (leave empty to traverse the file_root directory to obtain files and labels)

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• ltrb (bool, optional, default = False) – If true, bboxes are returned as [left, top, right, bottom], else [x, y, width, height].

• masks (bool, optional, default = False) –

If true, segmentation masks are read and returned as polygons. Each mask can be one or more polygons. A polygon is a list of points (2 floats). For a given sample, the polygons are represented by two tensors:

• masks_coords-> list of (x,y) coordinates

One mask can have one or more masks_meta entries with the same mask_idx, which means that the mask for that index consists of several polygons. start_idx indicates the index of the first coords in masks_coords. Currently skips objects with iscrowd=1 annotations (RLE masks, not suitable for instance segmentation).

• meta_files_path (str, optional, default = '') – Path to directory with meta files containing preprocessed COCO annotations.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• ratio (bool, optional, default = False) – If true, bbox values are returned as ratios relative to the image width and height.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• save_img_ids (bool, optional, default = False) – If true, image IDs will also be returned.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• shuffle_after_epoch (bool, optional, default = False) – If true, reader shuffles whole dataset after each epoch.

• size_threshold (float, optional, default = 0.1) – If width or height of a bounding box representing an instance of an object is under this value, object will be skipped during reading. It is represented as absolute value.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• skip_empty (bool, optional, default = False) – If true, reader will skip samples with no object instances in them

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.
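The effect of the ltrb and ratio arguments on a single COCO box can be sketched as follows. `convert_bbox` is a hypothetical helper that only mirrors the argument descriptions above; the reader itself performs the conversion internally.

```python
def convert_bbox(box, image_width, image_height, ltrb=False, ratio=False):
    """Convert one COCO [x, y, width, height] box as described by ltrb/ratio."""
    x, y, w, h = box
    if ltrb:
        out = [x, y, x + w, y + h]   # [left, top, right, bottom]
    else:
        out = [x, y, w, h]
    if ratio:
        # Horizontal values are divided by the width, vertical by the height.
        out = [out[0] / image_width, out[1] / image_height,
               out[2] / image_width, out[3] / image_height]
    return out


absolute_ltrb = convert_bbox([10, 20, 30, 40], 100, 200, ltrb=True)
relative_ltrb = convert_bbox([10, 20, 30, 40], 100, 200, ltrb=True, ratio=True)
```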

class nvidia.dali.ops.Caffe2Reader(**kwargs)

Read sample data from a Caffe2 Lightning Memory-Mapped Database (LMDB).

Supported backends

• ‘cpu’

Keyword Arguments
• path (str or list of str) – List of paths to Caffe2 LMDB directories.

• additional_inputs (int, optional, default = 0) – Additional auxiliary data tensors provided for each sample.

• bbox (bool, optional, default = False) – Denotes if bounding-box information is present.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• image_available (bool, optional, default = True) – If image is available at all in this LMDB.

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• label_type (int, optional, default = 0) –

Type of label stored in dataset.

• 0 = SINGLE_LABEL : single integer label for multi-class classification

• 1 = MULTI_LABEL_SPARSE : sparse active label indices for multi-label classification

• 2 = MULTI_LABEL_DENSE : dense label embedding vector for label embedding regression

• 3 = MULTI_LABEL_WEIGHTED_SPARSE : sparse active label indices with per-label weights for multi-label classification.

• 4 = NO_LABEL : no label is available.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_labels (int, optional, default = 1) – Number of classes in dataset. Required when sparse labels are used.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.
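The interaction between random_shuffle and initial_fill described above (read sequentially into a buffer, then sample from it at random) can be sketched as a generic shuffle buffer. `shuffled_reader` is a hypothetical generator written for illustration; the details of the Loader's actual buffering are not given in this document.

```python
import random


def shuffled_reader(samples, initial_fill=1024, seed=0):
    """Yield samples in an order randomized through a bounded buffer."""
    rng = random.Random(seed)
    source = iter(samples)
    buffer = []
    # Sequentially read up to initial_fill samples before emitting anything.
    for sample in source:
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            break
    while buffer:
        idx = rng.randrange(len(buffer))
        picked = buffer[idx]
        try:
            # Refill the freed slot with the next sequentially read sample.
            buffer[idx] = next(source)
        except StopIteration:
            # Source exhausted: shrink the buffer until it is drained.
            buffer[idx] = buffer[-1]
            buffer.pop()
        yield picked


order = list(shuffled_reader(range(10), initial_fill=4, seed=1))
```

A larger buffer mixes samples over a wider window at the cost of memory; a buffer of size 1 degenerates to sequential reading.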

class nvidia.dali.ops.CaffeReader(**kwargs)

Read (Image, label) pairs from a Caffe LMDB.

Supported backends

• ‘cpu’

Keyword Arguments
• path (str or list of str) – List of paths to Caffe LMDB directories.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• image_available (bool, optional, default = True) – If image is available at all in this LMDB.

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• label_available (bool, optional, default = True) – If label is available at all.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.
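To make num_shards and shard_id concrete, here is a sketch of one common way to split a dataset into contiguous parts. The helper `shard_range` and its integer-division boundaries are assumptions for illustration; this document does not specify how the reader computes shard boundaries.

```python
def shard_range(num_samples, shard_id, num_shards):
    """Return the [start, end) sample indices of one shard (assumed scheme)."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    start = num_samples * shard_id // num_shards
    end = num_samples * (shard_id + 1) // num_shards
    return start, end


# Ten samples split across three readers, e.g. one per GPU.
shards = [shard_range(10, i, 3) for i in range(3)]
```

Under this scheme the shards are contiguous, cover every sample exactly once, and differ in size by at most one sample.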

class nvidia.dali.ops.Cast(**kwargs)

Cast tensor to a different type.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• dtype (nvidia.dali.types.DALIDataType) – Output data type.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

class nvidia.dali.ops.CoinFlip(**kwargs)

Produce tensor filled with 0s and 1s - results of random coin flip, usable as an argument for select ops.

Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• probability (float, optional, default = 0.5) – Probability of returning 1.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.
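What the operator produces can be pictured as one Bernoulli draw per sample. The sketch below uses Python's `random` module; `coin_flip` is a hypothetical function and does not reproduce the random number generator or seeding used by the operator.

```python
import random


def coin_flip(batch_size, probability=0.5, seed=None):
    """Return a list of 0s and 1s, where each entry is 1 with the given probability."""
    rng = random.Random(seed)
    # rng.random() is in [0.0, 1.0), so probability=1.0 always yields 1
    # and probability=0.0 always yields 0.
    return [1 if rng.random() < probability else 0 for _ in range(batch_size)]


flags = coin_flip(8, probability=0.5, seed=42)
```

Such a batch of flags is the kind of per-sample value that can drive a Tensor keyword argument of another operator, for example the horizontal argument of BbFlip.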

class nvidia.dali.ops.ColorSpaceConversion(**kwargs)

Converts between various image color models.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• image_type (nvidia.dali.types.DALIImageType) – The color space of the input image

• output_type (nvidia.dali.types.DALIImageType) – The color space of the output image

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

class nvidia.dali.ops.ColorTwist(**kwargs)

Warning

This operator is now deprecated. Use Hsv/BrightnessContrast instead.

Combination of hue, saturation, contrast and brightness.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• brightness (float, optional, default = 1.0) –

Brightness change factor. Values >= 0 are accepted. For example:

• 0 - black image,

• 1 - no change

• 2 - increase brightness twice

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• contrast (float, optional, default = 1.0) –

Contrast change factor. Values >= 0 are accepted. For example:

• 0 - gray image,

• 1 - no change

• 2 - increase contrast twice

• hue (float, optional, default = 0.0) – Hue change, in degrees.

• image_type (int, optional, default = 0) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• saturation (float, optional, default = 1.0) –

Saturation change factor. Values >= 0 are supported. For example:

• 0 - completely desaturated image

• 1 - no change to image’s saturation

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• brightness (Tensor of float, optional, default = 1.0) –

Brightness change factor. Values >= 0 are accepted. For example:

• 0 - black image,

• 1 - no change

• 2 - increase brightness twice

• contrast (Tensor of float, optional, default = 1.0) –

Contrast change factor. Values >= 0 are accepted. For example:

• 0 - gray image,

• 1 - no change

• 2 - increase contrast twice

• hue (Tensor of float, optional, default = 0.0) – Hue change, in degrees.

• saturation (Tensor of float, optional, default = 1.0) –

Saturation change factor. Values >= 0 are supported. For example:

• 0 - completely desaturated image

• 1 - no change to image’s saturation

class nvidia.dali.ops.Contrast(**kwargs)

Warning

This operator is now deprecated. Use BrightnessContrast instead.

Changes the color contrast of the image.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• contrast (float, optional, default = 1.0) –

Contrast change factor. Values >= 0 are accepted. For example:

• 0 - gray image,

• 1 - no change

• 2 - increase contrast twice

• image_type (int, optional, default = 0) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments

contrast (Tensor of float, optional, default = 1.0) –

Contrast change factor. Values >= 0 are accepted. For example:

• 0 - gray image,

• 1 - no change

• 2 - increase contrast twice

class nvidia.dali.ops.Copy(**kwargs)

Make a copy of the input tensor.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

class nvidia.dali.ops.Crop(**kwargs)

Crops an image with the given window dimensions and window position (upper-left corner).

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop (float or list of float, optional, default = [0.0, 0.0]) – Shape of the cropped image, specified as a list of values (e.g. (crop_H, crop_W) for 2D crop, (crop_D, crop_H, crop_W) for volumetric crop). Providing the crop argument is incompatible with providing the separate arguments crop_d, crop_h and crop_w.

• crop_d (float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• image_type (int, optional, default = 0) – The color space of input and output image

• output_dtype (int, optional, default = -1) – Output data type. By default same data type as the input will be used. Supported types: FLOAT, FLOAT16, and UINT8

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• crop_d (Tensor of float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (Tensor of float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (Tensor of float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (Tensor of float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (Tensor of float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (Tensor of float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).
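
The anchor formulas in the crop_pos_x, crop_pos_y and crop_pos_z descriptions can be checked with a few lines of plain Python. This is a minimal sketch, not DALI code: crop_window is a hypothetical helper, and truncating the anchor to an integer pixel index is an assumption of the sketch.

```python
def crop_window(shape, crop, crop_pos):
    """Return (start, end) pixel ranges of the crop window for each axis.

    shape    -- input extents, e.g. (H, W) or (D, H, W)
    crop     -- window extents in the same axis order
    crop_pos -- normalized anchors in [0.0, 1.0], same axis order
    """
    window = []
    for extent, size, pos in zip(shape, crop, crop_pos):
        if not 0.0 <= pos <= 1.0:
            raise ValueError("normalized position must be in [0.0, 1.0]")
        if size > extent:
            raise ValueError("crop window is larger than the input")
        # Formula from the argument descriptions:
        #   start = pos_norm * (extent - crop_extent)
        # Truncation to an integer index is an assumption of this sketch.
        start = int(pos * (extent - size))
        window.append((start, start + size))
    return window


# 480x640 image, 224x224 window, default centered anchor (0.5, 0.5).
print(crop_window((480, 640), (224, 224), (0.5, 0.5)))
# → [(128, 352), (208, 432)]
```

With an anchor of 0.0 the window starts at the first pixel of the axis, and with 1.0 it ends at the last one, which is why the default of 0.5 yields a center crop.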

class nvidia.dali.ops.CropMirrorNormalize(**kwargs)

Perform fused cropping, mirroring, normalization, optional format conversion (NHWC to NCHW), and type casting. Normalization takes the input image and produces the output using the formula:

output = (input - mean) / std


Note that if no crop argument is provided, only mirroring and normalization are performed.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop (float or list of float, optional, default = [0.0, 0.0]) – Shape of the cropped image, specified as a list of values (e.g. (crop_H, crop_W) for a 2D crop, (crop_D, crop_H, crop_W) for a volumetric crop). Providing the crop argument is incompatible with providing the separate arguments crop_d, crop_h and crop_w.

• crop_d (float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• image_type (int, optional, default = 0) – The color space of input and output image

• mean (float or list of float, optional, default = [0.0]) – Mean pixel values for image normalization.

• mirror (int, optional, default = 0) – Mask for horizontal flip: 0 - do not perform horizontal flip for this image; 1 - perform horizontal flip for this image.

• output_dtype (int, optional, default = 9) – Output data type. Supported types: FLOAT and FLOAT16

• output_layout (str, optional, default = 'CHW') – Output tensor data layout

• pad_output (bool, optional, default = False) – Whether to pad the output to number of channels being a power of 2.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• std (float or list of float, optional, default = [1.0]) – Standard deviation values for image normalization.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• crop_d (Tensor of float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (Tensor of float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (Tensor of float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (Tensor of float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (Tensor of float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (Tensor of float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• mirror (Tensor of int, optional, default = 0) – Mask for horizontal flip: 0 - do not perform horizontal flip for this image; 1 - perform horizontal flip for this image.
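
The normalization formula output = (input - mean) / std, combined with the optional horizontal flip and the HWC to CHW conversion, can be illustrated on nested Python lists. This is a minimal sketch of the arithmetic only: mirror_normalize is a hypothetical helper, and broadcasting a single mean or std value to every channel is an assumption of the sketch.

```python
def mirror_normalize(image_hwc, mean, std, mirror=0):
    """Normalize an HWC image (nested lists) and return it in CHW order."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    mean, std = list(mean), list(std)
    # A single value is broadcast to every channel (assumption of this sketch).
    if len(mean) == 1:
        mean = mean * channels
    if len(std) == 1:
        std = std * channels
    output = []
    for c in range(channels):
        plane = []
        for y in range(height):
            # output = (input - mean) / std, applied per channel
            row = [(image_hwc[y][x][c] - mean[c]) / std[c] for x in range(width)]
            if mirror:
                row.reverse()  # horizontal flip: reverse the column order
            plane.append(row)
        output.append(plane)
    return output


# One row, two pixels, three channels.
image = [[[10, 20, 30], [40, 50, 60]]]
print(mirror_normalize(image, mean=[10.0, 20.0, 30.0], std=[10.0]))
# → [[[0.0, 3.0]], [[0.0, 3.0]], [[0.0, 3.0]]]
```

The result is indexed as output[channel][row][column], which corresponds to the default output_layout of 'CHW'.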

class nvidia.dali.ops.DLTensorPythonFunction(function, num_outputs=1, device='cpu', synchronize_stream=True, batch_processing=True, **kwargs)

Execute a Python function that operates on DLPack tensors. For the GPU operator, it is the user’s responsibility to synchronize the device code with DALI. This can be accomplished by synchronizing DALI’s work before the operator call with the synchronize_stream flag (true by default) and then making sure the scheduled device tasks are finished within the operator call. Alternatively, the GPU code can be scheduled on DALI’s stream, which can be determined by calling the current_dali_stream() function. In this case, the synchronize_stream flag can be set to false.

This operator allows sequence inputs.

This operator supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• function (object) – Function object consuming and producing DLPack tensors.

• batch_processing (bool, optional, default = True) – Whether the function should get the whole batch as input.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• num_outputs (int, optional, default = 1) – Number of outputs

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• synchronize_stream (bool, optional, default = True) – Make DALI synchronize its CUDA stream before calling the python function. Should be set to false only if the called function schedules the device job to the stream used by DALI.

class nvidia.dali.ops.DumpImage(**kwargs)

Save images in batch to disk in PPM format. Useful for debugging.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• input_layout (str, optional, default = 'HWC') – Layout of input images.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• suffix (str, optional, default = '') – Suffix to be added to output file names.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

class nvidia.dali.ops.ElementExtract(**kwargs)

Extracts one or more elements from input.

This operator expects sequence inputs.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• element_map (int or list of int) – Indices of extracted elements

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

class nvidia.dali.ops.ExternalSource(**kwargs)

Allows externally provided data to be passed as an input to the pipeline; see nvidia.dali.pipeline.Pipeline.feed_input() and nvidia.dali.pipeline.Pipeline.iter_setup(). Currently this operator is not supported in TensorFlow. Note that the fed inputs should match the number of dimensions expected by the next operator in the pipeline (e.g. an operator expecting NHWC layout needs 3-dimensional per-sample tensors whose last dimension holds the channels).

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.

class nvidia.dali.ops.FastResizeCropMirror(**kwargs)

Perform a fused resize, crop, mirror operation. Handles both fixed and random resizing and cropping. Backprojects the desired crop through the resize operation to reduce the amount of work performed.

Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop (float or list of float, optional, default = [0.0, 0.0]) – Shape of the cropped image, specified as a list of values (e.g. (crop_H, crop_W) for a 2D crop, (crop_D, crop_H, crop_W) for a volumetric crop). Providing the crop argument is incompatible with providing the separate arguments crop_d, crop_h and crop_w.

• crop_d (float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• image_type (int, optional, default = 0) – The color space of input and output image

• interp_type (int, optional, default = 1) – Type of interpolation used.

• max_size (float or list of float, optional, default = [0.0, 0.0]) –

Maximum size of the longer dimension when resizing with resize_shorter. When set together with resize_shorter, the shorter dimension is resized to resize_shorter if and only if the resulting longer dimension is smaller than or equal to max_size. Otherwise, the shorter dimension is resized so that the constraint longest_dim == max_size is satisfied. Can also be an array of size 2, where the two elements are the maximum size per dimension (H, W).

Example:

Original image = 400x1200.

Resized with:

• resize_shorter = 200 (max_size not set) => 200x600

• resize_shorter = 200, max_size =  400 => 132x400

• resize_shorter = 200, max_size = 1000 => 200x600

• mirror (int, optional, default = 0) –

• 0 - do not perform horizontal flip for this image

• 1 - perform horizontal flip for this image.

• output_dtype (int, optional, default = -1) – Output data type. By default same data type as the input will be used. Supported types: FLOAT, FLOAT16, and UINT8

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• resize_longer (float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• crop_d (Tensor of float, optional, default = 0.0) – Volumetric inputs only. Cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (Tensor of float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (Tensor of float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (Tensor of float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (Tensor of float, optional, default = 0.5) – Volumetric inputs only. Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (Tensor of float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• mirror (Tensor of int, optional, default = 0) –

• 0 - do not perform horizontal flip for this image

• 1 - perform horizontal flip for this image.

• resize_longer (Tensor of float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (Tensor of float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (Tensor of float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (Tensor of float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.
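
The interaction between resize_shorter and max_size can be sketched in a few lines of plain Python. This is a minimal illustration that assumes a scalar max_size and returns unrounded sizes; resized_shape is a hypothetical helper, not part of DALI, and whatever rounding the operator applies is not modeled here.

```python
def resized_shape(height, width, resize_shorter, max_size=None):
    """Return the (height, width) after an aspect-preserving resize, as floats."""
    shorter, longer = min(height, width), max(height, width)
    # First try to bring the shorter side to resize_shorter.
    scale = resize_shorter / shorter
    if max_size is not None and longer * scale > max_size:
        # The longer side would exceed max_size, so pin it to max_size instead.
        scale = max_size / longer
    return height * scale, width * scale


# The 400x1200 example from the max_size description.
print(resized_shape(400, 1200, resize_shorter=200))
print(resized_shape(400, 1200, resize_shorter=200, max_size=1000))
print(resized_shape(400, 1200, resize_shorter=200, max_size=400))
```

The first two calls give 200 x 600, since the longer side stays within the bound. In the third call the longer side is pinned to 400, and exact arithmetic puts the shorter side at about 133.3.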

class nvidia.dali.ops.FileReader(**kwargs)

Read (image, label) pairs from a directory.

Supported backends

• ‘cpu’

Keyword Arguments
• file_root (str) – Path to a directory containing data files. FileReader supports a flat directory structure: the file_root directory should contain subdirectories with images in them. To obtain labels, FileReader sorts the directories in file_root in alphabetical order and uses the index of each directory in this order as its class label.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• file_list (str, optional, default = '') – Path to the file containing a list of file label pairs (leave empty to traverse the file_root directory to obtain files and labels).

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. A prefetch buffer of size initial_fill is filled by reading data sequentially and is then randomly sampled to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. For big files such as LMDB, RecordIO or TFRecord, this slows down the first access but decreases the time of all subsequent accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• shuffle_after_epoch (bool, optional, default = False) – If true, the reader shuffles the whole dataset after each epoch. This option is mutually exclusive with stick_to_shard and random_shuffle.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In that case, the output of the loader will be empty.

• stick_to_shard (bool, optional, default = False) – Whether the reader should stick to the given data shard instead of going through the whole dataset. When decoder caching is used, this significantly reduces the amount of data to be cached, but could affect accuracy in some cases.

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
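
The random_shuffle description (a buffer of initial_fill samples that is read sequentially and sampled randomly) matches the general shuffle-buffer technique. The following is a minimal sketch of that technique in plain Python, not the Loader's actual implementation; buffered_shuffle and its seed parameter are hypothetical names used for illustration.

```python
import random


def buffered_shuffle(samples, initial_fill, seed=0):
    """Yield samples in a randomized order using a bounded buffer."""
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        if len(buffer) < initial_fill:
            buffer.append(sample)  # sequential reads fill the buffer first
            continue
        # Emit a random buffered sample and put the newly read one in its slot.
        index = rng.randrange(len(buffer))
        yield buffer[index]
        buffer[index] = sample
    # Input exhausted: drain what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


order = list(buffered_shuffle(range(100), initial_fill=16))
print(order[:8])
```

A larger buffer mixes samples from farther apart in the dataset at the cost of more memory, while a buffer of size 1 degenerates to sequential order.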

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.
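
The labeling rule described for file_root (sort the class directories, use the index as the label) can be reproduced with the standard library. This is a minimal sketch under the assumption that Python's default string ordering stands in for "alphabetical order"; list_files_with_labels is a hypothetical helper, not a DALI function.

```python
import os
import tempfile


def list_files_with_labels(file_root):
    """Return (relative_path, label) pairs for a file_root/<class_dir>/<file> layout."""
    class_dirs = sorted(
        name for name in os.listdir(file_root)
        if os.path.isdir(os.path.join(file_root, name))
    )
    pairs = []
    for label, class_dir in enumerate(class_dirs):
        # The label is the index of the directory in sorted order.
        for file_name in sorted(os.listdir(os.path.join(file_root, class_dir))):
            pairs.append((os.path.join(class_dir, file_name), label))
    return pairs


# Build a throwaway dataset with two classes to show the resulting labels.
with tempfile.TemporaryDirectory() as root:
    for class_dir, file_name in [("dog", "a.jpg"), ("cat", "b.jpg"), ("cat", "c.jpg")]:
        os.makedirs(os.path.join(root, class_dir), exist_ok=True)
        open(os.path.join(root, class_dir, file_name), "w").close()
    example_pairs = list_files_with_labels(root)

print(example_pairs)
```

Here the files under cat receive label 0 and the file under dog receives label 1, regardless of the order in which the directories were created.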

class nvidia.dali.ops.Flip(**kwargs)

Flip the image over the horizontal and/or vertical axes.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• depthwise (int, optional, default = 0) – Perform a depthwise flip.

• horizontal (int, optional, default = 1) – Perform a horizontal flip.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• vertical (int, optional, default = 0) – Perform a vertical flip.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• depthwise (Tensor of int, optional, default = 0) – Perform a depthwise flip.

• horizontal (Tensor of int, optional, default = 1) – Perform a horizontal flip.

• vertical (Tensor of int, optional, default = 0) – Perform a vertical flip.
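
The effect of the horizontal and vertical flags, including the default of horizontal = 1, can be shown on a small 2D image stored as a list of rows. This is a minimal sketch that ignores the depthwise flag; flip is a hypothetical helper, not the DALI operator.

```python
def flip(image, horizontal=1, vertical=0):
    """Flip a 2D image given as a list of rows; returns a new nested list."""
    rows = [list(row) for row in image]
    if horizontal:
        rows = [row[::-1] for row in rows]  # reverse the columns in every row
    if vertical:
        rows = rows[::-1]  # reverse the order of the rows
    return rows


image = [[1, 2],
         [3, 4]]
print(flip(image))                            # → [[2, 1], [4, 3]]
print(flip(image, horizontal=0, vertical=1))  # → [[3, 4], [1, 2]]
print(flip(image, horizontal=1, vertical=1))  # → [[4, 3], [2, 1]]
```

Because the flags are also accepted as Tensor arguments in __call__, each sample in a batch can be flipped differently.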

class nvidia.dali.ops.Hsv(**kwargs)

This operator performs HSV manipulation. To change the hue, saturation and/or value of the image, pass the corresponding coefficients. Keep in mind that the hue argument is an additive delta, while the saturation and value arguments are multiplicative.

This operator accepts RGB color space as an input.

For performance reasons, the operation is approximated by a linear transform in RGB space. The color vector is projected along the neutral (gray) axis, rotated (according to the hue delta) and scaled according to the value and saturation multipliers, and then restored to the original color space.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (int, optional, default = 0) – Output data type; if not set, the input type is used.

• hue (float, optional, default = 0.0) – Set additive change of hue. 0 denotes no-op

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• saturation (float, optional, default = 1.0) – Set multiplicative change of saturation. 1 denotes no-op

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• value (float, optional, default = 1.0) – Set multiplicative change of value. 1 denotes no-op

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• hue (Tensor of float, optional, default = 0.0) – Set additive change of hue. 0 denotes no-op

• saturation (Tensor of float, optional, default = 1.0) – Set multiplicative change of saturation. 1 denotes no-op

• value (Tensor of float, optional, default = 1.0) – Set multiplicative change of value. 1 denotes no-op
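
One way to build a linear RGB transform of the kind described above is to rotate colors about the gray axis and scale the off-axis (chroma) part. The sketch below is an illustration under stated assumptions, not the operator's actual coefficients: it takes hue in degrees and uses the equal-weight direction (1, 1, 1) as the gray axis, and hsv_matrix and apply_matrix are hypothetical helpers.

```python
import math


def hsv_matrix(hue=0.0, saturation=1.0, value=1.0):
    """Build a 3x3 RGB matrix approximating an HSV adjustment."""
    theta = math.radians(hue)  # assumption: hue delta given in degrees
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k = sin_t / math.sqrt(3.0)
    gray = 1.0 / 3.0  # every entry of the projection onto the (1, 1, 1) axis
    matrix = []
    for i in range(3):
        row = []
        for j in range(3):
            identity = 1.0 if i == j else 0.0
            # Cross-product term of the rotation about (1, 1, 1) / sqrt(3).
            cross = 0.0 if i == j else (k if (i - j) % 3 == 1 else -k)
            rotation = cos_t * identity + (1.0 - cos_t) * gray + cross
            # The gray part is kept, the chroma part is rotated and scaled by
            # saturation, and the whole result is scaled by value.
            row.append(value * (gray + saturation * (rotation - gray)))
        matrix.append(row)
    return matrix


def apply_matrix(matrix, rgb):
    """Multiply a 3x3 matrix by an RGB vector."""
    return [sum(m * c for m, c in zip(row, rgb)) for row in matrix]


# saturation = 0 collapses every color onto the gray axis.
print(apply_matrix(hsv_matrix(saturation=0.0), [0.9, 0.3, 0.0]))
```

With the default arguments the matrix is the identity, which matches the "no-op" values listed for hue, saturation and value. Under these assumptions a hue delta of 120 degrees maps pure red to pure green.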

class nvidia.dali.ops.Hue(**kwargs)

Warning

This operator is now deprecated. Use Hsv instead.

Changes the hue level of the image.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• hue (float, optional, default = 0.0) – Hue change, in degrees.

• image_type (int, optional, default = 0) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments

hue (Tensor of float, optional, default = 0.0) – Hue change, in degrees.

class nvidia.dali.ops.ImageDecoder(**kwargs)

Decode images. For JPEG images, the implementation is based on the nvJPEG library or libjpeg-turbo, depending on the selected backend (mixed and cpu, respectively). Other image formats are decoded with OpenCV or other specific libraries (e.g. libtiff). The output of the decoder is in HWC ordering.

Supported backends

• ‘cpu’

• ‘mixed’

Keyword Arguments
• affine (bool, optional, default = True) – mixed backend only. Whether internal threads should be affined to CPU cores.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• cache_batch_copy (bool, optional, default = True) – mixed backend only. If true, multiple images from the cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless the order in the batch is the same as in the cache.

• cache_debug (bool, optional, default = False) – mixed backend only. Print debug information about the decoder cache.

• cache_size (int, optional, default = 0) – mixed backend only. Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

• cache_threshold (int, optional, default = 0) – mixed backend only. Size threshold (in bytes) for images (after decoding) to be cached.

• cache_type (str, optional, default = '') – mixed backend only. Choose the cache type. threshold: caches every image with a size bigger than cache_threshold until the cache is full; the warm-up time for the threshold policy is 1 epoch. largest: stores the largest images that can fit in the cache; the warm-up time for the largest policy is 2 epochs. To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

• device_memory_padding (int, optional, default = 16777216) – mixed backend only. Padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and the internal buffer needs to be reallocated to decode it.

• host_memory_padding (int, optional, default = 8388608) – mixed backend only. Padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and the internal buffer needs to be reallocated to decode it.

• hybrid_huffman_threshold (int, optional, default = 1000000) – mixed backend only. Images with a number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below it will use the nvJPEG full host Huffman decoder. N.B.: the hybrid Huffman decoder still uses mostly the CPU.

• output_type (int, optional, default = 0) – The color space of output image.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• split_stages (bool, optional, default = False) – mixed backend only Split into separated CPU stage and GPU stage operators

• use_chunk_allocator (bool, optional, default = False) – Experimental, mixed backend only Use the chunk pinned memory allocator, which allocates a chunk of size batch_size*prefetch_queue_depth during construction and suballocates from it at runtime. Ignored when split_stages is false.

• use_fast_idct (bool, optional, default = False) – Enables fast IDCT in CPU based decompressor when GPU implementation cannot handle given image. According to libjpeg-turbo documentation, decompression performance is improved by 4-14% with very little loss in quality.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
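The cache arguments use different units: cache_size is given in megabytes, while cache_threshold is given in bytes. The following is a minimal sketch of how the threshold policy might select images, not DALI's actual implementation. The helper name plan_threshold_cache is hypothetical, 1 MB is assumed to be 1024 * 1024 bytes, and the sketch assumes that images that no longer fit are simply skipped.

```python
def plan_threshold_cache(decoded_sizes, cache_size_mb, cache_threshold):
    """Return indices of images that a 'threshold' style cache would keep.

    decoded_sizes   -- size of each decoded image, in bytes
    cache_size_mb   -- total cache capacity, in megabytes (like cache_size)
    cache_threshold -- minimum decoded size, in bytes, for an image to be cached
    """
    capacity = cache_size_mb * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
    used = 0
    cached = []
    for index, size in enumerate(decoded_sizes):
        # Only images bigger than the threshold are candidates.
        if size > cache_threshold and used + size <= capacity:
            cached.append(index)
            used += size
    return cached


sizes = [500_000, 2_000_000, 900_000, 3_000_000]
cached_indices = plan_threshold_cache(sizes, cache_size_mb=4, cache_threshold=1_000_000)
```

Here only the second image is cached: the first and third are below the threshold, and the fourth does not fit in the remaining capacity.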

class nvidia.dali.ops.ImageDecoderCrop(**kwargs)

Decode images with a fixed cropping window size and variable anchor. When possible, will make use of partial decoding (e.g. libjpeg-turbo, nvJPEG). When not supported, will decode the whole image and then crop. Output of the decoder is in HWC ordering.

Supported backends

• ‘cpu’

• ‘mixed’

Keyword Arguments
• affine (bool, optional, default = True) – mixed backend only If internal threads should be affined to CPU cores

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop (float or list of float, optional, default = [0.0, 0.0]) – Shape of the cropped image, specified as a list of values (e.g. (crop_H, crop_W) for 2D crop, (crop_D, crop_H, crop_W) for volumetric crop). Providing the crop argument is incompatible with providing the separate arguments crop_d, crop_h and crop_w.

• crop_d (float, optional, default = 0.0) – Volumetric inputs only cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (float, optional, default = 0.5) – Volumetric inputs only Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• device_memory_padding (int, optional, default = 16777216) – mixed backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

• host_memory_padding (int, optional, default = 8388608) – mixed backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

• hybrid_huffman_threshold (int, optional, default = 1000000) – mixed backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

• output_type (int, optional, default = 0) – The color space of output image.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• split_stages (bool, optional, default = False) – mixed backend only Split into separated CPU stage and GPU stage operators

• use_chunk_allocator (bool, optional, default = False) – Experimental, mixed backend only Use the chunk pinned memory allocator, which allocates a chunk of size batch_size*prefetch_queue_depth during construction and suballocates from it at runtime. Ignored when split_stages is false.

• use_fast_idct (bool, optional, default = False) – Enables fast IDCT in CPU based decompressor when GPU implementation cannot handle given image. According to libjpeg-turbo documentation, decompression performance is improved by 4-14% with very little loss in quality.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• crop_d (Tensor of float, optional, default = 0.0) – Volumetric inputs only cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (Tensor of float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (Tensor of float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (Tensor of float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (Tensor of float, optional, default = 0.5) – Volumetric inputs only Normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (Tensor of float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).
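The anchor formulas for crop_pos_x and crop_pos_y can be checked with a few lines of plain Python. This is an illustrative sketch only: the function name crop_anchor is hypothetical, and rounding the result to whole pixels is an assumption of the sketch rather than documented behavior.

```python
def crop_anchor(image_w, image_h, crop_w, crop_h, crop_pos_x=0.5, crop_pos_y=0.5):
    """Return the (x, y) upper-left corner of the cropping window, in pixels."""
    # crop_x = crop_x_norm * (W - crop_W)
    crop_x = crop_pos_x * (image_w - crop_w)
    # crop_y = crop_y_norm * (H - crop_H)
    crop_y = crop_pos_y * (image_h - crop_h)
    # Assumption: positions are rounded to the nearest whole pixel.
    return int(round(crop_x)), int(round(crop_y))


centered = crop_anchor(640, 480, 224, 224)                # default 0.5, 0.5
top_left = crop_anchor(640, 480, 224, 224, 0.0, 0.0)
bottom_right = crop_anchor(640, 480, 224, 224, 1.0, 1.0)
```

With the default position of 0.5 the window is centered; 0.0 and 1.0 push it against the opposite borders of the image.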

class nvidia.dali.ops.ImageDecoderRandomCrop(**kwargs)

Decode images with a random cropping anchor/window. When possible, will make use of partial decoding (e.g. libjpeg-turbo, nvJPEG). When not supported, will decode the whole image and then crop. Output of the decoder is in HWC ordering.

Supported backends

• ‘cpu’

• ‘mixed’

Keyword Arguments
• affine (bool, optional, default = True) – mixed backend only If internal threads should be affined to CPU cores

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• device_memory_padding (int, optional, default = 16777216) – mixed backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

• host_memory_padding (int, optional, default = 8388608) – mixed backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

• hybrid_huffman_threshold (int, optional, default = 1000000) – mixed backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

• num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

• output_type (int, optional, default = 0) – The color space of output image.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_area (float or list of float, optional, default = [0.08, 1.0]) – Range from which to choose random area factor A. The cropped image’s area will be equal to A * original image’s area.

• random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• split_stages (bool, optional, default = False) – mixed backend only Split into separated CPU stage and GPU stage operators

• use_chunk_allocator (bool, optional, default = False) – Experimental, mixed backend only Use the chunk pinned memory allocator, which allocates a chunk of size batch_size*prefetch_queue_depth during construction and suballocates from it at runtime. Ignored when split_stages is false.

• use_fast_idct (bool, optional, default = False) – Enables fast IDCT in CPU based decompressor when GPU implementation cannot handle given image. According to libjpeg-turbo documentation, decompression performance is improved by 4-14% with very little loss in quality.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
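To illustrate how random_area, random_aspect_ratio and num_attempts interact, here is a rough sketch of one way a random cropping window could be chosen. It is not DALI's implementation: the function name random_crop_window is hypothetical, uniform sampling of both the area factor and the aspect ratio is an assumption, and so is the fallback to the full image when every attempt fails.

```python
import math
import random


def random_crop_window(width, height, random_area=(0.08, 1.0),
                       random_aspect_ratio=(0.75, 1.333333),
                       num_attempts=10, rng=None):
    """Return (x, y, crop_w, crop_h) for a random window inside the image."""
    if rng is None:
        rng = random.Random()
    for _ in range(num_attempts):
        # Area factor A: the crop covers A * (original image area).
        area = rng.uniform(*random_area) * width * height
        # Aspect ratio is width / height.
        ratio = rng.uniform(*random_aspect_ratio)
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        # Accept the attempt only if the window fits inside the image.
        if 0 < crop_w <= width and 0 < crop_h <= height:
            x = rng.randint(0, width - crop_w)
            y = rng.randint(0, height - crop_h)
            return x, y, crop_w, crop_h
    # Assumption: fall back to the whole image when no attempt succeeds.
    return 0, 0, width, height


window = random_crop_window(640, 480, rng=random.Random(0))
```

An attempt can fail when the sampled area and aspect ratio produce a window wider or taller than the image, which is why more than one attempt is allowed.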

class nvidia.dali.ops.ImageDecoderSlice(**kwargs)

Decode images on the host with a cropping window of given size and anchor.

Inputs must be supplied as 3 separate tensors in a specific order: data containing input data, anchor containing either normalized or absolute coordinates (depending on the value of normalized_anchor) for the starting point of the slice (x0, x1, x2, …), and shape containing either normalized or absolute coordinates (depending on the value of normalized_shape) for the dimensions of the slice (s0, s1, s2, …).

Both anchor and shape coordinates must be within the interval [0.0, 1.0] for normalized coordinates, or within the image shape for absolute coordinates. Both anchor and shape inputs will provide as many dimensions as specified with arguments axis_names or axes. By default ImageDecoderSlice operator uses normalized coordinates and WH order for the slice arguments.

When possible, will make use of partial decoding (e.g. libjpeg-turbo, nvJPEG). When not supported, will decode the whole image and then crop. Output of the decoder is in HWC ordering.

Supported backends

• ‘cpu’

• ‘mixed’

Keyword Arguments
• affine (bool, optional, default = True) – mixed backend only If internal threads should be affined to CPU cores

• axes (int or list of int, optional, default = [1, 0]) – Order of dimensions used for anchor and shape slice inputs, as dimension indexes

• axis_names (str, optional, default = 'WH') – Order of dimensions used for anchor and shape slice inputs, as described in layout. If provided, axis_names takes higher priority than axes

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• device_memory_padding (int, optional, default = 16777216) – mixed backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

• host_memory_padding (int, optional, default = 8388608) – mixed backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

• hybrid_huffman_threshold (int, optional, default = 1000000) – mixed backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

• normalized_anchor (bool, optional, default = True) – Whether or not the anchor input should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates

• normalized_shape (bool, optional, default = True) – Whether or not the shape input should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates

• output_type (int, optional, default = 0) – The color space of output image.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• split_stages (bool, optional, default = False) – mixed backend only Split into separated CPU stage and GPU stage operators

• use_chunk_allocator (bool, optional, default = False) – Experimental, mixed backend only Use the chunk pinned memory allocator, which allocates a chunk of size batch_size*prefetch_queue_depth during construction and suballocates from it at runtime. Ignored when split_stages is false.

• use_fast_idct (bool, optional, default = False) – Enables fast IDCT in CPU based decompressor when GPU implementation cannot handle given image. According to libjpeg-turbo documentation, decompression performance is improved by 4-14% with very little loss in quality.

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.ImageDecoderSlice() for full documentation.
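The relationship between normalized and absolute slice arguments in the default WH order can be sketched as follows. The helper slice_window is hypothetical and only mirrors the description above; rounding to whole pixels is an assumption of the sketch.

```python
def slice_window(image_w, image_h, anchor, shape,
                 normalized_anchor=True, normalized_shape=True):
    """Convert anchor and shape given in WH order into a pixel window.

    Returns (x0, y0, width, height) in absolute pixel coordinates.
    """
    anchor_x, anchor_y = anchor
    shape_w, shape_h = shape
    if normalized_anchor:
        # Normalized coordinates are fractions of the image extent.
        anchor_x, anchor_y = anchor_x * image_w, anchor_y * image_h
    if normalized_shape:
        shape_w, shape_h = shape_w * image_w, shape_h * image_h
    # Assumption: values are rounded to the nearest whole pixel.
    return (int(round(anchor_x)), int(round(anchor_y)),
            int(round(shape_w)), int(round(shape_h)))


normalized = slice_window(640, 480, anchor=(0.25, 0.5), shape=(0.5, 0.25))
absolute = slice_window(640, 480, anchor=(10, 20), shape=(100, 50),
                        normalized_anchor=False, normalized_shape=False)
```

Note that in WH order the first element of anchor and shape refers to the horizontal axis, even though the decoded output is laid out as HWC.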

class nvidia.dali.ops.Jitter(**kwargs)

Perform a random Jitter augmentation. The output image is produced by moving each pixel by a random amount bounded by half of nDegree parameter (in both x and y dimensions).

Supported backends

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

• interp_type (int, optional, default = 0) – Type of interpolation used.

• mask (int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

• nDegree (int, optional, default = 2) – Each pixel is moved by a random amount in range [-nDegree/2, nDegree/2].

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments

mask (Tensor of int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

class nvidia.dali.ops.LookupTable(**kwargs)

Maps input to output by using a lookup table specified by keys and values and a default_value for non specified keys.

keys and values are used to define the lookup table:

keys[] =   {0,     2,   3,   4,   5,    3}
values[] = {0.2, 0.4, 0.5, 0.6, 0.7, 0.10}
default_value = 0.99


yielding:

lut[] = {0.2, 0.99, 0.4, 0.10, 0.6, 0.7}  // only last occurrence of a key is considered


producing the output according to the formula:

Output[i] = lut[Input[i]]   if 0 <= Input[i] < len(lut)
Output[i] = default_value   otherwise


Example:

Input[] =  {1,      4,    1,   0,  100,   2,     3,   4}
Output[] = {0.99, 0.6, 0.99, 0.2, 0.99, 0.4,  0.10, 0.6}


Note: Only integer types can be used as input to this operator.
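The example above can be reproduced with a short pure-Python sketch. The helper names build_lut and lookup are hypothetical and exist only for illustration; they are not part of DALI.

```python
def build_lut(keys, values, default_value):
    """Build a dense table; slots without a key hold default_value."""
    lut = [default_value] * (max(keys) + 1) if keys else []
    for key, value in zip(keys, values):
        # A later occurrence of the same key overwrites the earlier one.
        lut[key] = value
    return lut


def lookup(data, lut, default_value):
    """Map every input through the table, using default_value for unknown keys."""
    return [lut[x] if 0 <= x < len(lut) else default_value for x in data]


keys = [0, 2, 3, 4, 5, 3]
values = [0.2, 0.4, 0.5, 0.6, 0.7, 0.10]
default_value = 0.99

lut = build_lut(keys, values, default_value)
output = lookup([1, 4, 1, 0, 100, 2, 3, 4], lut, default_value)
```

Key 3 appears twice in keys, so the table keeps the value from its last occurrence (0.10), and the input 100 falls outside the table and maps to default_value.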

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• default_value (float, optional, default = 0.0) – Default output value for keys not present in the table.

• keys (int or list of int, optional, default = []) – Input values (keys) present in the lookup table. The lengths of the keys and values arguments should match. keys should be in the range [0, 65535].

• output_dtype (int, optional, default = 9) – Output data type.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• values (float or list of float, optional, default = []) – mapped output values for each keys entry. Length of keys and values argument should match.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

class nvidia.dali.ops.MFCC(**kwargs)

Mel Frequency Cepstral Coefficients (MFCC). Computes MFCCs from a mel spectrogram.

Supported backends

• ‘cpu’

Keyword Arguments
• axis (int, optional, default = 0) – Axis over which the transform will be applied. If not provided, the outer-most dimension will be used.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dct_type (int, optional, default = 2) – Discrete Cosine Transform type. Supported types are: 1, 2, 3, 4. The formulas used to calculate the DCT are equivalent to those described in https://en.wikipedia.org/wiki/Discrete_cosine_transform

• lifter (float, optional, default = 0.0) –

Cepstral filtering (also known as liftering) coefficient. If lifter > 0, the MFCCs will be scaled according to the following formula:

MFCC[i] = MFCC[i] * (1 + sin(pi * (i + 1) / lifter)) * (lifter / 2)


• n_mfcc (int, optional, default = 20) – Number of MFCC coefficients

• normalize (bool, optional, default = False) – If true, the DCT will use an ortho-normal basis. Note: Normalization is not supported for dct_type=1.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
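The liftering formula given for the lifter argument can be written out directly. This is a sketch for a single vector of coefficients; the function name apply_lifter is hypothetical.

```python
import math


def apply_lifter(mfcc, lifter):
    """Scale MFCC coefficients with the liftering formula from the lifter argument."""
    if lifter <= 0:
        # Liftering is only applied when lifter > 0.
        return list(mfcc)
    return [
        coeff * (1 + math.sin(math.pi * (i + 1) / lifter)) * (lifter / 2)
        for i, coeff in enumerate(mfcc)
    ]


unchanged = apply_lifter([1.0, 2.0, 3.0], lifter=0.0)
scaled = apply_lifter([1.0, 1.0], lifter=2.0)
```

With the default lifter of 0.0 the coefficients pass through unchanged.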

class nvidia.dali.ops.MXNetReader(**kwargs)

Read sample data from a MXNet RecordIO.

Supported backends

• ‘cpu’

Keyword Arguments
• index_path (str or list of str) – List (of length 1) containing a path to index (.idx) file. It is generated by the MXNet’s im2rec.py script together with RecordIO file. It can also be generated using rec2idx script distributed with DALI.

• path (str or list of str) – List of paths to RecordIO files.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.
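The interaction between random_shuffle and initial_fill can be pictured as a shuffle buffer: data is read sequentially into a buffer of initial_fill samples, and each output sample is drawn at random from that buffer. The sketch below shows the general technique under that reading; it is not the Loader's actual code, and the generator name shuffled_reader is hypothetical.

```python
import random


def shuffled_reader(samples, initial_fill=1024, seed=None):
    """Yield samples in a randomized order using a fixed-size shuffle buffer."""
    rng = random.Random(seed)
    source = iter(samples)
    buffer = []
    # Sequentially read the first initial_fill samples into the buffer.
    for sample in source:
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            break
    while buffer:
        index = rng.randrange(len(buffer))
        picked = buffer[index]
        try:
            # Refill the freed slot with the next sequentially read sample.
            buffer[index] = next(source)
        except StopIteration:
            # Source exhausted: shrink the buffer instead.
            buffer[index] = buffer[-1]
            buffer.pop()
        yield picked


order = list(shuffled_reader(range(100), initial_fill=8, seed=1))
```

A small buffer keeps memory use low but limits how far a sample can move from its original position, which is why a larger initial_fill gives a more thorough shuffle.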

class nvidia.dali.ops.MelFilterBank(**kwargs)

Converts a Spectrogram to a mel Spectrogram using triangular filter banks. Expects an input with 2 or more dimensions where the last two dimensions correspond to the fft bin index and the window index respectively.

Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• freq_high (float, optional, default = 0.0) – Maximum frequency. If not provided, sample_rate / 2 will be used

• freq_low (float, optional, default = 0.0) – Minimum frequency

• mel_formula (str, optional, default = 'slaney') –

Determines the formula used to convert frequencies from Hertz to mel and vice versa. The mel scale is a perceptual scale of pitches, so there is no single formula for it. Supported values are:

• “slaney” - follows Slaney’s MATLAB Auditory Modelling Work behavior. This formula is linear under 1 kHz and logarithmic above. This implementation is consistent with Librosa’s default implementation.

• “htk” - follows O’Shaughnessy’s book formula m = 2595 * log10(1 + (f/700)). This is consistent with the implementation of the Hidden Markov Toolkit (HTK).

• nfilter (int, optional, default = 128) – Number of mel filters.

• normalize (bool, optional, default = True) – Whether to normalize the triangular filter weights by the width of their mel band. If set to true, the integral of the filter function will amount to 1. If set to false, the peak of the filter function will be 1

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• sample_rate (float, optional, default = 44100.0) – Sampling rate of the audio signal

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
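The “htk” conversion formula and its inverse are easy to express in code. The sketch below also derives band edges for nfilter triangular filters by spacing them evenly on the mel scale, which is the usual construction for mel filter banks but is an assumption here rather than a statement about DALI internals. All function names are hypothetical.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of the HTK formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(nfilter, freq_low=0.0, freq_high=0.0, sample_rate=44100.0):
    """Return nfilter + 2 edge frequencies (Hz), evenly spaced in mel."""
    if freq_high <= 0.0:
        # When freq_high is not provided, sample_rate / 2 is used.
        freq_high = sample_rate / 2.0
    mel_low = hz_to_mel_htk(freq_low)
    mel_high = hz_to_mel_htk(freq_high)
    step = (mel_high - mel_low) / (nfilter + 1)
    # Assumption: neighbouring triangular filters share their edges.
    return [mel_to_hz_htk(mel_low + i * step) for i in range(nfilter + 2)]


edges = mel_band_edges(nfilter=4, sample_rate=16000.0)
```

Because the spacing is uniform in mel, the edges get progressively farther apart in Hertz, so higher filters cover wider frequency bands.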

class nvidia.dali.ops.NormalDistribution(**kwargs)

Creates a tensor that consists of normally distributed data. This operator can be run in 3 modes, which determine the shape of the output tensor:

1. Providing an input batch to this operator results in a batch of output tensors that have the same shape as the input tensors.

2. Providing a custom shape as an argument results in an output batch where every tensor has the same (given) shape.

3. Providing no input arguments results in an output batch of normally distributed scalars.

Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (int, optional, default = 9) – Data type for the output

• mean (float, optional, default = 0.0) – Mean value of the distribution

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shape (int or list of int, optional, default = []) – Shape of single output tensor in a batch

• stddev (float, optional, default = 1.0) – Standard deviation of the distribution

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• mean (Tensor of float, optional, default = 0.0) – Mean value of the distribution

• stddev (Tensor of float, optional, default = 1.0) – Standard deviation of the distribution
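The three shape modes can be mimicked with nested Python lists. This sketch only illustrates how the output shape is chosen; the function normal_batch and its like_shapes parameter (a list of per-sample shapes standing in for an input batch) are hypothetical.

```python
import random


def normal_batch(batch_size, mean=0.0, stddev=1.0, shape=None,
                 like_shapes=None, seed=None):
    """Return a batch (list) of normally distributed samples."""
    rng = random.Random(seed)

    def sample(dims):
        # A scalar when no dimensions are left, otherwise a nested list.
        if not dims:
            return rng.gauss(mean, stddev)
        return [sample(dims[1:]) for _ in range(dims[0])]

    if like_shapes is not None:
        # Mode 1: each output follows the shape of the matching input sample.
        shapes = [tuple(s) for s in like_shapes]
    elif shape:
        # Mode 2: every output has the same, explicitly given shape.
        shapes = [tuple(shape)] * batch_size
    else:
        # Mode 3: a batch of scalars.
        shapes = [()] * batch_size
    return [sample(s) for s in shapes]


scalars = normal_batch(4, seed=0)
fixed = normal_batch(2, shape=[2, 3], seed=0)
like_input = normal_batch(2, like_shapes=[(1,), (4,)], seed=0)
```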

class nvidia.dali.ops.OpticalFlow(**kwargs)

Calculates the optical flow (OF) for a sequence of images given as an input. The mandatory input for the operator is a sequence of frames. As an optional input, the operator accepts external hints for the OF calculation. The output format of this operator matches the output format of the OF driver API. DALI uses the Turing optical flow hardware implementation: https://developer.nvidia.com/opticalflow-sdk

This operator allows sequence inputs.

Supported backends

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• enable_external_hints (bool, optional, default = False) – Enables or disables external hints for the OF calculation. External hints are analogous to temporal hints, except that they come from an external source. When this option is enabled, the operator requires 2 inputs.

• enable_temporal_hints (bool, optional, default = False) – Enables or disables temporal hints for sequences longer than 2 images. They are used to speed up the calculation: the previous OF result in the sequence is used to calculate the current flow. Temporal hints are useful for sequences without many changes in the scene (e.g. only moving objects).

• image_type (int, optional, default = 0) – Type of input images (RGB, BGR, GRAY)

• output_format (int, optional, default = -1) – Sets the grid size for the output vector. The value defines the width of a grid square (e.g. if value == 4, a 4x4 grid is used). For values <= 0, the grid size is undefined. Currently only a grid size of 4 is supported.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• preset (float, optional, default = 0.0) –

Setting quality level of OF calculation.

0.0f … 1.0f, where 1.0f is best quality, lowest speed

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.OpticalFlow() for full documentation.

class nvidia.dali.ops.Pad(**kwargs)

Pads all samples with fill_value along the given axes so that they match the largest extent along those axes in the batch. The axes to pad are specified with the axes argument. Supported types: int, float.

Examples:

• Batch of 3 1-D samples, fill_value = -1, axes = (0,)

input  = [{3, 4, 2, 5, 4},
          {2, 2},
          {3, 199, 5}]
output = [{3, 4, 2, 5, 4},
          {2, 2, -1, -1, -1},
          {3, 199, 5, -1, -1}]

• Batch of 2 2-D samples, fill_value = 42, axes = (1,)

input  = [{{1, 2, 3, 4},
           {5, 6, 7, 8}},
          {{1, 2},
           {4, 5}}]
output = [{{1,  2,  3,  4},
           {5,  6,  7,  8}},
          {{1,  2, 42, 42},
           {4,  5, 42, 42}}]


Supported backends

• ‘gpu’

Keyword Arguments
• axes (int or list of int, optional, default = []) – The axes on which the batch samples will be padded. Indexes are zero-based with 0 being the first axis or outermost dimension of the tensor. If axes is empty or not provided, the output will be padded on all the axes.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• fill_value (float, optional, default = 0.0) – The value to pad the batch with

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
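Both padding examples above can be reproduced with nested Python lists. The following is a reference sketch of the padding rule only (the operator itself runs on the GPU); the helper names are hypothetical.

```python
def shape_of(sample):
    """Shape of a rectangular nested list, outermost dimension first."""
    shape = []
    while isinstance(sample, list):
        shape.append(len(sample))
        sample = sample[0] if sample else None
    return shape


def full(shape, fill_value):
    """Nested list of the given shape filled with fill_value."""
    if not shape:
        return fill_value
    return [full(shape[1:], fill_value) for _ in range(shape[0])]


def pad_sample(sample, target, fill_value):
    """Pad a nested list so its extent along each axis reaches target."""
    if not target:
        return sample
    padded = [pad_sample(item, target[1:], fill_value) for item in sample]
    while len(padded) < target[0]:
        padded.append(full(target[1:], fill_value))
    return padded


def pad_batch(batch, fill_value=0, axes=()):
    """Pad every sample to the largest extent in the batch along axes."""
    shapes = [shape_of(sample) for sample in batch]
    ndim = len(shapes[0])
    # An empty axes argument means padding on all axes.
    axes = tuple(axes) or tuple(range(ndim))
    targets = [
        [max(s[a] for s in shapes) if a in axes else shp[a] for a in range(ndim)]
        for shp in shapes
    ]
    return [pad_sample(s, t, fill_value) for s, t in zip(batch, targets)]


padded_1d = pad_batch([[3, 4, 2, 5, 4], [2, 2], [3, 199, 5]], fill_value=-1, axes=(0,))
padded_2d = pad_batch([[[1, 2, 3, 4], [5, 6, 7, 8]], [[1, 2], [4, 5]]],
                      fill_value=42, axes=(1,))
```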

class nvidia.dali.ops.Paste(**kwargs)

Paste the input image on a larger canvas. The canvas size is equal to input size * ratio.

Supported backends

• ‘gpu’

Keyword Arguments
• fill_value (int or list of int) – Tuple of values of the color to fill the canvas. Length of the tuple needs to be equal to n_channels.

• ratio (float) – Ratio of canvas size to input size, must be > 1.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• min_canvas_size (float, optional, default = 0.0) – Enforce minimum paste canvas dimension after scaling input size by ratio.

• n_channels (int, optional, default = 3) – Number of channels in the image.

• paste_x (float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)

• paste_y (float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• ratio (Tensor of float) – Ratio of canvas size to input size, must be > 1.

• min_canvas_size (Tensor of float, optional, default = 0.0) – Enforce minimum paste canvas dimension after scaling input size by ratio.

• paste_x (Tensor of float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)

• paste_y (Tensor of float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)
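
The relationship between ratio, min_canvas_size, paste_x and paste_y can be sketched in plain Python. The helper paste_geometry is hypothetical, and two details are assumptions rather than documented behavior: sizes are truncated to integers, and a normalized position p maps to the offset p * (canvas - input).

```python
def paste_geometry(in_h, in_w, ratio, paste_x=0.5, paste_y=0.5,
                   min_canvas_size=0.0):
    """Return (canvas_h, canvas_w, offset_y, offset_x) for a paste."""
    if ratio <= 1:
        raise ValueError("ratio must be > 1")
    # Canvas size is input size * ratio, bounded below by min_canvas_size.
    canvas_h = max(int(in_h * ratio), int(min_canvas_size))
    canvas_w = max(int(in_w * ratio), int(min_canvas_size))
    # Assumption: 0.0 places the input at the top/left edge of the canvas,
    # 1.0 at the bottom/right edge.
    offset_y = int(paste_y * (canvas_h - in_h))
    offset_x = int(paste_x * (canvas_w - in_w))
    return canvas_h, canvas_w, offset_y, offset_x


print(paste_geometry(100, 200, ratio=2.0))  # (200, 400, 50, 100)
```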

class nvidia.dali.ops.PowerSpectrum(**kwargs)

Power spectrum of signal.

Supported backends

• ‘cpu’

Keyword Arguments
• axis (int, optional, default = -1) – Index of the dimension to be transformed to the frequency domain. By default, the last dimension is selected.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• nfft (int, optional, default = -1) – Size of the FFT. By default nfft is selected to match the length of the data in the transformation axis. The number of bins created in the output is nfft // 2 + 1 (positive part of the spectrum only).

• power (int, optional, default = 2) – Exponent of the fft magnitude: Supported values are 2 for power spectrum (real*real + imag*imag) and 1 for complex magnitude (sqrt(real*real + imag*imag)).

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
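
To make the nfft and power arguments concrete, here is a minimal pure-Python sketch that evaluates a naive DFT on a 1D signal. The function power_spectrum is a hypothetical stand-in, not the operator itself; zero-padding the signal up to nfft and rejecting a smaller nfft are assumptions of this sketch.

```python
import cmath
import math


def power_spectrum(signal, nfft=-1, power=2):
    """Naive O(n^2) spectrum of a 1D signal with nfft // 2 + 1 output bins."""
    if power not in (1, 2):
        raise ValueError("power must be 1 or 2")
    n = len(signal) if nfft == -1 else nfft
    if n < len(signal):
        raise ValueError("this sketch requires nfft >= len(signal)")
    padded = list(signal) + [0.0] * (n - len(signal))  # assumption: zero-pad
    bins = []
    for k in range(n // 2 + 1):  # positive part of the spectrum only
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(padded))
        squared = acc.real * acc.real + acc.imag * acc.imag
        bins.append(squared if power == 2 else math.sqrt(squared))
    return bins


# A constant signal has all of its energy in bin 0.
print(power_spectrum([1.0, 1.0, 1.0, 1.0]))
```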

class nvidia.dali.ops.PreemphasisFilter(**kwargs)

This operator applies a preemphasis filter to the input data. In its simple form, the filter can be expressed by the formula:

Y(t) = X(t) - X(t-1)*coeff


Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• dtype (int, optional, default = 9) – Data type for the output

• preemph_coeff (float, optional, default = 0.97) – Preemphasis coefficient coeff

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
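
The formula Y(t) = X(t) - X(t-1)*coeff can be written out as a short pure-Python sketch. The function name preemphasis is hypothetical, and the handling of the first sample (treating X(-1) as 0) is an assumption, since this document does not specify the border behavior.

```python
def preemphasis(signal, preemph_coeff=0.97):
    """Apply Y(t) = X(t) - coeff * X(t-1) to a 1D sequence of numbers."""
    output = []
    previous = 0.0  # assumption: X(-1) is treated as 0
    for sample in signal:
        output.append(sample - preemph_coeff * previous)
        previous = sample
    return output


print(preemphasis([1.0, 2.0, 3.0], preemph_coeff=0.5))  # [1.0, 1.5, 2.0]
```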

class nvidia.dali.ops.PythonFunction(function, num_outputs=1, device='cpu', batch_processing=False, **kwargs)

Executes a Python function.

This operator allows sequence inputs.

This operator supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends

• ‘cpu’

Keyword Arguments
• function (object) – Function object consuming and producing numpy arrays.

• batch_processing (bool, optional, default = False) – Whether the function should get the whole batch as input.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• num_outputs (int, optional, default = 1) – Number of outputs

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

static current_stream()

Get DALI’s current CUDA stream.

class nvidia.dali.ops.PythonFunctionBase(impl_name, function, num_outputs=1, device='cpu', **kwargs)

Supported backends

Keyword Arguments
• function (object) – Function object consuming and producing numpy arrays.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• num_outputs (int, optional, default = 1) – Number of outputs

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.

class nvidia.dali.ops.RandomBBoxCrop(**kwargs)

Perform a prospective crop to an image while keeping bounding boxes and labels consistent. Inputs must be supplied as two Tensors: BBoxes, an (m*4) Tensor in which each bounding box is represented as [l,t,r,b] or [x,y,w,h], and Labels, containing the corresponding label for each bounding box. The resulting prospective crop is provided as two Tensors: Begin, containing the starting coordinates of the crop in (x,y) format, and Size, containing the dimensions of the crop in (w,h) format. The resulting labels match the boxes that remain after boxes below the minimum accepted intersection threshold are discarded. Note that when allow_no_crop is false and thresholds does not contain 0, it is advisable to increase num_attempts, as the operator may otherwise loop for a very long time.

Supported backends

• ‘cpu’

Keyword Arguments
• allow_no_crop (bool, optional, default = True) – If true, includes no cropping as one of the random options.

• aspect_ratio (float or list of float, optional, default = [1.0, 1.0]) – Range [min, max] of valid aspect ratio values for new crops. Value for min should be greater than or equal to 0.0. Default values disallow changes in aspect ratio.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• ltrb (bool, optional, default = True) – If true, bboxes are returned as [left, top, right, bottom], else [x, y, width, height].

• num_attempts (int, optional, default = 1) – Number of attempts to retrieve a patch with the desired parameters.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• scaling (float or list of float, optional, default = [1.0, 1.0]) – Range [min, max] for crop size with respect to original image dimensions. Value for min should be greater than or equal to 0.0.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• thresholds (float or list of float, optional, default = [0.0]) – Minimum overlap (Intersection over union) of the bounding boxes with respect to the prospective crop. Selected at random for every sample from provided values. Default imposes no restrictions on Intersection over Union for boxes and crop.

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.RandomBBoxCrop() for full documentation.
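
The thresholds argument is expressed in terms of Intersection over Union (IoU) between each bounding box and the prospective crop. The sketch below shows that computation in plain Python for boxes in [l,t,r,b] form. The helper names are hypothetical, and the acceptance rule in crop_is_valid (every box must reach the threshold) is an assumption of this sketch, not a statement about the operator's internals.

```python
def xywh_to_ltrb(box):
    """Convert [x, y, w, h] to [left, top, right, bottom]."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def iou(a, b):
    """Intersection over Union of two boxes given as (l, t, r, b)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def crop_is_valid(crop, boxes, threshold):
    # Assumption: a crop is accepted only if all boxes meet the threshold.
    return all(iou(crop, box) >= threshold for box in boxes)


crop = (0.0, 0.0, 0.5, 1.0)
print(iou(crop, (0.0, 0.0, 1.0, 1.0)))  # 0.5
```

With threshold 0.0 every crop is accepted, which matches the note that the default thresholds value imposes no restriction.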

class nvidia.dali.ops.RandomResizedCrop(**kwargs)

Perform a crop with randomly chosen area and aspect ratio, then resize it to given size. Expects a 3-dimensional input with samples in HWC layout (height, width, channels).

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• size (int or list of int) – Size of resized image.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• interp_type (int, optional, default = 1) – Type of interpolation used. Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

• mag_filter (int, optional, default = 1) – Filter used when scaling up

• min_filter (int, optional, default = 1) – Filter used when scaling down

• minibatch_size (int, optional, default = 32) – Maximum number of images processed in a single kernel call

• num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_area (float or list of float, optional, default = [0.08, 1.0]) – Range from which to choose random area factor A. The cropped image’s area will be equal to A * original image’s area.

• random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• temp_buffer_hint (int, optional, default = 0) – Initial size, in bytes, of a temporary buffer for resampling. Ignored for CPU variant.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
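
The interplay of random_area, random_aspect_ratio and num_attempts can be sketched as a rejection-sampling loop. The function sample_crop below is hypothetical; uniform sampling of both quantities and the fallback to the full image after num_attempts failures are assumptions of this sketch, and the operator may sample differently.

```python
import math
import random


def sample_crop(height, width, random_area=(0.08, 1.0),
                random_aspect_ratio=(0.75, 1.333333), num_attempts=10,
                rng=None):
    """Return (x, y, crop_w, crop_h) of a random crop window."""
    rng = rng or random.Random()
    for _ in range(num_attempts):
        area = rng.uniform(*random_area) * height * width  # A * image area
        ratio = rng.uniform(*random_aspect_ratio)          # width / height
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            x = rng.randint(0, width - crop_w)
            y = rng.randint(0, height - crop_h)
            return x, y, crop_w, crop_h
    # Assumption: fall back to the whole image if no attempt fits.
    return 0, 0, width, height


print(sample_crop(480, 640, rng=random.Random(0)))
```

The selected window would then be resized to the requested size.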

class nvidia.dali.ops.Reshape(**kwargs)

Treats content of the input as if it had a different shape and layout.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• layout (str, optional, default = '') – New layout for the data. If not specified, the output layout is preserved if the number of dimensions matches the existing layout, or reset to empty otherwise.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shape (int or list of int, optional, default = []) – The desired shape of the output. Number of elements in each sample must match that of the input sample.

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.Reshape() for full documentation.

Keyword Arguments

shape (Tensor of int, optional, default = []) – The desired shape of the output. Number of elements in each sample must match that of the input sample.
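
The constraint that the number of elements must match can be illustrated with a small pure-Python sketch that reinterprets a flat list under a new shape. The helpers volume and reshape_flat are hypothetical, and row-major ordering is an assumption of this sketch.

```python
from functools import reduce
from operator import mul


def volume(shape):
    """Number of elements implied by a shape."""
    return reduce(mul, shape, 1)


def reshape_flat(flat, shape):
    """Nest a flat list according to shape (row-major order assumed)."""
    if volume(shape) != len(flat):
        raise ValueError("number of elements must match the input sample")
    if len(shape) <= 1:
        return list(flat)
    step = volume(shape[1:])  # elements in each slice along the first axis
    return [reshape_flat(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


print(reshape_flat(list(range(6)), (2, 3)))  # [[0, 1, 2], [3, 4, 5]]
```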

class nvidia.dali.ops.Resize(**kwargs)

Resize images.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• image_type (int, optional, default = 0) – The color space of input and output image.

• interp_type (int, optional, default = 1) – Type of interpolation used. Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

• mag_filter (int, optional, default = 1) – Filter used when scaling up

• max_size (float or list of float, optional, default = [0.0, 0.0]) –

Maximum size of the longer dimension when resizing with resize_shorter. When set with resize_shorter, the shortest dimension will be resized to resize_shorter if and only if the longest dimension is smaller than or equal to max_size. If not, the shortest dimension is resized to satisfy the constraint longest_dim == max_size. Can also be an array of size 2, where the two elements are the maximum size per dimension (H, W).

Example:

Original image = 400x1200.

Resized with:

• resize_shorter = 200 (max_size not set) => 200x600

• resize_shorter = 200, max_size =  400 => 132x400

• resize_shorter = 200, max_size = 1000 => 200x600

• min_filter (int, optional, default = 1) – Filter used when scaling down

• minibatch_size (int, optional, default = 32) – Maximum number of images processed in a single kernel call

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• resize_longer (float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.

• save_attrs (bool, optional, default = False) – Save reshape attributes for testing.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• temp_buffer_hint (int, optional, default = 0) – Initial size, in bytes, of a temporary buffer for resampling. Ignored for CPU variant.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• resize_longer (Tensor of float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (Tensor of float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (Tensor of float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (Tensor of float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.
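
The interaction between resize_shorter and a scalar max_size can be sketched as follows. The function resize_shorter_size is hypothetical and handles only the scalar form of max_size; rounding to the nearest integer is an assumption, since the exact rounding used by the operator is not stated here, so the shorter side computed this way may differ by a pixel from values quoted elsewhere in this reference.

```python
def resize_shorter_size(height, width, resize_shorter, max_size=0.0):
    """Return (new_height, new_width), keeping the aspect ratio."""
    shorter, longer = min(height, width), max(height, width)
    scale = resize_shorter / shorter
    # If the longer side would exceed max_size, scale so that
    # longest_dim == max_size instead.
    if max_size > 0 and longer * scale > max_size:
        scale = max_size / longer
    # Assumption: sizes are rounded to the nearest integer.
    return round(height * scale), round(width * scale)


print(resize_shorter_size(400, 1200, 200))                 # (200, 600)
print(resize_shorter_size(400, 1200, 200, max_size=1000))  # (200, 600)
print(resize_shorter_size(400, 1200, 200, max_size=400))   # longer side is 400
```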

class nvidia.dali.ops.ResizeCropMirror(**kwargs)

Perform a fused resize, crop, mirror operation. Handles both fixed and random resizing and cropping.

Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• crop (float or list of float, optional, default = [0.0, 0.0]) – Shape of the cropped image, specified as a list of values (e.g. (crop_H, crop_W) for 2D crop, (crop_D, crop_H, crop_W) for volumetric crop). Providing crop argument is incompatible with providing separate arguments crop_d, crop_h and crop_w.

• crop_d (float, optional, default = 0.0) – Volumetric inputs only: cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (float, optional, default = 0.5) – Volumetric inputs only: normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• image_type (int, optional, default = 0) – The color space of input and output image

• interp_type (int, optional, default = 1) – Type of interpolation used.

• max_size (float or list of float, optional, default = [0.0, 0.0]) –

Maximum size of the longer dimension when resizing with resize_shorter. When set with resize_shorter, the shortest dimension will be resized to resize_shorter if and only if the longest dimension is smaller than or equal to max_size. If not, the shortest dimension is resized to satisfy the constraint longest_dim == max_size. Can also be an array of size 2, where the two elements are the maximum size per dimension (H, W).

Example:

Original image = 400x1200.

Resized with:

• resize_shorter = 200 (max_size not set) => 200x600

• resize_shorter = 200, max_size =  400 => 132x400

• resize_shorter = 200, max_size = 1000 => 200x600

• mirror (int, optional, default = 0) –

• 0 - do not perform horizontal flip for this image

• 1 - perform horizontal flip for this image.

• output_dtype (int, optional, default = -1) – Output data type. By default same data type as the input will be used. Supported types: FLOAT, FLOAT16, and UINT8

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• resize_longer (float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• crop_d (Tensor of float, optional, default = 0.0) – Volumetric inputs only: cropping window depth (in pixels). If provided, crop_h and crop_w should be provided as well. Providing crop_w, crop_h, crop_d is incompatible with providing fixed crop window dimensions (argument crop).

• crop_h (Tensor of float, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• crop_pos_x (Tensor of float, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

• crop_pos_y (Tensor of float, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

• crop_pos_z (Tensor of float, optional, default = 0.5) – Volumetric inputs only: normalized (0.0 - 1.0) normal position of the cropping window (front plane). Actual position is calculated as crop_z = crop_z_norm * (D - crop_d), where crop_z_norm is the normalized position, D is the depth of the image and crop_d is the depth of the cropping window.

• crop_w (Tensor of float, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

• mirror (Tensor of int, optional, default = 0) –

• 0 - do not perform horizontal flip for this image

• 1 - perform horizontal flip for this image.

• resize_longer (Tensor of float, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

• resize_shorter (Tensor of float, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

• resize_x (Tensor of float, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

• resize_y (Tensor of float, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.
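
The crop position formulas crop_x = crop_x_norm * (W - crop_W) and crop_y = crop_y_norm * (H - crop_H), together with the mirror flag, can be sketched on a 2D image stored as a list of rows. The function crop_and_mirror is hypothetical, omits the resize step, and truncates the computed position to an integer, which is an assumption of this sketch.

```python
def crop_and_mirror(image, crop_h, crop_w, crop_pos_x=0.5, crop_pos_y=0.5,
                    mirror=0):
    """Crop a window from a 2D image (list of rows), optionally flipping it."""
    H, W = len(image), len(image[0])
    if crop_h > H or crop_w > W:
        raise ValueError("cropping window is larger than the image")
    # Upper-left corner of the window, from the normalized position.
    crop_x = int(crop_pos_x * (W - crop_w))  # assumption: truncated
    crop_y = int(crop_pos_y * (H - crop_h))
    rows = [row[crop_x:crop_x + crop_w]
            for row in image[crop_y:crop_y + crop_h]]
    if mirror:
        rows = [row[::-1] for row in rows]  # horizontal flip
    return rows


image = [[r * 10 + c for c in range(6)] for r in range(4)]  # 4 rows, 6 columns
print(crop_and_mirror(image, 2, 2))            # [[12, 13], [22, 23]]
print(crop_and_mirror(image, 2, 2, mirror=1))  # [[13, 12], [23, 22]]
```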

class nvidia.dali.ops.Rotate(**kwargs)

Rotate the image by given angle.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• angle (float) – Angle, in degrees, by which the image is rotated. For 2D data, the rotation is counter-clockwise, assuming top-left corner at (0,0). For 3D data, the angle is a positive rotation around the given axis.

• axis (float or list of float, optional, default = []) – 3D only: axis around which to rotate. The vector does not need to be normalized, but must have non-zero length. Reversing the vector is equivalent to changing the sign of angle.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• fill_value (float, optional, default = 0.0) – Value used to fill areas that are outside source image. If not specified, source coordinates are clamped and the border pixel is repeated.

• interp_type (int, optional, default = 1) – Type of interpolation used.

• keep_size (bool, optional, default = False) – If True, original canvas size is kept. If False (default) and size is not set, then the canvas size is adjusted to accommodate the rotated image with the least padding possible.

• output_dtype (int, optional, default = -1) – Output data type. By default, same as input type

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• size (float or list of float, optional, default = []) – Output size, in pixels/points. Non-integer sizes are rounded to nearest integer. Channel dimension should be excluded (e.g. for RGB images specify (480,640), not (480,640,3)).

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments
• angle (Tensor of float) – Angle, in degrees, by which the image is rotated. For 2D data, the rotation is counter-clockwise, assuming top-left corner at (0,0). For 3D data, the angle is a positive rotation around the given axis.

• axis (Tensor of float, optional, default = []) – 3D only: axis around which to rotate. The vector does not need to be normalized, but must have non-zero length. Reversing the vector is equivalent to changing the sign of angle.

• size (Tensor of float, optional, default = []) – Output size, in pixels/points. Non-integer sizes are rounded to nearest integer. Channel dimension should be excluded (e.g. for RGB images specify (480,640), not (480,640,3)).
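
When keep_size is False and size is not set, the canvas grows to fit the rotated image. For 2D data, the smallest axis-aligned canvas can be sketched with basic trigonometry. The function rotated_canvas_size is hypothetical, and rounding to the nearest integer is an assumption of this sketch rather than documented behavior for this case.

```python
import math


def rotated_canvas_size(height, width, angle):
    """Return (new_height, new_width) of the tightest canvas holding a
    height x width image rotated by `angle` degrees."""
    rad = math.radians(angle)
    cos_a, sin_a = abs(math.cos(rad)), abs(math.sin(rad))
    new_height = height * cos_a + width * sin_a
    new_width = height * sin_a + width * cos_a
    # Assumption: non-integer sizes are rounded to the nearest integer.
    return round(new_height), round(new_width)


print(rotated_canvas_size(480, 640, 90))   # (640, 480)
print(rotated_canvas_size(100, 100, 45))   # (141, 141)
```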

class nvidia.dali.ops.SSDRandomCrop(**kwargs)

Perform a random crop with bounding boxes where the IoU meets a randomly selected threshold between 0 and 1. When the IoU falls below the threshold, a new random crop is generated, up to num_attempts times. As input, it accepts an image, bounding boxes and labels. As output, it returns the cropped image, the cropped and valid bounding boxes, and the valid labels.

Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• num_attempts (int, optional, default = 1) – Number of attempts.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.SSDRandomCrop() for full documentation.

class nvidia.dali.ops.Saturation(**kwargs)

Warning

This operator is now deprecated. Use Hsv instead.

Changes saturation level of the image.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• image_type (int, optional, default = 0) – The color space of input and output image

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• saturation (float, optional, default = 1.0) –

Saturation change factor. Values >= 0 are supported. For example:

• 0 - completely desaturated image

• 1 - no change to image’s saturation

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments

saturation (Tensor of float, optional, default = 1.0) –

Saturation change factor. Values >= 0 are supported. For example:

• 0 - completely desaturated image

• 1 - no change to image’s saturation
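
The meaning of the saturation factor (0 gives a completely desaturated image, 1 leaves the image unchanged) can be illustrated on a single RGB pixel. This sketch is not the operator's algorithm: the function adjust_saturation is hypothetical, and both the BT.601 luma weights and the linear blend between the gray value and the original color are assumptions chosen for illustration.

```python
def adjust_saturation(pixel, saturation):
    """Scale the saturation of one (r, g, b) pixel with values in 0..255."""
    if saturation < 0:
        raise ValueError("saturation must be >= 0")
    r, g, b = pixel
    # Assumption: gray level computed with BT.601 luma weights.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # Blend between gray (factor 0) and the original color (factor 1),
    # clamping the result to the valid range.
    return tuple(min(255.0, max(0.0, gray + saturation * (c - gray)))
                 for c in pixel)


print(adjust_saturation((200, 100, 50), 0.0))  # all channels equal (gray)
print(adjust_saturation((200, 100, 50), 1.0))  # approximately the input
```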

class nvidia.dali.ops.SequenceReader(**kwargs)

Read [Frame] sequences from a directory representing a collection of streams. Expects file_root to contain a set of directories, each of which represents one extracted video stream. An extracted video stream is represented by one file per frame; sorting the frame paths lexicographically should give the original frame order. Sequences do not cross stream boundaries and only full sequences are considered; there is no padding.

Example directory structure:

- file_root
  - 0
    - 00001.png
    - 00002.png
    - 00003.png
    - 00004.png
    - 00005.png
    - 00006.png
    ....

  - 1
    - 00001.png
    - 00002.png
    - 00003.png
    - 00004.png
    - 00005.png
    - 00006.png
    ....


This operator allows sequence inputs.

Supported backends

• ‘cpu’

Keyword Arguments
• file_root (str) – Path to a directory containing streams (directories representing streams).

• sequence_length (int) – Length of sequence to load for each sample

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• image_type (int, optional, default = 0) – The color space of input and output image

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• step (int, optional, default = 1) – Distance between first frames of consecutive sequences

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• stride (int, optional, default = 1) – Distance between consecutive frames in sequence

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.
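
How sequence_length, step and stride select frames within one stream can be sketched in plain Python. The function enumerate_sequences is hypothetical and works on a list of frame file names for a single stream; it follows the description above (lexicographic ordering, full sequences only, no padding) but is not the reader's implementation.

```python
def enumerate_sequences(frames, sequence_length, step=1, stride=1):
    """List all full sequences of frames from a single stream."""
    frames = sorted(frames)  # lexicographic order restores frame order
    # Number of consecutive frames covered by one sequence.
    span = (sequence_length - 1) * stride + 1
    return [frames[start:start + span:stride]
            for start in range(0, len(frames) - span + 1, step)]


frames = ["%05d.png" % i for i in range(1, 7)]  # 00001.png .. 00006.png
for sequence in enumerate_sequences(frames, sequence_length=3, stride=2):
    print(sequence)
# ['00001.png', '00003.png', '00005.png']
# ['00002.png', '00004.png', '00006.png']
```

Because only full sequences are produced, a stream shorter than the span of one sequence yields no samples.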

class nvidia.dali.ops.Shapes(**kwargs)

Returns the shapes of inputs.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• type (int, optional, default = 7) – Data type, to which the sizes are converted.

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

class nvidia.dali.ops.Slice(**kwargs)

Extracts a subtensor (slice) with a given shape and anchor. Inputs must be supplied as 3 separate tensors in a specific order: data, anchor and shape. Both anchor and shape coordinates must be within the interval [0.0, 1.0] for normalized coordinates, or within the image shape for absolute coordinates. The anchor and shape inputs must each provide as many values as there are dimensions specified with the axis_names or axes arguments. By default, the Slice operator uses normalized coordinates and WH order for the slice arguments.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• axes (int or list of int, optional, default = [1, 0]) – Order of dimensions used for anchor and shape slice inputs, as dimension indexes

• axis_names (str, optional, default = 'WH') – Order of dimensions used for anchor and shape slice inputs, as described in layout. If provided, axis_names takes higher priority than axes

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• image_type (int, optional, default = 0) – The color space of input and output image

• normalized_anchor (bool, optional, default = True) – Whether or not the anchor input should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates

• normalized_shape (bool, optional, default = True) – Whether or not the shape input should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates

• output_dtype (int, optional, default = -1) – Output data type. By default same data type as the input will be used. Supported types: FLOAT, FLOAT16, and UINT8

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, anchor, shape, **kwargs)

Operator call to be used in define_graph step.

Parameters
• data (Tensor) – Tensor containing input data

• anchor (1D tensor of floats) – Tensor containing either normalized or absolute coordinates (depending on the value of normalized_anchor) for the starting point of the slice (x0, x1, x2, …).

• shape (1D tensor of floats) – Tensor containing either normalized or absolute coordinates (depending on the value of normalized_shape) for the dimensions of the slice (s0, s1, s2, …).
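The relationship between the anchor, shape and axes arguments can be illustrated with a small pure-Python sketch. It does not call DALI; `slice_bounds` is a hypothetical helper, and rounding to the nearest integer is an assumption made for illustration, not documented operator behavior.

```python
def slice_bounds(data_shape, anchor, shape, axes=(1, 0),
                 normalized_anchor=True, normalized_shape=True):
    """Return (start, end) index pairs for every dimension of data_shape.

    anchor[i] and shape[i] apply to dimension axes[i]; dimensions that are
    not listed in axes keep their full extent.
    """
    bounds = [(0, extent) for extent in data_shape]
    for value_idx, dim in enumerate(axes):
        extent = data_shape[dim]
        start = anchor[value_idx] * extent if normalized_anchor else anchor[value_idx]
        size = shape[value_idx] * extent if normalized_shape else shape[value_idx]
        # Rounding policy is an assumption of this sketch.
        start, size = int(round(start)), int(round(size))
        if start < 0 or start + size > extent:
            raise ValueError("slice exceeds the extent of dimension %d" % dim)
        bounds[dim] = (start, start + size)
    return bounds


# HWC image of 480 x 640 pixels; default axes=(1, 0) means the values are
# given in WH order: anchor = (x, y), shape = (width, height).
print(slice_bounds((480, 640, 3), anchor=(0.25, 0.5), shape=(0.5, 0.25)))
# prints [(240, 360), (160, 480), (0, 3)]
```

With `normalized_anchor=False` and `normalized_shape=False`, the same window would be expressed as `anchor=(160, 240)` and `shape=(320, 120)`.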

class nvidia.dali.ops.Spectrogram(**kwargs)

Produces a spectrogram from a 1D signal (e.g. audio). Input data is expected to be single channel (1D shape (time)) or multi channel in planar layout (channel, time) 32 bit float tensor.

Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• center_windows (bool, optional, default = True) – Indicates whether extracted windows should be padded so that window function is centered at multiples of window_step. If set to false, the signal will not be padded, that is only windows within the input range will be extracted.

• nfft (int, optional, default = -1) – Size of the FFT. The number of bins created in the output is nfft // 2 + 1 (positive part of the spectrum only).

• power (int, optional, default = 2) – Exponent of the magnitude of the spectrum. Supported values are 1 for magnitude (amplitude) and 2 for power.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• reflect_padding (bool, optional, default = True) – Indicates the padding policy when sampling outside the bounds of the signal. If set to true, the signal is mirrored with respect to the boundary, otherwise the signal is padded with zeros. Note: This option is ignored when center_windows is set to false.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• window_fn (float or list of float, optional, default = []) – Samples of the window function that will be multiplied to each extracted window when calculating the STFT. If provided it should be a list of floating point numbers of size window_length. If not provided, a Hann window will be used.

• window_length (int, optional, default = 512) – Window size (in number of samples)

• window_step (int, optional, default = 256) – Step between the STFT windows (in number of samples)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
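The way nfft, window_length, window_step and center_windows determine the output size can be sketched in plain Python. This is an illustration only: `spectrogram_shape` is a hypothetical helper, the fallback of nfft to window_length when nfft is not provided is an assumption, and so are the window-count formulas.

```python
def spectrogram_shape(signal_length, nfft=-1, window_length=512,
                      window_step=256, center_windows=True):
    """Return (number of frequency bins, number of windows)."""
    if nfft < 0:
        nfft = window_length  # assumption: default FFT size follows the window
    num_bins = nfft // 2 + 1  # positive part of the spectrum only

    if center_windows:
        # Assumption: windows are centered at multiples of window_step and
        # the signal is padded, so every multiple inside the signal counts.
        num_windows = signal_length // window_step + 1
    elif signal_length < window_length:
        num_windows = 0
    else:
        # Only windows that fit entirely inside the signal are extracted.
        num_windows = (signal_length - window_length) // window_step + 1
    return num_bins, num_windows


print(spectrogram_shape(16000))                        # prints (257, 63)
print(spectrogram_shape(16000, center_windows=False))  # prints (257, 61)
```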

class nvidia.dali.ops.Sphere(**kwargs)

Perform a sphere augmentation.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

• interp_type (int, optional, default = 0) – Type of interpolation used.

• mask (int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments

mask (Tensor of int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

class nvidia.dali.ops.TFRecordReader(path, index_path, features, **kwargs)

Read sample data from a TensorFlow TFRecord file.

Supported backends

• ‘cpu’

Keyword Arguments
• features (dict of (string, nvidia.dali.tfrecord.Feature)) – Dictionary of names and configuration of features existing in the TFRecord file. Typically obtained using the helper functions dali.tfrecord.FixedLenFeature and dali.tfrecord.VarLenFeature, which are equivalent to TensorFlow’s tf.FixedLenFeature and tf.VarLenFeature respectively. For more flexibility, dali.tfrecord.VarLenFeature supports the partial_shape parameter. If provided, data will be reshaped to match its value. The first dimension will be inferred from the data size.

• index_path (str or list of str) – List of paths to index files (1 index file for every TFRecord file). Index files may be obtained from TFRecord files using tfrecord2idx script distributed with DALI.

• path (str or list of str) – List of paths to TFRecord files.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.
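The interplay of random_shuffle and initial_fill can be pictured as a shuffle buffer. The sketch below is one plausible reading of the description above, not the reader's actual implementation; `shuffled_stream` and its `seed` handling are hypothetical.

```python
import random


def shuffled_stream(samples, initial_fill=1024, seed=0):
    """Yield samples in pseudo-random order using a fixed-size buffer."""
    rng = random.Random(seed)
    source = iter(samples)
    buffer = []

    # Fill the buffer by reading the data sequentially.
    for sample in source:
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            break

    # Emit a random buffer entry and refill its slot with the next sample.
    for sample in source:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = sample

    # The source is exhausted: drain what is left in random order.
    while buffer:
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()


order = list(shuffled_stream(range(10), initial_fill=4))
print(sorted(order) == list(range(10)))  # prints True: every sample appears once
```

In this model a larger initial_fill gives a more thorough shuffle at the cost of memory, and a buffer of size 1 degenerates to sequential reading.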

class nvidia.dali.ops.ToDecibels(**kwargs)

Converts a magnitude (real, positive) to the decibel scale, according to the formula:

min_ratio = pow(10, cutoff_db / multiplier)
out[i] = multiplier * log10( max(min_ratio, input[i] / reference) )


Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• cutoff_db (float, optional, default = -200.0) – Minimum or cut-off ratio in dB. Any value below this value will saturate. Example: A value of cutoff_db=-80 corresponds to a minimum ratio of 1e-8.

• multiplier (float, optional, default = 10.0) – Factor by which the logarithm is multiplied (typically 10.0 or 20.0, depending on whether the input is a squared magnitude or not).

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• reference (float, optional, default = 1.0) – Reference magnitude. If not provided, the maximum of the input will be used as reference. Note: The maximum of the input will be calculated on a per-sample basis.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
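The formula given above can be checked with a few lines of plain Python. This is a direct transcription for illustration, operating on a list of floats rather than on DALI tensors; `to_decibels` is a hypothetical helper name.

```python
import math


def to_decibels(values, multiplier=10.0, reference=1.0, cutoff_db=-200.0):
    """Apply out[i] = multiplier * log10(max(min_ratio, input[i] / reference))."""
    min_ratio = math.pow(10.0, cutoff_db / multiplier)
    return [multiplier * math.log10(max(min_ratio, v / reference))
            for v in values]


# With cutoff_db=-80 and multiplier=10 the minimum ratio is 1e-8, so an
# input of 0.0 saturates at (approximately) -80 dB instead of -infinity.
print(to_decibels([1.0, 100.0, 0.0], cutoff_db=-80.0))
```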

class nvidia.dali.ops.Transpose(**kwargs)

Transposes the tensor by reordering its dimensions according to the permutation specified by perm.

This operator allows sequence inputs.

This operator supports volumetric data.

Supported backends

• ‘gpu’

Keyword Arguments
• perm (int or list of int) – Permutation of the dimensions of the input (e.g. [2, 0, 1]).

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• output_layout (str, optional, default = '') – If provided, sets output data layout, overriding any transpose_layout setting

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• transpose_layout (bool, optional, default = True) – When set to true, the output data layout will be transposed according to perm. Otherwise, the input layout is copied to the output

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.
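The effect of perm on the shape and on the layout string (when transpose_layout is true) can be sketched as follows. The helper `transpose_shape` is hypothetical, and the convention that output dimension i is taken from input dimension perm[i] (the NumPy convention) is an assumption of this sketch.

```python
def transpose_shape(shape, perm, layout=""):
    """Return the permuted shape and, if given, the permuted layout string."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of the dimension indexes")
    out_shape = tuple(shape[p] for p in perm)
    out_layout = "".join(layout[p] for p in perm) if layout else ""
    return out_shape, out_layout


# Converting an HWC image to CHW with perm=[2, 0, 1].
print(transpose_shape((480, 640, 3), [2, 0, 1], layout="HWC"))
# prints ((3, 480, 640), 'CHW')
```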

class nvidia.dali.ops.Uniform(**kwargs)

Produce tensor filled with uniformly distributed random numbers.

Supported backends

• ‘cpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• range (float or list of float, optional, default = [-1.0, 1.0]) – Range of produced random numbers.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.

class nvidia.dali.ops.VideoReader(**kwargs)

Loads and decodes H264 video using FFmpeg and NVDECODE, the hardware-accelerated video decoding feature of NVIDIA GPUs. The video streams can be stored in most container file formats; FFmpeg is used to parse the containers. Returns a batch of sequences of sequence_length frames of shape [N, F, H, W, C] (N being the batch size and F the number of frames). Supports only constant frame rate videos.

Supported backends

• ‘gpu’

Keyword Arguments
• sequence_length (int) – Frames to load per sequence.

• additional_decode_surfaces (int, optional, default = 2) – Additional decode surfaces to use beyond the minimum required. This is ignored when the decoder is not able to determine the minimum number of decode surfaces, which may happen when using an older driver. This parameter can be used to trade off memory usage against performance.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• channels (int, optional, default = 3) – Number of channels.

• dtype (int, optional, default = 0) – The data type of the output frames (supports FLOAT and UINT8).

• enable_frame_num (bool, optional, default = False) – Return frame number output if file_list or file_root argument is passed

• enable_timestamps (bool, optional, default = False) – Return timestamps output if file_list or file_root argument is passed

• file_list (str, optional, default = '') – Path to a file containing a list of file label pairs. This option is mutually exclusive with filenames and file_root.

• file_list_frame_num (bool, optional, default = False) – If start/end timestamps are provided in file_list, interpret them as frame numbers instead of timestamp. If floating point values are given, then start frame number is ceiling of the number and end frame number is floor of the number. Frame numbers start from 0.

• file_root (str, optional, default = '') – Path to a directory containing data files. This option is mutually exclusive with filenames and file_list.

• filenames (str or list of str, optional, default = []) – File names of the video files to load. This option is mutually exclusive with file_root and file_list.

• image_type (int, optional, default = 0) – The color space of the output frames (supports RGB and YCbCr).

• initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

• lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

• normalized (bool, optional, default = False) – Get output as normalized data.

• num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

• pad_last_batch (bool, optional, default = False) – If set to true, the Loader will pad the last batch with the last image when the batch size is not aligned with the shard size.

• prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

• read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

• scale (float, optional, default = 1.0) – Rescaling factor of height and width.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• shard_id (int, optional, default = 0) – Id of the part to read.

• skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

• skip_vfr_check (bool, optional, default = False) – Skips check for variable frame rate on videos. This is useful when heuristic fails.

• step (int, optional, default = -1) – Frame interval between each sequence (if step < 0, step is set to sequence_length).

• stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

• stride (int, optional, default = 1) – Distance between consecutive frames in sequence.

• tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)

Operator call to be used in define_graph step. This operator does not accept any Tensor inputs.
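How sequence_length, step and stride select frames can be illustrated with a pure-Python sketch for a single video. The helper `sequence_frames` is hypothetical; in particular, dropping sequences that would run past the end of the video is an assumption of this sketch.

```python
def sequence_frames(num_frames, sequence_length, step=-1, stride=1):
    """List the frame indexes of each sequence taken from one video."""
    if step < 0:
        step = sequence_length  # documented default for step < 0
    # Number of source frames covered by one sequence.
    span = (sequence_length - 1) * stride + 1
    sequences = []
    start = 0
    while start + span <= num_frames:
        sequences.append([start + i * stride for i in range(sequence_length)])
        start += step  # distance between first frames of consecutive sequences
    return sequences


print(sequence_frames(10, 3))
# prints [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(sequence_frames(10, 3, step=2, stride=3))
# prints [[0, 3, 6], [2, 5, 8]]
```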

class nvidia.dali.ops.WarpAffine(**kwargs)

Apply an affine transformation to the image.

This operator supports volumetric data.

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• fill_value (float, optional, default = 0.0) – Value used to fill areas that are outside source image. If not specified, source coordinates are clamped and the border pixel is repeated.

• interp_type (int, optional, default = 1) – Type of interpolation used.

• matrix (float or list of float, optional, default = []) –

Transform matrix (dst -> src). Given list of values (M11, M12, M13, M21, M22, M23) this operation will produce a new image using the following formula

dst(x,y) = src(M11 * x + M12 * y + M13, M21 * x + M22 * y + M23)

It is equivalent to OpenCV’s warpAffine operation with a flag WARP_INVERSE_MAP set.

• output_dtype (int, optional, default = -1) – Output data type. By default, same as input type

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

• size (float or list of float, optional, default = []) – Output size, in pixels/points. Non-integer sizes are rounded to the nearest integer. The channel dimension should be excluded (e.g. for RGB images specify (480,640), not (480,640,3)).

__call__(*inputs, **kwargs)

Please refer to class nvidia.dali.ops.WarpAffine() for full documentation.

Keyword Arguments
• matrix (Tensor of float, optional, default = []) –

Transform matrix (dst -> src). Given list of values (M11, M12, M13, M21, M22, M23) this operation will produce a new image using the following formula

dst(x,y) = src(M11 * x + M12 * y + M13, M21 * x + M22 * y + M23)

It is equivalent to OpenCV’s warpAffine operation with a flag WARP_INVERSE_MAP set.

• size (Tensor of float, optional, default = []) – Output size, in pixels/points. Non-integer sizes are rounded to the nearest integer. The channel dimension should be excluded (e.g. for RGB images specify (480,640), not (480,640,3)).
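The dst -> src convention of the matrix argument can be made concrete with a tiny single-channel example in plain Python. This sketch uses nearest-neighbor sampling and ignores pixel-center conventions, both of which are simplifying assumptions; `warp_affine_nn` is a hypothetical helper, not part of DALI.

```python
import math


def warp_affine_nn(src, matrix, out_size, fill_value=0):
    """Warp a 2D list of pixels using a dst -> src affine matrix."""
    m11, m12, m13, m21, m22, m23 = matrix
    out_h, out_w = out_size
    src_h, src_w = len(src), len(src[0])
    dst = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # dst(x, y) = src(M11*x + M12*y + M13, M21*x + M22*y + M23)
            sx = math.floor(m11 * x + m12 * y + m13 + 0.5)
            sy = math.floor(m21 * x + m22 * y + m23 + 0.5)
            inside = 0 <= sx < src_w and 0 <= sy < src_h
            row.append(src[sy][sx] if inside else fill_value)
        dst.append(row)
    return dst


image = [[1, 2, 3],
         [4, 5, 6]]
# M13 = 1 makes every output pixel read from one column further right,
# so the content appears shifted to the left.
print(warp_affine_nn(image, (1, 0, 1, 0, 1, 0), out_size=(2, 3)))
# prints [[2, 3, 0], [5, 6, 0]]
```

Because the matrix maps destination coordinates to source coordinates, a positive translation term moves the image content in the negative direction, which is the same behavior as OpenCV's warpAffine with WARP_INVERSE_MAP.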

class nvidia.dali.ops.Water(**kwargs)

Perform a water augmentation (make image appear to be underwater).

Supported backends

• ‘cpu’

• ‘gpu’

Keyword Arguments
• ampl_x (float, optional, default = 10.0) – Amplitude of the wave in x direction.

• ampl_y (float, optional, default = 10.0) – Amplitude of the wave in y direction.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

• freq_x (float, optional, default = 0.049087) – Frequency of the wave in x direction.

• freq_y (float, optional, default = 0.049087) – Frequency of the wave in y direction.

• interp_type (int, optional, default = 0) – Type of interpolation used.

• mask (int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

• phase_x (float, optional, default = 0.0) – Phase of the wave in x direction.

• phase_y (float, optional, default = 0.0) – Phase of the wave in y direction.

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

__call__(data, **kwargs)

Operator call to be used in define_graph step.

Parameters

data (Tensor) – Input to the operator.

Keyword Arguments

mask (Tensor of int, optional, default = 1) –

Whether to apply this augmentation to the input image.

• 0 - do not apply this transformation

• 1 - apply this transformation

class nvidia.dali.plugin.pytorch.TorchPythonFunction(function, num_outputs=1, device='cpu', batch_processing=False, **kwargs)

Executes a function operating on Torch tensors.

This operator allows sequence inputs.

This operator supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends

• ‘cpu’

Keyword Arguments
• function (object) – Function object consuming and producing Torch tensors.

• batch_processing (bool, optional, default = False) – Whether the function should get the whole batch as input.

• bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

• num_outputs (int, optional, default = 1) – Number of outputs

• preserve (bool, optional, default = False) – Do not remove the Op from the graph even if its outputs are unused.

• seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

## Arithmetic expressions

DALI allows you to use regular Python arithmetic operations within the define_graph() method on the values returned from invocations of other operators.

The expressions used will be incorporated into the Pipeline without the need to explicitly instantiate operators and will describe element-wise operations on Tensors.

At least one of the inputs must be a Tensor input that is returned by another DALI Operator. The other can be nvidia.dali.types.Constant() or a regular Python value of type bool, int or float.

As the operations performed are element-wise, the shapes of all operands must match.

Note

If one of the operands is a batch of Tensors representing scalars, the scalar values are broadcast to the other operand.

For details and examples see expressions tutorials.
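The element-wise semantics and the scalar broadcasting rule described above can be mimicked with plain Python lists. This is only a conceptual model that does not use DALI; `apply_elementwise` is a hypothetical helper in which a list stands in for a Tensor and a bare number stands in for a constant or scalar operand.

```python
import operator


def apply_elementwise(op, left, right):
    """Apply a binary operation element by element, broadcasting scalars."""
    left_is_scalar = not isinstance(left, list)
    right_is_scalar = not isinstance(right, list)
    if left_is_scalar and right_is_scalar:
        raise TypeError("at least one operand must be a tensor (list)")
    if left_is_scalar:
        left = [left] * len(right)   # broadcast the scalar
    if right_is_scalar:
        right = [right] * len(left)  # broadcast the scalar
    if len(left) != len(right):
        raise ValueError("operand shapes must match")
    return [op(a, b) for a, b in zip(left, right)]


print(apply_elementwise(operator.add, [1, 2, 3], 10))      # prints [11, 12, 13]
print(apply_elementwise(operator.mul, [1, 2], [3, 4]))     # prints [3, 8]
```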

### Supported arithmetic operations

Currently, DALI supports the following operations:

Unary arithmetic operators: +, -

Unary operators implementing __pos__(self) and __neg__(self). The result of a unary arithmetic operation always keeps the input type. Unary operators accept only Tensor inputs from other operators.

Return type

Tensor of the same type

Binary arithmetic operations: +, -, *, /, //

Binary operators implementing __add__, __sub__, __mul__, __truediv__ and __floordiv__ respectively.

The result of arithmetic operation between two operands is described below, with the exception of /, the __truediv__ operation, which always returns float32 or float64 types.

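| Operand Type | Operand Type | Result Type | Condition |
|--------------|--------------|-------------|-----------|
| T | T | T | |
| floatX | T | floatX | where T is not a float |
| floatX | floatY | floatZ | where Z = max(X, Y) |
| intX | intY | intZ | where Z = max(X, Y) |
| uintX | uintY | uintZ | where Z = max(X, Y) |
| intX | uintY | int(2*Y) | if X <= Y |
| intX | uintY | intX | if X > Y |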

T stands for any one of the supported numerical types: bool, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32, float64.

bool type is considered the smallest unsigned integer type and is treated as uint1 with respect to the table above.

Note

Type promotions are commutative.

Note

The only allowed arithmetic operation between two bool values is multiplication *.

Return type

Tensor of type calculated based on type promotion rules.
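The promotion rules above can be written down as a small function over type names. This is a sketch for illustration, not DALI's implementation: `promote` is a hypothetical helper, it does not cover the / operator (which always yields a float type), and clamping the int(2*Y) result to 64 bits is an assumption, since wider types are not among the supported ones.

```python
import re


def _parse(name):
    """Split a type name into (kind, bit width); bool is treated as uint1."""
    if name == "bool":
        return "uint", 1
    match = re.fullmatch(r"(float|int|uint)(\d+)", name)
    if match is None:
        raise ValueError("unsupported type: %s" % name)
    return match.group(1), int(match.group(2))


def promote(a, b):
    """Return the result type name for a binary arithmetic operation."""
    if a == b:
        return a                                   # T, T -> T
    (kind_a, width_a), (kind_b, width_b) = _parse(a), _parse(b)
    if kind_a == "float" and kind_b == "float":
        return "float%d" % max(width_a, width_b)   # floatX, floatY
    if kind_a == "float":
        return a                                   # floatX, T
    if kind_b == "float":
        return b
    if kind_a == kind_b:
        return "%s%d" % (kind_a, max(width_a, width_b))
    # Mixed signed / unsigned operands.
    int_w, uint_w = (width_a, width_b) if kind_a == "int" else (width_b, width_a)
    if int_w <= uint_w:
        return "int%d" % min(2 * uint_w, 64)       # clamp is an assumption
    return "int%d" % int_w


print(promote("int8", "uint8"))     # prints int16
print(promote("uint8", "int32"))    # prints int32
print(promote("bool", "float32"))   # prints float32
```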

Comparison operations: ==, !=, <, <=, >, >=

Comparison operations.

Return type

Tensor of bool type.

Bitwise binary operations: &, |, ^

The bitwise binary operations abide by the same type promotion rules as arithmetic binary operations, but their inputs are restricted to integral types (bool included).

Note

A bitwise operation can be applied to two boolean inputs. Those operations can be used to emulate element-wise logical operations on Tensors.

Return type

Tensor of type calculated based on type promotion rules.
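As a plain-Python analogy (no DALI involved), the snippet below combines two comparison results with & to form an element-wise logical AND, which is the pattern the note above refers to. The lists stand in for Tensors, and the variable names are made up for this example.

```python
values = [3, 7, 12, 20]

# Comparison operations produce boolean results element by element.
above_low = [v > 5 for v in values]
below_high = [v < 15 for v in values]

# Bitwise operators on booleans behave like logical AND, OR and XOR.
in_range = [a & b for a, b in zip(above_low, below_high)]
either = [a | b for a, b in zip(above_low, below_high)]
exactly_one = [a ^ b for a, b in zip(above_low, below_high)]

print(in_range)     # prints [False, True, True, False]
print(either)       # prints [True, True, True, True]
print(exactly_one)  # prints [True, False, False, True]
```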