Supported operations

The table below lists all available operators and the devices they can operate on.

Operator name         CPU  GPU  Mixed  Support
BBoxPaste             yes  -    -      -
BbFlip                yes  yes  -      -
BoxEncoder            yes  yes  -      -
Brightness            yes  yes  -      -
COCOReader            yes  -    -      -
Caffe2Reader          yes  -    -      -
CaffeReader           yes  -    -      -
Cast                  yes  yes  -      -
CoinFlip              -    -    -      yes
ColorTwist            yes  yes  -      -
Contrast              yes  yes  -      -
Copy                  yes  yes  -      -
Crop                  yes  yes  -      -
CropCastPermute       yes  yes  -      -
CropMirrorNormalize   yes  yes  -      -
DummyOp               yes  yes  -      -
DumpImage             yes  yes  -      -
ExternalSource        yes  yes  -      -
FastResizeCropMirror  yes  -    -      -
FileReader            yes  -    -      -
Flip                  yes  yes  -      -
HostDecoder           yes  -    -      -
Hue                   yes  yes  -      -
Jitter                -    yes  -      -
MXNetReader           yes  -    -      -
NormalizePermute      yes  yes  -      -
Paste                 -    yes  -      -
RandomBBoxCrop        yes  -    -      -
RandomResizedCrop     yes  yes  -      -
Resize                yes  yes  -      -
ResizeCropMirror      yes  yes  -      -
Rotate                yes  yes  -      -
SSDRandomCrop         yes  -    -      -
Saturation            yes  yes  -      -
SequenceCrop          yes  -    -      -
SequenceReader        yes  -    -      -
Slice                 yes  yes  -      -
Sphere                yes  yes  -      -
TFRecordReader        yes  -    -      -
Uniform               -    -    -      yes
VideoReader           -    yes  -      -
WarpAffine            yes  yes  -      -
Water                 yes  yes  -      -
nvJPEGDecoder         -    -    yes    -

class nvidia.dali.ops.BBoxPaste(**kwargs)

This is ‘CPU’ operator

Transforms bounding boxes so that they are in the same place in the image after pasting it onto a larger canvas.

Corner coordinates:
    (x’, y’) = (x/ratio + paste_x’, y/ratio + paste_y’)
Box sizes:
    (w’, h’) = (w/ratio, h/ratio)
Where:
    paste_x’ = paste_x * (ratio - 1)/ratio
    paste_y’ = paste_y * (ratio - 1)/ratio

Paste coordinates are normalized so that (0,0) aligns the image to top-left of the canvas and (1,1) aligns it to bottom-right.

Parameters:
  • ratio (float or float tensor) – Ratio of canvas size to input size, must be > 1.
  • ltrb (bool, optional, default = False) – True for two-point (ltrb) representation, False for width-height representation.
  • paste_x (float or float tensor, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)
  • paste_y (float or float tensor, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)
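
The formulas above can be sketched in plain Python. This is an illustration only, not DALI's implementation: the function name paste_bbox is hypothetical, and applying the same corner formula to both corners in the ltrb case is an assumption.

```python
def paste_bbox(box, ratio, paste_x=0.5, paste_y=0.5, ltrb=False):
    """Map a normalized bounding box into the coordinates of the larger canvas."""
    if ratio <= 1:
        raise ValueError("ratio must be > 1")
    # Normalized offset of the pasted image's top-left corner on the canvas.
    off_x = paste_x * (ratio - 1) / ratio
    off_y = paste_y * (ratio - 1) / ratio
    if ltrb:
        # Assumption: both corners are moved with the corner formula.
        left, top, right, bottom = box
        return (left / ratio + off_x, top / ratio + off_y,
                right / ratio + off_x, bottom / ratio + off_y)
    x, y, w, h = box
    return (x / ratio + off_x, y / ratio + off_y, w / ratio, h / ratio)


# A box covering the whole image, pasted in the middle of a 2x canvas.
print(paste_bbox((0.0, 0.0, 1.0, 1.0), ratio=2.0))
```

With ratio = 2 and the default paste position, the full-image box ends up centered on the canvas and occupies half of each dimension.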
class nvidia.dali.ops.BbFlip(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Operator for horizontal flip (mirror) of bounding box. Input: Bounding box coordinates; in either [x, y, w, h] or [left, top, right, bottom] format. All coordinates are in the image coordinate system (i.e. 0.0-1.0)

Parameters:
  • horizontal (int or int tensor, optional, default = 1) – Perform flip along horizontal axis. Default: 1
  • ltrb (bool, optional, default = False) – True for two-point (ltrb) representation, False for width-height representation.
  • vertical (int or int tensor, optional, default = 0) – Perform flip along vertical axis. Default: 0
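
A minimal sketch of what mirroring a normalized bounding box involves. The function flip_bbox is hypothetical, and it assumes that "horizontal" means a left-right mirror (x coordinates change) and "vertical" a top-bottom mirror.

```python
def flip_bbox(box, horizontal=1, vertical=0, ltrb=False):
    """Mirror a bounding box given in normalized (0.0-1.0) image coordinates."""
    if ltrb:
        left, top, right, bottom = box
    else:
        x, y, w, h = box
        left, top, right, bottom = x, y, x + w, y + h
    if horizontal:
        # Assumption: left-right mirror, so the old right edge becomes the new left edge.
        left, right = 1.0 - right, 1.0 - left
    if vertical:
        # Assumption: top-bottom mirror.
        top, bottom = 1.0 - bottom, 1.0 - top
    if ltrb:
        return (left, top, right, bottom)
    return (left, top, right - left, bottom - top)


print(flip_bbox((0.0, 0.25, 0.25, 0.5)))
```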
class nvidia.dali.ops.BoxEncoder(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Encodes input bounding boxes and labels using a set of default boxes (anchors) passed during op construction. Follows the algorithm described in https://arxiv.org/abs/1512.02325 and implemented in https://github.com/mlperf/training/tree/master/single_stage_detector/ssd. Inputs must be supplied as two Tensors: BBoxes, containing bounding boxes represented as [l,t,r,b], and Labels, containing the corresponding label for each bounding box. Results are two Tensors: EncodedBBoxes, containing M encoded bounding boxes as [l,t,r,b], where M is the number of anchors, and EncodedLabels, containing the corresponding label for each encoded box.

Parameters:
  • anchors (float or list of float) – Anchors to be used for encoding. List of floats in ltrb format.
  • criteria (float, optional, default = 0.5) – Threshold IOU for matching bounding boxes with anchors. Value between 0 and 1. Default is 0.5.
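
The role of the criteria threshold can be illustrated with a simplified IoU matching step. This sketch covers only the matching part, not the full SSD encoding (it does not compute box offsets or force a match for every ground-truth box). The helper names and the background label 0 are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [l, t, r, b]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, boxes, labels, criteria=0.5, background=0):
    """Give each anchor the label of its best-overlapping box if IoU > criteria."""
    matched = []
    for anchor in anchors:
        best_iou, best_label = 0.0, background
        for box, label in zip(boxes, labels):
            overlap = iou(anchor, box)
            if overlap > best_iou:
                best_iou, best_label = overlap, label
        # Anchors without a sufficiently overlapping box keep the background label (assumption).
        matched.append(best_label if best_iou > criteria else background)
    return matched


anchors = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
print(match_anchors(anchors, boxes=[(0.0, 0.0, 0.5, 0.5)], labels=[7]))
```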
class nvidia.dali.ops.Brightness(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Changes the brightness of an image

Parameters:
  • brightness (float or float tensor, optional, default = 1.0) –

    Brightness change factor. Values >= 0 are accepted. For example:

    • 0 - black image,
    • 1 - no change
    • 2 - increase brightness twice
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
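
The effect of the brightness factor can be sketched as a per-pixel multiplication. The sketch assumes 8-bit pixel values with rounding and saturation at 255; the text above does not state how DALI rounds or clamps, and adjust_brightness is a hypothetical name.

```python
def adjust_brightness(pixels, brightness=1.0):
    """Scale 8-bit pixel values by a brightness factor."""
    if brightness < 0:
        raise ValueError("brightness must be >= 0")
    # Assumption: results are rounded and clamped to the 0..255 range.
    return [min(255, max(0, round(p * brightness))) for p in pixels]


print(adjust_brightness([10, 100, 200], 2.0))
```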
class nvidia.dali.ops.COCOReader(**kwargs)

This is ‘CPU’ operator

Read data from a COCO dataset composed of a directory with images and annotation files. For each image with m bboxes, returns its bboxes as an (m,4) Tensor (m * [x, y, w, h] or m * [left, top, right, bottom]) and labels as an (m,1) Tensor (m * category_id).

Parameters:
  • annotations_file (str or list of str) – List of paths to the JSON annotations files.
  • file_root (str) – Path to a directory containing data files.
  • file_list (str, optional, default = '') – Path to a file containing a list of (file, label) pairs (leave empty to traverse the file_root directory to obtain files and labels)
  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling.
  • ltrb (bool, optional, default = False) – If true, bboxes are returned as [left, top, right, bottom], else [x, y, width, height]. Default: False
  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).
  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data.
  • ratio (bool, optional, default = False) – If true, bounding box values are returned as ratios relative to the image width and height. Default: False
  • save_img_ids (bool, optional, default = False) – If true, image IDs will also be returned. Default: False
  • shard_id (int, optional, default = 0) – Id of the part to read.
  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
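
The num_shards and shard_id parameters, shared by all readers on this page, split the dataset so that each GPU reads its own part. One way such a split could work is sketched below. The contiguous partitioning and the function name shard_range are assumptions; the text only states that the data is divided into num_shards parts and that shard_id selects one.

```python
def shard_range(dataset_size, shard_id, num_shards):
    """Return the [start, end) sample indices that belong to one shard."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    # Assumption: shards are contiguous blocks of nearly equal size.
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end


# Ten samples split across four shards.
for shard in range(4):
    print(shard, shard_range(10, shard, 4))
```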
class nvidia.dali.ops.Caffe2Reader(**kwargs)

This is ‘CPU’ operator

Read sample data from a Caffe2 Lightning Memory-Mapped Database (LMDB).

Parameters:
  • path (str) – Path to Caffe2 LMDB directory.
  • additional_inputs (int, optional, default = 0) – Additional auxiliary data tensors provided for each sample.
  • bbox (bool, optional, default = False) – Denotes if bounding-box information is present.
  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling.
  • label_type (int, optional, default = 0) –

    Type of label stored in dataset.

    • 0 = SINGLE_LABEL : single integer label for multi-class classification
    • 1 = MULTI_LABEL_SPARSE : sparse active label indices for multi-label classification
    • 2 = MULTI_LABEL_DENSE : dense label embedding vector for label embedding regression
    • 3 = MULTI_LABEL_WEIGHTED_SPARSE : sparse active label indices with per-label weights for multi-label classification.
  • num_labels (int, optional, default = 1) – Number of classes in dataset. Required when sparse labels are used.
  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).
  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data.
  • shard_id (int, optional, default = 0) – Id of the part to read.
  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
class nvidia.dali.ops.CaffeReader(**kwargs)

This is ‘CPU’ operator

Read (Image, label) pairs from a Caffe LMDB

Parameters:
  • path (str) – Path to Caffe LMDB directory.
  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling.
  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).
  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data.
  • shard_id (int, optional, default = 0) – Id of the part to read.
  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
class nvidia.dali.ops.Cast(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Cast tensor to a different type

Parameters:
  • dtype (nvidia.dali.types.DALIDataType) – Output data type.
class nvidia.dali.ops.CoinFlip(**kwargs)

This is ‘support’ operator

Produce tensor filled with 0s and 1s - results of random coin flip, usable as an argument for select ops.

Parameters:
  • probability (float, optional, default = 0.5) – Probability of returning 1.
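
The output of a coin flip can be pictured as one 0 or 1 per sample in the batch. The sketch below uses Python's random module for illustration; the function coin_flip and its batch_size and rng arguments are hypothetical and not part of the DALI API.

```python
import random


def coin_flip(batch_size, probability=0.5, rng=None):
    """Return a list of 0s and 1s, each being 1 with the given probability."""
    rng = rng if rng is not None else random.Random()
    return [1 if rng.random() < probability else 0 for _ in range(batch_size)]


print(coin_flip(8, probability=0.5, rng=random.Random(0)))
```

A sequence like this could then drive a per-sample argument that accepts an int tensor, such as the mirror parameter of CropMirrorNormalize.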
class nvidia.dali.ops.ColorTwist(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Combination of hue, saturation, contrast and brightness.

Parameters:
  • brightness (float or float tensor, optional, default = 1.0) –

    Brightness change factor. Values >= 0 are accepted. For example:

    • 0 - black image,
    • 1 - no change
    • 2 - increase brightness twice
  • contrast (float or float tensor, optional, default = 1.0) –

    Contrast change factor. Values >= 0 are accepted. For example:

    • 0 - gray image,
    • 1 - no change
    • 2 - increase contrast twice
  • hue (float or float tensor, optional, default = 0.0) – Hue change in angles.
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
  • saturation (float or float tensor, optional, default = 1.0) –

    Saturation change factor. Values >= 0 are supported. For example:

    • 0 - completely desaturated image
    • 1 - no change to image’s saturation
class nvidia.dali.ops.Contrast(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Changes the color contrast of the image.

Parameters:
  • contrast (float or float tensor, optional, default = 1.0) –

    Contrast change factor. Values >= 0 are accepted. For example:

    • 0 - gray image,
    • 1 - no change
    • 2 - increase contrast twice
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
class nvidia.dali.ops.Copy(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Make a copy of the input tensor

class nvidia.dali.ops.Crop(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Perform a random crop.

Parameters:
  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image. If only a single value c is provided, the resulting crop will be square with size (c,c)
  • crop_pos_x (float or float tensor, optional, default = 0.5) – Horizontal position of the crop in image coordinates (0.0 - 1.0)
  • crop_pos_y (float or float tensor, optional, default = 0.5) – Vertical position of the crop in image coordinates (0.0 - 1.0)
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
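
How crop_pos_x and crop_pos_y select the crop window can be sketched as follows. This assumes that the normalized position is scaled by the free space left after cropping and truncated to an integer. The name crop_window and the separate crop_w and crop_h arguments are hypothetical, since the text does not state the order of the two values in crop.

```python
def crop_window(image_w, image_h, crop_w, crop_h, crop_pos_x=0.5, crop_pos_y=0.5):
    """Return (x0, y0, w, h) of the crop window in pixels."""
    if crop_w > image_w or crop_h > image_h:
        raise ValueError("crop must fit inside the image")
    # Assumption: 0.0 aligns the crop to the left/top edge and 1.0 to the right/bottom edge.
    x0 = int(crop_pos_x * (image_w - crop_w))
    y0 = int(crop_pos_y * (image_h - crop_h))
    return x0, y0, crop_w, crop_h


# The default position (0.5, 0.5) gives a center crop.
print(crop_window(640, 480, 224, 224))
```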
class nvidia.dali.ops.CropCastPermute(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Perform a random crop, data type cast and permute (from NHWC to NCHW).

Parameters:
  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image. If only a single value c is provided, the resulting crop will be square with size (c,c)
  • crop_pos_x (float or float tensor, optional, default = 0.5) – Horizontal position of the crop in image coordinates (0.0 - 1.0)
  • crop_pos_y (float or float tensor, optional, default = 0.5) – Vertical position of the crop in image coordinates (0.0 - 1.0)
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
  • output_dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type. If NO_TYPE is specified, the output data type is inferred from the input data type.
  • output_layout (nvidia.dali.types.DALITensorLayout, optional, default = DALITensorLayout.NCHW) – Output tensor data layout
class nvidia.dali.ops.CropMirrorNormalize(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Perform fused cropping, normalization, format conversion (NHWC to NCHW) if desired, and type casting. Normalization takes input image and produces output using formula:

output = (input - mean) / std
Parameters:
  • mean (float or list of float) – Mean pixel values for image normalization.
  • std (float or list of float) – Standard deviation values for image normalization.
  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image. If only a single value c is provided, the resulting crop will be square with size (c,c)
  • crop_pos_x (float or float tensor, optional, default = 0.5) – Horizontal position of the crop in image coordinates (0.0 - 1.0)
  • crop_pos_y (float or float tensor, optional, default = 0.5) – Vertical position of the crop in image coordinates (0.0 - 1.0)
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
  • mirror (int or int tensor, optional, default = 0) –

    Mask for horizontal flip.

    • 0 - do not perform horizontal flip for this image
    • 1 - perform horizontal flip for this image.
  • output_dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type.
  • output_layout (nvidia.dali.types.DALITensorLayout, optional, default = DALITensorLayout.NCHW) – Output tensor data layout
  • pad_output (bool, optional, default = False) – Whether to pad the output to number of channels being multiple of 4.
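
The normalization formula and the layout change can be illustrated on a single image stored as nested Python lists. The sketch leaves out cropping, type casting and padding, applies mean and std per channel, and uses a hypothetical function name; it is not how DALI implements the operator.

```python
def normalize_hwc_to_chw(image, mean, std, mirror=0):
    """Apply (input - mean) / std per channel and convert HWC to CHW."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    out = []
    for c in range(channels):
        plane = []
        for y in range(height):
            row = [(image[y][x][c] - mean[c]) / std[c] for x in range(width)]
            if mirror:
                row.reverse()  # horizontal flip of this row
            plane.append(row)
        out.append(plane)
    return out


# One row, two pixels, two channels.
image = [[[0, 10], [2, 20]]]
print(normalize_hwc_to_chw(image, mean=[1, 10], std=[1, 10]))
```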
class nvidia.dali.ops.DummyOp(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Dummy operator for testing

Parameters:
  • num_outputs (int, optional, default = 2) – Number of outputs.
class nvidia.dali.ops.DumpImage(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Save images in batch to disk in PPM format. Useful for debugging.

Parameters:
  • input_layout (nvidia.dali.types.DALITensorLayout, optional, default = DALITensorLayout.NHWC) – Layout of input images.
  • suffix (str, optional, default = '') – Suffix to be added to output file names.
class nvidia.dali.ops.ExternalSource(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Allows externally provided data to be passed as an input to the pipeline, see nvidia.dali.pipeline.Pipeline.feed_input() and nvidia.dali.pipeline.Pipeline.iter_setup(). Currently this operator is not supported in TensorFlow.

class nvidia.dali.ops.FastResizeCropMirror(**kwargs)

This is ‘CPU’ operator

Perform a fused resize, crop, mirror operation. Handles both fixed and random resizing and cropping. Backprojects the desired crop through the resize operation to reduce the amount of work performed.

Parameters:
  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image. If only a single value c is provided, the resulting crop will be square with size (c,c)
  • crop_pos_x (float or float tensor, optional, default = 0.5) – Horizontal position of the crop in image coordinates (0.0 - 1.0)
  • crop_pos_y (float or float tensor, optional, default = 0.5) – Vertical position of the crop in image coordinates (0.0 - 1.0)
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.
  • mirror (int or int tensor, optional, default = 0) –

    Mask for horizontal flip.

    • 0 - do not perform horizontal flip for this image
    • 1 - perform horizontal flip for this image.
  • resize_longer (float or float tensor, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.
  • resize_shorter (float or float tensor, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image.
  • resize_x (float or float tensor, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.
  • resize_y (float or float tensor, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.
class nvidia.dali.ops.FileReader(**kwargs)

This is ‘CPU’ operator

Read (Image, label) pairs from a directory

Parameters:
  • file_root (str) – Path to a directory containing data files.
  • file_list (str, optional, default = '') – Path to a file containing a list of (file, label) pairs (leave empty to traverse the file_root directory to obtain files and labels)
  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling.
  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).
  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data.
  • shard_id (int, optional, default = 0) – Id of the part to read.
  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
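
The interaction between random_shuffle and initial_fill can be pictured as a fixed-size shuffle buffer, as sketched below. How DALI actually uses the buffer is an assumption here; the text only says that initial_fill is the size of the buffer used for shuffling. The generator buffered_shuffle is hypothetical.

```python
import random


def buffered_shuffle(samples, initial_fill=1024, rng=None):
    """Yield samples in randomized order using a buffer of initial_fill items."""
    if initial_fill < 1:
        raise ValueError("initial_fill must be >= 1")
    rng = rng if rng is not None else random.Random()
    buffer = []
    for sample in samples:
        if len(buffer) < initial_fill:
            buffer.append(sample)  # fill the buffer first
            continue
        # Emit a random buffered sample and put the new one in its place.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = sample
    # The input is exhausted: emit what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


print(list(buffered_shuffle(range(10), initial_fill=4, rng=random.Random(0))))
```

Under this model, a larger initial_fill lets samples travel further from their original position, at the cost of more memory.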
class nvidia.dali.ops.Flip(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Flip the image on the horizontal and/or vertical axes.

Parameters:
  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.
  • horizontal (int or int tensor, optional, default = 1) – Perform a horizontal flip. Default value is 1.
  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.
  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation
    • 1 - apply this transformation
  • vertical (int or int tensor, optional, default = 0) – Perform a vertical flip. Default value is 0.
class nvidia.dali.ops.HostDecoder(**kwargs)

This is ‘CPU’ operator

Decode images on the host using OpenCV. When applicable, it will pass execution to faster, format-specific decoders (like libjpeg-turbo). Output of the decoder is in HWC ordering.

Parameters:
  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.
class nvidia.dali.ops.Hue(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Changes the hue level of the image.

Parameters:
  • hue (float or float tensor, optional, default = 0.0) – Hue change in angles.
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
class nvidia.dali.ops.Jitter(**kwargs)

This is ‘GPU’ operator

Perform a random Jitter augmentation. The output image is produced by moving each pixel by a random amount bounded by half of nDegree parameter (in both x and y dimensions).

Parameters:
  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.
  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.
  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation
    • 1 - apply this transformation
  • nDegree (int, optional, default = 2) – Each pixel is moved by a random amount in range [-nDegree/2, nDegree/2].
class nvidia.dali.ops.MXNetReader(**kwargs)

This is ‘CPU’ operator

Read sample data from an MXNet RecordIO file.

Parameters:
  • index_path (str or list of str) – List (of length 1) containing a path to the index (.idx) file. It is generated by MXNet’s im2rec.py script together with the RecordIO file. It can also be generated using the rec2idx script distributed with DALI.
  • path (str or list of str) – List of paths to RecordIO files.
  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling.
  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).
  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data.
  • shard_id (int, optional, default = 0) – Id of the part to read.
  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
class nvidia.dali.ops.NormalizePermute(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Perform fused normalization, format conversion from NHWC to NCHW and type casting. Normalization takes input image and produces output using formula

output = (input - mean) / std

Parameters:
  • height (int) – Height of the input image.
  • mean (float or list of float) – Mean pixel values for image normalization.
  • std (float or list of float) – Standard deviation values for image normalization.
  • width (int) – Width of the input image.
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image.
  • output_dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type.
class nvidia.dali.ops.Paste(**kwargs)

This is ‘GPU’ operator

Paste the input image on a larger canvas. The canvas size is equal to input size * ratio.

Parameters:
  • fill_value (int or list of int) – Tuple of values of the color to fill the canvas. Length of the tuple needs to be equal to n_channels.
  • ratio (float or float tensor) – Ratio of canvas size to input size, must be > 1.
  • min_canvas_size (float or float tensor, optional, default = 0.0) – Enforce minimum paste canvas dimension after scaling input size by ratio.
  • n_channels (int, optional, default = 3) – Number of channels in the image.
  • paste_x (float or float tensor, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)
  • paste_y (float or float tensor, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)
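
The canvas geometry described above can be sketched in pixels. The sketch assumes that the paste position scales the free space on the canvas, which agrees with the normalized offsets used by BBoxPaste, and that min_canvas_size acts as a lower bound on each canvas dimension. The function name and the integer truncation are assumptions.

```python
def paste_geometry(in_w, in_h, ratio, paste_x=0.5, paste_y=0.5, min_canvas_size=0.0):
    """Return (canvas_w, canvas_h, offset_x, offset_y) in pixels."""
    if ratio <= 1:
        raise ValueError("ratio must be > 1")
    # Canvas size is input size * ratio, but not smaller than min_canvas_size (assumption).
    canvas_w = max(int(in_w * ratio), int(min_canvas_size))
    canvas_h = max(int(in_h * ratio), int(min_canvas_size))
    # 0.0 places the image at the left/top edge, 1.0 at the right/bottom edge.
    offset_x = int(paste_x * (canvas_w - in_w))
    offset_y = int(paste_y * (canvas_h - in_h))
    return canvas_w, canvas_h, offset_x, offset_y


print(paste_geometry(100, 50, ratio=2.0))
```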
class nvidia.dali.ops.RandomBBoxCrop(**kwargs)

This is ‘CPU’ operator

Perform a prospective crop to an image while keeping bounding boxes and labels consistent. Inputs must be supplied as two Tensors: BBoxes containing bounding boxes represented as [l,t,r,b] or [x,y,w,h], and Labels containing the corresponding label for each bounding box. The resulting prospective crop is provided as two Tensors: Begin containing the starting coordinates for the crop in (x,y) format, and Size containing the dimensions of the crop in (w,h) format. Bounding boxes are provided as a (m*4) Tensor, where each bounding box is represented as [l,t,r,b] or [x,y,w,h]. Resulting labels correspond to the boxes that remain after boxes failing the minimum accepted intersection threshold are discarded.

Parameters:
  • aspect_ratio (float or list of float, optional, default = [1.0, 1.0]) – Range [min, max] of valid aspect ratio values for new crops. Value for min should be greater than or equal to 0.0. Default values are [1.0, 1.0], disallowing changes in aspect ratio.
  • ltrb (bool, optional, default = True) – If true, bboxes are returned as [left, top, right, bottom], else [x, y, width, height]. Defaults to true.
  • num_attempts (int, optional, default = 1) – Number of attempts to retrieve a patch with the desired parameters.
  • scaling (float or list of float, optional, default = [1.0, 1.0]) – Range [min, max] for crop size with respect to original image dimensions. Value for min should be greater than or equal to 0.0. Default values are [1.0, 1.0].
  • thresholds (float or list of float, optional, default = [0.0]) – Minimum overlap (Intersection over union) of the bounding boxes with respect to the prospective crop. Selected at random for every sample from provided values. Default value is [0.0], leaving the input image as-is in the new crop.
class nvidia.dali.ops.RandomResizedCrop(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Perform a crop with randomly chosen area and aspect ratio, then resize it to given size.

Parameters:
  • size (int or list of int) – Size of resized image.
  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.
  • num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.
  • random_area (float or list of float, optional, default = [0.08, 1.0]) – Range from which to choose random area factor A. Before resizing, the cropped image’s area will be equal to A * original image’s area.
  • random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio.
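
The way random_area, random_aspect_ratio and num_attempts interact can be sketched as a rejection loop. Uniform sampling of both quantities and the fallback to the full image after the last failed attempt are assumptions, as is the function name random_crop_window.

```python
import math
import random


def random_crop_window(width, height, random_area=(0.08, 1.0),
                       random_aspect_ratio=(0.75, 1.333333),
                       num_attempts=10, rng=None):
    """Pick a random crop window (x0, y0, w, h) that fits inside the image."""
    rng = rng if rng is not None else random.Random()
    for _ in range(num_attempts):
        # Area factor A: the crop covers A * original area before resizing.
        area = rng.uniform(*random_area) * width * height
        ratio = rng.uniform(*random_aspect_ratio)  # width / height of the crop
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            x0 = rng.randint(0, width - crop_w)
            y0 = rng.randint(0, height - crop_h)
            return x0, y0, crop_w, crop_h
    # Assumption: fall back to the whole image when every attempt fails.
    return 0, 0, width, height


print(random_crop_window(640, 480, rng=random.Random(0)))
```

The selected window would then be resized to the requested size.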
class nvidia.dali.ops.Resize(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Resize images.

Parameters:
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image.
  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.
  • resize_longer (float or float tensor, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.
  • resize_shorter (float or float tensor, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image.
  • resize_x (float or float tensor, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.
  • resize_y (float or float tensor, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.
  • save_attrs (bool, optional, default = False) – Save reshape attributes for testing.
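
The rules for the four resize parameters can be summarized in a small helper that computes the output size. This is a sketch of the documented behavior with a hypothetical function name; the rounding used here is an assumption.

```python
def resized_size(width, height, resize_x=0.0, resize_y=0.0,
                 resize_shorter=0.0, resize_longer=0.0):
    """Return the (width, height) of the resized image."""
    modes = sum([bool(resize_shorter), bool(resize_longer), bool(resize_x or resize_y)])
    if modes != 1:
        raise ValueError("use exactly one of resize_shorter, resize_longer, resize_x/resize_y")
    if resize_shorter:
        scale = resize_shorter / min(width, height)
        return round(width * scale), round(height * scale)
    if resize_longer:
        scale = resize_longer / max(width, height)
        return round(width * scale), round(height * scale)
    if resize_x and resize_y:
        return round(resize_x), round(resize_y)
    if resize_x:
        # resize_y left at 0: keep the aspect ratio.
        return round(resize_x), round(height * resize_x / width)
    # resize_x left at 0: keep the aspect ratio.
    return round(width * resize_y / height), round(resize_y)


print(resized_size(640, 480, resize_shorter=240))
```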
class nvidia.dali.ops.ResizeCropMirror(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Perform a fused resize, crop, mirror operation. Handles both fixed and random resizing and cropping.

Parameters:
  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image. If only a single value c is provided, the resulting crop will be square with size (c,c)
  • crop_pos_x (float or float tensor, optional, default = 0.5) – Horizontal position of the crop in image coordinates (0.0 - 1.0)
  • crop_pos_y (float or float tensor, optional, default = 0.5) – Vertical position of the crop in image coordinates (0.0 - 1.0)
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.
  • mirror (int or int tensor, optional, default = 0) –

    Mask for horizontal flip.

    • 0 - do not perform horizontal flip for this image
    • 1 - perform horizontal flip for this image.
  • resize_longer (float or float tensor, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.
  • resize_shorter (float or float tensor, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image.
  • resize_x (float or float tensor, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.
  • resize_y (float or float tensor, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.
class nvidia.dali.ops.Rotate(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Rotate the image.

Parameters:
  • angle (float or float tensor) – Rotation angle.
  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.
  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.
  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation
    • 1 - apply this transformation
class nvidia.dali.ops.SSDRandomCrop(**kwargs)

This is ‘CPU’ operator

Perform a random crop with bounding boxes where IoU meets a randomly selected threshold between 0 and 1. When IoU falls below the threshold, a new random crop is generated, up to num_attempts times. As input, it accepts an image, bounding boxes and labels. As output, it returns the cropped image, the cropped and valid bounding boxes, and the valid labels.

Parameters:
  • num_attempts (int, optional, default = 1) – Number of attempts, the default value is 1.
class nvidia.dali.ops.Saturation(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Changes saturation level of the image.

Parameters:
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
  • saturation (float or float tensor, optional, default = 1.0) –

    Saturation change factor. Values >= 0 are supported. For example:

    • 0 - completely desaturated image
    • 1 - no change to image’s saturation
class nvidia.dali.ops.SequenceCrop(**kwargs)

This is ‘CPU’ operator

Perform a random crop on a sequence.

Parameters:
  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image. If only a single value c is provided, the resulting crop will be square with size (c,c)
  • crop_pos_x (float or float tensor, optional, default = 0.5) – Horizontal position of the crop in image coordinates (0.0 - 1.0)
  • crop_pos_y (float or float tensor, optional, default = 0.5) – Vertical position of the crop in image coordinates (0.0 - 1.0)
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
class nvidia.dali.ops.SequenceReader(**kwargs)

This is ‘CPU’ operator

Read [Frame] sequences from a directory representing a collection of streams.

Parameters:
  • file_root (str) – Path to a directory containing streams (directories representing streams).
  • sequence_length (int) – Length of the sequence to load for each sample
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling.
  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).
  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data.
  • shard_id (int, optional, default = 0) – Id of the part to read.
  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
class nvidia.dali.ops.Slice(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Crop a slice of a defined size from an input tensor, starting at the location specified by begin. Inputs must be supplied as 3 Tensors in a specific order: Images containing image data in NHWC format, Begin containing the starting pixel coordinates for the crop in (x,y) format, and Size containing the pixel dimensions of the crop in (w,h) format. For both Begin and Size, coordinates must be in the interval [0.0, 1.0]. The resulting tensor output of Slice operation is a cropped version of the input tensor Images.

Parameters:
  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image. If only a single value c is provided, the resulting crop will be square with size (c,c)
  • crop_pos_x (float or float tensor, optional, default = 0.5) – Horizontal position of the crop in image coordinates (0.0 - 1.0)
  • crop_pos_y (float or float tensor, optional, default = 0.5) – Vertical position of the crop in image coordinates (0.0 - 1.0)
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image
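
The Begin and Size inputs described above can be illustrated with a minimal pure-Python sketch. It assumes that the [0.0, 1.0] values are fractions of the image width and height and that fractional pixel positions are truncated; both are assumptions, and slice_image is a hypothetical helper, not part of DALI.

```python
def slice_image(image, begin, size):
    """Crop one HWC image (nested lists) with normalized begin and size.

    begin is (x, y) and size is (w, h), each in [0.0, 1.0].
    Assumption: values are fractions of the image width and height,
    and pixel positions are truncated to integers.
    """
    height, width = len(image), len(image[0])
    x0 = int(begin[0] * width)
    y0 = int(begin[1] * height)
    crop_w = int(size[0] * width)
    crop_h = int(size[1] * height)
    return [row[x0:x0 + crop_w] for row in image[y0:y0 + crop_h]]


# 4x4 single-channel image whose pixel value encodes (row, column).
image = [[[10 * r + c] for c in range(4)] for r in range(4)]
crop = slice_image(image, begin=(0.5, 0.25), size=(0.5, 0.5))
print(crop)
```

Here the crop starts at column 2, row 1 and covers a 2x2 window of the 4x4 input.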
class nvidia.dali.ops.Sphere(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Perform a sphere augmentation.

Parameters:
  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.
  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.
  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation
    • 1 - apply this transformation
class nvidia.dali.ops.TFRecordReader(path, index_path, features, **kwargs)

This is ‘CPU’ operator

Read sample data from a TensorFlow TFRecord file.

Parameters:
  • features (dict of (string, nvidia.dali.tfrecord.Feature)) – Dictionary mapping feature names to the configuration of the features present in the TFRecord file. The configurations are typically obtained with the helper functions dali.tfrecord.FixedLenFeature and dali.tfrecord.VarLenFeature, which are equivalent to TensorFlow’s tf.FixedLenFeature and tf.VarLenFeature respectively.
  • index_path (str or list of str) – List of paths to index files (1 index file for every TFRecord file). Index files may be obtained from TFRecord files using tfrecord2idx script distributed with DALI.
  • path (str or list of str) – List of paths to TFRecord files.
  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling.
  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multi-GPU training).
  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data.
  • shard_id (int, optional, default = 0) – Id of the part to read.
  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
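
The role of initial_fill together with random_shuffle can be sketched with a generic buffered shuffle. This is a common technique for shuffling a stream that does not fit in memory; whether the reader implements it exactly this way is an assumption, and buffered_shuffle is a hypothetical helper.

```python
import random


def buffered_shuffle(samples, initial_fill=1024, seed=None):
    """Yield samples in a randomized order using a fixed-size buffer.

    Assumption: the buffer is filled with initial_fill samples first;
    afterwards each new sample replaces a randomly chosen buffered
    sample, which is emitted. This is a sketch, not DALI's code.
    """
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        if len(buffer) < initial_fill:
            buffer.append(sample)
            continue
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = sample
    # Drain what is left once the input is exhausted.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(100), initial_fill=8, seed=0))
print(shuffled[:10])
```

A larger buffer lets samples travel further from their original position, at the cost of more memory; a buffer of size 1 leaves the order unchanged.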
class nvidia.dali.ops.Uniform(**kwargs)

This is ‘support’ operator

Produce tensor filled with uniformly distributed random numbers.

Parameters:
  • range (float or list of float, optional, default = [-1.0, 1.0]) – Range of produced random numbers.
class nvidia.dali.ops.VideoReader(**kwargs)

This is ‘GPU’ operator

Load and decode H.264 video with FFmpeg and NVDECODE, the hardware-accelerated video decoder of NVIDIA GPUs. The video stream can be stored in most container file formats; FFmpeg is used to parse the container. Returns a batch of sequences of sequence_length frames of shape [N, F, H, W, C] (N being the batch size and F the number of frames).

Parameters:
  • filenames (str or list of str) – File names of the video files to load.
  • sequence_length (int) – Frames to load per sequence.
  • channels (int, optional, default = 3) – Number of channels.
  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output frames (supports RGB and YCbCr).
  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling.
  • normalized (bool, optional, default = False) – Get output as normalized data.
  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multi-GPU training).
  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data.
  • scale (float, optional, default = 1.0) – Rescaling factor of height and width.
  • shard_id (int, optional, default = 0) – Id of the part to read.
  • step (int, optional, default = -1) – Frame interval between each sequence (if step < 0, step is set to sequence_length).
  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
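
The relationship between sequence_length and step can be shown with a small sketch that lists the first frame of each sequence. It assumes that sequences start at frame 0 and that only complete sequences are produced; sequence_starts is a hypothetical helper, not a DALI function.

```python
def sequence_starts(num_frames, sequence_length, step=-1):
    """Return the index of the first frame of every sequence in one video.

    Mirrors the documented default: a negative step falls back to
    sequence_length, which gives non-overlapping sequences.
    Assumption: sequences start at frame 0 and partial sequences
    at the end of the video are dropped.
    """
    if step < 0:
        step = sequence_length
    if step == 0:
        raise ValueError("step must be non-zero")
    return list(range(0, num_frames - sequence_length + 1, step))


print(sequence_starts(10, sequence_length=4))          # non-overlapping
print(sequence_starts(10, sequence_length=4, step=2))  # overlapping
```

With a step smaller than sequence_length, consecutive sequences share frames; with the default they do not.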
class nvidia.dali.ops.WarpAffine(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Apply an affine transformation to the image.

Parameters:
  • matrix (float or list of float) –

    Matrix of the transform (dst -> src). Given a list of values (M11, M12, M13, M21, M22, M23), this operation produces a new image using the formula

    dst(x,y) = src(M11 * x + M12 * y + M13, M21 * x + M22 * y + M23)

    It is equivalent to OpenCV’s warpAffine operation with the WARP_INVERSE_MAP flag set.

  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.
  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.
  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation
    • 1 - apply this transformation
  • use_image_center (bool, optional, default = False) – Whether to use the image center as the center of the transformation. When this is True, coordinates are calculated from the center of the image.
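
The dst -> src formula for the matrix parameter can be made concrete with a pure-Python sketch. It assumes nearest-neighbour sampling, an output of the same size as the input, and use_image_center = False; warp_affine below is a hypothetical reference helper, not the DALI operator.

```python
def warp_affine(src, matrix, fill_value=0.0):
    """Inverse-map affine warp of a 2-D image given as a list of rows.

    Implements dst(x, y) = src(M11*x + M12*y + M13, M21*x + M22*y + M23).
    Assumptions: nearest-neighbour sampling, output size equal to the
    input size, coordinates measured from the top-left corner.
    """
    m11, m12, m13, m21, m22, m23 = matrix
    height, width = len(src), len(src[0])
    dst = [[fill_value] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx = int(round(m11 * x + m12 * y + m13))
            sy = int(round(m21 * x + m22 * y + m23))
            # Pixels that map outside the source keep fill_value.
            if 0 <= sx < width and 0 <= sy < height:
                dst[y][x] = src[sy][sx]
    return dst


src = [[1, 2, 3],
       [4, 5, 6]]
# dst(x, y) = src(x + 1, y): the content moves one pixel to the left.
shifted = warp_affine(src, (1, 0, 1, 0, 1, 0))
print(shifted)
```

Because the matrix maps destination coordinates to source coordinates, a positive M13 shifts the visible content to the left rather than to the right.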
class nvidia.dali.ops.Water(**kwargs)

This is ‘CPU’, ‘GPU’ operator

Perform a water augmentation (make image appear to be underwater).

Parameters:
  • ampl_x (float, optional, default = 10.0) – Amplitude of the wave in x direction.
  • ampl_y (float, optional, default = 10.0) – Amplitude of the wave in y direction.
  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.
  • freq_x (float, optional, default = 0.049087) – Frequency of the wave in x direction.
  • freq_y (float, optional, default = 0.049087) – Frequency of the wave in y direction.
  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.
  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation
    • 1 - apply this transformation
  • phase_x (float, optional, default = 0.0) – Phase of the wave in x direction.
  • phase_y (float, optional, default = 0.0) – Phase of the wave in y direction.
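
To give a feel for the parameters, the sketch below computes where an output pixel would be sampled from under a sinusoidal displacement. The exact displacement formula is an assumption made for illustration (sine along x, cosine along y), and water_source_coords is a hypothetical helper. The one certain part is the arithmetic: a frequency of 0.049087 radians per pixel corresponds to a wavelength of about 128 pixels, since 2π / 0.049087 ≈ 128.

```python
import math


def water_source_coords(x, y, ampl_x=10.0, ampl_y=10.0,
                        freq_x=0.049087, freq_y=0.049087,
                        phase_x=0.0, phase_y=0.0):
    """Return the source position sampled for output pixel (x, y).

    Assumption: the horizontal offset is a sine wave that varies
    with y and the vertical offset is a cosine wave that varies
    with x. This form is illustrative, not taken from the docs.
    """
    src_x = x + ampl_x * math.sin(freq_x * y + phase_x)
    src_y = y + ampl_y * math.cos(freq_y * x + phase_y)
    return src_x, src_y


# Wavelength in pixels implied by the default frequency.
period = 2 * math.pi / 0.049087
print(round(period, 2))
print(water_source_coords(0, 0))
```

Under this assumption the amplitudes bound how far, in pixels, a sample can be displaced from its original position.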
class nvidia.dali.ops.nvJPEGDecoder(**kwargs)

This is ‘mixed’ operator

Decode JPEG images using the nvJPEG library. Output of the decoder is on the GPU and uses HWC ordering.

Parameters:
  • device_memory_padding (int, optional, default = 16777216) – Padding for nvJPEG’s device memory allocations. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it. Default is 16MB.
  • host_memory_padding (int, optional, default = 16777216) – Padding for nvJPEG’s host memory allocations. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it. Default is 16MB.
  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.
  • use_batched_decode (bool, optional, default = False) – Use nvJPEG’s batched decoding API.