Supported operations

  • CPU operator means that the operator can be scheduled on the CPU.

  • GPU operator means that the operator can be scheduled on the GPU.

  • Mixed operator means that the operator accepts input on the CPU while producing the output on the GPU.

  • Support is a special type of operator that provides data driving other operators (like a random generator). Its output cannot be used as a DALI output.

  • Sequences means that the operator can work with (produce or accept as an input) sequence (video-like) data.

The table below lists all available operators and the devices they can operate on.

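Operator name           | CPU | GPU | Mixed | Support | Sequences
BbFlip                  |  v  |  v  |       |         |
BBoxPaste               |  v  |     |       |         |
BoxEncoder              |  v  |  v  |       |         |
Brightness              |  v  |  v  |       |         |
Caffe2Reader            |  v  |     |       |         |
CaffeReader             |  v  |     |       |         |
Cast                    |  v  |  v  |       |         |
COCOReader              |  v  |     |       |         |
CoinFlip                |     |     |       |    v    |
ColorSpaceConversion    |  v  |  v  |       |         |
ColorTwist              |  v  |  v  |       |         |
Contrast                |  v  |  v  |       |         |
Copy                    |  v  |  v  |       |         |
Crop                    |  v  |  v  |       |         |     v
CropMirrorNormalize     |  v  |  v  |       |         |     v
DummyOp                 |  v  |  v  |       |         |
DumpImage               |  v  |  v  |       |         |
ElementExtract          |  v  |  v  |       |         |     v
ExternalSource          |  v  |  v  |       |         |
FastResizeCropMirror    |  v  |     |       |         |
FileReader              |  v  |     |       |         |
Flip                    |  v  |  v  |       |         |
HostDecoder             |  v  |     |       |         |
HostDecoderCrop         |  v  |     |       |         |
HostDecoderRandomCrop   |  v  |     |       |         |
HostDecoderSlice        |  v  |     |       |         |
Hue                     |  v  |  v  |       |         |
ImageDecoder            |  v  |     |   v   |         |
ImageDecoderCrop        |  v  |     |   v   |         |
ImageDecoderRandomCrop  |  v  |     |   v   |         |
ImageDecoderSlice       |  v  |     |   v   |         |
Jitter                  |     |  v  |       |         |
MXNetReader             |  v  |     |       |         |
NormalizePermute        |  v  |  v  |       |         |
nvJPEGDecoder           |     |     |   v   |         |
nvJPEGDecoderCrop       |     |     |   v   |         |
nvJPEGDecoderRandomCrop |     |     |   v   |         |
nvJPEGDecoderSlice      |     |     |   v   |         |
OpticalFlow             |     |  v  |       |         |     v
Paste                   |     |  v  |       |         |
PythonFunction          |  v  |     |       |         |
RandomBBoxCrop          |  v  |     |       |         |
RandomResizedCrop       |  v  |  v  |       |         |
Resize                  |  v  |  v  |       |         |
ResizeCropMirror        |  v  |     |       |         |
Rotate                  |  v  |  v  |       |         |
Saturation              |  v  |  v  |       |         |
SequenceReader          |  v  |     |       |         |     v
Slice                   |  v  |  v  |       |         |     v
Sphere                  |  v  |  v  |       |         |
SSDRandomCrop           |  v  |     |       |         |
TFRecordReader          |  v  |     |       |         |
Transpose               |     |  v  |       |         |     v
Uniform                 |     |     |       |    v    |
VideoReader             |     |  v  |       |         |
WarpAffine              |  v  |  v  |       |         |
Water                   |  v  |  v  |       |         |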

class nvidia.dali.ops.BBoxPaste(**kwargs)

This is a ‘CPU’ operator

Transforms bounding boxes so that they are in the same place in the image after pasting it onto a larger canvas.

Corner coordinates:

(x’, y’) = (x/ratio + paste_x’, y/ratio + paste_y’)

Box sizes:

(w’, h’) = (w/ratio, h/ratio)

Where:

paste_x’ = paste_x * (ratio - 1)/ratio

paste_y’ = paste_y * (ratio - 1)/ratio

Paste coordinates are normalized so that (0,0) aligns the image to top-left of the canvas and (1,1) aligns it to bottom-right.

Parameters
  • ratio (float or float tensor) – Ratio of canvas size to input size, must be > 1.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • ltrb (bool, optional, default = False) – True for the two-point (ltrb) representation, False for the width-height representation.

  • paste_x (float or float tensor, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)

  • paste_y (float or float tensor, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
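
The formulas above can be sketched in plain Python. This is only an illustration of the arithmetic, not DALI's implementation; the function name paste_bbox and its argument order are hypothetical.

```python
def paste_bbox(box, ratio, paste_x=0.5, paste_y=0.5, ltrb=False):
    """Map a normalized bounding box onto a canvas `ratio` times larger."""
    if ratio <= 1:
        raise ValueError("ratio must be > 1")
    # Offset of the pasted image on the canvas, in normalized canvas coordinates.
    off_x = paste_x * (ratio - 1) / ratio
    off_y = paste_y * (ratio - 1) / ratio
    if ltrb:
        # Two-point representation: both corners are shifted and scaled.
        l, t, r, b = box
        return (l / ratio + off_x, t / ratio + off_y,
                r / ratio + off_x, b / ratio + off_y)
    # Width-height representation: the corner is shifted, the size only scaled.
    x, y, w, h = box
    return (x / ratio + off_x, y / ratio + off_y, w / ratio, h / ratio)
```

For example, a box covering the whole input image, pasted into the center of a canvas twice as large, ends up covering the middle half of the canvas in each dimension.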

class nvidia.dali.ops.BbFlip(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Operator for flipping (mirroring) bounding boxes horizontally and/or vertically. Input: bounding box coordinates in either [x, y, w, h] or [left, top, right, bottom] format. All coordinates are in the image coordinate system (i.e. 0.0-1.0)

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • horizontal (int or int tensor, optional, default = 1) – Perform flip along horizontal axis.

  • ltrb (bool, optional, default = False) – True for the two-point (ltrb) representation, False for the width-height representation.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • vertical (int or int tensor, optional, default = 0) – Perform flip along vertical axis.
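
As an illustration of what flipping a normalized bounding box means, here is a minimal plain-Python sketch. It assumes that "horizontal" mirrors the box left-to-right and "vertical" mirrors it top-to-bottom; the function flip_bbox is hypothetical and is not part of DALI.

```python
def flip_bbox(box, horizontal=True, vertical=False, ltrb=False):
    """Mirror a bounding box given in normalized (0.0-1.0) image coordinates."""
    if ltrb:
        l, t, r, b = box
        if horizontal:
            # The old right edge becomes the new left edge and vice versa.
            l, r = 1.0 - r, 1.0 - l
        if vertical:
            t, b = 1.0 - b, 1.0 - t
        return (l, t, r, b)
    x, y, w, h = box
    if horizontal:
        # The size is unchanged; only the top-left corner moves.
        x = 1.0 - x - w
    if vertical:
        y = 1.0 - y - h
    return (x, y, w, h)
```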

class nvidia.dali.ops.BoxEncoder(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Encodes input bounding boxes and labels using a set of default boxes (anchors) passed during op construction. Follows the algorithm described in https://arxiv.org/abs/1512.02325 and implemented in https://github.com/mlperf/training/tree/master/single_stage_detector/ssd

Inputs must be supplied as two Tensors: BBoxes, containing bounding boxes represented as [l,t,r,b], and Labels, containing the corresponding label for each bounding box. Results are two tensors: EncodedBBoxes, containing M encoded bounding boxes as [l,t,r,b], where M is the number of anchors, and EncodedLabels, containing the corresponding label for each encoded box.

Parameters
  • anchors (float or list of float) – Anchors to be used for encoding. List of floats in ltrb format.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • criteria (float, optional, default = 0.5) – Threshold IOU for matching bounding boxes with anchors. Value between 0 and 1.

  • means (float or list of float, optional, default = [0.0, 0.0, 0.0, 0.0]) – [x y w h] means for offset normalization.

  • offset (bool, optional, default = False) – Returns normalized offsets ((encoded_bboxes*scale - anchors*scale) - mean) / stds in EncodedBBoxes, using the stds, means and scale arguments (default values are transparent).

  • scale (float, optional, default = 1.0) – Rescale the box and anchors values before offset calculation (e.g. to get back to absolute values).

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • stds (float or list of float, optional, default = [1.0, 1.0, 1.0, 1.0]) – [x y w h] standard deviations for offset normalization.
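
The criteria argument is an IOU (intersection over union) threshold. A minimal sketch of that measure for boxes in ltrb format is shown below. The helper names are hypothetical, the strict ">" comparison is an assumption, and the full SSD matching procedure contains further steps that are not shown here.

```python
def iou_ltrb(a, b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matching_anchors(box, anchors, criteria=0.5):
    """Indices of anchors whose IOU with `box` exceeds the threshold."""
    return [i for i, anchor in enumerate(anchors) if iou_ltrb(box, anchor) > criteria]
```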

class nvidia.dali.ops.Brightness(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Changes the brightness of an image

Parameters
  • brightness (float or float tensor, optional, default = 1.0) –

    Brightness change factor. Values >= 0 are accepted. For example:

    • 0 - black image,

    • 1 - no change

    • 2 - increase brightness twice

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
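
The effect of the brightness factor can be illustrated on 8-bit pixel values. This sketch assumes the factor is a simple multiplier with the result clamped to 0-255, which matches the examples listed above but is not taken from DALI's source; adjust_brightness is a hypothetical name.

```python
def adjust_brightness(pixels, brightness=1.0):
    """Scale 8-bit pixel values by `brightness`, clamping to the 0-255 range."""
    if brightness < 0:
        raise ValueError("brightness must be >= 0")
    return [min(255, max(0, round(p * brightness))) for p in pixels]
```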

class nvidia.dali.ops.COCOReader(**kwargs)

This is a ‘CPU’ operator

Read data from a COCO dataset composed of a directory with images and annotation files. For each image with m bboxes, returns its bboxes as an (m,4) Tensor (m * [x, y, w, h] or m * [left, top, right, bottom]) and labels as an (m,1) Tensor (m * category_id).

Parameters
  • annotations_file (str or list of str) – List of paths to the JSON annotations files.

  • file_root (str) – Path to a directory containing data files.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • file_list (str, optional, default = '') – Path to the file with a list of pairs file label (leave empty to traverse the file_root directory to obtain files and labels)

  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

  • lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

  • ltrb (bool, optional, default = False) – If true, bboxes are returned as [left, top, right, bottom], else [x, y, width, height].

  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

  • prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

  • ratio (bool, optional, default = False) – If true, the returned bbox values are expressed as a ratio of the image width and height.

  • read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

  • save_img_ids (bool, optional, default = False) – If true, image IDs will also be returned.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • shard_id (int, optional, default = 0) – Id of the part to read.

  • shuffle_after_epoch (bool, optional, default = False) – If true, reader shuffles whole dataset after each epoch.

  • size_threshold (float, optional, default = 0.1) – If width or height of a bounding box representing an instance of an object is under this value, object will be skipped during reading. It is represented as absolute value.

  • skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

  • skip_empty (bool, optional, default = False) – If true, reader will skip samples with no object instances in them

  • stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
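
The ltrb and ratio options change how each bounding box is represented. The sketch below shows the conversion for a single box that starts in absolute [x, y, width, height] form; coco_bbox is a hypothetical helper used only for illustration.

```python
def coco_bbox(box_xywh, image_width, image_height, ltrb=False, ratio=False):
    """Convert an absolute [x, y, w, h] box to the requested representation."""
    x, y, w, h = box_xywh
    # ltrb: replace width and height by the right and bottom coordinates.
    box = [x, y, x + w, y + h] if ltrb else [x, y, w, h]
    if ratio:
        # Express horizontal values relative to the width, vertical to the height.
        box = [box[0] / image_width, box[1] / image_height,
               box[2] / image_width, box[3] / image_height]
    return box
```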

class nvidia.dali.ops.Caffe2Reader(**kwargs)

This is a ‘CPU’ operator

Read sample data from a Caffe2 Lightning Memory-Mapped Database (LMDB).

Parameters
  • path (str) – Path to Caffe2 LMDB directory.

  • additional_inputs (int, optional, default = 0) – Additional auxiliary data tensors provided for each sample.

  • bbox (bool, optional, default = False) – Denotes if bounding-box information is present.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

  • label_type (int, optional, default = 0) –

    Type of label stored in dataset.

    • 0 = SINGLE_LABEL : single integer label for multi-class classification

    • 1 = MULTI_LABEL_SPARSE : sparse active label indices for multi-label classification

    • 2 = MULTI_LABEL_DENSE : dense label embedding vector for label embedding regression

    • 3 = MULTI_LABEL_WEIGHTED_SPARSE : sparse active label indices with per-label weights for multi-label classification.

  • lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

  • num_labels (int, optional, default = 1) – Number of classes in dataset. Required when sparse labels are used.

  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

  • prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

  • read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • shard_id (int, optional, default = 0) – Id of the part to read.

  • skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

  • stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
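
The interplay of random_shuffle and initial_fill can be pictured as a shuffle buffer: samples are read sequentially into a buffer and drawn from it at random. The following generator is a simplified sketch of that idea, not the Loader's actual code; shuffled_stream is a hypothetical name.

```python
import random


def shuffled_stream(samples, initial_fill=1024, seed=0):
    """Yield samples in random order using a buffer of `initial_fill` entries."""
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        buffer.append(sample)  # data is always read sequentially
        if len(buffer) >= initial_fill:
            # Swap a random entry to the end and emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    while buffer:
        # The input is exhausted: drain what is left, still in random order.
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()
```

A larger buffer gives a better approximation of a full shuffle at the cost of more memory.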

class nvidia.dali.ops.CaffeReader(**kwargs)

This is a ‘CPU’ operator

Read (Image, label) pairs from a Caffe LMDB

Parameters
  • path (str) – Path to Caffe LMDB directory.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

  • lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

  • prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

  • read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • shard_id (int, optional, default = 0) – Id of the part to read.

  • skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

  • stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
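
num_shards and shard_id select which part of the dataset a reader works on. A minimal sketch of one possible partitioning is shown below. It assumes contiguous, nearly equal index ranges; shard_range is a hypothetical helper and the exact split used by the readers may differ.

```python
def shard_range(dataset_size, shard_id=0, num_shards=1):
    """Return the [start, end) sample indices assigned to one shard."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end
```

In multi-GPU training each process would typically pass its own rank as shard_id and the number of processes as num_shards.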

class nvidia.dali.ops.Cast(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Cast tensor to a different type

Parameters
  • dtype (nvidia.dali.types.DALIDataType) – Output data type.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.CoinFlip(**kwargs)

This is a ‘support’ operator

Produce tensor filled with 0s and 1s - results of random coin flip, usable as an argument for select ops.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • probability (float, optional, default = 0.5) – Probability of returning 1.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
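
Conceptually, the output is one Bernoulli draw per sample. The sketch below reproduces that behavior with Python's standard library. It is an illustration only; coin_flip is a hypothetical function and its random stream is unrelated to the one DALI uses.

```python
import random


def coin_flip(batch_size, probability=0.5, seed=None):
    """Return `batch_size` values, each 1 with the given probability, else 0."""
    rng = random.Random(seed)
    return [1 if rng.random() < probability else 0 for _ in range(batch_size)]
```

Such a list of 0s and 1s is the kind of per-sample argument accepted by, for example, the mirror parameter of CropMirrorNormalize.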

class nvidia.dali.ops.ColorSpaceConversion(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Converts between various image color models

Parameters
  • image_type (nvidia.dali.types.DALIImageType) – The color space of the input image

  • output_type (nvidia.dali.types.DALIImageType) – The color space of the output image

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.ColorTwist(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Combination of hue, saturation, contrast and brightness.

Parameters
  • brightness (float or float tensor, optional, default = 1.0) –

    Brightness change factor. Values >= 0 are accepted. For example:

    • 0 - black image,

    • 1 - no change

    • 2 - increase brightness twice

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • contrast (float or float tensor, optional, default = 1.0) –

    Contrast change factor. Values >= 0 are accepted. For example:

    • 0 - gray image,

    • 1 - no change

    • 2 - increase contrast twice

  • hue (float or float tensor, optional, default = 0.0) – Hue change, in degrees.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • saturation (float or float tensor, optional, default = 1.0) –

    Saturation change factor. Values >= 0 are supported. For example:

    • 0 - completely desaturated image

    • 1 - no change to image’s saturation

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.Contrast(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Changes the color contrast of the image.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • contrast (float or float tensor, optional, default = 1.0) –

    Contrast change factor. Values >= 0 are accepted. For example:

    • 0 - gray image,

    • 1 - no change

    • 2 - increase contrast twice

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
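
The contrast factor can be illustrated on 8-bit pixel values. The sketch assumes that values are scaled around a mid-gray level of 128 and clamped to 0-255. That is consistent with the examples above (0 gives a gray image) but is an assumption, not DALI's documented formula; adjust_contrast is a hypothetical name.

```python
def adjust_contrast(pixels, contrast=1.0, pivot=128):
    """Scale the distance of each 8-bit value from `pivot` by `contrast`."""
    if contrast < 0:
        raise ValueError("contrast must be >= 0")
    return [min(255, max(0, round(pivot + (p - pivot) * contrast))) for p in pixels]
```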

class nvidia.dali.ops.Copy(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Make a copy of the input tensor

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.Crop(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Crops an image with the given window dimensions and window position (upper left corner).

This operator allows sequence inputs

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image, specified as a pair (crop_H, crop_W). If only a single value c is provided, the resulting crop will be square with size (c,c). Providing crop argument is incompatible with providing separate arguments crop_h and crop_w.

  • crop_h (float or float tensor, optional, default = 0.0) – cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

  • crop_pos_y (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

  • crop_w (float or float tensor, optional, default = 0.0) – cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • output_dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type. By default same data type as the input will be used

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
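
The way crop_pos_x and crop_pos_y translate into a pixel position follows directly from the formulas in the parameter descriptions. Below is a plain-Python sketch on an image stored as a list of rows. The helper names are hypothetical, and truncating the position to an integer is an assumption.

```python
def crop_window(image_w, image_h, crop_w, crop_h, crop_pos_x=0.5, crop_pos_y=0.5):
    """Upper left corner of the cropping window, in pixels."""
    if crop_w > image_w or crop_h > image_h:
        raise ValueError("cropping window is larger than the image")
    crop_x = int(crop_pos_x * (image_w - crop_w))
    crop_y = int(crop_pos_y * (image_h - crop_h))
    return crop_x, crop_y


def crop(image, crop_w, crop_h, crop_pos_x=0.5, crop_pos_y=0.5):
    """Crop an image given as a list of rows (each row a list of pixels)."""
    x, y = crop_window(len(image[0]), len(image), crop_w, crop_h,
                       crop_pos_x, crop_pos_y)
    return [row[x:x + crop_w] for row in image[y:y + crop_h]]
```

With the default position of 0.5 the window is centered; 0.0 and 1.0 align it with the left/top and right/bottom edges respectively.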

class nvidia.dali.ops.CropMirrorNormalize(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Perform fused cropping, normalization, format conversion (NHWC to NCHW) if desired, and type casting. Normalization takes input image and produces output using formula:

output = (input - mean) / std

Note that not providing any crop argument will result in mirroring and normalization only.

This operator allows sequence inputs

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image, specified as a pair (crop_H, crop_W). If only a single value c is provided, the resulting crop will be square with size (c,c). Providing crop argument is incompatible with providing separate arguments crop_h and crop_w.

  • crop_h (float or float tensor, optional, default = 0.0) – cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

  • crop_pos_y (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

  • crop_w (float or float tensor, optional, default = 0.0) – cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • mean (float or list of float, optional, default = [0.0]) – Mean pixel values for image normalization.

  • mirror (int or int tensor, optional, default = 0) –

    Mask for horizontal flip.

    • 0 - do not perform horizontal flip for this image

    • 1 - perform horizontal flip for this image.

  • output_dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type.

  • output_layout (nvidia.dali.types.DALITensorLayout, optional, default = DALITensorLayout.NCHW) – Output tensor data layout

  • pad_output (bool, optional, default = False) – Whether to pad the output to number of channels being multiple of 4.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • std (float or list of float, optional, default = [1.0]) – Standard deviation values for image normalization.
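
The mirroring, normalization and layout conversion steps can be sketched in plain Python for a single image without cropping. The sketch assumes mean and std hold one value per channel and that the input is in HWC layout; mirror_normalize is a hypothetical helper, not the operator's implementation.

```python
def mirror_normalize(image_hwc, mean, std, mirror=0):
    """Apply (input - mean) / std per channel and return the image in CHW layout.

    image_hwc is a list of rows; each row is a list of pixels; each pixel is a
    list of channel values.
    """
    channels = len(image_hwc[0][0])
    output_chw = []
    for c in range(channels):
        plane = []
        for row in image_hwc:
            # mirror=1 reverses every row, i.e. a horizontal flip.
            source = row[::-1] if mirror else row
            plane.append([(pixel[c] - mean[c]) / std[c] for pixel in source])
        output_chw.append(plane)
    return output_chw
```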

class nvidia.dali.ops.DummyOp(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Dummy operator for testing

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • num_outputs (int, optional, default = 2) – Number of outputs.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.DumpImage(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Save images in batch to disk in PPM format. Useful for debugging.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • input_layout (nvidia.dali.types.DALITensorLayout, optional, default = DALITensorLayout.NHWC) – Layout of input images.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • suffix (str, optional, default = '') – Suffix to be added to output file names.
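
For reference, PPM is a very simple image format: a short text header followed by raw pixel data. The sketch below encodes an 8-bit RGB image in HWC layout as a binary ("P6") PPM. Whether the operator writes the binary or the ASCII variant is not stated here, so P6 is an assumption, and encode_ppm is a hypothetical helper.

```python
def encode_ppm(image_hwc):
    """Encode an 8-bit RGB image (rows of [r, g, b] pixels) as binary PPM bytes."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    # Header: magic number, width and height, maximum channel value.
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    body = bytes(value for row in image_hwc for pixel in row for value in pixel)
    return header + body
```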

class nvidia.dali.ops.ElementExtract(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Extracts one or more elements from input

This operator expects sequence inputs

Parameters
  • element_map (int or list of int) – Indices of extracted elements

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
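
In plain Python terms, extracting elements from a sequence amounts to indexing it with every entry of element_map. The sketch assumes one output per listed index; element_extract is a hypothetical helper.

```python
def element_extract(sequence, element_map):
    """Return the elements of `sequence` at the indices listed in `element_map`."""
    return tuple(sequence[index] for index in element_map)
```

For a sequence of four frames, element_map = [0, 3] would select the first and the last frame.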

class nvidia.dali.ops.ExternalSource(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Allows externally provided data to be passed as an input to the pipeline, see nvidia.dali.pipeline.Pipeline.feed_input() and nvidia.dali.pipeline.Pipeline.iter_setup(). Currently this operator is not supported in TensorFlow. It is worth noting that fed inputs should match the number of dimensions expected by the next operator in the pipeline (e.g. NHWC will expect 3-dimensional tensors where the last dimension represents the different channels).

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.FastResizeCropMirror(**kwargs)

This is a ‘CPU’ operator

Perform a fused resize, crop, mirror operation. Handles both fixed and random resizing and cropping. Backprojects the desired crop through the resize operation to reduce the amount of work performed.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image, specified as a pair (crop_H, crop_W). If only a single value c is provided, the resulting crop will be square with size (c,c). Providing crop argument is incompatible with providing separate arguments crop_h and crop_w.

  • crop_h (float or float tensor, optional, default = 0.0) – cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

  • crop_pos_y (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

  • crop_w (float or float tensor, optional, default = 0.0) – cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.

  • max_size (float or list of float, optional, default = [0.0, 0.0]) –

    Maximum size of the longer dimension when resizing with resize_shorter. When set with resize_shorter, the shortest dimension will be resized to resize_shorter if and only if the longest dimension is smaller than or equal to max_size. If not, the shortest dimension is resized to satisfy the constraint longest_dim == max_size. Can also be an array of size 2, where the two elements are the maximum size per dimension (H, W).

    Example:

    Original image = 400x1200.

    Resized with:

    • resize_shorter = 200 (max_size not set) => 200x600

    • resize_shorter = 200, max_size = 400 => 132x400

    • resize_shorter = 200, max_size = 1000 => 200x600

  • mirror (int or int tensor, optional, default = 0) –

    Mask for horizontal flip.

    • 0 - do not perform horizontal flip for this image

    • 1 - perform horizontal flip for this image.

  • output_dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type. By default same data type as the input will be used

  • resize_longer (float or float tensor, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

  • resize_shorter (float or float tensor, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

  • resize_x (float or float tensor, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

  • resize_y (float or float tensor, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
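
The interaction between resize_shorter and max_size described above can be sketched in plain Python. This is a minimal illustration of the documented rule, not the operator's implementation: the function name `resized_shape` is hypothetical, only the scalar form of max_size is handled, and the result is returned as unrounded floats, so the fractional case may not match the documented figure digit for digit.

```python
def resized_shape(height, width, resize_shorter, max_size=None):
    """Return (new_height, new_width) as floats, keeping the aspect ratio."""
    shorter, longer = min(height, width), max(height, width)
    # First try to bring the shorter side to resize_shorter.
    scale = resize_shorter / shorter
    # If the longer side would then exceed max_size, scale so that the
    # longer side equals max_size instead.
    if max_size is not None and longer * scale > max_size:
        scale = max_size / longer
    return height * scale, width * scale


# The three cases from the example above (original image 400x1200).
print(resized_shape(400, 1200, 200))
print(resized_shape(400, 1200, 200, max_size=400))
print(resized_shape(400, 1200, 200, max_size=1000))
```

Only the second call is affected by max_size, because only there does the longer side exceed the bound after scaling the shorter side to 200.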

class nvidia.dali.ops.FileReader(**kwargs)

This is a ‘CPU’ operator

Read (Image, label) pairs from a directory

Parameters
  • file_root (str) – Path to a directory containing data files.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • file_list (str, optional, default = '') – Path to a file containing a list of file and label pairs (leave empty to traverse the file_root directory to obtain files and labels)

  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

  • lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

  • prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

  • read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • shard_id (int, optional, default = 0) – Id of the part to read.

  • shuffle_after_epoch (bool, optional, default = False) – If true, reader shuffles whole dataset after each epoch. It is exclusive with stick_to_shard and random_shuffle.

  • skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

  • stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
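
The random_shuffle and initial_fill arguments describe a buffer that is filled sequentially and then sampled at random. The sketch below shows one common way such a bounded shuffle buffer can work. It is an assumption-laden illustration in plain Python: the function name `shuffled_stream` and the exact replacement policy are hypothetical and are not taken from the reader's implementation.

```python
import random


def shuffled_stream(samples, initial_fill, seed=0):
    """Yield samples in pseudo-random order using a bounded buffer."""
    rng = random.Random(seed)
    buffer = []
    for sample in samples:              # data is read sequentially
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            # Pick a random buffered element, move it to the end, emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: emit whatever is still buffered, in random order.
    rng.shuffle(buffer)
    yield from buffer


order = list(shuffled_stream(range(10), initial_fill=4, seed=42))
print(order)
```

With a buffer like this, a sample can only be emitted after it has been read, so a larger initial_fill mixes samples from further apart at the cost of more memory.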

class nvidia.dali.ops.Flip(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Flip the image over the horizontal and/or vertical axes.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • horizontal (int or int tensor, optional, default = 1) – Perform a horizontal flip.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • vertical (int or int tensor, optional, default = 0) – Perform a vertical flip.

class nvidia.dali.ops.HostDecoder(**kwargs)

This is a ‘CPU’ operator

Specific implementation of ImageDecoder for cpu backend

Warning

This operator is now deprecated. Use ImageDecoder instead

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') – `mixed` backend only. Choose the cache type:

    • threshold: caches every image with size bigger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest: stores the largest images that can fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only Use chunk pinned memory allocator, allocating chunk of size batch_size*prefetch_queue_depth during the construction and suballocate them in runtime. Ignored when split_stages is false.

class nvidia.dali.ops.HostDecoderCrop(**kwargs)

This is a ‘CPU’ operator

Decode images on the host with a fixed cropping window size and variable anchor. When possible, will make use of partial decoding (e.g. libjpeg-turbo). When not supported, will decode the whole image and then crop. Output of the decoder is in HWC ordering

Warning

This operator is now deprecated. Use ImageDecoderCrop instead

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') – `mixed` backend only. Choose the cache type:

    • threshold: caches every image with size bigger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest: stores the largest images that can fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image, specified as a pair (crop_H, crop_W). If only a single value c is provided, the resulting crop will be square with size (c,c). Providing crop argument is incompatible with providing separate arguments crop_h and crop_w.

  • crop_h (float or float tensor, optional, default = 0.0) – cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

  • crop_pos_y (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

  • crop_w (float or float tensor, optional, default = 0.0) – cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only Use chunk pinned memory allocator, allocating chunk of size batch_size*prefetch_queue_depth during the construction and suballocate them in runtime. Ignored when split_stages is false.
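
The crop_pos_x and crop_pos_y formulas above translate directly into code. The following sketch evaluates them for a given image and window size. The function name `crop_anchor` is hypothetical, and any rounding to whole pixels that the operator may apply is not modeled.

```python
def crop_anchor(image_h, image_w, crop_h, crop_w, crop_pos_x=0.5, crop_pos_y=0.5):
    """Return the (crop_x, crop_y) position of the window's upper left corner."""
    # crop_x = crop_x_norm * (W - crop_W)
    crop_x = crop_pos_x * (image_w - crop_w)
    # crop_y = crop_y_norm * (H - crop_H)
    crop_y = crop_pos_y * (image_h - crop_h)
    return crop_x, crop_y


# Default position 0.5 centers a 224x224 window in a 480x640 (HxW) image.
print(crop_anchor(480, 640, 224, 224))
# Position (0.0, 1.0) places the window in the bottom left corner.
print(crop_anchor(480, 640, 224, 224, crop_pos_x=0.0, crop_pos_y=1.0))
```

Because the normalized position is multiplied by the free space (W - crop_W), every value in the 0.0 - 1.0 range keeps the window fully inside the image.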

class nvidia.dali.ops.HostDecoderRandomCrop(**kwargs)

This is a ‘CPU’ operator

Decode images on the host with a random cropping anchor/window. When possible, will make use of partial decoding (e.g. libjpeg-turbo). When not supported, will decode the whole image and then crop. Output of the decoder is in HWC ordering

Warning

This operator is now deprecated. Use ImageDecoderRandomCrop instead

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') – `mixed` backend only. Choose the cache type:

    • threshold: caches every image with size bigger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest: stores the largest images that can fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

  • num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • random_area (float or list of float, optional, default = [0.08, 1.0]) – Range from which to choose random area factor A. The cropped image’s area will be equal to A * original image’s area.

  • random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only Use chunk pinned memory allocator, allocating chunk of size batch_size*prefetch_queue_depth during the construction and suballocate them in runtime. Ignored when split_stages is false.
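
To show how random_area, random_aspect_ratio and num_attempts can fit together, here is a sketch of a random crop window sampler in plain Python. It follows the widely used "random resized crop" recipe and is an assumption, not the operator's actual algorithm: the function name `random_crop_window`, the rounding, and the whole-image fallback are all hypothetical.

```python
import math
import random


def random_crop_window(height, width, random_area=(0.08, 1.0),
                       random_aspect_ratio=(0.75, 1.333333),
                       num_attempts=10, rng=None):
    """Return (x, y, crop_w, crop_h) of a randomly chosen window."""
    rng = rng or random.Random()
    for _ in range(num_attempts):
        # Target area is A * original area, with A drawn from random_area.
        area = rng.uniform(*random_area) * height * width
        # Aspect ratio is width / height.
        ratio = rng.uniform(*random_aspect_ratio)
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            x = rng.randint(0, width - crop_w)
            y = rng.randint(0, height - crop_h)
            return x, y, crop_w, crop_h
    # Assumed fallback when no attempt fits: use the whole image.
    return 0, 0, width, height


print(random_crop_window(480, 640, rng=random.Random(0)))
```

An attempt can fail when the sampled area and aspect ratio produce a window wider or taller than the image, which is why a bounded number of retries is useful.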

class nvidia.dali.ops.HostDecoderSlice(**kwargs)

This is a ‘CPU’ operator

Decode images on the host with a cropping window of given size and anchor. Inputs must be supplied as 3 tensors in a specific order: encoded_data containing encoded image data, begin containing the starting coordinates of the crop in (x,y) format, and size containing the dimensions of the crop in (w,h) format. Both begin and size must be given as values in the interval [0.0, 1.0]. When possible, will make use of partial decoding (e.g. libjpeg-turbo). When not supported, will decode the whole image and then crop. Output of the decoder is in HWC ordering.

Warning

This operator is now deprecated. Use ImageDecoderSlice instead

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') – `mixed` backend only. Choose the cache type:

    • threshold: caches every image with size bigger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest: stores the largest images that can fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only Use chunk pinned memory allocator, allocating chunk of size batch_size*prefetch_queue_depth during the construction and suballocate them in runtime. Ignored when split_stages is false.
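
Since begin and size are restricted to the interval [0.0, 1.0], they can be read as fractions of the image width and height. The sketch below converts such inputs to a pixel window under that reading. The function name `slice_window` and the truncation to whole pixels are assumptions made for illustration, not documented behavior.

```python
def slice_window(image_h, image_w, begin, size):
    """Map normalized begin (x, y) and size (w, h) to (left, top, width, height) in pixels."""
    x, y = begin
    w, h = size
    for value in (x, y, w, h):
        if not 0.0 <= value <= 1.0:
            raise ValueError("begin and size must lie in [0.0, 1.0]")
    # Assumption: fractions are scaled by the image extent and truncated.
    left = int(x * image_w)
    top = int(y * image_h)
    crop_w = int(w * image_w)
    crop_h = int(h * image_h)
    return left, top, crop_w, crop_h


# A 100x200 (HxW) image, window starting at 25% / 50% with size 50% / 25%.
print(slice_window(100, 200, begin=(0.25, 0.5), size=(0.5, 0.25)))
```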

class nvidia.dali.ops.Hue(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Changes the hue level of the image.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • hue (float or float tensor, optional, default = 0.0) – Hue change, in degrees.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
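
As an illustration of what a hue change "in degrees" means, the sketch below rotates the hue of a single RGB pixel using Python's standard colorsys module. This is only a conceptual model: it assumes an HSV hue wheel of 360 degrees, the function name `shift_hue` is hypothetical, and the operator itself may use a different color model internally.

```python
import colorsys


def shift_hue(rgb, hue_degrees):
    """Rotate the hue of one 8-bit RGB pixel by hue_degrees on the HSV wheel."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # colorsys stores hue in [0, 1), so 360 degrees corresponds to 1.0.
    h = (h + hue_degrees / 360.0) % 1.0
    return tuple(int(round(c * 255)) for c in colorsys.hsv_to_rgb(h, s, v))


# Rotating pure red by 120 degrees lands on pure green in this model.
print(shift_hue((255, 0, 0), 120))
```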

class nvidia.dali.ops.ImageDecoder(**kwargs)

This is a ‘CPU’, ‘mixed’ operator

Decode images. Implementation will be based on nvJPEG library or libjpeg-turbo depending on the selected backend (mixed and cpu respectively). Non-jpeg images are decoded with OpenCV. The Output of the decoder is in HWC ordering.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') – `mixed` backend only. Choose the cache type:

    • threshold: caches every image with size bigger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest: stores the largest images that can fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only Use chunk pinned memory allocator, allocating chunk of size batch_size*prefetch_queue_depth during the construction and suballocate them in runtime. Ignored when split_stages is false.
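
The threshold cache policy described for cache_type, together with cache_size (megabytes) and cache_threshold (bytes), can be modeled with a few lines of Python. This is a simplified sketch, not the decoder's cache: the class name `ThresholdCache` is hypothetical, decoded images are represented as bytes objects, and one megabyte is assumed to be 2**20 bytes.

```python
class ThresholdCache:
    """Toy model of the threshold policy: cache large images until full."""

    def __init__(self, cache_size_mb, cache_threshold):
        self.capacity = cache_size_mb * 1024 * 1024  # assumption: MB = 2**20 bytes
        self.threshold = cache_threshold             # size threshold in bytes
        self.used = 0
        self.entries = {}

    def maybe_add(self, key, decoded):
        """Store decoded data if it qualifies; return True if it is cached."""
        if key in self.entries:
            return True
        size = len(decoded)
        # Only images bigger than the threshold are cached, and only while
        # there is still room in the cache.
        if size <= self.threshold or self.used + size > self.capacity:
            return False
        self.entries[key] = decoded
        self.used += size
        return True

    def get(self, key):
        return self.entries.get(key)


cache = ThresholdCache(cache_size_mb=1, cache_threshold=100)
print(cache.maybe_add("small.jpg", bytes(50)))    # below the threshold
print(cache.maybe_add("large.jpg", bytes(200)))   # above the threshold, fits
```

In this model nothing is ever evicted, which matches the "until cache is full" wording. The largest policy would additionally need to know the image sizes of the whole dataset.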

class nvidia.dali.ops.ImageDecoderCrop(**kwargs)

This is a ‘CPU’, ‘mixed’ operator

Decode images with a fixed cropping window size and variable anchor. When possible, will make use of partial decoding (e.g. libjpeg-turbo, nvJPEG). When not supported, will decode the whole image and then crop. Output of the decoder is in HWC ordering.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') – `mixed` backend only. Choose the cache type:

    • threshold: caches every image with size bigger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest: stores the largest images that can fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image, specified as a pair (crop_H, crop_W). If only a single value c is provided, the resulting crop will be square with size (c,c). Providing crop argument is incompatible with providing separate arguments crop_h and crop_w.

  • crop_h (float or float tensor, optional, default = 0.0) – cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

  • crop_pos_y (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

  • crop_w (float or float tensor, optional, default = 0.0) – cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only Use chunk pinned memory allocator, allocating chunk of size batch_size*prefetch_queue_depth during the construction and suballocate them in runtime. Ignored when split_stages is false.

class nvidia.dali.ops.ImageDecoderRandomCrop(**kwargs)

This is a ‘CPU’, ‘mixed’ operator

Decode images with a random cropping anchor/window. When possible, will make use of partial decoding (e.g. libjpeg-turbo, nvJPEG). When not supported, will decode the whole image and then crop. Output of the decoder is in HWC ordering.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') – `mixed` backend only. Choose the cache type:

    • threshold: caches every image with size bigger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest: stores the largest images that can fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only Images with number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below will use the nvJPEG full host huffman decoder. N.B.: Hybrid Huffman decoder still uses mostly the CPU.

  • num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • random_area (float or list of float, optional, default = [0.08, 1.0]) – Range from which to choose random area factor A. The cropped image’s area will be equal to A * original image’s area.

  • random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only Use chunk pinned memory allocator, allocating chunk of size batch_size*prefetch_queue_depth during the construction and suballocate them in runtime. Ignored when split_stages is false.

class nvidia.dali.ops.ImageDecoderSlice(**kwargs)

This is a ‘CPU’, ‘mixed’ operator

Decode images with a cropping window of given size and anchor. Inputs must be supplied as 3 tensors in a specific order: encoded_data containing encoded image data, begin containing the starting coordinates of the crop in (x,y) format, and size containing the dimensions of the crop in (w,h) format. Both begin and size must be given as values in the interval [0.0, 1.0]. When possible, will make use of partial decoding (e.g. libjpeg-turbo, nvJPEG). When not supported, will decode the whole image and then crop. Output of the decoder is in HWC ordering.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') – `mixed` backend only

    Choose cache type:

    • threshold - Caches every image with size bigger than cache_threshold until cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest - Stores the largest images that can fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only. Images with a number of pixels (height * width) above this threshold will use the nvJPEG hybrid Huffman decoder. Images below it will use the nvJPEG full host Huffman decoder. N.B.: The hybrid Huffman decoder still uses mostly the CPU.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only. Use a chunk pinned-memory allocator: a chunk of size batch_size*prefetch_queue_depth is allocated during construction and suballocated at runtime. Ignored when split_stages is false.
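
The interplay of cache_size, cache_threshold and the threshold cache policy described above can be pictured with a small pure-Python sketch. It is only an illustration under stated assumptions, not DALI's implementation: the function name is hypothetical, 1 MB is taken as 2**20 bytes, and an image that no longer fits is simply skipped.

```python
def threshold_cache_admit(decoded_sizes, cache_size_mb, cache_threshold):
    """Return indices of images a 'threshold'-style policy might cache.

    decoded_sizes: decoded image sizes in bytes, in the order they are seen.
    Hypothetical helper for illustration only.
    """
    capacity = cache_size_mb * 1024 * 1024  # assumption: 1 MB = 2**20 bytes
    used = 0
    cached = []
    for idx, size in enumerate(decoded_sizes):
        if size <= cache_threshold:
            continue  # not bigger than cache_threshold, so never cached
        if used + size > capacity:
            continue  # assumption: an image that does not fit is skipped
        used += size
        cached.append(idx)
    return cached


if __name__ == "__main__":
    sizes = [500_000, 100, 700_000, 300_000]
    print(threshold_cache_admit(sizes, cache_size_mb=1, cache_threshold=1000))
```

With a 1 MB cache, the second image is below the threshold and the third no longer fits, so only the first and last are admitted.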

class nvidia.dali.ops.Jitter(**kwargs)

This is a ‘GPU’ operator

Perform a random Jitter augmentation. The output image is produced by moving each pixel by a random amount bounded by half of the nDegree parameter (in both x and y dimensions).

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation

    • 1 - apply this transformation

  • nDegree (int, optional, default = 2) – Each pixel is moved by a random amount in range [-nDegree/2, nDegree/2].

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
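
As a rough illustration of the jitter described above, the following pure-Python sketch works on a single-channel image stored as a list of rows. It is an assumption-laden sketch, not the GPU kernel: it is written as a gather (each output pixel reads from a randomly displaced source location), it uses integer division for nDegree/2, and out-of-bounds reads produce fill_value.

```python
import random


def jitter(image, n_degree=2, fill_value=0.0, seed=0):
    """Displace pixels by a random offset in [-n_degree // 2, n_degree // 2]."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    half = n_degree // 2  # assumption: integer half-range
    out = [[fill_value] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Pick an independent random offset for each axis.
            src_y = y + rng.randint(-half, half)
            src_x = x + rng.randint(-half, half)
            if 0 <= src_y < height and 0 <= src_x < width:
                out[y][x] = image[src_y][src_x]
    return out


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for row in jitter(img, n_degree=2):
        print(row)
```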

class nvidia.dali.ops.MXNetReader(**kwargs)

This is a ‘CPU’ operator

Read sample data from an MXNet RecordIO file.

Parameters
  • index_path (str or list of str) – List (of length 1) containing a path to the index (.idx) file. It is generated by MXNet’s im2rec.py script together with the RecordIO file. It can also be generated using the rec2idx script distributed with DALI.

  • path (str or list of str) – List of paths to RecordIO files.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

  • lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

  • prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

  • read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • shard_id (int, optional, default = 0) – Id of the part to read.

  • skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

  • stick_to_shard (bool, optional, default = False) – Whether the reader should stick to the given data shard instead of going through the whole dataset. When decoder caching is used, it significantly reduces the amount of data to be cached, but could affect accuracy in some cases.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
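
The random_shuffle and initial_fill parameters describe a buffer that is filled sequentially and then sampled at random. A minimal sketch of that idea, assuming a simple replace-on-draw buffer (the function name and the exact replacement strategy are illustrative, not taken from DALI's Loader), might look like this:

```python
import random


def shuffled_stream(samples, initial_fill=1024, seed=0):
    """Yield samples in a locally shuffled order using a bounded buffer."""
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        if len(buffer) < initial_fill:
            buffer.append(sample)  # sequential read until the buffer is full
            continue
        idx = rng.randrange(len(buffer))
        yield buffer[idx]          # draw a random buffered sample
        buffer[idx] = sample       # refill the freed slot with the new sample
    rng.shuffle(buffer)            # drain whatever is left at the end
    yield from buffer


if __name__ == "__main__":
    print(list(shuffled_stream(range(10), initial_fill=4)))
```

A larger initial_fill mixes samples from further apart in the file; with a buffer of size 1 the sketch degenerates to the original order.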

class nvidia.dali.ops.NormalizePermute(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Perform fused normalization, format conversion from NHWC to NCHW and type casting. Normalization takes input image and produces output using formula

output = (input - mean) / std

Warning

This operator is now deprecated. Use CropMirrorNormalize instead

Parameters
  • height (int) – Height of the input image.

  • mean (float or list of float) – Mean pixel values for image normalization.

  • std (float or list of float) – Standard deviation values for image normalization.

  • width (int) – Width of the input image.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image.

  • output_dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
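
The formula output = (input - mean) / std combined with the layout change can be sketched for a single sample in plain Python. This is an illustrative sketch only: it handles one HWC image stored as nested lists (the batch dimension N is omitted), takes one mean and std value per channel, and performs no type casting.

```python
def normalize_permute(image_hwc, mean, std):
    """Normalize per channel and convert one image from HWC to CHW."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [
            [(image_hwc[y][x][c] - mean[c]) / std[c] for x in range(width)]
            for y in range(height)
        ]
        for c in range(channels)
    ]


if __name__ == "__main__":
    image = [[[10, 20, 30], [40, 50, 60]]]  # 1 x 2 pixels, 3 channels
    print(normalize_permute(image, mean=[10, 20, 30], std=[10, 10, 10]))
    # → [[[0.0, 3.0]], [[0.0, 3.0]], [[0.0, 3.0]]]
```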

class nvidia.dali.ops.OpticalFlow(**kwargs)

This is a ‘GPU’ operator

Calculates the optical flow for a sequence of images given as an input. The mandatory input for the operator is a sequence of frames. As an optional input, the operator accepts external hints for the optical flow (OF) calculation. The output format of this operator matches the output format of the OF driver API. DALI uses the Turing optical flow hardware implementation: https://developer.nvidia.com/opticalflow-sdk

This operator allows sequence inputs

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • enable_external_hints (bool, optional, default = False) – Enables or disables external hints for the OF calculation. External hints are analogous to temporal hints, except that they come from an external source. When this option is enabled, the operator requires 2 inputs.

  • enable_temporal_hints (bool, optional, default = False) – Enables or disables temporal hints for sequences longer than 2 images. They are used to speed up calculation: the previous OF result in the sequence is used to calculate the current flow. You might want to use temporal hints for sequences that don’t have many changes in the scene (e.g. only moving objects).

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – Type of input images (RGB, BGR, GRAY)

  • output_format (int, optional, default = -1) – Sets the grid size for the output vector. The value defines the width of a grid square (e.g. if value == 4, a 4x4 grid is used). For values <= 0, the grid size is undefined. Currently only grid_size=4 is supported.

  • preset (float, optional, default = 0.0) –

    Sets the quality level of the OF calculation.

    0.0f … 1.0f, where 1.0f is best quality, lowest speed

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.Paste(**kwargs)

This is a ‘GPU’ operator

Paste the input image on a larger canvas. The canvas size is equal to input size * ratio.

Parameters
  • fill_value (int or list of int) – Tuple of values of the color to fill the canvas. Length of the tuple needs to be equal to n_channels.

  • ratio (float or float tensor) – Ratio of canvas size to input size, must be > 1.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • min_canvas_size (float or float tensor, optional, default = 0.0) – Enforce minimum paste canvas dimension after scaling input size by ratio.

  • n_channels (int, optional, default = 3) – Number of channels in the image.

  • paste_x (float or float tensor, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0)

  • paste_y (float or float tensor, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0)

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
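
The relation between input size, ratio and the paste position can be sketched as follows. The canvas size follows the description above (input size * ratio); the way paste_x and paste_y are turned into a pixel offset is an assumption made for illustration, as are the truncation to integers and the function name. min_canvas_size is not modelled.

```python
def paste_geometry(height, width, ratio, paste_x=0.5, paste_y=0.5):
    """Return (canvas_h, canvas_w, offset_y, offset_x) for a paste."""
    if ratio <= 1:
        raise ValueError("ratio must be > 1")
    canvas_h = int(height * ratio)
    canvas_w = int(width * ratio)
    # Assumption: the normalized position spans the free space on the canvas.
    offset_x = int(paste_x * (canvas_w - width))
    offset_y = int(paste_y * (canvas_h - height))
    return canvas_h, canvas_w, offset_y, offset_x


if __name__ == "__main__":
    print(paste_geometry(100, 200, ratio=2.0))  # → (200, 400, 50, 100)
```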

class nvidia.dali.ops.PythonFunction(function, num_outputs=1, **kwargs)

This is a ‘CPU’ operator

Executes a Python function.

Parameters
  • function (object) – Function object consuming and producing a single numpy array

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • num_outputs (int, optional, default = 1) – Number of outputs

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
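
A function suitable for the function argument takes one array and returns one array. The sketch below is a hypothetical example of such a function; it relies only on slicing, so it behaves the same on a NumPy array (reversing the first axis) as on the nested list used here for demonstration. The commented-out line shows how it could be passed to the operator, based solely on the signature listed above.

```python
def flip_vertical(image):
    """Reverse the row order of an array-like whose first axis is height."""
    return image[::-1]


# With DALI installed, the function could be wrapped like this:
# flip = nvidia.dali.ops.PythonFunction(function=flip_vertical)

if __name__ == "__main__":
    print(flip_vertical([[1, 2], [3, 4]]))  # → [[3, 4], [1, 2]]
```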

class nvidia.dali.ops.RandomBBoxCrop(**kwargs)

This is a ‘CPU’ operator

Perform a prospective crop to an image while keeping bounding boxes and labels consistent. Inputs must be supplied as two Tensors: BBoxes containing bounding boxes represented as [l,t,r,b] or [x,y,w,h], and Labels containing the corresponding label for each bounding box. The resulting prospective crop is provided as two Tensors: Begin containing the starting coordinates for the crop in (x,y) format, and Size containing the dimensions of the crop in (w,h) format. Bounding boxes are provided as a (m*4) Tensor, where each bounding box is represented as [l,t,r,b] or [x,y,w,h]. Resulting labels match the boxes that remain after boxes are discarded with respect to the minimum accepted intersection threshold. Be advised: when allow_no_crop is false and thresholds does not contain 0, it is good to increase num_attempts, as otherwise it may loop for a very long time.

Parameters
  • allow_no_crop (bool, optional, default = True) – If true, includes no cropping as one of the random options.

  • aspect_ratio (float or list of float, optional, default = [1.0, 1.0]) – Range [min, max] of valid aspect ratio values for new crops. Value for min should be greater or equal to 0.0. Default values disallow changes in aspect ratio.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • ltrb (bool, optional, default = True) – If true, bboxes are returned as [left, top, right, bottom], else [x, y, width, height].

  • num_attempts (int, optional, default = 1) – Number of attempts to retrieve a patch with the desired parameters.

  • scaling (float or list of float, optional, default = [1.0, 1.0]) – Range [min, max] for crop size with respect to original image dimensions. Value for min should be greater or equal to 0.0.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • thresholds (float or list of float, optional, default = [0.0]) – Minimum overlap (Intersection over union) of the bounding boxes with respect to the prospective crop. Selected at random for every sample from provided values. Default imposes no restrictions on Intersection over Union for boxes and crop.
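
The thresholds parameter is expressed as Intersection over Union (IoU) between a bounding box and the prospective crop. A minimal sketch of that measure, together with a conversion between the two box formats mentioned above, is shown below. The helper names are hypothetical, and how DALI applies the threshold internally is not covered.

```python
def xywh_to_ltrb(box):
    """Convert [x, y, width, height] to [left, top, right, bottom]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def iou_ltrb(box, crop):
    """Intersection over union of two [left, top, right, bottom] boxes."""
    left, top = max(box[0], crop[0]), max(box[1], crop[1])
    right, bottom = min(box[2], crop[2]), min(box[3], crop[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_crop = (crop[2] - crop[0]) * (crop[3] - crop[1])
    union = area_box + area_crop - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    crop = [0.0, 0.0, 1.0, 1.0]
    print(iou_ltrb([0.0, 0.0, 0.5, 1.0], crop))  # → 0.5
```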

class nvidia.dali.ops.RandomResizedCrop(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Perform a crop with randomly chosen area and aspect ratio, then resize it to the given size. Expects a 3-dimensional input with samples in HWC layout (height, width, channels).

Parameters
  • size (int or list of int) – Size of resized image.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –

    Type of interpolation used. Use min_filter and mag_filter to specify

    different filtering for downscaling and upscaling.

  • mag_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up

  • min_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down

  • minibatch_size (int, optional, default = 32) – Maximum number of images processed in a single kernel call

  • num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

  • random_area (float or list of float, optional, default = [0.08, 1.0]) – Range from which to choose random area factor A. The cropped image’s area will be equal to A * original image’s area.

  • random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • temp_buffer_hint (int, optional, default = 0) – Initial size, in bytes, of a temporary buffer for resampling. Ignored for CPU variant.
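
How random_area, random_aspect_ratio and num_attempts might interact can be sketched in plain Python. The sampling distributions (uniform for both), the rounding, and the full-image fallback after num_attempts failures are assumptions made for this sketch and may differ from what DALI does.

```python
import math
import random


def sample_crop(height, width, random_area=(0.08, 1.0),
                random_aspect_ratio=(0.75, 1.333333), num_attempts=10, seed=0):
    """Return (y, x, crop_h, crop_w) of a randomly chosen crop window."""
    rng = random.Random(seed)
    for _ in range(num_attempts):
        area = rng.uniform(*random_area) * height * width  # A * image area
        ratio = rng.uniform(*random_aspect_ratio)          # width / height
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            x = rng.randint(0, width - crop_w)
            y = rng.randint(0, height - crop_h)
            return y, x, crop_h, crop_w
    # Assumption: fall back to the whole image if no attempt fits.
    return 0, 0, height, width


if __name__ == "__main__":
    print(sample_crop(480, 640, seed=42))
```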

class nvidia.dali.ops.Resize(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Resize images.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –

    Type of interpolation used. Use min_filter and mag_filter to specify

    different filtering for downscaling and upscaling.

  • mag_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up

  • max_size (float or list of float, optional, default = [0.0, 0.0]) –

    Maximum size of the longer dimension when resizing with resize_shorter. When set with resize_shorter, the shortest dimension will be resized to resize_shorter only if the longest dimension is then smaller than or equal to max_size. If not, the shortest dimension is resized to satisfy the constraint “longest_dim == max_size”. Can also be an array of size 2, where the two elements are the maximum size per dimension (H, W).

    Example:

    Original image = “400x1200”.

    Resized with:

    • resize_shorter = 200 (max_size not set) => “200x600”

    • resize_shorter = 200, max_size = 400 => “132x400”

    • resize_shorter = 200, max_size = 1000 => “200x600”

  • min_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down

  • minibatch_size (int, optional, default = 32) – Maximum number of images processed in a single kernel call

  • resize_longer (float or float tensor, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

  • resize_shorter (float or float tensor, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

  • resize_x (float or float tensor, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

  • resize_y (float or float tensor, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.

  • save_attrs (bool, optional, default = False) – Save reshape attributes for testing.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • temp_buffer_hint (int, optional, default = 0) – Initial size, in bytes, of a temporary buffer for resampling. Ignored for CPU variant.
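
The interaction between resize_shorter and a scalar max_size can be sketched with a few lines of arithmetic. This sketch assumes the aspect ratio is preserved and returns unrounded floating-point sizes; it does not attempt to reproduce the integer rounding DALI applies to the final dimensions, and the function name is hypothetical.

```python
def resized_shape(height, width, resize_shorter, max_size=0.0):
    """Return (new_height, new_width) as floats, preserving aspect ratio."""
    # First try to bring the shorter side to resize_shorter.
    target, ref = resize_shorter, min(height, width)
    # If that makes the longer side exceed max_size, cap the longer side.
    if max_size > 0 and max(height, width) * target / ref > max_size:
        target, ref = max_size, max(height, width)
    return height * target / ref, width * target / ref


if __name__ == "__main__":
    print(resized_shape(400, 1200, 200))                 # → (200.0, 600.0)
    print(resized_shape(400, 1200, 200, max_size=1000))  # → (200.0, 600.0)
    print(resized_shape(400, 1200, 200, max_size=400))   # longer side capped at 400
```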

class nvidia.dali.ops.ResizeCropMirror(**kwargs)

This is a ‘CPU’ operator

Perform a fused resize, crop, mirror operation. Handles both fixed and random resizing and cropping.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image, specified as a pair (crop_H, crop_W). If only a single value c is provided, the resulting crop will be square with size (c,c). Providing crop argument is incompatible with providing separate arguments crop_h and crop_w.

  • crop_h (float or float tensor, optional, default = 0.0) – Cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

  • crop_pos_y (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

  • crop_w (float or float tensor, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.

  • max_size (float or list of float, optional, default = [0.0, 0.0]) –

    Maximum size of the longer dimension when resizing with resize_shorter. When set with resize_shorter, the shortest dimension will be resized to resize_shorter only if the longest dimension is then smaller than or equal to max_size. If not, the shortest dimension is resized to satisfy the constraint “longest_dim == max_size”. Can also be an array of size 2, where the two elements are the maximum size per dimension (H, W).

    Example:

    Original image = “400x1200”.

    Resized with:

    • resize_shorter = 200 (max_size not set) => “200x600”

    • resize_shorter = 200, max_size = 400 => “132x400”

    • resize_shorter = 200, max_size = 1000 => “200x600”

  • mirror (int or int tensor, optional, default = 0) –

    Mask for horizontal flip.

    • 0 - do not perform horizontal flip for this image

    • 1 - perform horizontal flip for this image.

  • output_dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type. By default same data type as the input will be used

  • resize_longer (float or float tensor, optional, default = 0.0) – The length of the longer dimension of the resized image. This option is mutually exclusive with resize_shorter, resize_x and resize_y. The op will keep the aspect ratio of the original image.

  • resize_shorter (float or float tensor, optional, default = 0.0) – The length of the shorter dimension of the resized image. This option is mutually exclusive with resize_longer, resize_x and resize_y. The op will keep the aspect ratio of the original image. The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

  • resize_x (float or float tensor, optional, default = 0.0) – The length of the X dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_y is left at 0, then the op will keep the aspect ratio of the original image.

  • resize_y (float or float tensor, optional, default = 0.0) – The length of the Y dimension of the resized image. This option is mutually exclusive with resize_shorter. If the resize_x is left at 0, then the op will keep the aspect ratio of the original image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
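
The crop position formulas given for crop_pos_x and crop_pos_y, crop_x = crop_x_norm * (W - crop_W) and crop_y = crop_y_norm * (H - crop_H), translate directly into code. The sketch below applies them as written and returns unrounded values; the function name is hypothetical and any rounding to whole pixels is left out.

```python
def crop_origin(height, width, crop_h, crop_w, crop_pos_x=0.5, crop_pos_y=0.5):
    """Return (crop_y, crop_x), the upper-left corner of the cropping window."""
    crop_x = crop_pos_x * (width - crop_w)
    crop_y = crop_pos_y * (height - crop_h)
    return crop_y, crop_x


if __name__ == "__main__":
    # Default position 0.5 centers a 224x224 window in a 256x256 image.
    print(crop_origin(256, 256, 224, 224))  # → (16.0, 16.0)
```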

class nvidia.dali.ops.Rotate(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Rotate the image.

Parameters
  • angle (float or float tensor) – Counterclockwise rotation angle, in degrees.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation

    • 1 - apply this transformation

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.SSDRandomCrop(**kwargs)

This is a ‘CPU’ operator

Perform a random crop with bounding boxes where IoU meets a randomly selected threshold between 0 and 1. When IoU falls below the threshold, a new random crop is generated, up to num_attempts times. As input, the operator accepts an image, bounding boxes and labels. As output, it returns the cropped image, the cropped and valid bounding boxes, and the valid labels.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • num_attempts (int, optional, default = 1) – Number of attempts.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.Saturation(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Changes saturation level of the image.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • saturation (float or float tensor, optional, default = 1.0) –

    Saturation change factor. Values >= 0 are supported. For example:

    • 0 - completely desaturated image

    • 1 - no change to image’s saturation

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
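
The two reference points listed for saturation (0 gives a completely desaturated image, 1 leaves the image unchanged) are consistent with blending each pixel with its gray value. The sketch below uses that blend with Rec. 601 luma weights; both the blend and the weights are assumptions chosen for illustration and are not taken from DALI's implementation.

```python
def adjust_saturation(pixel, saturation):
    """Scale the distance of an (R, G, B) pixel from its gray value."""
    r, g, b = pixel
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # assumption: Rec. 601 luma
    return tuple(gray + saturation * (c - gray) for c in pixel)


if __name__ == "__main__":
    print(adjust_saturation((200, 100, 50), 0.0))  # all channels equal
    print(adjust_saturation((200, 100, 50), 1.0))  # approximately unchanged
```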

class nvidia.dali.ops.SequenceReader(**kwargs)

This is a ‘CPU’ operator

Read [Frame] sequences from a directory representing a collection of streams. Expects file_root to contain a set of directories, each of which represents one extracted video stream. An extracted video stream is represented by one file per frame; sorting the paths to frames lexicographically should give the original order of frames. Sequences do not cross stream boundaries and only full sequences are considered - there is no padding. Example:

    file_root
      0
        00001.png
        00002.png
        00003.png
        00004.png
        00005.png
        00006.png
      1
        00001.png
        00002.png
        00003.png
        00004.png
        00005.png
        00006.png

This operator allows sequence inputs

Parameters
  • file_root (str) – Path to a directory containing streams (directories representing streams).

  • sequence_length (int) – Length of sequence to load for each sample

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

  • lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

  • prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

  • read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • shard_id (int, optional, default = 0) – Id of the part to read.

  • skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

  • step (int, optional, default = 1) – Distance between first frames of consecutive sequences

  • stick_to_shard (bool, optional, default = False) – Whether the reader should stick to the given data shard instead of going through the whole dataset. When decoder caching is used, it significantly reduces the amount of data to be cached, but could affect accuracy in some cases.

  • stride (int, optional, default = 1) – Distance between consecutive frames in sequence

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
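
The roles of sequence_length, step and stride can be illustrated by enumerating the frame indices of every full sequence within one stream. This is a sketch of the semantics as described above (step is the distance between first frames of consecutive sequences, stride the distance between frames inside a sequence, no padding); the function name is hypothetical.

```python
def enumerate_sequences(num_frames, sequence_length, step=1, stride=1):
    """Return frame-index lists for all full sequences in a single stream."""
    # Number of frames covered from the first to the last frame of a sequence.
    span = (sequence_length - 1) * stride + 1
    return [
        [start + i * stride for i in range(sequence_length)]
        for start in range(0, num_frames - span + 1, step)
    ]


if __name__ == "__main__":
    # A stream of 6 frames, as in the directory example above.
    print(enumerate_sequences(6, sequence_length=3))
    # → [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
    print(enumerate_sequences(6, sequence_length=3, step=2, stride=2))
    # → [[0, 2, 4]]
```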

class nvidia.dali.ops.Slice(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Extract a subtensor or slice with a given shape and anchor.

Inputs must be supplied as 3 separate tensors in a specific order: data containing input data, anchor containing normalized coordinates for the starting point of the slice (x0, x1, x2, …), and shape containing the normalized dimensions of the slice (s0, s1, s2, …). Both anchor and shape coordinates must be in the interval [0.0, 1.0] and should have as many dimensions as the input data. For compatibility with the previous implementation of Slice, anchor and shape can be specified in format (x, y) and (w, h) respectively for images. This way of specifying the slice arguments is deprecated and shall be removed in future versions of DALI.

This operator allows sequence inputs

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image

  • output_dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) – Output data type. By default same data type as the input will be used

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
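
The normalized anchor and shape inputs can be converted to absolute index ranges by scaling each coordinate with the matching dimension of the input. The sketch below does this for an arbitrary number of dimensions; truncation to integers and the function name are assumptions for illustration, and the deprecated (x, y) / (w, h) form is not handled.

```python
def absolute_slice(data_shape, anchor, shape):
    """Turn normalized anchor/shape into one Python slice per dimension."""
    if not (len(data_shape) == len(anchor) == len(shape)):
        raise ValueError("anchor and shape need one entry per data dimension")
    slices = []
    for dim, a, s in zip(data_shape, anchor, shape):
        if not (0.0 <= a <= 1.0 and 0.0 <= s <= 1.0):
            raise ValueError("coordinates must be in the interval [0.0, 1.0]")
        start = int(a * dim)   # assumption: truncate to whole elements
        extent = int(s * dim)
        slices.append(slice(start, start + extent))
    return slices


if __name__ == "__main__":
    # 100 x 200 image with 3 channels: take a 50 x 50 window, all channels.
    print(absolute_slice((100, 200, 3), (0.25, 0.5, 0.0), (0.5, 0.25, 1.0)))
    # → [slice(25, 75, None), slice(100, 150, None), slice(0, 3, None)]
```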

class nvidia.dali.ops.Sphere(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Perform a sphere augmentation.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation

    • 1 - apply this transformation

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.TFRecordReader(path, index_path, features, **kwargs)

This is a ‘CPU’ operator

Read sample data from a TensorFlow TFRecord file.

Parameters
  • features (dict of (string, nvidia.dali.tfrecord.Feature)) – Dictionary of names and configuration of features existing in TFRecord file. Typically obtained using helper functions dali.tfrecord.FixedLenFeature and dali.tfrecord.VarLenFeature, they are equivalent to TensorFlow’s tf.FixedLenFeature and tf.VarLenFeature respectively. For more flexibility dali.tfrecord.VarLenFeature supports partial_shape parameter. If provided, data will be reshaped to match its value. First dimension will be inferred from the data size.

  • index_path (str or list of str) – List of paths to index files (1 index file for every TFRecord file). Index files may be obtained from TFRecord files using tfrecord2idx script distributed with DALI.

  • path (str or list of str) – List of paths to TFRecord files.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

  • lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

  • prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

  • read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • shard_id (int, optional, default = 0) – Id of the part to read.

  • skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

  • stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
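The interaction of num_shards, shard_id, random_shuffle and initial_fill can be pictured with a small, library-independent sketch. It does not call DALI; the helper names shard_range and buffered_shuffle are hypothetical, and both the even index split and the refill policy are assumptions made for illustration, not the reader's confirmed internals.

```python
import random


def shard_range(num_samples, num_shards, shard_id):
    """Half-open [start, end) sample range for one shard (assumed even split)."""
    start = num_samples * shard_id // num_shards
    end = num_samples * (shard_id + 1) // num_shards
    return start, end


def buffered_shuffle(samples, initial_fill, seed):
    """Yield samples in shuffled order using a buffer of initial_fill entries.

    Assumed policy: read sequentially until the buffer is full, then emit a
    random buffered sample and put the next sequential read in its slot.
    """
    rng = random.Random(seed)
    capacity = max(1, initial_fill)
    buffer = []
    for sample in samples:
        if len(buffer) < capacity:
            buffer.append(sample)  # sequential reads fill the buffer first
            continue
        idx = rng.randrange(len(buffer))
        yield buffer[idx]          # emit a randomly chosen buffered sample
        buffer[idx] = sample       # refill the slot with the next read
    rng.shuffle(buffer)            # drain what is left at the end of the data
    yield from buffer


# Shard 1 of 3 over 10 samples, shuffled through a 4-entry buffer.
start, end = shard_range(10, 3, 1)
order = list(buffered_shuffle(range(start, end), initial_fill=4, seed=0))
```

A small buffer only mixes samples that are close together in the file, which is why initial_fill is ignored when random_shuffle is off and why larger values give a more thorough shuffle at the cost of memory.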

class nvidia.dali.ops.Transpose(**kwargs)

This is a ‘GPU’ operator

Transpose the tensor by reordering its dimensions according to the permutation specified by perm.

This operator allows sequence inputs

Parameters
  • perm (int or list of int) – Permutation of the dimensions of the input (e.g. [2, 0, 1]).

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
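As an illustration of what perm means, the sketch below computes the output shape for a given permutation. It assumes the common convention (used, for example, by NumPy's transpose) that output dimension i is input dimension perm[i]; the helper transposed_shape is hypothetical and not part of DALI.

```python
def transposed_shape(shape, perm):
    """Return the shape after permuting dimensions: out[i] = shape[perm[i]]."""
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of 0..ndim-1")
    return [shape[p] for p in perm]


# Under this convention, perm = [2, 0, 1] turns an HWC image into CHW.
hwc_shape = [480, 640, 3]
chw_shape = transposed_shape(hwc_shape, [2, 0, 1])
print(chw_shape)  # [3, 480, 640]
```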

class nvidia.dali.ops.Uniform(**kwargs)

This is a ‘support’ operator

Produce tensor filled with uniformly distributed random numbers.

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • range (float or list of float, optional, default = [-1.0, 1.0]) – Range of produced random numbers.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

class nvidia.dali.ops.VideoReader(**kwargs)

This is a ‘GPU’ operator

Load and decode H264 video files using FFmpeg and NVDECODE, the hardware-accelerated video decoder on NVIDIA GPUs. The video stream can be stored in most container file formats; FFmpeg is used to parse the containers. Returns a batch of sequences of sequence_length frames of shape [N, F, H, W, C] (N being the batch size and F the number of frames).

Parameters
  • sequence_length (int) – Frames to load per sequence.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • channels (int, optional, default = 3) – Number of channels.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) – The data type of the output frames (supports FLOAT and UINT8).

  • file_root (str, optional, default = '') – Path to a directory containing data files. This option is mutually exclusive with filenames.

  • filenames (str or list of str, optional, default = []) – File names of the video files to load. This option is mutually exclusive with file_root.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output frames (supports RGB and YCbCr).

  • initial_fill (int, optional, default = 1024) – Size of the buffer used for shuffling. If random_shuffle is off then this parameter is ignored.

  • lazy_init (bool, optional, default = False) – If set to true, Loader will parse and prepare the dataset metadata only during the first Run instead of in the constructor.

  • normalized (bool, optional, default = False) – Get output as normalized data.

  • num_shards (int, optional, default = 1) – Partition the data into this many parts (used for multiGPU training).

  • prefetch_queue_depth (int, optional, default = 1) – Specifies the number of batches prefetched by the internal Loader. To be increased when pipeline processing is CPU stage-bound, trading memory consumption for better interleaving with the Loader thread.

  • random_shuffle (bool, optional, default = False) – Whether to randomly shuffle data. Prefetch buffer of initial_fill size is used to sequentially read data and then randomly sample it to form a batch.

  • read_ahead (bool, optional, default = False) – Whether accessed data should be read ahead. In case of big files like LMDB, RecordIO or TFRecord it will slow down first access but will decrease the time of all following accesses.

  • scale (float, optional, default = 1.0) – Rescaling factor of height and width.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • shard_id (int, optional, default = 0) – Id of the part to read.

  • skip_cached_images (bool, optional, default = False) – If set to true, loading data will be skipped when the sample is present in the decoder cache. In such case the output of the loader will be empty

  • step (int, optional, default = -1) – Frame interval between each sequence (if step < 0, step is set to sequence_length).

  • stick_to_shard (bool, optional, default = False) – Whether reader should stick to given data shard instead of going through the whole dataset. When decoder caching is used, it reduces significantly the amount of data to be cached, but could affect accuracy in some cases

  • stride (int, optional, default = 1) – Distance between consecutive frames in sequence.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
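To make the roles of sequence_length, step and stride concrete, the following sketch lists which frame indices would end up in each sequence of a single video. It is a plain Python illustration, not DALI code: the helper sequence_frame_indices is hypothetical, and dropping sequences that would run past the end of the video is an assumption.

```python
def sequence_frame_indices(num_frames, sequence_length, step=-1, stride=1):
    """List the frame indices of every sequence taken from one video.

    step   - distance between the first frames of consecutive sequences
             (a negative value falls back to sequence_length)
    stride - distance between consecutive frames inside one sequence
    """
    if step < 0:
        step = sequence_length
    if step == 0 or stride < 1 or sequence_length < 1:
        raise ValueError("step must be non-zero; stride and sequence_length >= 1")
    span = (sequence_length - 1) * stride + 1  # frames covered by one sequence
    sequences = []
    start = 0
    while start + span <= num_frames:  # assumed: incomplete tails are dropped
        sequences.append([start + i * stride for i in range(sequence_length)])
        start += step
    return sequences


print(sequence_frame_indices(10, 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(sequence_frame_indices(10, 3, step=2, stride=2))
# [[0, 2, 4], [2, 4, 6], [4, 6, 8]]
```

With the default step the sequences do not overlap; a step smaller than the span of one sequence produces overlapping sequences.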

class nvidia.dali.ops.WarpAffine(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Apply an affine transformation to the image.

Parameters
  • matrix (float or list of float) –

    Matrix of the transform (dst -> src). Given list of values (M11, M12, M13, M21, M22, M23) this operation will produce a new image using formula

    dst(x,y) = src(M11 * x + M12 * y + M13, M21 * x + M22 * y + M23)

    It is equivalent to OpenCV’s warpAffine operation with a flag WARP_INVERSE_MAP set.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation

    • 1 - apply this transformation

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • use_image_center (bool, optional, default = False) – Whether to use image center as the center of transformation. When this is True coordinates are calculated from the center of the image.
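The dst -> src formula given for matrix can be read as: for every output pixel, compute where it comes from in the input. The sketch below applies that formula to a single-channel image stored as nested lists, using nearest-neighbor sampling and fill_value for coordinates that fall outside the input. It is an illustration only; warp_affine_nn is a hypothetical helper, the rounding rule is an assumption, and use_image_center is not modeled.

```python
import math


def warp_affine_nn(src, matrix, fill_value=0.0):
    """Apply dst(x, y) = src(M11*x + M12*y + M13, M21*x + M22*y + M23)."""
    m11, m12, m13, m21, m22, m23 = matrix
    height, width = len(src), len(src[0])
    dst = [[fill_value] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Map the destination pixel back into the source image.
            sx = math.floor(m11 * x + m12 * y + m13 + 0.5)
            sy = math.floor(m21 * x + m22 * y + m23 + 0.5)
            if 0 <= sx < width and 0 <= sy < height:
                dst[y][x] = src[sy][sx]
    return dst


image = [[1, 2, 3],
         [4, 5, 6]]

# M13 = 1 reads each output pixel from one column to the right,
# so the content shifts left and the last column is padded.
shifted = warp_affine_nn(image, (1, 0, 1, 0, 1, 0), fill_value=0)
print(shifted)  # [[2, 3, 0], [5, 6, 0]]
```

Because the matrix maps destination to source, a positive translation term moves the image content in the negative direction, which matches the WARP_INVERSE_MAP behavior mentioned above.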

class nvidia.dali.ops.Water(**kwargs)

This is a ‘CPU’, ‘GPU’ operator

Perform a water augmentation (make image appear to be underwater).

Parameters
  • ampl_x (float, optional, default = 10.0) – Amplitude of the wave in x direction.

  • ampl_y (float, optional, default = 10.0) – Amplitude of the wave in y direction.

  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • fill_value (float, optional, default = 0.0) – Color value used for padding pixels.

  • freq_x (float, optional, default = 0.049087) – Frequency of the wave in x direction.

  • freq_y (float, optional, default = 0.049087) – Frequency of the wave in y direction.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

  • mask (int or int tensor, optional, default = 1) –

    Whether to apply this augmentation to the input image.

    • 0 - do not apply this transformation

    • 1 - apply this transformation

  • phase_x (float, optional, default = 0.0) – Phase of the wave in x direction.

  • phase_y (float, optional, default = 0.0) – Phase of the wave in y direction.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)
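The default frequency 0.049087 is approximately 2π/128, so if the frequency is interpreted as an angular frequency in radians per pixel, the default wave repeats roughly every 128 pixels. The sketch below shows that arithmetic together with one plausible displacement model. The model (a sine offset along x driven by the row, a cosine offset along y driven by the column) is an assumption made for illustration, not a formula taken from this reference, and both helper names are hypothetical.

```python
import math


def wave_period_pixels(freq):
    """Wave period in pixels, treating freq as radians per pixel."""
    return 2.0 * math.pi / freq


def water_source_coords(x, y, ampl_x=10.0, ampl_y=10.0,
                        freq_x=0.049087, freq_y=0.049087,
                        phase_x=0.0, phase_y=0.0):
    """Assumed model: where output pixel (x, y) samples the input image."""
    src_x = x + ampl_x * math.sin(freq_x * y + phase_x)
    src_y = y + ampl_y * math.cos(freq_y * x + phase_y)
    return src_x, src_y


period = wave_period_pixels(0.049087)  # close to 128 pixels
origin_source = water_source_coords(0, 0)
```

Under this model the amplitude parameters bound how far a pixel can be displaced, and the phase parameters shift the wave without changing its shape.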

class nvidia.dali.ops.nvJPEGDecoder(**kwargs)

This is a ‘mixed’ operator

Specific implementation of ImageDecoder for mixed backend

Warning

This operator is now deprecated. Use ImageDecoder instead

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') –

    `mixed` backend only. Choose the cache type:

    • threshold - cache every image larger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest - store the largest images that fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only. Images with a number of pixels (height * width) above this threshold use the nvJPEG hybrid Huffman decoder. Images below it use the nvJPEG full host Huffman decoder. N.B.: The hybrid Huffman decoder still uses mostly the CPU.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only. Use the chunk pinned memory allocator, which allocates a chunk of size batch_size*prefetch_queue_depth during construction and suballocates from it at runtime. Ignored when split_stages is false.
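The cache parameters use different units: cache_size is given in megabytes, while cache_threshold is given in bytes of decoded image data. The sketch below mimics the threshold policy on a list of decoded image sizes. It is an illustration under stated assumptions, not the decoder's actual logic: the helper threshold_cache_admit is hypothetical, one megabyte is taken as 2**20 bytes, and an image that no longer fits is simply skipped.

```python
def threshold_cache_admit(decoded_sizes, cache_size_mb, cache_threshold):
    """Return indices of images a threshold-style cache would keep.

    decoded_sizes   - size of each decoded image in bytes, in arrival order
    cache_size_mb   - total cache capacity in megabytes
    cache_threshold - only images larger than this many bytes are cached
    """
    capacity = cache_size_mb * 1024 * 1024  # assumed: 1 MB = 2**20 bytes
    used = 0
    cached = []
    for index, nbytes in enumerate(decoded_sizes):
        if nbytes > cache_threshold and used + nbytes <= capacity:
            cached.append(index)
            used += nbytes
    return cached


sizes = [600_000, 100, 600_000, 300_000]
print(threshold_cache_admit(sizes, cache_size_mb=1, cache_threshold=1000))
# [0, 3]
```

With the default cache_size of 0 nothing is admitted in this sketch, which is why a non-zero cache_size has to be provided for caching to have any effect.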

class nvidia.dali.ops.nvJPEGDecoderCrop(**kwargs)

This is a ‘mixed’ operator

Partially decode JPEG images using the nvJPEG library and a cropping window. Output of the decoder is on the GPU and uses HWC ordering

Warning

This operator is now deprecated. Use ImageDecoderCrop instead

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') –

    `mixed` backend only. Choose the cache type:

    • threshold - cache every image larger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest - store the largest images that fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • crop (float or list of float, optional, default = [0.0, 0.0]) – Size of the cropped image, specified as a pair (crop_H, crop_W). If only a single value c is provided, the resulting crop will be square with size (c,c). Providing crop argument is incompatible with providing separate arguments crop_h and crop_w.

  • crop_h (float or float tensor, optional, default = 0.0) – cropping window height (in pixels). If provided, crop_w should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner). Actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image and crop_W is the width of the cropping window.

  • crop_pos_y (float or float tensor, optional, default = 0.5) – Normalized (0.0 - 1.0) vertical position of the cropping window (upper left corner). Actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image and crop_H is the height of the cropping window.

  • crop_w (float or float tensor, optional, default = 0.0) – Cropping window width (in pixels). If provided, crop_h should be provided as well. Providing crop_w, crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only. Images with a number of pixels (height * width) above this threshold use the nvJPEG hybrid Huffman decoder. Images below it use the nvJPEG full host Huffman decoder. N.B.: The hybrid Huffman decoder still uses mostly the CPU.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only. Use the chunk pinned memory allocator, which allocates a chunk of size batch_size*prefetch_queue_depth during construction and suballocates from it at runtime. Ignored when split_stages is false.
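The formulas for crop_pos_x and crop_pos_y place the upper left corner of the window at crop_x = crop_x_norm * (W - crop_W) and crop_y = crop_y_norm * (H - crop_H). The sketch below evaluates them for a concrete image size. The helper crop_window is hypothetical, and truncating the result to whole pixels is an assumption.

```python
def crop_window(image_w, image_h, crop_w, crop_h, crop_pos_x=0.5, crop_pos_y=0.5):
    """Return (x, y, w, h) of the cropping window in pixels."""
    if crop_w > image_w or crop_h > image_h:
        raise ValueError("cropping window is larger than the image")
    crop_x = crop_pos_x * (image_w - crop_w)  # crop_x_norm * (W - crop_W)
    crop_y = crop_pos_y * (image_h - crop_h)  # crop_y_norm * (H - crop_H)
    return int(crop_x), int(crop_y), crop_w, crop_h  # assumed: truncate


# The default position 0.5 centers the window in the image.
print(crop_window(640, 480, 224, 224))            # (208, 128, 224, 224)
# Position 0.0 anchors it at the upper left, 1.0 at the lower right.
print(crop_window(640, 480, 224, 224, 0.0, 0.0))  # (0, 0, 224, 224)
print(crop_window(640, 480, 224, 224, 1.0, 1.0))  # (416, 256, 224, 224)
```

Because the normalized position is scaled by the leftover space rather than by the image size, any value in the 0.0 - 1.0 range keeps the window fully inside the image.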

class nvidia.dali.ops.nvJPEGDecoderRandomCrop(**kwargs)

This is a ‘mixed’ operator

Partially decode JPEG images using the nvJPEG library, using a random cropping anchor/window. Output of the decoder is on the GPU and uses HWC ordering

Warning

This operator is now deprecated. Use ImageDecoderRandomCrop instead

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') –

    `mixed` backend only. Choose the cache type:

    • threshold - cache every image larger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest - store the largest images that fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only. Images with a number of pixels (height * width) above this threshold use the nvJPEG hybrid Huffman decoder. Images below it use the nvJPEG full host Huffman decoder. N.B.: The hybrid Huffman decoder still uses mostly the CPU.

  • num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • random_area (float or list of float, optional, default = [0.08, 1.0]) – Range from which to choose random area factor A. The cropped image’s area will be equal to A * original image’s area.

  • random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only. Use the chunk pinned memory allocator, which allocates a chunk of size batch_size*prefetch_queue_depth during construction and suballocates from it at runtime. Ignored when split_stages is false.
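The parameters random_area, random_aspect_ratio and num_attempts describe a rejection-sampling scheme: draw an area factor and an aspect ratio, derive a window size, and retry if the window does not fit in the image. The sketch below follows that general technique. It is not the operator's implementation: the helper random_crop_window is hypothetical, and the uniform sampling of both quantities, the rounding, and the fallback to the full image after num_attempts failures are all assumptions.

```python
import math
import random


def random_crop_window(image_w, image_h, rng,
                       random_area=(0.08, 1.0),
                       random_aspect_ratio=(0.75, 1.333333),
                       num_attempts=10):
    """Return (x, y, w, h) of a randomly chosen cropping window."""
    for _ in range(num_attempts):
        # Area factor A: the crop covers A * original image area.
        area = rng.uniform(*random_area) * image_w * image_h
        # Aspect ratio is width / height.
        ratio = rng.uniform(*random_aspect_ratio)
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= image_w and 0 < crop_h <= image_h:
            x = rng.randint(0, image_w - crop_w)
            y = rng.randint(0, image_h - crop_h)
            return x, y, crop_w, crop_h
    # Assumed fallback when no attempt produced a window that fits.
    return 0, 0, image_w, image_h


rng = random.Random(1234)  # a fixed seed makes the draws reproducible
window = random_crop_window(640, 480, rng)
```

Raising num_attempts makes the fallback less likely for images whose shape is far from the allowed aspect ratio range.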

class nvidia.dali.ops.nvJPEGDecoderSlice(**kwargs)

This is a ‘mixed’ operator

Partially decode JPEG images using the nvJPEG library, with a cropping window of given size and anchor. Inputs must be supplied as 3 tensors in a specific order: encoded_data containing encoded image data, begin containing the starting coordinates of the crop in (x,y) format, and size containing the dimensions of the crop in (w,h) format. For both begin and size, coordinates must be in the interval [0.0, 1.0]. Output of the decoder is in HWC ordering.

Warning

This operator is now deprecated. Use ImageDecoderSlice instead

Parameters
  • bytes_per_sample_hint (int, optional, default = 0) – Output size hint (bytes), per sample. The memory will be preallocated if it uses GPU or page-locked memory

  • cache_batch_copy (bool, optional, default = True) – `mixed` backend only If true, multiple images from cache are copied with a single batched copy kernel call; otherwise, each image is copied using cudaMemcpy unless order in the batch is the same as in the cache

  • cache_debug (bool, optional, default = False) – `mixed` backend only Print debug information about decoder cache.

  • cache_size (int, optional, default = 0) – `mixed` backend only Total size of the decoder cache in megabytes. When provided, decoded images bigger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) – `mixed` backend only Size threshold (in bytes) for images (after decoding) to be cached.

  • cache_type (str, optional, default = '') –

    `mixed` backend only. Choose the cache type:

    • threshold - cache every image larger than cache_threshold until the cache is full. Warm-up time for the threshold policy is 1 epoch.

    • largest - store the largest images that fit in the cache. Warm-up time for the largest policy is 2 epochs.

    To take advantage of caching, it is recommended to use the option stick_to_shard=True with the reader operators, to limit the number of unique images seen by the decoder in a multi-node environment.

  • device_memory_padding (int, optional, default = 16777216) – `mixed` backend only Padding for nvJPEG’s device memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • host_memory_padding (int, optional, default = 8388608) – `mixed` backend only Padding for nvJPEG’s host memory allocations in bytes. This parameter helps to avoid reallocation in nvJPEG whenever a bigger image is encountered and internal buffer needs to be reallocated to decode it.

  • hybrid_huffman_threshold (int, optional, default = 1000000) – `mixed` backend only. Images with a number of pixels (height * width) above this threshold use the nvJPEG hybrid Huffman decoder. Images below it use the nvJPEG full host Huffman decoder. N.B.: The hybrid Huffman decoder still uses mostly the CPU.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of output image.

  • seed (int, optional, default = -1) – Random seed (If not provided it will be populated based on the global seed of the pipeline)

  • split_stages (bool, optional, default = False) – `mixed` backend only Split into separated CPU stage and GPU stage operators

  • use_chunk_allocator (bool, optional, default = False) – Experimental, `mixed` backend only. Use the chunk pinned memory allocator, which allocates a chunk of size batch_size*prefetch_queue_depth during construction and suballocates from it at runtime. Ignored when split_stages is false.
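For nvJPEGDecoderSlice, the begin and size inputs are constrained to the interval [0.0, 1.0]. Reading that constraint as meaning both are fractions of the image width and height (an interpretation, not something this reference states explicitly), the pixel window can be derived as in the sketch below. The helper slice_window is hypothetical, and the truncation and clamping to the image bounds are assumptions.

```python
def slice_window(image_w, image_h, begin, size):
    """Convert normalized begin (x, y) and size (w, h) to a pixel window."""
    begin_x, begin_y = begin
    size_w, size_h = size
    for value in (begin_x, begin_y, size_w, size_h):
        if not 0.0 <= value <= 1.0:
            raise ValueError("begin and size must lie in [0.0, 1.0]")
    x0 = int(begin_x * image_w)
    y0 = int(begin_y * image_h)
    # Assumed: clamp so the window never extends past the image.
    w = min(int(size_w * image_w), image_w - x0)
    h = min(int(size_h * image_h), image_h - y0)
    return x0, y0, w, h


print(slice_window(640, 480, begin=(0.25, 0.5), size=(0.5, 0.25)))
# (160, 240, 320, 120)
```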