Operator Objects (Legacy)#

In older versions of DALI, an object-oriented API was used to define operations instead of the functional API. Use of the object API is now discouraged; its documentation is kept here for reference.

The legacy operator objects are contained in the dali.ops module and their names are camel-cased instead of snake-cased. For example, dali.ops.CropMirrorNormalize is the legacy counterpart of dali.fn.crop_mirror_normalize.

When using the operator object API, the definition of an operator is separated from its use in the DALI pipeline, which allows you to set static arguments once, at instantiation time.

Here is an example pipeline using the (recommended) functional API:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 3, num_threads = 2, device_id = 0)
with pipe:
    files, labels = dali.fn.readers.file(file_root = "./my_file_root")
    images = dali.fn.decoders.image(files, device = "mixed")
    images = dali.fn.rotate(images, angle = dali.fn.random.uniform(range=(-45,45)))
    images = dali.fn.resize(images, resize_x = 300, resize_y = 300)
    pipe.set_outputs(images, labels)

pipe.build()
outputs = pipe.run()

and the legacy implementation using the operator object API:

import nvidia.dali as dali

class CustomPipe(dali.pipeline.Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(CustomPipe, self).__init__(batch_size, num_threads, device_id)
        self.reader = dali.ops.readers.File(file_root='./my_file_root')
        self.decoder = dali.ops.ImageDecoder(device='mixed')
        self.rotate = dali.ops.Rotate()
        self.resize = dali.ops.Resize(resize_x=300, resize_y=300)
        self.rng = dali.ops.random.Uniform(range=(-45, 45))

    def define_graph(self):
        files, labels = self.reader()
        images = self.decoder(files)
        images = self.rotate(images, angle=self.rng())
        images = self.resize(images)
        return images, labels

pipe = CustomPipe(batch_size = 3, num_threads = 2, device_id = 0)
pipe.build()
outputs = pipe.run()

The two APIs can also be mixed within a single pipeline:

pipe = dali.pipeline.Pipeline(batch_size = 3, num_threads = 2, device_id = 0)
reader = dali.ops.readers.File(file_root = ".")
resize = dali.ops.Resize(device = "gpu", resize_x = 300, resize_y = 300)

with pipe:
    files, labels = reader()
    images = dali.fn.decoders.image(files, device = "mixed")
    images = dali.fn.rotate(images, angle = dali.fn.random.uniform(range=(-45,45)))
    images = resize(images)
    pipe.set_outputs(images, labels)

pipe.build()
outputs = pipe.run()

Mapping to Functional API#

The following table shows the correspondence between the operations in the current functional API and the legacy operator objects API.

Function (fn.*) | Operator Object (ops.*)
--- | ---
audio_decoder | AudioDecoder
audio_resample | AudioResample
batch_permutation | BatchPermutation
bb_flip | BbFlip
bbox_paste | BBoxPaste
box_encoder | BoxEncoder
brightness | Brightness
brightness_contrast | BrightnessContrast
caffe2_reader | Caffe2Reader
caffe_reader | CaffeReader
cast | Cast
cast_like | CastLike
cat | Cat
coco_reader | COCOReader
coin_flip | CoinFlip
color_space_conversion | ColorSpaceConversion
color_twist | ColorTwist
contrast | Contrast
coord_flip | CoordFlip
coord_transform | CoordTransform
copy | Copy
crop | Crop
crop_mirror_normalize | CropMirrorNormalize
dl_tensor_python_function | DLTensorPythonFunction
dump_image | DumpImage
element_extract | ElementExtract
erase | Erase
expand_dims | ExpandDims
external_source | ExternalSource
fast_resize_crop_mirror | FastResizeCropMirror
file_reader | FileReader
flip | Flip
full | Full
full_like | FullLike
gaussian_blur | GaussianBlur
get_property | GetProperty
grid_mask | GridMask
hsv | Hsv
hue | Hue
image_decoder | ImageDecoder
image_decoder_crop | ImageDecoderCrop
image_decoder_random_crop | ImageDecoderRandomCrop
image_decoder_slice | ImageDecoderSlice
jitter | Jitter
jpeg_compression_distortion | JpegCompressionDistortion
laplacian | Laplacian
lookup_table | LookupTable
mel_filter_bank | MelFilterBank
mfcc | MFCC
multi_paste | MultiPaste
mxnet_reader | MXNetReader
nemo_asr_reader | NemoAsrReader
nonsilent_region | NonsilentRegion
normal_distribution | NormalDistribution
normalize | Normalize
numba_function | NumbaFunction
numpy_reader | NumpyReader
one_hot | OneHot
ones | Ones
ones_like | OnesLike
optical_flow | OpticalFlow
pad | Pad
paste | Paste
peek_image_shape | PeekImageShape
per_frame | PerFrame
permute_batch | PermuteBatch
power_spectrum | PowerSpectrum
preemphasis_filter | PreemphasisFilter
python_function | PythonFunction
random_bbox_crop | RandomBBoxCrop
random_crop_generator | RandomCropGenerator
random_resized_crop | RandomResizedCrop
reinterpret | Reinterpret
reshape | Reshape
resize | Resize
resize_crop_mirror | ResizeCropMirror
roi_random_crop | ROIRandomCrop
rotate | Rotate
saturation | Saturation
sequence_reader | SequenceReader
sequence_rearrange | SequenceRearrange
shapes | Shapes
slice | Slice
spectrogram | Spectrogram
sphere | Sphere
squeeze | Squeeze
ssd_random_crop | SSDRandomCrop
stack | Stack
tfrecord_reader | TFRecordReader
to_decibels | ToDecibels
torch_python_function | TorchPythonFunction
transpose | Transpose
uniform | Uniform
video_reader | VideoReader
video_reader_resize | VideoReaderResize
warp_affine | WarpAffine
water | Water
zeros | Zeros
zeros_like | ZerosLike
decoders.audio | decoders.Audio
decoders.image | decoders.Image
decoders.image_crop | decoders.ImageCrop
decoders.image_random_crop | decoders.ImageRandomCrop
decoders.image_slice | decoders.ImageSlice
experimental.audio_resample | experimental.AudioResample
experimental.debayer | experimental.Debayer
experimental.dilate | experimental.Dilate
experimental.equalize | experimental.Equalize
experimental.erode | experimental.Erode
experimental.filter | experimental.Filter
experimental.inflate | experimental.Inflate
experimental.median_blur | experimental.MedianBlur
experimental.peek_image_shape | experimental.PeekImageShape
experimental.remap | experimental.Remap
experimental.resize | experimental.Resize
experimental.tensor_resize | experimental.TensorResize
experimental.warp_perspective | experimental.WarpPerspective
experimental.decoders.image | experimental.decoders.Image
experimental.decoders.image_crop | experimental.decoders.ImageCrop
experimental.decoders.image_random_crop | experimental.decoders.ImageRandomCrop
experimental.decoders.image_slice | experimental.decoders.ImageSlice
experimental.decoders.video | experimental.decoders.Video
experimental.inputs.video | experimental.inputs.Video
experimental.readers.fits | experimental.readers.Fits
experimental.readers.video | experimental.readers.Video
io.file.read | io.file.Read
noise.gaussian | noise.Gaussian
noise.salt_and_pepper | noise.SaltAndPepper
noise.shot | noise.Shot
plugin.video.decoder | plugin.video.Decoder
random.beta | random.Beta
random.choice | random.Choice
random.coin_flip | random.CoinFlip
random.normal | random.Normal
random.uniform | random.Uniform
readers.caffe | readers.Caffe
readers.caffe2 | readers.Caffe2
readers.coco | readers.COCO
readers.file | readers.File
readers.mxnet | readers.MXNet
readers.nemo_asr | readers.NemoAsr
readers.numpy | readers.Numpy
readers.sequence | readers.Sequence
readers.tfrecord | readers.TFRecord
readers.video | readers.Video
readers.video_resize | readers.VideoResize
readers.webdataset | readers.Webdataset
reductions.max | reductions.Max
reductions.mean | reductions.Mean
reductions.mean_square | reductions.MeanSquare
reductions.min | reductions.Min
reductions.rms | reductions.RMS
reductions.std_dev | reductions.StdDev
reductions.sum | reductions.Sum
reductions.variance | reductions.Variance
segmentation.random_mask_pixel | segmentation.RandomMaskPixel
segmentation.random_object_bbox | segmentation.RandomObjectBBox
segmentation.select_masks | segmentation.SelectMasks
transforms.combine | transforms.Combine
transforms.crop | transforms.Crop
transforms.rotation | transforms.Rotation
transforms.scale | transforms.Scale
transforms.shear | transforms.Shear
transforms.translation | transforms.Translation

Modules#

nvidia.dali.ops#

class nvidia.dali.ops.AudioDecoder(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use decoders.Audio() instead.

In DALI 1.0 all decoders were moved into a dedicated decoders submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for decoders.audio().

class nvidia.dali.ops.AudioResample(*, device='cpu', **kwargs)#

Resamples an audio signal.

The resampling is achieved by applying a sinc filter with a Hann window whose extent is controlled by the quality argument.

The resampling ratio can be specified directly or as a ratio of target to source sampling rate, or calculated from the ratio of the requested output length to the input length.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    The output type.

    If not specified, the output type is the same as the input type. When the type is changed, the values are normalized to fill the dynamic range of the target data type. When converting floating point inputs to integer types, the input values are assumed to be in -1..1 range. When converting between signed and unsigned types, 0 translates to half-range of the unsigned type. Example:

    float -> uint8
    -1.0  -> 0
    0     -> 128
    1.0   -> 255
    
    uint8 -> int16
    0     -> -32767
    127   -> -128
    128   ->  128
    255   ->  32767
    
    uint16 -> float
    0      -> -1
    32767  -> -0.000015
    32768  ->  0.000015
    65535  ->  1
    

  • in_rate (float or TensorList of float, optional) –

    Input sampling rate.

    The sampling rate of the input sample. This parameter must be specified together with out_rate. The value is relative to out_rate and doesn’t need to use any specific unit as long as the units of input and output rates match.

    The in_rate and out_rate parameters cannot be specified together with scale or out_length.

  • out_length (int or TensorList of int, optional) –

    The requested output length, in samples.

    The scaling factor is the ratio of this output length to the input length. This parameter cannot be specified together with in_rate, out_rate or scale.

  • out_rate (float or TensorList of float, optional) –

    Output sampling rate.

    The requested output sampling rate. This parameter must be specified together with in_rate. The value is relative to in_rate and doesn’t need to use any specific unit as long as the units of input and output rates match.

    The in_rate and out_rate parameters cannot be specified together with scale or out_length.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • quality (float, optional, default = 50.0) –

    Resampling quality, where 0 is the lowest, and 100 is the highest.

    0 gives 3 lobes of the sinc filter, 50 gives 16 lobes, and 100 gives 64 lobes.

  • scale (float or TensorList of float, optional) –

    The scaling factor.

    The scaling factor is the ratio of the target sampling rate to source sampling rate. For example, a scale=2 will produce an output with twice as many samples as the input.

    This parameter cannot be specified together with in_rate and out_rate or out_length.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
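
Below is a minimal usage sketch of this operator in the legacy object API; the file path and the sampling rates (44.1 kHz in, 16 kHz out) are illustrative assumptions, not part of the operator's documentation.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_audio_root")           # hypothetical directory
decode = dali.ops.decoders.Audio()                                    # returns (audio, sample_rate)
resample = dali.ops.AudioResample(in_rate=44100.0, out_rate=16000.0)  # assumed fixed input rate

with pipe:
    encoded, labels = reader()
    audio, sr = decode(encoded)
    pipe.set_outputs(resample(audio), labels)

pipe.build()
resampled, labels_out = pipe.run()

Per-sample input rates could also be passed as a TensorList argument (for example, the sample-rate output of the decoder) instead of a fixed in_rate.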

class nvidia.dali.ops.BBoxPaste(*, device='cpu', **kwargs)#

Transforms bounding boxes so that the boxes remain in the same place in the image after the image is pasted on a larger canvas.

Corner coordinates are transformed according to the following formula:

(x', y') = (x/ratio + paste_x', y/ratio + paste_y')

Box sizes (if xywh is used) are transformed according to the following formula:

(w', h') = (w/ratio, h/ratio)

Where:

paste_x' = paste_x * (ratio - 1)/ratio
paste_y' = paste_y * (ratio - 1)/ratio

The paste coordinates are normalized so that (0,0) aligns the image to top-left of the canvas and (1,1) aligns it to bottom-right.
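
As a hand-worked illustration of the formulas above (the numbers are made up): with ratio = 2 and paste_x = paste_y = 0.5, the image is centered on a canvas twice its size, and a relative xywh box (0.2, 0.4, 0.1, 0.1) transforms as follows:

ratio, paste_x, paste_y = 2.0, 0.5, 0.5
x, y, w, h = 0.2, 0.4, 0.1, 0.1                 # input box, relative xywh

paste_x_p = paste_x * (ratio - 1) / ratio       # 0.25
paste_y_p = paste_y * (ratio - 1) / ratio       # 0.25
x_p = x / ratio + paste_x_p                     # 0.35
y_p = y / ratio + paste_y_p                     # 0.45
w_p, h_p = w / ratio, h / ratio                 # 0.05, 0.05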

Supported backends
  • ‘cpu’

Keyword Arguments:
  • ratio (float or TensorList of float) – Ratio of the canvas size to the input size; the value must be at least 1.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • ltrb (bool, optional, default = False) – True for ltrb or False for xywh.

  • paste_x (float or TensorList of float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0).

  • paste_y (float or TensorList of float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0).

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.BatchPermutation(*, device='cpu', **kwargs)#

Produces a batch of random integers which can be used as indices for indexing samples in the batch.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • allow_repetitions (bool, optional, default = False) – If true, the output can contain repetitions and omissions.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • no_fixed_points (bool, optional, default = False) – If true, the output permutation cannot contain fixed points, that is out[i] != i. This argument is ignored when the batch size is 1.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.
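
A minimal sketch (the dataset path is an assumption) pairing this operator with ops.PermuteBatch to shuffle samples within a batch:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=8, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_file_root")   # hypothetical directory
decode = dali.ops.decoders.Image(device="cpu")
make_indices = dali.ops.BatchPermutation()
permute = dali.ops.PermuteBatch()

with pipe:
    files, labels = reader()
    images = decode(files)
    idx = make_indices()                      # one random index per sample
    pipe.set_outputs(permute(images, indices=idx), labels)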

class nvidia.dali.ops.BbFlip(*, device='cpu', **kwargs)#

Flips bounding boxes horizontally or vertically (mirror).

The bounding box coordinates for the input are in the [x, y, width, height] - xywh or [left, top, right, bottom] - ltrb format. All coordinates are in the image coordinate system, that is, in the range 0.0-1.0.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • horizontal (int or TensorList of int, optional, default = 1) – Flip horizontal dimension.

  • ltrb (bool, optional, default = False) – True for ltrb or False for xywh.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • vertical (int or TensorList of int, optional, default = 0) – Flip vertical dimension.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
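
A minimal sketch (the dataset paths are placeholders) that flips ltrb boxes horizontally for a random subset of samples; the same flag could also be passed to an image-flipping operator to keep images and boxes consistent:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=8, num_threads=2, device_id=0)
coco = dali.ops.readers.COCO(file_root="./images",              # hypothetical paths
                             annotations_file="./instances.json",
                             ltrb=True)
flip_flag = dali.ops.random.CoinFlip(probability=0.5)
bb_flip = dali.ops.BbFlip(ltrb=True)

with pipe:
    encoded, boxes, labels = coco()
    flag = flip_flag()
    pipe.set_outputs(bb_flip(boxes, horizontal=flag), labels)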

class nvidia.dali.ops.BoxEncoder(*, device='cpu', **kwargs)#

Encodes the input bounding boxes and labels using a set of default boxes (anchors) passed as an argument.

This operator follows the algorithm described in “SSD: Single Shot MultiBox Detector” and implemented in mlperf/training. Inputs must be supplied as the following Tensors:

  • BBoxes that contain bounding boxes that are represented as [l,t,r,b].

  • Labels that contain the corresponding label for each bounding box.

The results are two tensors:

  • EncodedBBoxes that contain M encoded bounding boxes as [l,t,r,b], where M is the number of anchors.

  • EncodedLabels that contain the corresponding label for each encoded box.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • anchors (float or list of float) – Anchors to be used for encoding, provided as a list of floats in the ltrb format.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • criteria (float, optional, default = 0.5) –

    Threshold IoU for matching bounding boxes with anchors.

    The value needs to be between 0 and 1.

  • means (float or list of float, optional, default = [0.0, 0.0, 0.0, 0.0]) – [x y w h] mean values for normalization.

  • offset (bool, optional, default = False) – If True, returns normalized offsets ((encoded_bboxes*scale - anchors*scale) - means) / stds in EncodedBBoxes, computed using the means, stds, and scale arguments.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • scale (float, optional, default = 1.0) – Rescales the box and anchor values before the offset is calculated (for example, to return to the absolute values).

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • stds (float or list of float, optional, default = [1.0, 1.0, 1.0, 1.0]) – [x y w h] standard deviations for offset normalization.

__call__(*inputs, **kwargs)#

See nvidia.dali.ops.BoxEncoder() class for complete information.

class nvidia.dali.ops.Brightness(*, device='cpu', **kwargs)#

Adjusts the brightness of the images.

The brightness is adjusted based on the following formula:

out = brightness_shift * output_range + brightness * in

Where output_range is 1 for float outputs or the maximum positive value for integral types.

This operator can also change the type of data.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • brightness (float or TensorList of float, optional, default = 1.0) –

    Brightness multiplier.

    Supports per-frame inputs.

  • brightness_shift (float or TensorList of float, optional, default = 0.0) –

    The brightness shift.

    For signed types, 1.0 represents the maximum positive value that can be represented by the type.

    Supports per-frame inputs.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    If not set, the input type is used.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('FHWC', 'DHWC', 'HWC')) – Input to the operator.
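
A minimal sketch (the path and the brightness range are illustrative) that randomizes the brightness multiplier per sample by passing a tensor argument at call time:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_file_root")   # hypothetical directory
decode = dali.ops.decoders.Image(device="mixed")
rng = dali.ops.random.Uniform(range=(0.5, 1.5))
brighten = dali.ops.Brightness(device="gpu")

with pipe:
    files, labels = reader()
    images = decode(files)
    pipe.set_outputs(brighten(images, brightness=rng()), labels)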

class nvidia.dali.ops.BrightnessContrast(*, device='cpu', **kwargs)#

Adjusts the brightness and contrast of the images.

The brightness and contrast are adjusted based on the following formula:

out = brightness_shift * output_range +
      brightness * (contrast_center + contrast * (in - contrast_center))

Where the output_range is 1 for float outputs or the maximum positive value for integral types.

This operator can also change the type of data.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • brightness (float or TensorList of float, optional, default = 1.0) –

    Brightness multiplier.

    Supports per-frame inputs.

  • brightness_shift (float or TensorList of float, optional, default = 0.0) –

    The brightness shift.

    For signed types, 1.0 represents the maximum positive value that can be represented by the type.

    Supports per-frame inputs.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • contrast (float or TensorList of float, optional, default = 1.0) –

    The contrast multiplier, where 0.0 produces the uniform grey.

    Supports per-frame inputs.

  • contrast_center (float or TensorList of float, optional, default = 0.5) –

    The intensity level that is unaffected by contrast.

    This is the value that all pixels assume when the contrast is zero. When not set, half of the input type’s positive range (or 0.5 for float) is used.

    Supports per-frame inputs.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    If not set, the input type is used.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('FHWC', 'DHWC', 'HWC')) – Input to the operator.

class nvidia.dali.ops.COCOReader(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use readers.COCO() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.coco().

class nvidia.dali.ops.Caffe2Reader(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use readers.Caffe2() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.caffe2().

class nvidia.dali.ops.CaffeReader(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use readers.Caffe() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.caffe().

class nvidia.dali.ops.Cast(*, device='cpu', **kwargs)#

Cast a tensor to a different type.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • dtype (nvidia.dali.types.DALIDataType) – Output data type.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.CastLike(*, device='cpu', **kwargs)#

Cast the first tensor to the type of the second tensor.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(*inputs, **kwargs)#

See nvidia.dali.ops.CastLike() class for complete information.

class nvidia.dali.ops.Cat(*, device='cpu', **kwargs)#

Joins the input tensors along an existing axis.

The shapes of the inputs must match in all dimensions except the concatenation axis.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axis (int, optional, default = 0) –

    Axis along which the input tensors are concatenated.

    Accepted range is [-ndim, ndim-1]. Negative indices are counted from the back.

  • axis_name (str, optional) –

    Name of the axis along which the tensors are concatenated.

    This argument is mutually exclusive with axis. This argument requires that at least one input has a non-empty layout and that all non-empty input layouts match.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(*inputs, **kwargs)#

See nvidia.dali.ops.Cat() class for complete information.
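
A minimal sketch concatenating two constant tensors along axis 0; the shapes and the nvidia.dali.types.Constant inputs are only for illustration:

import numpy as np
import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=1, num_threads=1, device_id=0)
cat = dali.ops.Cat(axis=0)

with pipe:
    a = dali.types.Constant(np.zeros((2, 4), dtype=np.float32))
    b = dali.types.Constant(np.ones((3, 4), dtype=np.float32))
    pipe.set_outputs(cat(a, b))   # shapes (2, 4) and (3, 4) -> (5, 4)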

class nvidia.dali.ops.CoinFlip(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use random.CoinFlip() instead.

Generates random boolean values following a Bernoulli distribution.

The probability of generating a value 1 (true) is determined by the probability argument.

The shape of the generated data can be either specified explicitly with a shape argument, or chosen to match the shape of the input, if provided. If none are present, a single value per sample is generated.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Note

    The generated numbers are converted to the output data type, rounding and clamping if necessary.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • probability (float or TensorList of float, optional, default = 0.5) – Probability of value 1.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

__call__(__shape_like=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__shape_like (TensorList, optional) – Shape of this input will be used to infer the shape of the output, if provided.
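
A minimal sketch using the non-deprecated alias random.CoinFlip; the probability and dtype values are illustrative:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
flip = dali.ops.random.CoinFlip(probability=0.3, dtype=dali.types.INT32)

with pipe:
    # one value per sample by default; pass shape=... or a shape-like input for more
    pipe.set_outputs(flip())

pipe.build()
(flags,) = pipe.run()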

class nvidia.dali.ops.ColorSpaceConversion(*, device='cpu', **kwargs)#

Converts between various image color models.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • image_type (nvidia.dali.types.DALIImageType) – The color space of the input image.

  • output_type (nvidia.dali.types.DALIImageType) – The color space of the output image.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('FDHWC', 'FHWC', 'DHWC', 'HWC')) – Input to the operator.

class nvidia.dali.ops.ColorTwist(*, device='cpu', **kwargs)#

Adjusts hue, saturation, brightness and contrast of the image.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • brightness (float or TensorList of float, optional, default = 1.0) –

    Brightness change factor.

    Values must be non-negative.

    Example values:

    • 0 - Black image.

    • 1 - No change.

    • 2 - Double the brightness.

    Supports per-frame inputs.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • contrast (float or TensorList of float, optional, default = 1.0) –

    Contrast change factor.

    Values must be non-negative.

    Example values:

    • 0 - Uniform grey image.

    • 1 - No change.

    • 2 - Double the contrast.

    Supports per-frame inputs.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    If not set, the input type is used.

  • hue (float or TensorList of float, optional, default = 0.0) –

    Hue change, in degrees.

    Supports per-frame inputs.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the input and the output image.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • saturation (float or TensorList of float, optional, default = 1.0) –

    Saturation change factor.

    Values must be non-negative.

    Example values:

    • 0 - Completely desaturated image.

    • 1 - No change to image’s saturation.

    Supports per-frame inputs.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'FHWC', 'DHWC')) – Input to the operator.

nvidia.dali.ops.Compose(op_list)#

Returns a meta-operator that chains the operations in op_list.

The return value is a callable object which, when called, performs:

op_list[n-1](op_list[n-2](... op_list[0](args) ...))

Operators can be composed only when all outputs of the previous operator can be processed directly by the next operator in the list.

The example below chains an image decoder and a Resize operation with random square size. The decode_and_resize object can be called as if it was an operator:

decode_and_resize = ops.Compose([
    ops.decoders.Image(device="cpu"),
    ops.Resize(size=fn.random.uniform(range=(400, 500)), device="gpu")
])

files, labels = fn.readers.caffe(path=caffe_db_folder, seed=1)
pipe.set_outputs(decode_and_resize(files), labels)

If there’s a transition from CPU to GPU in the middle of the op_list, as is the case in this example, Compose automatically arranges copying the data to GPU memory.

Note

This is an experimental feature, subject to change without notice.

class nvidia.dali.ops.Contrast(*, device='cpu', **kwargs)#

Adjusts the contrast of the images.

The contrast is adjusted based on the following formula:

out = contrast_center + contrast * (in - contrast_center)

This operator can also change the type of data.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • contrast (float or TensorList of float, optional, default = 1.0) –

    The contrast multiplier, where 0.0 produces the uniform grey.

    Supports per-frame inputs.

  • contrast_center (float or TensorList of float, optional, default = 0.5) –

    The intensity level that is unaffected by contrast.

    This is the value that all pixels assume when the contrast is zero. When not set, half of the input type’s positive range (or 0.5 for float) is used.

    Supports per-frame inputs.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    If not set, the input type is used.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('FHWC', 'DHWC', 'HWC')) – Input to the operator.

class nvidia.dali.ops.CoordFlip(*, device='cpu', **kwargs)#

Transforms vectors or points by flipping (reflecting) their coordinates with respect to a given center.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • center_x (float or TensorList of float, optional, default = 0.5) – Flip center in the horizontal axis.

  • center_y (float or TensorList of float, optional, default = 0.5) – Flip center in the vertical axis.

  • center_z (float or TensorList of float, optional, default = 0.5) – Flip center in the depthwise axis.

  • flip_x (int or TensorList of int, optional, default = 1) – Flip the horizontal (x) coordinate.

  • flip_y (int or TensorList of int, optional, default = 0) – Flip the vertical (y) coordinate.

  • flip_z (int or TensorList of int, optional, default = 0) – Flip the depthwise (z) coordinate.

  • layout (layout str, optional, default = ‘’) –

    Determines the order of coordinates in the input.

    The string should consist of the following characters:

    • ”x” (horizontal coordinate),

    • ”y” (vertical coordinate),

    • ”z” (depthwise coordinate),

    Note

    If left empty, depending on the number of dimensions, the “x”, “xy”, or “xyz” values are assumed.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.CoordTransform(*, device='cpu', **kwargs)#

Applies a linear transformation to points or vectors.

The transformation has the form:

out = M * in + T

Where M is an m x n matrix and T is a translation vector with m components. The input must consist of n-element vectors or points, and the output has m components.

This operator can be used for many operations. Here’s the (incomplete) list:

  • applying affine transform to point clouds

  • projecting points onto a subspace

  • some color space conversions, for example RGB to YCbCr or grayscale

  • linear operations on colors, like hue rotation, brightness and contrast adjustment

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • M (float or list of float or TensorList of float, optional) –

    The matrix used for transforming the input vectors.

    If left unspecified, identity matrix is used.

    The matrix M does not need to be square - if it’s not, the output vectors will have a number of components equal to the number of rows in M.

    If a scalar value is provided, M is assumed to be a square matrix with that value on the diagonal. The size of the matrix is then assumed to match the number of components in the input vectors.

    Supports per-frame inputs.

  • MT (float or list of float or TensorList of float, optional) –

    A block matrix [M T] which combines the arguments M and T.

    Providing a scalar value for this argument is equivalent to providing the same scalar for M and leaving T unspecified.

    The number of columns must be one more than the number of components in the input. This argument is mutually exclusive with M and T.

    Supports per-frame inputs.

  • T (float or list of float or TensorList of float, optional) –

    The translation vector.

    If left unspecified, no translation is applied unless MT argument is used.

    The number of components of this vector must match the number of rows in matrix M. If a scalar value is provided, that value is broadcast to all components of T and the number of components is chosen to match the number of rows in M.

    Supports per-frame inputs.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –

    Data type of the output coordinates.

    If an integral type is used, the output values are rounded to the nearest integer and clamped to the dynamic range of this type.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
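
As a sketch of the color-conversion use case mentioned above (assuming a 3-channel HWC input, so the flat 3-element M is treated as a 1 x 3 matrix; the path and the luma weights are illustrative assumptions):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_file_root")   # hypothetical directory
decode = dali.ops.decoders.Image(device="cpu")
to_gray = dali.ops.CoordTransform(M=[0.299, 0.587, 0.114],   # RGB -> luma weights
                                  dtype=dali.types.UINT8)

with pipe:
    files, labels = reader()
    images = decode(files)                     # HWC, 3 channels
    pipe.set_outputs(to_gray(images), labels)  # HWC, 1 channel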

class nvidia.dali.ops.Copy(*, device='cpu', **kwargs)#

Creates a copy of the input tensor.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.Crop(*, device='cpu', **kwargs)#

Crops the images with the specified window dimensions and window position (upper left corner).

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • crop (float or list of float or TensorList of float, optional) –

    Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).

    Providing crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.

  • crop_d (float or TensorList of float, optional, default = 0.0) –

    Applies only to volumetric inputs; cropping window depth (in voxels).

    crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).

  • crop_h (float or TensorList of float, optional, default = 0.0) –

    Cropping window height (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).

    The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.

    See rounding argument for more details on how crop_x is converted to an integral value.

  • crop_pos_y (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).

    The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.

    See rounding argument for more details on how crop_y is converted to an integral value.

  • crop_pos_z (float or TensorList of float, optional, default = 0.5) –

    Applies only to volumetric inputs.

    Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.

    See rounding argument for more details on how crop_z is converted to an integral value.

  • crop_w (float or TensorList of float, optional, default = 0.0) –

    Cropping window width (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Supported types: FLOAT, FLOAT16, and UINT8.

    If not set, the input type is used.

  • fill_values (float or list of float, optional, default = [0.0]) –

    Determines padding values and is only relevant if out_of_bounds_policy is set to “pad”.

    If a scalar value is provided, it will be used for all the channels. If multiple values are provided, the number of values and channels must be identical (extent of dimension C in the layout) in the output slice.

  • image_type (nvidia.dali.types.DALIImageType) –

    Warning

    The argument image_type is no longer used and will be removed in a future release.

  • out_of_bounds_policy (str, optional, default = ‘error’) –

    Determines the policy when slicing the out of bounds area of the input.

    Here is a list of the supported values:

    • "error" (default): Attempting to slice outside of the bounds of the input will produce an error.

    • "pad": The input will be padded as needed with zeros or any other value that is specified with the fill_values argument.

    • "trim_to_shape": The slice window will be cut to the bounds of the input.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • rounding (str, optional, default = ‘round’) –

    Determines the rounding function used to convert the starting coordinate of the window to an integral value (see crop_pos_x, crop_pos_y, crop_pos_z).

    Possible values are:

    • "round" - Rounds to the nearest integer value, with halfway cases rounded away from zero.
    • "truncate" - Discards the fractional part of the number (truncates towards zero).

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • output_dtype (nvidia.dali.types.DALIDataType) –

    Warning

    The argument output_dtype is a deprecated alias for dtype. Use dtype instead.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'CHW', 'DHWC', 'CDHW', 'FHWC', 'FCHW', 'CFHW', 'FDHWC', 'FCDHW', 'CFDHW')) – Input to the operator.

class nvidia.dali.ops.CropMirrorNormalize(*, device='cpu', **kwargs)#

Performs fused cropping, normalization, format conversion (NHWC to NCHW) if desired, and type casting.

Normalization takes the input images and produces the output by using the following formula:

output = scale * (input - mean) / std + shift

Note

If no cropping arguments are specified, only mirroring and normalization will occur.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • crop (float or list of float or TensorList of float, optional) –

    Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).

    Providing crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.

  • crop_d (float or TensorList of float, optional, default = 0.0) –

    Applies only to volumetric inputs; cropping window depth (in voxels).

    crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).

  • crop_h (float or TensorList of float, optional, default = 0.0) –

    Cropping window height (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).

    The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.

    See rounding argument for more details on how crop_x is converted to an integral value.

  • crop_pos_y (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).

    The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.

    See rounding argument for more details on how crop_y is converted to an integral value.

  • crop_pos_z (float or TensorList of float, optional, default = 0.5) –

    Applies only to volumetric inputs.

    Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.

    See rounding argument for more details on how crop_z is converted to an integral value.

  • crop_w (float or TensorList of float, optional, default = 0.0) –

    Cropping window width (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –

    Output data type.

    Supported types: FLOAT, FLOAT16, INT8, UINT8.

  • fill_values (float or list of float, optional, default = [0.0]) –

    Determines padding values and is only relevant if out_of_bounds_policy is set to “pad”.

    If a scalar value is provided, it will be used for all the channels. If multiple values are provided, the number of values and channels must be identical (extent of dimension C in the layout) in the output slice.

  • image_type (nvidia.dali.types.DALIImageType) –

    Warning

    The argument image_type is no longer used and will be removed in a future release.

  • mean (float or list of float or TensorList of float, optional, default = [0.0]) – Mean pixel values for image normalization.

  • mirror (int or TensorList of int, optional, default = 0) – If nonzero, the image will be flipped (mirrored) horizontally.

  • out_of_bounds_policy (str, optional, default = ‘error’) –

    Determines the policy when slicing the out of bounds area of the input.

    Here is a list of the supported values:

    • "error" (default): Attempting to slice outside of the bounds of the input will produce an error.

    • "pad": The input will be padded as needed with zeros or any other value that is specified with the fill_values argument.

    • "trim_to_shape": The slice window will be cut to the bounds of the input.

  • output_layout (layout str, optional, default = ‘CHW’) – Tensor data layout for the output.

  • pad_output (bool, optional, default = False) –

    Determines whether to pad the output so that the number of channels is a power of 2.

    The value used for padding is determined by the fill_values argument.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • rounding (str, optional, default = ‘round’) –

    Determines the rounding function used to convert the starting coordinate of the window to an integral value (see crop_pos_x, crop_pos_y, crop_pos_z).

    Possible values are:

    • "round" - Rounds to the nearest integer value, with halfway cases rounded away from zero.
    • "truncate" - Discards the fractional part of the number (truncates towards zero).

  • scale (float, optional, default = 1.0) –

    The value by which the result is multiplied.

    This argument is useful when using integer outputs to improve dynamic range utilization.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shift (float, optional, default = 0.0) –

    The value added to the (scaled) result.

    This argument is useful when using unsigned integer outputs to improve dynamic range utilization.

  • std (float or list of float or TensorList of float, optional, default = [1.0]) – Standard deviation values for image normalization.

  • output_dtype (nvidia.dali.types.DALIDataType) –

    Warning

    The argument output_dtype is a deprecated alias for dtype. Use dtype instead.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'CHW', 'DHWC', 'CDHW', 'FHWC', 'FCHW', 'CFHW', 'FDHWC', 'FCDHW', 'CFDHW')) – Input to the operator.
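
A minimal sketch of the common decode / crop / random-mirror / normalize pattern; the path, crop size, and the ImageNet-style mean/std values are illustrative assumptions:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=8, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_file_root")   # hypothetical directory
decode = dali.ops.decoders.Image(device="mixed")
coin = dali.ops.random.CoinFlip(probability=0.5)
cmn = dali.ops.CropMirrorNormalize(device="gpu",
                                   crop=(224, 224),
                                   mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                   std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
                                   output_layout="CHW")

with pipe:
    files, labels = reader()
    images = decode(files)
    pipe.set_outputs(cmn(images, mirror=coin()), labels)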

class nvidia.dali.ops.DLTensorPythonFunction(function, num_outputs=1, device='cpu', synchronize_stream=True, batch_processing=True, **kwargs)#

Executes a Python function that operates on DLPack tensors.

The function should not modify input tensors.

For the GPU operator, it is the user’s responsibility to synchronize the device code with DALI. To synchronize the device code with DALI, synchronize DALI’s work before the operator call with the synchronize_stream flag (enabled by default) and ensure that the scheduled device tasks are finished in the operator call. The GPU code can be executed on the CUDA stream used by DALI, which can be obtained by calling the current_dali_stream() function. In this case, the synchronize_stream flag can be set to False.

Warning

This operator is not compatible with TensorFlow integration.

This operator allows sequence inputs and supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • function (object) –

    A callable object that defines the function of the operator.

    Warning

    The function must not hold a reference to the pipeline in which it is used. If it does, a circular reference to the pipeline will form and the pipeline will never be freed.

  • batch_processing (bool, optional, default = False) –

    Determines whether the function is invoked once per batch or separately for every sample in the batch.

    If set to True, the function will receive its arguments as lists of DLPack tensors.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • num_outputs (int, optional, default = 1) – Number of outputs.

  • output_layouts (layout str or list of layout str, optional) –

    Tensor data layouts for the outputs.

    This argument can be a list that contains a distinct layout for each output. If the list has fewer than num_outputs elements, only the first outputs have the layout set and the rest of the outputs have no layout assigned.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • synchronize_stream (bool, optional, default = True) –

    Ensures that DALI synchronizes its CUDA stream before calling the Python function.

    Warning

    This argument should be set to False only if the called function schedules device work to the stream that is used by DALI.

class nvidia.dali.ops.DumpImage(*, device='cpu', **kwargs)#

Saves the images in the batch to disk in PPM format.

Useful for debugging.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • input_layout (layout str, optional, default = ‘HWC’) – Layout of input images.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • suffix (str, optional, default = ‘’) – Suffix to be added to output file names.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.ElementExtract(*, device='cpu', **kwargs)#

Extracts one or more elements from input sequence.

The outputs are slices in the first (outermost) dimension of the input. There are as many outputs as the elements provided in the element_map.

For example, for element_map = [2, 0, 3] there will be three outputs, containing the 2nd, 0th and 3rd elements of the input sequences, respectively.

The input layout, if provided, must begin with the F dimension. The outputs will have one less dimension than the input; that is, for FHWC inputs, the outputs will be HWC elements.

This operator expects sequence inputs.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • element_map (int or list of int) – Indices of the elements to extract.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
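
As an illustration, the sketch below mixes the legacy object with the functional API (the sequence reader path and the sequence length are placeholder assumptions):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
# Three outputs per input sequence: its 2nd, 0th and 3rd frames (HWC elements).
extract = dali.ops.ElementExtract(element_map=[2, 0, 3])

with pipe:
    # FHWC sequences of 8 frames each
    seq = dali.fn.readers.sequence(file_root='./my_sequence_root', sequence_length=8)
    e2, e0, e3 = extract(seq)
    pipe.set_outputs(e2, e0, e3)

pipe.build()
outputs = pipe.run()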

class nvidia.dali.ops.Erase(*, device='cpu', **kwargs)#

Erases one or more regions from the input tensors.

The region is specified by an anchor (starting point) and a shape (dimensions). Only the relevant dimensions are specified. Dimensions that are not specified are treated as if the entire range of the axis was provided. To specify multiple regions, anchor and shape represent multiple points consecutively (for example, anchor = (y0, x0, y1, x1, …) and shape = (h0, w0, h1, w1, …)). The anchor and shape arguments are interpreted based on the value of the axis_names argument, or, alternatively, the value of the axes argument. If no axis_names or axes arguments are provided, all dimensions except C (channels) must be specified.

Example 1:

anchor = (10, 20), shape = (190, 200), axis_names = “HW”, fill_value = 0

input: layout = “HWC”, shape = (300, 300, 3)

The erase region covers the range between 10 and 200 in the vertical dimension (height) and between 20 and 220 in the horizontal dimension (width). The range for the channel dimension was not specified, so the whole range (0 to 3) is erased. This gives:

output[y, x, c] = 0   if 20 <= x < 220 and 10 <= y < 200
output[y, x, c] = input[y, x, c]  otherwise

Example 2:

anchor = (10, 250), shape = (20, 30), axis_names = “W”, fill_value = (118, 185, 0)

input: layout = “HWC”, shape = (300, 300, 3)

Two erase regions are provided, covering two vertical bands that range over x=(10, 30) and x=(250, 280), respectively. Each pixel in the erased regions is filled with the multi-channel value (118, 185, 0). This gives:

output[y, x, :] = (118, 185, 0)   if 10 <= x < 30 or 250 <= x < 280
output[y, x, :] = input[y, x, :]  otherwise

Example 3:

anchor = (0.15, 0.15), shape = (0.3, 0.3), axis_names = “HW”, fill_value = 100, normalized = True

input: layout = “HWC”, shape = (300, 300, 3)

One erase region is provided, with normalized coordinates in the height and width dimensions. A single fill value is used for all channels. The coordinates can be converted to absolute values by multiplying them by the input shape. This gives:

if (0.15 * 300 <= x < (0.3 + 0.15) * 300 and
    0.15 * 300 <= y < (0.3 + 0.15) * 300):
  output[y, x, c] = 100
else:
  output[y, x, c] = input[y, x, c]

Example 4: anchor = (0.15, 0.15), shape = (20, 30), normalized_anchor = True, normalized_shape = False

input: layout = “HWC”, shape = (300, 300, 3)

One erase region is provided, with the anchor specified in normalized coordinates and the shape in absolute coordinates. Since no axis_names is provided, the anchor and shape must contain all dimensions except “C” (channels). This gives:

if (0.15 * 300 <= y < (0.15 * 300) + 20 and
    0.15 * 300 <= x < (0.15 * 300) + 30):
  output[y, x, c] = 0
else:
  output[y, x, c] = input[y, x, c]

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • anchor (float or list of float or TensorList of float, optional, default = []) –

    Coordinates for the anchor or the starting point of the erase region.

    Only the coordinates of the relevant dimensions that are specified by axis_names or axes should be provided.

  • axes (int or list of int, optional, default = [1, 0]) –

    Order of dimensions used for anchor and shape arguments, as dimension indices.

    For instance, axes=(1, 0) means the coordinates in anchor and shape refer to axes 1 and 0 in that particular order.

  • axis_names (str, optional, default = ‘HW’) –

    Order of dimensions that are used for the anchor and shape arguments, as described in the layout.

    For instance, axis_names=”HW” means that the coordinates in anchor and shape refer to dimensions H (height) and W (width) in that particular order.

    Note

    axis_names has a higher priority than axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • centered_anchor (bool, optional, default = False) –

    If set to True, the anchors refer to the center of the region instead of the top-left corner.

    This results in centered erased regions at the specified anchor.

  • fill_value (float or list of float or TensorList of float, optional, default = [0.0]) –

    Value to fill the erased region.

    Might be specified as one value (for example, 0) or a multi-channel value (for example, (200, 210, 220)). If a multi-channel fill value is provided, the input layout should contain a channel dimension C.

  • normalized (bool, optional, default = False) –

    Determines whether the anchor and shape arguments should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates.

    Providing this argument is mutually exclusive with providing normalized_anchor and normalized_shape separately.

  • normalized_anchor (bool, optional, default = False) –

    Determines whether the anchor argument should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates.

    Providing this argument is mutually exclusive with providing normalized.

  • normalized_shape (bool, optional, default = False) –

    Determines whether the shape argument should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates.

    Providing this argument is mutually exclusive with providing normalized.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (float or list of float or TensorList of float, optional, default = []) –

    Values for shape or dimensions of the erase region.

    Only the coordinates of the relevant dimensions that are specified by axis_names or axes should be provided.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
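
For instance, the region from Example 1 above could be erased with the legacy object as sketched below (the file root is a placeholder):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)
# One region anchored at (y, x) = (10, 20), 190 pixels high and 200 pixels wide,
# filled with zeros (the default fill_value), as in Example 1.
erase = dali.ops.Erase(anchor=(10, 20), shape=(190, 200), axis_names="HW")

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    images = dali.fn.decoders.image(files, device="cpu")
    pipe.set_outputs(erase(images), labels)

pipe.build()
outputs = pipe.run()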

class nvidia.dali.ops.ExpandDims(*, device='cpu', **kwargs)#

Insert new dimension(s) with extent 1 to the data shape.

The new dimensions are inserted at the positions specified by axes.

If new_axis_names is provided, the new dimension names will be inserted in the data layout, at the positions specified by axes. If new_axis_names is not provided, the output data layout will be empty.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int or TensorList of int) – Indices at which the new dimensions are inserted.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • new_axis_names (layout str, optional, default = ‘’) –

    Names of the new dimensions in the data layout.

    The length of new_axis_names must match the length of axes. If this argument is not provided, the output layout will be empty.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(data, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__data (TensorList) – Data to be expanded
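
A short sketch (the synthetic input fed through external_source and its shape are illustrative):

import numpy as np
import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=1, num_threads=1, device_id=0)
# Insert a trailing dimension of extent 1 named "C", turning HW data into HWC.
expand = dali.ops.ExpandDims(axes=2, new_axis_names="C")

with pipe:
    data = dali.fn.external_source(
        source=lambda: [np.zeros((4, 4), dtype=np.uint8)],
        batch=True, layout="HW")
    pipe.set_outputs(expand(data))

pipe.build()
outputs = pipe.run()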

class nvidia.dali.ops.ExternalSource(source=None, num_outputs=None, *, cycle=None, layout=None, dtype=None, ndim=None, name=None, device='cpu', cuda_stream=None, use_copy_kernel=None, batch=None, parallel=None, no_copy=None, prefetch_queue_depth=None, bytes_per_sample_hint=None, batch_info=None, repeat_last=False, **kwargs)#

ExternalSource is a special operator that can provide data to a DALI pipeline from Python by several methods.

The simplest and preferred way is to specify a source, which can be a callable or iterable.

Note

nvidia.dali.fn.external_source() operator is partially compatible with TensorFlow integration via nvidia.dali.plugin.tf.experimental.DALIDatasetWithInputs(). Please refer to its documentation for details.

Note

To return a batch of copies of the same tensor, use nvidia.dali.types.Constant(), which is more performant.

Parameters:
  • source (callable or iterable) –

    The source of the data.

    The source is polled for data (with a call to source() or next(source)) when the pipeline needs input for the next iteration. Depending on the value of num_outputs, the source can supply one or more data items. A data item can be a whole batch (default) or a single batch entry (when batch==False). If num_outputs is not set, the source is expected to return one item (a batch or a sample). If this value is specified (even if its value is 1), the data is expected to be a tuple or list, where each element corresponds to the respective return value of the external_source.

    The data samples must be in one of the compatible array types:

    • NumPy ndarray (CPU)

    • MXNet ndarray (CPU)

    • PyTorch tensor (CPU or GPU)

    • CuPy array (GPU)

    • objects implementing __cuda_array_interface__

    • DALI Tensor object

    Batch sources must produce entire batches of data. This can be achieved either by adding a new outermost dimension to an array or by returning a list of arrays (in which case they can be of different size, but must have the same rank and element type). A batch source can also produce a DALI TensorList object, which can be an output of another DALI pipeline.

    A per-batch source may accept one positional argument. If it does, it is the index of the current iteration within the epoch, and consecutive calls will be source(0), source(1), and so on. If batch_info is set to True, an instance of nvidia.dali.types.BatchInfo is passed to the source instead of a plain index.

    A per-sample source may accept one positional argument of type nvidia.dali.types.SampleInfo, which contains the index of the sample in the current epoch and in the batch, as well as the current iteration number.

    If the source is a generator function, the function is invoked and treated as an iterable. However, unlike a generator, the function can be used with cycle. In this case, the function will be called again when the generator reaches the end of iteration.

    For GPU inputs, it is the user’s responsibility to modify the provided GPU memory content only in the provided stream. DALI schedules a copy on this stream, and all work is properly queued. If no stream is provided, DALI will use a default stream, with a best-effort approach at correctness. See the cuda_stream argument documentation for more information.

    Note

    After restoring from checkpoint, queries to sources that are single-argument callables (accepting nvidia.dali.types.SampleInfo, nvidia.dali.types.BatchInfo or batch index) will be resumed from the epoch and iteration saved in the checkpoint.

  • num_outputs (int, optional) –

    If specified, denotes the number of TensorLists that are produced by the source function.

    If set, the operator returns a list of DataNode objects, otherwise a single DataNode object is returned.

Keyword Arguments:
  • cycle (string or bool, optional) –

    Specifies if and how to cycle through the source. It can be one of the following values:

    • "no", False or None - don’t cycle; StopIteration is raised when the end of data is reached; this is the default behavior

    • "quiet" or True - the data is repeated indefinitely

    • "raise" - when the end of data is reached, StopIteration is raised, but the iteration is restarted on a subsequent call.

    This flag requires that the source is a collection, for example, an iterable object where iter(source) returns a fresh iterator on each call, or a generator function. In the latter case, the generator function is called again when more data than was yielded by the function is requested.

    Specifying "raise" can be used with DALI iterators to create a notion of epoch.

  • name (str, optional) –

    The name of the data node.

    Used when feeding the data with a call to feed_input and can be omitted if the data is provided by source.

  • layout (layout str or list/tuple thereof, optional) –

    If provided, sets the layout of the data.

    When num_outputs > 1, the layout can be a list that contains a distinct layout for each output. If the list has fewer than num_outputs elements, only the first outputs have the layout set, the rest of the outputs don’t have a layout set.

  • dtype (nvidia.dali.types.DALIDataType or list/tuple thereof, optional) –

    Input data type.

    When num_outputs > 1, the dtype can be a list that contains a distinct value for each output.

    The operator will validate that the fetched data is of the provided type. If the argument is omitted or DALIDataType.NO_TYPE is passed, the operator will infer the type from the provided data.

    This argument will be required starting from DALI 2.0.

  • ndim (int or list/tuple thereof, optional) –

    Number of dimensions in the input data.

    When num_outputs > 1, the ndim can be a list that contains a distinct value for each output.

    The dimensionality of the data provided to the operator will be verified against this value. Number of dimensions can be also inferred from the layout argument if provided.

    If the layout argument is provided, the ndim must match the number of dimensions in the layout.

    Specifying the input dimensionality will be required starting from DALI 2.0

  • cuda_stream (optional, cudaStream_t or an object convertible to cudaStream_t, such as cupy.cuda.Stream or torch.cuda.Stream) –

    The CUDA stream that is used to copy data to the GPU or from a GPU source.

    If this parameter is not set, a best-effort will be taken to maintain correctness. That is, if the data is provided as a tensor/array from a recognized library such as CuPy or PyTorch, the library’s current stream is used. Although this approach works in typical scenarios, with advanced use cases, and code that uses unsupported libraries, you might need to explicitly supply the stream handle.

    This argument has two special values:

    • 0 - Use the default CUDA stream

    • 1 - Use DALI’s internal stream

    If internal stream is used, the call to feed_input will block until the copy to internal buffer is complete, since there’s no way to synchronize with this stream to prevent overwriting the array with new data in another stream.

  • use_copy_kernel (bool, optional) –

    If set to True, DALI will use a CUDA kernel to feed the data instead of cudaMemcpyAsync (default).

    Note

    This is applicable only when copying data to and from GPU memory.

  • blocking (bool, optional) – (Advanced) If True, this operator will block until the data is available (e.g. provided by a call to feed_input). If False, the operator will raise an error if the data is not available.

  • no_copy (bool, optional) –

    Determines whether DALI should copy the buffer when feed_input is called.

    If set to True, DALI passes the user memory directly to the pipeline, instead of copying it. It is the user’s responsibility to keep the buffer alive and unmodified until it is consumed by the pipeline.

    The buffer can be modified or freed again after the output of the relevant iterations has been consumed. Effectively, it happens after Pipeline’s prefetch_queue_depth or cpu_queue_depth * gpu_queue_depth (when they are not equal) iterations following the feed_input call.

    The memory location must match the specified device parameter of the operator. For the CPU, the provided memory can be one contiguous buffer or a list of contiguous Tensors. For the GPU, to avoid extra copy, the provided buffer must be contiguous. If you provide a list of separate Tensors, there will be an additional copy made internally, consuming both memory and bandwidth.

    Automatically set to True when parallel=True

  • batch (bool, optional) –

    If set to True or None, the source is expected to produce an entire batch at once. If set to False, the source is called per-sample.

    Setting parallel to True automatically sets batch to False if it was not provided.

  • batch_info (bool, optional, default = False) – Controls whether a callable source that accepts an argument and returns batches should receive an nvidia.dali.types.BatchInfo instance or just an integer representing the iteration number. If set to False (the default), only the integer is passed. If source is not callable, does not accept arguments, or batch is set to False, setting this flag has no effect.

  • parallel (bool, optional, default = False) –

    If set to True, the corresponding pipeline will start a pool of Python workers to run the callback in parallel. You can specify the number of workers by passing py_num_workers into pipeline’s constructor.

    When parallel is set to True, samples returned by source must be NumPy/MXNet/PyTorch CPU arrays or TensorCPU instances.


    Acceptable sources depend on the value specified for batch parameter.

    If batch is set to False, the source must be:

    • a callable (a function or an object with a __call__ method) that accepts exactly one argument (a SampleInfo instance that represents the index of the requested sample).

    If batch is set to True, the source can be either:

    • a callable that accepts exactly one argument (either a BatchInfo instance or an integer - see batch_info for details),

    • an iterable,

    • a generator function.


    Warning

    Irrespective of batch value, callables should be stateless - they should produce requested sample or batch solely based on the SampleInfo/BatchInfo instance or index in batch, so that they can be run in parallel in a number of workers.

    The source callback must raise StopIteration when the end of the data is reached. Note that, due to prefetching, the callback may be invoked a few iterations past the end of the dataset - make sure that it consistently raises StopIteration in that case.


    Warning

    When the pipeline has conditional execution enabled, additional steps must be taken to prevent the source from being rewritten by AutoGraph. There are two ways to achieve this:

    1. Define the function at global scope (i.e. outside of pipeline_def scope).

    2. If function is a result of another “factory” function, then the factory function must be defined outside pipeline definition function and decorated with @do_not_convert.

    More details can be found in @do_not_convert documentation.


    Note

    Callable source can be run in parallel by multiple workers. For batch=True multiple batches can be prepared in parallel, with batch=False it is possible to parallelize computation within the batch.

    When batch=True, callables performance might especially benefit from increasing prefetch_queue_depth so that a few next batches can be computed in parallel.


    Note

    An iterator or a generator function will be assigned to a single worker that will iterate over it. The main advantage is execution in parallel to the main Python process, but, due to its state, it is not possible to compute more than one batch at a time.

  • repeat_last (bool, optional, default = False) –

    Note

    This is an advanced setting that is usable mainly with Triton Inference Server with decoupled models.

    Normally, external_source consumes its input data and expects new data to be fed in the upcoming iteration. Setting repeat_last=True changes this behavior so that external_source will detect that no new data was fed between the previous pipeline run and the current one, and will re-feed itself with the most recent data.

    Setting repeat_last to True only makes sense in “push” mode, i.e. when the data is actively provided by the user via a call to feed_input. Enabling this option is incompatible with specifying the source, which makes the external_source operate in “pull” mode.

  • prefetch_queue_depth (int, optional, default = 1) – When run in parallel=True mode, specifies the number of batches to be computed in advance and stored in the internal buffer, otherwise parameter is ignored.

  • bytes_per_sample_hint (int, optional, default = None) –

    If specified in parallel=True mode, the value serves as a hint when calculating initial capacity of shared memory slots used by the worker processes to pass parallel external source outputs to the pipeline. The argument is ignored in non-parallel mode.

    Setting a value large enough to accommodate the incoming data can prevent DALI from reallocation of shared memory during the pipeline’s run. Furthermore, providing the hint manually can prevent DALI from overestimating the necessary shared memory capacity.

    The value must be a positive integer. Please note that the samples in shared memory are accompanied by some internal meta-data, so the actual demand for shared memory is slightly higher than just the size of the binary data produced by the external source. The actual meta-data size depends on a number of factors and may, for example, change between Python or DALI releases without notice.

    Please refer to pipeline’s external_source_shm_statistics for inspecting how much shared memory is allocated for data produced by the pipeline’s parallel external sources.
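
For reference, a minimal sketch of the legacy object with a per-sample callable source (the data contents and shape are illustrative):

import numpy as np
import nvidia.dali as dali
from nvidia.dali import types

def my_source(sample_info):
    # Each sample is an 8x8 array filled with its index within the epoch.
    return np.full((8, 8), sample_info.idx_in_epoch, dtype=np.uint8)

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
ext = dali.ops.ExternalSource(source=my_source, batch=False,
                              dtype=types.UINT8, ndim=2)

with pipe:
    pipe.set_outputs(ext())

pipe.build()
outputs = pipe.run()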

__call__(*, source=None, cycle=None, name=None, layout=None, dtype=None, ndim=None, cuda_stream=None, use_copy_kernel=None, batch=None, parallel=None, no_copy=None, prefetch_queue_depth=None, bytes_per_sample_hint=None, batch_info=None, repeat_last=False, **kwargs)#
The parameters and keyword arguments of __call__ are the same as those of the ExternalSource constructor, documented above.

class nvidia.dali.ops.FastResizeCropMirror(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use ResizeCropMirror() instead.

Legacy alias for ResizeCropMirror, with antialiasing disabled by default.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • antialias (bool, optional, default = False) –

    If enabled, it applies an antialiasing filter when scaling down.

    Note

    Nearest neighbor interpolation does not support antialiasing.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • crop (float or list of float or TensorList of float, optional) –

    Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).

    Providing crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.

  • crop_d (float or TensorList of float, optional, default = 0.0) –

    Applies only to volumetric inputs; cropping window depth (in voxels).

    crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).

  • crop_h (float or TensorList of float, optional, default = 0.0) –

    Cropping window height (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).

    The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.

    See rounding argument for more details on how crop_x is converted to an integral value.

  • crop_pos_y (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).

    The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.

    See rounding argument for more details on how crop_y is converted to an integral value.

  • crop_pos_z (float or TensorList of float, optional, default = 0.5) –

    Applies only to volumetric inputs.

    Normalized (0.0 - 1.0) depth position of the front plane of the cropping window. The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.

    See rounding argument for more details on how crop_z is converted to an integral value.

  • crop_w (float or TensorList of float, optional, default = 0.0) –

    Cropping window width (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Must be same as input type or float. If not set, input type is used.

  • interp_type (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –

    Type of interpolation to be used.

    Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

    Note

    Usage of INTERP_TRIANGULAR is now deprecated and it should be replaced by a combination of INTERP_LINEAR with antialias enabled.

  • mag_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.

  • max_size (float or list of float, optional) –

    Limit of the output size.

    When the operator is configured to keep the aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when the resize_shorter argument or the “not_smaller” mode is used, or when some extents are left unspecified.

    This parameter puts a limit on how big the output can become. This value can be specified per-axis or uniformly for all axes.

    Note

    When used with the “not_smaller” mode or the resize_shorter argument, max_size takes precedence and the aspect ratio is kept - for example, when resizing with mode="not_smaller", size=800, max_size=1400, an image of size 1200x600 would be resized to 1400x700.

  • min_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.

  • minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.

  • mirror (int or TensorList of int, optional, default = 0) –

    Mask for flipping

    Supported values:

    • 0 - No flip

    • 1 - Horizontal flip

    • 2 - Vertical flip

    • 4 - Depthwise flip

    • any bitwise combination of the above

  • mode (str, optional, default = ‘default’) –

    Resize mode.

    Here is a list of supported modes:

    • "default" - image is resized to the specified size.
      Missing extents are scaled with the average scale of the provided ones.
    • "stretch" - image is resized to the specified size.
      Missing extents are not scaled at all.
    • "not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size.
      For example, a 1280x720 image, with a desired output size of 640x480, actually produces a 640x360 output.
    • "not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified.
      For example, a 640x480 image with a desired output size of 1920x1080, actually produces a 1920x1440 output.

      This argument is mutually exclusive with resize_longer and resize_shorter

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • resize_longer (float or TensorList of float, optional, default = 0.0) –

    The length of the longer dimension of the resized image.

    This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".

  • resize_shorter (float or TensorList of float, optional, default = 0.0) –

    The length of the shorter dimension of the resized image.

    This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

  • resize_x (float or TensorList of float, optional, default = 0.0) –

    The length of the X dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_y (float or TensorList of float, optional, default = 0.0) –

    The length of the Y dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_z (float or TensorList of float, optional, default = 0.0) –

    The length of the Z dimension of the resized volume.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x and resize_y are left unspecified or 0, then the op will keep the aspect ratio of the original volume. Negative value flips the volume.

  • roi_end (float or list of float or TensorList of float, optional) –

    End of the input region of interest (ROI).

    Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of relative_roi argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • roi_relative (bool, optional, default = False) – If true, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right

  • roi_start (float or list of float or TensorList of float, optional) –

    Origin of the input region of interest (ROI).

    Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of relative_roi argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • rounding (str, optional, default = ‘round’) –

    Determines the rounding function used to convert the starting coordinate of the window to an integral value (see crop_pos_x, crop_pos_y, crop_pos_z).

    Possible values are:

    • "round" - Rounds to the nearest integer value, with halfway cases rounded away from zero.
    • "truncate" - Discards the fractional part of the number (truncates towards zero).

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • size (float or list of float or TensorList of float, optional) –

    The desired output size.

    Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and mode argument.

  • subpixel_scale (bool, optional, default = True) –

    If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.

    Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.

  • temp_buffer_hint (int, optional, default = 0) –

    Initial size in bytes, of a temporary buffer for resampling.

    Note

    This argument is ignored for the CPU variant.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'FHWC', 'CHW', 'FCHW', 'CFHW', 'DHWC', 'FDHWC', 'CDHW', 'FCDHW', 'CFDHW')) – Input to the operator.

class nvidia.dali.ops.FileReader(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use readers.File() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.file().

class nvidia.dali.ops.Flip(*, device='cpu', **kwargs)#

Flips the images in selected dimensions (horizontal, vertical, and depthwise).

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • depthwise (int or TensorList of int, optional, default = 0) – Flip the depthwise dimension.

  • horizontal (int or TensorList of int, optional, default = 1) – Flip the horizontal dimension.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • vertical (int or TensorList of int, optional, default = 0) – Flip the vertical dimension.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('FDHWC', 'FHWC', 'DHWC', 'HWC', 'FCDHW', 'FCHW', 'CDHW', 'CHW')) – Input to the operator.

class nvidia.dali.ops.Full(*, device='cpu', **kwargs)#

Returns new data of given shape and type, filled with a fill value.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

__call__(fill_value, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__fill_value (TensorList) – The fill value.
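
A short sketch of providing the fill value as an input (the scalar value and output shape are illustrative):

import numpy as np
import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=1, num_threads=1, device_id=0)
# Produce a 2x3 array filled with the provided per-sample fill value.
full = dali.ops.Full(shape=[2, 3])

with pipe:
    fill_value = dali.fn.external_source(
        source=lambda: [np.array(42, dtype=np.int32)], batch=True)
    pipe.set_outputs(full(fill_value))

pipe.build()
outputs = pipe.run()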

class nvidia.dali.ops.FullLike(*, device='cpu', **kwargs)#

Returns new data with the same shape and type as the input data, filled with a fill_value.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(data_like, fill_value, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __data_like (TensorList) – The input data value to copy the shape and type from.

  • __fill_value (TensorList) – The fill value.

class nvidia.dali.ops.GaussianBlur(*, device='cpu', **kwargs)#

Applies a Gaussian Blur to the input.

Gaussian blur is calculated by applying a convolution with a Gaussian kernel, which can be parameterized with window_size and sigma. If only the sigma is specified, the radius of the Gaussian kernel defaults to ceil(3 * sigma), so the kernel window size is 2 * ceil(3 * sigma) + 1.

If only the window size is provided, the sigma is calculated by using the following formula:

radius = (window_size - 1) / 2
sigma = (radius - 1) * 0.3 + 0.8

The sigma and kernel window size can be specified as one value for all data axes or a value per data axis.

When specifying the sigma or window size per axis, the axes are provided in the same order as in the layout, from outermost to innermost.

Note

The channel C and frame F dimensions are not considered data axes. If channels are present, only channel-first or channel-last inputs are supported.

For example, with HWC input, you can provide sigma=1.0 or sigma=(1.0, 2.0) because there are two data axes, H and W.

The same input can be provided as per-sample tensors.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Supported type: FLOAT. If not set, the input type is used.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • sigma (float or list of float or TensorList of float, optional, default = [0.0]) –

    Sigma value for the Gaussian Kernel.

    Supports per-frame inputs.

  • window_size (int or list of int or TensorList of int, optional, default = [0]) –

    The diameter of the kernel.

    Supports per-frame inputs.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
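
For example, the sketch below applies a per-axis sigma to HWC images (the file root is a placeholder; the window size for each axis then defaults to 2 * ceil(3 * sigma) + 1):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)
# sigma = 1.0 for the H axis and 2.0 for the W axis of HWC images.
blur = dali.ops.GaussianBlur(sigma=(1.0, 2.0))

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    images = dali.fn.decoders.image(files, device="cpu")
    pipe.set_outputs(blur(images), labels)

pipe.build()
outputs = pipe.run()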

class nvidia.dali.ops.GetProperty(*, device='cpu', **kwargs)#

Returns a property of the tensor passed as an input.

The type of the output will depend on the key of the requested property.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • key (str) –

    Specifies which property is requested.

    The following properties are supported:

    • "source_info": Returned type: byte-array.

      String-like byte array, which contains information about the origin of the sample. For example, fn.get_property() called on a tensor loaded via fn.readers.file() returns the full path of the file from which the tensor comes.

    • "layout": Returned type: byte-array

      Data layout in the given Tensor.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
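
For illustration, the sketch below returns the source path of each read sample alongside its label (the file root is a placeholder):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=1, device_id=0)
# "source_info" yields a byte-array with the origin (file path) of each sample.
get_source = dali.ops.GetProperty(key="source_info")

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    pipe.set_outputs(get_source(files), labels)

pipe.build()
outputs = pipe.run()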

class nvidia.dali.ops.GridMask(*, device='cpu', **kwargs)#

Performs the gridmask augmentation (https://arxiv.org/abs/2001.04086).

Zeroes out pixels of an image in a grid-like fashion. The grid consists of squares repeating in x and y directions, with the same spacing in both directions. Can be rotated around the origin.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • angle (float or TensorList of float, optional, default = 0.0) – Angle, in radians, by which the grid is rotated.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • ratio (float or TensorList of float, optional, default = 0.5) – The ratio between black square width and tile width.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shift_x (float or TensorList of float, optional, default = 0.0) – The x component of the translation vector, applied after rotation.

  • shift_y (float or TensorList of float, optional, default = 0.0) – The y component of the translation vector, applied after rotation.

  • tile (int or TensorList of int, optional, default = 100) – The length of a single tile, which is equal to width of black squares plus the spacing between them.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.Hsv(*, device='cpu', **kwargs)#

Adjusts hue, saturation and value (brightness) of the images.

To change the hue, the saturation, and/or the value of the image, pass the corresponding coefficients. Remember that the hue is an additive delta argument, while for saturation and value, the arguments are multiplicative.

This operator accepts images in the RGB color space.

For performance reasons, the operation is approximated by a linear transform in the RGB space. The color vector is projected along the neutral (gray) axis, rotated based on the hue delta, scaled based on the value and saturation multipliers, and restored to the original color space.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –

    The output data type.

    If a value is not set, the input type is used.

  • hue (float or TensorList of float, optional, default = 0.0) –

    Hue delta, in degrees.

    The hue component can be interpreted as an angle and values outside 0-360 range wrap around, as they would in case of rotation.

    Supports per-frame inputs.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • saturation (float or TensorList of float, optional, default = 1.0) –

    The saturation multiplier.

    Supports per-frame inputs.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • value (float or TensorList of float, optional, default = 1.0) –

    The value multiplier.

    Supports per-frame inputs.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'FHWC', 'DHWC')) – Input to the operator.
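

For instance, the operator object can carry the color coefficients as static arguments (a sketch; the coefficient values and the file_root path are placeholders):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 2, device_id = 0)
hsv = dali.ops.Hsv(device = "gpu", hue = 30.0, saturation = 1.2, value = 0.9)

with pipe:
    files, labels = dali.fn.readers.file(file_root = "./my_file_root")  # placeholder path
    images = dali.fn.decoders.image(files, device = "mixed")  # RGB, HWC layout
    images = hsv(images)  # +30 degrees of hue, saturation and value scaled
    pipe.set_outputs(images, labels)

pipe.build()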

class nvidia.dali.ops.Hue(*, device='cpu', **kwargs)#

Changes the hue level of the image.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    If not set, the input type is used.

  • hue (float or TensorList of float, optional, default = 0.0) –

    The hue change in degrees.

    Supports per-frame inputs.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the input and the output image.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'FHWC', 'DHWC')) – Input to the operator.

class nvidia.dali.ops.ImageDecoder(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use decoders.Image() instead.

In DALI 1.0 all decoders were moved into a dedicated decoders submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for decoders.image().

class nvidia.dali.ops.ImageDecoderCrop(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use decoders.ImageCrop() instead.

In DALI 1.0 all decoders were moved into a dedicated decoders submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for decoders.image_crop().

class nvidia.dali.ops.ImageDecoderRandomCrop(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use decoders.ImageRandomCrop() instead.

In DALI 1.0 all decoders were moved into a dedicated decoders submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for decoders.image_random_crop().

class nvidia.dali.ops.ImageDecoderSlice(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use decoders.ImageSlice() instead.

In DALI 1.0 all decoders were moved into a dedicated decoders submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for decoders.image_slice().

class nvidia.dali.ops.Jitter(*, device='cpu', **kwargs)#

Performs a random Jitter augmentation.

The output images are produced by moving each pixel by a random amount in the x and y dimensions, bounded by half of the nDegree parameter.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • fill_value (float, optional, default = 0.0) – Color value that is used for padding.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

  • mask (int or TensorList of int, optional, default = 1) –

    Determines whether to apply this augmentation to the input image.

    Here are the values:

    • 0: Do not apply this transformation.

    • 1: Apply this transformation.

  • nDegree (int, optional, default = 2) – Each pixel is moved by a random amount in the [-nDegree/2, nDegree/2] range.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC')) – Input to the operator.

class nvidia.dali.ops.JpegCompressionDistortion(*, device='cpu', **kwargs)#

Introduces JPEG compression artifacts to RGB images.

JPEG is a lossy compression format which exploits characteristics of natural images and human visual system to achieve high compression ratios. The information loss originates from sampling the color information at a lower spatial resolution than the brightness and from representing high frequency components of the image with a lower effective bit depth. The conversion to frequency domain and quantization is applied independently to 8x8 pixel blocks, which introduces additional artifacts at block boundaries.

This operation produces images by subjecting the input to a transformation that mimics JPEG compression with given quality factor followed by decompression.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • quality (int or TensorList of int, optional, default = 50) –

    JPEG compression quality from 1 (lowest quality) to 100 (highest quality).

    Any values outside the range 1-100 will be clamped.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'FHWC')) – Input to the operator.
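
A short sketch of applying the distortion as an operator object with a fixed quality factor (the value 25 and the file_root path are placeholders):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 2, device_id = 0)
distort = dali.ops.JpegCompressionDistortion(device = "gpu", quality = 25)

with pipe:
    files, labels = dali.fn.readers.file(file_root = "./my_file_root")  # placeholder path
    images = dali.fn.decoders.image(files, device = "mixed")
    images = distort(images)  # mimic JPEG encode/decode artifacts at quality factor 25
    pipe.set_outputs(images, labels)

pipe.build()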

class nvidia.dali.ops.Laplacian(*, device='cpu', **kwargs)#

Computes the Laplacian of an input.

The Laplacian is calculated as the sum of second order partial derivatives with respect to each spatial dimension. Each partial derivative is approximated with a separable convolution, that uses a derivative window in the direction of the partial derivative and smoothing windows in the remaining axes.

By default, each partial derivative is approximated by convolving along all spatial axes: the axis in partial derivative direction uses derivative window of window_size and the remaining axes are convolved with smoothing windows of the same size. If smoothing_size is specified, the smoothing windows applied to a given axis can have different size than the derivative window. Specifying smoothing_size = 1 implies no smoothing in axes perpendicular to the derivative direction.

Both window_size and smoothing_size can be specified as a single value or per axis. For example, for volumetric input, if window_size=[dz, dy, dx] and smoothing_size=[sz, sy, sx] are specified, the following windows will be used:

  • for partial derivative in z direction: derivative windows of size dz along z axis, and smoothing windows of size sy and sx along y and x respectively.

  • for partial derivative in y direction: derivative windows of size dy along y axis, and smoothing windows of size sz and sx along z and x respectively.

  • for partial derivative in x direction: derivative windows of size dx along x axis, and smoothing windows of size sz and sy along z and y respectively.

Window sizes and smoothing sizes must be odd. The size of a derivative window must be at least 3. Smoothing window can be of size 1, which implies no smoothing along corresponding axis.

To normalize partial derivatives, normalized_kernel=True can be used. Each partial derivative is scaled by 2^(-s + n + 2), where s is the sum of the window sizes used to calculate a given partial derivative (including the smoothing windows) and n is the number of data dimensions/axes. Alternatively, you can specify scale argument to customize scaling factors. Scale can be either a single value or n values, one for every partial derivative.

Operator uses 32-bit floats as an intermediate type.

Note

The channel C and frame F dimensions are not considered data axes. If channels are present, only channel-first or channel-last inputs are supported.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Supported type: FLOAT. If not set, the input type is used.

  • normalized_kernel (bool, optional, default = False) – If set to True, automatically scales partial derivatives kernels. Must be False if scale is specified.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • scale (float or list of float or TensorList of float, optional, default = [1.0]) –

    Factors to manually scale partial derivatives.

    Supports per-frame inputs.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • smoothing_size (int or list of int or TensorList of int, optional) –

    Size of smoothing window used in convolutions.

    Smoothing size must be odd and between 1 and 23.

    Supports per-frame inputs.

  • window_size (int or list of int or TensorList of int, optional, default = [3]) –

    Size of derivative window used in convolutions.

    Window size must be odd and between 3 and 23.

    Supports per-frame inputs.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
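
A sketch showing how the derivative and smoothing window sizes might be set when instantiating the operator object (the parameter values are illustrative):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 2, device_id = 0)
# 5-tap derivative windows, no smoothing perpendicular to the derivative direction
laplacian = dali.ops.Laplacian(window_size = 5, smoothing_size = 1,
                               dtype = dali.types.FLOAT)

with pipe:
    files, labels = dali.fn.readers.file(file_root = "./my_file_root")  # placeholder path
    images = dali.fn.decoders.image(files)  # CPU-decoded HWC images
    edges = laplacian(images)
    pipe.set_outputs(edges, labels)

pipe.build()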

class nvidia.dali.ops.LookupTable(*, device='cpu', **kwargs)#

Maps the input to output by using a lookup table that is specified by keys and values, and a default_value for unspecified keys.

For example when keys and values are used to define the lookup table in the following way:

keys[] =   {0,     2,   3,   4,   5,    3}
values[] = {0.2, 0.4, 0.5, 0.6, 0.7, 0.10}
default_value = 0.99

0 <= i <= max(keys)
lut[i] = values[j]       if i in keys, where j is the index of the last occurrence of i in keys
lut[i] = default_value   otherwise

the operator creates the following table:

lut[] = {0.2, 0.99, 0.4, 0.10, 0.6, 0.7}  // only last occurrence of a key is considered

and produces the output according to this formula:

Output[i] = lut[Input[i]]   if 0 <= Input[i] < len(lut)
Output[i] = default_value   otherwise

Here is a practical example, considering the table defined above:

Input[] =  {1,      4,    1,   0,  100,   2,     3,   4}
Output[] = {0.99, 0.6, 0.99, 0.2, 0.99, 0.4,  0.10, 0.6}

Note

Only integer types can be used as inputs for this operator.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • default_value (float, optional, default = 0.0) – Default output value for keys that are not present in the table.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type.

  • keys (int or list of int, optional, default = []) –

    A list of input values (keys) in the lookup table.

    The lengths of the keys and values arguments must match. The values in keys should be in the [0, 65535] range.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • values (float or list of float, optional, default = []) –

    A list of mapped output values for each keys entry.

    The length of the keys and the values argument must match.

  • output_dtype (nvidia.dali.types.DALIDataType) –

    Warning

    The argument output_dtype is a deprecated alias for dtype. Use dtype instead.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
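
Reusing the keys and values from the example above, the operator object could be instantiated as follows (a sketch; the integer labels produced by the file reader are assumed to be either covered by the table or mapped to the default value):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 2, device_id = 0)
lookup = dali.ops.LookupTable(keys = [0, 2, 3, 4, 5, 3],
                              values = [0.2, 0.4, 0.5, 0.6, 0.7, 0.10],
                              default_value = 0.99)

with pipe:
    files, labels = dali.fn.readers.file(file_root = "./my_file_root")  # placeholder path
    mapped = lookup(labels)  # integer labels mapped through the table, 0.99 otherwise
    pipe.set_outputs(mapped)

pipe.build()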

class nvidia.dali.ops.MFCC(*, device='cpu', **kwargs)#

Computes Mel Frequency Cepstral Coefficients (MFCC) from a mel spectrogram.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axis (int, optional, default = 0) –

    Axis over which the transform will be applied.

    If a value is not provided, the outer-most dimension will be used.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dct_type (int, optional, default = 2) –

    Discrete Cosine Transform type.

    The supported types are 1, 2, 3, 4. The formulas that are used to calculate the DCT are equivalent to those described in https://en.wikipedia.org/wiki/Discrete_cosine_transform (the numbers correspond to types listed in https://en.wikipedia.org/wiki/Discrete_cosine_transform#Formal_definition).

  • lifter (float, optional, default = 0.0) –

    Cepstral filtering coefficient, which is also known as the liftering coefficient.

    If the lifter coefficient is greater than 0, the MFCCs will be scaled based on the following formula:

    MFCC[i] = MFCC[i] * (1 + sin(pi * (i + 1) / lifter)) * (lifter / 2)
    

  • n_mfcc (int, optional, default = 20) – Number of MFCC coefficients.

  • normalize (bool, optional, default = False) –

    If set to True, the DCT uses an ortho-normal basis.

    Note

    Normalization is not supported when dct_type=1.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.MXNetReader(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use readers.MXNet() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.mxnet().

class nvidia.dali.ops.MelFilterBank(*, device='cpu', **kwargs)#

Converts a spectrogram to a mel spectrogram by applying a bank of triangular filters.

The frequency (‘f’) dimension is selected from the input layout. In case of no layout, “f”, “ft”, or “*ft” is assumed, depending on the number of dimensions.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • freq_high (float, optional, default = 0.0) –

    The maximum frequency.

    If this value is not provided, sample_rate/2 is used.

  • freq_low (float, optional, default = 0.0) – The minimum frequency.

  • mel_formula (str, optional, default = ‘slaney’) –

    Determines the formula that will be used to convert frequencies from hertz to mel and from mel to hertz.

    The mel scale is a perceptual scale of pitches, so there is no single formula.

    The supported values are:

    • slaney, which follows Slaney’s MATLAB Auditory Modelling Work behavior.
      This formula is linear under 1 KHz and logarithmic above this value. The implementation is consistent with Librosa’s default implementation.
    • htk, which follows O’Shaughnessy’s book formula, m = 2595 * log10(1 + (f/700)).
      This value is consistent with the implementation of the Hidden Markov Toolkit (HTK).

  • nfilter (int, optional, default = 128) – Number of mel filters.

  • normalize (bool, optional, default = True) –

    Determines whether to normalize the triangular filter weights by the width of their frequency bands.

    • If set to True, the integral of the filter function is 1.

    • If set to False, the peak of the filter function will be 1.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • sample_rate (float, optional, default = 44100.0) – Sampling rate of the audio signal.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
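
A sketch of a small audio pipeline chaining a spectrogram, the mel filter bank, and MFCC; the use of fn.decoders.audio and fn.spectrogram with these particular parameters, and the file_root path, are assumptions made for illustration:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 2, num_threads = 2, device_id = 0)
mel = dali.ops.MelFilterBank(nfilter = 80, sample_rate = 16000.0)
mfcc = dali.ops.MFCC(n_mfcc = 13)

with pipe:
    encoded, _ = dali.fn.readers.file(file_root = "./my_audio_root")  # placeholder path
    audio, rate = dali.fn.decoders.audio(encoded, dtype = dali.types.FLOAT, downmix = True)
    spec = dali.fn.spectrogram(audio, nfft = 512, window_length = 400, window_step = 160)
    coeffs = mfcc(mel(spec))  # spectrogram -> mel spectrogram -> MFCC
    pipe.set_outputs(coeffs)

pipe.build()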

class nvidia.dali.ops.MultiPaste(*, device='cpu', **kwargs)#

Performs multiple pastes from image batch to each of the outputs.

If the in_ids is specified, the operator expects exactly one input batch. In that case, for each output sample, in_ids describes which samples from the input batch should be pasted to the corresponding sample in the output batch.

If the in_ids argument is omitted, the operator accepts multiple inputs. In that case, the i-th sample from each input batch will be pasted to the i-th sample of the output batch. All the input batches must have the same type and device placement.

If the input shapes are uniform and no explicit output_size is provided, the operator uses the common input shape as the output canvas size. Otherwise, output_size must be specified.

This operator can also change the type of data.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) – Output data type. If not set, the input type is used.

  • in_anchors (int or TensorList of int, optional) –

    Absolute coordinates of the top-left corner of the source region.

    The anchors are represented as 2D tensors where the first dimension is equal to the number of pasted regions and the second one is 2 (for the H and W extents).

    If neither in_anchors nor in_anchors_rel are provided, all anchors are zero.

    Supports per-frame inputs.

  • in_anchors_rel (float or TensorList of float, optional) –

    Relative coordinates of the top-left corner of the source region.

    The argument works like in_anchors, but the values should be floats in [0, 1] range, describing the anchor placement relative to the input sample shape.

    Supports per-frame inputs.

  • in_ids (int or list of int or TensorList of int, optional) –

    Indices of the inputs to paste data from.

    If specified, the operator accepts exactly one batch as an input.

  • out_anchors (int or TensorList of int, optional) –

    Absolute coordinates of the top-left corner of the pasted region in the output canvas.

    The anchors are represented as 2D tensors where the first dimension is equal to the number of pasted regions and the second one is 2 (for the H and W extents).

    If neither out_anchors nor out_anchors_rel are provided, all anchors are zero, making all the pasted regions start at the top-left corner of the output canvas.

    Supports per-frame inputs.

  • out_anchors_rel (float or TensorList of float, optional) –

    Relative coordinates of the top-left corner of the pasted region in the output canvas.

    Works like out_anchors argument, but the values should be floats in [0, 1] range, describing the top-left corner of the pasted region relative to the output canvas size.

    Supports per-frame inputs.

  • output_size (int or list of int or TensorList of int, optional) –

    A tuple (H, W) describing the output shape (i.e. the size of the canvas for the output pastes).

    Can be omitted if the operator is run with inputs of uniform shape. In that case, the same shape is used as the canvas size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shapes (int or TensorList of int, optional) –

    Shape of the paste regions.

    The shapes are represented as 2D tensors where the first dimension is equal to the number of pasted regions and the second one is 2 (for the H and W extents).

    If neither shapes nor shapes_rel are provided, the shape is calculated so that the region spans from the input anchor until the end of the input image.

    Supports per-frame inputs.

  • shapes_rel (float or TensorList of float, optional) –

    Relative shape of the paste regions.

    Works like the shapes argument, but the values should be floats in the [0, 1] range, describing the paste region shape relative to the input shape.

    Supports per-frame inputs.

__call__(*inputs, **kwargs)#

See nvidia.dali.ops.MultiPaste() class for complete information.

class nvidia.dali.ops.NemoAsrReader(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use readers.NemoAsr() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.nemo_asr().

class nvidia.dali.ops.NonsilentRegion(*, device='cpu', **kwargs)#

Performs leading and trailing silence detection in an audio buffer.

The operator returns the beginning and length of the non-silent region by comparing the short-term power calculated for window_length of the signal with a silence cut-off threshold. The signal is considered to be silent when the short_term_power_db is less than the cutoff_db, where:

short_term_power_db = 10 * log10( short_term_power / reference_power )

Unless specified otherwise, reference_power is the maximum power of the signal.

Inputs and outputs:

  • Input 0 - 1D audio buffer.

  • Output 0 - Index of the first sample in the nonsilent region.

  • Output 1 - Length of nonsilent region.

Note

If Outputs[1] == 0, the value in Outputs[0] is undefined.

Warning

At this moment, the ‘gpu’ backend of this operator is implemented in terms of the ‘cpu’ implementation. This results in a device-to-host copy of the inputs and a host-to-device copy of the outputs. While using the ‘gpu’ implementation of this operator doesn’t add any performance benefit on its own, using it might make sense in order to enable moving preceding operations in the pipeline to the GPU.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • cutoff_db (float, optional, default = -60.0) – The threshold, in dB, below which the signal is considered silent.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reference_power (float, optional, default = 0.0) –

    The reference power that is used to convert the signal to dB.

    If a value is not provided, the maximum power of the signal will be used as the reference.

  • reset_interval (int, optional, default = 8192) –

    The number of samples after which the moving mean average is recalculated to avoid loss of precision.

    If reset_interval == -1, or the input type allows exact calculation, the average will not be reset. The default value can be used for most use cases.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • window_length (int, optional, default = 2048) – Size of the sliding window used to calculate the short-term power of the signal.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
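
A sketch of the typical trimming pattern: the two outputs are fed to a slice operation to cut away the silent margins (the use of fn.decoders.audio and fn.slice here, and the file_root path, are assumptions for illustration):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 2, num_threads = 2, device_id = 0)
nonsilent = dali.ops.NonsilentRegion(cutoff_db = -60)

with pipe:
    encoded, _ = dali.fn.readers.file(file_root = "./my_audio_root")  # placeholder path
    audio, rate = dali.fn.decoders.audio(encoded, dtype = dali.types.FLOAT, downmix = True)
    begin, length = nonsilent(audio)                           # start index and length
    trimmed = dali.fn.slice(audio, begin, length, axes = [0])  # drop the silent margins
    pipe.set_outputs(trimmed)

pipe.build()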

class nvidia.dali.ops.NormalDistribution(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use random.Normal() instead.

Generates random numbers following a normal distribution.

The shape of the generated data can be either specified explicitly with a shape argument, or chosen to match the shape of the __shape_like input, if provided. If neither is present, a single value per sample is generated.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Note

    The generated numbers are converted to the output data type, rounding and clamping if necessary.

  • mean (float or TensorList of float, optional, default = 0.0) – Mean of the distribution.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

  • stddev (float or TensorList of float, optional, default = 1.0) – Standard deviation of the distribution.

__call__(__shape_like=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__shape_like (TensorList, optional) – Shape of this input will be used to infer the shape of the output, if provided.

class nvidia.dali.ops.Normalize(*, device='cpu', **kwargs)#

Normalizes the input by removing the mean and dividing by the standard deviation.

The mean and standard deviation can be calculated internally for the specified subset of axes or can be externally provided as the mean and stddev arguments.

The normalization is done following the formula:

out = scale * (in - mean) / stddev + shift

The formula assumes that out and in are equally shaped tensors, but mean and stddev might be either tensors of same shape, scalars, or a mix of these.

Note

The expression follows the numpy broadcasting rules.

The non-scalar mean and stddev must have an extent of 1 in a given axis if that axis is reduced, or match the corresponding extent of the input. A dimension is considered reduced if it is listed in axes or axis_names. If neither the axes nor the axis_names argument is present, the set of reduced axes is inferred by comparing the input shape to the shape of the mean/stddev arguments, but the set of reduced axes must be the same for all tensors in the batch.

Here are some examples of valid argument combinations:

  1. Per-sample normalization of dimensions 0 and 2:

    axes = 0,2                                        # optional
    input.shape = [ [480, 640, 3], [1080, 1920, 4] ]
    batch = False
    mean.shape =  [ [1, 640, 1], [1, 1920, 1] ]
    stddev = (not supplied)
    

With these shapes, batch normalization is not possible, because the non-reduced dimension has a different extent across samples.

  2. Batch normalization of dimensions 0 and 1:

    axes = 0,1                                        # optional
    input.shape = [ [480, 640, 3], [1080, 1920, 3] ]
    batch = True
    mean = (scalar)
    stddev.shape =  [ [1, 1, 3] ]
    

For color images, this example normalizes the 3 color channels separately, but across all samples in the batch.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int, optional, default = []) –

    Indices of dimensions along which the input is normalized.

    By default, all axes are used, and the axes can also be specified by name. See axis_names for more information.

  • axis_names (layout str, optional, default = ‘’) –

    Names of the axes in the input.

    Axis indices are taken from the input layout, and this argument cannot be used with axes.

  • batch (bool, optional, default = False) –

    If set to True, the mean and standard deviation are calculated across tensors in the batch.

    This argument also requires that the input sample shapes in the non-reduced axes match.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • ddof (int, optional, default = 0) –

    Delta Degrees of Freedom for Bessel’s correction.

    The variance is estimated by using the following formula:

    sum((Xi - mean)**2) / (N - ddof)
    

    This argument is ignored when an externally supplied standard deviation is used.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –

    Output data type.

    When using integral types, use shift and scale to improve the usage of the output type’s dynamic range. If dtype is an integral type, out of range values are clamped, and non-integer values are rounded to nearest integer.

  • epsilon (float, optional, default = 0.0) – A value that is added to the variance to avoid division by small numbers.

  • mean (float or TensorList of float, optional) –

    Mean value to be subtracted from the data.

    The value can be a scalar or a batch of tensors with the same dimensionality as the input. The extent in each dimension must match the value of the input or be equal to 1. If the extent is 1, the value will be broadcast in this dimension. If the value is not specified, the mean is calculated from the input. A non-scalar mean cannot be used when batch argument is set to True.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • scale (float, optional, default = 1.0) –

    The scaling factor applied to the output.

    This argument is useful for integral output types.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shift (float, optional, default = 0.0) –

    The value to which the mean will map in the output.

    This argument is useful for unsigned output types.

  • stddev (float or TensorList of float, optional) –

    Standard deviation value to scale the data.

    See mean argument for more information about shape constraints. If a value is not specified, the standard deviation is calculated from the input. A non-scalar stddev cannot be used when batch argument is set to True.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
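
A sketch of per-sample normalization over the spatial axes of HWC images, using shift and scale to map the result into the uint8 range (the particular shift/scale values and the file_root path are illustrative):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 2, device_id = 0)
# normalize each sample over height and width (channels kept separate),
# then map the result into the uint8 range
normalize = dali.ops.Normalize(axes = (0, 1), scale = 64.0, shift = 128.0,
                               dtype = dali.types.UINT8)

with pipe:
    files, labels = dali.fn.readers.file(file_root = "./my_file_root")  # placeholder path
    images = dali.fn.decoders.image(files)
    images = normalize(images)
    pipe.set_outputs(images, labels)

pipe.build()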

class nvidia.dali.ops.NumbaFunction(*, device='cpu', **kwargs)#

Invokes a njit compiled Numba function.

The run function should be a Python function that can be compiled in Numba nopython mode. A function taking a single input and producing a single output should have the following signature:

def run_fn(out0, in0)

where out0 and in0 are numpy array views of the input and output tensors. If the operator is configured to run in batch mode, then the first dimension of the arrays is the sample index.

Note that the function can take at most 6 inputs and 6 outputs.

Additionally, an optional setup function can be provided to calculate the shape of the output, so that DALI can allocate memory for it. The setup function has the following signature:

def setup_fn(outs, ins)

The setup function is invoked once for the whole batch. The first dimension of outs, ins is the number of outputs/inputs, respectively. The second dimension is the sample index. For example, the first sample on the second output can be accessed by outs[1][0].

If no setup function is provided, the output shape and data type will be the same as the input's.

Note

This operator is experimental and its API might change without notice.

Warning

When the pipeline has conditional execution enabled, additional steps must be taken to prevent the run_fn and setup_fn functions from being rewritten by AutoGraph. There are two ways to achieve this:

  1. Define the functions at global scope (i.e. outside of pipeline_def scope).

  2. If functions are a result of another “factory” function, then the factory function must be defined outside pipeline definition function and decorated with @do_not_convert.

More details can be found in @do_not_convert documentation.

Example 1:

The following example shows a simple setup function which permutes the order of dimensions in the shape.

def setup_change_out_shape(outs, ins):
    out0 = outs[0]
    in0 = ins[0]
    perm = [1, 0, 2]
    for sample_idx in range(len(out0)):
        for d in range(len(perm)):
            out0[sample_idx][d] = in0[sample_idx][perm[d]]

Since the setup function is running for the whole batch, we need to iterate and permute each sample’s shape individually. For shapes = [(10, 20, 30), (20, 10, 30)] it will produce output with shapes = [(20, 10, 30), (10, 20, 30)].

Let's also provide a run function:

def run_fn(out0, in0):
    for i in range(in0.shape[0]):
        for j in range(in0.shape[1]):
            out0[j, i] = in0[i, j]

The run function can work per-sample or per-batch, depending on the batch_processing argument.

A run function working per-batch may look like this:

def run_fn(out0_samples, in0_samples):
    for out0, in0 in zip(out0_samples, in0_samples):
        for i in range(in0.shape[0]):
            for j in range(in0.shape[1]):
                out0[j, i] = in0[i, j]

A run function working per-sample may look like this:

def run_fn(out0, in0):
    for i in range(in0.shape[0]):
        for j in range(in0.shape[1]):
            out0[j, i] = in0[i, j]

This operator allows sequence inputs and supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • in_types (DALIDataType or list of DALIDataType) – Types of inputs.

  • ins_ndim (int or list of int) – Number of dimensions that the input shapes should have.

  • out_types (DALIDataType or list of DALIDataType) – Types of outputs.

  • outs_ndim (int or list of int) – Number of dimensions that the output shapes should have.

  • run_fn (object) – Function to be invoked. This function must work in Numba nopython mode.

  • batch_processing (bool, optional, default = False) –

    Determines whether the function is invoked once per batch or separately for each sample in the batch.

    When batch_processing is set to True, the function processes the whole batch. It is necessary if the function has to perform cross-sample operations and may be beneficial if significant part of the work can be reused. For other use cases, specifying False and using per-sample processing function allows the operator to process samples in parallel.

  • blocks (int or list of int, optional) – 3-item list specifying the number of blocks per grid used to execute a CUDA kernel.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • setup_fn (object, optional) – Setup function setting shapes for outputs. This function is invoked once per batch. Also this function must work in Numba nopython mode.

  • threads_per_block (int or list of int, optional) – 3-item list specifying the number of threads per block used to execute a CUDA kernel.

__call__(*inputs, **kwargs)#

See nvidia.dali.ops.NumbaFunction() class for complete information.
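
Putting the pieces together, a sketch of how the per-sample run_fn and setup_change_out_shape functions above might be passed to the operator object; the types and dimensionalities are assumptions for illustration, and Numba must be installed:

import nvidia.dali as dali
from nvidia.dali import types

# run_fn and setup_change_out_shape are the globally defined functions from the
# examples above (global scope keeps them safe from AutoGraph rewriting)
numba_transpose = dali.ops.NumbaFunction(
    run_fn = run_fn,
    setup_fn = setup_change_out_shape,
    in_types = [types.UINT8], out_types = [types.UINT8],
    ins_ndim = [3], outs_ndim = [3],
    batch_processing = False,
    device = "cpu")

# inside define_graph() or a pipeline context:
#     out = numba_transpose(images)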

class nvidia.dali.ops.NumpyReader(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use readers.Numpy() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.numpy().

class nvidia.dali.ops.OneHot(*, device='cpu', **kwargs)#

Produces a one-hot encoding of the input.

Adds a new axis or converts scalar input into an axis of num_classes elements.

For given input coordinate (x0, x1, ..., xn), and axis = k, the output sample is specified as:

cls = input[x0, x1, ..., xn]
output[x0, x1, ..., xk-1, i, xk, ..., xn] = on_value if i == cls else off_value

for all i in range [0, num_classes).

For scalars, the output is set to on_value at the index taken from input and off_value elsewhere:

output[i] = on_value if i == input else off_value

For backward compatibility, any input in which all tensors have only one element (regardless of the number of dimensions) is considered scalar. Legacy interpretation of tensors as scalars is not supported if axis argument is specified.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axis (int, optional, default = -1) – Dimension to place the one-hot encoding axis of num_classes size. By default it’s appended as the last dimension for non-scalar inputs. For scalar inputs, it becomes the only dimension.

  • axis_name (str, optional) – Single character that will be used as a name for the newly added dimension in the output layout. If no character is provided, the output layout will be empty.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type.

  • num_classes (int, optional, default = 0) – Number of all classes in the data.

  • off_value (float, optional, default = 0.0) –

    Value that will be used to fill the output to indicate the lack of given class in the corresponding input coordinate.

    This value will be cast to the dtype type.

  • on_value (float, optional, default = 1.0) –

    Value that will be used to fill the output to indicate given class in the corresponding input coordinate.

    This value will be cast to the dtype type.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
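
A sketch turning the integer labels produced by a file reader into one-hot vectors (the num_classes value and the file_root path are illustrative):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 1, device_id = 0)
one_hot = dali.ops.OneHot(num_classes = 10)

with pipe:
    files, labels = dali.fn.readers.file(file_root = "./my_file_root")  # placeholder path
    labels = one_hot(labels)  # scalar labels -> float vectors of length 10
    pipe.set_outputs(labels)

pipe.build()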

class nvidia.dali.ops.Ones(*, device='cpu', **kwargs)#

Returns new data of given shape and type, filled with ones.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.INT32) – Output data type.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.

class nvidia.dali.ops.OnesLike(*, device='cpu', **kwargs)#

Returns new data with the same shape and type as the input array, filled with ones.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.INT32) – Overrides the output data type.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__data_like, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__data_like (TensorList) – The input data value to copy the shape and type from.

class nvidia.dali.ops.OpticalFlow(*, device='cpu', **kwargs)#

Calculates the optical flow between images in the input.

The main input for this operator is a sequence of frames. Optionally, the operator can be provided with external hints for the optical flow calculation. The output format of this operator matches the output format of the optical flow driver API. Refer to https://developer.nvidia.com/opticalflow-sdk for more information about the Turing, Ampere and Hopper optical flow hardware that is used by DALI.

Note

The calculated optical flow is always with respect to the resolution of the input image; however, the output optical flow image can have a lower resolution, dictated by output_grid. If instead you would like the optical flow vectors to be consistent with the resolution of the output of this operator, you must divide the output vector field by output_grid.

This operator allows sequence inputs.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • enable_external_hints (bool, optional, default = False) –

    Enables or disables the external hints for optical flow calculations.

    External hints are analogous to temporal hints, but the only difference is that external hints come from an external source. When this option is enabled, the operator requires two inputs.

  • enable_temporal_hints (bool, optional, default = False) –

    Enables or disables temporal hints for sequences that are longer than two images.

    The hints are used to improve the quality of the output motion field as well as to speed up the calculations. The hints are especially useful in the presence of large displacements or periodic patterns which might confuse the optical flow algorithms.

  • hint_grid (int, optional, default = 4) –

    Sets the grid size for the hint vector field.

    The hints are used to improve the quality of the output motion field as well as to speed up the calculations. The grid resolution could be set to a different value than the output.

    Note

    Currently, only 1, 2, 4, and 8 are supported for Ampere, and only 4 for Turing.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – Input color space (RGB, BGR or GRAY).

  • output_grid (int, optional, default = 4) –

    Sets the grid size for the output vector field.

    This operator produces the motion vector field at a coarser resolution than the input pixels. This parameter specifies the size of the pixel grid cell corresponding to one motion vector. For example, a value of 4 will produce one motion vector for each 4x4 pixel block. Hence, to use optical flow with an output_grid of 4 to resample a full resolution image, the flow field is upsampled without scaling the vector quantities.

    Note

    Currently, only 1, 2, and 4 are supported for Ampere, and only 4 for Turing.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • preset (float, optional, default = 0.0) –

    Speed and quality level of the optical flow calculation.

    Allowed values are:

    • 0.0 is the lowest speed and the best quality.

    • 0.5 is the medium speed and quality.

    • 1.0 is the fastest speed and the lowest quality.

    The lower the speed, the more additional pre- and postprocessing is used to enhance the quality of the optical flow result.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • output_format (int) –

    Warning

    The argument output_format is a deprecated alias for output_grid. Use output_grid instead.

__call__(*inputs, **kwargs)#

See nvidia.dali.ops.OpticalFlow() class for complete information.

class nvidia.dali.ops.Pad(*, device='cpu', **kwargs)#

Pads all samples with the fill_value in the specified axes to match the biggest extent in the batch for those axes or to match the minimum shape specified.

Here are a few examples:

  • 1-D samples, fill_value = -1, axes = (0,)

The samples are padded in the first axis to match the extent of the largest sample.

input  = [[3,   4,   2,   5,   4],
          [2,   2],
          [3, 199,   5]];
output = [[3,   4,   2,   5,   4],
          [2,   2,  -1,  -1,  -1],
          [3, 199,   5,  -1,  -1]]
  • 1-D samples, fill_value = -1, axes = (0,), shape = (7,)

The samples are padded in the first axis to a minimum extent of 7.

input  = [[3,   4,   2,   5,   4],
          [2,   2],
          [3, 199,   5],
          [1,   2,   3,   4,   5,   6,   7,   8]];
output = [[3,   4,   2,   5,   4,  -1,  -1],
          [2,   2,  -1,  -1,  -1,  -1,  -1],
          [3, 199,   5,  -1,  -1,  -1,  -1],
          [1,   2,   3,   4,   5,   6,   7,   8]]
  • 1-D samples, fill_value = -1, axes = (0,), align = (4,)

The samples are padded in the first axis to match the extent of the largest sample and the alignment requirements. The output extent is 8, which is a result of rounding up the largest extent (5) to a multiple of alignment (4).

input  = [[3,   4,   2,   5,   4],
          [2,   2],
          [3, 199,   5]];
output = [[3,   4,   2,   5,   4,  -1,  -1,  -1],
          [2,   2,  -1,  -1,  -1,  -1,  -1,  -1],
          [3, 199,   5,  -1,  -1,  -1,  -1,  -1]]
  • 1-D samples, fill_value = -1, axes = (0,), shape = (1,), align = (2,)

The samples are padded in the first axis to match the alignment requirements only. The minimum extent (shape) is set to 1 to avoid any padding other than what is necessary for alignment.

input  = [[3,   4,   2,   5,   4],
          [2,   2],
          [3, 199,   5]];
output = [[3,   4,   2,   5,   4,  -1],
          [2,   2],
          [3, 199,   5,  -1]]
  • 2-D samples, fill_value = 42, axes = (1,)

The samples are padded in the second axis to match the extent of the largest sample and uses a custom fill value 42 instead of the default 0.

input  = [[[1,  2,  3,  4],
           [5,  6,  7,  8]],
          [[1,  2],
           [4,  5]]]
output = [[[1,  2,  3,  4],
           [5,  6,  7,  8]],
          [[1,  2, 42, 42],
           [4,  5, 42, 42]]]
  • 2-D samples, fill_value = 0, axes = (0, 1), align = (4, 5)

The samples are padded in the first and second axes to match the alignment requirements of each axis.

input  = [[[1,  2,  3,  4],
           [5,  6,  7,  8],
           [9, 10, 11, 12]],
          [[1, 2],
           [4, 5]]]
output = [[[1,  2,  3,  4,  0],
           [5,  6,  7,  8,  0],
           [9, 10, 11, 12,  0],
           [0,  0,  0,  0,  0]],
          [[1,  2,  0,  0,  0],
           [4,  5,  0,  0,  0],
           [0,  0,  0,  0,  0],
           [0,  0,  0,  0,  0]]]
  • 2-D samples, fill_value = 0, axes = (0, 1), align = (1, 2), shape = (4, -1)

The samples are padded in the first axis to match a minimum extent of 4, and in the second axis to match the largest sample in the batch and an alignment of 2.

input  = [[[1,  2,  3],
           [4,  5,  6]],
          [[1, 2],
           [4, 5],
           [6, 7]]]
output = [[[1,  2,  3,  0],
           [4,  5,  6,  0],
           [0,  0,  0,  0],
           [0,  0,  0,  0]],
          [[1,  2,  0,  0],
           [4,  5,  0,  0],
           [6,  7,  0,  0],
           [0,  0,  0,  0]]]
Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • align (int or list of int or TensorList of int, optional, default = []) –

    If specified, this argument determines the alignment on the dimensions specified by axes or axis_names.

    The extent on axis = axes[i] will be adjusted to be a multiple of align[i].

    If an integer value is provided, the alignment restrictions are applied to all the padded axes.

    To use alignment only, that is without any default or explicit padding behavior, set the minimum shape to 1 for the specified axis.

  • axes (int or list of int or TensorList of int, optional, default = []) –

    Indices of the axes on which the batch samples will be padded.

    Negative values are interpreted as counting dimensions from the back. Valid range: [-ndim, ndim-1], where ndim is the number of dimensions in the input data.

    The axis_names and axes arguments are mutually exclusive. If axes and axis_names are empty, or have not been provided, the output will be padded on all of the axes.

  • axis_names (layout str, optional, default = ‘’) –

    Names of the axes on which the batch samples will be padded.

    The axis_names and axes arguments are mutually exclusive. If axes and axis_names are empty, or have not been provided, the output will be padded on all of the axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • fill_value (float or TensorList of float, optional, default = 0.0) – The value to pad the batch with.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional, default = []) –

    The extents of the output shape in the axes specified by the axes or axis_names.

    Specifying -1 for an axis restores the default behavior of extending the axis to accommodate the aligned size of the largest sample in the batch.

    If the provided extent is smaller than that of the samples, padding will be applied only to match the required alignment. For example, to disable padding in an axis, except for what is necessary for alignment, you can specify a value of 1.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
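
A sketch padding the width axis of HWC images to the widest sample in the batch, aligned to a multiple of 16 (the axis name, alignment, and file_root path are illustrative):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 2, device_id = 0)
# pad the 'W' axis to the widest sample in the batch, aligned to a multiple of 16
pad = dali.ops.Pad(axis_names = "W", align = 16, fill_value = 0.0)

with pipe:
    files, labels = dali.fn.readers.file(file_root = "./my_file_root")  # placeholder path
    images = dali.fn.decoders.image(files)
    images = pad(images)
    pipe.set_outputs(images, labels)

pipe.build()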

class nvidia.dali.ops.Paste(*, device='cpu', **kwargs)#

Pastes the input images on a larger canvas, where the canvas size is equal to input size * ratio.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • fill_value (int or list of int) –

    Tuple of the values of the color that is used to fill the canvas.

    The length of the tuple must be equal to n_channels.

  • ratio (float or TensorList of float) – Ratio of canvas size to input size. Must be >= 1.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • min_canvas_size (float or TensorList of float, optional, default = 0.0) – Enforces the minimum paste canvas dimension after scaling the input size by the ratio.

  • n_channels (int, optional, default = 3) – Number of channels in the image.

  • paste_x (float or TensorList of float, optional, default = 0.5) – Horizontal position of the paste in (0.0 - 1.0) image coordinates.

  • paste_y (float or TensorList of float, optional, default = 0.5) – Vertical position of the paste in (0.0 - 1.0) image coordinates.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC')) – Input to the operator.

class nvidia.dali.ops.PeekImageShape(*, device='cpu', **kwargs)#

Obtains the shape of the encoded image.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.INT64) – Data type, to which the sizes are converted.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • type (nvidia.dali.types.DALIDataType) –

    Warning

    The argument type is a deprecated alias for dtype. Use dtype instead.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
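
A possible use of this operator, sketched below, is to obtain the shapes of encoded images without decoding them; the file root is an illustrative assumption.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
peek_shape = dali.ops.PeekImageShape()

with pipe:
    encoded, labels = dali.fn.readers.file(file_root="./my_file_root")  # hypothetical path
    shapes = peek_shape(encoded)   # per-sample shape of the encoded image, without decoding
    images = dali.fn.decoders.image(encoded, device="mixed")
    pipe.set_outputs(images, shapes, labels)

pipe.build()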

class nvidia.dali.ops.PerFrame(*, device='cpu', **kwargs)#

Marks the input tensor as a sequence.

The operator modifies the layout string of the input data to indicate that the batch contains sequences. Only the layout is affected, while the data stays untouched.

The operator can be used to feed per-frame tensor arguments when processing sequences. For example, the following snippet shows how to apply gaussian_blur to a batch of sequences, so that a different sigma is used for each frame in each sequence:

@pipeline_def
def random_per_frame_blur():
  video, _ = fn.readers.video_resize(sequence_length=50, ...)
  sigma = fn.random.uniform(range=[0.5, 5], shape=(50,))
  blurred = fn.gaussian_blur(video, sigma=fn.per_frame(sigma))
  return blurred

Note that the outermost dimension of each tensor from a batch specified as per-frame argument must match the number of frames in the corresponding sequence processed by a given operator. For instance, in the presented example, every sequence in video batch has 50 frames, thus the shape of sigma is (50,).

Please consult documentation of a given argument of a sequence processing operator to find out if it supports per-frame input.

If the input passed to the per-frame operator has no layout, a new layout is set that starts with F and is padded with * to match the dimensionality of the input. Otherwise, depending on the replace flag, the operator either checks that the first character of the layout is F or replaces that character with F.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • replace (bool, optional, default = False) – Controls handling of the input with already specified layout. If set to False, the operator errors-out if the first character of the layout is not F. If set to True, the first character of the layout is replaced with F.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.PermuteBatch(*, device='cpu', **kwargs)#

Returns a batch of tensors constructed by selecting tensors from the input based on indices given in indices argument:

out_tensor[i] = in_tensor[indices[i]]
Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • indices (int or list of int or TensorList of int) –

    List of indices, matching current batch size, or a batch of scalars representing indices of the tensors in the input batch.

    The indices must be within [0..batch_size) range. Repetitions and omissions are allowed.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
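
As a sketch, the snippet below reverses the order of samples in each batch by passing a fixed indices list; the data source is an illustrative assumption.

import nvidia.dali as dali

batch_size = 4
pipe = dali.pipeline.Pipeline(batch_size=batch_size, num_threads=2, device_id=0)
# out_tensor[i] = in_tensor[indices[i]], here simply reversing the batch.
permute = dali.ops.PermuteBatch(indices=list(reversed(range(batch_size))))

with pipe:
    data, labels = dali.fn.readers.file(file_root="./my_file_root")  # hypothetical path
    pipe.set_outputs(permute(data), labels)

pipe.build()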

class nvidia.dali.ops.PowerSpectrum(*, device='cpu', **kwargs)#

Calculates power spectrum of the signal.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • axis (int, optional, default = -1) –

    Index of the dimension to be transformed to the frequency domain.

    By default, the last dimension is selected.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • nfft (int, optional) –

    Size of the FFT.

    By default, the nfft is selected to match the length of the data in the transformation axis.

    The number of bins that are created in the output is calculated with the following formula:

    nfft // 2 + 1
    

    Note

    The output only represents the positive part of the spectrum.

  • power (int, optional, default = 2) –

    Exponent of the FFT magnitude.

    The supported values are:

    • 2 for power spectrum (real*real + imag*imag)

    • 1 for the complex magnitude (sqrt(real*real + imag*imag)).

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
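
The NumPy sketch below mirrors the quantities described above (the number of output bins and the meaning of the power argument); it is a reference calculation, not DALI code.

import numpy as np

x = np.random.randn(1024).astype(np.float32)   # example 1D signal
nfft = len(x)                                  # default: nfft matches the transformed axis
spectrum = np.fft.rfft(x, n=nfft)              # positive half: nfft // 2 + 1 complex bins
power = spectrum.real**2 + spectrum.imag**2    # power=2: real*real + imag*imag
magnitude = np.sqrt(power)                     # power=1: complex magnitude
assert power.shape[0] == nfft // 2 + 1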

class nvidia.dali.ops.PreemphasisFilter(*, device='cpu', **kwargs)#

Applies preemphasis filter to the input data.

This filter, in simple form, can be expressed by the formula:

Y[t] = X[t] - coeff * X[t-1]    if t >= 1
Y[t] = X[t] - coeff * X_border  if t == 0

with X and Y being the input and output signal, respectively.

The value of X_border depends on the border argument:

X_border = 0                    if border == 'zero'
X_border = X[0]                 if border == 'clamp'
X_border = X[1]                 if border == 'reflect'
Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • border (str, optional, default = ‘clamp’) – Border value policy. Possible values are "zero", "clamp", "reflect".

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Data type for the output.

  • preemph_coeff (float or TensorList of float, optional, default = 0.97) – Preemphasis coefficient coeff.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
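
The NumPy sketch below is a direct transcription of the formula above and can serve as a reference for the expected output; it is not DALI code.

import numpy as np

def preemphasis_reference(x, coeff=0.97, border="clamp"):
    x = np.asarray(x, dtype=np.float32)
    x_border = {"zero": 0.0, "clamp": x[0], "reflect": x[1]}[border]
    y = np.empty_like(x)
    y[0] = x[0] - coeff * x_border     # Y[0] = X[0] - coeff * X_border
    y[1:] = x[1:] - coeff * x[:-1]     # Y[t] = X[t] - coeff * X[t-1] for t >= 1
    return y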

class nvidia.dali.ops.PythonFunction(function, num_outputs=1, device='cpu', batch_processing=False, **kwargs)#

Executes a Python function.

This operator can be used to execute custom Python code in the DALI pipeline. The function receives the data from DALI as NumPy arrays in case of CPU operators or as CuPy arrays for GPU operators. It is expected to return the results in the same format. For a more universal data format, see nvidia.dali.fn.dl_tensor_python_function(). The function should not modify input tensors.

Warning

This operator is not compatible with TensorFlow integration.

Warning

When the pipeline has conditional execution enabled, additional steps must be taken to prevent the function from being rewritten by AutoGraph. There are two ways to achieve this:

  1. Define the function at global scope (i.e. outside of pipeline_def scope).

  2. If function is a result of another “factory” function, then the factory function must be defined outside pipeline definition function and decorated with @do_not_convert.

More details can be found in @do_not_convert documentation.

This operator allows sequence inputs and supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • function (object) –

    A callable object that defines the function of the operator.

    Warning

    The function must not hold a reference to the pipeline in which it is used. If it does, a circular reference to the pipeline will form and the pipeline will never be freed.

  • batch_processing (bool, optional, default = False) –

    Determines whether the function is invoked once per batch or separately for every sample in the batch.

    If set to True, the function will receive its arguments as lists of NumPy or CuPy arrays, for CPU and GPU backend, respectively.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • num_outputs (int, optional, default = 1) – Number of outputs.

  • output_layouts (layout str or list of layout str, optional) –

    Tensor data layouts for the outputs.

    This argument can be a list that contains a distinct layout for each output. If the list has fewer than num_outputs elements, only the first outputs have the layout set and the rest of the outputs have no layout assigned.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

static current_stream()#

Gets DALI’s current CUDA stream.
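
A minimal sketch of using the operator object is shown below; note that PythonFunction requires the pipeline to be created with exec_async=False and exec_pipelined=False. The callback and file root are illustrative assumptions.

import nvidia.dali as dali

def flip_vertical(image):
    # CPU backend: receives a NumPy array per sample and must return NumPy arrays.
    return image[::-1, :, :]

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0,
                              exec_async=False, exec_pipelined=False)
py_func = dali.ops.PythonFunction(function=flip_vertical, num_outputs=1)

with pipe:
    files, _ = dali.fn.readers.file(file_root="./my_file_root")  # hypothetical path
    images = dali.fn.decoders.image(files, device="cpu")
    pipe.set_outputs(py_func(images))

pipe.build()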

class nvidia.dali.ops.ROIRandomCrop(*, device='cpu', **kwargs)#

Produces a fixed shape cropping window, randomly placed so that as much of the provided region of interest (ROI) as possible is contained in it.

If the ROI is bigger than the cropping window, the cropping window will be a subwindow of the ROI. If the ROI is smaller than the cropping window, the whole ROI shall be contained in the cropping window.

If an input shape (in_shape) is given, the resulting cropping window is selected to be within the bounds of that input shape. Alternatively, the input data subject to cropping can be passed directly to the operator. When an input shape is provided, the region of interest should be within the bounds of the input, and the cropping window shape should not be larger than the input shape.

If no input shape is provided, the resulting cropping window is unbounded, potentially resulting in out of bounds cropping.

The cropping window dimensions should be explicitly provided (crop_shape), and the ROI should be either specified with roi_start/roi_end or roi_start/roi_shape.

The operator produces an output representing the cropping window start coordinates.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • crop_shape (int or list of int or TensorList of int) – Cropping window dimensions.

  • roi_start (int or list of int or TensorList of int) – ROI start coordinates.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • in_shape (int or list of int or TensorList of int, optional) –

    Shape of the input data.

    If provided, the cropping window start will be selected so that the cropping window is within the bounds of the input.

    Note

    Providing in_shape is incompatible with feeding the input data directly as a positional input.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • roi_end (int or list of int or TensorList of int, optional) –

    ROI end coordinates.

    Note

    Using roi_end is mutually exclusive with roi_shape.

  • roi_shape (int or list of int or TensorList of int, optional) –

    ROI shape.

    Note

    Using roi_shape is mutually exclusive with roi_end.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
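
A rough sketch of the operator object is shown below; the crop window, ROI extents, and data source are illustrative assumptions (the ROI here is given in the HWC dimensions of the decoded input).

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
# A fixed 100x100x3 window placed randomly so that it covers as much of the
# given ROI as possible; the bounds come from the input passed to __call__.
roi_crop = dali.ops.ROIRandomCrop(crop_shape=[100, 100, 3],
                                  roi_start=[50, 60, 0],
                                  roi_shape=[200, 150, 3])

with pipe:
    files, _ = dali.fn.readers.file(file_root="./my_file_root")  # hypothetical path
    images = dali.fn.decoders.image(files, device="cpu")
    anchor = roi_crop(images)   # per-sample start coordinates of the cropping window
    pipe.set_outputs(anchor)

pipe.build()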

class nvidia.dali.ops.RandomBBoxCrop(*, device='cpu', **kwargs)#

Applies a prospective random crop to an image coordinate space while keeping the bounding boxes, and optionally labels, consistent.

This means that after applying the random crop operator to the image coordinate space, the bounding boxes will be adjusted or filtered out to match the cropped ROI. The applied random crop operation is constrained by the arguments that are provided to the operator.

The cropping window candidates are randomly selected until one matches the overlap restrictions that are specified by the thresholds argument. thresholds values represent a minimum overlap metric that is specified by threshold_type, such as the intersection-over-union of the cropping window and the bounding boxes or the relative overlap as a ratio of the intersection area and the bounding box area.

Additionally, if allow_no_crop is True, the cropping may be skipped entirely as one of the valid results of the operator.

The following modes of a random crop are available:

  • Randomly shaped window, which is randomly placed in the original input space.
    The random crop window dimensions are selected to satisfy the aspect ratio and relative area restrictions.
    If input_shape is provided, it will be taken into account for the aspect ratio range check.
    Otherwise, the aspect ratios are calculated in relative terms.
    In other words, without input_shape, an aspect ratio of 1.0 is equivalent to the aspect ratio of the input image.
  • Fixed size window, which is randomly placed in the original input space.
    The random crop window dimensions are taken from the crop_shape argument and the anchor is
    randomly selected.
    When providing crop_shape, input_shape is also required (these dimensions are required to
    scale the output bounding boxes).

The num_attempts argument can be used to control the maximum number of attempts to produce a valid crop to match a minimum overlap metric value from thresholds.

Warning

When allow_no_crop is False and thresholds does not contain 0.0, if you do not increase the num_attempts value, it might continue to loop for a long time.

Inputs: 0: bboxes, (1: labels)

The first input, bboxes, refers to the bounding boxes that are provided as a two-dimensional tensor where the first dimension refers to the index of the bounding box, and the second dimension refers to the index of the coordinate.

The coordinates are relative to the original image dimensions (that is, in the range [0.0, 1.0]) and represent the start and, depending on the value of bbox_layout, either the end of the region or its shape. For example, bbox_layout=”xyXY” means the bounding box coordinates follow the start_x, start_y, end_x, and end_y order, and bbox_layout=”xyWH” indicates that the order is start_x, start_y, width, and height. See the bbox_layout argument description for more information.

An optional input labels can be provided, representing the labels that are associated with each of the bounding boxes.

Outputs: 0: anchor, 1: shape, 2: bboxes (, 3: labels, 4: bboxes_indices)

The resulting crop parameters are provided as two separate outputs, anchor and shape, that can be fed directly to the nvidia.dali.fn.slice() operator to complete the cropping of the original image. anchor and shape contain the starting coordinates and dimensions for the crop in the [x, y, (z)] and [w, h, (d)] formats, respectively. The coordinates can be represented in absolute or relative terms, and the representation depends on whether the fixed crop_shape was used.

Note

Both anchor and shape are returned as a float, even if they represent absolute coordinates due to providing crop_shape argument. In order for them to be interpreted correctly by nvidia.dali.fn.slice(), normalized_anchor and normalized_shape should be set to False.

The third output contains the bounding boxes, after filtering them out by centroid or area thresholding (see bbox_prune_threshold argument), and with the coordinates mapped to the new coordinate space.

The next output is optional, and it represents the labels associated with the filtered bounding boxes. The output will be present if a labels input was provided.

The last output, also optional, corresponds to the original indices of the bounding boxes that passed the aforementioned filtering process and are present in the output. This output will be present if the option output_bbox_indices is set to True.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • all_boxes_above_threshold (bool, optional, default = True) –

    If set to True, all bounding boxes in a sample should overlap with the cropping window as specified by thresholds.

    If the bounding boxes do not overlap, the cropping window is considered to be invalid. If set to False, and at least one bounding box overlaps the window, the window is considered to be valid.

  • allow_no_crop (bool, optional, default = True) – If set to True, one of the possible outcomes of the random process will be to not crop, as if the outcome was one more thresholds value from which to choose.

  • aspect_ratio (float or list of float, optional, default = [1.0, 1.0]) –

    Valid range of aspect ratio of the cropping windows.

    This parameter can be specified as either two values (min, max) or six values (three pairs), depending on the dimensionality of the input.

    • For 2D bounding boxes, one range of valid aspect ratios (x/y) should be provided (e.g. [min_xy, max_xy]).
    • For 3D bounding boxes, three separate aspect ratio ranges may be specified, for x/y, x/z and y/z pairs of dimensions.
      They are provided in the following order [min_xy, max_xy, min_xz, max_xz, min_yz, max_yz]. Alternatively, if only one aspect ratio range is provided, it will be used for all three pairs of dimensions.

    The value for min should be greater than 0.0, and min should be less than or equal to the max value. By default, square windows are generated.

    Note

    Providing aspect_ratio and scaling is incompatible with explicitly specifying crop_shape.

    Note

    If input_shape is provided, it will be taken into account for the calculation of the cropping window aspect ratio. Otherwise, the aspect ratio ranges are relative to the image dimensions. In other words, when input_shape is not specified, an aspect ratio of 1.0 is equivalent to the original aspect ratio of the image.

  • bbox_layout (layout str, optional, default = ‘’) –

    Determines the meaning of the coordinates of the bounding boxes.

    The value of this argument is a string containing the following characters:

    x (horizontal start anchor), y (vertical start anchor), z (depthwise start anchor),
    X (horizontal end anchor),   Y (vertical end anchor),   Z (depthwise end anchor),
    W (width),                   H (height),                D (depth).
    

    Note

    If this value is left empty, depending on the number of dimensions, “xyXY” or “xyzXYZ” is assumed.

  • bbox_prune_threshold (float, optional) –

    Controls when bboxes are considered outside of the ROI and pruned. If this argument is set, boxes are kept if the fraction of their area within the ROI is greater than or equal to the threshold specified [0.0,1.0]. If this argument is not set, boxes are pruned if their centroid is outside of the ROI.

    For example, when bbox_prune_threshold=0.2, bboxes that have at least 20% of their original area within the ROI are kept, and bboxes with less than 20% are pruned. If bbox_prune_threshold=0.0, all boxes that have any presence in the ROI are kept.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • crop_shape (int or list of int or TensorList of int, optional, default = []) –

    If provided, the random crop window dimensions will be fixed to this shape.

    The order of dimensions is determined by the layout provided in shape_layout.

    Note

    When providing crop_shape, input_shape should be provided as well. Providing explicit crop_shape is incompatible with using scaling and aspect_ratio arguments.

  • input_shape (int or list of int or TensorList of int, optional, default = []) –

    Specifies the shape of the original input image.

    The order of dimensions is determined by the layout that is provided in shape_layout.

  • ltrb (bool, optional, default = True) –

    If set to True, bboxes are returned as [left, top, right, bottom]; otherwise they are provided as [left, top, width, height].

    Warning

    This argument has been deprecated. To specify the bbox encoding, use bbox_layout instead. For example, ltrb=True is equal to bbox_layout=”xyXY”, and ltrb=False corresponds to bbox_layout=”xyWH”.

  • num_attempts (int, optional, default = 1) –

    Number of attempts to get a crop window that matches the aspect_ratio and a selected value from thresholds.

    After each num_attempts attempts, a different threshold will be picked; this continues until the total number of attempts reaches total_num_attempts (if provided), or otherwise indefinitely.

  • output_bbox_indices (bool, optional, default = False) – If set to True, an extra output will be returned, containing the original indices of the bounding boxes that passed the centroid filter and are present in the output bounding boxes.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • scaling (float or list of float, optional, default = [1.0, 1.0]) –

    Range [min, max] for the crop size with respect to the original image dimensions.

    The value of min and max must satisfy the condition 0.0 <= min <= max.

    Note

    Providing aspect_ratio and scaling is incompatible when explicitly specifying the crop_shape value.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape_layout (layout str, optional, default = ‘’) –

    Determines the meaning of the dimensions provided in crop_shape and input_shape.

    The values are:

    • W (width)

    • H (height)

    • D (depth)

    Note

    If left empty, depending on the number of dimensions "WH" or "WHD" will be assumed.

  • threshold_type (str, optional, default = ‘iou’) –

    Determines the meaning of thresholds.

    By default, thresholds refers to the intersection-over-union (IoU) of the bounding boxes with respect to the cropping window. Alternatively, threshold_type can be set to “overlap” to make thresholds specify the fraction (by area) of the bounding box that will fall inside the crop window. For example, a threshold value of 1.0 means the entire bounding box must be contained in the resulting cropping window.

  • thresholds (float or list of float, optional, default = [0.0]) –

    Minimum IoU or a different metric, if specified by threshold_type, of the bounding boxes with respect to the cropping window.

    Each sample randomly selects one of the thresholds, and the operator will complete up to the specified number of attempts to produce a random crop window that has the selected metric above that threshold. See num_attempts for more information about configuring the number of attempts.

  • total_num_attempts (int, optional, default = -1) –

    If provided, it indicates the total maximum number of attempts to get a crop window that matches the aspect_ratio and any selected value from thresholds.

    After total_num_attempts attempts, the best candidate will be selected.

    If this value is not specified, the crop search will continue indefinitely until a valid crop is found.

    Warning

    If you do not provide a total_num_attempts value, this can result in an infinite loop if the conditions imposed by the arguments cannot be satisfied.

__call__(boxes, labels=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __boxes (2D TensorList of float) – Relative coordinates of the bounding boxes that are represented as a 2D tensor, where the first dimension refers to the index of the bounding box, and the second dimension refers to the index of the coordinate.

  • __labels (1D TensorList of integers, optional) – Labels that are associated with each of the bounding boxes.
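
The sketch below shows the typical pattern of feeding the anchor and shape outputs to a slicing decoder, as mentioned above; the COCO paths and reader options (ltrb, ratio) are illustrative assumptions.

import nvidia.dali as dali
import nvidia.dali.fn as fn

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
bbox_crop = dali.ops.RandomBBoxCrop(aspect_ratio=[0.5, 2.0],
                                    scaling=[0.3, 1.0],
                                    thresholds=[0.0, 0.1, 0.3, 0.5],
                                    bbox_layout="xyXY")

with pipe:
    # hypothetical dataset paths; ltrb/ratio make the boxes relative [x, y, X, Y]
    encoded, bboxes, labels = fn.readers.coco(file_root="./coco/images",
                                              annotations_file="./coco/annotations.json",
                                              ltrb=True, ratio=True)
    anchor, shape, bboxes, labels = bbox_crop(bboxes, labels)
    images = fn.decoders.image_slice(encoded, anchor, shape, device="mixed")
    pipe.set_outputs(images, bboxes, labels)

pipe.build()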

class nvidia.dali.ops.RandomCropGenerator(*, device='cpu', **kwargs)#

Produces a cropping window with a randomly selected area and aspect ratio.

Expects a one-dimensional input representing the shape of the input we want to crop (HW or HWC representation).

Produces two outputs, representing the anchor and shape of the cropping window.

The outputs of this operator (anchor and shape) can be fed to fn.slice, fn.decoders.image_slice or any other operator accepting a region of interest. For example:

crop_anchor, crop_shape = fn.random_crop_generator(image_shapes)
images_crop = fn.slice(images, start=crop_anchor, shape=crop_shape, axes=[0, 1])
Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_area (float or list of float, optional, default = [0.08, 1.0]) –

    Range from which to choose random area fraction A.

    The cropped image’s area will be equal to A * original image’s area.

  • random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.RandomResizedCrop(*, device='cpu', **kwargs)#

Performs a crop with a randomly selected area and aspect ratio and resizes it to the specified size.

Expects a three-dimensional input with samples in height, width, channels (HWC) layout.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • size (int or list of int) – Size of the resized image.

  • antialias (bool, optional, default = True) –

    If enabled, it applies an antialiasing filter when scaling down.

    Note

    Nearest neighbor interpolation does not support antialiasing.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Must be same as input type or float. If not set, input type is used.

  • interp_type (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –

    Type of interpolation to be used.

    Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

    Note

    Usage of INTERP_TRIANGULAR is now deprecated and it should be replaced by a combination of INTERP_LINEAR with antialias enabled.

  • mag_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.

  • min_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.

  • minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.

  • num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_area (float or list of float, optional, default = [0.08, 1.0]) –

    Range from which to choose random area fraction A.

    The cropped image’s area will be equal to A * original image’s area.

  • random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • temp_buffer_hint (int, optional, default = 0) –

    Initial size in bytes, of a temporary buffer for resampling.

    Note

    This argument is ignored for the CPU variant.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'CHW', 'FHWC', 'FCHW', 'CFHW')) – Input to the operator.
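
A minimal sketch with the operator object, using the default area and aspect ratio ranges listed above; the output size and data source are illustrative assumptions.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
rrc = dali.ops.RandomResizedCrop(device="gpu", size=(224, 224),
                                 random_area=[0.08, 1.0],
                                 random_aspect_ratio=[0.75, 1.333333])

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")  # hypothetical path
    images = dali.fn.decoders.image(files, device="mixed")
    pipe.set_outputs(rrc(images), labels)

pipe.build()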

class nvidia.dali.ops.Reinterpret(*, device='cpu', **kwargs)#

Treats content of the input as if it had a different type, shape, and/or layout.

The buffer contents are not copied.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    The total size, in bytes, of the output must match the input. If no shape is provided, the innermost dimension is adjusted accordingly. If the byte size of the innermost dimension is not divisible by the size of the target type, an error occurs.

  • layout (layout str, optional, default = ‘’) –

    New layout for the data.

    If a value is not specified and the number of dimensions matches the existing layout, the output layout is preserved. If the number of dimensions does not match, the output layout is cleared. If a non-empty value is set, the layout must match the dimensionality of the output.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • rel_shape (float or list of float or TensorList of float, optional, default = []) –

    The relative shape of the output.

    The output shape is calculated by multiplying the input shape by rel_shape:

    out_shape[i] = in_shape[i] * rel_shape[i]
    

    An additional argument src_dims may be used to alter which source dimension is used for calculating the output shape:

    out_shape[i] = in_shape[src_dims[i]] * rel_shape[i]
    

    There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and a rel_shape = [0.5, -1] results in the output shape [240, 3840].

    The number of dimensions is subject to the following restrictions:

    • if src_dims argument is used, the number of elements in src_dims and rel_shape must match

    • otherwise, the length of rel_shape must not exceed the number of dimensions in the input except when the last element in rel_shape is negative, in which case an extra dimension at the end will be added

    Note

    rel_shape and shape are mutually exclusive.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional, default = []) –

    The desired shape of the output.

    There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and shape = [240, -1] results in the output shape [240, 3840].

    Note

    rel_shape and shape are mutually exclusive.

  • src_dims (int or list of int or TensorList of int, optional, default = []) –

    Indices of dimensions to keep.

    This argument can be used to manipulate the order of existing dimensions or to remove or add dimensions. A special index value -1 can be used to insert new dimensions.

    For example, reshaping a sample with shape [300, 200, 1] and a src_dims argument [-1, 1, 0] produces an output shape [1, 200, 300]. A leading dimension with extent 1 is inserted at the beginning, followed by the first two original dimensions in reverse order. The last dimension is removed.

    The src_dims argument can be used together with rel_shape, in which case the relative extents in rel_shape refer to the target dimensions. In the example above, specifying rel_shape = [-1, 0.5, 2] would result in the output shape [1, 100, 600].

    All indices must be in the range of valid dimensions of the input, or -1.

__call__(data, shape_input=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __data (TensorList) – Data to be reshaped

  • __shape_input (1D TensorList of integers, optional) – Same as shape keyword argument

class nvidia.dali.ops.Reshape(*, device='cpu', **kwargs)#

Treats content of the input as if it had a different shape and/or layout.

The buffer contents are not copied.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • layout (layout str, optional, default = ‘’) –

    New layout for the data.

    If a value is not specified and the number of dimensions matches the existing layout, the output layout is preserved. If the number of dimensions does not match, the output layout is cleared. If a non-empty value is set, the layout must match the dimensionality of the output.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • rel_shape (float or list of float or TensorList of float, optional, default = []) –

    The relative shape of the output.

    The output shape is calculated by multiplying the input shape by rel_shape:

    out_shape[i] = in_shape[i] * rel_shape[i]
    

    An additional argument src_dims may be used to alter which source dimension is used for calculating the output shape:

    out_shape[i] = in_shape[src_dims[i]] * rel_shape[i]
    

    There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and a rel_shape = [0.5, -1] results in the output shape [240, 3840].

    The number of dimensions is subject to the following restrictions:

    • if src_dims argument is used, the number of elements in src_dims and rel_shape must match

    • otherwise, the length of rel_shape must not exceed the number of dimensions in the input except when the last element in rel_shape is negative, in which case an extra dimension at the end will be added

    Note

    rel_shape and shape are mutually exclusive.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional, default = []) –

    The desired shape of the output.

    There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and shape = [240, -1] results in the output shape [240, 3840].

    Note

    rel_shape and shape are mutually exclusive.

  • src_dims (int or list of int or TensorList of int, optional, default = []) –

    Indices of dimensions to keep.

    This argument can be used to manipulate the order of existing dimensions or to remove or add dimensions. A special index value -1 can be used to insert new dimensions.

    For example, reshaping a sample with shape [300, 200, 1] and a src_dims argument [-1, 1, 0] produces an output shape [1, 200, 300]. A leading dimension with extent 1 is inserted at the beginning, followed by the first two original dimensions in reverse order. The last dimension is removed.

    The src_dims argument can be used together with rel_shape, in which case the relative extents in rel_shape refer to the target dimensions. In the example above, specifying rel_shape = [-1, 0.5, 2] would result in the output shape [1, 100, 600].

    All indices must be in the range of valid dimensions of the input, or -1.

__call__(data, shape_input=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __data (TensorList) – Data to be reshaped

  • __shape_input (1D TensorList of integers, optional) – Same as shape keyword argument
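
The sketch below reproduces the src_dims example from the argument description ([300, 200, 1] reshaped to [1, 200, 300]); the external_source feeding is an illustrative assumption.

import numpy as np
import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=1, num_threads=1, device_id=0)
# src_dims=[-1, 1, 0]: insert a new leading dimension, then keep the first two
# original dimensions in reverse order and drop the last one.
reshape = dali.ops.Reshape(src_dims=[-1, 1, 0])

with pipe:
    data = dali.fn.external_source(
        source=lambda: [np.zeros((300, 200, 1), dtype=np.float32)], batch=True)
    pipe.set_outputs(reshape(data))

pipe.build()
outputs = pipe.run()   # each sample now has shape [1, 200, 300]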

class nvidia.dali.ops.Resize(*, device='cpu', **kwargs)#

Resize images.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • antialias (bool, optional, default = True) –

    If enabled, it applies an antialiasing filter when scaling down.

    Note

    Nearest neighbor interpolation does not support antialiasing.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Must be same as input type or float. If not set, input type is used.

  • image_type (nvidia.dali.types.DALIImageType) –

    Warning

    The argument image_type is no longer used and will be removed in a future release.

  • interp_type (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –

    Type of interpolation to be used.

    Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

    Note

    Usage of INTERP_TRIANGULAR is now deprecated and it should be replaced by a combination of INTERP_LINEAR with antialias enabled.

  • mag_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.

  • max_size (float or list of float, optional) –

    Limit of the output size.

    When the operator is configured to keep aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when using resize_shorter argument or “not_smaller” mode or when some extents are left unspecified.

    This parameter puts a limit to how big the output can become. This value can be specified per-axis or uniformly for all axes.

    Note

    When used with “not_smaller” mode or the resize_shorter argument, max_size takes precedence and the aspect ratio is kept - for example, when resizing with mode="not_smaller", size=800, max_size=1400, an image of size 1200x600 would be resized to 1400x700.

  • min_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.

  • minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.

  • mode (str, optional, default = ‘default’) –

    Resize mode.

    Here is a list of supported modes:

    • "default" - image is resized to the specified size.
      Missing extents are scaled with the average scale of the provided ones.
    • "stretch" - image is resized to the specified size.
      Missing extents are not scaled at all.
    • "not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size.
      For example, a 1280x720 image with a desired output size of 640x480 actually produces a 640x360 output.
    • "not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified.
      For example, a 640x480 image with a desired output size of 1920x1080, actually produces a 1920x1440 output.

      This argument is mutually exclusive with resize_longer and resize_shorter

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • resize_longer (float or TensorList of float, optional, default = 0.0) –

    The length of the longer dimension of the resized image.

    This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".

  • resize_shorter (float or TensorList of float, optional, default = 0.0) –

    The length of the shorter dimension of the resized image.

    This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

  • resize_x (float or TensorList of float, optional, default = 0.0) –

    The length of the X dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_y (float or TensorList of float, optional, default = 0.0) –

    The length of the Y dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_z (float or TensorList of float, optional, default = 0.0) –

    The length of the Z dimension of the resized volume.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x and resize_y are left unspecified or 0, then the op will keep the aspect ratio of the original volume. Negative value flips the volume.

  • roi_end (float or list of float or TensorList of float, optional) –

    End of the input region of interest (ROI).

    Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • roi_relative (bool, optional, default = False) – If true, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right

  • roi_start (float or list of float or TensorList of float, optional) –

    Origin of the input region of interest (ROI).

    Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • save_attrs (bool, optional, default = False) – Save reshape attributes for testing.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • size (float or list of float or TensorList of float, optional) –

    The desired output size.

    Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and mode argument.

  • subpixel_scale (bool, optional, default = True) –

    If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.

    Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.

  • temp_buffer_hint (int, optional, default = 0) –

    Initial size in bytes, of a temporary buffer for resampling.

    Note

    This argument is ignored for the CPU variant.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'FHWC', 'CHW', 'FCHW', 'CFHW', 'DHWC', 'FDHWC', 'CDHW', 'FCDHW', 'CFDHW')) – Input to the operator.
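
A short sketch of scaling the shorter edge while bounding the longer one, matching the max_size note above; the sizes and data source are illustrative assumptions.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
# Shorter edge scaled to 800, longer edge capped at 1400
# (a 1200x600 input would come out as 1400x700).
resize = dali.ops.Resize(device="gpu", resize_shorter=800, max_size=1400)

with pipe:
    files, _ = dali.fn.readers.file(file_root="./my_file_root")  # hypothetical path
    images = dali.fn.decoders.image(files, device="mixed")
    pipe.set_outputs(resize(images))

pipe.build()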

class nvidia.dali.ops.ResizeCropMirror(*, device='cpu', **kwargs)#

Performs a fused resize, crop, mirror operation.

The result of the operation is equivalent to applying resize, followed by crop and flip. Internally, the operator calculates the relevant region of interest and performs a single resizing operation on that region.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • antialias (bool, optional, default = True) –

    If enabled, it applies an antialiasing filter when scaling down.

    Note

    Nearest neighbor interpolation does not support antialiasing.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • crop (float or list of float or TensorList of float, optional) –

    Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).

    Providing crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.

  • crop_d (float or TensorList of float, optional, default = 0.0) –

    Applies only to volumetric inputs; cropping window depth (in voxels).

    crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).

  • crop_h (float or TensorList of float, optional, default = 0.0) –

    Cropping window height (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).

    The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.

    See rounding argument for more details on how crop_x is converted to an integral value.

  • crop_pos_y (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).

    The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.

    See rounding argument for more details on how crop_y is converted to an integral value.

  • crop_pos_z (float or TensorList of float, optional, default = 0.5) –

    Applies only to volumetric inputs.

    Normalized (0.0 - 1.0) depthwise position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.

    See rounding argument for more details on how crop_z is converted to an integral value.

  • crop_w (float or TensorList of float, optional, default = 0.0) –

    Cropping window width (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Must be same as input type or float. If not set, input type is used.

  • interp_type (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –

    Type of interpolation to be used.

    Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

    Note

    Usage of INTERP_TRIANGULAR is now deprecated and it should be replaced by a combination of INTERP_LINEAR with antialias enabled.

  • mag_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.

  • max_size (float or list of float, optional) –

    Limit of the output size.

    When the operator is configured to keep aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when using resize_shorter argument or “not_smaller” mode or when some extents are left unspecified.

    This parameter puts a limit to how big the output can become. This value can be specified per-axis or uniformly for all axes.

    Note

    When used with “not_smaller” mode or the resize_shorter argument, max_size takes precedence and the aspect ratio is kept - for example, when resizing with mode="not_smaller", size=800, max_size=1400, an image of size 1200x600 would be resized to 1400x700.

  • min_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.

  • minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.

  • mirror (int or TensorList of int, optional, default = 0) –

    Mask for flipping

    Supported values:

    • 0 - No flip

    • 1 - Horizontal flip

    • 2 - Vertical flip

    • 4 - Depthwise flip

    • any bitwise combination of the above

  • mode (str, optional, default = ‘default’) –

    Resize mode.

    Here is a list of supported modes:

    • "default" - image is resized to the specified size.
      Missing extents are scaled with the average scale of the provided ones.
    • "stretch" - image is resized to the specified size.
      Missing extents are not scaled at all.
    • "not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size.
      For example, a 1280x720 image with a desired output size of 640x480 actually produces a 640x360 output.
    • "not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified.
      For example, a 640x480 image with a desired output size of 1920x1080, actually produces a 1920x1440 output.

      This argument is mutually exclusive with resize_longer and resize_shorter

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • resize_longer (float or TensorList of float, optional, default = 0.0) –

    The length of the longer dimension of the resized image.

    This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".

  • resize_shorter (float or TensorList of float, optional, default = 0.0) –

    The length of the shorter dimension of the resized image.

    This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

  • resize_x (float or TensorList of float, optional, default = 0.0) –

    The length of the X dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_y (float or TensorList of float, optional, default = 0.0) –

    The length of the Y dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_z (float or TensorList of float, optional, default = 0.0) –

    The length of the Z dimension of the resized volume.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x and resize_y are left unspecified or 0, then the op will keep the aspect ratio of the original volume. Negative value flips the volume.

  • roi_end (float or list of float or TensorList of float, optional) –

    End of the input region of interest (ROI).

    Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • roi_relative (bool, optional, default = False) – If true, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right

  • roi_start (float or list of float or TensorList of float, optional) –

    Origin of the input region of interest (ROI).

    Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • rounding (str, optional, default = ‘round’) –

    Determines the rounding function used to convert the starting coordinate of the window to an integral value (see crop_pos_x, crop_pos_y, crop_pos_z).

    Possible values are:

    • "round" - Rounds to the nearest integer value, with halfway cases rounded away from zero.
    • "truncate" - Discards the fractional part of the number (truncates towards zero).

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • size (float or list of float or TensorList of float, optional) –

    The desired output size.

    Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on the other extents and the mode argument.

  • subpixel_scale (bool, optional, default = True) –

    If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.

    Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.

  • temp_buffer_hint (int, optional, default = 0) –

    Initial size in bytes, of a temporary buffer for resampling.

    Note

    This argument is ignored for the CPU variant.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'FHWC', 'CHW', 'FCHW', 'CFHW', 'DHWC', 'FDHWC', 'CDHW', 'FCDHW', 'CFDHW')) – Input to the operator.
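
The following is a minimal sketch of legacy usage of this operator, combining resize_shorter with max_size so that the shorter edge is scaled to 256 pixels while the longer edge is capped at 512; the "./images" directory and the pipeline parameters are placeholder assumptions, not part of the original documentation.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
resize = dali.ops.Resize(device="gpu", resize_shorter=256, max_size=512)

with pipe:
    files, labels = dali.fn.readers.file(file_root="./images")  # hypothetical dataset root
    images = dali.fn.decoders.image(files, device="mixed")
    images = resize(images)  # keeps aspect ratio; shorter edge becomes 256, longer edge at most 512
    pipe.set_outputs(images, labels)

pipe.build()
images_out, labels_out = pipe.run()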

class nvidia.dali.ops.Rotate(*, device='cpu', **kwargs)#

Rotates the images by the specified angle.

This operator supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • angle (float or TensorList of float) –

    Angle, in degrees, by which the image is rotated.

    For two-dimensional data, the rotation is counter-clockwise, assuming the top-left corner is at (0,0). For three-dimensional data, the angle is a positive rotation around the provided axis.

    Supports per-frame inputs.

  • axis (float or list of float or TensorList of float, optional, default = []) –

    Applies only to three-dimensional data and specifies the axis around which to rotate the image.

    The vector does not need to be normalized, but it must have a non-zero length. Reversing the vector is equivalent to changing the sign of angle.

    Supports per-frame inputs.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    If not set, the input type is used.

  • fill_value (float, optional) –

    Value used to fill areas that are outside the source image.

    If a value is not specified, the source coordinates are clamped and the border pixel is repeated.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.

  • keep_size (bool, optional, default = False) –

    If True, original canvas size is kept.

    If set to False (default), and the size is not set, the canvas size is adjusted to accommodate the rotated image with the least padding possible.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • size (float or list of float or TensorList of float, optional, default = []) –

    Output size, in pixels/points.

    Non-integer sizes are rounded to the nearest integer. The channel dimension should be excluded (for example, for RGB images, specify (480,640), not (480,640,3)).

  • output_dtype (nvidia.dali.types.DALIDataType) –

    Warning

    The argument output_dtype is a deprecated alias for dtype. Use dtype instead.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'FHWC', 'DHWC', 'FDHWC')) – Input to the operator.
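
As a hedged sketch of legacy usage (the dataset path and angle range are illustrative assumptions), the rotation angle can be supplied per sample from a random number generator object:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
rotate = dali.ops.Rotate(device="gpu", fill_value=0.0)   # keep_size=False (default): canvas grows to fit
angle_rng = dali.ops.random.Uniform(range=(-30, 30))

with pipe:
    files, labels = dali.fn.readers.file(file_root="./images")  # hypothetical dataset root
    images = dali.fn.decoders.image(files, device="mixed")
    images = rotate(images, angle=angle_rng())
    pipe.set_outputs(images, labels)

pipe.build()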

class nvidia.dali.ops.SSDRandomCrop(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use RandomBBoxCrop() instead.

Performs a random crop with bounding boxes where the Intersection over Union (IoU) meets a randomly selected threshold between 0 and 1.

When the IoU falls below the threshold, a new random crop is generated, up to num_attempts times. As inputs, the operator accepts an image, bounding boxes, and labels. It returns the cropped image, the cropped and valid bounding boxes, and the corresponding valid labels.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • num_attempts (int, optional, default = 1) – Number of attempts.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(*inputs, **kwargs)#

See nvidia.dali.ops.SSDRandomCrop() class for complete information.

class nvidia.dali.ops.Saturation(*, device='cpu', **kwargs)#

Changes the saturation level of the image.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    If not set, the input type is used.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the input and the output image.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • saturation (float or TensorList of float, optional, default = 1.0) –

    The saturation change factor.

    Values must be non-negative.

    Example values:

    • 0 - Completely desaturated image.

    • 1 - No change to image’s saturation.

    Supports per-frame inputs.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'FHWC', 'DHWC')) – Input to the operator.

class nvidia.dali.ops.SequenceReader(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use readers.Sequence() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.sequence().

class nvidia.dali.ops.SequenceRearrange(*, device='cpu', **kwargs)#

Rearranges frames in a sequence.

Assumes that the outermost dimension represents the frame index in the sequence. If the input has a non-empty layout description, it must start with F (frame).

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • new_order (int or list of int or TensorList of int) –

    List that describes the new order for the elements in each sample.

    Output sequence at position i will contain element new_order[i] from input sequence:

    out[i, ...] = in[new_order[i], ...]
    

    Elements can be repeated or dropped, but empty output sequences are not allowed. Only indices in [0, input_outermost_extent) are allowed to be used in new_order. Can be specified per sample as 1D tensors.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
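
Below is a minimal sketch of reordering frames with the legacy object; the external_source callable feeding a single "FHWC" sequence is an illustrative assumption, and device_id=0 presumes a GPU is available:

import numpy as np
import nvidia.dali as dali

def make_batch():
    # one sample: a sequence of 4 frames, each 2x3 pixels with 1 channel (layout "FHWC")
    return [np.arange(4 * 2 * 3 * 1, dtype=np.uint8).reshape(4, 2, 3, 1)]

pipe = dali.pipeline.Pipeline(batch_size=1, num_threads=1, device_id=0)
rearrange = dali.ops.SequenceRearrange(new_order=[3, 2, 1, 0])  # out[i, ...] = in[new_order[i], ...]

with pipe:
    seqs = dali.fn.external_source(source=make_batch, layout="FHWC", batch=True)
    pipe.set_outputs(rearrange(seqs))

pipe.build()
(reversed_seq,) = pipe.run()  # frames now appear in reverse order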

class nvidia.dali.ops.Shapes(*, device='cpu', **kwargs)#

Returns the shapes of inputs.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.INT64) – Data type to which the sizes are converted.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • type (nvidia.dali.types.DALIDataType) –

    Warning

    The argument type is a deprecated alias for dtype. Use dtype instead.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.Slice(*, device='cpu', **kwargs)#

Extracts a subtensor, or slice.

Note

For generic indexing and slicing you can use Python indexing syntax. See Indexing and Slicing for details.

The slice can be specified by providing the start and end coordinates, or the start coordinates and the shape of the slice. Both coordinates and shapes can be provided in absolute or relative terms.

The slice arguments can be specified by the following named arguments:

  1. start: Slice start coordinates (absolute)

  2. rel_start: Slice start coordinates (relative)

  3. end: Slice end coordinates (absolute)

  4. rel_end: Slice end coordinates (relative)

  5. shape: Slice shape (absolute)

  6. rel_shape: Slice shape (relative)

The slice can be configured by providing the start and end coordinates, or the start coordinates and the shape. Relative and absolute arguments can be mixed (for example, rel_start can be used with shape), as long as the start together with either the shape or the end is uniquely defined.

Alternatively, two extra positional inputs can be provided, specifying anchor and shape. When using positional inputs, two extra boolean arguments normalized_anchor/normalized_shape can be used to specify the nature of the arguments provided. Using positional inputs for anchor and shape is incompatible with the named arguments specified above.

Note

For GPU backend and positional inputs anchor and shape, both CPU and GPU data nodes are accepted, though CPU inputs are preferred. Providing those arguments as GPU inputs will result in an additional device-to-host copy with its associated synchronization point. When possible, provide anchor and shape as CPU inputs.

The slice arguments should provide as many dimensions as specified by the axis_names or axes arguments.

By default, the nvidia.dali.fn.slice() operator uses normalized coordinates and WH order for the slice arguments.

This operator supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int or TensorList of int, optional, default = [1, 0]) –

    Order of dimensions used for the anchor and shape slice inputs as dimension indices.

    Negative values are interpreted as counting dimensions from the back. Valid range: [-ndim, ndim-1], where ndim is the number of dimensions in the input data.

  • axis_names (layout str, optional, default = ‘WH’) –

    Order of the dimensions used for the anchor and shape slice inputs, as described in layout.

    If a value is provided, axis_names will have a higher priority than axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Supported types: FLOAT, FLOAT16, and UINT8.

    If not set, the input type is used.

  • end (int or list of int or TensorList of int, optional) –

    End coordinates of the slice.

    Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • fill_values (float or list of float, optional, default = [0.0]) –

    Determines padding values and is only relevant if out_of_bounds_policy is set to “pad”.

    If a scalar value is provided, it will be used for all the channels. If multiple values are provided, the number of values and channels must be identical (extent of dimension C in the layout) in the output slice.

  • image_type (nvidia.dali.types.DALIImageType) –

    Warning

    The argument image_type is no longer used and will be removed in a future release.

  • normalized_anchor (bool, optional, default = True) –

    Determines whether the anchor positional input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.

    Note

    This argument is only relevant when anchor data type is float. For integer types, the coordinates are always absolute.

  • normalized_shape (bool, optional, default = True) –

    Determines whether the shape positional input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.

    Note

    This argument is only relevant when anchor data type is float. For integer types, the coordinates are always absolute.

  • out_of_bounds_policy (str, optional, default = ‘error’) –

    Determines the policy when slicing the out of bounds area of the input.

    Here is a list of the supported values:

    • "error" (default): Attempting to slice outside of the bounds of the input will produce an error.

    • "pad": The input will be padded as needed with zeros or any other value that is specified with the fill_values argument.

    • "trim_to_shape": The slice window will be cut to the bounds of the input.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • rel_end (float or list of float or TensorList of float, optional) –

    End relative coordinates of the slice (range [0.0 - 1.0]).

    Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • rel_shape (float or list of float or TensorList of float, optional) –

    Relative shape of the slice (range [0.0 - 1.0]).

    Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • rel_start (float or list of float or TensorList of float, optional) –

    Start relative coordinates of the slice (range [0.0 - 1.0]).

    Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) –

    Shape of the slice.

    Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • start (int or list of int or TensorList of int, optional) –

    Start coordinates of the slice.

    Note: Providing named arguments start/end or start/shape is incompatible with providing positional inputs anchor and shape.

  • output_dtype (nvidia.dali.types.DALIDataType) –

    Warning

    The argument output_dtype is a deprecated alias for dtype. Use dtype instead.

__call__(data, anchor=None, shape=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __data (TensorList) – Batch that contains the input data.

  • __anchor (1D TensorList of float or int, optional) –

    (Optional) Input that contains normalized or absolute coordinates for the starting point of the slice (x0, x1, x2, …).

    Integer coordinates are interpreted as absolute coordinates, while float coordinates can be interpreted as absolute or relative coordinates, depending on the value of normalized_anchor.

  • __shape (1D TensorList of float or int, optional) –

    (Optional) Input that contains normalized or absolute coordinates for the dimensions of the slice (s0, s1, s2, …).

    Integer coordinates are interpreted as absolute coordinates, while float coordinates can be interpreted as absolute or relative coordinates, depending on the value of normalized_shape.
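
For illustration, here is a hedged sketch that takes the central 50% of each image using the named relative arguments; axis_names="HW" and the dataset path are assumptions made for this example:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
central_crop = dali.ops.Slice(device="gpu", axis_names="HW",
                              rel_start=(0.25, 0.25), rel_shape=(0.5, 0.5))

with pipe:
    files, labels = dali.fn.readers.file(file_root="./images")  # hypothetical dataset root
    images = dali.fn.decoders.image(files, device="mixed")
    images = central_crop(images)  # slice covers rows/columns from 25% to 75% of each extent
    pipe.set_outputs(images, labels)

pipe.build()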

class nvidia.dali.ops.Spectrogram(*, device='cpu', **kwargs)#

Produces a spectrogram from a 1D signal (for example, audio).

Input data is expected to be one channel (shape being (nsamples,), (nsamples, 1), or (1, nsamples)) of type float32.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • center_windows (bool, optional, default = True) –

    Indicates whether extracted windows should be padded so that the window function is centered at multiples of window_step.

    If set to False, the signal will not be padded, that is, only windows within the input range will be extracted.

  • layout (layout str, optional, default = ‘ft’) – Output layout: “ft” (frequency-major) or “tf” (time-major).

  • nfft (int, optional) –

    Size of the FFT.

    The number of bins that are created in the output is nfft // 2 + 1.

    Note

    The output only represents the positive part of the spectrum.

  • power (int, optional, default = 2) –

    Exponent of the magnitude of the spectrum.

    Supported values:

    • 1 - amplitude,

    • 2 - power (faster to compute).

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reflect_padding (bool, optional, default = True) –

    Indicates the padding policy when sampling outside the bounds of the signal.

    If set to True, the signal is mirrored with respect to the boundary, otherwise the signal is padded with zeros.

    Note

    When center_windows is set to False, this option is ignored.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • window_fn (float or list of float, optional, default = []) –

    Samples of the window function that will be multiplied to each extracted window when calculating the STFT.

    If a value is provided, it should be a list of floating point numbers of size window_length. If a value is not provided, a Hann window will be used.

  • window_length (int, optional, default = 512) – Window size in number of samples.

  • window_step (int, optional, default = 256) – Step between the STFT windows in number of samples.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.Sphere(*, device='cpu', **kwargs)#

Performs a sphere augmentation.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • fill_value (float, optional, default = 0.0) – Color value that is used for padding.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

  • mask (int or TensorList of int, optional, default = 1) –

    Determines whether to apply this augmentation to the input image.

    Here are the values:

    • 0: Do not apply this transformation.

    • 1: Apply this transformation.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC')) – Input to the operator.

class nvidia.dali.ops.Squeeze(*, device='cpu', **kwargs)#

Removes the dimensions given as axes or axis_names.

It’s an error to remove a dimension that would cause the total volume to change.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int or TensorList of int, optional, default = []) –

    Indices of dimensions which should be removed.

    All squeezed dimensions should have size 1, unless the total volume of the tensor is 0 before and after squeeze. All indices must be in the range of valid dimensions of the input.

  • axis_names (layout str, optional, default = ‘’) –

    Layout columns which should be removed.

    All squeezed dimensions should have size 1, unless the total volume of the tensor is 0 before and after squeeze. All layout names should be present in data layout.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(data, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__data (TensorList) – Data to be squeezed

class nvidia.dali.ops.Stack(*, device='cpu', **kwargs)#

Joins the input tensors along a new axis.

The shapes of respective tensors in the inputs must match.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axis (int, optional, default = 0) –

    The axis in the output tensor along which the inputs are stacked.

    The axis is inserted before a corresponding axis in the inputs. A value of 0 indicates that whole tensors are stacked. Specifying axis equal to the number of dimensions in the inputs causes the values from the inputs to be interleaved.

    Accepted range is [-ndim, ndim]. Negative indices are counted from the back.

  • axis_name (str, optional) –

    Name of the new axis to be inserted.

    A one-character string that will denote the new axis in the output layout. The output layout will be constructed by inserting that character into the input layout at the position indicated by axis. For example, specifying axis = 0 and axis_name = "C" with input layout “HW” will yield the output layout “CHW”.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(*inputs, **kwargs)#

See nvidia.dali.ops.Stack() class for complete information.
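
A small sketch of stacking two data nodes along a new leading axis; the external_source callables producing 4x6 planes are purely illustrative assumptions:

import numpy as np
import nvidia.dali as dali

def plane_a():
    return [np.zeros((4, 6), dtype=np.float32)]

def plane_b():
    return [np.ones((4, 6), dtype=np.float32)]

pipe = dali.pipeline.Pipeline(batch_size=1, num_threads=1, device_id=0)
stack = dali.ops.Stack(axis=0, axis_name="C")  # insert a new axis named "C" in front

with pipe:
    a = dali.fn.external_source(source=plane_a, layout="HW", batch=True)
    b = dali.fn.external_source(source=plane_b, layout="HW", batch=True)
    pipe.set_outputs(stack(a, b))  # output sample shape (2, 4, 6), layout "CHW"

pipe.build()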

class nvidia.dali.ops.TFRecordReader(path, index_path, features, **kwargs)#

Warning

This operator is now deprecated. Use readers.TFRecord() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.tfrecord().

class nvidia.dali.ops.ToDecibels(*, device='cpu', **kwargs)#

Converts a magnitude (real, positive) to the decibel scale.

Conversion is done according to the following formula:

min_ratio = pow(10, cutoff_db / multiplier)
out[i] = multiplier * log10( max(min_ratio, input[i] / reference) )

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • cutoff_db (float, optional, default = -200.0) –

    Minimum or cut-off ratio in dB.

    Any value below this value will saturate. For example, a value of cutoff_db=-80 corresponds to a minimum ratio of 1e-8.

  • multiplier (float, optional, default = 10.0) – Factor by which the logarithm is multiplied. The value is typically 10.0 or 20.0, which depends on whether the magnitude is squared.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reference (float, optional, default = 0.0) –

    Reference magnitude.

    If a value is not provided, the maximum value for the input will be used as reference.

    Note

    The maximum of the input will be calculated on a per-sample basis.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
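
As a hedged sketch of a typical audio chain (the "./audio" directory and the parameter values are assumptions for this example), a power spectrogram can be converted to decibels as follows; with multiplier=10 and cutoff_db=-80, the minimum ratio is 1e-8:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=1, num_threads=1, device_id=0)
spectrogram = dali.ops.Spectrogram(nfft=512, window_length=512, window_step=256)  # power=2 by default
to_db = dali.ops.ToDecibels(multiplier=10.0, reference=1.0, cutoff_db=-80.0)

with pipe:
    encoded, _ = dali.fn.readers.file(file_root="./audio")       # hypothetical directory of audio files
    audio, rate = dali.fn.decoders.audio(encoded, downmix=True)  # 1D mono waveform
    spec = spectrogram(audio)
    spec_db = to_db(spec)  # 10 * log10(max(1e-8, spec / 1.0))
    pipe.set_outputs(spec_db, rate)

pipe.build()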

class nvidia.dali.ops.TorchPythonFunction(*, device='cpu', **kwargs)#

Executes a function that is operating on Torch tensors.

This class is analogous to nvidia.dali.fn.python_function() but the tensor data is handled as PyTorch tensors.

This operator allows sequence inputs and supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • function (object) –

    A callable object that defines the function of the operator.

    Warning

    The function must not hold a reference to the pipeline in which it is used. If it does, a circular reference to the pipeline will form and the pipeline will never be freed.

  • batch_processing (bool, optional, default = True) – Determines whether the function gets an entire batch as an input.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • num_outputs (int, optional, default = 1) – Number of outputs.

  • output_layouts (layout str or list of layout str, optional) –

    Tensor data layouts for the outputs.

    This argument can be a list that contains a distinct layout for each output. If the list has fewer than num_outputs elements, only the first outputs have the layout set and the rest of the outputs have no layout assigned.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(*inputs, **kwargs)#

See nvidia.dali.ops.TorchPythonFunction() class for complete information.
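
Below is a hedged sketch of wiring a Torch callable into the legacy graph. Python-function operators generally require a pipeline built with exec_async=False and exec_pipelined=False; the normalization function and dataset path are illustrative assumptions:

import torch
import nvidia.dali as dali

def normalize_batch(tensors):
    # with batch_processing=True (the default), `tensors` is a list of torch tensors, one per sample
    return [t.float() - t.float().mean() for t in tensors]

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=1, device_id=0,
                              exec_async=False, exec_pipelined=False)
torch_fn = dali.ops.TorchPythonFunction(function=normalize_batch, num_outputs=1)

with pipe:
    files, labels = dali.fn.readers.file(file_root="./images")  # hypothetical dataset root
    images = dali.fn.decoders.image(files, device="cpu")
    pipe.set_outputs(torch_fn(images), labels)

pipe.build()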

class nvidia.dali.ops.Transpose(*, device='cpu', **kwargs)#

Transposes the tensors by reordering the dimensions based on the perm parameter.

Destination dimension i is obtained from source dimension perm[i].

For example, for a source image with HWC layout, shape = (100, 200, 3), and perm = [2, 0, 1], it will produce a destination image with CHW layout and shape = (3, 100, 200), holding the equality:

\[dst(x_2, x_0, x_1) = src(x_0, x_1, x_2)\]

which is equivalent to:

\[dst(x_{perm[0]}, x_{perm[1]}, x_{perm[2]}) = src(x_0, x_1, x_2)\]

for all valid coordinates.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • output_layout (layout str, optional, default = ‘’) –

    Explicitly sets the output data layout.

    If this argument is specified, transpose_layout is ignored.

  • perm (int or list of int, optional) –

    Permutation of the dimensions of the input, for example, [2, 0, 1].

    If not given, the dimensions are reversed.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • transpose_layout (bool, optional, default = True) –

    When set to True, the axis names in the output data layout are permuted according to perm. Otherwise, the input layout is copied to the output.

    If output_layout is set, this argument is ignored.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
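
For example (as a sketch under the same placeholder dataset assumption), a decoded HWC image can be converted to CHW with perm=[2, 0, 1]:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
to_chw = dali.ops.Transpose(device="gpu", perm=[2, 0, 1])  # HWC -> CHW

with pipe:
    files, labels = dali.fn.readers.file(file_root="./images")  # hypothetical dataset root
    images = dali.fn.decoders.image(files, device="mixed")      # layout "HWC"
    chw = to_chw(images)  # shape (3, H, W); layout becomes "CHW" since transpose_layout=True
    pipe.set_outputs(chw, labels)

pipe.build()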

class nvidia.dali.ops.Uniform(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use random.Uniform() instead.

Generates random numbers following a uniform distribution.

It can be configured to produce a continuous uniform distribution in the range [min, max), or a discrete uniform distribution where any of the specified values [v0, v1, …, vn] occur with equal probability.

The shape of the generated data can be either specified explicitly with a shape argument, or chosen to match the shape of the __shape_like input, if provided. If neither is present, a single value per sample is generated.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Note

    The generated numbers are converted to the output data type, rounding and clamping if necessary.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • range (float or list of float or TensorList of float, optional, default = [-1.0, 1.0]) –

    Range [min, max) of a continuous uniform distribution.

    This argument is mutually exclusive with values.

    Warning

    When specifying an integer type as dtype, the generated numbers can go outside the specified range, due to rounding.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

  • values (float or list of float or TensorList of float, optional) –

    The discrete values [v0, v1, …, vn] produced by a discrete uniform distribution.

    This argument is mutually exclusive with range.

__call__(shape_like=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__shape_like (TensorList, optional) – Shape of this input will be used to infer the shape of the output, if provided.
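
Since this class is deprecated, here is a short sketch using the recommended random.Uniform object, illustrating the continuous and discrete modes (the values chosen are arbitrary examples):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=1, device_id=0)
continuous = dali.ops.random.Uniform(range=(0.0, 1.0))                # one float per sample in [0, 1)
discrete = dali.ops.random.Uniform(values=[0.0, 90.0, 180.0, 270.0])  # each value equally likely

with pipe:
    pipe.set_outputs(continuous(), discrete())

pipe.build()
cont, disc = pipe.run()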

class nvidia.dali.ops.VideoReader(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use readers.Video() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.video().

class nvidia.dali.ops.VideoReaderResize(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use readers.VideoResize() instead.

In DALI 1.0 all readers were moved into a dedicated readers submodule and renamed to follow a common pattern. This is a placeholder operator with identical functionality to allow for backward compatibility.

Legacy alias for readers.video_resize().

class nvidia.dali.ops.WarpAffine(*, device='cpu', **kwargs)#

Applies an affine transformation to the images.

This operator supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    If not set, the input type is used.

  • fill_value (float, optional) –

    Value used to fill areas that are outside the source image.

    If a value is not specified, the source coordinates are clamped and the border pixel is repeated.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.

  • inverse_map (bool, optional, default = True) – Set to True (default) if the given transform is a destination-to-source mapping, or to False if it maps source to destination coordinates.

  • matrix (float or list of float or TensorList of float, optional, default = []) –

    Transform matrix.

    When the inverse_map option is set to true (default), the matrix represents a destination to source mapping. With a list of values (M11, M12, M13, M21, M22, M23), this operation produces a new image by using the following formula:

    dst(x,y) = src(M11 * x + M12 * y + M13, M21 * x + M22 * y + M23)
    

    Here, the [0, 0] coordinate denotes the corner of the first pixel.

    If the inverse_map option is set to false, the matrix represents a source to destination transform and it is inverted before applying the formula above.

    It is equivalent to OpenCV’s warpAffine operation, with the inverse_map argument being analogous to the WARP_INVERSE_MAP flag.

    Note

    Instead of this argument, the operator can take a second positional input, in which case the matrix can be placed on the GPU.

    Supports per-frame inputs.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • size (float or list of float or TensorList of float, optional, default = []) –

    Output size, in pixels/points.

    Non-integer sizes are rounded to the nearest integer. The channel dimension should be excluded (for example, for RGB images, specify (480,640), not (480,640,3)).

  • output_dtype (nvidia.dali.types.DALIDataType) –

    Warning

    The argument output_dtype is a deprecated alias for dtype. Use dtype instead.

__call__(data, mtx=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __data (TensorList ('HWC', 'FHWC', 'DHWC', 'FDHWC')) – The image or volume to be warped

  • __mtx (TensorList of float, optional) – Like matrix argument, but can be placed in GPU memory
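
A hedged sketch of a pure translation using the default destination-to-source convention (inverse_map=True); the matrix values and the dataset path are illustrative assumptions:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
# dst(x, y) = src(x - 20, y - 10): the content shifts 20 px right and 10 px down
warp = dali.ops.WarpAffine(device="gpu", fill_value=0.0,
                           matrix=[1.0, 0.0, -20.0,
                                   0.0, 1.0, -10.0])

with pipe:
    files, labels = dali.fn.readers.file(file_root="./images")  # hypothetical dataset root
    images = dali.fn.decoders.image(files, device="mixed")
    images = warp(images)
    pipe.set_outputs(images, labels)

pipe.build()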

class nvidia.dali.ops.Water(*, device='cpu', **kwargs)#

Performs a water augmentation, which makes the image appear to be underwater.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • ampl_x (float, optional, default = 10.0) – Amplitude of the wave in the x direction.

  • ampl_y (float, optional, default = 10.0) – Amplitude of the wave in the y direction.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • fill_value (float, optional, default = 0.0) – Color value that is used for padding.

  • freq_x (float, optional, default = 0.049087) – Frequency of the wave in the x direction.

  • freq_y (float, optional, default = 0.049087) – Frequency of the wave in the y direction.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.

  • mask (int or TensorList of int, optional, default = 1) –

    Determines whether to apply this augmentation to the input image.

    Here are the values:

    • 0: Do not apply this transformation.

    • 1: Apply this transformation.

  • phase_x (float, optional, default = 0.0) – Phase of the wave in the x direction.

  • phase_y (float, optional, default = 0.0) – Phase of the wave in the y direction.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC')) – Input to the operator.

class nvidia.dali.ops.Zeros(*, device='cpu', **kwargs)#

Returns new data of given shape and type, filled with zeros.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.INT32) – Output data type.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.

class nvidia.dali.ops.ZerosLike(*, device='cpu', **kwargs)#

Returns new data with the same shape and type as the input array, filled with zeros.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.INT32) – Overrides the output data type.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(data_like, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__data_like (TensorList) – The input data value to copy the shape and type from.

nvidia.dali.ops.python_op_factory(name, schema_name, internal_schema_name=None, generated=True)#

Generate the ops API class bindings for operator.

Parameters:
  • name (str) – The name of the operator (without the module) - this will be the name of the class

  • schema_name (str) – Name of the schema, used for documentation lookups and schema/spec retrieval unless internal_schema_name is provided

  • internal_schema_name (str, optional) – If provided, this will be the schema used to process the arguments, by default None

  • generated (bool, optional) – Mark this class as fully generated API binding (True), or as a (base) class used for manually extending the binding code (False), by default True.

nvidia.dali.ops.decoders#

class nvidia.dali.ops.decoders.Audio(*, device='cpu', **kwargs)#

Decodes waveforms from encoded audio data.

It supports the following audio formats: wav, flac and ogg. This operator produces the following outputs:

  • output[0]: A batch of decoded data

  • output[1]: A batch of sampling rates [Hz].

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • downmix (bool, optional, default = False) –

    If set to True, downmix all input channels to mono.

    If downmixing is turned on, the decoder output is 1D. If downmixing is turned off, it produces 2D output with interleaved channels.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –

    Output data type.

    Supported types: INT16, INT32, FLOAT.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • quality (float, optional, default = 50.0) –

    Resampling quality, where 0 is the lowest, and 100 is the highest.

    0 gives 3 lobes of the sinc filter, 50 gives 16 lobes, and 100 gives 64 lobes.

  • sample_rate (float or TensorList of float, optional, default = 0.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.decoders.Image(*, device='cpu', **kwargs)#

Decodes images.

For jpeg images, depending on the backend selected (“mixed” and “cpu”), the implementation uses the nvJPEG library or libjpeg-turbo, respectively. Other image formats are decoded with OpenCV or other specific libraries, such as libtiff.

If used with a mixed backend, and the hardware is available, the operator will use a dedicated hardware decoder.

Warning

Due to performance reasons, hardware decoder is disabled for driver < 455.x

The output of the decoder is in HWC layout.

Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM, JPEG 2000, WebP. Please note that GPU acceleration for JPEG 2000 decoding is only available for CUDA 11 and newer.

Note

WebP decoding currently only supports the simple file format (lossy and lossless compression). For details on the different WebP file formats, see https://developers.google.com/speed/webp/docs/riff_container

Note

EXIF orientation metadata is disregarded.

Supported backends
  • ‘cpu’

  • ‘mixed’

Keyword Arguments:
  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • cache_batch_copy (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, multiple images from the cache are copied with a batched copy kernel call. Otherwise, unless the order in the batch is the same as in the cache, each image is copied with cudaMemcpy.

  • cache_debug (bool, optional, default = False) –

    Applies only to the mixed backend type.

    Prints the debug information about the decoder cache.

  • cache_size (int, optional, default = 0) –

    Applies only to the mixed backend type.

    Total size of the decoder cache in megabytes. When provided, the decoded images that are larger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The size threshold, in bytes, for decoded images to be cached. When an image is cached, it no longer needs to be decoded when it is encountered at the operator input, which saves processing time.

  • cache_type (str, optional, default = ‘’) –

    Applies only to the mixed backend type.

    Here is a list of the available cache types:

    • threshold: caches every image with a size that is larger than cache_threshold until
      the cache is full.

      The warm-up time for threshold policy is 1 epoch.

    • largest: stores the largest images that can fit in the cache.
      The warm-up time for largest policy is 2 epochs

      Note

      To take advantage of caching, it is recommended to configure readers with stick_to_shard=True to limit the amount of unique images seen by each decoder instance in a multi node environment.

  • device_memory_padding (int, optional, default = 16777216) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.

  • device_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.

  • host_memory_padding (int, optional, default = 8388608) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.

  • host_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.

  • hw_decoder_load (float, optional, default = 0.65) –

    The percentage of the image data to be processed by the HW JPEG decoder.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100

  • hybrid_huffman_threshold (int, optional, default = 1000000) –

    Applies only to the mixed backend type.

    Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.

    Note

    Hybrid Huffman decoder still largely uses the CPU.

  • jpeg_fancy_upsampling (bool, optional, default = False) –

    Makes the mixed backend use the same chroma upsampling approach as the cpu backend.

    The option corresponds to the JPEG fancy upsampling available in libjpeg-turbo or ImageMagick.

  • memory_stats (bool, optional, default = False) –

    Applies only to the mixed backend type.

    Prints debug information about nvJPEG allocations. The information about the largest allocation might be useful to determine suitable values for device_memory_padding and host_memory_padding for a dataset.

    Note

    The statistics are global for the entire process, not per operator instance, and include the allocations made during construction if the padding hints are non-zero.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) –

    The color space of the output image.

    Note: When decoding to YCbCr, the image will be decoded to RGB and then converted to YCbCr, following the YCbCr definition from ITU-R BT.601.

  • preallocate_height_hint (int, optional, default = 0) –

    Image height hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preallocate_width_hint (int, optional, default = 0) –

    Image width hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • split_stages (bool) –

    Warning

    The argument split_stages is no longer used and will be removed in a future release.

  • use_chunk_allocator (bool) –

    Warning

    The argument use_chunk_allocator is no longer used and will be removed in a future release.

  • use_fast_idct (bool, optional, default = False) –

    Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when the device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.

    According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
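
A minimal sketch of GPU-accelerated decoding with the legacy object (the dataset path and tuning values are assumptions for this example); with the mixed backend, the output resides in GPU memory:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=8, num_threads=4, device_id=0)
decoder = dali.ops.decoders.Image(device="mixed",
                                  output_type=dali.types.RGB,
                                  hw_decoder_load=0.65)

with pipe:
    encoded, labels = dali.fn.readers.file(file_root="./images")  # hypothetical dataset root
    images = decoder(encoded)  # HWC layout, RGB
    pipe.set_outputs(images, labels)

pipe.build()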

class nvidia.dali.ops.decoders.ImageCrop(*, device='cpu', **kwargs)#

Decodes images and extracts regions-of-interest (ROI) that are specified by fixed window dimensions and variable anchors.

When possible, the operator uses the ROI decoding APIs (for example, libjpeg-turbo and nvJPEG) to reduce the decoding time and memory usage. When ROI decoding is not supported for a given image format, the operator decodes the entire image and crops the selected ROI.

The output of the decoder is in HWC layout.

Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM, JPEG 2000, WebP.

Note

JPEG 2000 region-of-interest (ROI) decoding is not accelerated on the GPU, and will use a CPU implementation regardless of the selected backend. For a GPU accelerated implementation, consider using separate decoders.image and crop operators.

Note

EXIF orientation metadata is disregarded.

Supported backends
  • ‘cpu’

  • ‘mixed’

Keyword Arguments:
  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • crop (float or list of float or TensorList of float, optional) –

    Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).

    Providing crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.

  • crop_d (float or TensorList of float, optional, default = 0.0) –

    Applies only to volumetric inputs; cropping window depth (in voxels).

    crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).

  • crop_h (float or TensorList of float, optional, default = 0.0) –

    Cropping the window height (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).

    The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.

    See rounding argument for more details on how crop_x is converted to an integral value.

  • crop_pos_y (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).

    The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.

    See rounding argument for more details on how crop_y is converted to an integral value.

  • crop_pos_z (float or TensorList of float, optional, default = 0.5) –

    Applies only to volumetric inputs.

    Normalized (0.0 - 1.0) depth position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image, and crop_D is the depth of the cropping window.

    See rounding argument for more details on how crop_z is converted to an integral value.

  • crop_w (float or TensorList of float, optional, default = 0.0) –

    Cropping window width (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • device_memory_padding (int, optional, default = 16777216) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.

  • device_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.

  • host_memory_padding (int, optional, default = 8388608) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.

  • host_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.

  • hw_decoder_load (float, optional, default = 0.65) –

    The percentage of the image data to be processed by the HW JPEG decoder.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100

  • hybrid_huffman_threshold (int, optional, default = 1000000) –

    Applies only to the mixed backend type.

    Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.

    Note

    Hybrid Huffman decoder still largely uses the CPU.

  • jpeg_fancy_upsampling (bool, optional, default = False) –

    Make the mixed backend use the same chroma upsampling approach as the cpu one.

    The option corresponds to the JPEG fancy upsampling available in libjpeg-turbo or ImageMagick.

  • memory_stats (bool, optional, default = False) –

    Applies only to the mixed backend type.

    Prints debug information about nvJPEG allocations. The information about the largest allocation might be useful to determine suitable values for device_memory_padding and host_memory_padding for a dataset.

    Note

    The statistics are global for the entire process, not per operator instance, and include the allocations made during construction if the padding hints are non-zero.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) –

    The color space of the output image.

    Note: When decoding to YCbCr, the image will be decoded to RGB and then converted to YCbCr, following the YCbCr definition from ITU-R BT.601.

  • preallocate_height_hint (int, optional, default = 0) –

    Image height hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preallocate_width_hint (int, optional, default = 0) –

    Image width hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • rounding (str, optional, default = ‘round’) –

    Determines the rounding function used to convert the starting coordinate of the window to an integral value (see crop_pos_x, crop_pos_y, crop_pos_z).

    Possible values are:

    • "round" - Rounds to the nearest integer value, with halfway cases rounded away from zero.
    • "truncate" - Discards the fractional part of the number (truncates towards zero).

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • split_stages (bool) –

    Warning

    The argument split_stages is no longer used and will be removed in a future release.

  • use_chunk_allocator (bool) –

    Warning

    The argument use_chunk_allocator is no longer used and will be removed in a future release.

  • use_fast_idct (bool, optional, default = False) –

    Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.

    According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
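
For illustration, the following sketch shows how the operator object could be used in a simple pipeline. The batch size and the "./my_file_root" dataset path are placeholders, not part of the operator definition:

import nvidia.dali as dali

# "./my_file_root" is a hypothetical path; replace it with an actual image dataset.
pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_file_root")
decode_crop = dali.ops.decoders.ImageCrop(
    device="mixed",      # use GPU ROI decoding where the format supports it
    crop=(224, 224),     # fixed window size: (crop_H, crop_W)
    crop_pos_x=0.5,      # anchor the window at the image center
    crop_pos_y=0.5)

with pipe:
    files, labels = reader()
    images = decode_crop(files)   # decoded HWC images, already cropped
    pipe.set_outputs(images, labels)

pipe.build()
images_out, labels_out = pipe.run()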

class nvidia.dali.ops.decoders.ImageRandomCrop(*, device='cpu', **kwargs)#

Decodes images and randomly crops them.

The cropping window’s area (relative to the entire image) and aspect ratio can be restricted to a range of values specified by area and aspect_ratio arguments, respectively.

When possible, the operator uses the ROI decoding APIs (for example, libjpeg-turbo and nvJPEG) to reduce the decoding time and memory usage. When the ROI decoding is not supported for a given image format, it will decode the entire image and crop the selected ROI.

The output of the decoder is in HWC layout.

Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM, JPEG 2000, WebP.

Note

JPEG 2000 region-of-interest (ROI) decoding is not accelerated on the GPU, and will use a CPU implementation regardless of the selected backend. For a GPU accelerated implementation, consider using separate decoders.image and random_crop operators.

Note

EXIF orientation metadata is disregarded.

Supported backends
  • ‘cpu’

  • ‘mixed’

Keyword Arguments:
  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • device_memory_padding (int, optional, default = 16777216) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.

  • device_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.

  • host_memory_padding (int, optional, default = 8388608) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.

  • host_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.

  • hw_decoder_load (float, optional, default = 0.65) –

    The percentage of the image data to be processed by the HW JPEG decoder.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100

  • hybrid_huffman_threshold (int, optional, default = 1000000) –

    Applies only to the mixed backend type.

    Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.

    Note

    Hybrid Huffman decoder still largely uses the CPU.

  • jpeg_fancy_upsampling (bool, optional, default = False) –

    Make the mixed backend use the same chroma upsampling approach as the cpu one.

    The option corresponds to the JPEG fancy upsampling available in libjpeg-turbo or ImageMagick.

  • memory_stats (bool, optional, default = False) –

    Applies only to the mixed backend type.

    Prints debug information about nvJPEG allocations. The information about the largest allocation might be useful to determine suitable values for device_memory_padding and host_memory_padding for a dataset.

    Note

    The statistics are global for the entire process, not per operator instance, and include the allocations made during construction if the padding hints are non-zero.

  • num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) –

    The color space of the output image.

    Note: When decoding to YCbCr, the image will be decoded to RGB and then converted to YCbCr, following the YCbCr definition from ITU-R BT.601.

  • preallocate_height_hint (int, optional, default = 0) –

    Image height hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preallocate_width_hint (int, optional, default = 0) –

    Image width hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_area (float or list of float, optional, default = [0.08, 1.0]) –

    Range from which to choose random area fraction A.

    The cropped image’s area will be equal to A * original image’s area.

  • random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • split_stages (bool) –

    Warning

    The argument split_stages is no longer used and will be removed in a future release.

  • use_chunk_allocator (bool) –

    Warning

    The argument use_chunk_allocator is no longer used and will be removed in a future release.

  • use_fast_idct (bool, optional, default = False) –

    Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.

    According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
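
For example, the following sketch (with a placeholder "./my_file_root" dataset path) decodes images and takes a random crop covering 30-100% of the original area:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_file_root")
decode_rc = dali.ops.decoders.ImageRandomCrop(
    device="mixed",
    random_area=[0.3, 1.0],            # crop area as a fraction of the image area
    random_aspect_ratio=[0.75, 1.33],  # width/height range of the crop window
    num_attempts=10)

with pipe:
    files, labels = reader()
    images = decode_rc(files)          # decoded HWC images, randomly cropped
    pipe.set_outputs(images, labels)

pipe.build()
images_out, labels_out = pipe.run()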

class nvidia.dali.ops.decoders.ImageSlice(*, device='cpu', **kwargs)#

Decodes images and extracts regions of interest.

The slice can be specified by providing the start and end coordinates, or the start coordinates and the shape of the slice. Both coordinates and shapes can be provided in absolute or relative terms.

The slice arguments can be specified by the following named arguments:

  1. start: Slice start coordinates (absolute)

  2. rel_start: Slice start coordinates (relative)

  3. end: Slice end coordinates (absolute)

  4. rel_end: Slice end coordinates (relative)

  5. shape: Slice shape (absolute)

  6. rel_shape: Slice shape (relative)

The slice can be configured by providing start and end coordinates or start and shape. Relative and absolute arguments can be mixed (for example, rel_start can be used with shape) as long as start and shape or end are uniquely defined.

Alternatively, two extra positional inputs can be provided, specifying anchor and shape. When using positional inputs, two extra boolean arguments normalized_anchor/normalized_shape can be used to specify the nature of the arguments provided. Using positional inputs for anchor and shape is incompatible with the named arguments specified above.

The slice arguments should provide as many dimensions as specified by the axis_names or axes arguments.

By default, the nvidia.dali.fn.decoders.image_slice() operator uses normalized coordinates and “WH” order for the slice arguments.

When possible, the operator uses the ROI decoding APIs (for example, libjpeg-turbo and nvJPEG) to reduce the decoding time and memory usage. When the ROI decoding is not supported for a given image format, it will decode the entire image and crop the selected ROI.

The output of the decoder is in the HWC layout.

Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM, JPEG 2000, WebP.

Note

JPEG 2000 region-of-interest (ROI) decoding is not accelerated on the GPU, and will use a CPU implementation regardless of the selected backend. For a GPU accelerated implementation, consider using separate decoders.image and slice operators.

Note

EXIF orientation metadata is disregarded.

Supported backends
  • ‘cpu’

  • ‘mixed’

Keyword Arguments:
  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • axes (int or list of int or TensorList of int, optional, default = [1, 0]) –

    Order of dimensions used for the anchor and shape slice inputs as dimension indices.

    Negative values are interpreted as counting dimensions from the back. Valid range: [-ndim, ndim-1], where ndim is the number of dimensions in the input data.

  • axis_names (layout str, optional, default = ‘WH’) –

    Order of the dimensions used for the anchor and shape slice inputs, as described in layout.

    If a value is provided, axis_names will have a higher priority than axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • device_memory_padding (int, optional, default = 16777216) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.

  • device_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.

  • end (int or list of int or TensorList of int, optional) –

    End coordinates of the slice.

    Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • host_memory_padding (int, optional, default = 8388608) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.

  • host_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.

  • hw_decoder_load (float, optional, default = 0.65) –

    The percentage of the image data to be processed by the HW JPEG decoder.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100

  • hybrid_huffman_threshold (int, optional, default = 1000000) –

    Applies only to the mixed backend type.

    Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.

    Note

    Hybrid Huffman decoder still largely uses the CPU.

  • jpeg_fancy_upsampling (bool, optional, default = False) –

    Make the mixed backend use the same chroma upsampling approach as the cpu one.

    The option corresponds to the JPEG fancy upsampling available in libjpeg-turbo or ImageMagick.

  • memory_stats (bool, optional, default = False) –

    Applies only to the mixed backend type.

    Prints debug information about nvJPEG allocations. The information about the largest allocation might be useful to determine suitable values for device_memory_padding and host_memory_padding for a dataset.

    Note

    The statistics are global for the entire process, not per operator instance, and include the allocations made during construction if the padding hints are non-zero.

  • normalized_anchor (bool, optional, default = True) –

    Determines whether the anchor positional input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.

    Note

    This argument is only relevant when anchor data type is float. For integer types, the coordinates are always absolute.

  • normalized_shape (bool, optional, default = True) –

    Determines whether the shape positional input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.

    Note

    This argument is only relevant when anchor data type is float. For integer types, the coordinates are always absolute.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) –

    The color space of the output image.

    Note: When decoding to YCbCr, the image will be decoded to RGB and then converted to YCbCr, following the YCbCr definition from ITU-R BT.601.

  • preallocate_height_hint (int, optional, default = 0) –

    Image height hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preallocate_width_hint (int, optional, default = 0) –

    Image width hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • rel_end (float or list of float or TensorList of float, optional) –

    End relative coordinates of the slice (range [0.0 - 1.0]).

    Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • rel_shape (float or list of float or TensorList of float, optional) –

    Relative shape of the slice (range [0.0 - 1.0]).

    Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • rel_start (float or list of float or TensorList of float, optional) –

    Start relative coordinates of the slice (range [0.0 - 1.0]).

    Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) –

    Shape of the slice.

    Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • split_stages (bool) –

    Warning

    The argument split_stages is no longer used and will be removed in a future release.

  • start (int or list of int or TensorList of int, optional) –

    Start coordinates of the slice.

    Note: Providing named arguments start/end or start/shape is incompatible with providing positional inputs anchor and shape.

  • use_chunk_allocator (bool) –

    Warning

    The argument use_chunk_allocator is no longer used and will be removed in a future release.

  • use_fast_idct (bool, optional, default = False) –

    Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.

    According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.

__call__(data, anchor=None, shape=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __data (TensorList) – Batch that contains the input data.

  • __anchor (1D TensorList of float or int, optional) –

    Input that contains normalized or absolute coordinates for the starting point of the slice (x0, x1, x2, …).

    Integer coordinates are interpreted as absolute coordinates, while float coordinates can be interpreted as absolute or relative coordinates, depending on the value of normalized_anchor.

  • __shape (1D TensorList of float or int, optional) –

    Input that contains normalized or absolute coordinates for the dimensions of the slice (s0, s1, s2, …).

    Integer coordinates are interpreted as absolute coordinates, while float coordinates can be interpreted as absolute or relative coordinates, depending on the value of normalized_shape.
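
For example, the following sketch (the "./my_file_root" path is a placeholder) decodes only the central region of each image, spanning half the width and half the height, using the named relative-slice arguments:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_file_root")
decode_slice = dali.ops.decoders.ImageSlice(
    device="mixed",
    axis_names="WH",          # slice arguments are given in (x, y) order
    rel_start=(0.25, 0.25),   # ROI starts at 25% of the width and height
    rel_shape=(0.5, 0.5))     # ROI spans half of the width and height

with pipe:
    files, labels = reader()
    images = decode_slice(files)
    pipe.set_outputs(images, labels)

pipe.build()
images_out, labels_out = pipe.run()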

nvidia.dali.ops.experimental#

class nvidia.dali.ops.experimental.AudioResample(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated. Use AudioResample() instead.

This operator was moved out of the experimental phase and is now a regular DALI operator. This is just a deprecated alias kept for backward compatibility.

Legacy alias for audio_resample().

class nvidia.dali.ops.experimental.Debayer(*, device='cpu', **kwargs)#

Performs image demosaicing/debayering.

Converts a single-channel image to RGB using the specified color filter array.

The supported input types are uint8_t and uint16_t. The input images must be 2D tensors (HW) or 3D tensors (HWC) where the number of channels is 1. The operator supports sequences of images/video-like inputs (layout FHW).

For example, the following snippet presents debayering of a batch of image sequences:

def bayered_sequence(sample_info):
  # some actual source of video inputs with corresponding pattern
  # as opencv-style string
  video, bayer_pattern = get_sequence(sample_info)
  if bayer_pattern == "bggr":
      blue_position = [0, 0]
  elif bayer_pattern == "gbrg":
      blue_position = [0, 1]
  elif bayer_pattern == "grbg":
      blue_position = [1, 0]
  else:
      assert bayer_pattern == "rggb"
      blue_position = [1, 1]
  return video, np.array(blue_position, dtype=np.int32)

@pipeline_def
def debayer_pipeline():
  bayered_sequences, blue_positions = fn.external_source(
    source=bayered_sequence, batch=False, num_outputs=2,
    layout=["FHW", None])  # note the "FHW" layout, for plain images it would be "HW"
  debayered_sequences = fn.experimental.debayer(
    bayered_sequences.gpu(), blue_position=blue_positions)
  return debayered_sequences

This operator allows sequence inputs.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • blue_position (int or list of int or TensorList of int) –

    The layout of the color filter array (Bayer tile).

    The position of the blue value in the 2x2 Bayer tile. The supported values correspond to the following OpenCV Bayer layouts:

    • (0, 0) - BG/BGGR

    • (0, 1) - GB/GBRG

    • (1, 0) - GR/GRBG

    • (1, 1) - RG/RGGB

    The argument follows OpenCV’s convention of referring to a 2x2 tile that starts in the second row and column of the sensors’ matrix.

    For example, the (0, 0)/BG/BGGR corresponds to the following matrix of sensors:

    R G R G R
    G B G B G
    R G R G R
    G B G B G

    Supports per-frame inputs.

  • algorithm (str, optional, default = ‘bilinear_npp’) –

    The algorithm to be used when inferring missing colours for any given pixel. Currently only bilinear_npp is supported.

    • The bilinear_npp algorithm uses bilinear interpolation to infer red and blue values. For green values a bilinear interpolation with chroma correlation is used as explained in NPP documentation.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HW', 'HWC', 'FHW', 'FHWC')) – Input to the operator.

class nvidia.dali.ops.experimental.Dilate(*, device='cpu', **kwargs)#

Performs a dilation operation on the input image.

This operator allows sequence inputs.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • anchor (int or list of int or TensorList of int, optional, default = [-1, -1]) –

    Sets the anchor point of the structuring element. Default value (-1, -1) uses the element’s center as the anchor point.

    Supports per-frame inputs.

  • border_mode (str, optional, default = ‘constant’) – Border mode to be used when accessing elements outside input image.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • iterations (int, optional, default = 1) – Number of times to execute the operation, typically set to 1. Setting it to a value higher than 1 is equivalent to increasing the mask size by (mask_width - 1, mask_height - 1) for every additional iteration.

  • mask_size (int or list of int or TensorList of int, optional, default = [3, 3]) –

    Size of the structuring element.

    Supports per-frame inputs.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HW', 'HWC', 'FHWC', 'CHW', 'FCHW')) – Input data. Must be images in HWC or CHW layout, or a sequence of those.

class nvidia.dali.ops.experimental.Equalize(*, device='cpu', **kwargs)#

Performs grayscale/per-channel histogram equalization.

The supported inputs are images and videos of uint8_t type.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HW', 'HWC', 'CHW', 'FHW', 'FHWC', 'FCHW')) – Input to the operator.

class nvidia.dali.ops.experimental.Erode(*, device='cpu', **kwargs)#

Performs an erosion operation on the input image.

This operator allows sequence inputs.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • anchor (int or list of int or TensorList of int, optional, default = [-1, -1]) –

    Sets the anchor point of the structuring element. Default value (-1, -1) uses the element’s center as the anchor point.

    Supports per-frame inputs.

  • border_mode (str, optional, default = ‘constant’) – Border mode to be used when accessing elements outside input image.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • iterations (int, optional, default = 1) – Number of times to execute the operation, typically set to 1. Setting it to a value higher than 1 is equivalent to increasing the mask size by (mask_width - 1, mask_height - 1) for every additional iteration.

  • mask_size (int or list of int or TensorList of int, optional, default = [3, 3]) –

    Size of the structuring element.

    Supports per-frame inputs.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HW', 'HWC', 'FHWC', 'CHW', 'FCHW')) – Input data. Must be images in HWC or CHW layout, or a sequence of those.
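
As an illustration, the sketch below chains Dilate and Erode (a morphological closing) on GPU-decoded images; the "./my_file_root" path and the 5x5 mask size are assumptions, not defaults:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_file_root")
decode = dali.ops.decoders.Image(device="mixed")
dilate = dali.ops.experimental.Dilate(device="gpu", mask_size=[5, 5])
erode = dali.ops.experimental.Erode(device="gpu", mask_size=[5, 5])

with pipe:
    files, _ = reader()
    images = decode(files)            # HWC uint8 images on the GPU
    closed = erode(dilate(images))    # dilation followed by erosion = closing
    pipe.set_outputs(closed)

pipe.build()
(closed_out,) = pipe.run()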

class nvidia.dali.ops.experimental.Filter(*, device='cpu', **kwargs)#

Convolves the image with the provided filter.

Note

In fact, the operator computes a correlation, not a convolution, i.e. the order of filter elements is not flipped when computing the product of the filter and the image.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • anchor (int or list of int or TensorList of int, optional, default = [-1]) –

    Specifies the position of the filter over the input.

    If the filter size is (r, s) and the anchor is (a, b), the output at position (x, y) is a product of the filter and the input rectangle spanned between the corners: top-left (x - a, y - b) and bottom-right (x - a + r - 1, y - b + s - 1).

    If -1 (the default) is specified, the middle (rounded down to integer) of the filter extents is used, which, for odd-sized filters, results in the filter centered over the input.

    The anchor must be, depending on the input dimensionality, a 2D or 3D point whose each extent lies within filter boundaries ([0, ..., filter_extent - 1]). The ordering of anchor’s extents corresponds to the order of filter’s extents.

    The parameter is ignored in "valid" mode.

    Supports per-frame inputs.

  • border (str, optional, default = ‘reflect_101’) –

    Controls how to handle out-of-bound filter positions over the sample.

    Supported values are: "reflect_101", "reflect_1001", "wrap", "clamp", "constant".

    • "reflect_101" (default), reflects the input but does not repeat the outermost values (dcb|abcdefghi|hgf).

    • "reflect_1001": reflects the input including outermost values (cba|abcdefghi|ihg)

    • "wrap": wraps the input (ghi|abcdefghi|abc).

    • "clamp": the input is padded with outermost values (aaa|abcdefghi|iii).

    • "constant": the input is padded with the user-provided scalar (zeros by default). within the sample.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type. The output type must either be float or the same as the input type. If not set, the input type is used.

    Note

    The intermediate type used for actual computation is float32. If the output is of integral type, the values will be clamped to the output type range.

  • mode (str, optional, default = ‘same’) –

    Supported values are: "same" and "valid".

    • "same" (default): The input and output sizes are the same and border is used to handle out-of-bound filter positions.

    • "valid": the output sample is cropped (by filter_extent - 1) so that all filter positions lie fully within the input sample.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(data, filter, fill_value=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __data (TensorList) –

    Batch of input samples.

    Samples can be images, video, or volumetric (3D) data. Samples can contain channels: channel-first and channel-last layouts are supported. In the case of video/sequences, the frame extent must precede the channels extent; for example, a video with "FCHW" layout is supported, but "CFHW" samples are not.

    Samples with the following types are supported: int8, int16, uint8, uint16, float16, float32.

    Please note that the intermediate type used for the computation is always float32.

    Note

    The CPU variant does not support volumetric (3D) data, nor inputs of types: int8 and float16.

  • __filter (TensorList) –

    Batch of filters.

    For inputs with two spatial dimensions (images or video), each filter must be a 2D array (or a sequence of 2D arrays to be applied per-frame to a video input). For volumetric inputs, the filter must be a 3D array. The filter values must have float32 type.

  • __fill_value (TensorList, optional) –

    Batch of scalars used for padding.

    If "border" is set to "constant", the input samples will be padded with the corresponding scalars when convolved with the filter. The scalars must be of the same type as the input samples. For video/sequence input, an array of scalars can be specified to be applied per-frame.

class nvidia.dali.ops.experimental.Inflate(*, device='cpu', **kwargs)#

Inflates/decompresses the input using specified decompression algorithm.

The input must be a 1D tensor of bytes (uint8). Passing the shape and dtype of the decompressed samples is required.

Each input sample can either be a single compressed chunk or consist of multiple compressed chunks that have the same shape and type when inflated, so that they can be merged into a single tensor where the outermost extent of the tensor corresponds to the number of chunks.

If the sample is comprised of multiple chunks, the chunk_offsets or chunk_sizes must be specified. In that case, the shape must describe the shape of a single inflated (output) chunk. The number of the chunks will automatically be added as the outermost extent to the output tensors.

For example, the following snippet presents decompression of video-like sequences. Each video sequence was deflated by first compressing each frame separately and then concatenating the compressed frames of the corresponding sequence:

@pipeline_def
def inflate_sequence_pipeline():
  compres_seq, uncompres_hwc_shape, compres_chunk_sizes = fn.external_source(...)
  sequences = fn.experimental.inflate(
      compres_seq.gpu(),
      chunk_sizes=compres_chunk_sizes,  # refers to sizes in ``compres_seq``
      shape=uncompres_hwc_shape,
      layout="HWC",
      sequence_axis_name="F")
  return sequences
Supported backends
  • ‘gpu’

Keyword Arguments:
  • shape (int or list of int or TensorList of int) – The shape of the output (inflated) chunk.

  • algorithm (str, optional, default = ‘LZ4’) –

    Algorithm to be used to decode the data.

    Currently only LZ4 is supported.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • chunk_offsets (int or list of int or TensorList of int, optional) –

    A list of offsets within the input sample describing where the consecutive chunks begin.

    If the chunk_sizes is not specified, it is assumed that the chunks are densely packed in the input tensor and the last chunk ends with the sample’s end.

  • chunk_sizes (int or list of int or TensorList of int, optional) –

    A list of sizes of corresponding input chunks.

    If the chunk_offsets is not specified, it is assumed that the chunks are densely packed in the input tensor and the first chunk starts at the beginning of the sample.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) – The output (inflated) data type.

  • layout (layout str, optional, default = ‘’) –

    Layout of the output (inflated) chunk.

    If the samples consist of multiple chunks, additionally, the sequence_axis_name extent will be added to the beginning of the specified layout.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • sequence_axis_name (layout str, optional, default = ‘F’) –

    The name for the sequence axis.

    If the samples consist of multiple chunks, an extra outer dimension will be added to the output tensor. By default, it is assumed to be video frames, hence the default label ‘F’.

    The value is ignored if the layout is not specified or the input is not a sequence (neither chunk_offsets nor chunk_sizes is specified).

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.experimental.MedianBlur(*, device='cpu', **kwargs)#

Median blur performs smoothing of an image or sequence of images by replacing each pixel with the median color of a surrounding rectangular region.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • window_size (int or list of int or TensorList of int, optional, default = [3, 3]) – The size of the window over which the smoothing is performed.

__call__(input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HW', 'HWC', 'FHWC', 'CHW', 'FCHW')) – Input data. Must be images in HWC or CHW layout, or a sequence of those.
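
For example, a minimal sketch (with a placeholder "./my_file_root" dataset path) that smooths GPU-decoded images with a 5x5 window:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_file_root")
decode = dali.ops.decoders.Image(device="mixed")
median = dali.ops.experimental.MedianBlur(device="gpu", window_size=[5, 5])

with pipe:
    files, _ = reader()
    images = decode(files)
    pipe.set_outputs(median(images))   # per-pixel median over a 5x5 neighborhood

pipe.build()
(smoothed,) = pipe.run()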

class nvidia.dali.ops.experimental.PeekImageShape(*, device='cpu', **kwargs)#

Obtains the shape of the encoded image.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • adjust_orientation (bool, optional, default = True) – Use the EXIF orientation metadata when calculating the shape.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.INT64) – Data type, to which the sizes are converted.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – Color format of the image.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
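
For example, the sketch below (with a placeholder "./my_file_root" dataset path) returns the EXIF-adjusted shapes of the encoded images alongside the decoded data:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
reader = dali.ops.readers.File(file_root="./my_file_root")
peek = dali.ops.experimental.PeekImageShape()      # CPU-only operator
decode = dali.ops.decoders.Image(device="mixed")

with pipe:
    files, labels = reader()
    shapes = peek(files)       # (H, W, C) per sample, obtained without decoding
    images = decode(files)
    pipe.set_outputs(images, shapes, labels)

pipe.build()
images_out, shapes_out, labels_out = pipe.run()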

class nvidia.dali.ops.experimental.Remap(*, device='cpu', **kwargs)#

The remap operation applies a generic geometrical transformation to an image. In other words, it takes pixels from one place in the input image and puts them in another place in the output image. The transformation is described by mapx and mapy parameters, where:

output(x,y) = input(mapx(x,y),mapy(x,y))

The type of the output tensor will match the type of the input tensor.

Handles only HWC layout.

Selecting a border policy is currently not supported; the DALIBorderType will always be CONSTANT with the value 0.

This operator allows sequence inputs.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • interp (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Interpolation type.

  • pixel_origin (str, optional, default = ‘corner’) –

    Pixel origin. Possible values: "corner", "center".

    Defines which part of the pixel (upper-left corner or center) is interpreted as its origin. This value impacts the interpolation result. To match OpenCV, please pick "center".

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(input, mapx, mapy, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __input (TensorList ('HWC', 'FHWC')) – Input data. Must be a 1- or 3-channel HWC image.

  • __mapx (TensorList of float ('HWC', 'HW', 'FHWC', 'FHW', 'F***', 'F**')) – Defines the remap transformation for x coordinates.

  • __mapy (TensorList of float ('HWC', 'HW', 'FHWC', 'FHW', 'F***', 'F**')) – Defines the remap transformation for y coordinates.

class nvidia.dali.ops.experimental.Resize(*, device='cpu', **kwargs)#

Resize images.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • antialias (bool, optional, default = True) –

    If enabled, it applies an antialiasing filter when scaling down.

    Note

    Nearest neighbor interpolation does not support antialiasing.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Must be the same as the input type or float. If not set, the input type is used.

  • image_type (nvidia.dali.types.DALIImageType) –

    Warning

    The argument image_type is no longer used and will be removed in a future release.

  • interp_type (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –

    Type of interpolation to be used.

    Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

    Note

    Usage of INTERP_TRIANGULAR is now deprecated and it should be replaced by a combination of INTERP_LINEAR with antialias enabled.

  • mag_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.

  • max_size (float or list of float, optional) –

    Limit of the output size.

    When the operator is configured to keep aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when using resize_shorter argument or “not_smaller” mode or when some extents are left unspecified.

    This parameter puts a limit to how big the output can become. This value can be specified per-axis or uniformly for all axes.

    Note

    When used with “not_smaller” mode or resize_shorter argument, max_size takes precedence and the aspect ratio is kept - for example, resizing with mode="not_smaller", size=800, max_size=1400 an image of size 1200x600 would be resized to 1400x700.

  • min_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.

  • minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.

  • mode (str, optional, default = ‘default’) –

    Resize mode.

    Here is a list of supported modes:

    • "default" - image is resized to the specified size.
      Missing extents are scaled with the average scale of the provided ones.
    • "stretch" - image is resized to the specified size.
      Missing extents are not scaled at all.
    • "not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size.
      For example, a 1280x720 image, with a desired output size of 640x480, actually produces a 640x360 output.
    • "not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified.
      For example, a 640x480 image with a desired output size of 1920x1080, actually produces a 1920x1440 output.

      This argument is mutually exclusive with resize_longer and resize_shorter.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • resize_longer (float or TensorList of float, optional, default = 0.0) –

    The length of the longer dimension of the resized image.

    This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".

  • resize_shorter (float or TensorList of float, optional, default = 0.0) –

    The length of the shorter dimension of the resized image.

    This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

  • resize_x (float or TensorList of float, optional, default = 0.0) –

    The length of the X dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_y (float or TensorList of float, optional, default = 0.0) –

    The length of the Y dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_z (float or TensorList of float, optional, default = 0.0) –

    The length of the Z dimension of the resized volume.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x and resize_y are left unspecified or 0, the operator keeps the aspect ratio of the original volume. A negative value flips the volume.

  • roi_end (float or list of float or TensorList of float, optional) –

    End of the input region of interest (ROI).

    Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • roi_relative (bool, optional, default = False) – If true, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right

  • roi_start (float or list of float or TensorList of float, optional) –

    Origin of the input region of interest (ROI).

    Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • save_attrs (bool, optional, default = False) – Save reshape attributes for testing.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • size (float or list of float or TensorList of float, optional) –

    The desired output size.

    Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and mode argument.

  • subpixel_scale (bool, optional, default = True) –

    If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.

    Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.

  • temp_buffer_hint (int, optional, default = 0) –

    Initial size, in bytes, of a temporary buffer for resampling.

    Note

    This argument is ignored for the CPU variant.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList ('HWC', 'FHWC', 'CHW', 'FCHW', 'CFHW', 'DHWC', 'FDHWC', 'CDHW', 'FCDHW', 'CFDHW')) – Input to the operator.
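
For reference, the mode, size, and max_size semantics documented above are shared across the legacy Resize operator objects. Below is a minimal, illustrative sketch using dali.ops.Resize; the file_root path and sizes are placeholder assumptions, not values taken from this entry.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
# "not_larger" keeps the aspect ratio, so a 1280x720 input resized towards
# 640x480 actually becomes 640x360 (no extent exceeds the requested size).
resize = dali.ops.Resize(device="gpu", size=(480, 640), mode="not_larger")

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    images = dali.fn.decoders.image(files, device="mixed")
    images = resize(images)
    pipe.set_outputs(images, labels)

pipe.build()
outputs = pipe.run()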

class nvidia.dali.ops.experimental.TensorResize(*, device='cpu', **kwargs)#

Resize tensors.

This operator allows sequence inputs and supports volumetric data.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • alignment (float or list of float or TensorList of float, optional, default = [0.5]) –

    Determines the position of the ROI when using scales (provided or calculated).

    The real output size must be integral and may differ from “ideal” output size calculated as input (or ROI) size multiplied by the scale factor. In that case, the output size is rounded (according to size_rounding policy) and the input ROI needs to be adjusted to maintain the scale factor. This parameter defines which relative point of the ROI should retain its position in the output.

    This point is calculated as center = (1 - alignment) * roi_start + alignment * roi_end. Alignment 0.0 denotes alignment with the start of the ROI, 0.5 with the center of the region, and 1.0 with the end. Note that when ROI is not specified, roi_start=0 and roi_end=input_size is assumed.

    When using 0.5 (default), the resize operation has flip invariant properties (flipping after resizing is mathematically equivalent to resizing after flipping).

    The value of this argument contains as many elements as dimensions provided for sizes/scales. If only one value is provided, it is applied to all dimensions.

  • antialias (bool, optional, default = True) –

    If enabled, it applies an antialiasing filter when scaling down.

    Note

    Nearest neighbor interpolation does not support antialiasing.

  • axes (int or list of int, optional) –

    Indices of dimensions that sizes, scales, max_size, roi_start, roi_end refer to.

    Accepted range is [-ndim, ndim-1]. Negative indices are counted from the back.

    By default, all dimensions are assumed. The axis_names and axes arguments are mutually exclusive.

  • axis_names (layout str, optional) –

    Names of the axes that sizes, scales, max_size, roi_start, roi_end refer to.

    By default, all dimensions are assumed. The axis_names and axes arguments are mutually exclusive.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Must be same as input type or float. If not set, input type is used.

  • interp_type (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –

    Type of interpolation to be used.

    Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

    Note

    Usage of INTERP_TRIANGULAR is now deprecated and it should be replaced by a combination of INTERP_LINEAR with antialias enabled.

  • mag_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.

  • max_size (float or list of float, optional) –

    Limit of the output size.

    When the operator is configured to keep aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when using resize_shorter argument or “not_smaller” mode or when some extents are left unspecified.

    This parameter puts a limit to how big the output can become. This value can be specified per-axis or uniformly for all axes.

    Note

    When used with “not_smaller” mode or resize_shorter argument, max_size takes precedence and the aspect ratio is kept - for example, resizing with mode="not_smaller", size=800, max_size=1400, an image of size 1200x600 would be resized to 1400x700.

  • min_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.

  • minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.

  • mode (str, optional, default = ‘default’) –

    Resize mode.

    Here is a list of supported modes:

    • "default" - image is resized to the specified size.
      Missing extents are scaled with the average scale of the provided ones.
    • "stretch" - image is resized to the specified size.
      Missing extents are not scaled at all.
    • "not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size.
      For example, a 1280x720 image, with a desired output size of 640x480, actually produces a 640x360 output.
    • "not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified.
      For example, a 640x480 image with a desired output size of 1920x1080, actually produces a 1920x1440 output.

      This argument is mutually exclusive with resize_longer and resize_shorter.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • roi_end (float or list of float or TensorList of float, optional) –

    End of the input region of interest (ROI).

    Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • roi_relative (bool, optional, default = False) – If true, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right.

  • roi_start (float or list of float or TensorList of float, optional) –

    Origin of the input region of interest (ROI).

    Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • scales (float or list of float or TensorList of float, optional) –

    Scale factors.

    The resulting output size is calculated as out_size = size_rounding(scale_factor * original_size). See size_rounding for a list of supported rounding policies.

    When axes is provided, the scale factor values refer to the axes specified. Note: Arguments sizes and scales are mutually exclusive.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • size_rounding (str, optional, default = ‘round’) –

    Determines the rounding policy when using scales.

    Possible values are:

    • "round" - Rounds the resulting size to the nearest integer value, with halfway cases rounded away from zero.

    • "truncate" - Discards the fractional part of the resulting size.

    • "ceil" - Rounds up the resulting size to the next integer value.

  • sizes (float or list of float or TensorList of float, optional) –

    Output sizes.

    When axes is provided, the size values refer to the axes specified. Note: Arguments sizes and scales are mutually exclusive.

  • subpixel_scale (bool, optional, default = True) –

    If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.

    Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.

  • temp_buffer_hint (int, optional, default = 0) –

    Initial size, in bytes, of a temporary buffer for resampling.

    Note

    This argument is ignored for the CPU variant.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
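
To illustrate the scales/axes mechanics described above, here is a minimal sketch using the legacy TensorResize object; the random external_source data, layout, and scale factors are placeholder assumptions.

import numpy as np
import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
# Halve the H and W extents of each tensor; "truncate" discards the fractional
# part when converting scale * extent to an integral output size.
tensor_resize = dali.ops.experimental.TensorResize(
    scales=[0.5, 0.5], axis_names="HW", size_rounding="truncate")

def batch():
    # Placeholder data source: two random HWC float tensors per iteration.
    return [np.random.rand(64, 80, 3).astype(np.float32) for _ in range(2)]

with pipe:
    data = dali.fn.external_source(source=batch, layout="HWC")
    out = tensor_resize(data)
    pipe.set_outputs(out)

pipe.build()
outputs = pipe.run()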

class nvidia.dali.ops.experimental.WarpPerspective(*, device='cpu', **kwargs)#

Performs a perspective transform on the images.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • border_mode (str, optional, default = ‘constant’) – Border mode to be used when accessing elements outside input image. Supported values are: “constant”, “replicate”, “reflect”, “reflect_101”, “wrap”.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • fill_value (float or list of float, optional, default = []) – Value used to fill areas that are outside the source image when the “constant” border_mode is chosen.

  • interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.

  • inverse_map (bool, optional, default = True) – If set to true (default), the matrix is interpreted as destination to source coordinates mapping. Otherwise it’s interpreted as source to destination coordinates mapping.

  • matrix (float or list of float or TensorList of float, optional, default = []) –

    Perspective transform mapping of destination to source coordinates.

    If inverse_map argument is set to false, the matrix is interpreted as a source to destination coordinates mapping.

    It is equivalent to OpenCV’s warpPerspective operation with the inverse_map argument being analogous to the WARP_INVERSE_MAP flag.

    Note

    Instead of this argument, the operator can take a second positional input, in which case the matrix can be placed on the GPU.

    Supports per-frame inputs.

  • pixel_origin (str, optional, default = ‘corner’) –

    Pixel origin. Possible values: “corner”, “center”.

    Determines the meaning of (0, 0) coordinates - “corner” places the origin at the top-left corner of the top-left pixel (like in OpenGL); “center” places (0, 0) in the center of the top-left pixel (like in OpenCV).

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • size (float or list of float or TensorList of float, optional, default = []) –

    Output size, in pixels/points.

    The channel dimension should be excluded (for example, for RGB images, specify (480,640), not (480,640,3)).

__call__(input, matrix_gpu=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __input (TensorList of uint8, uint16, int16 or float ('HW', 'HWC', 'FHWC', 'CHW', 'FCHW')) – Input data. Must be images in HWC or CHW layout, or a sequence of those.

  • __matrix_gpu (1D TensorList of float, optional) – Transformation matrix data. Should be used to pass the GPU data. For CPU data, the matrix argument should be used.
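
A minimal sketch of the WarpPerspective object described above; the 3x3 matrix (a simple horizontal translation) and the file_root path are illustrative assumptions.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
# With the default inverse_map=True, the matrix maps destination coordinates
# to source coordinates; this one shifts the sampled content by 20 pixels in x.
warp = dali.ops.experimental.WarpPerspective(
    device="gpu",
    matrix=[1.0, 0.0, 20.0,
            0.0, 1.0, 0.0,
            0.0, 0.0, 1.0],
    border_mode="constant",
    fill_value=[0.0])

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    images = dali.fn.decoders.image(files, device="mixed")
    images = warp(images)
    pipe.set_outputs(images, labels)

pipe.build()
outputs = pipe.run()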

nvidia.dali.ops.experimental.decoders#

class nvidia.dali.ops.experimental.decoders.Image(*, device='cpu', **kwargs)#

Decodes images.

Supported formats: JPEG, JPEG 2000, TIFF, PNG, BMP, PNM, PPM, PGM, PBM, WebP.

The output of the decoder is in HWC layout.

The implementation uses NVIDIA nvImageCodec to decode images. You need to install it separately. See https://developer.nvidia.com/nvimgcodec-downloads or simply do pip install nvidia-nvimgcodec-cu${CUDA_MAJOR_VERSION} where CUDA_MAJOR_VERSION is your CUDA major version (e.g. 12).

Note

GPU accelerated decoding is only available for a subset of the image formats (JPEG, and JPEG2000). For other formats, a CPU based decoder is used. For JPEG, a dedicated HW decoder will be used when available.

Note

WebP decoding currently only supports the simple file format (lossy and lossless compression). For details on the different WebP file formats, see https://developers.google.com/speed/webp/docs/riff_container

Supported backends
  • ‘cpu’

  • ‘mixed’

Keyword Arguments:
  • adjust_orientation (bool, optional, default = True) – Use EXIF orientation metadata to rectify the images

  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • cache_batch_copy (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, multiple images from the cache are copied with a batched copy kernel call. Otherwise, unless the order in the batch is the same as in the cache, each image is copied with cudaMemcpy.

  • cache_debug (bool, optional, default = False) –

    Applies only to the mixed backend type.

    Prints the debug information about the decoder cache.

  • cache_size (int, optional, default = 0) –

    Applies only to the mixed backend type.

    Total size of the decoder cache in megabytes. When provided, the decoded images that are larger than cache_threshold will be cached in GPU memory.

  • cache_threshold (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The size threshold, in bytes, for decoded images to be cached. When an image is cached, it no longer needs to be decoded when it is encountered at the operator input, saving processing time.

  • cache_type (str, optional, default = ‘’) –

    Applies only to the mixed backend type.

    Here is a list of the available cache types:

    • threshold: caches every image with a size that is larger than cache_threshold until
      the cache is full.

      The warm-up time for threshold policy is 1 epoch.

    • largest: stores the largest images that can fit in the cache.
      The warm-up time for largest policy is 2 epochs.

      Note

      To take advantage of caching, it is recommended to configure readers with stick_to_shard=True to limit the amount of unique images seen by each decoder instance in a multi node environment.

  • device_memory_padding (int, optional, default = 16777216) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • device_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –

    Output data type of the image.

    Values will be converted to the dynamic range of the requested type.

  • host_memory_padding (int, optional, default = 8388608) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution.

  • host_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • hw_decoder_load (float, optional, default = 0.9) –

    The percentage of the image data to be processed by the HW JPEG decoder.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100

  • hybrid_huffman_threshold (int, optional, default = 1000000) –

    Applies only to the mixed backend type.

    Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.

    Note

    Hybrid Huffman decoder still largely uses the CPU.

  • jpeg_fancy_upsampling (bool, optional, default = False) –

    Make the mixed backend use the same chroma upsampling approach as the cpu one.

    The option corresponds to the JPEG fancy upsampling available in libjpegturbo or ImageMagick.

  • memory_stats

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) –

    The color space of the output image.

    Note: When decoding to YCbCr, the image will be decoded to RGB and then converted to YCbCr, following the YCbCr definition from ITU-R BT.601.

  • preallocate_height_hint (int, optional, default = 0) –

    Image height hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preallocate_width_hint (int, optional, default = 0) –

    Image width hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • split_stages (bool, optional, default = False) –

    Warning

    The argument split_stages is now deprecated and its usage is discouraged.

  • use_chunk_allocator (bool, optional, default = False) –

    Warning

    The argument use_chunk_allocator is now deprecated and its usage is discouraged.

  • use_fast_idct (bool, optional, default = False) –

    Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.

    According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
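
Assuming nvImageCodec is installed as described above, a minimal sketch of using this decoder object with the mixed backend could look as follows; the file_root path is a placeholder.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)
# "mixed" decodes on the GPU (with HW JPEG decoding where available);
# the output is an HWC, RGB, uint8 image by default.
decoder = dali.ops.experimental.decoders.Image(device="mixed")

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    images = decoder(files)
    pipe.set_outputs(images, labels)

pipe.build()
outputs = pipe.run()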

class nvidia.dali.ops.experimental.decoders.ImageCrop(*, device='cpu', **kwargs)#

Decodes images and extracts regions-of-interest (ROI) that are specified by fixed window dimensions and variable anchors.

Supported formats: JPEG, JPEG 2000, TIFF, PNG, BMP, PNM, PPM, PGM, PBM, WebP.

The output of the decoder is in HWC layout.

The implementation uses NVIDIA nvImageCodec to decode images. You need to install it separately. See https://developer.nvidia.com/nvimgcodec-downloads or simply do pip install nvidia-nvimgcodec-cu${CUDA_MAJOR_VERSION} where CUDA_MAJOR_VERSION is your CUDA major version (e.g. 12).

When possible, the operator uses the ROI decoding, reducing the decoding time and memory consumption.

Note

GPU accelerated decoding is only available for a subset of the image formats (JPEG, and JPEG2000). For other formats, a CPU based decoder is used. For JPEG, a dedicated HW decoder will be used when available.

Note

WebP decoding currently only supports the simple file format (lossy and lossless compression). For details on the different WebP file formats, see https://developers.google.com/speed/webp/docs/riff_container

Supported backends
  • ‘cpu’

  • ‘mixed’

Keyword Arguments:
  • adjust_orientation (bool, optional, default = True) – Use EXIF orientation metadata to rectify the images

  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • crop (float or list of float or TensorList of float, optional) –

    Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).

    Providing crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.

  • crop_d (float or TensorList of float, optional, default = 0.0) –

    Applies only to volumetric inputs; cropping window depth (in voxels).

    crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).

  • crop_h (float or TensorList of float, optional, default = 0.0) –

    Cropping window height (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • crop_pos_x (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).

    The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.

    See rounding argument for more details on how crop_x is converted to an integral value.

  • crop_pos_y (float or TensorList of float, optional, default = 0.5) –

    Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).

    The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.

    See rounding argument for more details on how crop_y is converted to an integral value.

  • crop_pos_z (float or TensorList of float, optional, default = 0.5) –

    Applies only to volumetric inputs.

    Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.

    See rounding argument for more details on how crop_z is converted to an integral value.

  • crop_w (float or TensorList of float, optional, default = 0.0) –

    Cropping window width (in pixels).

    Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).

  • device_memory_padding (int, optional, default = 16777216) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • device_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –

    Output data type of the image.

    Values will be converted to the dynamic range of the requested type.

  • host_memory_padding (int, optional, default = 8388608) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution.

  • host_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • hw_decoder_load (float, optional, default = 0.9) –

    The percentage of the image data to be processed by the HW JPEG decoder.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100

  • hybrid_huffman_threshold (int, optional, default = 1000000) –

    Applies only to the mixed backend type.

    Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.

    Note

    Hybrid Huffman decoder still largely uses the CPU.

  • jpeg_fancy_upsampling (bool, optional, default = False) –

    Make the mixed backend use the same chroma upsampling approach as the cpu one.

    The option corresponds to the JPEG fancy upsampling available in libjpegturbo or ImageMagick.

  • memory_stats

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) –

    The color space of the output image.

    Note: When decoding to YCbCr, the image will be decoded to RGB and then converted to YCbCr, following the YCbCr definition from ITU-R BT.601.

  • preallocate_height_hint (int, optional, default = 0) –

    Image height hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preallocate_width_hint (int, optional, default = 0) –

    Image width hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • rounding (str, optional, default = ‘round’) –

    Determines the rounding function used to convert the starting coordinate of the window to an integral value (see crop_pos_x, crop_pos_y, crop_pos_z).

    Possible values are:

    • "round" - Rounds to the nearest integer value, with halfway cases rounded away from zero.

    • "truncate" - Discards the fractional part of the number (truncates towards zero).

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • split_stages (bool, optional, default = False) –

    Warning

    The argument split_stages is now deprecated and its usage is discouraged.

  • use_chunk_allocator (bool, optional, default = False) –

    Warning

    The argument use_chunk_allocator is now deprecated and its usage is discouraged.

  • use_fast_idct (bool, optional, default = False) –

    Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.

    According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
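
A minimal sketch of ImageCrop, decoding each image and extracting a fixed window; the crop size and file_root path are illustrative assumptions.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)
# Fixed 224x224 window; with the default crop_pos_x/crop_pos_y of 0.5 the
# window is anchored at the image center. ROI decoding is used when possible.
decoder = dali.ops.experimental.decoders.ImageCrop(device="mixed", crop=(224, 224))

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    images = decoder(files)
    pipe.set_outputs(images, labels)

pipe.build()
outputs = pipe.run()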

class nvidia.dali.ops.experimental.decoders.ImageRandomCrop(*, device='cpu', **kwargs)#

Decodes images and randomly crops them.

Supported formats: JPEG, JPEG 2000, TIFF, PNG, BMP, PNM, PPM, PGM, PBM, WebP.

The output of the decoder is in HWC layout.

The implementation uses NVIDIA nvImageCodec to decode images. You need to install it separately. See https://developer.nvidia.com/nvimgcodec-downloads or simply do pip install nvidia-nvimgcodec-cu${CUDA_MAJOR_VERSION} where CUDA_MAJOR_VERSION is your CUDA major version (e.g. 12).

The cropping window’s area (relative to the entire image) and aspect ratio can be restricted to a range of values specified by area and aspect_ratio arguments, respectively.

When possible, the operator uses the ROI decoding, reducing the decoding time and memory consumption.

Note

GPU accelerated decoding is only available for a subset of the image formats (JPEG, and JPEG2000). For other formats, a CPU based decoder is used. For JPEG, a dedicated HW decoder will be used when available.

Note

WebP decoding currently only supports the simple file format (lossy and lossless compression). For details on the different WebP file formats, see https://developers.google.com/speed/webp/docs/riff_container

Supported backends
  • ‘cpu’

  • ‘mixed’

Keyword Arguments:
  • adjust_orientation (bool, optional, default = True) – Use EXIF orientation metadata to rectify the images

  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • device_memory_padding (int, optional, default = 16777216) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • device_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –

    Output data type of the image.

    Values will be converted to the dynamic range of the requested type.

  • host_memory_padding (int, optional, default = 8388608) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution.

  • host_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • hw_decoder_load (float, optional, default = 0.9) –

    The percentage of the image data to be processed by the HW JPEG decoder.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100

  • hybrid_huffman_threshold (int, optional, default = 1000000) –

    Applies only to the mixed backend type.

    Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.

    Note

    Hybrid Huffman decoder still largely uses the CPU.

  • jpeg_fancy_upsampling (bool, optional, default = False) –

    Make the mixed backend use the same chroma upsampling approach as the cpu one.

    The option corresponds to the JPEG fancy upsampling available in libjpegturbo or ImageMagick.

  • memory_stats

  • num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) –

    The color space of the output image.

    Note: When decoding to YCbCr, the image will be decoded to RGB and then converted to YCbCr, following the YCbCr definition from ITU-R BT.601.

  • preallocate_height_hint (int, optional, default = 0) –

    Image height hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preallocate_width_hint (int, optional, default = 0) –

    Image width hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_area (float or list of float, optional, default = [0.08, 1.0]) –

    Range from which to choose random area fraction A.

    The cropped image’s area will be equal to A * original image’s area.

  • random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • split_stages (bool, optional, default = False) –

    Warning

    The argument split_stages is now deprecated and its usage is discouraged.

  • use_chunk_allocator (bool, optional, default = False) –

    Warning

    The argument use_chunk_allocator is now deprecated and its usage is discouraged.

  • use_fast_idct (bool, optional, default = False) –

    Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.

    According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
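
A minimal sketch combining ImageRandomCrop with a fixed-size resize, as is typical for training pipelines; the area and aspect-ratio ranges simply restate the defaults, and the file_root path is a placeholder.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)
# Crop a random window covering 8%-100% of the image area with aspect ratio
# in [3/4, 4/3], then resize the decoded crop to 224x224 on the GPU.
decoder = dali.ops.experimental.decoders.ImageRandomCrop(
    device="mixed",
    random_area=[0.08, 1.0],
    random_aspect_ratio=[0.75, 1.333333])
resize = dali.ops.Resize(device="gpu", resize_x=224, resize_y=224)

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    images = decoder(files)
    images = resize(images)
    pipe.set_outputs(images, labels)

pipe.build()
outputs = pipe.run()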

class nvidia.dali.ops.experimental.decoders.ImageSlice(*, device='cpu', **kwargs)#

Decodes images and extracts regions of interest.

Supported formats: JPEG, JPEG 2000, TIFF, PNG, BMP, PNM, PPM, PGM, PBM, WebP.

The output of the decoder is in HWC layout.

The implementation uses NVIDIA nvImageCodec to decode images. You need to install it separately. See https://developer.nvidia.com/nvimgcodec-downloads or simply do pip install nvidia-nvimgcodec-cu${CUDA_MAJOR_VERSION} where CUDA_MAJOR_VERSION is your CUDA major version (e.g. 12).

The slice can be specified by providing the start and end coordinates, or start coordinates and shape of the slice. Both coordinates and shapes can be provided in absolute or relative terms.

The slice arguments can be specified by the following named arguments:

  1. start: Slice start coordinates (absolute)

  2. rel_start: Slice start coordinates (relative)

  3. end: Slice end coordinates (absolute)

  4. rel_end: Slice end coordinates (relative)

  5. shape: Slice shape (absolute)

  6. rel_shape: Slice shape (relative)

The slice can be configured by providing start and end coordinates or start and shape. Relative and absolute arguments can be mixed (for example, rel_start can be used with shape) as long as start and shape or end are uniquely defined.

Alternatively, two extra positional inputs can be provided, specifying anchor and shape. When using positional inputs, two extra boolean arguments normalized_anchor/normalized_shape can be used to specify the nature of the arguments provided. Using positional inputs for anchor and shape is incompatible with the named arguments specified above.

The slice arguments should provide as many dimensions as specified by the axis_names or axes arguments.

By default, the nvidia.dali.fn.decoders.image_slice() operator uses normalized coordinates and “WH” order for the slice arguments.

When possible, the operator uses the ROI decoding, reducing the decoding time and memory consumption.

Note

GPU accelerated decoding is only available for a subset of the image formats (JPEG, and JPEG2000). For other formats, a CPU based decoder is used. For JPEG, a dedicated HW decoder will be used when available.

Note

WebP decoding currently only supports the simple file format (lossy and lossless compression). For details on the different WebP file formats, see https://developers.google.com/speed/webp/docs/riff_container

Supported backends
  • ‘cpu’

  • ‘mixed’

Keyword Arguments:
  • adjust_orientation (bool, optional, default = True) – Use EXIF orientation metadata to rectify the images

  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • axes (int or list of int or TensorList of int, optional, default = [1, 0]) –

    Order of dimensions used for the anchor and shape slice inputs as dimension indices.

    Negative values are interpreted as counting dimensions from the back. Valid range: [-ndim, ndim-1], where ndim is the number of dimensions in the input data.

  • axis_names (layout str, optional, default = ‘WH’) –

    Order of the dimensions used for the anchor and shape slice inputs, as described in layout.

    If a value is provided, axis_names will have a higher priority than axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • device_memory_padding (int, optional, default = 16777216) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • device_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –

    Output data type of the image.

    Values will be converted to the dynamic range of the requested type.

  • end (int or list of int or TensorList of int, optional) –

    End coordinates of the slice.

    Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • host_memory_padding (int, optional, default = 8388608) –

    Applies only to the mixed backend type.

    The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution.

  • host_memory_padding_jpeg2k (int, optional, default = 0) –

    Applies only to the mixed backend type.

    The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.

    If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution.

  • hw_decoder_load (float, optional, default = 0.9) –

    The percentage of the image data to be processed by the HW JPEG decoder.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100

  • hybrid_huffman_threshold (int, optional, default = 1000000) –

    Applies only to the mixed backend type.

    Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.

    Note

    Hybrid Huffman decoder still largely uses the CPU.

  • jpeg_fancy_upsampling (bool, optional, default = False) –

    Make the mixed backend use the same chroma upsampling approach as the cpu one.

    The option corresponds to the JPEG fancy upsampling available in libjpegturbo or ImageMagick.

  • memory_stats

  • normalized_anchor (bool, optional, default = True) –

    Determines whether the anchor positional input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.

    Note

    This argument is only relevant when anchor data type is float. For integer types, the coordinates are always absolute.

  • normalized_shape (bool, optional, default = True) –

    Determines whether the shape positional input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.

    Note

    This argument is only relevant when anchor data type is float. For integer types, the coordinates are always absolute.

  • output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) –

    The color space of the output image.

    Note: When decoding to YCbCr, the image will be decoded to RGB and then converted to YCbCr, following the YCbCr definition from ITU-R BT.601.

  • preallocate_height_hint (int, optional, default = 0) –

    Image height hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preallocate_width_hint (int, optional, default = 0) –

    Image width hint.

    Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.

    The hint is used to preallocate memory for the HW JPEG decoder.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • rel_end (float or list of float or TensorList of float, optional) –

    End relative coordinates of the slice (range [0.0, 1.0]).

    Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • rel_shape (float or list of float or TensorList of float, optional) –

    Relative shape of the slice (range [0.0, 1.0]).

    Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • rel_start (float or list of float or TensorList of float, optional) –

    Start relative coordinates of the slice (range [0.0, 1.0]).

    Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) –

    Shape of the slice.

    Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.

  • split_stages (bool, optional, default = False) –

    Warning

    The argument split_stages is now deprecated and its usage is discouraged.

  • start (int or list of int or TensorList of int, optional) –

    Start coordinates of the slice.

    Note: Providing named arguments start/end or start/shape is incompatible with providing positional inputs anchor and shape.

  • use_chunk_allocator (bool, optional, default = False) –

    Warning

    The argument use_chunk_allocator is now deprecated and its usage is discouraged.

  • use_fast_idct (bool, optional, default = False) –

    Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.

    According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.

__call__(data, anchor=None, shape=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __data (TensorList) – Batch that contains the input data.

  • __anchor (1D TensorList of float or int, optional) –

    Input that contains normalized or absolute coordinates for the starting point of the slice (x0, x1, x2, …).

    Integer coordinates are interpreted as absolute coordinates, while float coordinates can be interpreted as absolute or relative coordinates, depending on the value of normalized_anchor.

  • __shape (1D TensorList of float or int, optional) –

    Input that contains normalized or absolute coordinates for the dimensions of the slice (s0, s1, s2, …).

    Integer coordinates are interpreted as absolute coordinates, while float coordinates can be interpreted as absolute or relative coordinates, depending on the value of normalized_shape.
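
A minimal sketch of ImageSlice using the named relative-slice arguments described above; the central-quarter slice and the file_root path are illustrative assumptions.

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)
# Decode only the central quarter of each image: relative start (0.25, 0.25)
# and relative shape (0.5, 0.5), so ROI decoding can skip the rest of the image.
decoder = dali.ops.experimental.decoders.ImageSlice(
    device="mixed", rel_start=(0.25, 0.25), rel_shape=(0.5, 0.5))

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    images = decoder(files)
    pipe.set_outputs(images, labels)

pipe.build()
outputs = pipe.run()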

class nvidia.dali.ops.experimental.decoders.Video(*, device='cpu', **kwargs)#

Decodes a video file from a memory buffer (e.g. provided by external source).

The video streams can be in most of the container file formats. FFmpeg is used to parse video containers and returns a batch of sequences of frames with shape (F, H, W, C), where F is the number of frames in a sequence and can differ for each sample.

Supported backends
  • ‘cpu’

  • ‘mixed’

Keyword Arguments:
  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, each thread in the internal thread pool will be tied to a specific CPU core.

    Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(buffer, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__buffer (TensorList) – Data buffer with a loaded video file.
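
A minimal sketch of feeding an encoded video buffer to the decoder through an external source; the video path and batch size are placeholder assumptions.

import numpy as np
import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=1, num_threads=2, device_id=0)
video_decoder = dali.ops.experimental.decoders.Video(device="mixed")

def encoded_batch():
    # Placeholder source: one encoded video file read into a flat uint8 buffer.
    with open("./my_video.mp4", "rb") as f:
        return [np.frombuffer(f.read(), dtype=np.uint8)]

with pipe:
    buffers = dali.fn.external_source(source=encoded_batch, dtype=dali.types.UINT8)
    frames = video_decoder(buffers)
    pipe.set_outputs(frames)

pipe.build()
outputs = pipe.run()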

nvidia.dali.ops.experimental.inputs#

class nvidia.dali.ops.experimental.inputs.Video(*, device='cpu', **kwargs)#

Streams and decodes a video from a memory buffer. To be used with long and high resolution videos.

Returns a batch of sequences of frames, with the layout: (F, H, W, C), where:

  • F - number of frames in a sequence,

  • H - height of the frame,

  • W - width of the frame,

  • C - number of channels in the frame.

When using the fn.inputs.video operator inside a DALI pipeline, the user needs to provide the data using Pipeline.feed_input(). Once the operator has been fed with data, the pipeline can be run multiple times and the fn.inputs.video operator will return consecutive sequences, as long as there is enough data to decode. When the source of the frames (the video file) is depleted, the user needs to call feed_input again to provide the next video file to the operator. This operator has an inner queue for the data, so feed_input may be called multiple times; when a given video file ends, the operator automatically fetches the next one from the top of the queue. Running the pipeline when there is no data left for fn.inputs.video results in an error.

This operator takes only one video as an input (i.e. input_batch_size=1) and will return batches of sequences. Every output batch will have max_batch_size samples, set during the pipeline creation. When the number of frames in the video file does not allow the frames to be split uniformly across batches, the last batch returned by this operator for a given video will be partial and the last sequence in this batch will be determined using the last_sequence_policy parameter. For example:

This is a video that consists of 67 frames (every '-' is a frame):
-------------------------------------------------------------------


User decided that there shall be 5 frames per sequence and
the last_sequence_policy='partial':
-------------------------------------------------------------------
[   ][   ][   ][   ][   ][   ][   ][   ][   ][   ][   ][   ][   ][]
-------------------------------------------------------------------
Since there are not enough frames, the last sequence comprises 2 frames.


The Pipeline has max_batch_size=3, therefore the operator will return
5 batches of sequences.
First 4 batches comprise 3 sequences and the last batch is partial and
comprises 2 sequences.
---------------   ---------------   ---------------   ---------------   -------
[   ][   ][   ]   [   ][   ][   ]   [   ][   ][   ]   [   ][   ][   ]   [   ][]
---------------   ---------------   ---------------   ---------------   -------


With the last_sequence_policy='pad', the last sequence of the last batch
will be padded with 0:
---------------   ---------------   ---------------   ---------------   -------000
[   ][   ][   ]   [   ][   ][   ]   [   ][   ][   ]   [   ][   ][   ]   [   ][   ]
---------------   ---------------   ---------------   ---------------   -------000

The difference between fn.inputs.video and fn.readers.video is that the former reads an encoded video from memory and the latter reads the encoded video from disk.

The difference between fn.inputs.video and fn.decoders.video is that the former does not decode the whole video file in one go. This behaviour is needed for longer videos; for example, a 5-minute, 4K, 30 fps video decoded in full takes about 1.7 TB of memory.

This operator accepts most of the video containers and file formats. FFmpeg is used to parse the video container. In situations where the container does not contain the required metadata (e.g. frame sizes, number of frames), the operator needs to determine it itself, which may result in a slowdown.

Supported backends
  • ‘cpu’

  • ‘mixed’

Keyword Arguments:
  • sequence_length (int) – Number of frames in each sequence.

  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type. If set to True, each thread in the internal thread pool will be tied to a specific CPU core.

    Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • blocking (bool, optional, default = False) – (Advanced) If True, this operator will block until the data is available (e.g. provided by calling feed_input). If False, the operator will raise an error if the data is not available.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • last_sequence_policy (str, optional, default = ‘partial’) –

    Specifies how to handle the last sequence in the video file.

    For a given number of frames in the video file and the frames_per_sequence parameter, it might happen that the video can't be split uniformly across sequences. With last_sequence_policy='partial', the last sequence may have fewer frames than the frames_per_sequence value. With last_sequence_policy='pad', the last sequence will always have frames_per_sequence frames and will be padded with empty frames.

    Allowed values are 'partial' and 'pad'.

  • no_copy (bool, optional, default = False) –

    Determines whether DALI should copy the buffer when feed_input is called.

    If set to True, DALI passes the user’s memory directly to the pipeline, instead of copying it. It is the user’s responsibility to keep the buffer alive and unmodified until it is consumed by the pipeline.

    The buffer can be modified or freed again after the outputs of the relevant iterations have been consumed. Effectively, it happens after prefetch_queue_depth or cpu_queue_depth * gpu_queue_depth (when they are not equal) iterations following the feed_input call.

    The memory location must match the specified device parameter of the operator. For the CPU, the provided memory can be one contiguous buffer or a list of contiguous Tensors. For the GPU, to avoid extra copy, the provided buffer must be contiguous. If you provide a list of separate Tensors, there will be an additional copy made internally, consuming both memory and bandwidth.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.

nvidia.dali.ops.experimental.readers#

class nvidia.dali.ops.experimental.readers.Fits(*, device='cpu', **kwargs)#

Reads FITS image HDUs from a directory.

This operator can be used in the following modes:

  1. Read all files from a directory indicated by file_root that match given file_filter.

  2. Read file names from a text file indicated in file_list argument.

  3. Read files listed in files argument.

The number of outputs per sample corresponds to the length of the hdu_indices argument. By default, the first HDU with data is read from each file, so the number of outputs defaults to 1.
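For illustration, a minimal sketch of mode 1 with the legacy operator object (the file_root path is a placeholder):

import nvidia.dali as dali

class FitsPipe(dali.pipeline.Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super().__init__(batch_size, num_threads, device_id)
        # Mode 1: discover all *.fits files under a (hypothetical) directory.
        self.reader = dali.ops.experimental.readers.Fits(
            file_root="./fits_data", file_filter="*.fits")

    def define_graph(self):
        # With the default hdu_indices, each file yields a single output.
        return self.reader()

pipe = FitsPipe(batch_size=4, num_threads=2, device_id=0)
pipe.build()
(hdu_data,) = pipe.run()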

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • dtypes (DALIDataType or list of DALIDataType, optional) –

    Data types of the respective outputs.

    If specified, it must be a list of types of respective outputs. By default, all outputs are assumed to be UINT8.

  • file_filter (str, optional, default = ‘*.fits’) –

    If a value is specified, the string is interpreted as glob string to filter the list of files in the sub-directories of the file_root.

    This argument is ignored when file paths are taken from file_list or files.

  • file_list (str, optional) –

    Path to a text file that contains filenames (one per line). The filenames are relative to the location of the text file or to file_root, if specified.

    This argument is mutually exclusive with files.

  • file_root (str, optional) –

    Path to a directory that contains the data files.

    If not using file_list or files, this directory is traversed to discover the files. file_root is required in this mode of operation.

  • files (str or list of str, optional) –

    A list of file paths to read the data from.

    If file_root is provided, the paths are treated as being relative to it.

    This argument is mutually exclusive with file_list.

  • hdu_indices (int or list of int, optional, default = [2]) – HDU indices to read. If not provided, the first HDU after the primary will be yielded. Since HDUs are indexed starting from 1, the default value is hdu_indices = [2]. The size of the provided hdu_indices list defines the number of outputs per sample.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • shuffle_after_epoch (bool, optional, default = False) –

    If set to True, the reader shuffles the entire dataset after each epoch.

    stick_to_shard and random_shuffle cannot be used when this argument is set to True.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, the loading data will be skipped when the sample is in the decoder cache.

    In this case, the output of the loader will be empty.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.

class nvidia.dali.ops.experimental.readers.Video(*, device='cpu', **kwargs)#

Loads and decodes video files using FFmpeg.

The video streams can be in most of the container file formats. FFmpeg is used to parse video containers and returns a batch of sequences of sequence_length frames with shape (N, F, H, W, C), where N is the batch size and F is the number of frames.

Note

Containers which do not support indexing, like MPEG, require DALI to build the index.

DALI will go through the video and mark keyframes to be able to seek effectively, even in the variable frame rate scenario.
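A minimal sketch of reading fixed-length sequences with the legacy operator object (the file paths and labels are placeholders):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
reader = dali.ops.experimental.readers.Video(
    filenames=["/data/clip0.mp4", "/data/clip1.mp4"],
    labels=[0, 1],
    sequence_length=8)

with pipe:
    # Each sample is a sequence of 8 frames with layout (F, H, W, C).
    sequences, labels = reader()
    pipe.set_outputs(sequences, labels)

pipe.build()
sequences, labels = pipe.run()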

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • sequence_length (int) – Frames to load per sequence.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • enable_frame_num (bool, optional, default = False) – If set, returns the index of the first frame in the decoded sequence as an additional output.

  • filenames (str or list of str, optional, default = []) – Absolute paths to the video files to load.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • labels (int or list of int, optional) – Labels associated with the files listed in filenames argument. If not provided, no labels will be yielded.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, the loading data will be skipped when the sample is in the decoder cache.

    In this case, the output of the loader will be empty.

  • step (int, optional, default = -1) –

    Frame interval between each sequence.

    When the value is less than 0, step is set to sequence_length.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • stride (int, optional, default = 1) – Distance between consecutive frames in the sequence.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.

nvidia.dali.ops.io#

Operators in this module are data-reading operators that read data from a source specified at runtime by operator inputs. For inputless data readers that are able to build the dataset at pipeline construction, see the nvidia.dali.fn.readers module.

nvidia.dali.ops.io.file#

class nvidia.dali.ops.io.file.Read(*, device='cpu', **kwargs)#

Reads raw file contents from an encoded filename represented by a 1D byte array.

Note

To produce a compatible encoded filepath from Python (e.g. in an external_source node generator), use np.frombuffer(filepath_str.encode("utf-8"), dtype=types.UINT8).
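A minimal sketch of this pattern, combining an external_source generator with the legacy Read operator object (./a.bin and ./b.bin are placeholder files assumed to exist):

import numpy as np
import nvidia.dali as dali

def encoded_paths():
    # Encode each (placeholder) file path as a 1D uint8 array; one batch per call.
    while True:
        yield [np.frombuffer(p.encode("utf-8"), dtype=np.uint8)
               for p in ("./a.bin", "./b.bin")]

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
read = dali.ops.io.file.Read()

with pipe:
    filepaths = dali.fn.external_source(source=encoded_paths())
    contents = read(filepaths)
    pipe.set_outputs(contents)

pipe.build()
(contents,) = pipe.run()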

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, it will use plain file I/O instead of trying to map the file into memory.

    Mapping provides a small performance benefit when accessing a local file system, but for most network file systems it does not provide a benefit.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • use_o_direct (bool, optional, default = False) –

    If set to True, the data will be read directly from the storage bypassing system cache.

    Mutually exclusive with dont_use_mmap=False.

__call__(filepaths, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__filepaths (TensorList) – File paths to read from.

nvidia.dali.ops.noise#

class nvidia.dali.ops.noise.Gaussian(*, device='cpu', **kwargs)#

Applies Gaussian noise to the input.

The shape and data type of the output will match the input.
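For illustration, a minimal sketch that adds Gaussian noise to decoded images (the file_root path is a placeholder):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)
noise = dali.ops.noise.Gaussian(mean=0.0, stddev=10.0)

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    images = dali.fn.decoders.image(files)
    # Output has the same shape and type as the (uint8) input images.
    noisy = noise(images)
    pipe.set_outputs(noisy, labels)

pipe.build()
noisy, labels = pipe.run()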

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • mean (float or TensorList of float, optional, default = 0.0) – Mean of the distribution.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • stddev (float or TensorList of float, optional, default = 1.0) – Standard deviation of the distribution.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.noise.SaltAndPepper(*, device='cpu', **kwargs)#

Applies salt-and-pepper noise to the input.

The shape and data type of the output will match the input.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • pepper_val (float or TensorList of float, optional) –

    Value of “pepper”.

    If not provided, the pepper value will be -1.0 for floating point types or the minimum value of the input data type otherwise, converted to the data type of the input.

  • per_channel (bool, optional, default = False) –

    Determines whether the noise should be generated for each channel independently.

    If set to True, the noise is generated for each channel independently, resulting in some channels being corrupted and others kept intact. If set to False, the noise is generated once and applied to all channels, so that all channels in a pixel should either be kept intact, take the “pepper” value, or the “salt” value.

    Note: Per-channel noise generation requires the input layout to contain a channels (‘C’) dimension, or be empty. In the case of the layout being empty, channel-last layout is assumed.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • prob (float or TensorList of float, optional, default = 0.05) – Probability of an output value to take a salt or pepper value.

  • salt_val (float or TensorList of float, optional) –

    Value of “salt”.

    If not provided, the salt value will be 1.0 for floating point types or the maximum value of the input data type otherwise, converted to the data type of the input.

  • salt_vs_pepper (float or TensorList of float, optional, default = 0.5) – Probability of a corrupted output value to take a salt value.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.noise.Shot(*, device='cpu', **kwargs)#

Applies shot noise to the input.

The shot noise is generated by applying the following formula:

output[:] = poisson_dist(max(0, input[:] / factor)) * factor  if factor != 0
output[:] = input[:]                                           if factor == 0

where poisson_dist represents a Poisson distribution.

Shot noise is a noise that’s present in data generated by a Poisson process, like registering photons by an image sensor. This operator simulates the data acquisition process where each event increases the output value by factor and the input tensor contains the expected values of corresponding output points. For example, a factor of 0.1 means that 10 events are needed to increase the output value by 1, while a factor of 10 means that a single event increases the output by 10. The output values are quantized to multiples of factor. The larger the factor, the more noise is present in the output. A factor of 0 makes this an identity operation.

The shape and data type of the output will match the input.
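As a concrete illustration of the factor parameter, a minimal sketch (the file_root path is a placeholder):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)
# With factor=0.1, an input value of 100 corresponds to an expected count of
# 1000 events, and the output is poisson(1000) * 0.1 -- roughly 100 +/- 3.
# With factor=10, the same input gives poisson(10) * 10 -- roughly 100 +/- 32.
shot = dali.ops.noise.Shot(factor=0.1)

with pipe:
    files, labels = dali.fn.readers.file(file_root="./my_file_root")
    images = dali.fn.decoders.image(files)
    noisy = shot(images)
    pipe.set_outputs(noisy, labels)

pipe.build()
noisy, labels = pipe.run()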

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • factor (float or TensorList of float, optional, default = 20.0) – Factor parameter.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

nvidia.dali.ops.plugin#

nvidia.dali.ops.plugin.video#

Note

This module belongs to the nvidia-dali-video plugin, which needs to be installed as a separate package. Refer to the Installation Guide for more details.

class nvidia.dali.ops.plugin.video.Decoder(*, device='cpu', **kwargs)#

Decodes a video file from a memory buffer (e.g. provided by external source).

The video streams can be in most of the container file formats. FFmpeg is used to parse video containers and returns a batch of sequences of frames with shape (F, H, W, C), where F is the number of frames in a sequence and can differ for each sample.

Supported backends
  • ‘mixed’

Keyword Arguments:
  • affine (bool, optional, default = True) –

    Applies only to the mixed backend type.

    If set to True, each thread in the internal thread pool will be tied to a specific CPU core.

    Otherwise, the threads can be reassigned to any CPU core by the operating system.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • end_frame (int, optional, default = 0) – Index of the end frame to be decoded.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(buffer, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__buffer (TensorList) – Data buffer with a loaded video file.

nvidia.dali.ops.random#

class nvidia.dali.ops.random.Beta(*, device='cpu', **kwargs)#

Generates a random number from the [0, 1] range following the beta distribution.

The beta distribution has the following probability density function:

\[f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1} (1-x)^{\beta-1}\]

where Γ is the gamma function defined as:

\[\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x} \, dx\]

The operator supports float32 and float64 output types.

The shape of the generated data can be either specified explicitly with a shape argument, or chosen to match the shape of the __shape_like input, if provided. If none are present, a single value per sample is generated.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • alpha (float or TensorList of float, optional, default = 1.0) – The alpha parameter, a positive float32 scalar.

  • beta (float or TensorList of float, optional, default = 1.0) – The beta parameter, a positive float32 scalar.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Note

    The generated numbers are converted to the output data type, rounding and clamping if necessary.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

__call__(shape_like=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__shape_like (TensorList, optional) – Shape of this input will be used to infer the shape of the output, if provided.

class nvidia.dali.ops.random.Choice(*, device='cpu', **kwargs)#

Generates a random sample from a given 1D array.

The probability of selecting a sample from the input is determined by the corresponding probability specified in p argument.

The shape of the generated data can be either specified explicitly with a shape argument, or chosen to match the shape of the __shape_like input, if provided. If none are present, a single value per sample is generated.

The type of the output matches the type of the input. For scalar inputs, only integral types are supported, otherwise any type can be used. The operator supports selection from an input containing elements of one of DALI enum types, that is: nvidia.dali.types.DALIDataType(), nvidia.dali.types.DALIImageType(), or nvidia.dali.types.DALIInterpType().
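A minimal sketch, assuming a plain Python scalar is accepted as the __a input (as described above):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
# Pick one of {0, 1, 2} per sample; value 2 is selected half of the time.
choice = dali.ops.random.Choice(p=[0.25, 0.25, 0.5])

with pipe:
    picked = choice(3)  # scalar input: behaves like choosing from [0, 1, 2]
    pipe.set_outputs(picked)

pipe.build()
(picked,) = pipe.run()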

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • p (float or list of float or TensorList of float, optional) – Distribution of the probabilities. If not specified, uniform distribution is assumed.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

__call__(a, shape_like=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __a (scalar or TensorList) – If a scalar value __a is provided, the operator behaves as if [0, 1, ..., __a-1] list was passed as input. Otherwise __a is treated as 1D array of input samples.

  • __shape_like (TensorList, optional) – Shape of this input will be used to infer the shape of the output, if provided.

class nvidia.dali.ops.random.CoinFlip(*, device='cpu', **kwargs)#

Generates random boolean values following a Bernoulli distribution.

The probability of generating a value 1 (true) is determined by the probability argument.

The shape of the generated data can be either specified explicitly with a shape argument, or chosen to match the shape of the __shape_like input, if provided. If none are present, a single value per sample is generated.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Note

    The generated numbers are converted to the output data type, rounding and clamping if necessary.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • probability (float or TensorList of float, optional, default = 0.5) – Probability of value 1.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

__call__(shape_like=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__shape_like (TensorList, optional) – Shape of this input will be used to infer the shape of the output, if provided.

class nvidia.dali.ops.random.Normal(*, device='cpu', **kwargs)#

Generates random numbers following a normal distribution.

The shape of the generated data can be either specified explicitly with a shape argument, or chosen to match the shape of the __shape_like input, if provided. If none are present, a single value per sample is generated.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Note

    The generated numbers are converted to the output data type, rounding and clamping if necessary.

  • mean (float or TensorList of float, optional, default = 0.0) – Mean of the distribution.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

  • stddev (float or TensorList of float, optional, default = 1.0) – Standard deviation of the distribution.

__call__(shape_like=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__shape_like (TensorList, optional) – Shape of this input will be used to infer the shape of the output, if provided.

class nvidia.dali.ops.random.Uniform(*, device='cpu', **kwargs)#

Generates random numbers following a uniform distribution.

It can be configured to produce a continuous uniform distribution in the range [min, max), or a discrete uniform distribution where any of the specified values [v0, v1, …, vn] occur with equal probability.

The shape of the generated data can be either specified explicitly with a shape argument, or chosen to match the shape of the __shape_like input, if provided. If none are present, a single value per sample is generated.
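For illustration, a minimal sketch that contrasts the continuous (range) and discrete (values) modes:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=4, num_threads=2, device_id=0)
# Continuous: one float per sample, drawn uniformly from [-45, 45).
angle = dali.ops.random.Uniform(range=(-45.0, 45.0))
# Discrete: each sample is one of the listed values, with equal probability.
scale = dali.ops.random.Uniform(values=[0.5, 1.0, 2.0])

with pipe:
    pipe.set_outputs(angle(), scale())

pipe.build()
angles, scales = pipe.run()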

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) –

    Output data type.

    Note

    The generated numbers are converted to the output data type, rounding and clamping if necessary.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • range (float or list of float or TensorList of float, optional, default = [-1.0, 1.0]) –

    Range [min, max) of a continuous uniform distribution.

    This argument is mutually exclusive with values.

    Warning

    When specifying an integer type as dtype, the generated numbers can go outside the specified range, due to rounding.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape (int or list of int or TensorList of int, optional) – Shape of the output data.

  • values (float or list of float or TensorList of float, optional) –

    The discrete values [v0, v1, …, vn] produced by a discrete uniform distribution.

    This argument is mutually exclusive with range.

__call__(shape_like=None, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__shape_like (TensorList, optional) – Shape of this input will be used to infer the shape of the output, if provided.

nvidia.dali.ops.readers#

Operators in this module are data-producing operators that read data from storage or a different source, and where the data locations are known at pipeline construction time via arguments. For data readers that are able to read from sources specified dynamically via regular inputs, see nvidia.dali.fn.io module.

class nvidia.dali.ops.readers.COCO(*, device='cpu', **kwargs)#

Reads data from a COCO dataset that is composed of a directory with images and annotation JSON files.

This reader produces the following outputs:

images, bounding_boxes, labels, ((polygons, vertices) | (pixelwise_masks)),
(image_ids)
  • images Each sample contains image data with layout HWC (height, width, channels).

  • bounding_boxes Each sample can have an arbitrary M number of bounding boxes, each described by 4 coordinates:

    [[x_0, y_0, w_0, h_0],
     [x_1, y_1, w_1, h_1]
     ...
     [x_M, y_M, w_M, h_M]]
    

    or in [l, t, r, b] format if requested (see ltrb argument).

  • labels Each bounding box is associated with an integer label representing a category identifier:

    [label_0, label_1, ..., label_M]
    
  • polygons and vertices (Optional, present if polygon_masks is set to True) If polygon_masks is enabled, two extra outputs describing masks by a set of polygons. Each mask contains an arbitrary number of polygons P, each associated with a mask index in the range [0, M) and composed by a group of V vertices. The output polygons describes the polygons as follows:

    [[mask_idx_0, start_vertex_idx_0, end_vertex_idx_0],
     [mask_idx_1, start_vertex_idx_1, end_vertex_idx_1],
     ...
     [mask_idx_P, start_vertex_idx_P, end_vertex_idx_P]]
    

    where mask_idx is the index of the mask to which the polygon belongs, in the range [0, M), and start_vertex_idx and end_vertex_idx define the range of indices of vertices, as they appear in the output vertices, belonging to this polygon. Each sample in vertices contains a list of vertices that compose the different polygons in the sample, as 2D coordinates:

    [[x_0, y_0],
     [x_1, y_1],
     ...
     [x_V, y_V]]
    
  • pixelwise_masks (Optional, present if argument pixelwise_masks is set to True) Contains image-like data, same shape and layout as images, representing a pixelwise segmentation mask.

  • image_ids (Optional, present if argument image_ids is set to True) One element per sample, representing an image identifier.
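For illustration, a minimal sketch that reads images, boxes, labels and polygon masks (the dataset paths are placeholders):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=2, num_threads=2, device_id=0)
reader = dali.ops.readers.COCO(
    file_root="./coco/images",
    annotations_file="./coco/instances.json",
    polygon_masks=True,
    ltrb=True)

with pipe:
    images, boxes, labels, polygons, vertices = reader()
    pipe.set_outputs(images, boxes, labels, polygons, vertices)

pipe.build()
images, boxes, labels, polygons, vertices = pipe.run()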

Supported backends
  • ‘cpu’

Keyword Arguments:
  • annotations_file (str, optional, default = ‘’) – List of paths to the JSON annotations files.

  • avoid_class_remapping (bool, optional, default = False) –

    If set to True, class ID values are returned directly as they are defined in the manifest file.

    Otherwise, class ID values are mapped to consecutive values in the range [1, number of classes], disregarding the exact values from the manifest (0 is reserved for a special background class).

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems, do not provide optimum performance.

  • file_root (str, optional) –

    Path to a directory that contains the data files.

    If a file list is not provided, this argument is required.

  • image_ids (bool, optional, default = False) – If set to True, the image IDs will be produced in an extra output.

  • images (str or list of str, optional) –

    A list of image paths.

    If provided, it specifies the images that will be read. The images will be read in the same order as they appear in the list, and in case of duplicates, multiple copies of the relevant samples will be produced.

    If left unspecified or set to None, all images listed in the annotation file are read exactly once, ordered by their image id.

    The paths to be kept should match exactly those in the annotations file.

    Note: This argument is mutually exclusive with preprocessed_annotations.

  • include_iscrowd (bool, optional, default = True) – If set to True, annotations marked as iscrowd=1 are included as well.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • ltrb (bool, optional, default = False) –

    If set to True, bboxes are returned as [left, top, right, bottom].

    If set to False, the bboxes are returned as [x, y, width, height].

  • masks (bool, optional, default = False) –

    Enable polygon masks.

    Warning

    Use polygon_masks instead. Note that the polygon format has changed from mask_id, start_coord, end_coord to mask_id, start_vertex, end_vertex, where start_coord and end_coord are total numbers of coordinates; effectively start_coord = 2 * start_vertex and end_coord = 2 * end_vertex. Example: A polygon with vertices [[x0, y0], [x1, y1], [x2, y2]] would be represented as [mask_id, 0, 6] when using the deprecated argument masks, but [mask_id, 0, 3] when using the new argument polygon_masks.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • pixelwise_masks (bool, optional, default = False) – If true, segmentation masks are read and returned as pixel-wise masks. This argument is mutually exclusive with polygon_masks.

  • polygon_masks (bool, optional, default = False) – If set to True, segmentation mask polygons are read in the form of two outputs: polygons and vertices. This argument is mutually exclusive with pixelwise_masks.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preprocessed_annotations (str, optional, default = ‘’) – Path to the directory with meta files that contain preprocessed COCO annotations.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • ratio (bool, optional, default = False) – If set to True, the returned bbox and mask polygon coordinates are relative to the image dimensions.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • save_preprocessed_annotations (bool, optional, default = False) – If set to True, the operator saves a set of files containing binary representations of the preprocessed COCO annotations.

  • save_preprocessed_annotations_dir (str, optional, default = ‘’) – Path to the directory in which to save the preprocessed COCO annotations files.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • shuffle_after_epoch (bool, optional, default = False) – If set to True, the reader shuffles the entire dataset after each epoch.

  • size_threshold (float, optional, default = 0.1) – If the width or the height, in number of pixels, of a bounding box that represents an instance of an object is lower than this value, the object will be ignored.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, the loading data will be skipped when the sample is in the decoder cache.

    In this case, the output of the loader will be empty.

  • skip_empty (bool, optional, default = False) – If set to True, the reader will skip samples with no object instances in them.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

  • dump_meta_files (bool) –

    Warning

    The argument dump_meta_files is a deprecated alias for save_preprocessed_annotations. Use save_preprocessed_annotations instead.

  • dump_meta_files_path (str) –

    Warning

    The argument dump_meta_files_path is a deprecated alias for save_preprocessed_annotations_dir. Use save_preprocessed_annotations_dir instead.

  • meta_files_path (str) –

    Warning

    The argument meta_files_path is a deprecated alias for preprocessed_annotations. Use preprocessed_annotations instead.

  • save_img_ids (bool) –

    Warning

    The argument save_img_ids is a deprecated alias for image_ids. Use image_ids instead.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.

class nvidia.dali.ops.readers.Caffe(*, device='cpu', **kwargs)#

Reads (Image, label) pairs from a Caffe LMDB.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • path (str or list of str) – List of paths to the Caffe LMDB directories.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • image_available (bool, optional, default = True) – Determines whether an image is available in this LMDB.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • label_available (bool, optional, default = True) – Determines whether a label is available.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, the loading data will be skipped when the sample is in the decoder cache.

    In this case, the output of the loader will be empty.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.

class nvidia.dali.ops.readers.Caffe2(*, device='cpu', **kwargs)#

Reads sample data from a Caffe2 Lightning Memory-Mapped Database (LMDB).

Supported backends
  • ‘cpu’

Keyword Arguments:
  • path (str or list of str) – List of paths to the Caffe2 LMDB directories.

  • additional_inputs (int, optional, default = 0) – Additional auxiliary data tensors that are provided for each sample.

  • bbox (bool, optional, default = False) – Denotes whether the bounding-box information is present.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • image_available (bool, optional, default = True) – Determines whether an image is available in this LMDB.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • label_type (int, optional, default = 0) –

    Type of label stored in dataset.

    Here is a list of the available values:

    • 0 = SINGLE_LABEL: which is the integer label for the multi-class classification.

    • 1 = MULTI_LABEL_SPARSE: which is the sparse active label indices for multi-label classification.

    • 2 = MULTI_LABEL_DENSE: which is the dense label embedding vector for label embedding regression.

    • 3 = MULTI_LABEL_WEIGHTED_SPARSE: which is the sparse active label indices with per-label weights for multi-label classification.

    • 4 = NO_LABEL: where no label is available.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • num_labels (int, optional, default = 1) –

    Number of classes in the dataset.

    Required when sparse labels are used.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, the loading data will be skipped when the sample is in the decoder cache.

    In this case, the output of the loader will be empty.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.

class nvidia.dali.ops.readers.File(*, device='cpu', **kwargs)#

Reads file contents and returns file-label pairs.

This operator can be used in the following modes:

  1. Listing files from a directory, assigning labels based on subdirectory structure.

In this mode, the directory indicated in the file_root argument should contain one or more subdirectories. The files in these subdirectories are listed and assigned labels based on the lexicographical order of the subdirectory names. If you provide the file_filters argument with a list of glob strings, the operator will list files matching at least one of the patterns. Otherwise, a default set of filters is used (see the default value of file_filters for details).

For example, this directory structure:

<file_root>/0/image0.jpg
<file_root>/0/world_map.jpg
<file_root>/0/antarctic.png
<file_root>/1/cat.jpeg
<file_root>/1/dog.tif
<file_root>/2/car.jpeg
<file_root>/2/truck.jp2

by default will yield the following outputs:

<contents of 0/image0.jpg>        0
<contents of 0/world_map.jpg>     0
<contents of 0/antarctic.png>     0
<contents of 1/cat.jpeg>          1
<contents of 1/dog.tif>           1
<contents of 2/car.jpeg>          2
<contents of 2/truck.jp2>         2

and with file_filters = ["*.jpg", "*.jpeg"] will yield the following outputs:

<contents of 0/image0.jpg>        0
<contents of 0/world_map.jpg>     0
<contents of 1/cat.jpeg>          1
<contents of 2/car.jpeg>          2
  2. Use file names and labels stored in a text file.

file_list argument points to a file which contains one file name and label per line. Example:

dog.jpg 0
cute kitten.jpg 1
doge.png 0

The file names can contain spaces in the middle, but cannot contain trailing whitespace.

  3. Use file names and labels provided as a list of strings and integers, respectively.

As with other readers, the (file, label) pairs returned by this operator can be randomly shuffled and various sharding strategies can be applied. See documentation of this operator’s arguments for details.
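For illustration, a minimal sketch of mode 3, passing explicit file names and labels (the paths are placeholders):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size=3, num_threads=2, device_id=0)
# Mode 3: explicit file list with matching labels.
reader = dali.ops.readers.File(
    files=["dog.jpg", "cute kitten.jpg", "doge.png"],
    labels=[0, 1, 0],
    random_shuffle=True)

with pipe:
    contents, labels = reader()
    pipe.set_outputs(contents, labels)

pipe.build()
contents, labels = pipe.run()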

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • case_sensitive_filter (bool, optional, default = False) – If set to True, the filter will be matched case-sensitively, otherwise case-insensitively.

  • dir_filters (str or list of str, optional) –

    A list of glob strings to filter the list of sub-directories under file_root.

    This argument is ignored when file paths are taken from file_list or files.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • file_filters (str or list of str, optional, default = [‘*.jpg’, ‘*.jpeg’, ‘*.png’, ‘*.bmp’, ‘*.tif’, ‘*.tiff’, ‘*.pnm’, ‘*.ppm’, ‘*.pgm’, ‘*.pbm’, ‘*.jp2’, ‘*.webp’, ‘*.flac’, ‘*.ogg’, ‘*.wav’]) –

    A list of glob strings to filter the list of files in the sub-directories of the file_root.

    This argument is ignored when file paths are taken from file_list or files.

  • file_list (str, optional) –

    Path to a text file that contains one whitespace-separated filename label pair per line. The filenames are relative to the location of that file or to file_root, if specified.

    This argument is mutually exclusive with files.

  • file_root (str, optional) –

    Path to a directory that contains the data files.

    If not using file_list or files, this directory is traversed to discover the files. file_root is required in this mode of operation.

  • files (str or list of str, optional) –

    A list of file paths to read the data from.

    If file_root is provided, the paths are treated as being relative to it. When using files, the labels are taken from labels argument or, if it was not supplied, contain indices at which given file appeared in the files list.

    This argument is mutually exclusive with file_list.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • labels (int or list of int, optional) –

    Labels accompanying contents of files listed in files argument.

    If not used, sequential 0-based indices are used as labels.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • shuffle_after_epoch (bool, optional, default = False) –

    If set to True, the reader shuffles the entire dataset after each epoch.

    stick_to_shard and random_shuffle cannot be used when this argument is set to True.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, loading the data will be skipped when the sample is present in the decoder cache.

    In this case, the output of the loader will be empty.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.
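
For reference, a minimal legacy-API sketch that uses this reader might look as follows; the "./my_file_root" directory is a placeholder for a dataset laid out as one sub-directory per class:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 2, device_id = 0)
# Static arguments are set when the operator object is created.
reader = dali.ops.readers.File(file_root = "./my_file_root",
                               random_shuffle = True, initial_fill = 256)

with pipe:
    # The reader has no inputs and produces encoded files and labels.
    files, labels = reader()
    pipe.set_outputs(files, labels)

pipe.build()
files, labels = pipe.run()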

class nvidia.dali.ops.readers.MXNet(*, device='cpu', **kwargs)#

Reads the data from an MXNet RecordIO.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • index_path (str or list of str) –

    List (of length 1) that contains a path to the index (.idx) file.

    The file is generated by MXNet's im2rec.py script together with the RecordIO file. The index file can also be generated by using the rec2idx script that is distributed with DALI.

  • path (str or list of str) – List of paths to the RecordIO files.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, loading the data will be skipped when the sample is present in the decoder cache.

    In this case, the output of the loader will be empty.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.
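
A minimal legacy-API sketch for this reader is shown below; the RecordIO and index file paths are placeholders (the .idx file can be produced with the rec2idx script mentioned above):

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 2, device_id = 0)
# "./train.rec" and "./train.idx" are placeholder paths.
reader = dali.ops.readers.MXNet(path = ["./train.rec"], index_path = ["./train.idx"])

with pipe:
    jpegs, labels = reader()   # encoded images and labels from the RecordIO file
    pipe.set_outputs(jpegs, labels)

pipe.build()
jpegs, labels = pipe.run()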

class nvidia.dali.ops.readers.NemoAsr(*, device='cpu', **kwargs)#

Reads automatic speech recognition (ASR) data (audio, text) from an NVIDIA NeMo compatible manifest.

Example manifest file:

{
  "audio_filepath": "path/to/audio1.wav",
  "duration": 3.45,
  "text": "this is a nemo tutorial"
}
{
  "audio_filepath": "path/to/audio1.wav",
  "offset": 3.45,
  "duration": 1.45,
  "text": "same audio file but using offset"
}
{
  "audio_filepath": "path/to/audio2.wav",
  "duration": 3.45,
  "text": "third transcript in this example"
}

Note

Only the audio_filepath field is mandatory. If duration is not specified, the whole audio file will be used. A missing text field will produce an empty string as the text.

Warning

Handling of duration and offset fields is not yet implemented. The current implementation always reads the whole audio file.

This reader produces between 1 and 4 outputs:

  • Decoded audio data: float, shape=(audio_length,)

  • (optional, if read_sample_rate=True) Audio sample rate: float, shape=(1,)

  • (optional, if read_text=True) Transcript text as a null-terminated string: uint8, shape=(text_len + 1,)

  • (optional, if read_idxs=True) Index of the manifest entry: int64, shape=(1,)

Supported backends
  • ‘cpu’

Keyword Arguments:
  • manifest_filepaths (str or list of str) – List of paths to NeMo’s compatible manifest files.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • downmix (bool, optional, default = True) – If True, downmixes all input channels to mono. If downmixing is turned on, the decoder will always produce a 1-D output.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –

    Output data type.

    Supported types: INT16, INT32, and FLOAT.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • max_duration (float, optional, default = 0.0) –

    If a value greater than 0 is provided, it specifies the maximum allowed duration, in seconds, of the audio samples.

    Samples with a duration longer than this value will be ignored.

  • min_duration (float, optional, default = 0.0) –

    If a value greater than 0 is provided, it specifies the minimum allowed duration, in seconds, of the audio samples.

    Samples with a duration shorter than this value will be ignored.

  • normalize_text (bool) –

    Warning

    The argument normalize_text is no longer used and will be removed in a future release.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • quality (float, optional, default = 50.0) –

    Resampling quality, 0 is lowest, 100 is highest.

    0 corresponds to 3 lobes of the sinc filter; 50 gives 16 lobes and 100 gives 64 lobes.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • read_idxs (bool, optional, default = False) –

    Whether to output the indices of samples, as they occur in the manifest file, as a separate output.

  • read_sample_rate (bool, optional, default = True) – Whether to output the sample rate for each sample as a separate output

  • read_text (bool, optional, default = True) – Whether to output the transcript text for each sample as a separate output

  • sample_rate (float, optional, default = -1.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • shuffle_after_epoch (bool, optional, default = False) – If set to True, the reader shuffles the entire dataset after each epoch.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, loading the data will be skipped when the sample is present in the decoder cache.

    In this case, the output of the loader will be empty.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.
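
A minimal legacy-API sketch is shown below; "./manifest.json" stands in for a NeMo-compatible manifest file. With the default read_sample_rate=True and read_text=True, the reader produces three outputs:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 2, num_threads = 2, device_id = 0)
reader = dali.ops.readers.NemoAsr(manifest_filepaths = ["./manifest.json"],
                                  read_sample_rate = True, read_text = True)

with pipe:
    audio, sample_rate, text = reader()   # decoded audio, sample rate, transcript
    pipe.set_outputs(audio, sample_rate, text)

pipe.build()
audio, sample_rate, text = pipe.run()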

class nvidia.dali.ops.readers.Numpy(*, device='cpu', **kwargs)#

Reads Numpy arrays from a directory.

This operator can be used in the following modes:

  1. Read all files from a directory indicated by file_root that match given file_filter.

  2. Read file names from a text file indicated in file_list argument.

  3. Read files listed in files argument.

Note

The gpu backend requires cuFile/GDS support (418.x driver family or newer), which is shipped with the CUDA toolkit starting from CUDA 11.4. Please check the GDS documentation for more details.

The gpu reader reads the files in chunks. The size of the chunk can be controlled process-wide with an environment variable DALI_GDS_CHUNK_SIZE. Valid values are powers of 2 between 4096 and 16M, with the default being 2M. For convenience, the value can be specified with a k or M suffix, applying a multiplier of 1024 and 2^20, respectively.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • cache_header_information (bool, optional, default = False) – If set to True, the header information for each file is cached, improving access speed.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • file_filter (str, optional, default = ‘*.npy’) –

    If a value is specified, the string is interpreted as glob string to filter the list of files in the sub-directories of the file_root.

    This argument is ignored when file paths are taken from file_list or files.

  • file_list (str, optional) –

    Path to a text file that contains filenames (one per line) where the filenames are relative to the location of that file or to file_root, if specified.

    This argument is mutually exclusive with files.

  • file_root (str, optional) –

    Path to a directory that contains the data files.

    If not using file_list or files, this directory is traversed to discover the files. file_root is required in this mode of operation.

  • files (str or list of str, optional) –

    A list of file paths to read the data from.

    If file_root is provided, the paths are treated as being relative to it.

    This argument is mutually exclusive with file_list.

  • fill_value (float, optional, default = 0.0) – Determines the padding value when out_of_bounds_policy is set to “pad”.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • out_of_bounds_policy (str, optional, default = ‘error’) –

    Determines the policy when reading outside of the bounds of the numpy array.

    Here is a list of the supported values:

    • "error" (default): Attempting to read outside of the bounds of the image will produce an error.

    • "pad": The array will be padded as needed with zeros or any other value that is specified with the fill_value argument.

    • "trim_to_shape": The ROI will be cut to the bounds of the array.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • register_buffers (bool, optional, default = True) –

    Applies only to the gpu backend type.

    Warning

    This argument is temporarily disabled and left for backward compatibility. It will be re-enabled in future releases.

    If true, the device I/O buffers will be registered with cuFile. It is not recommended if the sample sizes vary a lot.

  • rel_roi_end (float or list of float or TensorList of float, optional) –

    End of the region-of-interest, in relative coordinates (range [0.0 - 1.0]).

    This argument is incompatible with “roi_end”, “roi_shape” and “rel_roi_shape”.

  • rel_roi_shape (float or list of float or TensorList of float, optional) –

    Shape of the region-of-interest, in relative coordinates (range [0.0 - 1.0]).

    This argument is incompatible with “roi_shape”, “roi_end” and “rel_roi_end”.

  • rel_roi_start (float or list of float or TensorList of float, optional) –

    Start of the region-of-interest, in relative coordinates (range [0.0 - 1.0]).

    This argument is incompatible with “roi_start”.

  • roi_axes (int or list of int, optional, default = []) –

    Order of dimensions used for the ROI anchor and shape arguments, as dimension indices.

    If not provided, all the dimensions should be specified in the ROI arguments.

  • roi_end (int or list of int or TensorList of int, optional) –

    End of the region-of-interest, in absolute coordinates.

    This argument is incompatible with “rel_roi_end”, “roi_shape” and “rel_roi_shape”.

  • roi_shape (int or list of int or TensorList of int, optional) –

    Shape of the region-of-interest, in absolute coordinates.

    This argument is incompatible with “rel_roi_shape”, “roi_end” and “rel_roi_end”.

  • roi_start (int or list of int or TensorList of int, optional) –

    Start of the region-of-interest, in absolute coordinates.

    This argument is incompatible with “rel_roi_start”.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • shuffle_after_epoch (bool, optional, default = False) –

    If set to True, the reader shuffles the entire dataset after each epoch.

    stick_to_shard and random_shuffle cannot be used when this argument is set to True.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, loading the data will be skipped when the sample is present in the decoder cache.

    In this case, the output of the loader will be empty.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

  • use_o_direct (bool, optional, default = False) –

    If set to True, the data will be read directly from the storage bypassing system cache.

    Mutually exclusive with dont_use_mmap=False.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.
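
A minimal legacy-API sketch for mode 1 (directory traversal) is shown below; "./npy_data" is a placeholder for a directory containing *.npy files. For device = "gpu", cuFile/GDS support is required, and the chunk size can be tuned with the DALI_GDS_CHUNK_SIZE environment variable:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 2, device_id = 0)
reader = dali.ops.readers.Numpy(device = "cpu", file_root = "./npy_data",
                                file_filter = "*.npy")

with pipe:
    arrays = reader()          # one output: the arrays read from the .npy files
    pipe.set_outputs(arrays)

pipe.build()
(arrays,) = pipe.run()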

class nvidia.dali.ops.readers.Sequence(*, device='cpu', **kwargs)#

Warning

This operator is now deprecated.

This operator may be removed in future releases.

external_source() can be used to implement custom reading patterns. For reading video sequences, one of nvidia.dali.fn.readers.video(), nvidia.dali.fn.experimental.readers.video(), nvidia.dali.fn.experimental.decoders.video() or nvidia.dali.fn.experimental.inputs.video() can be used.

Reads [Frame] sequences from a directory representing a collection of streams.

This operator expects file_root to contain a set of directories, where each directory represents an extracted video stream. This stream is represented by one file for each frame, sorted lexicographically. Sequences do not cross the stream boundary and only complete sequences are considered, so there is no padding.

Example directory structure:

- file_root
  - 0
    - 00001.png
    - 00002.png
    - 00003.png
    - 00004.png
    - 00005.png
    - 00006.png
    ....

  - 1
    - 00001.png
    - 00002.png
    - 00003.png
    - 00004.png
    - 00005.png
    - 00006.png
    ....

Note

This operator is an analogue of the video reader working on video frames extracted as separate images. Its main purpose is to serve as a test baseline. For regular usage, the video reader is the recommended approach.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • file_root (str) – Path to a directory containing streams, where each sub-directory represents a stream.

  • sequence_length (int) – Length of sequence to load for each sample.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, loading the data will be skipped when the sample is present in the decoder cache.

    In this case, the output of the loader will be empty.

  • step (int, optional, default = 1) – Distance between first frames of consecutive sequences.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • stride (int, optional, default = 1) – Distance between consecutive frames in a sequence.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.

class nvidia.dali.ops.readers.TFRecord(path, index_path, features, **kwargs)#

Reads samples from a TensorFlow TFRecord file.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • features (dict of (string, nvidia.dali.tfrecord.Feature)) –

    A dictionary that maps names of the TFRecord features to extract to the feature type.

    Typically obtained by using the dali.tfrecord.FixedLenFeature and dali.tfrecord.VarLenFeature helper functions, which are equal to TensorFlow’s tf.FixedLenFeature and tf.VarLenFeature types, respectively. For additional flexibility, dali.tfrecord.VarLenFeature supports the partial_shape parameter. If provided, the data will be reshaped to match its value, and the first dimension will be inferred from the data size.

    If the named feature doesn't exist in the processed TFRecord entry, an empty tensor is returned.

  • index_path (str or list of str) –

    List of paths to index files. There should be one index file for every TFRecord file.

    The index files can be obtained from TFRecord files by using the tfrecord2idx script that is distributed with DALI.

  • path (str or list of str) – List of paths to TFRecord files.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, loading the data will be skipped when the sample is present in the decoder cache.

    In this case, the output of the loader will be empty.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

  • use_o_direct (bool, optional, default = False) –

    If set to True, the data will be read directly from the storage bypassing the system cache.

    Mutually exclusive with dont_use_mmap=False.
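
A minimal legacy-API sketch is shown below; the file paths and feature names are placeholders, and the index file can be generated with the tfrecord2idx script:

import nvidia.dali as dali
import nvidia.dali.tfrecord as tfrec

pipe = dali.pipeline.Pipeline(batch_size = 4, num_threads = 2, device_id = 0)
reader = dali.ops.readers.TFRecord(
    path = ["./train.tfrecord"],
    index_path = ["./train.idx"],
    features = {
        "image/encoded": tfrec.FixedLenFeature((), tfrec.string, ""),
        "image/class/label": tfrec.FixedLenFeature([1], tfrec.int64, -1),
    })

with pipe:
    sample = reader()          # a dictionary keyed by the feature names
    pipe.set_outputs(sample["image/encoded"], sample["image/class/label"])

pipe.build()
encoded, labels = pipe.run()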

class nvidia.dali.ops.readers.Video(*, device='cpu', **kwargs)#

Loads and decodes video files using FFmpeg and NVDECODE, which is the hardware-accelerated video decoding feature in the NVIDIA(R) GPU.

The video streams can be in most of the container file formats. FFmpeg is used to parse video containers and returns a batch of sequences of sequence_length frames with shape (N, F, H, W, C), where N is the batch size and F is the number of frames. This class only supports constant frame rate videos.

Note

Containers that do not support indexing, such as MPEG, require DALI to seek to the sequence whenever a new sequence needs to be decoded.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • sequence_length (int) – Frames to load per sequence.

  • additional_decode_surfaces (int, optional, default = 2) –

    Additional decode surfaces to use beyond minimum required.

    This argument is ignored when the decoder cannot determine the minimum number of decode surfaces

    Note

    This can happen when the driver is an older version.

    This parameter can be used to trade off memory usage with performance.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • channels (int, optional, default = 3) – Number of channels.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –

    Output data type.

    Supported types: UINT8 or FLOAT.

  • enable_frame_num (bool, optional, default = False) – If the file_list or filenames argument is passed, returns the frame number output.

  • enable_timestamps (bool, optional, default = False) – If the file_list or filenames argument is passed, returns the timestamps output.

  • file_list (str, optional, default = ‘’) –

    Path to the file with a list of file label [start_frame [end_frame]] values.

    A positive value means the exact frame, a negative value counts as the Nth frame from the end (following Python array indexing), and equal values for the start and end frame yield an empty sequence and a warning. This option is mutually exclusive with filenames and file_root.

  • file_list_frame_num (bool, optional, default = False) –

    If the start/end timestamps are provided in file_list, you can interpret them as frame numbers instead of as timestamps.

    If floating point values have been provided, the start frame number will be rounded up and the end frame number will be rounded down.

    Frame numbers start from 0.

  • file_list_include_preceding_frame (bool, optional, default = False) –

    Changes how the file_list start and end frame timestamps are translated to frame numbers.

    If the start/end timestamps are provided in file_list as timestamps, the start frame is calculated as ceil(start_time_stamp * FPS) and the end as floor(end_time_stamp * FPS). If this argument is set to True, the equation changes to floor(start_time_stamp * FPS) and ceil(end_time_stamp * FPS) respectively. In effect, the first returned frame is not later, and the end frame not earlier, than the provided timestamps. This behavior is more aligned with how the visible timestamps are correlated with displayed video frames.

    Note

    When file_list_frame_num is set to True, this option has no effect.

    Warning

    This option is available for legacy behavior compatibility.

  • file_root (str, optional, default = ‘’) –

    Path to a directory that contains the data files.

    This option is mutually exclusive with filenames and file_list.

  • filenames (str or list of str, optional, default = []) –

    File names of the video files to load.

    This option is mutually exclusive with file_list and file_root.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output frames (RGB or YCbCr).

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • labels (int or list of int, optional) –

    Labels associated with the files listed in filenames argument.

    If an empty list is provided, sequential 0-based indices are used as labels. If not provided, no labels will be yielded.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • normalized (bool, optional, default = False) – Gets the output as normalized data.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • pad_sequences (bool, optional, default = False) –

    Allows creation of incomplete sequences if there is an insufficient number of frames at the very end of the video.

    Redundant frames are zeroed. Corresponding time stamps and frame numbers are set to -1.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, loading the data will be skipped when the sample is present in the decoder cache.

    In this case, the output of the loader will be empty.

  • skip_vfr_check (bool, optional, default = False) –

    Skips the check for the variable frame rate (VFR) videos.

    Use this flag to suppress false positive detection of VFR videos.

    Warning

    When the dataset indeed contains VFR files, setting this flag may cause the decoder to malfunction.

  • step (int, optional, default = -1) –

    Frame interval between each sequence.

    When the value is less than 0, step is set to sequence_length.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • stride (int, optional, default = 1) – Distance between consecutive frames in the sequence.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.
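
A minimal legacy-API sketch is shown below; "./video.mp4" is a placeholder file name, and the reader is available only on the GPU backend. With filenames and no labels, the reader yields a single output of decoded frame sequences:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 2, num_threads = 2, device_id = 0)
reader = dali.ops.readers.Video(device = "gpu", filenames = ["./video.mp4"],
                                sequence_length = 8)

with pipe:
    frames = reader()          # shape (N, F, H, W, C), uint8 by default
    pipe.set_outputs(frames)

pipe.build()
(frames,) = pipe.run()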

class nvidia.dali.ops.readers.VideoResize(*, device='cpu', **kwargs)#

Loads, decodes and resizes video files with FFmpeg and NVDECODE, which is NVIDIA GPU’s hardware-accelerated video decoding.

The video streams can be in most of the container file formats. FFmpeg is used to parse video containers and returns a batch of sequences with shape (N, F, H, W, C), with N being the batch size, and F the number of frames in the sequence.

This operator combines the features of nvidia.dali.fn.video_reader() and nvidia.dali.fn.resize().

Note

The decoder supports only constant frame-rate videos.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • sequence_length (int) – Frames to load per sequence.

  • additional_decode_surfaces (int, optional, default = 2) –

    Additional decode surfaces to use beyond minimum required.

    This argument is ignored when the decoder cannot determine the minimum number of decode surfaces

    Note

    This can happen when the driver is an older version.

    This parameter can be used to trade off memory usage with performance.

  • antialias (bool, optional, default = True) –

    If enabled, it applies an antialiasing filter when scaling down.

    Note

    Nearest neighbor interpolation does not support antialiasing.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • channels (int, optional, default = 3) – Number of channels.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –

    Output data type.

    Supported types: UINT8 or FLOAT.

  • enable_frame_num (bool, optional, default = False) – If the file_list or filenames argument is passed, returns the frame number output.

  • enable_timestamps (bool, optional, default = False) – If the file_list or filenames argument is passed, returns the timestamps output.

  • file_list (str, optional, default = ‘’) –

    Path to the file with a list of file label [start_frame [end_frame]] values.

    A positive value means the exact frame, a negative value counts as the Nth frame from the end (following Python array indexing), and equal values for the start and end frame yield an empty sequence and a warning. This option is mutually exclusive with filenames and file_root.

  • file_list_frame_num (bool, optional, default = False) –

    If the start/end timestamps are provided in file_list, you can interpret them as frame numbers instead of as timestamps.

    If floating point values have been provided, the start frame number will be rounded up and the end frame number will be rounded down.

    Frame numbers start from 0.

  • file_list_include_preceding_frame (bool, optional, default = False) –

    Changes how the file_list start and end frame timestamps are translated to frame numbers.

    If the start/end timestamps are provided in file_list as timestamps, the start frame is calculated as ceil(start_time_stamp * FPS) and the end as floor(end_time_stamp * FPS). If this argument is set to True, the equation changes to floor(start_time_stamp * FPS) and ceil(end_time_stamp * FPS) respectively. In effect, the first returned frame is not later, and the end frame not earlier, than the provided timestamps. This behavior is more aligned with how the visible timestamps are correlated with displayed video frames.

    Note

    When file_list_frame_num is set to True, this option has no effect.

    Warning

    This option is available for legacy behavior compatibility.

  • file_root (str, optional, default = ‘’) –

    Path to a directory that contains the data files.

    This option is mutually exclusive with filenames and file_list.

  • filenames (str or list of str, optional, default = []) –

    File names of the video files to load.

    This option is mutually exclusive with file_list and file_root.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output frames (RGB or YCbCr).

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • interp_type (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –

    Type of interpolation to be used.

    Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

    Note

    Usage of INTERP_TRIANGULAR is now deprecated and it should be replaced by a combination of INTERP_LINEAR with antialias enabled.

  • labels (int or list of int, optional) –

    Labels associated with the files listed in filenames argument.

    If an empty list is provided, sequential 0-based indices are used as labels. If not provided, no labels will be yielded.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • mag_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.

  • max_size (float or list of float, optional) –

    Limit of the output size.

    When the operator is configured to keep aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when using resize_shorter argument or “not_smaller” mode or when some extents are left unspecified.

    This parameter puts a limit to how big the output can become. This value can be specified per-axis or uniformly for all axes.

    Note

    When used with “not_smaller” mode or resize_shorter argument, max_size takes precedence and the aspect ratio is kept - for example, resizing with mode="not_smaller", size=800, max_size=1400 an image of size 1200x600 would be resized to 1400x700.

  • min_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.

  • minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.

  • mode (str, optional, default = ‘default’) –

    Resize mode.

    Here is a list of supported modes:

    • "default" - image is resized to the specified size.
      Missing extents are scaled with the average scale of the provided ones.
    • "stretch" - image is resized to the specified size.
      Missing extents are not scaled at all.
    • "not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size.
      For example, a 1280x720 image, with a desired output size of 640x480, actually produces a 640x360 output.
    • "not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified.
      For example, a 640x480 image with a desired output size of 1920x1080, actually produces a 1920x1440 output.

      This argument is mutually exclusive with resize_longer and resize_shorter.

  • normalized (bool, optional, default = False) – Gets the output as normalized data.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • pad_sequences (bool, optional, default = False) –

    Allows creation of incomplete sequences if there is an insufficient number of frames at the very end of the video.

    Redundant frames are zeroed. Corresponding time stamps and frame numbers are set to -1.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • resize_longer (float or TensorList of float, optional, default = 0.0) –

    The length of the longer dimension of the resized image.

    This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".

  • resize_shorter (float or TensorList of float, optional, default = 0.0) –

    The length of the shorter dimension of the resized image.

    This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

  • resize_x (float or TensorList of float, optional, default = 0.0) –

    The length of the X dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_y (float or TensorList of float, optional, default = 0.0) –

    The length of the Y dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_z (float or TensorList of float, optional, default = 0.0) –

    The length of the Z dimension of the resized volume.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x and resize_y are left unspecified or 0, then the op will keep the aspect ratio of the original volume. Negative value flips the volume.

  • roi_end (float or list of float or TensorList of float, optional) –

    End of the input region of interest (ROI).

    Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • roi_relative (bool, optional, default = False) – If true, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right

  • roi_start (float or list of float or TensorList of float, optional) –

    Origin of the input region of interest (ROI).

    Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • size (float or list of float or TensorList of float, optional) –

    The desired output size.

    Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and mode argument.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, loading the data will be skipped when the sample is present in the decoder cache.

    In this case, the output of the loader will be empty.

  • skip_vfr_check (bool, optional, default = False) –

    Skips the check for the variable frame rate (VFR) videos.

    Use this flag to suppress false positive detection of VFR videos.

    Warning

    When the dataset indeed contains VFR files, setting this flag may cause the decoder to malfunction.

  • step (int, optional, default = -1) –

    Frame interval between each sequence.

    When the value is less than 0, step is set to sequence_length.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • stride (int, optional, default = 1) – Distance between consecutive frames in the sequence.

  • subpixel_scale (bool, optional, default = True) –

    If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.

    Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.

  • temp_buffer_hint (int, optional, default = 0) –

    Initial size, in bytes, of a temporary buffer for resampling.

    Note

    This argument is ignored for the CPU variant.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.
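
The usage mirrors readers.Video, with the resize arguments supplied at instantiation. A minimal sketch (with a placeholder file name) follows:

import nvidia.dali as dali

pipe = dali.pipeline.Pipeline(batch_size = 2, num_threads = 2, device_id = 0)
reader = dali.ops.readers.VideoResize(device = "gpu", filenames = ["./video.mp4"],
                                      sequence_length = 8,
                                      resize_x = 320, resize_y = 240)

with pipe:
    frames = reader()          # decoded and resized sequences: (N, F, 240, 320, C)
    pipe.set_outputs(frames)

pipe.build()
(frames,) = pipe.run()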

class nvidia.dali.ops.readers.Webdataset(*, device='cpu', **kwargs)#

A reader for the webdataset format.

The webdataset format is a way of providing efficient access to datasets stored in tar archives.

Storing data in POSIX tar archives greatly speeds up I/O operations on mechanical storage devices and on network file systems because it allows the operating system to reduce the number of I/O operations and to read the data ahead.

WebDataset fulfils a similar function to TensorFlow's TFRecord/tf.Example classes, but it is much easier to adopt because it does not actually require any data conversion. The data is stored in exactly the same format inside tar files as it is on disk, and all preprocessing and data augmentation code remains unchanged.

The dataset consists of one or more tar archives, each of which is further split into samples. A sample contains one or more components that correspond to the actual files contained within the archive. The components that belong to a specific sample are aggregated by filename without extension (for the specifics about the extensions please read the description of the ext parameter below). Note that samples whose filenames start with a dot will not be loaded, nor will entries that are not regular files.

In addition to the tar archive with data, each archive should come with a corresponding index file. The index file can be generated using a dedicated script:

<path_to_dali>/tools/wds2idx.py <path_to_archive> <path_to_index_file>

If the index file is not provided, it will be automatically inferred from the tar file. Keep in mind though that it will add considerable startup time for big datasets.

The format of the index file is:

v1.2 <num_samples>
<component1_ext> <component1_data_offset> <component1_size> <component2_ext> <component2_data_offset> <component2_size> ...
...

Based on webdataset/webdataset

Supported backends
  • ‘cpu’

Keyword Arguments:
  • ext (str or list of str) –

    The extension sets for each of the outputs produced.

    The number of extension sets determines the number of outputs of the reader. The extensions of the components are counted as the text after the first dot in the name of the file (excluding the samples starting with a dot). The different extension options should be separated with a semicolon (‘;’) and may contain dots.

    Example: “left.png;right.jpg”

  • paths (str or list of str) –

    The list of (one or more) paths to the webdataset archives.

    Has to be the same length as the index_paths argument.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • case_sensitive_extensions (bool, optional, default = True) –

    Determines whether the extensions provided via the ext argument should be case sensitive.

    Allows mixing letter case in the ext argument as well as in the webdataset container. For example, when this option is turned off, jpg, JPG, and jPG will all match.

    If the extension characters cannot be represented as ASCII, the result of turning this option off is undefined.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.

  • dtypes (DALIDataType or list of DALIDataType, optional) –

    Data types of the respective outputs.

    The default output data types are UINT8. However, if set, each output data type should be specified. Moreover, the tar file should be constructed so that each sample's byte size is divisible by the size of its data type.

  • index_paths (str or list of str, optional, default = []) –

    The list of the index files corresponding to the respective webdataset archives.

    Has to be the same length as the paths argument. In case it is not provided, it will be inferred automatically from the webdataset archive.

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • missing_component_behavior (str, optional, default = ‘’) –

    Specifies what to do when a sample has no file corresponding to a certain output.

    Possible behaviors:
    • "empty" (default) - the output that was not set will contain an empty tensor

    • "skip" - the entire sample will be skipped (no penalty to performance, except for reduced caching of the archive)

    • "error" - an exception will be raised and the execution stops

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, loading the data will be skipped when the sample is present in the decoder cache.

    In this case, the output of the loader will be empty.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.

__call__(**kwargs)#

Operator call to be used in graph definition. This operator doesn’t have any inputs.
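
Below is a minimal sketch of how this reader might be instantiated with the legacy object API. It assumes the class is available as dali.ops.readers.Webdataset and that ./data.tar and ./data.idx are hypothetical archive and index paths:

import nvidia.dali as dali

class WebdatasetPipe(dali.pipeline.Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super().__init__(batch_size, num_threads, device_id)
        # One extension set per output: encoded images and label files
        self.reader = dali.ops.readers.Webdataset(
            paths=["./data.tar"],        # hypothetical archive path
            index_paths=["./data.idx"],  # hypothetical index path
            ext=["jpg", "cls"],
            missing_component_behavior="error")

    def define_graph(self):
        jpegs, labels = self.reader()
        return jpegs, labels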

nvidia.dali.ops.reductions#

class nvidia.dali.ops.reductions.Max(*, device='cpu', **kwargs)#

Gets maximal input element along provided axes.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int, optional) –

    Axis or axes along which reduction is performed.

    Accepted range is [-ndim, ndim-1]. Negative indices are counted from the back.

    Not providing any axis results in reduction of all elements.

  • axis_names (layout str, optional) –

    Name(s) of the axis or axes along which the reduction is performed.

    The input layout is used to translate the axis names to axis indices, for example axis_names="HW" with input layout “FHWC” is equivalent to specifying axes=[1,2]. This argument cannot be used together with axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • keep_dims (bool, optional, default = False) – If True, maintains original input dimensions.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
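
As a minimal sketch, the reduction can be instantiated once with its static arguments and then applied in the graph definition; images is assumed to be an HWC-layout batch produced earlier in the pipeline:

# Reduce over the height and width axes, keeping the reduced dimensions
max_op = dali.ops.reductions.Max(axis_names="HW", keep_dims=True)

# Inside define_graph:
per_channel_max = max_op(images)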

class nvidia.dali.ops.reductions.Mean(*, device='cpu', **kwargs)#

Gets mean of elements along provided axes.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int, optional) –

    Axis or axes along which reduction is performed.

    Accepted range is [-ndim, ndim-1]. Negative indices are counted from the back.

    Not providing any axis results in reduction of all elements.

  • axis_names (layout str, optional) –

    Name(s) of the axis or axes along which the reduction is performed.

    The input layout is used to translate the axis names to axis indices, for example axis_names="HW" with input layout “FHWC” is equivalent to specifying axes=[1,2]. This argument cannot be used together with axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) – Output data type. This type is used to accumulate the result.

  • keep_dims (bool, optional, default = False) – If True, maintains original input dimensions.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.reductions.MeanSquare(*, device='cpu', **kwargs)#

Gets mean square of elements along provided axes.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int, optional) –

    Axis or axes along which reduction is performed.

    Accepted range is [-ndim, ndim-1]. Negative indices are counted from the back.

    Not providing any axis results in reduction of all elements.

  • axis_names (layout str, optional) –

    Name(s) of the axis or axes along which the reduction is performed.

    The input layout is used to translate the axis names to axis indices, for example axis_names="HW" with input layout “FHWC” is equivalent to specifying axes=[1,2]. This argument cannot be used together with axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) – Output data type. This type is used to accumulate the result.

  • keep_dims (bool, optional, default = False) – If True, maintains original input dimensions.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.reductions.Min(*, device='cpu', **kwargs)#

Gets minimal input element along provided axes.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int, optional) –

    Axis or axes along which reduction is performed.

    Accepted range is [-ndim, ndim-1]. Negative indices are counted from the back.

    Not providing any axis results in reduction of all elements.

  • axis_names (layout str, optional) –

    Name(s) of the axis or axes along which the reduction is performed.

    The input layout is used to translate the axis names to axis indices, for example axis_names="HW" with input layout “FHWC” is equivalent to specifying axes=[1,2]. This argument cannot be used together with axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • keep_dims (bool, optional, default = False) – If True, maintains original input dimensions.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.reductions.RMS(*, device='cpu', **kwargs)#

Gets root mean square of elements along provided axes.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int, optional) –

    Axis or axes along which reduction is performed.

    Accepted range is [-ndim, ndim-1]. Negative indices are counted from the back.

    Not providing any axis results in reduction of all elements.

  • axis_names (layout str, optional) –

    Name(s) of the axis or axes along which the reduction is performed.

    The input layout is used to translate the axis names to axis indices, for example axis_names="HW" with input layout “FHWC” is equivalent to specifying axes=[1,2]. This argument cannot be used together with axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) – Output data type. This type is used to accumulate the result.

  • keep_dims (bool, optional, default = False) – If True, maintains original input dimensions.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.reductions.StdDev(*, device='cpu', **kwargs)#

Gets standard deviation of elements along provided axes.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int, optional) –

    Axis or axes along which reduction is performed.

    Accepted range is [-ndim, ndim-1]. Negative indices are counted from the back.

    Not providing any axis results in reduction of all elements.

  • axis_names (layout str, optional) –

    Name(s) of the axis or axes along which the reduction is performed.

    The input layout is used to translate the axis names to axis indices, for example axis_names="HW" with input layout “FHWC” is equivalent to specifying axes=[1,2]. This argument cannot be used together with axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • ddof (int, optional, default = 0) – Delta Degrees of Freedom. Adjusts the divisor used in calculations, which is N - ddof.

  • keep_dims (bool, optional, default = False) – If True, maintains original input dimensions.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(data, mean, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __data (TensorList) – Input to the operator.

  • __mean (float or TensorList of float) – Mean value to use in the calculations.
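
Because StdDev takes the mean as its second input, it is typically paired with reductions.Mean. A minimal sketch, assuming images is an HWC-layout batch defined elsewhere in the pipeline:

mean_op = dali.ops.reductions.Mean(axes=[0, 1], keep_dims=True)
stddev_op = dali.ops.reductions.StdDev(axes=[0, 1])

# Inside define_graph: the mean reduced over the same axes is fed to StdDev
mean = mean_op(images)
std = stddev_op(images, mean)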

class nvidia.dali.ops.reductions.Sum(*, device='cpu', **kwargs)#

Gets sum of elements along provided axes.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int, optional) –

    Axis or axes along which reduction is performed.

    Accepted range is [-ndim, ndim-1]. Negative indices are counted from the back.

    Not providing any axis results in reduction of all elements.

  • axis_names (layout str, optional) –

    Name(s) of the axis or axes along which the reduction is performed.

    The input layout is used to translate the axis names to axis indices, for example axis_names="HW" with input layout “FHWC” is equivalent to specifying axes=[1,2]. This argument cannot be used together with axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • dtype (nvidia.dali.types.DALIDataType, optional) – Output data type. This type is used to accumulate the result.

  • keep_dims (bool, optional, default = False) – If True, maintains original input dimensions.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.reductions.Variance(*, device='cpu', **kwargs)#

Gets variance of elements along provided axes.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • axes (int or list of int, optional) –

    Axis or axes along which reduction is performed.

    Accepted range is [-ndim, ndim-1]. Negative indices are counted from the back.

    Not providing any axis results in reduction of all elements.

  • axis_names (layout str, optional) –

    Name(s) of the axis or axes along which the reduction is performed.

    The input layout is used to translate the axis names to axis indices, for example axis_names="HW" with input layout “FHWC” is equivalent to specifying axes=[1,2]. This argument cannot be used together with axes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • ddof (int, optional, default = 0) – Delta Degrees of Freedom. Adjusts the divisor used in calculations, which is N - ddof.

  • keep_dims (bool, optional, default = False) – If True, maintains original input dimensions.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(data, mean, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __data (TensorList) – Input to the operator.

  • __mean (float or TensorList of float) – Mean value to use in the calculations.

nvidia.dali.ops.segmentation#

class nvidia.dali.ops.segmentation.RandomMaskPixel(*, device='cpu', **kwargs)#

Selects random pixel coordinates in a mask, sampled from a uniform distribution.

Based on run-time argument foreground, it returns either only foreground pixels or any pixels.

Pixels are classified as foreground either when their value exceeds a given threshold or when it’s equal to a specific value.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • foreground (int or TensorList of int, optional, default = 0) –

    If different than 0, the pixel position is sampled uniformly from all foreground pixels.

    If 0, the pixel position is sampled uniformly from all available pixels.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • threshold (float or TensorList of float, optional, default = 0.0) –

    All pixels with a value above this threshold are interpreted as foreground.

    This argument is mutually exclusive with value argument.

  • value (int or TensorList of int, optional) –

    All pixels equal to this value are interpreted as foreground.

    This argument is mutually exclusive with threshold argument and is meant to be used only with integer inputs.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
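
A minimal sketch of using this operator with the legacy object API, for example to pick a point that later serves as a crop center; mask is an assumed segmentation input produced elsewhere in the pipeline:

# Pixels with values above 0.5 are treated as foreground
pick_fg = dali.ops.segmentation.RandomMaskPixel(foreground=1, threshold=0.5)

# Inside define_graph:
center = pick_fg(mask)  # coordinates of a randomly selected foreground pixel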

class nvidia.dali.ops.segmentation.RandomObjectBBox(*, device='cpu', **kwargs)#

Randomly selects an object from a mask and returns its bounding box.

This operator takes a labeled segmentation map as its input. With probability foreground_prob it randomly selects a label (uniformly or according to the distribution given as class_weights), extracts connected blobs of pixels with the selected label and randomly selects one of the blobs. The blobs may be further filtered according to k_largest and threshold. The output is a bounding box of the selected blob in one of the formats described in format.

With probability 1-foreground_prob, the entire area of the input is returned.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • background (int or TensorList of int, optional, default = 0) –

    Background label.

    If left unspecified, it’s either 0 or any value not in classes.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • cache_objects (bool, optional, default = False) –

    Cache object bounding boxes to avoid the computational cost of finding object blobs in previously seen inputs.

    Searching for blobs of connected pixels and finding boxes can take a long time. When the dataset has few items, but the item size is big, you can use caching to save the boxes and reuse them when the same input is seen again. The inputs are compared based on a 256-bit hash, which is much faster to compute than to recalculate the object boxes.

  • class_weights (float or list of float or TensorList of float, optional) –

    Relative probabilities of foreground classes.

    Each value corresponds to a class label in classes. If classes are not specified, consecutive 1-based labels are assigned.

    The sum of the weights doesn’t have to be equal to 1 - if it isn’t, the weights will be normalized.

  • classes (int or list of int or TensorList of int, optional) –

    List of labels considered as foreground.

    If left unspecified, all labels not equal to background are considered foreground.

  • foreground_prob (float or TensorList of float, optional, default = 1.0) – Probability of selecting a foreground bounding box.

  • format (str, optional, default = ‘anchor_shape’) –

    Format in which the data is returned.

    Possible choices are:
    • “anchor_shape” (the default) - there are two outputs: anchor and shape

    • “start_end” - there are two outputs: bounding box start and one-past-end coordinates

    • “box” - there is one output that contains concatenated start and end coordinates

  • ignore_class (bool, optional, default = False) –

    If True, all objects are picked with equal probability, regardless of the class they belong to. Otherwise, a class is picked first and then an object is randomly selected from this class.

    This argument is incompatible with classes, class_weights or output_class.

    Note

    This flag only affects the probability with which blobs are selected. It does not cause blobs of different classes to be merged.

  • k_largest (int, optional) –

    If specified, the boxes are sorted by decreasing volume and only k_largest are considered.

    If ignore_class is True, k_largest refers to all boxes; otherwise it refers to the selected class.

  • output_class (bool, optional, default = False) –

    If True, an additional output is produced which contains the label of the class to which the selected box belongs, or background label if the selected box is not an object bounding box.

    The output may not be an object bounding box when any of the following conditions occur:
    • the sample was randomly (according to foreground_prob) chosen not to be a foreground one

    • the sample contained no foreground objects

    • no bounding box met the required size threshold.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • threshold (int or list of int or TensorList of int, optional) –

    Per-axis minimum size of the bounding boxes to return.

    If the selected class doesn’t contain any bounding box that meets this condition, it is rejected and another class is picked. If no class contains a satisfactory box, the entire input area is returned.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
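
A minimal sketch of the legacy object API usage; labeled_mask is an assumed labeled segmentation map, and with format="start_end" the operator produces two outputs:

object_bbox = dali.ops.segmentation.RandomObjectBBox(
    format="start_end",   # return start and one-past-end coordinates
    foreground_prob=0.8,  # with 20% probability, return the whole input area
    k_largest=3)          # consider only the 3 largest blobs

# Inside define_graph:
box_start, box_end = object_bbox(labeled_mask)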

class nvidia.dali.ops.segmentation.SelectMasks(*, device='cpu', **kwargs)#

Selects a subset of polygons by their mask ids.

The operator expects three inputs describing multiple segmentation mask polygons belonging to different mask ids and a list of selected mask ids.

Each sample can contain several polygons belonging to different masks, and each polygon can be composed of an arbitrary number of vertices (at least 3). The mask polygons are described by the inputs polygons and vertices, and the operator produces output polygons and vertices where only the polygons associated with the selected masks are present.

Note

The format of polygons and vertices is the same as produced by COCOReader.

Examples:

Let us assume the following input mask, where symbolic coordinates are used for a clearer example:

polygons = [[0, 0, 3], [1, 3, 7], [2, 7, 10]]
vertices = [[x0, y0], [x1, y1], [x2, y2], [x3, y3], [x4, y4], [x5, y5],
            [x6, y6], [x7, y7], [x8, y8], [x9, y9]]

Example 1: Selecting a single mask with id 1, maintaining the original id:

mask_ids = [1], reindex_masks = False
out_polygons = [[1, 0, 4]]
out_vertices = [[x3, y3], [x4, y4], [x5, y5], [x6, y6]]

Example 2: Selecting two out of the three masks, replacing the mask ids with the indices at which they appeared in mask_ids input:

mask_ids = [2, 0]
reindex_masks = True
out_polygons = [[0, 3, 6], [1, 0, 3]]
out_vertices = [[x0, y0], [x1, y1], [x2, y2], [x7, y7], [x8, y8], [x9, y9]]

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reindex_masks (bool, optional, default = False) – If set to True, the output mask ids are replaced with the indices at which they appeared in mask_ids input.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(mask_ids, polygons, vertices, **kwargs)#

Operator call to be used in graph definition.

Parameters:
  • __mask_ids (1D TensorList of int) – List of identifiers of the masks to be selected. The list should not contain duplicates.

  • __polygons (2D TensorList of int) –

    Polygons, described by 3 columns:

    [[mask_id0, start_vertex_idx0, end_vertex_idx0],
     [mask_id1, start_vertex_idx1, end_vertex_idx1],
     ...,
     [mask_idn, start_vertex_idxn, end_vertex_idxn],]
    

    with mask_id being the identifier of the mask this polygon belongs to, and [start_vertex_idx, end_vertex_idx) describing the range of indices from vertices that belong to this polygon.

  • __vertices (2D TensorList) –

    Vertex data stored in interleaved format:

    [[x0, y0, ...],
     [x1, y1, ...],
     ... ,
     [xn, yn, ...]]
    

    The operator accepts vertices with arbitrary number of coordinates.

nvidia.dali.ops.transforms#

All operators in this module support only the CPU device, as they are meant to be provided as inputs to named keyword operator arguments. For more details, check the relevant pipeline documentation section.

class nvidia.dali.ops.transforms.Combine(*, device='cpu', **kwargs)#

Combines two or more affine transforms.

By default, the transforms are combined such that applying the resulting transform to a point is equivalent to applying the input transforms in the order as listed.

Example: combining [T1, T2, T3] is equivalent to T3(T2(T1(…))) for the default order and to T1(T2(T3(…))) for the reversed order.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reverse_order (bool, optional, default = False) –

    Determines the order when combining affine transforms.

    If set to False (default), the operator’s affine transform will be applied to the input transform. If set to True, the input transform will be applied to the operator’s transform.

    If there’s no input, this argument is ignored.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(*inputs, **kwargs)#

See nvidia.dali.ops.transforms.Combine() class for complete information.
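
A minimal sketch of combining two transforms with the legacy object API. It assumes that, like their functional counterparts, the transform operators can be called without an input to produce a transform matrix from scratch; the result could then be passed, for example, to a CoordTransform or WarpAffine operator:

rotate = dali.ops.transforms.Rotation(angle=30.0)
shift = dali.ops.transforms.Translation(offset=(10.0, 0.0))
combine = dali.ops.transforms.Combine()

# Inside define_graph: the rotation is applied first, then the translation
mt = combine(rotate(), shift())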

class nvidia.dali.ops.transforms.Crop(*, device='cpu', **kwargs)#

Produces an affine transform matrix that maps a reference coordinate space to another one.

This transform can be used to adjust coordinates after a crop operation so that a from_start point will be mapped to to_start and from_end will be mapped to to_end.

If another transform matrix is passed as an input, the operator applies the transformation to the matrix provided.

Note

The output of this operator can be fed directly to CoordTransform and WarpAffine operators.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • absolute (bool, optional, default = False) – If set to True, start and end coordinates will be swapped if start > end.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • from_end (float or list of float or TensorList of float, optional, default = [1.0]) –

    The upper bound of the original coordinate space.

    Note

    If left empty, a vector of ones will be assumed. If a single value is provided, it will be repeated to match the number of dimensions

    Supports per-frame inputs.

  • from_start (float or list of float or TensorList of float, optional, default = [0.0]) –

    The lower bound of the original coordinate space.

    Note

    If left empty, a vector of zeros will be assumed. If a single value is provided, it will be repeated to match the number of dimensions

    Supports per-frame inputs.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reverse_order (bool, optional, default = False) –

    Determines the order when combining affine transforms.

    If set to False (default), the operator’s affine transform will be applied to the input transform. If set to True, the input transform will be applied to the operator’s transform.

    If there’s no input, this argument is ignored.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • to_end (float or list of float or TensorList of float, optional, default = [1.0]) –

    The upper bound of the destination coordinate space.

    Note

    If left empty, a vector of ones will be assumed. If a single value is provided, it will be repeated to match the number of dimensions

    Supports per-frame inputs.

  • to_start (float or list of float or TensorList of float, optional, default = [0.0]) –

    The lower bound of the destination coordinate space.

    Note

    If left empty, a vector of zeros will be assumed. If a single value is provided, it will be repeated to match the number of dimensions

    Supports per-frame inputs.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.
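
For example, a transform that maps pixel coordinates of a 300x200 image to the [0, 1] range in both dimensions could be constructed as follows (a sketch under the same assumptions as the Combine example above):

to_unit_range = dali.ops.transforms.Crop(
    from_start=(0.0, 0.0), from_end=(300.0, 200.0),  # source coordinate space
    to_start=(0.0, 0.0), to_end=(1.0, 1.0))          # destination coordinate space

# Inside define_graph:
mt = to_unit_range()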

class nvidia.dali.ops.transforms.Rotation(*, device='cpu', **kwargs)#

Produces a rotation affine transform matrix.

If another transform matrix is passed as an input, the operator applies rotation to the matrix provided.

The number of dimensions is assumed to be 3 if a rotation axis is provided or 2 otherwise.

Note

The output of this operator can be fed directly to CoordTransform and WarpAffine operators.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • angle (float or TensorList of float) –

    Angle, in degrees.

    Supports per-frame inputs.

  • axis (float or list of float or TensorList of float, optional) –

    Axis of rotation (applies only to 3D transforms).

    The vector does not need to be normalized, but it must have a non-zero length.

    Reversing the vector is equivalent to changing the sign of angle.

    Supports per-frame inputs.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • center (float or list of float or TensorList of float, optional) –

    The center of the rotation.

    If provided, the number of elements should match the dimensionality of the transform.

    Supports per-frame inputs.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reverse_order (bool, optional, default = False) –

    Determines the order when combining affine transforms.

    If set to False (default), the operator’s affine transform will be applied to the input transform. If set to True, the input transform will be applied to the operator’s transform.

    If there’s no input, this argument is ignored.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.transforms.Scale(*, device='cpu', **kwargs)#

Produces a scale affine transform matrix.

If another transform matrix is passed as an input, the operator applies scaling to the matrix provided.

Note

The output of this operator can be fed directly to CoordTransform and WarpAffine operators.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • scale (float or list of float or TensorList of float) –

    The scale factor, per dimension.

    The number of dimensions of the transform is inferred from this argument.

    Supports per-frame inputs.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • center (float or list of float or TensorList of float, optional) –

    The center of the scale operation.

    If provided, the number of elements should match that of the scale argument.

    Supports per-frame inputs.

  • ndim (int, optional) –

    Number of dimensions.

    It should be provided when the number of dimensions can’t be inferred. For example, when scale is a scalar value and there’s no input transform.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reverse_order (bool, optional, default = False) –

    Determines the order when combining affine transforms.

    If set to False (default), the operator’s affine transform will be applied to the input transform. If set to True, the input transform will be applied to the operator’s transform.

    If there’s no input, this argument is ignored.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.transforms.Shear(*, device='cpu', **kwargs)#

Produces a shear affine transform matrix.

If another transform matrix is passed as an input, the operator applies the shear mapping to the matrix provided.

Note

The output of this operator can be fed directly to CoordTransform and WarpAffine operators.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • angles (float or list of float or TensorList of float, optional) –

    The shear angles, in degrees.

    This argument is mutually exclusive with shear.

    For 2D, angles contains two elements: angle_x, angle_y.

    For 3D, angles contains six elements: angle_xy, angle_xz, angle_yx, angle_yz, angle_zx, angle_zy.

    A shear angle is translated to a shear factor as follows:

    shear_factor = tan(deg2rad(shear_angle))
    

    Note

    The valid range of values is between -90 and 90 degrees. This argument is mutually exclusive with shear. If provided, the number of dimensions of the transform is inferred from this argument.

    Supports per-frame inputs.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • center (float or list of float or TensorList of float, optional) –

    The center of the shear operation.

    If provided, the number of elements should match the dimensionality of the transform.

    Supports per-frame inputs.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reverse_order (bool, optional, default = False) –

    Determines the order when combining affine transforms.

    If set to False (default), the operator’s affine transform will be applied to the input transform. If set to True, the input transform will be applied to the operator’s transform.

    If there’s no input, this argument is ignored.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shear (float or list of float or TensorList of float, optional) –

    The shear factors.

    For 2D, shear contains two elements: shear_x, shear_y.

    For 3D, shear contains six elements: shear_xy, shear_xz, shear_yx, shear_yz, shear_zx, shear_zy.

    A shear factor value can be interpreted as the offset to be applied in the first axis when moving in the direction of the second axis.

    Note

    This argument is mutually exclusive with angles. If provided, the number of dimensions of the transform is inferred from this argument.

    Supports per-frame inputs.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

class nvidia.dali.ops.transforms.Translation(*, device='cpu', **kwargs)#

Produces a translation affine transform matrix.

If another transform matrix is passed as an input, the operator applies translation to the matrix provided.

Note

The output of this operator can be fed directly to CoordTransform and WarpAffine operators.

This operator allows sequence inputs.

Supported backends
  • ‘cpu’

Keyword Arguments:
  • offset (float or list of float or TensorList of float) –

    The translation vector.

    The number of dimensions of the transform is inferred from this argument.

    Supports per-frame inputs.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • reverse_order (bool, optional, default = False) –

    Determines the order when combining affine transforms.

    If set to False (default), the operator’s affine transform will be applied to the input transform. If set to True, the input transform will be applied to the operator’s transform.

    If there’s no input, this argument is ignored.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

__call__(__input, **kwargs)#

Operator call to be used in graph definition.

Parameters:

__input (TensorList) – Input to the operator.

nvidia.dali.plugin.numba.experimental#

class nvidia.dali.plugin.numba.experimental.NumbaFunction(run_fn, out_types, in_types, outs_ndim, ins_ndim, setup_fn=None, device='cpu', batch_processing=False, blocks=None, threads_per_block=None, **kwargs)#

Invokes an njit-compiled Numba function.

The run function should be a Python function that can be compiled in Numba nopython mode. A function taking a single input and producing a single output should have the following definition:

def run_fn(out0, in0)

where out0 and in0 are numpy array views of the input and output tensors. If the operator is configured to run in batch mode, then the first dimension of the arrays is the sample index.

Note that the function can take at most 6 inputs and 6 outputs.

Additionally, an optional setup function can be provided to calculate the shape of the output, so that DALI can allocate memory for it. It should have the following definition:

def setup_fn(outs, ins)

The setup function is invoked once for the whole batch. The first dimension of outs, ins is the number of outputs/inputs, respectively. The second dimension is the sample index. For example, the first sample on the second output can be accessed by outs[1][0].

If no setup function is provided, the output shape and data type will be the same as those of the input.

Note

This operator is experimental and its API might change without notice.

Warning

When the pipeline has conditional execution enabled, additional steps must be taken to prevent the run_fn and setup_fn functions from being rewritten by AutoGraph. There are two ways to achieve this:

  1. Define the functions at global scope (i.e. outside of pipeline_def scope).

  2. If functions are a result of another “factory” function, then the factory function must be defined outside pipeline definition function and decorated with @do_not_convert.

More details can be found in @do_not_convert documentation.

Example 1:

The following example shows a simple setup function which permutes the order of dimensions in the shape.

def setup_change_out_shape(outs, ins):
    out0 = outs[0]
    in0 = ins[0]
    perm = [1, 0, 2]
    for sample_idx in range(len(out0)):
        for d in range(len(perm)):
            out0[sample_idx][d] = in0[sample_idx][perm[d]]

Since the setup function is running for the whole batch, we need to iterate and permute each sample’s shape individually. For shapes = [(10, 20, 30), (20, 10, 30)] it will produce output with shapes = [(20, 10, 30), (10, 20, 30)].

Let us also provide a run function:

def run_fn(out0, in0):
    for i in range(in0.shape[0]):
        for j in range(in0.shape[1]):
            out0[j, i] = in0[i, j]

The run function can work per-sample or per-batch, depending on the batch_processing argument.

A run function working per-batch may look like this:

def run_fn(out0_samples, in0_samples):
    for out0, in0 in zip(out0_samples, in0_samples):
        for i in range(in0.shape[0]):
            for j in range(in0.shape[1]):
                out0[j, i] = in0[i, j]

A run function working per-sample may look like this:

def run_fn(out0, in0):
    for i in range(in0.shape[0]):
        for j in range(in0.shape[1]):
            out0[j, i] = in0[i, j]

This operator allows sequence inputs and supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • in_types (DALIDataType or list of DALIDataType) – Types of inputs.

  • ins_ndim (int or list of int) – Number of dimensions which inputs shapes should have.

  • out_types (DALIDataType or list of DALIDataType) – Types of outputs.

  • outs_ndim (int or list of int) – Number of dimensions which outputs shapes should have.

  • run_fn (object) – Function to be invoked. This function must work in Numba nopython mode.

  • batch_processing (bool, optional, default = False) –

    Determines whether the function is invoked once per batch or separately for each sample in the batch.

    When batch_processing is set to True, the function processes the whole batch. It is necessary if the function has to perform cross-sample operations and may be beneficial if significant part of the work can be reused. For other use cases, specifying False and using per-sample processing function allows the operator to process samples in parallel.

  • blocks (int or list of int, optional) –

    3-item list specifying the number of blocks per grid used to execute a CUDA kernel.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • setup_fn (object, optional) – Setup function setting shapes for outputs. This function is invoked once per batch and must also work in Numba nopython mode.

  • threads_per_block (int or list of int, optional) –

    3-item list specifying the number of threads per block used to execute a CUDA kernel.

__call__(*inputs, **kwargs)#

See nvidia.dali.plugin.numba.experimental.NumbaFunction() class for complete information.
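
A minimal sketch of instantiating the operator; my_run_fn and my_setup_fn are hypothetical nopython-compilable functions, defined at global scope, that follow the signatures described above, and the types and dimensionalities are illustrative:

import nvidia.dali.types as dali_types
from nvidia.dali.plugin.numba.experimental import NumbaFunction

numba_op = NumbaFunction(
    run_fn=my_run_fn,      # hypothetical per-sample run function
    setup_fn=my_setup_fn,  # hypothetical setup function (optional)
    in_types=[dali_types.UINT8],
    out_types=[dali_types.UINT8],
    ins_ndim=[3],
    outs_ndim=[3],
    batch_processing=False)

# Inside define_graph:
processed = numba_op(images)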

nvidia.dali.plugin.pytorch#

class nvidia.dali.plugin.pytorch.TorchPythonFunction(function, num_outputs=1, device='cpu', batch_processing=False, **kwargs)#

Executes a function that operates on Torch tensors.

This class is analogous to nvidia.dali.fn.python_function() but the tensor data is handled as PyTorch tensors.

This operator allows sequence inputs and supports volumetric data.

This operator will not be optimized out of the graph.

Supported backends
  • ‘cpu’

  • ‘gpu’

Keyword Arguments:
  • function (object) –

    A callable object that defines the function of the operator.

    Warning

    The function must not hold a reference to the pipeline in which it is used. If it does, a circular reference to the pipeline will form and the pipeline will never be freed.

  • batch_processing (bool, optional, default = True) – Determines whether the function gets an entire batch as an input.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • num_outputs (int, optional, default = 1) – Number of outputs.

  • output_layouts (layout str or list of layout str, optional) –

    Tensor data layouts for the outputs.

    This argument can be a list that contains a distinct layout for each output. If the list has fewer than num_outputs elements, only the first outputs have the layout set and the rest of the outputs have no layout assigned.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.
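
A minimal sketch of using this operator with the legacy object API; flip_channels is a hypothetical per-sample function operating on a single torch tensor (hence batch_processing=False):

from nvidia.dali.plugin.pytorch import TorchPythonFunction

def flip_channels(img):
    # img is a single torch tensor when batch_processing=False;
    # reverse the last axis (e.g. RGB -> BGR for HWC data)
    return img.flip((-1,))

torch_fn = TorchPythonFunction(function=flip_channels,
                               num_outputs=1,
                               device="gpu",
                               batch_processing=False)

# Inside define_graph:
images = torch_fn(images)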

Compose#

class nvidia.dali.ops.Compose(op_list)#

Returns a meta-operator that chains the operations in op_list.

The return value is a callable object which, when called, performs:

op_list[n-1](op_list[n-2](... op_list[0](args)))

Operators can be composed only when all outputs of the previous operator can be processed directly by the next operator in the list.

The example below chains an image decoder and a Resize operation with a random square size. The decode_and_resize object can be called as if it were an operator:

decode_and_resize = ops.Compose([
    ops.decoders.Image(device="cpu"),
    ops.Resize(size=fn.random.uniform(range=(400, 500)), device="gpu")
])

files, labels = fn.readers.caffe(path=caffe_db_folder, seed=1)
pipe.set_outputs(decode_and_resize(files), labels)

If there’s a transition from CPU to GPU in the middle of the op_list, as is the case in this example, Compose automatically arranges copying the data to GPU memory.

Note

This is an experimental feature, subject to change without notice.