Supported operations
Here is a list of the supported operations:
- CPU operator means that the operator can be scheduled on the CPU. The outputs of CPU operators may be used as regular inputs and to provide per-sample parameters for other operators through tensor arguments.
- GPU operator means that the operator can be scheduled on the GPU. The outputs of GPU operators can only be used as regular inputs for other GPU operators and as pipeline outputs.
- Mixed operator means that the operator accepts CPU inputs and produces GPU outputs.
- Sequences means that the operator can produce or accept as an input a sequence (for example, a video).
- Volumetric means that the operator supports 3D data processing.
Reading this guide
DALI operators are used in two steps:
- Parameterizing the operator with __init__.
- Invoking the parameterized operator like a function (effectively invoking its __call__ method) in the pipeline’s define_graph() method.
In the documentation for every DALI operator, see the list of keyword arguments that are supported by the class constructor.
The documentation for the __call__ operator lists the positional arguments (or parameters) and additional keyword arguments. __call__ should only be used in define_graph(). The inputs to the __call__ method are nvidia.dali.pipeline.DataNode objects, which are symbolic representations of batches of tensors.
The keyword arguments in the __call__ operator accept TensorList argument inputs and should be produced by other CPU operators.
Note
The names of the positional arguments for the __call__ operator (parameters) are provided only for documentation purposes and cannot be used as keyword arguments.
Note
Some keyword arguments can be listed twice: once for the class constructor and once for the __call__ operator. This listing means the arguments can be parametrized during operator construction with some Python values or driven by the output from another operator when running the pipeline.
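The two-step pattern, and the way one argument can appear both in the constructor and in __call__, can be pictured with a toy stand-in written in plain Python. This is only a sketch: ScaleOp and its factor argument are hypothetical names, the class is not a DALI operator, and it works on plain lists instead of DataNode objects.

```python
class ScaleOp:
    """Toy stand-in for an operator; hypothetical, not part of DALI."""

    def __init__(self, **kwargs):
        # Step 1: parameterize the operator with plain Python values.
        self.factor = kwargs.get("factor", 1.0)

    def __call__(self, data, **kwargs):
        # Step 2: invoke the parameterized operator like a function.
        # In this toy, a call-time `factor` supplies one value per sample;
        # otherwise the constructor value is used for every sample.
        factors = kwargs.get("factor", [self.factor] * len(data))
        return [x * f for x, f in zip(data, factors)]


batch = [1.0, 2.0, 3.0]

# One factor for the whole batch, fixed at construction time.
fixed = ScaleOp(factor=2.0)(batch)

# Per-sample factors, as if they were produced by another operator.
per_sample = ScaleOp()(batch, factor=[1.0, 10.0, 100.0])
```

In a real pipeline, the second step would happen inside define_graph() and the per-sample values would come from another CPU operator.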
Support Table
The following table lists all available operators and devices on which they can be executed:
Operator name | CPU | GPU | Mixed | Sequences | Volumetric |
---|---|---|---|---|---|
Operators Documentation
- class nvidia.dali.ops.AudioDecoder(**kwargs)
Decodes waveforms from encoded audio data.
It supports the following audio formats: wav, flac, and ogg. This operator produces the following outputs:
- output[0]: A batch of decoded data.
- output[1]: A batch of sampling rates [Hz].
- Supported backends
‘cpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
downmix (bool, optional, default = False) –
If set to True, downmix all input channels to mono.
If downmixing is turned on, the decoder output is 1D. If downmixing is turned off, it produces 2D output with interleaved channels.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –
Output data type.
Supported types: INT16, INT32, FLOAT.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
quality (float, optional, default = 50.0) –
Resampling quality, where 0 is the lowest, and 100 is the highest.
0 gives 3 lobes of the sinc filter, 50 gives 16 lobes, and 100 gives 64 lobes.
sample_rate (float, optional, default = 0.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
- __call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
sample_rate (TensorList of float, optional, default = 0.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.
- class nvidia.dali.ops.BBoxPaste(**kwargs)
Transforms bounding boxes so that the boxes remain in the same place in the image after the image is pasted on a larger canvas.
Corner coordinates are transformed according to the following formula:
(x', y') = (x/ratio + paste_x', y/ratio + paste_y')
Box sizes (if xywh is used) are transformed according to the following formula:
(w', h') = (w/ratio, h/ratio)
Where:
paste_x' = paste_x * (ratio - 1)/ratio
paste_y' = paste_y * (ratio - 1)/ratio
The paste coordinates are normalized so that (0,0) aligns the image to the top-left of the canvas and (1,1) aligns it to the bottom-right.
- Supported backends
‘cpu’
- Keyword Arguments
ratio (float) – Ratio of the canvas size to the input size; the value must be at least 1.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
ltrb (bool, optional, default = False) – True for ltrb or False for xywh.
paste_x (float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0).
paste_y (float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0).
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
- __call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
ratio (TensorList of float) – Ratio of the canvas size to the input size; the value must be at least 1.
paste_x (TensorList of float, optional, default = 0.5) – Horizontal position of the paste in image coordinates (0.0 - 1.0).
paste_y (TensorList of float, optional, default = 0.5) – Vertical position of the paste in image coordinates (0.0 - 1.0).
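The BBoxPaste formulas can be checked with a few lines of plain Python. This is a minimal sketch of the arithmetic only, not the operator's implementation; the helper names paste_box_ltrb and paste_box_xywh are hypothetical.

```python
def _paste_offsets(ratio, paste_x, paste_y):
    """Return (paste_x', paste_y') as defined in the BBoxPaste formulas."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    scale = (ratio - 1) / ratio
    return paste_x * scale, paste_y * scale


def paste_box_ltrb(box, ratio, paste_x=0.5, paste_y=0.5):
    """Apply the corner formula to a [left, top, right, bottom] box."""
    px, py = _paste_offsets(ratio, paste_x, paste_y)
    left, top, right, bottom = box
    return [left / ratio + px, top / ratio + py,
            right / ratio + px, bottom / ratio + py]


def paste_box_xywh(box, ratio, paste_x=0.5, paste_y=0.5):
    """Apply the corner formula to (x, y) and the size formula to (w, h)."""
    px, py = _paste_offsets(ratio, paste_x, paste_y)
    x, y, w, h = box
    return [x / ratio + px, y / ratio + py, w / ratio, h / ratio]


# A box that covers the whole input image, pasted on a canvas twice as large.
full_image = [0.0, 0.0, 1.0, 1.0]
centered = paste_box_ltrb(full_image, ratio=2.0)
top_left = paste_box_ltrb(full_image, ratio=2.0, paste_x=0.0, paste_y=0.0)
bottom_right = paste_box_ltrb(full_image, ratio=2.0, paste_x=1.0, paste_y=1.0)
```

With ratio = 2, the full-image box shrinks to half of the canvas in each dimension, and the paste coordinates decide which half it occupies.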
- class nvidia.dali.ops.BbFlip(**kwargs)
Flips bounding boxes horizontally or vertically (mirror).
The bounding box coordinates for the input are in the [x, y, width, height] (xywh) or [left, top, right, bottom] (ltrb) format. All coordinates are in the image coordinate system, that is, in the 0.0-1.0 range.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
horizontal (int, optional, default = 1) – Flip horizontal dimension.
ltrb (bool, optional, default = False) – True for ltrb or False for xywh.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
vertical (int, optional, default = 0) – Flip vertical dimension.
- __call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
horizontal (TensorList of int, optional, default = 1) – Flip horizontal dimension.
vertical (TensorList of int, optional, default = 0) – Flip vertical dimension.
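As an illustration of what mirroring means for the two box formats, here is a small plain-Python sketch. The arithmetic is an assumption derived from the 0.0-1.0 coordinate convention described for BbFlip, not taken from the operator's implementation, and flip_box is a hypothetical helper name.

```python
def flip_box(box, horizontal=1, vertical=0, ltrb=False):
    """Mirror one box whose coordinates are normalized to the 0.0-1.0 range."""
    if ltrb:
        left, top, right, bottom = box
        if horizontal:
            # The old right edge becomes the new left edge, and vice versa.
            left, right = 1.0 - right, 1.0 - left
        if vertical:
            top, bottom = 1.0 - bottom, 1.0 - top
        return [left, top, right, bottom]

    x, y, w, h = box
    if horizontal:
        # The new anchor is the mirrored position of the old right edge.
        x = 1.0 - (x + w)
    if vertical:
        y = 1.0 - (y + h)
    return [x, y, w, h]


flipped_ltrb = flip_box([0.25, 0.125, 0.5, 0.75], ltrb=True)
flipped_xywh = flip_box([0.25, 0.125, 0.25, 0.5], horizontal=0, vertical=1)
```

Width and height are unchanged by a flip, and flipping twice along the same axis returns the original box.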
- class nvidia.dali.ops.BoxEncoder(**kwargs)
Encodes the input bounding boxes and labels using a set of default boxes (anchors) passed as an argument.
This operator follows the algorithm described in “SSD: Single Shot MultiBox Detector” and implemented in https://github.com/mlperf/training/tree/master/single_stage_detector/ssd. Inputs must be supplied as the following Tensors:
- BBoxes that contain bounding boxes that are represented as [l,t,r,b].
- Labels that contain the corresponding label for each bounding box.
The results are two tensors:
- EncodedBBoxes that contain M-encoded bounding boxes as [l,t,r,b], where M is the number of anchors.
- EncodedLabels that contain the corresponding label for each encoded box.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
anchors (float or list of float) – Anchors to be used for encoding, as a list of floats in the ltrb format.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
criteria (float, optional, default = 0.5) –
Threshold IoU for matching bounding boxes with anchors.
The value needs to be between 0 and 1.
means (float or list of float, optional, default = [0.0, 0.0, 0.0, 0.0]) – [x y w h] mean values for normalization.
offset (bool, optional, default = False) – Returns normalized offsets ((encoded_bboxes*scale - anchors*scale) - means) / stds in EncodedBBoxes that use the stds, means, and scale arguments.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
scale (float, optional, default = 1.0) – Rescales the box and anchor values before the offset is calculated (for example, to return to the absolute values).
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
stds (float or list of float, optional, default = [1.0, 1.0, 1.0, 1.0]) – [x y w h] standard deviations for offset normalization.
- __call__(*inputs, **kwargs)
See the nvidia.dali.ops.BoxEncoder() class for complete information.
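The role of the criteria threshold can be illustrated with a simplified matching step in plain Python. This is only a sketch under stated assumptions: the strict "greater than" comparison, the use of label 0 for unmatched anchors, and the helper names iou and match_anchors are not taken from the operator, and the full SSD procedure does more than this (for example, it also produces the encoded boxes).

```python
def iou(a, b):
    """Intersection over union of two [l, t, r, b] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_anchors(bboxes, labels, anchors, criteria=0.5):
    """Give each anchor the label of its best-overlapping box.

    An anchor is matched only if the best IoU exceeds `criteria`;
    unmatched anchors get label 0 (an assumption of this sketch).
    """
    matched = []
    for anchor in anchors:
        best_label, best_iou = 0, criteria
        for box, label in zip(bboxes, labels):
            overlap = iou(anchor, box)
            if overlap > best_iou:
                best_label, best_iou = label, overlap
        matched.append(best_label)
    return matched


anchors = [[0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 1.0, 1.0]]
encoded_labels = match_anchors([[0.0, 0.0, 0.5, 0.75]], [7], anchors)
```

Here the single input box overlaps the first anchor with an IoU of about 0.67, so only that anchor receives the label.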
- class nvidia.dali.ops.Brightness(**kwargs)
Changes the brightness of the image.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
brightness (float, optional, default = 1.0) –
Brightness change factor.
Values must be non-negative.
Example values:
0 - Black image.
1 - No change.
2 - Increase brightness twice.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
Output data type.
If not set, the input type is used.
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the input and the output image.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
- __call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
brightness (TensorList of float, optional, default = 1.0) –
Brightness change factor.
Values must be non-negative.
Example values:
0 - Black image.
1 - No change.
2 - Increase brightness twice.
- class nvidia.dali.ops.BrightnessContrast(**kwargs)
Adjusts the brightness and contrast of the images based on the following formula:
out = brightness_shift * output_range + brightness * (grey + contrast * (in - grey))
Where:
- The output_range is 1 for float outputs or the maximum positive value for integral types.
- Grey denotes the value of 0.5 for float, 128 for uint8, 16384 for int16, and so on.
This operator can also change the type of data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
brightness (float, optional, default = 1.0) – Brightness multiplier.
brightness_shift (float, optional, default = 0.0) –
The brightness shift.
For signed types, 1.0 represents the maximum positive value that can be represented by the type.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
contrast (float, optional, default = 1.0) – The contrast multiplier, where 0.0 produces the uniform grey.
contrast_center (float, optional, default = 0.5) –
The intensity level that is unaffected by contrast.
This is the value that all pixels assume when the contrast is zero. When not set, half of the input type’s positive range (or 0.5 for float) is used.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) –
Output data type.
If not set, the input type is used.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
- __call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
brightness (TensorList of float, optional, default = 1.0) – Brightness multiplier.
brightness_shift (TensorList of float, optional, default = 0.0) –
The brightness shift.
For signed types, 1.0 represents the maximum positive value that can be represented by the type.
contrast (TensorList of float, optional, default = 1.0) – The contrast multiplier, where 0.0 produces the uniform grey.
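The BrightnessContrast formula can be evaluated for a single pixel value in plain Python. This sketch assumes the input and output types are the same, and it does not round or clamp the result to the output type; the TYPE_INFO table and the brightness_contrast helper are hypothetical names.

```python
# (output_range, grey) per type, following the definitions given with the formula.
TYPE_INFO = {
    "float": (1.0, 0.5),
    "uint8": (255.0, 128.0),
    "int16": (32767.0, 16384.0),
}


def brightness_contrast(value, dtype="uint8", brightness=1.0,
                        brightness_shift=0.0, contrast=1.0):
    """Evaluate the formula for one value; no rounding or clamping is applied."""
    output_range, grey = TYPE_INFO[dtype]
    return (brightness_shift * output_range
            + brightness * (grey + contrast * (value - grey)))


unchanged = brightness_contrast(200)                   # default arguments
uniform_grey = brightness_contrast(200, contrast=0.0)  # contrast removed
brighter = brightness_contrast(0.25, "float", brightness=2.0)
```

With the default arguments the value passes through unchanged, and with contrast = 0 every input maps to grey.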
- class nvidia.dali.ops.COCOReader(**kwargs)
Reads data from a COCO dataset that is composed of a directory with images and annotation files. For each image with m bboxes, the bboxes are returned as an (m,4) Tensor (m * [x, y, w, h] or m * [left, top, right, bottom]) and labels as an (m,1) Tensor (m * category_id).
- Supported backends
‘cpu’
- Keyword Arguments
file_root (str) – Path to a directory that contains the data files.
annotations_file (str, optional, default = '') – List of paths to the JSON annotations files.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.
dump_meta_files (bool, optional, default = False) – If set to True, the operator dumps the meta files in the folder that is provided with dump_meta_files_path.
dump_meta_files_path (str, optional, default = '') – Path to the directory in which to save the meta files that contain the preprocessed COCO annotations.
file_list (str, optional, default = '') –
Path to the file that contains a list of whitespace separated file id pairs.
To traverse the file_root directory and obtain files and labels, leave this value empty.
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
ltrb (bool, optional, default = False) –
If set to True, bboxes are returned as [left, top, right, bottom].
If set to False, the bboxes are returned as [x, y, width, height].
masks (bool, optional, default = False) –
If set to True, segmentation masks are read and returned as polygons, represented by a list of coordinates.
Each mask can be one or more polygons, and for a given sample, the polygons are represented by the following tensors:
- masks_meta -> list of tuples (mask_idx, start_idx, end_idx)
- masks_coords -> list of (x,y) coordinates
One mask can have one or more masks_meta values that have the same mask_idx. This means that the mask for that given index consists of several polygons. start_idx indicates the index of the first coordinates in masks_coords. Currently, objects with iscrowd=1 annotations are skipped because RLE masks are not suitable for instance segmentation.
meta_files_path (str, optional, default = '') – Path to the directory with meta files that contain preprocessed COCO annotations.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
pixelwise_masks (bool, optional, default = False) – If true, segmentation masks are read and returned as pixel-wise masks.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
ratio (bool, optional, default = False) – If set to True, the returned bbox coordinates are relative to the image size.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
save_img_ids (bool, optional, default = False) – If set to True, the image IDs are also returned.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
shuffle_after_epoch (bool, optional, default = False) – If set to True, the reader shuffles the entire dataset after each epoch.
size_threshold (float, optional, default = 0.1) –
If the width or the height of a bounding box that represents an instance of an object is lower than this value, the object will be ignored.
The value is represented as an absolute value.
skip_cached_images (bool, optional, default = False) –
If set to True, the loading data will be skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
skip_empty (bool, optional, default = False) – If set to True, the reader skips samples that contain no object instances.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
- __call__(**kwargs)
Operator call to be used in graph definition. This operator doesn’t have any inputs.
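The effect of the ltrb and ratio arguments on a single bounding box can be sketched in plain Python. This only illustrates the two output conventions described for COCOReader; coco_box is a hypothetical helper, and passing the image size as a (width, height) pair is an assumption of the sketch.

```python
def coco_box(box_xywh, image_size, ltrb=False, ratio=False):
    """Convert one COCO [x, y, w, h] box according to the ltrb and ratio flags."""
    x, y, w, h = box_xywh
    # ltrb=True: report corners; ltrb=False: keep position and size.
    out = [x, y, x + w, y + h] if ltrb else [x, y, w, h]
    if ratio:
        # Make every coordinate relative to the image size.
        width, height = image_size
        out = [out[0] / width, out[1] / height,
               out[2] / width, out[3] / height]
    return out


box = [25, 50, 25, 50]   # x, y, w, h in pixels
size = (100, 200)        # width, height in pixels

as_ltrb = coco_box(box, size, ltrb=True)
as_relative_ltrb = coco_box(box, size, ltrb=True, ratio=True)
```

Relative ltrb boxes of this kind are the 0.0-1.0 coordinates that operators such as BbFlip describe for their inputs.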
- class nvidia.dali.ops.Caffe2Reader(**kwargs)
Reads sample data from a Caffe2 Lightning Memory-Mapped Database (LMDB).
- Supported backends
‘cpu’
- Keyword Arguments
path (str or list of str) – List of paths to the Caffe2 LMDB directories.
additional_inputs (int, optional, default = 0) – Additional auxiliary data tensors that are provided for each sample.
bbox (bool, optional, default = False) – Denotes whether the bounding-box information is present.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.
image_available (bool, optional, default = True) – Determines whether an image is available in this LMDB.
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
label_type (int, optional, default = 0) –
Type of label stored in dataset.
Here is a list of the available values:
0 = SINGLE_LABEL: which is the integer label for the multi-class classification.
1 = MULTI_LABEL_SPARSE: which is the sparse active label indices for multi-label classification.
2 = MULTI_LABEL_DENSE: which is the dense label embedding vector for label embedding regression.
3 = MULTI_LABEL_WEIGHTED_SPARSE: which is the sparse active label indices with per-label weights for multi-label classification.
4 = NO_LABEL: where no label is available.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
num_labels (int, optional, default = 1) –
Number of classes in the dataset.
Required when sparse labels are used.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
skip_cached_images (bool, optional, default = False) –
If set to True, the loading data will be skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
- __call__(**kwargs)
Operator call to be used in graph definition. This operator doesn’t have any inputs.
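The num_shards, shard_id, and pad_last_batch arguments of the readers can be pictured with a small plain-Python sketch. The contiguous split used here is an assumption made for illustration, not a statement about how the reader partitions its data, and shard_bounds and pad_shard are hypothetical helpers.

```python
def shard_bounds(dataset_size, num_shards, shard_id):
    """Half-open [start, end) index range of one shard (assumed contiguous split)."""
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end


def pad_shard(samples, target_size):
    """Repeat the last sample until the shard holds target_size samples."""
    if not samples:
        return list(samples)
    return list(samples) + [samples[-1]] * (target_size - len(samples))


data = list(range(10))

# Partition ten samples into three shards, one per shard_id.
shards = [data[slice(*shard_bounds(len(data), 3, i))] for i in range(3)]

# Pad the shorter shards so that every shard has the same length.
longest = max(len(s) for s in shards)
padded = [pad_shard(s, longest) for s in shards]
```

Every sample lands in exactly one shard, and padding only repeats the final sample of the shorter shards.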
- class nvidia.dali.ops.CaffeReader(**kwargs)
Reads (Image, label) pairs from a Caffe LMDB.
- Supported backends
‘cpu’
- Keyword Arguments
path (str or list of str) – List of paths to the Caffe LMDB directories.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.
image_available (bool, optional, default = True) – Determines whether an image is available in this LMDB.
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
label_available (bool, optional, default = True) – Determines whether a label is available.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
skip_cached_images (bool, optional, default = False) –
If set to True, the loading data will be skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
- __call__(**kwargs)
Operator call to be used in graph definition. This operator doesn’t have any inputs.
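The interaction between random_shuffle and initial_fill can be sketched as a buffered shuffle in plain Python. This is one possible reading of the description (sequential reads into a buffer, random picks out of it); the refill policy, the seed handling, and the buffered_shuffle name are assumptions of the sketch, not the reader's implementation.

```python
import random


def buffered_shuffle(samples, initial_fill=1024, seed=0):
    """Shuffle a sequential stream through a buffer of initial_fill samples."""
    rng = random.Random(seed)
    source = iter(samples)
    end = object()  # sentinel that cannot collide with a real sample

    # Fill the buffer by reading the data sequentially.
    buffer = []
    for item in source:
        buffer.append(item)
        if len(buffer) >= initial_fill:
            break

    out = []
    while buffer:
        # Select a random buffered sample for the output.
        idx = rng.randrange(len(buffer))
        out.append(buffer[idx])
        # Refill the freed slot with the next sequential sample, if any.
        nxt = next(source, end)
        if nxt is end:
            buffer.pop(idx)
        else:
            buffer[idx] = nxt
    return out


shuffled = buffered_shuffle(range(100), initial_fill=8, seed=1)
in_order = buffered_shuffle(range(10), initial_fill=1)
```

A larger buffer lets samples travel further from their original position, while a buffer of one sample leaves the order untouched.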
- class nvidia.dali.ops.Cast(**kwargs)
Casts the tensor to a different type.
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
dtype (nvidia.dali.types.DALIDataType) – Output data type.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
- __call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- class nvidia.dali.ops.CoinFlip(**kwargs)
Produces a batch of tensors filled with 0s and 1s, which is the result of random coin flips.
- Supported backends
‘cpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
probability (float, optional, default = 0.5) – Probability of returning a value of 1.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
- __call__(**kwargs)
Operator call to be used in graph definition. This operator doesn’t have any inputs.
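What a batch of coin flips looks like can be shown with the Python standard library. This is a sketch of the described behavior only (one 0 or 1 per sample, with probability giving the chance of a 1); the coin_flip helper and its batch_size parameter are hypothetical, and the random number generator is not the one the operator uses.

```python
import random


def coin_flip(batch_size, probability=0.5, seed=None):
    """Return batch_size values, each 1 with the given probability, else 0."""
    rng = random.Random(seed)
    # rng.random() is uniform in [0.0, 1.0), so the test succeeds
    # with the requested probability.
    return [1 if rng.random() < probability else 0 for _ in range(batch_size)]


always_one = coin_flip(8, probability=1.0, seed=0)
always_zero = coin_flip(8, probability=0.0, seed=0)
mixed = coin_flip(100, probability=0.5, seed=3)
```

Values like these are a natural fit for per-sample integer arguments, such as the horizontal and vertical arguments that BbFlip accepts in __call__.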
- class nvidia.dali.ops.ColorSpaceConversion(**kwargs)
Converts between various image color models.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
image_type (nvidia.dali.types.DALIImageType) – The color space of the input image.
output_type (nvidia.dali.types.DALIImageType) – The color space of the output image.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
- __call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- class nvidia.dali.ops.ColorTwist(**kwargs)
Adjusts hue, saturation and brightness of the image.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
brightness (float, optional, default = 1.0) –
Brightness change factor.
Values must be non-negative.
Example values:
0 - Black image.
1 - No change.
2 - Increase brightness twice.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
contrast (float, optional, default = 1.0) –
Contrast change factor.
Values must be non-negative.
Example values:
0 - Uniform grey image.
1 - No change.
2 - Increase contrast twice.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
Output data type.
If not set, the input type is used.
hue (float, optional, default = 0.0) – Hue change, in degrees.
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the input and the output image.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
saturation (float, optional, default = 1.0) –
Saturation change factor.
Values must be non-negative.
Example values:
0 – Completely desaturated image.
1 - No change to image’s saturation.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
- __call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
brightness (TensorList of float, optional, default = 1.0) –
Brightness change factor.
Values must be non-negative.
Example values:
0 - Black image.
1 - No change.
2 - Increase brightness twice.
contrast (TensorList of float, optional, default = 1.0) –
Contrast change factor.
Values must be non-negative.
Example values:
0 - Uniform grey image.
1 - No change.
2 - Increase contrast twice.
hue (TensorList of float, optional, default = 0.0) – Hue change, in degrees.
saturation (TensorList of float, optional, default = 1.0) –
Saturation change factor.
Values must be non-negative.
Example values:
0 - Completely desaturated image.
1 - No change to image’s saturation.
-
class
nvidia.dali.ops.
Constant
(**kwargs)¶ Produces a batch of constant tensors.
The floating point input data should be placed in the fdata argument and integer data in idata. The data, which can be a flat vector of values or a scalar, is then reshaped according to the shape argument. If the data is scalar, it will be broadcast to fill the entire shape.
The operator only performs meaningful work at first invocation; subsequent calls will return a reference to the same memory.
The operator can be automatically instantiated in Python with a call to types.Constant(value, dtype, shape, layout). The value can be a scalar, a tuple, a list, or a numpy array. If not explicitly overridden, the shape and dtype will be taken from the array.
Warning
64-bit integer and double precision arrays are not supported and will be silently downgraded to 32-bit.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) –
Output data type.
If this value is not set, the output is float if the fdata argument is used and int if idata is used.
fdata (float or list of float, optional, default = []) –
Contents of the constant that is produced (for floating point types).
Note
fdata and idata are mutually exclusive, and one of them is required.
idata (int or list of int, optional, default = []) –
Contents of the constant that is produced (for integer types).
Note
fdata and idata are mutually exclusive, and one of them is required.
layout (layout str, optional, default = ‘’) –
Layout info.
If set and not empty, the layout must match the dimensionality of the output.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shape (int or list of int, optional, default = []) – The desired shape of the output. If not set, the data is assumed to be 1D.
-
__call__
(**kwargs)¶ Operator call to be used in graph definition. This operator doesn’t have any inputs.
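The reshape and broadcast rule described for Constant can be illustrated with a small NumPy stand-in. This is only a sketch of the documented behaviour, not DALI code: the helper name make_constant is hypothetical, and the choice of float32/int32 is an assumption based on the warning about 64-bit data being downgraded.

```python
import numpy as np


def make_constant(data, shape=None, dtype=None):
    """NumPy stand-in for the documented reshape/broadcast rule (not DALI code)."""
    arr = np.asarray(data)
    if dtype is None:
        # Assumption: follow the documented downgrade of 64-bit data to 32-bit.
        dtype = np.float32 if arr.dtype.kind == "f" else np.int32
    arr = arr.astype(dtype)
    if shape is None:
        return arr.reshape(-1)                  # no shape given: data is treated as 1D
    if arr.ndim == 0:
        return np.full(shape, arr, dtype=dtype)  # scalar: broadcast to the whole shape
    return arr.reshape(shape)                   # flat vector: reshape to the target shape


filled = make_constant(1.5, shape=(2, 3))               # scalar broadcast
grid = make_constant([1, 2, 3, 4, 5, 6], shape=(2, 3))  # flat vector reshaped
```

Here filled is a 2x3 array in which every element is 1.5, and grid holds the six values in row-major order.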
-
class
nvidia.dali.ops.
Contrast
(**kwargs)¶ Changes the color contrast of the image.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
contrast (float, optional, default = 1.0) –
Contrast change factor.
Values must be non-negative.
Example values:
0 - Uniform grey image.
1 - No change.
2 - Increase contrast twice.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
Output data type.
If not set, the input type is used.
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the input and the output image.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
contrast (TensorList of float, optional, default = 1.0) –
Contrast change factor.
Values must be non-negative.
Example values:
0 - Uniform grey image.
1 - No change.
2 - Increase contrast twice.
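The example values for the contrast factor (0 gives a uniform grey image, 1 leaves the image unchanged) are consistent with scaling each pixel's distance from a fixed grey level. The sketch below assumes that model and a grey level of 128 for 8-bit data; both are assumptions made for illustration, and adjust_contrast is a hypothetical helper, not part of DALI.

```python
import numpy as np


def adjust_contrast(image, contrast, center=128.0):
    """Scale the distance of every pixel from an assumed grey level."""
    if contrast < 0:
        raise ValueError("contrast must be non-negative")
    scaled = center + contrast * (image.astype(np.float32) - center)
    # Round and clamp back to the 8-bit range.
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


image = np.array([[0, 64], [192, 255]], dtype=np.uint8)
flat = adjust_contrast(image, 0.0)   # every pixel collapses to the grey level
same = adjust_contrast(image, 1.0)   # identity
```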
-
class
nvidia.dali.ops.
CoordFlip
(**kwargs)¶ Transforms vectors or points by flipping (reflecting) their coordinates with respect to a given center.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
center_x (float, optional, default = 0.5) – Flip center in the horizontal axis.
center_y (float, optional, default = 0.5) – Flip center in the vertical axis.
center_z (float, optional, default = 0.5) – Flip center in the depthwise axis.
flip_x (int, optional, default = 1) – Flip the horizontal (x) coordinate.
flip_y (int, optional, default = 0) – Flip the vertical (y) coordinate.
flip_z (int, optional, default = 0) – Flip the depthwise (z) coordinate.
layout (layout str, optional, default = ‘’) –
Determines the order of coordinates in the input.
The string should consist of the following characters:
“x” (horizontal coordinate),
“y” (vertical coordinate),
“z” (depthwise coordinate).
Note
If left empty, depending on the number of dimensions, the “x”, “xy”, or “xyz” values are assumed.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
center_x (TensorList of float, optional, default = 0.5) – Flip center in the horizontal axis.
center_y (TensorList of float, optional, default = 0.5) – Flip center in the vertical axis.
center_z (TensorList of float, optional, default = 0.5) – Flip center in the depthwise axis.
flip_x (TensorList of int, optional, default = 1) – Flip the horizontal (x) coordinate.
flip_y (TensorList of int, optional, default = 0) – Flip the vertical (y) coordinate.
flip_z (TensorList of int, optional, default = 0) – Flip the depthwise (z) coordinate.
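Reflecting a coordinate about a center maps p to 2 * center - p. The following pure-Python sketch applies that to one point, assuming the coordinates arrive in “xy” or “xyz” order; flip_coords is a hypothetical helper used only to illustrate the arithmetic.

```python
def flip_coords(point, center=(0.5, 0.5, 0.5), flip=(1, 0, 0)):
    """Reflect the selected coordinates of one point about the given center."""
    return tuple(
        2.0 * c - p if f else p   # reflection about c; untouched when the flag is 0
        for p, c, f in zip(point, center, flip)
    )


flipped = flip_coords((0.25, 0.75))  # defaults: only the x coordinate is flipped
```

With the default arguments the x coordinate 0.25 becomes 0.75 and y is left as it is.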
-
class
nvidia.dali.ops.
CoordTransform
(**kwargs)¶ Applies a linear transformation to points or vectors.
The transformation has the form:
out = M * in + T
Where M is an m x n matrix and T is a translation vector with m components. Input must consist of n-element vectors or points and the output has m components.
This operator can be used for many operations. Here’s the (incomplete) list:
applying affine transform to point clouds
projecting points onto a subspace
some color space conversions, for example RGB to YCbCr or grayscale
linear operations on colors, like hue rotation, brightness and contrast adjustment
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
M (float or list of float, optional) –
The matrix used for transforming the input vectors.
If left unspecified, identity matrix is used.
The matrix M does not need to be square - if it’s not, the output vectors will have a number of components equal to the number of rows in M.
If a scalar value is provided, M is assumed to be a square matrix with that value on the diagonal. The size of the matrix is then assumed to match the number of components in the input vectors.
MT (float or list of float, optional) –
A block matrix [M T] which combines the arguments M and T.
Providing a scalar value for this argument is equivalent to providing the same scalar for M and leaving T unspecified.
The number of columns must be one more than the number of components in the input. This argument is mutually exclusive with M and T.
T (float or list of float, optional) –
The translation vector.
If left unspecified, no translation is applied unless MT argument is used.
The number of components of this vector must match the number of rows in matrix M. If a scalar value is provided, that value is broadcast to all components of T and the number of components is chosen to match the number of rows in M.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –
Data type of the output coordinates.
If an integral type is used, the output values are rounded to the nearest integer and clamped to the dynamic range of this type.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
M (TensorList of float, optional) –
The matrix used for transforming the input vectors.
If left unspecified, identity matrix is used.
The matrix M does not need to be square - if it’s not, the output vectors will have a number of components equal to the number of rows in M.
If a scalar value is provided, M is assumed to be a square matrix with that value on the diagonal. The size of the matrix is then assumed to match the number of components in the input vectors.
MT (TensorList of float, optional) –
A block matrix [M T] which combines the arguments M and T.
Providing a scalar value for this argument is equivalent to providing the same scalar for M and leaving T unspecified.
The number of columns must be one more than the number of components in the input. This argument is mutually exclusive with M and T.
T (TensorList of float, optional) –
The translation vector.
If left unspecified, no translation is applied unless MT argument is used.
The number of components of this vector must match the number of rows in matrix M. If a scalar value is provided, that value is broadcast to all components of T and the number of components is chosen to match the number of rows in M.
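The rules for M, T and MT in CoordTransform (identity by default, scalar expanded to a diagonal, MT split into its blocks) can be checked against a NumPy reference. The function below is a sketch written from the argument descriptions, not the operator's implementation; coord_transform is a hypothetical name, and points are assumed to be stored one per row.

```python
import numpy as np


def coord_transform(points, M=None, T=None, MT=None):
    """Apply out = M * in + T to every row of `points` (shape: num_points x n)."""
    points = np.asarray(points, dtype=np.float32)
    n = points.shape[1]
    if MT is not None:
        if M is not None or T is not None:
            raise ValueError("MT is mutually exclusive with M and T")
        if np.ndim(MT) == 0:
            M = MT                                  # scalar MT acts like scalar M, no T
        else:
            MT = np.asarray(MT, dtype=np.float32).reshape(-1, n + 1)
            M, T = MT[:, :n], MT[:, n]              # split the block matrix [M T]
    if M is None:
        M = np.eye(n, dtype=np.float32)             # unspecified: identity
    elif np.ndim(M) == 0:
        M = np.eye(n, dtype=np.float32) * M         # scalar: value on the diagonal
    else:
        M = np.asarray(M, dtype=np.float32).reshape(-1, n)
    if T is None:
        T = 0.0                                     # unspecified: no translation
    return points @ M.T + T                         # a scalar T broadcasts to all rows


points = [[1, 2], [3, 4]]
moved = coord_transform(points, M=[[2, 0], [0, 3]], T=[1, -1])
same_moved = coord_transform(points, MT=[[2, 0, 1], [0, 3, -1]])
```

Both calls describe the same transformation, once with separate M and T and once with the combined block matrix.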
-
class
nvidia.dali.ops.
Copy
(**kwargs)¶ Creates a copy of the input tensor.
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
-
class
nvidia.dali.ops.
Crop
(**kwargs)¶ Crops the images with the specified window dimensions and window position (upper left corner).
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
crop (float or list of float, optional, default = [0.0, 0.0]) –
Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).
Providing crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.
crop_d (float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
crop_pos_y (float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
crop_pos_z (float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.
crop_w (float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) –
Output data type.
Supported types: FLOAT, FLOAT16, and UINT8.
If not set, the input type is used.
fill_values (float or list of float, optional, default = [0.0]) –
Determines padding values and is only relevant if out_of_bounds_policy is set to “pad”.
If a scalar value is provided, it will be used for all the channels. If multiple values are provided, the number of values and channels must be identical (extent of dimension C in the layout) in the output slice.
out_of_bounds_policy (str, optional, default = 'error') –
Determines the policy when slicing the out of bounds area of the input.
Here is a list of the supported values:
"error" (default): Attempting to slice outside of the bounds of the image will produce an error.
"pad": The input will be padded as needed with zeros or any other value that is specified with the fill_values argument.
"trim_to_shape": The slice window will be cut to the bounds of the input.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
crop_d (TensorList of float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (TensorList of float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
crop_pos_y (TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
crop_pos_z (TensorList of float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.
crop_w (TensorList of float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
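The position formula used by crop_pos_x, crop_pos_y and crop_pos_z is the same along every axis, so a single helper is enough to illustrate it. The sketch below evaluates start = norm * (size - crop_size) for a hypothetical 640x480 image; crop_window_start is an illustrative name, and the result is left as a float because rounding to a pixel index is not specified here.

```python
def crop_window_start(image_size, crop_size, crop_pos_norm=0.5):
    """Return the start coordinate of the cropping window along one axis."""
    if not 0.0 <= crop_pos_norm <= 1.0:
        raise ValueError("normalized position must be in [0.0, 1.0]")
    return crop_pos_norm * (image_size - crop_size)


# Hypothetical 640x480 (W x H) image with a centered 224x224 window.
crop_x = crop_window_start(640, 224, 0.5)   # 0.5 * (640 - 224)
crop_y = crop_window_start(480, 224, 0.5)   # 0.5 * (480 - 224)
```

A normalized position of 0.0 places the window at the start of the axis, and 1.0 places it flush against the end.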
-
class
nvidia.dali.ops.
CropMirrorNormalize
(**kwargs)¶ Performs fused cropping, normalization, format conversion (NHWC to NCHW) if desired, and type casting.
Normalization takes the input images and produces the output by using the following formula:
output = (input - mean) / std
Note
If no cropping arguments are specified, only mirroring and normalization will occur.
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
crop (float or list of float, optional, default = [0.0, 0.0]) –
Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).
Providing crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.
crop_d (float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
crop_pos_y (float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
crop_pos_z (float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.
crop_w (float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –
Output data type.
Supported types: FLOAT, FLOAT16, and UINT8.
If not set, the input type is used.
fill_values (float or list of float, optional, default = [0.0]) –
Determines padding values and is only relevant if out_of_bounds_policy is set to “pad”.
If a scalar value is provided, it will be used for all the channels. If multiple values are provided, the number of values and channels must be identical (extent of dimension C in the layout) in the output slice.
mean (float or list of float, optional, default = [0.0]) – Mean pixel values for image normalization.
mirror (int, optional, default = 0) – If nonzero, the image will be flipped (mirrored) horizontally.
out_of_bounds_policy (str, optional, default = 'error') –
Determines the policy when slicing the out of bounds area of the input.
Here is a list of the supported values:
"error" (default): Attempting to slice outside of the bounds of the image will produce an error.
"pad": The input will be padded as needed with zeros or any other value that is specified with the fill_values argument.
"trim_to_shape": The slice window will be cut to the bounds of the input.
output_layout (layout str, optional, default = ‘CHW’) – Tensor data layout for the output.
pad_output (bool, optional, default = False) – Determines whether to pad the output so that the number of channels is a power of 2.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
std (float or list of float, optional, default = [1.0]) – Standard deviation values for image normalization.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
crop_d (TensorList of float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (TensorList of float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
crop_pos_y (TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
crop_pos_z (TensorList of float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.
crop_w (TensorList of float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
mirror (TensorList of int, optional, default = 0) – If nonzero, the image will be flipped (mirrored) horizontally.
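The fused steps of CropMirrorNormalize (crop, optional horizontal mirror, output = (input - mean) / std, and HWC to CHW conversion) can be written out for a single image with NumPy. This is a reference sketch under stated assumptions, not the operator's implementation: the function name is hypothetical, the window start is rounded to the nearest pixel (the rounding rule is not specified in this section), and the order of the steps is chosen for readability.

```python
import numpy as np


def crop_mirror_normalize(image, crop_hw, mean, std, crop_pos=(0.5, 0.5), mirror=0):
    """NumPy reference for one HWC image; returns a float32 CHW array."""
    H, W = image.shape[:2]
    crop_h, crop_w = crop_hw
    pos_x, pos_y = crop_pos
    y0 = int(round(pos_y * (H - crop_h)))   # rounding is an assumption
    x0 = int(round(pos_x * (W - crop_w)))
    out = image[y0:y0 + crop_h, x0:x0 + crop_w].astype(np.float32)
    if mirror:
        out = out[:, ::-1]                  # horizontal flip along the W axis
    # Per-channel normalization: output = (input - mean) / std
    out = (out - np.asarray(mean, np.float32)) / np.asarray(std, np.float32)
    return np.ascontiguousarray(out.transpose(2, 0, 1))  # HWC -> CHW


image = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
result = crop_mirror_normalize(image, (2, 2), mean=[10, 10, 10], std=[2, 2, 2], mirror=1)
```

The result has shape (3, 2, 2): three channels, each holding the mirrored and normalized 2x2 window.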
-
class
nvidia.dali.ops.
DLTensorPythonFunction
(function, num_outputs=1, device='cpu', synchronize_stream=True, batch_processing=True, **kwargs)¶ Executes a Python function that operates on DLPack tensors.
The function should not modify input tensors.
For the GPU operator, it is the user’s responsibility to synchronize the device code with DALI. To synchronize the device code with DALI, synchronize DALI’s work before the operator call with the synchronize_stream flag (enabled by default) and ensure that the scheduled device tasks are finished in the operator call. The GPU code can be executed on the CUDA stream used by DALI, which can be obtained by calling the current_dali_stream() function. In this case, the synchronize_stream flag can be set to False.
This operator allows sequence inputs and supports volumetric data.
This operator will not be optimized out of the graph.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
function (object) – Function object.
batch_processing (bool, optional, default = False) –
Determines whether the function is invoked once per batch or separately for every sample in the batch.
If set to True, the function will receive its arguments as lists of DLPack tensors.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
num_outputs (int, optional, default = 1) – Number of outputs.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
synchronize_stream (bool, optional, default = True) –
Ensures that DALI synchronizes its CUDA stream before calling the Python function.
Warning
This argument should be set to False only if the called function schedules device work to the stream that is used by DALI.
-
class
nvidia.dali.ops.
DumpImage
(**kwargs)¶ Save images in batch to disk in PPM format.
Useful for debugging.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
input_layout (layout str, optional, default = ‘HWC’) – Layout of input images.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
suffix (str, optional, default = '') – Suffix to be added to output file names.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
-
class
nvidia.dali.ops.
ElementExtract
(**kwargs)¶ Extracts one or more elements from input.
This operator expects sequence inputs.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
element_map (int or list of int) – Indices of the elements to extract.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
-
class
nvidia.dali.ops.
Erase
(**kwargs)¶ Erases one or more regions from the input tensors.
The region is specified by an anchor (starting point) and a shape (dimensions). Only the relevant dimensions are specified. Unspecified dimensions are treated as if the entire range of the axis was provided. To specify multiple regions, anchor and shape represent multiple points consecutively (for example, anchor = (y0, x0, y1, x1, …) and shape = (h0, w0, h1, w1, …)). The anchor and shape arguments are interpreted based on the value of the axis_names argument, or, alternatively, the value of the axes argument. If no axis_names or axes arguments are provided, all dimensions except C (channels) must be specified.
Example 1:
anchor = (10, 20), shape = (190, 200), axis_names = “HW”, fill_value = 0
input: layout = “HWC”, shape = (300, 300, 3)
The erase region covers the range between 10 and 200 in the vertical dimension (height) and between 20 and 220 in the horizontal dimension (width). The range for the channel dimension was not specified, so it is between 0 and 3. This gives:
output[y, x, c] = 0 if 20 <= x < 220 and 10 <= y < 200
output[y, x, c] = input[y, x, c] otherwise
Example 2:
anchor = (10, 250), shape = (20, 30), axis_names = “W”, fill_value = (118, 185, 0)
input: layout = “HWC”, shape = (300, 300, 3)
Two erase regions are provided, which cover two vertical bands that range from x=(10, 30) and x=(250, 280), respectively. Each pixel in the erased regions is filled with a multi-channel value (118, 185, 0). This gives:
output[y, x, :] = (118, 185, 0) if 10 <= x < 30 or 250 <= x < 280
output[y, x, :] = input[y, x, :] otherwise
Example 3:
anchor = (0.15, 0.15), shape = (0.3, 0.3), axis_names = “HW”, fill_value = 100, normalized = True
input: layout = “HWC”, shape = (300, 300, 3)
One erase region with normalized coordinates in the height and width dimensions is provided. A fill value is provided for all the channels. The coordinates can be transformed to absolute coordinates by multiplying by the input shape. This gives:
output[y, x, c] = 100 if 0.15 * 300 <= x < (0.3 + 0.15) * 300 and 0.15 * 300 <= y < (0.3 + 0.15) * 300
output[y, x, c] = input[y, x, c] otherwise
Example 4:
anchor = (0.15, 0.15), shape = (20, 30), normalized_anchor = True, normalized_shape = False
input: layout = “HWC”, shape = (300, 300, 3)
One erase region is specified, with the anchor in normalized coordinates and the shape in absolute coordinates. Since no axis_names is provided, the anchor and shape must contain all dimensions except “C” (channels). This gives:
output[y, x, c] = 0 if 0.15 * 300 <= x < (0.15 * 300) + 30 and (0.15 * 300) <= y < (0.15 * 300) + 20
output[y, x, c] = input[y, x, c] otherwise
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
anchor (float or list of float, optional, default = []) –
Coordinates for the anchor or the starting point of the erase region.
Only the coordinates of the relevant dimensions that are specified by axis_names or axes should be provided.
axes (int or list of int, optional, default = [1, 0]) –
Order of dimensions used for anchor and shape arguments, as dimension indices.
For instance, axes=(1, 0) means the coordinates in anchor and shape refer to axes 1 and 0 in that particular order.
axis_names (str, optional, default = 'HW') –
Order of dimensions that are used for the anchor and shape arguments, as described in the layout.
For instance, axis_names=”HW” means that the coordinates in anchor and shape refer to dimensions H (height) and W (width) in that particular order.
Note
axis_names has a higher priority than axes.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
centered_anchor (bool, optional, default = False) –
If set to True, the anchors refer to the center of the region instead of the top-left corner.
This results in centered erased regions at the specified anchor.
fill_value (float or list of float, optional, default = [0.0]) –
Value to fill the erased region.
Might be specified as one value (for example, 0) or a multi-channel value (for example, (200, 210, 220)). If a multi-channel fill value is provided, the input layout should contain a channel dimension
C
.normalized (bool, optional, default = False) –
Determines whether the anchor and shape arguments should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates.
Providing a value for the normalized_shape and normalized_anchor arguments separately is mutually exclusive.
normalized_anchor (bool, optional, default = False) –
Determines whether the anchor argument should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates.
This argument is mutually exclusive with normalized.
normalized_shape (bool, optional, default = False) –
Determines whether the shape argument should be interpreted as normalized (range [0.0, 1.0]) or absolute coordinates.
This argument is mutually exclusive with normalized.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shape (float or list of float, optional, default = []) –
Values for shape or dimensions of the erase region.
Only the coordinates of the relevant dimensions that are specified by axis_names or axes should be provided.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
anchor (TensorList of float, optional, default = []) –
Coordinates for the anchor or the starting point of the erase region.
Only the coordinates of the relevant dimensions that are specified by axis_names or axes should be provided.
shape (TensorList of float, optional, default = []) –
Values for shape or dimensions of the erase region.
Only the coordinates of the relevant dimensions that are specified by axis_names or axes should be provided.
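The interaction of anchor, shape, centered_anchor, and normalized can be sketched for a single axis as follows. This is a minimal illustration in plain Python, not the operator's implementation: the helper name erase_region, the half-shape offset used for centering, and the clamping to the axis bounds are all assumptions made for the example.

```python
def erase_region(extent, anchor, shape, centered_anchor=False, normalized=False):
    """Return (start, end) of the erased span along one axis of length `extent`.

    Hypothetical helper: assumes normalized values are scaled by the axis
    extent and that a centered anchor is shifted back by half the shape.
    """
    if normalized:
        anchor = anchor * extent
        shape = shape * extent
    start = anchor - shape / 2 if centered_anchor else anchor
    end = start + shape
    # Assumption: the region is clipped to the valid range of the axis.
    return max(0, start), min(extent, end)


# Absolute coordinates, anchor at the start of the region.
print(erase_region(100, 10, 20))
# Normalized coordinates, anchor at the center of the region.
print(erase_region(100, 0.5, 0.25, centered_anchor=True, normalized=True))
```

The same computation would be repeated for each axis listed in axis_names or axes, in the order given there.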
-
class
nvidia.dali.ops.
ExternalSource
(source=None, num_outputs=None, *, cycle=None, layout=None, name=None, device='cpu', cuda_stream=None, use_copy_kernel=None, **kwargs)¶ ExternalSource is a special operator that can provide data to a DALI pipeline from Python by several methods.
The simplest and preferred way is to specify a source, which can be a callable or iterable.
Warning
The nvidia.dali.ops.ExternalSource() operator is not compatible with TensorFlow integration.
Note
To return a batch of copies of the same tensor, use nvidia.dali.types.Constant(), which is more performant.
- Parameters
source (callable or iterable) –
The source of the data.
The source is polled for data (via a call to source() or next(source)) when the pipeline needs input for the next iteration. Depending on the value of num_outputs, the source can supply one or more data batches. If num_outputs is not set, the source is expected to return one batch. If this value is specified, the data is expected to be a tuple or list, where each element corresponds to the respective return value of the external_source. If the source is a callable that accepts a positional argument, it is assumed to be the current iteration number, and consecutive calls will be source(0), source(1), and so on. If the source is a generator function, the function is invoked and treated as an iterable. However, unlike a generator, the function can be used with cycle. In this case, the function will be called again when the generator reaches the end of iteration.
For GPU input, it is the user’s responsibility to modify the provided GPU memory content only in the provided stream. DALI schedules a copy on this stream, and all work is properly queued. If no stream is provided, DALI will use a default, with a best-effort approach at correctness. See the cuda_stream argument documentation for more information.
The data batch produced by source may be anything that’s accepted by nvidia.dali.pipeline.Pipeline.feed_input().
num_outputs (int, optional) – If specified, denotes the number of TensorLists that are produced by the source function.
- Keyword Arguments
cycle (bool) –
If set to True, the source will be wrapped.
If set to False, StopIteration is raised when the end of data is reached. This flag requires that the source is a collection, for example, an iterable object where iter(source) returns a fresh iterator on each call, or a generator function. In the latter case, the generator function is called again when more data than was yielded by the function is requested.
name (str, optional) –
The name of the data node.
Used when feeding the data in iter_setup and can be omitted if the data is provided by source.
layout (str or list/tuple thereof) –
If provided, sets the layout of the data.
When num_outputs > 1, the layout can be a list that contains a distinct layout for each output. If the list has fewer than num_outputs elements, only the first outputs have the layout set; the rest of the outputs don’t have a layout set.
cuda_stream (cudaStream_t or an object convertible to cudaStream_t, such as cupy.cuda.Stream or torch.cuda.Stream, optional) –
The CUDA stream is used to copy data to the GPU or from a GPU source.
If this parameter is not set, a best-effort attempt will be made to maintain correctness. That is, if the data is provided as a tensor/array from a recognized library such as CuPy or PyTorch, the library’s current stream is used. Although this approach works in typical scenarios, in advanced use cases and in code that uses unsupported libraries, you might need to explicitly supply the stream handle.
This argument has two special values:
0 - Use the default CUDA stream
1 - Use DALI’s internal stream
If the internal stream is used, the call to feed_input will block until the copy to the internal buffer is complete, since there’s no way to synchronize with this stream to prevent overwriting the array with new data in another stream.
use_copy_kernel (bool, optional) –
If set to True, DALI will use a CUDA kernel to feed the data instead of cudaMemcpyAsync (default).
Note
This is applicable only when copying data to and from GPU memory.
blocking (optional) –
Determines whether the external source should wait until data is available or just fail when the data is not available.
no_copy (optional) –
Determines whether DALI should copy the buffer when feed_input is called.
If set to True, DALI passes the user memory directly to the pipeline, instead of copying it. It is the user’s responsibility to keep the buffer alive and unmodified until it is consumed by the pipeline.
The buffer can be modified or freed again after the output of the relevant iterations has been consumed. Effectively, it happens after prefetch_queue_depth or cpu_queue_depth * gpu_queue_depth (when they are not equal) iterations following the feed_input call.
The memory location must match the specified device parameter of the operator. For the CPU, the provided memory can be one contiguous buffer or a list of contiguous Tensors. For the GPU, to avoid an extra copy, the provided buffer must be contiguous. If you provide a list of separate Tensors, there will be an additional copy made internally, consuming both memory and bandwidth.
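The polling rules described for source and cycle can be illustrated without DALI. The sketch below is plain Python and only mimics the documented behavior: the helper name make_poller is hypothetical, and detecting the iteration-number argument through inspect.signature is an assumption made for the example, not a statement about how DALI does it.

```python
import inspect


def make_poller(source, cycle=False):
    """Return a zero-argument function that produces the next batch from `source`."""
    if inspect.isgeneratorfunction(source):
        # A generator function is invoked and its result treated as an iterable.
        factory = source
    elif callable(source):
        # A plain callable is called once per iteration, optionally with the
        # current iteration number as its positional argument.
        takes_index = len(inspect.signature(source).parameters) >= 1
        iteration = 0

        def poll_callable():
            nonlocal iteration
            index = iteration
            iteration += 1
            return source(index) if takes_index else source()

        return poll_callable
    else:
        # A collection: iter(source) yields a fresh iterator on each call.
        factory = lambda: source

    iterator = iter(factory())

    def poll_iterable():
        nonlocal iterator
        try:
            return next(iterator)
        except StopIteration:
            if not cycle:
                raise
            # With cycle=True the source is restarted from the beginning.
            iterator = iter(factory())
            return next(iterator)

    return poll_iterable


poll = make_poller([1, 2], cycle=True)
print([poll() for _ in range(5)])
```

Note that a one-shot iterator passed with cycle=True cannot be restarted in this sketch, which matches the requirement that the source be a collection or a generator function.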
-
__call__
(*, source=None, cycle=None, name=None, layout=None, cuda_stream=None, use_copy_kernel=None, **kwargs)¶
- Parameters
source (callable or iterable) –
The source of the data.
The source is polled for data (via a call to source() or next(source)) when the pipeline needs input for the next iteration. Depending on the value of num_outputs, the source can supply one or more data batches. If num_outputs is not set, the source is expected to return one batch. If this value is specified, the data is expected to be a tuple or list, where each element corresponds to the respective return value of the external_source. If the source is a callable that accepts a positional argument, it is assumed to be the current iteration number, and consecutive calls will be source(0), source(1), and so on. If the source is a generator function, the function is invoked and treated as an iterable. However, unlike a generator, the function can be used with cycle. In this case, the function will be called again when the generator reaches the end of iteration.
For GPU input, it is the user’s responsibility to modify the provided GPU memory content only in the provided stream. DALI schedules a copy on this stream, and all work is properly queued. If no stream is provided, DALI will use a default, with a best-effort approach at correctness. See the cuda_stream argument documentation for more information.
The data batch produced by source may be anything that’s accepted by nvidia.dali.pipeline.Pipeline.feed_input().
num_outputs (int, optional) – If specified, denotes the number of TensorLists that are produced by the source function.
- Keyword Arguments
cycle (bool) –
If set to True, the source will be wrapped.
If set to False, StopIteration is raised when the end of data is reached. This flag requires that the source is a collection, for example, an iterable object where iter(source) returns a fresh iterator on each call, or a generator function. In the latter case, the generator function is called again when more data than was yielded by the function is requested.
name (str, optional) –
The name of the data node.
Used when feeding the data in iter_setup and can be omitted if the data is provided by source.
layout (str or list/tuple thereof) –
If provided, sets the layout of the data.
When num_outputs > 1, the layout can be a list that contains a distinct layout for each output. If the list has fewer than num_outputs elements, only the first outputs have the layout set; the rest of the outputs don’t have a layout set.
cuda_stream (cudaStream_t or an object convertible to cudaStream_t, such as cupy.cuda.Stream or torch.cuda.Stream, optional) –
The CUDA stream is used to copy data to the GPU or from a GPU source.
If this parameter is not set, a best-effort attempt will be made to maintain correctness. That is, if the data is provided as a tensor/array from a recognized library such as CuPy or PyTorch, the library’s current stream is used. Although this approach works in typical scenarios, in advanced use cases and in code that uses unsupported libraries, you might need to explicitly supply the stream handle.
This argument has two special values:
0 - Use the default CUDA stream
1 - Use DALI’s internal stream
If the internal stream is used, the call to feed_input will block until the copy to the internal buffer is complete, since there’s no way to synchronize with this stream to prevent overwriting the array with new data in another stream.
use_copy_kernel (bool, optional) –
If set to True, DALI will use a CUDA kernel to feed the data instead of cudaMemcpyAsync (default).
Note
This is applicable only when copying data to and from GPU memory.
blocking (optional) –
Determines whether the external source should wait until data is available or just fail when the data is not available.
no_copy (optional) –
Determines whether DALI should copy the buffer when feed_input is called.
If set to True, DALI passes the user memory directly to the pipeline, instead of copying it. It is the user’s responsibility to keep the buffer alive and unmodified until it is consumed by the pipeline.
The buffer can be modified or freed again after the output of the relevant iterations has been consumed. Effectively, it happens after prefetch_queue_depth or cpu_queue_depth * gpu_queue_depth (when they are not equal) iterations following the feed_input call.
The memory location must match the specified device parameter of the operator. For the CPU, the provided memory can be one contiguous buffer or a list of contiguous Tensors. For the GPU, to avoid an extra copy, the provided buffer must be contiguous. If you provide a list of separate Tensors, there will be an additional copy made internally, consuming both memory and bandwidth.
-
class
nvidia.dali.ops.
FastResizeCropMirror
(**kwargs)¶ Performs a fused resize, crop, mirror operation.
The operator handles both fixed and random resizing and cropping. Backprojects the desired crop through the resize operation to reduce the amount of work that is performed.
- Supported backends
‘cpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
crop (float or list of float, optional, default = [0.0, 0.0]) –
Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).
Providing the crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.
crop_d (float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
crop_pos_y (float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
crop_pos_z (float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image, and crop_D is the depth of the cropping window.
crop_w (float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) –
Output data type.
Supported types: FLOAT, FLOAT16, and UINT8.
If not set, the input type is used.
fill_values (float or list of float, optional, default = [0.0]) –
Determines padding values and is only relevant if out_of_bounds_policy is set to “pad”.
If a scalar value is provided, it will be used for all the channels. If multiple values are provided, the number of values and channels must be identical (extent of dimension C in the layout) in the output slice.
interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.
max_size (float or list of float, optional) –
Limit of the output size.
When the operator is configured to keep the aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when using the resize_shorter argument or “not_smaller” mode, or when some extents are left unspecified.
This parameter puts a limit to how big the output can become. This value can be specified per-axis or uniformly for all axes.
Note
When used with “not_smaller” mode or the resize_shorter argument, max_size takes precedence and the aspect ratio is kept. For example, resizing an image of size 1200x600 with mode="not_smaller", size=800, max_size=1400 produces a 1400x700 output.
mirror (int, optional, default = 0) –
Mask for the horizontal flip.
Supported values:
0 - Do not perform horizontal flip for this image.
1 - Performs horizontal flip for this image.
mode (str, optional, default = 'default') –
Resize mode.
Here is a list of supported modes:
"default" - image is resized to the specified size. Missing extents are scaled with the average scale of the provided ones.
"stretch" - image is resized to the specified size. Missing extents are not scaled at all.
"not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size. For example, a 1280x720 image with a desired output size of 640x480 actually produces a 640x360 output.
"not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified. For example, a 640x480 image with a desired output size of 1920x1080 actually produces a 1920x1440 output.
This argument is mutually exclusive with resize_longer and resize_shorter.
out_of_bounds_policy (str, optional, default = 'error') –
Determines the policy when slicing the out of bounds area of the input.
Here is a list of the supported values:
"error" (default): Attempting to slice outside of the bounds of the image will produce an error.
"pad": The input will be padded as needed with zeros or any other value that is specified with the fill_values argument.
"trim_to_shape": The slice window will be cut to the bounds of the input.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
resize_longer (float, optional, default = 0.0) –
The length of the longer dimension of the resized image.
This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".
resize_shorter (float, optional, default = 0.0) –
The length of the shorter dimension of the resized image.
This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See the max_size argument documentation for more information.
resize_x (float, optional, default = 0.0) –
The length of the X dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer, and size. If resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_y (float, optional, default = 0.0) –
The length of the Y dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer, and size. If resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_z (float, optional, default = 0.0) –
The length of the Z dimension of the resized volume.
This option is mutually exclusive with resize_shorter, resize_longer, and size. If resize_x and resize_y are left unspecified or 0, the operator keeps the aspect ratio of the original volume. A negative value flips the volume.
roi_end (float or list of float, optional) –
End of the input region of interest (ROI).
Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
roi_relative (bool, optional, default = False) – If True, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right.
roi_start (float or list of float, optional) –
Origin of the input region of interest (ROI).
Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
size (float or list of float, optional) –
The desired output size.
Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and the mode argument.
subpixel_scale (bool, optional, default = True) –
If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.
Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.
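The aspect-ratio-preserving modes and the max_size limit can be checked with a few lines of arithmetic. The sketch below is plain Python that reproduces the worked examples given for mode and max_size; the helper name resized_shape and the rounding to the nearest integer are assumptions for illustration, and it covers only the case where every extent of size is specified.

```python
def resized_shape(in_shape, size, mode, max_size=None):
    """Output shape for the "not_larger" and "not_smaller" modes (illustrative only)."""
    scales = [s / i for s, i in zip(size, in_shape)]
    if mode == "not_larger":
        scale = min(scales)   # no output extent exceeds the requested size
    elif mode == "not_smaller":
        scale = max(scales)   # no output extent falls below the requested size
    else:
        raise ValueError("this sketch handles only aspect-preserving modes")
    if max_size is not None:
        # max_size takes precedence while the aspect ratio is kept.
        scale = min(scale, *(max_size / i for i in in_shape))
    return tuple(round(i * scale) for i in in_shape)


print(resized_shape((1280, 720), (640, 480), "not_larger"))
print(resized_shape((640, 480), (1920, 1080), "not_smaller"))
print(resized_shape((1200, 600), (800, 800), "not_smaller", max_size=1400))
```

The three calls correspond to the 640x360, 1920x1440, and 1400x700 examples in the argument descriptions.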
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
crop_d (TensorList of float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (TensorList of float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
crop_pos_y (TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
crop_pos_z (TensorList of float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image, and crop_D is the depth of the cropping window.
crop_w (TensorList of float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
mirror (TensorList of int, optional, default = 0) –
Mask for the horizontal flip.
Supported values:
0 - Do not perform horizontal flip for this image.
1 - Performs horizontal flip for this image.
resize_longer (TensorList of float, optional, default = 0.0) –
The length of the longer dimension of the resized image.
This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".
resize_shorter (TensorList of float, optional, default = 0.0) –
The length of the shorter dimension of the resized image.
This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See the max_size argument documentation for more information.
resize_x (TensorList of float, optional, default = 0.0) –
The length of the X dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer, and size. If resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_y (TensorList of float, optional, default = 0.0) –
The length of the Y dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer, and size. If resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_z (TensorList of float, optional, default = 0.0) –
The length of the Z dimension of the resized volume.
This option is mutually exclusive with resize_shorter, resize_longer, and size. If resize_x and resize_y are left unspecified or 0, the operator keeps the aspect ratio of the original volume. A negative value flips the volume.
roi_end (TensorList of float, optional) –
End of the input region of interest (ROI).
Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
roi_start (TensorList of float, optional) –
Origin of the input region of interest (ROI).
Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
size (TensorList of float, optional) –
The desired output size.
Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and the mode argument.
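The crop position formula used by crop_pos_x, crop_pos_y, and crop_pos_z is the same on every axis, so a single helper is enough to illustrate it. The function name crop_origin below is hypothetical; the arithmetic is taken directly from crop_x = crop_x_norm * (W - crop_W).

```python
def crop_origin(extent, crop_extent, pos_norm=0.5):
    """Start coordinate of the cropping window along one axis.

    extent      - size of the image along the axis (W, H, or D)
    crop_extent - size of the cropping window along the same axis
    pos_norm    - normalized position in the range 0.0 - 1.0
    """
    return pos_norm * (extent - crop_extent)


# Default position 0.5 centers a 224-pixel window in a 640-pixel axis.
print(crop_origin(640, 224))
# Position 0.0 aligns the window with the start of the axis, 1.0 with the end.
print(crop_origin(640, 224, 0.0), crop_origin(640, 224, 1.0))
```

With the default of 0.5 on each axis, the window is centered in the image.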
-
class
nvidia.dali.ops.
FileReader
(**kwargs)¶ Reads (file, label) pairs from a directory.
- Supported backends
‘cpu’
- Keyword Arguments
file_root (str) –
Path to a directory that contains the data files.
FileReader supports a flat directory structure. The file_root directory must contain directories with data files. To obtain the labels, FileReader sorts directories in file_root in alphabetical order and takes an index in this order as a class label.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but on most network file systems it does not provide optimum performance.
file_list (str, optional, default = '') –
Path to a text file that contains the rows of filename label pairs, where the filenames are relative to file_root.
If left empty, file_root is traversed for subdirectories, which are only at one level down from file_root, and contain files that are associated with the same label. When traversing subdirectories, the labels are assigned consecutive numbers.
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
shuffle_after_epoch (bool, optional, default = False) –
If set to True, the reader shuffles the entire dataset after each epoch.
stick_to_shard and random_shuffle cannot be used when this argument is set to True.
skip_cached_images (bool, optional, default = False) –
If set to True, loading the data will be skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
-
__call__
(**kwargs)¶ Operator call to be used in graph definition. This operator doesn’t have any inputs.
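The labeling rule described for file_root (sort the subdirectories, use each directory's index as its class label) can be reproduced with the standard library. This is a sketch of the rule only, not FileReader itself: the function name list_labeled_files is hypothetical, and treating Python's default string sort as the alphabetical order is an assumption.

```python
import os


def list_labeled_files(file_root):
    """Return (relative_path, label) pairs for a one-level directory tree."""
    class_dirs = sorted(
        d for d in os.listdir(file_root)
        if os.path.isdir(os.path.join(file_root, d))
    )
    samples = []
    for label, class_dir in enumerate(class_dirs):
        # Every file in a subdirectory shares that subdirectory's label.
        for name in sorted(os.listdir(os.path.join(file_root, class_dir))):
            samples.append((os.path.join(class_dir, name), label))
    return samples
```

For a file_root that contains the directories cat and dog, files under cat would receive label 0 and files under dog label 1.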
-
class
nvidia.dali.ops.
Flip
(**kwargs)¶ Flips the images in selected dimensions (horizontal, vertical, and depthwise).
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
depthwise (int, optional, default = 0) – Flip the depthwise dimension.
horizontal (int, optional, default = 1) – Flip the horizontal dimension.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
vertical (int, optional, default = 0) – Flip the vertical dimension.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList ('FDHWC', 'FHWC', 'DHWC', 'HWC', 'FCDHW', 'FCHW', 'CDHW', 'CHW')) – Input to the operator.
- Keyword Arguments
depthwise (TensorList of int, optional, default = 0) – Flip the depthwise dimension.
horizontal (TensorList of int, optional, default = 1) – Flip the horizontal dimension.
vertical (TensorList of int, optional, default = 0) – Flip the vertical dimension.
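The effect of the horizontal and vertical flags on a single 2D sample can be shown with nested lists. This is a plain-Python illustration, assuming an HW layout with rows along H and columns along W; the helper flip_hw is hypothetical and does not cover the depthwise, channel, or frame dimensions.

```python
def flip_hw(image, horizontal=1, vertical=0):
    """Flip a 2D image given as a list of rows (illustrative only)."""
    # A vertical flip reverses the order of the rows.
    rows = image[::-1] if vertical else image
    # A horizontal flip reverses the pixels within each row.
    return [row[::-1] if horizontal else list(row) for row in rows]


image = [[1, 2],
         [3, 4]]
print(flip_hw(image))                            # defaults: horizontal only
print(flip_hw(image, horizontal=0, vertical=1))  # vertical only
```

Because the flags can also be supplied as per-sample tensor arguments, each sample in a batch may be flipped differently.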
-
class
nvidia.dali.ops.
GaussianBlur
(**kwargs)¶ Applies a Gaussian Blur to the input.
Gaussian blur is calculated by applying a convolution with a Gaussian kernel, which can be parameterized with window_size and sigma. If only the sigma is specified, the radius of the Gaussian kernel defaults to ceil(3 * sigma), so the kernel window size is 2 * ceil(3 * sigma) + 1.
If only the window size is provided, the sigma is calculated by using the following formula:
radius = (window_size - 1) / 2
sigma = (radius - 1) * 0.3 + 0.8
The sigma and kernel window size can be specified as one value for all data axes or a value per data axis.
When specifying the sigma or window size per axis, the axes are given in the same order as in the layout, from outermost to innermost.
Note
The channel C and frame F dimensions are not considered data axes. If channels are present, only channel-first or channel-last inputs are supported.
For example, with HWC input, you can provide sigma=1.0 or sigma=(1.0, 2.0) because there are two data axes, H and W.
The same input can be provided as per-sample tensors.
This operator allows sequence inputs and supports volumetric data.
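The default rules for sigma and the window size can be restated as a short sketch. The helper names below are hypothetical and are not part of DALI; they only spell out the two formulas from this section in plain Python.

```python
import math


def window_size_from_sigma(sigma):
    """Window size implied when only sigma is given: 2 * ceil(3 * sigma) + 1."""
    radius = math.ceil(3 * sigma)
    return 2 * radius + 1


def sigma_from_window_size(window_size):
    """Sigma implied when only the window size is given."""
    radius = (window_size - 1) / 2
    return (radius - 1) * 0.3 + 0.8


# One value per data axis, outermost to innermost (H, then W, for HWC input).
sigmas = (1.0, 2.0)
print([window_size_from_sigma(s) for s in sigmas])  # [7, 13]
print(sigma_from_window_size(7))
```

Note that the two rules are not inverses of each other: a sigma of 1.0 implies a window of 7, while a window of 7 implies a sigma of about 1.4.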
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) –
Output data type.
Supported type: FLOAT. If not set, the input type is used.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
sigma (float or list of float, optional, default = [0.0]) – Sigma value for the Gaussian Kernel.
window_size (int or list of int, optional, default = [0]) – The diameter of the kernel.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
sigma (TensorList of float, optional, default = [0.0]) – Sigma value for the Gaussian Kernel.
window_size (TensorList of int, optional, default = [0]) – The diameter of the kernel.
-
class nvidia.dali.ops.Hsv(**kwargs)
Adjusts hue, saturation and value (brightness) of the images.
To change the hue, the saturation, and/or the value of the image, pass the corresponding coefficients. Remember that the hue is an additive delta argument, while for saturation and value, the arguments are multiplicative.
This operator accepts images in the RGB color space.
For performance reasons, the operation is approximated by a linear transform in the RGB space. The color vector is projected along the neutral (gray) axis, rotated based on the hue delta, scaled based on the value and saturation multipliers, and restored to the original color space.
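The steps in the paragraph above can be illustrated with a minimal pure-Python sketch for a single pixel. This is not DALI's actual implementation, and the exact transform DALI uses may differ; the function name adjust_hsv_linear is hypothetical, the input is assumed to be a float RGB triple, and no clamping to the output type is done.

```python
import math


def adjust_hsv_linear(rgb, hue=0.0, saturation=1.0, value=1.0):
    """Approximate an HSV adjustment of one RGB pixel with linear operations.

    Hypothetical helper for illustration only; not DALI's implementation.
    """
    r, g, b = rgb
    # Split the color into its component along the gray axis (1, 1, 1)
    # and the chroma component perpendicular to that axis.
    gray = (r + g + b) / 3.0
    cr, cg, cb = r - gray, g - gray, b - gray

    # Rotate the chroma component around the gray axis by the hue delta.
    # For a vector c perpendicular to the unit axis n, the rotated vector
    # is c * cos(t) + (n x c) * sin(t).
    t = math.radians(hue)
    cos_t, sin_t = math.cos(t), math.sin(t)
    k = 1.0 / math.sqrt(3.0)
    xr, xg, xb = k * (cb - cg), k * (cr - cb), k * (cg - cr)
    rotated = (
        cr * cos_t + xr * sin_t,
        cg * cos_t + xg * sin_t,
        cb * cos_t + xb * sin_t,
    )

    # Scale the chroma by the saturation multiplier, add the gray component
    # back, and scale the whole color by the value multiplier.
    return tuple(value * (gray + saturation * c) for c in rotated)


print(adjust_hsv_linear((1.0, 0.0, 0.0), hue=120.0))
```

In this sketch, a hue delta of 120 degrees maps pure red to (approximately) pure green, a saturation of 0 collapses every color onto the gray axis, and gray pixels are unaffected by the hue delta.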
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
The output data type.
If a value is not set, the input type is used.
hue (float, optional, default = 0.0) –
Hue delta, in degrees.
The hue component can be interpreted as an angle, and values outside the 0-360 range wrap around, as they would for a rotation.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
saturation (float, optional, default = 1.0) – The saturation multiplier.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
value (float, optional, default = 1.0) – The value multiplier.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
hue (TensorList of float, optional, default = 0.0) –
Hue delta, in degrees.
The hue component can be interpreted as an angle, and values outside the 0-360 range wrap around, as they would for a rotation.
saturation (TensorList of float, optional, default = 1.0) – The saturation multiplier.
value (TensorList of float, optional, default = 1.0) – The value multiplier.
-
class nvidia.dali.ops.Hue(**kwargs)
Changes the hue level of the image.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
Output data type.
If not set, the input type is used.
hue (float, optional, default = 0.0) – The hue change in degrees.
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the input and the output image.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
hue (TensorList of float, optional, default = 0.0) – The hue change in degrees.
-
class nvidia.dali.ops.ImageDecoder(**kwargs)
Decodes images.
For JPEG images, depending on the selected backend (“mixed” or “cpu”), the implementation uses the nvJPEG library or libjpeg-turbo, respectively. Other image formats are decoded with OpenCV or other specific libraries, such as libtiff.
If used with a mixed backend, and the hardware is available, the operator will use a dedicated hardware decoder.
The output of the decoder is in HWC layout.
Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM, JPEG 2000. Please note that GPU acceleration for JPEG 2000 decoding is only available for CUDA 11.
- Supported backends
‘cpu’
‘mixed’
- Keyword Arguments
affine (bool, optional, default = True) –
Applies only to the mixed backend type.
If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
cache_batch_copy (bool, optional, default = True) –
Applies only to the mixed backend type.
If set to True, multiple images from the cache are copied with a batched copy kernel call. Otherwise, unless the order in the batch is the same as in the cache, each image is copied with cudaMemcpy.
cache_debug (bool, optional, default = False) –
Applies only to the mixed backend type.
Prints the debug information about the decoder cache.
cache_size (int, optional, default = 0) –
Applies only to the mixed backend type.
Total size of the decoder cache in megabytes. When provided, the decoded images that are larger than cache_threshold will be cached in GPU memory.
cache_threshold (int, optional, default = 0) –
Applies only to the mixed backend type.
The size threshold, in bytes, for decoded images to be cached. When an image is cached, it no longer needs to be decoded when it is encountered at the operator input, which saves processing time.
cache_type (str, optional, default = '') –
Applies only to the mixed backend type.
Here is a list of the available cache types:
- threshold: caches every image with a size that is larger than cache_threshold until the cache is full. The warm-up time for the threshold policy is 1 epoch.
- largest: stores the largest images that can fit in the cache. The warm-up time for the largest policy is 2 epochs.
Note
To take advantage of caching, it is recommended to configure readers with stick_to_shard=True to limit the number of unique images seen by each decoder instance in a multi-node environment.
device_memory_padding (int, optional, default = 16777216) –
Applies only to the mixed backend type.
The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.
device_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the mixed backend type.
The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.
host_memory_padding (int, optional, default = 8388608) –
Applies only to the mixed backend type.
The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.
host_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the mixed backend type.
The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.
hw_decoder_load (float, optional, default = 0.65) –
Applies only to the mixed backend type.
Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100
hybrid_huffman_threshold (int, optional, default = 1000000) –
Applies only to the mixed backend type.
Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.
Note
Hybrid Huffman decoder still largely uses the CPU.
memory_stats (bool, optional, default = False) –
Applies only to the mixed backend type.
Prints debug information about nvJPEG allocations. The information about the largest allocation might be useful to determine suitable values for device_memory_padding and host_memory_padding for a dataset.
Note
The statistics are global for the entire process, not per operator instance, and include the allocations made during construction if the padding hints are non-zero.
output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output image.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
split_stages (bool, optional, default = False) –
Applies only to the mixed backend type.
If True, the operator will be split into two sub-stages: a CPU and GPU one.
use_chunk_allocator (bool, optional, default = False) –
Experimental, applies only to the mixed backend type.
Uses the chunk pinned memory allocator, which allocates a chunk of size batch_size * prefetch_queue_depth during construction and suballocates from it at runtime. When split_stages is false, this argument is ignored.
use_fast_idct (bool, optional, default = False) –
Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.
According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
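The threshold cache policy described under cache_type can be pictured with a small simulation. This is only a sketch of the documented behavior, not the decoder's real cache: the function name is hypothetical, 1 megabyte is assumed to be 1024 * 1024 bytes, and skipping a candidate that does not fit in the remaining space (rather than evicting anything) is an assumption.

```python
def simulate_threshold_cache(image_sizes, cache_size_mb, cache_threshold):
    """Return the indices of images a threshold-style cache would keep.

    image_sizes: decoded image sizes in bytes, in the order they are seen.
    Hypothetical helper; assumes 1 MB = 1024 * 1024 bytes.
    """
    capacity = cache_size_mb * 1024 * 1024
    used = 0
    cached = []
    for index, size in enumerate(image_sizes):
        # Only images larger than the threshold are cache candidates.
        if size <= cache_threshold:
            continue
        # Assumption: a candidate that does not fit in the remaining
        # space is skipped, and nothing is evicted to make room.
        if used + size > capacity:
            continue
        cached.append(index)
        used += size
    return cached


sizes = [400_000, 700_000, 100, 600_000]
print(simulate_threshold_cache(sizes, cache_size_mb=1, cache_threshold=1000))
# [0, 3]
```

Because the cache fills in the order images are first seen, one pass over the data is enough to populate it, which is consistent with the stated warm-up time of 1 epoch for this policy.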
-
class nvidia.dali.ops.ImageDecoderCrop(**kwargs)
Decodes images and extracts regions-of-interest (ROI) that are specified by fixed window dimensions and variable anchors.
When possible, the operator uses the ROI decoding APIs (for example, libjpeg-turbo and nvJPEG) to reduce the decoding time and memory usage. When the ROI decoding is not supported for a given image format, it will decode the entire image and crop the selected ROI.
Note
ROI decoding is currently not compatible with hardware-based decoding. Using nvidia.dali.ops.ImageDecoderCrop() automatically disables hardware accelerated decoding. To use the hardware decoder, use the nvidia.dali.ops.ImageDecoder() and nvidia.dali.ops.Crop() operators instead.
The output of the decoder is in HWC layout.
Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM, JPEG 2000. Please note that GPU acceleration for JPEG 2000 decoding is only available for CUDA 11.
- Supported backends
‘cpu’
‘mixed’
- Keyword Arguments
affine (bool, optional, default = True) –
Applies only to the mixed backend type.
If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
crop (float or list of float, optional, default = [0.0, 0.0]) –
Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).
Providing the crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.
crop_d (float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
crop_pos_y (float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
crop_pos_z (float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.
crop_w (float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
device_memory_padding (int, optional, default = 16777216) –
Applies only to the mixed backend type.
The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.
device_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the mixed backend type.
The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.
host_memory_padding (int, optional, default = 8388608) –
Applies only to the mixed backend type.
The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.
host_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the mixed backend type.
The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.
hybrid_huffman_threshold (int, optional, default = 1000000) –
Applies only to the mixed backend type.
Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.
Note
Hybrid Huffman decoder still largely uses the CPU.
memory_stats (bool, optional, default = False) –
Applies only to the mixed backend type.
Prints debug information about nvJPEG allocations. The information about the largest allocation might be useful to determine suitable values for device_memory_padding and host_memory_padding for a dataset.
Note
The statistics are global for the entire process, not per operator instance, and include the allocations made during construction if the padding hints are non-zero.
output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output image.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
split_stages (bool, optional, default = False) –
Applies only to the mixed backend type.
If True, the operator will be split into two sub-stages: a CPU and GPU one.
use_chunk_allocator (bool, optional, default = False) –
Experimental, applies only to the mixed backend type.
Uses the chunk pinned memory allocator, which allocates a chunk of size batch_size * prefetch_queue_depth during construction and suballocates from it at runtime. When split_stages is false, this argument is ignored.
use_fast_idct (bool, optional, default = False) –
Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.
According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
crop_d (TensorList of float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (TensorList of float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
crop_pos_y (TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
crop_pos_z (TensorList of float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image and crop_D is the depth of the cropping window.
crop_w (TensorList of float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
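The crop position formulas for crop_pos_x, crop_pos_y, and crop_pos_z all have the same form, so they can be sketched with one function. The helper below is hypothetical and is not a DALI function; how the result is rounded to whole pixels is not covered by the text and is left to the caller.

```python
def crop_window_start(image_extent, crop_extent, pos_norm=0.5):
    """Start coordinate of the crop window along one axis.

    Implements crop_x = crop_x_norm * (W - crop_W) for a single axis.
    Hypothetical helper; rounding to whole pixels is left to the caller.
    """
    if not 0.0 <= pos_norm <= 1.0:
        raise ValueError("pos_norm must be in [0.0, 1.0]")
    return pos_norm * (image_extent - crop_extent)


# 640x480 image, 224x224 crop, default (centered) normalized positions.
crop_x = crop_window_start(640, 224)  # 208.0
crop_y = crop_window_start(480, 224)  # 128.0
print(crop_x, crop_y)
```

With the default position of 0.5 the window is centered; 0.0 places it at the upper left corner and 1.0 pushes it as far right or down as the image allows.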
-
class nvidia.dali.ops.ImageDecoderRandomCrop(**kwargs)
Decodes images and randomly crops them.
The cropping window’s area (relative to the entire image) and aspect ratio can be restricted to a range of values specified by the random_area and random_aspect_ratio arguments, respectively.
When possible, the operator uses the ROI decoding APIs (for example, libjpeg-turbo and nvJPEG) to reduce the decoding time and memory usage. When the ROI decoding is not supported for a given image format, it will decode the entire image and crop the selected ROI.
Note
ROI decoding is currently not compatible with hardware-based decoding. Using nvidia.dali.ops.ImageDecoderRandomCrop() automatically disables hardware accelerated decoding. To use the hardware decoder, use the nvidia.dali.ops.ImageDecoder() and nvidia.dali.ops.RandomResizedCrop() operators instead.
The output of the decoder is in HWC layout.
Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM, JPEG 2000. Please note that GPU acceleration for JPEG 2000 decoding is only available for CUDA 11.
- Supported backends
‘cpu’
‘mixed’
- Keyword Arguments
affine (bool, optional, default = True) –
Applies only to the mixed backend type.
If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
device_memory_padding (int, optional, default = 16777216) –
Applies only to the mixed backend type.
The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.
device_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the mixed backend type.
The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.
host_memory_padding (int, optional, default = 8388608) –
Applies only to the mixed backend type.
The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.
host_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the mixed backend type.
The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.
hybrid_huffman_threshold (int, optional, default = 1000000) –
Applies only to the mixed backend type.
Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.
Note
Hybrid Huffman decoder still largely uses the CPU.
memory_stats (bool, optional, default = False) –
Applies only to the mixed backend type.
Prints debug information about nvJPEG allocations. The information about the largest allocation might be useful to determine suitable values for device_memory_padding and host_memory_padding for a dataset.
Note
The statistics are global for the entire process, not per operator instance, and include the allocations made during construction if the padding hints are non-zero.
num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.
output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output image.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_area (float or list of float, optional, default = [0.08, 1.0]) –
Range from which to choose random area fraction A.
The cropped image’s area will be equal to A * original image’s area.
random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
split_stages (bool, optional, default = False) –
Applies only to the mixed backend type.
If True, the operator will be split into two sub-stages: a CPU and GPU one.
use_chunk_allocator (bool, optional, default = False) –
Experimental, applies only to the mixed backend type.
Uses the chunk pinned memory allocator, which allocates a chunk of size batch_size * prefetch_queue_depth during construction and suballocates from it at runtime. When split_stages is false, this argument is ignored.
use_fast_idct (bool, optional, default = False) –
Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.
According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
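How random_area, random_aspect_ratio, and num_attempts interact can be pictured with the following sketch of a random crop window search. It is not DALI's implementation: the function name is hypothetical, and the uniform sampling of both ranges, the rounding, and the full-image fallback after the last failed attempt are all assumptions made for illustration.

```python
import math
import random


def random_crop_window(width, height, random_area=(0.08, 1.0),
                       random_aspect_ratio=(0.75, 1.333333),
                       num_attempts=10, rng=None):
    """Pick a random crop window (x, y, w, h) inside a width x height image.

    Hypothetical sketch; sampling distributions and fallback are assumptions.
    """
    if rng is None:
        rng = random.Random()
    for _ in range(num_attempts):
        # Area fraction A: the crop area is A * the original image area.
        area = rng.uniform(*random_area) * width * height
        # Aspect ratio is width / height of the crop window.
        ratio = rng.uniform(*random_aspect_ratio)
        w = int(round(math.sqrt(area * ratio)))
        h = int(round(math.sqrt(area / ratio)))
        # Accept the candidate only if it fits inside the image.
        if 0 < w <= width and 0 < h <= height:
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            return x, y, w, h
    # Assumed fallback when no attempt fits: use the whole image.
    return 0, 0, width, height


print(random_crop_window(640, 480, rng=random.Random(0)))
```

A candidate can fail to fit when a large area fraction is combined with an aspect ratio that differs from the image's own, which is why a bounded number of attempts is needed.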
-
class nvidia.dali.ops.ImageDecoderSlice(**kwargs)
Decodes images and extracts regions of interest based on externally provided anchors and shapes.
Inputs must be supplied as tensors in the following order:
data
that contains the input data.anchor
that contains normalized or absolute coordinates, depending on thenormalized_anchor
value, for the starting point of the slice (x0, x1, x2, and so on),shape
that contains normalized or absolute coordinates, depending on thenormalized_shape
value, for the dimensions of the slice (s0, s1, s2, and so on).
The anchor and shape coordinates must be within the interval [0.0, 1.0] for normalized coordinates or within the image shape for the absolute coordinates. The
anchor
andshape
inputs will provide as many dimensions as were specified with argumentsaxis_names
oraxes
.By default, the
nvidia.dali.ops.ImageDecoderSlice()
operator uses normalized coordinates and “WH” order for the slice arguments.When possible, the argument uses the ROI decoding APIs (for example, libjpeg-turbo and nvJPEG) to optimize the decoding time and memory usage. When the ROI decoding is not supported for a given image format, it will decode the entire image and crop the selected ROI.
Note
ROI decoding is currently not compatible with hardware-based decoding. Using
nvidia.dali.ops.ImageDecoderSlice()
automatically disables hardware-accelerated decoding. To use the hardware decoder, use the
nvidia.dali.ops.ImageDecoder()
and
nvidia.dali.ops.Slice()
operators instead.
The output of the decoder is in the HWC layout.
Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM, JPEG 2000. Please note that GPU acceleration for JPEG 2000 decoding is only available for CUDA 11.
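As a rough illustration of how normalized anchor and shape inputs map onto pixel coordinates with the default “WH” order (axes = [1, 0]), the following plain-Python sketch converts them into a start offset and an extent per image dimension. It is not DALI code: the helper name slice_window and the rounding to the nearest pixel are assumptions made only for illustration.

```python
def slice_window(image_hw, anchor, shape, axes=(1, 0),
                 normalized_anchor=True, normalized_shape=True):
    """Hypothetical helper: map anchor/shape inputs to (start, extent) in HW order."""
    start = [0] * len(image_hw)
    extent = list(image_hw)          # dimensions not listed in `axes` are kept whole
    for a, s, axis in zip(anchor, shape, axes):
        dim = image_hw[axis]
        # Assumption: normalized values are scaled by the extent and rounded.
        start[axis] = round(a * dim) if normalized_anchor else int(a)
        extent[axis] = round(s * dim) if normalized_shape else int(s)
        if start[axis] < 0 or start[axis] + extent[axis] > dim:
            raise ValueError(f"slice exceeds image bounds along axis {axis}")
    return start, extent


# A 480x640 (HxW) image; anchor and shape are given as (x, y) and (w, h).
start, extent = slice_window((480, 640), anchor=(0.25, 0.5), shape=(0.5, 0.25))
print(start, extent)  # [240, 160] [120, 320]
```

With “WH” order the first anchor component addresses the width axis, which is why x = 0.25 ends up in the second position of the HW-ordered result.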
- Supported backends
‘cpu’
‘mixed’
- Keyword Arguments
affine (bool, optional, default = True) –
Applies only to the
mixed
backend type.
If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.
axes (int or list of int, optional, default = [1, 0]) – Order of dimensions used for the anchor and shape slice inputs as dimension indices.
axis_names (layout str, optional, default = ‘WH’) –
Order of the dimensions used for the anchor and shape slice inputs, as described in layout.
If a value is provided,
axis_names
will have a higher priority than
axes
.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
device_memory_padding (int, optional, default = 16777216) –
Applies only to the
mixed
backend type.
The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the
memory_stats
argument set to True and then copy the largest allocation value that was printed in the statistics.
device_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the
mixed
backend type.
The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the
memory_stats
argument set to True and then copy the largest allocation value that was printed in the statistics.
host_memory_padding (int, optional, default = 8388608) –
Applies only to the
mixed
backend type.
The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the
memory_stats
argument set to True, and then copy the largest allocation value that is printed in the statistics.
host_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the
mixed
backend type.
The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the
memory_stats
argument set to True, and then copy the largest allocation value that is printed in the statistics.
hybrid_huffman_threshold (int, optional, default = 1000000) –
Applies only to the
mixed
backend type.
Images with a total number of pixels (
height * width
) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.
Note
The hybrid Huffman decoder still largely uses the CPU.
memory_stats (bool, optional, default = False) –
Applies only to the
mixed
backend type.
Prints debug information about nvJPEG allocations. The information about the largest allocation might be useful to determine suitable values for
device_memory_padding
and
host_memory_padding
for a dataset.
Note
The statistics are global for the entire process, not per operator instance, and include the allocations made during construction if the padding hints are non-zero.
normalized_anchor (bool, optional, default = True) –
Determines whether the anchor input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.
Note
This argument is only relevant when anchor data type is
float
. For integer types, the coordinates are always absolute.
normalized_shape (bool, optional, default = True) –
Determines whether the shape input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.
Note
This argument is only relevant when the shape data type is
float
. For integer types, the coordinates are always absolute.
output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output image.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
split_stages (bool, optional, default = False) –
Applies only to the
mixed
backend type.
If True, the operator will be split into two sub-stages: a CPU stage and a GPU stage.
use_chunk_allocator (bool, optional, default = False) –
Experimental, applies only to the
mixed
backend type.
Uses the chunk pinned memory allocator and allocates a chunk of the
batch_size * prefetch_queue_depth
size during the construction and suballocates from it at runtime. When
split_stages
is false, this argument is ignored.
use_fast_idct (bool, optional, default = False) –
Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when
device
is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.
According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.
-
__call__
(*inputs, **kwargs)¶ See
nvidia.dali.ops.ImageDecoderSlice()
class for complete information.
-
class
nvidia.dali.ops.
Jitter
(**kwargs)¶ Performs a random Jitter augmentation.
The output images are produced by moving each pixel by a random amount, in the x and y dimensions, and bounded by half of the
nDegree
parameter.
- Supported backends
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
fill_value (float, optional, default = 0.0) – Color value that is used for padding.
interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.
mask (int, optional, default = 1) –
Determines whether to apply this augmentation to the input image.
Here are the values:
0: Do not apply this transformation.
1: Apply this transformation.
nDegree (int, optional, default = 2) – Each pixel is moved by a random amount in the
[-nDegree/2, nDegree/2]
range.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
mask (TensorList of int, optional, default = 1) –
Determines whether to apply this augmentation to the input image.
Here are the values:
0: Do not apply this transformation.
1: Apply this transformation.
-
class
nvidia.dali.ops.
LookupTable
(**kwargs)¶ Maps the input to output by using a lookup table that is specified by
keys
and
values
, and a
default_value
for unspecified keys.
For example, when
keys
and
values
are used to define the lookup table in the following way:
    keys[] = {0, 2, 3, 4, 5, 3}
    values[] = {0.2, 0.4, 0.5, 0.6, 0.7, 0.10}
    default_value = 0.99
    0 <= i <= max(keys)
    lut[i] = values[keys.index[i]] if i in keys
    lut[i] = default_value otherwise
the operator creates the following table:
lut[] = {0.2, 0.99, 0.4, 0.10, 0.6, 0.7} // only last occurrence of a key is considered
and produces the output according to this formula:
    Output[i] = lut[Input[i]] if 0 <= Input[i] < len(lut)
    Output[i] = default_value otherwise
Here is a practical example, considering the table defined above:
    Input[] = {1, 4, 1, 0, 100, 2, 3, 4}
    Output[] = {0.99, 0.6, 0.99, 0.2, 0.99, 0.4, 0.10, 0.6}
Note
Only integer types can be used as inputs for this operator.
This operator allows sequence inputs and supports volumetric data.
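The table construction and lookup described above can be reproduced in a few lines of plain Python. This is an illustrative sketch of the documented semantics, not the operator's implementation; the function names build_lut and lookup are hypothetical.

```python
def build_lut(keys, values, default_value):
    """Build a dense table covering 0..max(keys); later duplicates of a key win."""
    lut = [default_value] * (max(keys) + 1)
    for k, v in zip(keys, values):
        lut[k] = v
    return lut


def lookup(inputs, lut, default_value):
    """Map each input through the table; out-of-range inputs get default_value."""
    return [lut[x] if 0 <= x < len(lut) else default_value for x in inputs]


keys = [0, 2, 3, 4, 5, 3]
values = [0.2, 0.4, 0.5, 0.6, 0.7, 0.10]
lut = build_lut(keys, values, default_value=0.99)
out = lookup([1, 4, 1, 0, 100, 2, 3, 4], lut, default_value=0.99)
print(lut)  # [0.2, 0.99, 0.4, 0.1, 0.6, 0.7]
print(out)  # [0.99, 0.6, 0.99, 0.2, 0.99, 0.4, 0.1, 0.6]
```

Key 3 appears twice in keys, so its second value (0.10) overwrites the first (0.5), matching the table shown in the example.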
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
default_value (float, optional, default = 0.0) – Default output value for keys that are not present in the table.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type.
keys (int or list of int, optional, default = []) –
A list of input values (keys) in the lookup table.
The lengths of the
keys
and
values
arguments must match. The values in
keys
should be in the [0, 65535] range.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
values (float or list of float, optional, default = []) –
A list of mapped output
values
for each
keys
entry.
The lengths of the
keys
and the
values
arguments must match.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
-
class
nvidia.dali.ops.
MFCC
(**kwargs)¶ Computes Mel Frequency Cepstral Coefficients (MFCC) from a mel spectrogram.
- Supported backends
‘cpu’
- Keyword Arguments
axis (int, optional, default = 0) –
Axis over which the transform will be applied.
If a value is not provided, the outer-most dimension will be used.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dct_type (int, optional, default = 2) –
Discrete Cosine Transform type.
The supported types are 1, 2, 3, 4. The formulas that are used to calculate the DCT are equivalent to those described in https://en.wikipedia.org/wiki/Discrete_cosine_transform (the numbers correspond to types listed in https://en.wikipedia.org/wiki/Discrete_cosine_transform#Formal_definition).
lifter (float, optional, default = 0.0) –
Cepstral filtering coefficient, which is also known as the liftering coefficient.
If the lifter coefficient is greater than 0, the MFCCs will be scaled based on the following formula:
MFCC[i] = MFCC[i] * (1 + sin(pi * (i + 1) / lifter)) * (lifter / 2)
n_mfcc (int, optional, default = 20) – Number of MFCC coefficients.
normalize (bool, optional, default = False) –
If set to True, the DCT uses an ortho-normal basis.
Note
Normalization is not supported when dct_type=1.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
-
class
nvidia.dali.ops.
MXNetReader
(**kwargs)¶ Reads the data from an MXNet RecordIO.
- Supported backends
‘cpu’
- Keyword Arguments
index_path (str or list of str) –
List (of length 1) that contains a path to the index (.idx) file.
The file is generated by MXNet’s
im2rec.py
script with the RecordIO file. The index file can also be generated by using the
rec2idx
script that is distributed with DALI.
path (str or list of str) – List of paths to the RecordIO files.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If
random_shuffle
is False, this parameter is ignored.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to
initial_fill
is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
skip_cached_images (bool, optional, default = False) –
If set to True, the loading data will be skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
-
__call__
(**kwargs)¶ Operator call to be used in graph definition. This operator doesn’t have any inputs.
-
class
nvidia.dali.ops.
MelFilterBank
(**kwargs)¶ Converts a spectrogram to a mel spectrogram by applying a bank of triangular filters.
Expects an input with at least 2 dimensions where the last two dimensions correspond to the fft bin index and the window index, respectively.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
freq_high (float, optional, default = 0.0) –
The maximum frequency.
If this value is not provided,
sample_rate / 2
is used.
freq_low (float, optional, default = 0.0) – The minimum frequency.
mel_formula (str, optional, default = 'slaney') –
Determines the formula that will be used to convert frequencies from hertz to mel and from mel to hertz.
The mel scale is a perceptual scale of pitches, so there is no single formula.
The supported values are:
slaney
, which follows Slaney’s MATLAB Auditory Modelling Work behavior.
This formula is linear under 1 kHz and logarithmic above this value. The implementation is consistent with Librosa’s default implementation.
htk
, which follows O’Shaughnessy’s book formula,
m = 2595 * log10(1 + (f/700))
.
This value is consistent with the implementation of the Hidden Markov Toolkit (HTK).
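For reference, the htk formula quoted above and its inverse can be written as follows. This is a plain-Python sketch with hypothetical function names (hz_to_mel_htk, mel_to_hz_htk); the inverse is derived algebraically from the quoted formula and is not taken from the DALI documentation.

```python
import math


def hz_to_mel_htk(f):
    """HTK mel formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz_htk(m):
    """Algebraic inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


# Round-trip a few frequencies through the mel scale and back.
for f in (0.0, 700.0, 8000.0):
    m = hz_to_mel_htk(f)
    print(f"{f:8.1f} Hz -> {m:8.2f} mel -> {mel_to_hz_htk(m):8.1f} Hz")
```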
nfilter (int, optional, default = 128) – Number of mel filters.
normalize (bool, optional, default = True) –
Determines whether to normalize the triangular filter weights by the width of their frequency bands.
If set to True, the integral of the filter function is 1.
If set to False, the peak of the filter function will be 1.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
sample_rate (float, optional, default = 44100.0) – Sampling rate of the audio signal.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
-
class
nvidia.dali.ops.
NemoAsrReader
(**kwargs)¶ Reads automatic speech recognition (ASR) data (audio, text) from an NVIDIA NeMo compatible manifest.
Example manifest file:
{"audio_filepath": "path/to/audio1.wav", "duration": 3.45, "text": "this is a nemo tutorial"}
{"audio_filepath": "path/to/audio1.wav", "offset": 3.45, "duration": 1.45, "text": "same audio file but using offset"}
{"audio_filepath": "path/to/audio2.wav", "duration": 3.45, "text": "third transcript in this example"}
Note
Only
audio_filepath
is a mandatory field. If
duration
is not specified, the whole audio file will be used. A missing
text
field will produce an empty string as the text.
Warning
Handling of the
duration
and
offset
fields is not yet implemented. The current implementation always reads the whole audio file.
This reader produces between 1 and 3 outputs:
Decoded audio data: float, shape=(audio_length,)
(optional, if
read_sample_rate=True
) Audio sample rate: float, shape=(1,)
(optional, if
read_text=True
) Transcript text as a null-terminated string: uint8, shape=(text_len + 1,)
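Because the manifest is a plain text file with one JSON object per line, it can be inspected without DALI. The sketch below parses manifest lines with the standard json module and applies the defaults described in the note above (only audio_filepath is mandatory, and a missing text becomes an empty string). The function name parse_manifest is hypothetical, and representing a missing duration as None and a missing offset as 0.0 are assumptions of this sketch.

```python
import json


def parse_manifest(lines):
    """Parse NeMo-style manifest lines into dictionaries with defaults filled in."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        record = json.loads(line)
        entries.append({
            "audio_filepath": record["audio_filepath"],  # the only mandatory field
            "offset": record.get("offset", 0.0),         # assumption: default 0.0
            "duration": record.get("duration"),          # None means whole file
            "text": record.get("text", ""),              # missing text -> ""
        })
    return entries


manifest = [
    '{"audio_filepath": "path/to/audio1.wav", "duration": 3.45, "text": "this is a nemo tutorial"}',
    '{"audio_filepath": "path/to/audio2.wav"}',
]
for entry in parse_manifest(manifest):
    print(entry)
```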
- Supported backends
‘cpu’
- Keyword Arguments
manifest_filepaths (str or list of str) – List of paths to NeMo’s compatible manifest files.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.
downmix (bool, optional, default = True) – If True, downmix all input channels to mono. If downmixing is turned on, the decoder always produces 1-D output.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –
Output data type.
Supported types:
INT16
,
INT32
, and
FLOAT
.
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If
random_shuffle
is False, this parameter is ignored.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
max_duration (float, optional, default = 0.0) –
If a value greater than 0 is provided, it specifies the maximum allowed duration, in seconds, of the audio samples.
Samples with a duration longer than this value will be ignored.
min_duration (float, optional, default = 0.0) –
If a value greater than 0 is provided, it specifies the minimum allowed duration, in seconds, of the audio samples.
Samples with a duration shorter than this value will be ignored.
normalize_text (bool, optional, default = False) –
If set to True, the text transcript will be stripped of leading and trailing whitespace and converted to lowercase.
Warning
Non-ASCII strings are not yet supported.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
quality (float, optional, default = 50.0) –
Resampling quality, 0 is lowest, 100 is highest.
0 corresponds to 3 lobes of the sinc filter; 50 gives 16 lobes and 100 gives 64 lobes.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to
initial_fill
is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
read_sample_rate (bool, optional, default = True) – Whether to output the sample rate for each sample as a separate output
read_text (bool, optional, default = True) – Whether to output the transcript text for each sample as a separate output
sample_rate (float, optional, default = -1.0) – If specified, the target sample rate, in Hz, to which the audio is resampled.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
shuffle_after_epoch (bool, optional, default = False) – If set to True, the reader shuffles the whole dataset after each epoch.
skip_cached_images (bool, optional, default = False) –
If set to True, the loading data will be skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
-
__call__
(**kwargs)¶ Operator call to be used in graph definition. This operator doesn’t have any inputs.
-
class
nvidia.dali.ops.
NonsilentRegion
(**kwargs)¶ Performs leading and trailing silence detection in an audio buffer.
The operator returns the beginning and length of the non-silent region by comparing the short term power calculated for
window_length
of the signal with a silence cut-off threshold. The signal is considered to be silent when the
short_term_power_db
is less than the
cutoff_db
, where:
short_term_power_db = 10 * log10( short_term_power / reference_power )
Unless specified otherwise,
reference_power
is the maximum power of the signal.
Inputs and outputs:
Input 0 - 1D audio buffer.
Output 0 - Index of the first sample in the nonsilent region.
Output 1 - Length of nonsilent region.
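The detection rule above can be illustrated with a small plain-Python sketch. It is a simplified model, not DALI's implementation: the function name nonsilent_region is hypothetical, the short-term power is assumed to be the mean of the squared samples over each full window, and the region is assumed to run from the start of the first non-silent window to the end of the last one.

```python
import math


def nonsilent_region(signal, cutoff_db=-60.0, window_length=2048, reference_power=None):
    """Return (begin, length) of the non-silent region; length 0 if all silent."""
    window_length = max(1, min(window_length, len(signal)))
    # Short-term power for every window position (assumption: mean of squares).
    powers = [
        sum(x * x for x in signal[i:i + window_length]) / window_length
        for i in range(len(signal) - window_length + 1)
    ]
    # A missing (or zero) reference power falls back to the maximum power.
    ref = reference_power if reference_power else max(powers, default=0.0)
    if ref <= 0.0:
        return 0, 0                       # an all-zero signal is entirely silent

    def power_db(p):
        return 10.0 * math.log10(p / ref) if p > 0.0 else -math.inf

    loud = [i for i, p in enumerate(powers) if power_db(p) >= cutoff_db]
    if not loud:
        return 0, 0
    begin = loud[0]
    end = loud[-1] + window_length        # one past the last non-silent sample
    return begin, end - begin


signal = [0.0] * 10 + [1.0] * 5 + [0.0] * 10
print(nonsilent_region(signal, window_length=1))  # (10, 5)
```

The quadratic-time window sum keeps the sketch short; a real implementation would typically use a running sum instead.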
Note
If
Outputs[1] == 0
, the value in
Outputs[0]
is undefined.
- Supported backends
‘cpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
cutoff_db (float, optional, default = -60.0) – The threshold, in dB, below which the signal is considered silent.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
reference_power (float, optional, default = 0.0) –
The reference power that is used to convert the signal to dB.
If a value is not provided, the maximum power of the signal will be used as the reference.
reset_interval (int, optional, default = 8192) –
The number of samples after which the moving mean average is recalculated to avoid loss of precision.
If
reset_interval == -1
, or the input type allows exact calculation, the average will not be reset. The default value can be used for most use cases.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
window_length (int, optional, default = 2048) – Size of the sliding window used to calculate the short-term power of the signal.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
-
class
nvidia.dali.ops.
NormalDistribution
(**kwargs)¶ Creates a batch of tensors filled with random values following a normal distribution.
This operator can be run in the following modes, which determine the
shape
of the output tensors/batch:
Providing an input batch to this operator results in a batch of output tensors, which have the same
shape
as the input tensors.
Providing a custom
shape
as an argument results in an output batch, where every tensor has the same (given)
shape
.
Providing no input arguments results in an output batch of normally distributed scalars.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type.
mean (float, optional, default = 0.0) – Mean value of the distribution.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shape (int or list of int, optional, default = []) – Shape of an output tensor in a batch.
stddev (float, optional, default = 1.0) – Standard deviation of the distribution.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
mean (TensorList of float, optional, default = 0.0) – Mean value of the distribution.
stddev (TensorList of float, optional, default = 1.0) – Standard deviation of the distribution.
-
class
nvidia.dali.ops.
Normalize
(**kwargs)¶ Normalizes the input by removing the mean and dividing by the standard deviation.
The mean and standard deviation can be calculated internally for the specified subset of axes or can be externally provided as the
mean
and
stddev
arguments.
The normalization is done following the formula:
out = scale * (in - mean) / stddev + shift
The formula assumes that out and in are equally shaped tensors, but mean and stddev might be either tensors of the same shape, scalars, or a mix of these.
Note
The expression follows the numpy broadcasting rules.
Each dimension of a non-scalar
mean
and
stddev
must have an extent of 1 if the given axis is reduced, or match the corresponding extent of the input. A dimension is considered reduced if it is listed in
axes
or
axis_names
. If neither the
axes
nor the
axis_names
argument is present, the set of reduced axes is inferred by comparing the input shape to the shape of the mean/stddev arguments, but the set of reduced axes must be the same for all tensors in the batch.
Here are some examples of valid argument combinations:
Per-sample normalization of dimensions 0 and 2:
    axes = 0,2 # optional
    input.shape = [ [480, 640, 3], [1080, 1920, 4] ]
    batch = False
    mean.shape = [ [1, 640, 1], [1, 1920, 1] ]
    stddev = (not supplied)
With these shapes, batch normalization is not possible, because the non-reduced dimension has a different extent across samples.
Batch normalization of dimensions 0 and 1:
    axes = 0,1 # optional
    input.shape = [ [480, 640, 3], [1080, 1920, 3] ]
    batch = True
    mean = (scalar)
    stddev.shape = [ [1, 1, 3] ]
For color images, this example normalizes the 3 color channels separately, but across all samples in the batch.
This operator allows sequence inputs and supports volumetric data.
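To make the normalization formula concrete, here is a plain-Python sketch of per-sample normalization over a flat list of values, with the mean and standard deviation computed from the data when they are not supplied. The function name normalize is hypothetical and the sketch ignores axes and broadcasting; ddof and epsilon follow their descriptions in the keyword arguments of this operator.

```python
import math


def normalize(data, mean=None, stddev=None, scale=1.0, shift=0.0, ddof=0, epsilon=0.0):
    """out = scale * (in - mean) / stddev + shift for a flat list of numbers."""
    n = len(data)
    if mean is None:
        mean = sum(data) / n
    if stddev is None:
        # Variance estimate with ddof correction; epsilon is added to the variance.
        variance = sum((x - mean) ** 2 for x in data) / (n - ddof)
        stddev = math.sqrt(variance + epsilon)
    return [scale * (x - mean) / stddev + shift for x in data]


out = normalize([1.0, 2.0, 3.0, 4.0], scale=64.0, shift=128.0)
print(out)  # values centered on 128, as one might choose for an unsigned 8-bit output
```

Choosing shift = 128 and scale = 64 maps the mean to mid-range and one standard deviation to 64 levels, which is the kind of use the shift and scale arguments are described for.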
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
axes (int or list of int, optional, default = []) –
Indices of dimensions along which the input is normalized.
By default, all axes are used, and the axes can also be specified by name. See
axis_names
for more information.
axis_names (layout str, optional, default = ‘’) –
Names of the axes in the input.
Axis indices are taken from the input layout, and this argument cannot be used with
axes
.
batch (bool, optional, default = False) –
If set to True, the mean and standard deviation are calculated across tensors in the batch.
This argument also requires that the input sample shapes in the non-reduced axes match.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
ddof (int, optional, default = 0) –
Delta Degrees of Freedom for Bessel’s correction.
The variance is estimated by using the following formula:
sum((Xi - mean)**2) / (N - ddof).
This argument is ignored when an externally supplied standard deviation is used.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) –
Output data type.
When using integral types, use
shift
and
scale
to improve the usage of the output type’s dynamic range. If
dtype
is an integral type, out-of-range values are clamped, and non-integer values are rounded to the nearest integer.
epsilon (float, optional, default = 0.0) – A value that is added to the variance to avoid division by small numbers.
mean (float, optional, default = 0.0) –
Mean value to be subtracted from the data.
The value can be a scalar or a batch of tensors with the same dimensionality as the input. The extent in each dimension must match the value of the input or be equal to 1. If the extent is 1, the value will be broadcast in this dimension. If the value is not specified, the mean is calculated from the input. A non-scalar mean cannot be used when batch argument is set to True.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
scale (float, optional, default = 1.0) –
The scaling factor applied to the output.
This argument is useful for integral output types.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shift (float, optional, default = 0.0) –
The value to which the mean will map in the output.
This argument is useful for unsigned output types.
stddev (float, optional, default = 0.0) –
Standard deviation value to scale the data.
See the mean argument for more information about shape constraints. If a value is not specified, the standard deviation is calculated from the input. A non-scalar stddev cannot be used when the batch argument is set to True.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
mean (TensorList of float, optional, default = 0.0) –
Mean value to be subtracted from the data.
The value can be a scalar or a batch of tensors with the same dimensionality as the input. The extent in each dimension must match the value of the input or be equal to 1. If the extent is 1, the value will be broadcast in this dimension. If the value is not specified, the mean is calculated from the input. A non-scalar mean cannot be used when batch argument is set to True.
stddev (TensorList of float, optional, default = 0.0) –
Standard deviation value to scale the data.
See the mean argument for more information about shape constraints. If a value is not specified, the standard deviation is calculated from the input. A non-scalar stddev cannot be used when the batch argument is set to True.
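The way ddof, epsilon, scale, and shift interact can be sketched in plain Python for a single 1-D sample. This is an illustrative sketch under stated assumptions, not DALI's implementation: the helper name normalize_sample is hypothetical, and the sketch assumes the output is computed as (x - mean) / stddev * scale + shift, with epsilon added only to a variance that is calculated from the input.

```python
import math


def normalize_sample(values, mean=None, stddev=None, ddof=0,
                     epsilon=0.0, scale=1.0, shift=0.0):
    """Normalize a 1-D list of floats (hypothetical helper, not a DALI API)."""
    n = len(values)
    if mean is None:
        # No external mean: calculate it from the input.
        mean = sum(values) / n
    if stddev is None:
        # Variance estimate from the text: sum((Xi - mean)**2) / (N - ddof).
        variance = sum((x - mean) ** 2 for x in values) / (n - ddof)
        # Assumption: epsilon is added to the variance before the square root.
        stddev = math.sqrt(variance + epsilon)
    # Assumption: the mean maps to `shift`, one stddev spans `scale`.
    return [(x - mean) / stddev * scale + shift for x in values]


# With Bessel's correction (ddof=1) the variance of [1, 2, 3] is exactly 1.
print(normalize_sample([1.0, 2.0, 3.0], ddof=1))
# Mapping to an unsigned 8-bit friendly range: mean -> 128, one stddev -> 64.
print(normalize_sample([1.0, 2.0, 3.0], ddof=1, scale=64.0, shift=128.0))
```

When an external stddev is passed, the sketch skips the variance calculation entirely, which mirrors the statement that ddof is ignored in that case.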
-
class nvidia.dali.ops.NumpyReader(**kwargs)
Reads NumPy arrays from a directory.
- Supported backends
‘cpu’
- Keyword Arguments
file_root (str) –
Path to a directory containing the data files.
Supports flat directory structure. The file_root directory should contain directories with NumPy files in them.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but on most network file systems it does not provide optimum performance.
file_filter (str, optional, default = ‘*.npy’) – If a value is specified, the string is interpreted as a glob string to filter the list of files in the sub-directories of file_root.
file_list (str, optional, default = '') –
Path to a text file that contains the rows of filename entries, where the filenames are relative to file_root.
If left empty, file_root is traversed for subdirectories, which are only at one level down from file_root.
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
shuffle_after_epoch (bool, optional, default = False) –
If set to True, the reader shuffles the entire dataset after each epoch.
Using this argument is mutually exclusive with using stick_to_shard and random_shuffle.
skip_cached_images (bool, optional, default = False) –
If set to True, loading the data is skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
-
__call__(**kwargs)
Operator call to be used in graph definition. This operator doesn’t have any inputs.
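The effect of num_shards, shard_id, and pad_last_batch can be illustrated with a small plain-Python sketch. The helper names shard_bounds and padded_shard are hypothetical, and the partitioning rule used here (contiguous ranges with boundaries at num_samples * shard_id // num_shards) is an assumption made for illustration; the text above does not specify how the reader splits the data.

```python
import math


def shard_bounds(num_samples, shard_id, num_shards):
    """Half-open index range [start, end) of one shard (assumed rule)."""
    start = num_samples * shard_id // num_shards
    end = num_samples * (shard_id + 1) // num_shards
    return start, end


def padded_shard(samples, shard_id, num_shards, batch_size):
    """Return one shard, padded by repeating its last sample so that every
    shard yields the same number of full batches (sketch of pad_last_batch)."""
    start, end = shard_bounds(len(samples), shard_id, num_shards)
    shard = list(samples[start:end])
    if not shard:
        return shard
    # Size of the largest shard decides how many batches every shard must produce.
    largest = max(
        e - s
        for s, e in (shard_bounds(len(samples), i, num_shards)
                     for i in range(num_shards))
    )
    target = math.ceil(largest / batch_size) * batch_size
    shard += [shard[-1]] * (target - len(shard))
    return shard


samples = list(range(10))
for shard_id in range(3):
    print(shard_id, padded_shard(samples, shard_id, num_shards=3, batch_size=2))
```

With 10 samples and 3 shards the shards hold 3, 3, and 4 samples, so the two shorter shards repeat their last sample once to fill the second batch.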
-
class nvidia.dali.ops.OldColorTwist(**kwargs)
Warning
This operator is now deprecated. Use ColorTwist instead.
A combination of hue, saturation, contrast, and brightness.
Note
This is an old implementation which uses NPP.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
brightness (float, optional, default = 1.0) –
Brightness change factor.
Values must be non-negative.
Example values:
0 - Black image.
1 - No change.
2 - Increase brightness twice.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
contrast (float, optional, default = 1.0) –
Contrast change factor.
Values must be non-negative.
Example values:
0 - Uniform grey image.
1 - No change.
2 - Increase contrast twice.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
Output data type.
If not set, the input type is used.
hue (float, optional, default = 0.0) – Hue change, in degrees.
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the input and the output image.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
saturation (float, optional, default = 1.0) –
Saturation change factor.
Values must be non-negative.
Example values:
0 - Completely desaturated image.
1 - No change to image’s saturation.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
brightness (TensorList of float, optional, default = 1.0) –
Brightness change factor.
Values must be non-negative.
Example values:
0 - Black image.
1 - No change.
2 - Increase brightness twice.
contrast (TensorList of float, optional, default = 1.0) –
Contrast change factor.
Values must be non-negative.
Example values:
0 - Uniform grey image.
1 - No change.
2 - Increase contrast twice.
hue (TensorList of float, optional, default = 0.0) – Hue change, in degrees.
saturation (TensorList of float, optional, default = 1.0) –
Saturation change factor.
Values must be non-negative.
Example values:
0 - Completely desaturated image.
1 - No change to image’s saturation.
-
class nvidia.dali.ops.OneHot(**kwargs)
Produces a one-hot encoding of the input.
If the input is not a scalar (a tensor consisting of one value per sample), the operator will fail.
- Supported backends
‘cpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Output data type.
num_classes (int, optional, default = 0) – Number of all classes in the data.
off_value (float, optional, default = 0.0) –
Value that will be used to fill the output when input[j] != i.
This value will be cast to the dtype type.
on_value (float, optional, default = 1.0) –
Value that will be used to fill the output when input[j] = i.
This value will be cast to the dtype type.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
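The encoding described by on_value and off_value can be sketched in a few lines of plain Python. The helper name one_hot is hypothetical and the sketch works on Python lists rather than DALI tensors; it only illustrates the rule that position i receives on_value when the scalar input equals i and off_value otherwise.

```python
def one_hot(label, num_classes, on_value=1.0, off_value=0.0):
    """One-hot encode a single scalar label (hypothetical helper)."""
    return [on_value if i == label else off_value for i in range(num_classes)]


# A batch of scalar labels, one value per sample.
batch = [0, 2, 3]
encoded = [one_hot(label, num_classes=4) for label in batch]
for label, row in zip(batch, encoded):
    print(label, row)
```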
-
class nvidia.dali.ops.OpticalFlow(**kwargs)
Calculates the optical flow between images in the input.
The main input for this operator is a sequence of frames. Optionally, the operator can be provided with external hints for the optical flow calculation. The output format of this operator matches the output format of the optical flow driver API. Refer to https://developer.nvidia.com/opticalflow-sdk for more information about the Turing and Ampere optical flow hardware that is used by DALI.
This operator allows sequence inputs.
- Supported backends
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
enable_external_hints (bool, optional, default = False) –
Enables or disables the external hints for optical flow calculations.
External hints are analogous to temporal hints, but the only difference is that external hints come from an external source. When this option is enabled, the operator requires two inputs.
enable_temporal_hints (bool, optional, default = False) –
Enables or disables temporal hints for sequences that are longer than two images.
The hints are used to improve the quality of the output motion field as well as to speed up the calculations. The hints are especially useful in the presence of large displacements or periodic patterns which might confuse the optical flow algorithms.
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – Input color space (RGB, BGR or GRAY).
output_format (int, optional, default = -1) –
Sets the grid size for the output vector field.
This operator produces the motion vector field at a coarser resolution than the input pixels. This parameter specifies the size of the pixel grid cell corresponding to one motion vector. For example, a value of 4 will produce one motion vector for each 4x4 pixel block.
Note
Currently, only a grid size of 4 is supported.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
preset (float, optional, default = 0.0) –
Speed and quality level of the optical flow calculation.
Allowed values are:
0.0 is the lowest speed and the best quality.
0.5 is the medium speed and quality.
1.0 is the fastest speed and the lowest quality.
The lower the speed, the more additional pre- and postprocessing is used to enhance the quality of the optical flow result.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__(*inputs, **kwargs)
See nvidia.dali.ops.OpticalFlow() class for complete information.
-
class nvidia.dali.ops.Pad(**kwargs)
Pads all samples with the fill_value in the specified axes to match the biggest extent in the batch for those axes or to match the minimum shape specified.
Here are a few examples:
1-D samples, fill_value = -1, axes = (0,)
The samples are padded in the first axis to match the extent of the largest sample.
input = [[3, 4, 2, 5, 4], [2, 2], [3, 199, 5]]; output = [[3, 4, 2, 5, 4], [2, 2, -1, -1, -1], [3, 199, 5, -1, -1]]
1-D samples, fill_value = -1, axes = (0,), shape = (7,)
The samples are padded in the first axis to a minimum extent of 7.
input = [[3, 4, 2, 5, 4], [2, 2], [3, 199, 5], [1, 2, 3, 4, 5, 6, 7, 8]]; output = [[3, 4, 2, 5, 4, -1, -1], [2, 2, -1, -1, -1, -1, -1], [3, 199, 5, -1, -1, -1, -1], [1, 2, 3, 4, 5, 6, 7, 8]]
1-D samples, fill_value = -1, axes = (0,), align = (4,)
The samples are padded in the first axis to match the extent of the largest sample and the alignment requirements. The output extent is 8, which is a result of rounding up the largest extent (5) to a multiple of alignment (4).
input = [[3, 4, 2, 5, 4], [2, 2], [3, 199, 5]]; output = [[3, 4, 2, 5, 4, -1, -1, -1], [2, 2, -1, -1, -1, -1, -1, -1], [3, 199, 5, -1, -1, -1, -1, -1]]
1-D samples, fill_value = -1, axes = (0,), shape = (1,), align = (2,)
The samples are padded in the first axis to match the alignment requirements only. The minimum extent (shape) is set to 1 to avoid any padding other than the padding necessary for alignment.
input = [[3, 4, 2, 5, 4], [2, 2], [3, 199, 5]]; output = [[3, 4, 2, 5, 4, -1], [2, 2], [3, 199, 5, -1]]
2-D samples, fill_value = 42, axes = (1,)
The samples are padded in the second axis to match the extent of the largest sample, using a custom fill value of 42 instead of the default 0.
input = [[[1, 2, 3, 4], [5, 6, 7, 8]], [[1, 2], [4, 5]]]; output = [[[1, 2, 3, 4], [5, 6, 7, 8]], [[1, 2, 42, 42], [4, 5, 42, 42]]]
2-D samples, fill_value = 0, axes = (0, 1), align = (4, 5)
The samples are padded in the first and second axes to match the alignment requirements of each axis.
input = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], [[1, 2], [4, 5]]]; output = [[[1, 2, 3, 4, 0], [5, 6, 7, 8, 0], [9, 10, 11, 12, 0], [0, 0, 0, 0, 0]], [[1, 2, 0, 0, 0], [4, 5, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]]
2-D samples, fill_value = 0, axes = (0, 1), align = (1, 2), shape = (4, -1)
The samples are padded in the first axis to match a minimum extent of 4, and in the second axis to match the largest sample in the batch and an alignment of 2.
input = [[[1, 2, 3], [4, 5, 6]], [[1, 2], [4, 5], [6, 7]]]; output = [[[1, 2, 3, 0], [4, 5, 6, 0], [0, 0, 0, 0], [0, 0, 0, 0]], [[1, 2, 0, 0], [4, 5, 0, 0], [6, 7, 0, 0], [0, 0, 0, 0]]]
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
align (int or list of int, optional, default = []) –
If specified, this argument determines the alignment on the dimensions specified by axes or axis_names.
The extent on axis = axes[i] will be adjusted to be a multiple of align[i].
If an integer value is provided, the alignment restrictions are applied to all the padded axes.
To use alignment only, that is without any default or explicit padding behavior, set the minimum shape to 1 for the specified axis.
axes (int or list of int, optional, default = []) –
Indices of the axes on which the batch samples will be padded.
Indices are zero-based, with 0 being the outer-most dimension of the tensor. The axis_names and axes arguments are mutually exclusive. If axes and axis_names are empty, or have not been provided, the output will be padded on all of the axes.
axis_names (layout str, optional, default = ‘’) –
Names of the axes on which the batch samples will be padded.
Dimension names should correspond to dimensions in the input layout. The axis_names and axes arguments are mutually exclusive. If axes and axis_names are empty, or have not been provided, the output will be padded on all of the axes.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
fill_value (float, optional, default = 0.0) – The value to pad the batch with.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shape (int or list of int, optional, default = []) –
The extents of the output shape in the axes specified by axes or axis_names.
Specifying -1 for an axis restores the default behavior of extending the axis to accommodate the aligned size of the largest sample in the batch.
If the provided extent is smaller than the extent of a sample, padding will be applied only to match the required alignment. For example, to disable padding in an axis, except for the padding necessary for alignment, you can specify a value of 1.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
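The interplay of fill_value, shape, and align for 1-D samples can be reproduced with a short plain-Python sketch. The helper name pad_batch_1d is hypothetical and the sketch is not DALI's implementation; it assumes that each sample is first extended to the minimum extent (the largest sample in the batch when shape is -1) and then rounded up to a multiple of align, which matches the 1-D examples listed in the operator description.

```python
def pad_batch_1d(batch, fill_value=0, shape=-1, align=1):
    """Pad a batch of 1-D lists along axis 0 (hypothetical helper)."""
    largest = max(len(sample) for sample in batch)
    # shape = -1 means "match the largest sample in the batch".
    minimum = largest if shape < 0 else shape
    padded = []
    for sample in batch:
        # A sample longer than the requested minimum is never truncated.
        extent = max(len(sample), minimum)
        # Round the extent up to the next multiple of `align`.
        extent = -(-extent // align) * align
        padded.append(list(sample) + [fill_value] * (extent - len(sample)))
    return padded


batch = [[3, 4, 2, 5, 4], [2, 2], [3, 199, 5]]
print(pad_batch_1d(batch, fill_value=-1))                    # largest extent
print(pad_batch_1d(batch, fill_value=-1, align=4))           # largest, aligned to 4
print(pad_batch_1d(batch, fill_value=-1, shape=1, align=2))  # alignment only
```

Setting shape to 1 makes the minimum extent irrelevant for every non-empty sample, so only the alignment rounding adds padding.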
-
class nvidia.dali.ops.Paste(**kwargs)
Pastes the input images on a larger canvas, where the canvas size is equal to input size * ratio.
- Supported backends
‘gpu’
- Keyword Arguments
fill_value (int or list of int) –
Tuple of the values of the color that is used to fill the canvas.
The length of the tuple must be equal to n_channels.
ratio (float) – Ratio of canvas size to input size. Must be >= 1.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
min_canvas_size (float, optional, default = 0.0) – Enforces the minimum paste canvas dimension after scaling the input size by the ratio.
n_channels (int, optional, default = 3) – Number of channels in the image.
paste_x (float, optional, default = 0.5) – Horizontal position of the paste in (0.0 - 1.0) image coordinates.
paste_y (float, optional, default = 0.5) – Vertical position of the paste in (0.0 - 1.0) image coordinates.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
ratio (TensorList of float) – Ratio of canvas size to input size. Must be >= 1.
min_canvas_size (TensorList of float, optional, default = 0.0) – Enforces the minimum paste canvas dimension after scaling the input size by the ratio.
paste_x (TensorList of float, optional, default = 0.5) – Horizontal position of the paste in (0.0 - 1.0) image coordinates.
paste_y (TensorList of float, optional, default = 0.5) – Vertical position of the paste in (0.0 - 1.0) image coordinates.
-
class nvidia.dali.ops.PeekImageShape(**kwargs)
Obtains the shape of the encoded image.
- Supported backends
‘cpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
type (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.INT64) – Data type, to which the sizes are converted.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
-
class nvidia.dali.ops.PowerSpectrum(**kwargs)
Calculates the power spectrum of the signal.
- Supported backends
‘cpu’
- Keyword Arguments
axis (int, optional, default = -1) –
Index of the dimension to be transformed to the frequency domain.
By default, the last dimension is selected.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
nfft (int, optional, default = -1) –
Size of the FFT.
By default, the nfft is selected to match the length of the data in the transformation axis.
The number of bins that are created in the output is calculated with the following formula:
nfft // 2 + 1
Note
The output only represents the positive part of the spectrum.
power (int, optional, default = 2) –
Exponent of the FFT magnitude.
The supported values are:
2 for power spectrum (real*real + imag*imag)
1 for the complex magnitude (sqrt(real*real + imag*imag)).
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
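The bin count nfft // 2 + 1 and the two power settings can be illustrated with a naive discrete Fourier transform in plain Python. The helper name power_spectrum_1d is hypothetical, and the sketch assumes a 1-D signal, no windowing, and zero-padding when nfft exceeds the signal length; none of these details are stated in the text above.

```python
import cmath


def power_spectrum_1d(signal, nfft=-1, power=2):
    """Positive-frequency spectrum of a 1-D list (hypothetical helper)."""
    if power not in (1, 2):
        raise ValueError("power must be 1 or 2")
    if nfft < 0:
        # Default: match the length of the data.
        nfft = len(signal)
    if nfft < len(signal):
        raise ValueError("this sketch assumes nfft >= len(signal)")
    # Assumption: shorter signals are zero-padded up to nfft.
    padded = list(signal) + [0.0] * (nfft - len(signal))
    bins = []
    for k in range(nfft // 2 + 1):  # only the positive part of the spectrum
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / nfft)
                  for t, x in enumerate(padded))
        magnitude_sq = acc.real * acc.real + acc.imag * acc.imag
        bins.append(magnitude_sq if power == 2 else magnitude_sq ** 0.5)
    return bins


spectrum = power_spectrum_1d([1.0, 1.0, 1.0, 1.0])
print(len(spectrum))  # nfft // 2 + 1 bins for nfft = 4
print(spectrum)
```

For a constant signal all of the energy lands in bin 0 and the remaining bins are zero up to floating-point rounding.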
-
class nvidia.dali.ops.PreemphasisFilter(**kwargs)
Applies a preemphasis filter to the input data.
This filter, in simple form, can be expressed by the formula:
Y[t] = X[t] - coeff * X[t-1]
Where:
X is the input signal.
Y is the output signal.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.FLOAT) – Data type for the output.
preemph_coeff (float, optional, default = 0.97) – Preemphasis coefficient coeff.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
preemph_coeff (TensorList of float, optional, default = 0.97) – Preemphasis coefficient coeff.
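The formula Y[t] = X[t] - coeff * X[t-1] can be written out directly in plain Python. The helper name preemphasis is hypothetical. The formula does not define the first output sample, so this sketch assumes X[-1] = 0, which passes the first sample through unchanged; the operator itself may treat the border differently.

```python
def preemphasis(signal, coeff=0.97):
    """Apply Y[t] = X[t] - coeff * X[t-1] to a 1-D list (hypothetical helper)."""
    if not signal:
        return []
    # Assumption: X[-1] is treated as 0, so Y[0] = X[0].
    out = [signal[0]]
    for t in range(1, len(signal)):
        out.append(signal[t] - coeff * signal[t - 1])
    return out


print(preemphasis([1.0, 2.0, 4.0], coeff=0.5))
```

A coefficient of 0 leaves the signal unchanged, while values close to 1 suppress slowly varying components and emphasize sample-to-sample changes.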
-
class nvidia.dali.ops.PythonFunction(function, num_outputs=1, device='cpu', batch_processing=False, **kwargs)
Executes a Python function.
This operator can be used to execute custom Python code in the DALI pipeline. The function receives the data from DALI as NumPy arrays in case of CPU operators or as CuPy arrays for GPU operators. It is expected to return the results in the same format. For a more universal data format, see nvidia.dali.ops.DLTensorPythonFunction(). The function should not modify input tensors.
Warning
Currently, this operator can be used only in pipelines with the exec_async=False and exec_pipelined=False values specified and should only be used for prototyping and debugging.
This operator allows sequence inputs and supports volumetric data.
This operator will not be optimized out of the graph.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
function (object) – Function object.
batch_processing (bool, optional, default = False) –
Determines whether the function is invoked once per batch or separately for every sample in the batch.
If set to True, the function will receive its arguments as lists of NumPy or CuPy arrays, for CPU and GPU backend, respectively.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
num_outputs (int, optional, default = 1) – Number of outputs.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
static current_stream()
Gets DALI’s current CUDA stream.
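The difference between per-sample and per-batch invocation controlled by batch_processing can be emulated without DALI. The helper name run_python_function is hypothetical, the sketch handles a single input and a single output, and plain Python lists stand in for the NumPy or CuPy arrays that the operator would pass to the function.

```python
def run_python_function(function, batch, batch_processing=False):
    """Emulate how a user function is applied to a batch (hypothetical helper)."""
    if batch_processing:
        # The function is invoked once and receives the whole batch as a list.
        return function(list(batch))
    # The function is invoked separately for every sample in the batch.
    return [function(sample) for sample in batch]


def double_sample(sample):
    # Returns a new list, leaving the input untouched.
    return [2 * value for value in sample]


def double_batch(samples):
    return [double_sample(sample) for sample in samples]


batch = [[1, 2], [3]]
print(run_python_function(double_sample, batch))
print(run_python_function(double_batch, batch, batch_processing=True))
```

Both calls produce the same result here; the per-batch form matters when the function can amortize setup cost or needs to see all samples at once.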
-
class nvidia.dali.ops.RandomBBoxCrop(**kwargs)
Applies a prospective random crop to an image coordinate space while keeping the bounding boxes, and optionally labels, consistent.
This means that after applying the random crop operator to the image coordinate space, the bounding boxes will be adjusted or filtered out to match the cropped ROI. The applied random crop operation is constrained by the arguments that are provided to the operator.
The cropping window candidates are randomly selected until one matches the overlap restrictions that are specified by the thresholds argument. thresholds values represent a minimum overlap metric that is specified by threshold_type, such as the intersection-over-union of the cropping window and the bounding boxes or the relative overlap as a ratio of the intersection area and the bounding box area.
Additionally, if allow_no_crop is True, the cropping may be skipped entirely as one of the valid results of the operator.
The following modes of a random crop are available:
- Randomly shaped window, which is randomly placed in the original input space.
The random crop window dimensions are selected based on the provided aspect_ratio and relative area restrictions.
- Fixed size window, which is randomly placed in the original input space.
The random crop window dimensions are taken from the crop_shape argument and the anchor is randomly selected. When providing crop_shape, a second argument, input_shape, specifying the original dimensions should be provided.
Note
These dimensions are required to scale the output bounding boxes.
The num_attempts argument can be used to control the maximum number of attempts to produce a valid crop to match a minimum overlap metric value from thresholds.
Warning
When allow_no_crop is False and thresholds does not contain 0.0, the operator might continue to loop for a long time unless you increase the num_attempts value.
Inputs: 0: bboxes, (1: labels)
The first input, bboxes, refers to the bounding boxes that are provided as a two-dimensional tensor where the first dimension refers to the index of the bounding box, and the second dimension refers to the index of the coordinate.
The coordinates are relative to the original image dimensions (that means, a range of [0.0, 1.0]) that represent the start and, depending on the value of bbox_layout, the end of the region or start and shape. For example, bbox_layout="xyXY" means the bounding box coordinates follow the start_x, start_y, end_x, and end_y order, and bbox_layout="xyWH" indicates that the order is start_x, start_y, width, and height. See the bbox_layout argument description for more information.
Optionally, a second input, called labels, can be provided, which represents the labels that are associated with each of the bounding boxes.
Outputs: 0: anchor, 1: shape, 2: bboxes, (3: labels)
The resulting crop parameters are provided as two separate outputs, anchor and shape, that can be fed directly to the nvidia.dali.ops.Slice() operator to complete the cropping of the original image. anchor and shape contain the starting coordinates and dimensions for the crop in the [x, y, (z)] and [w, h, (d)] formats, respectively. The coordinates can be represented in absolute or relative terms, and the representation depends on whether the fixed crop_shape was used.
The third and fourth outputs correspond to the adjusted bounding boxes and, optionally, to their corresponding labels. Bounding boxes are always specified in relative coordinates.
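The two overlap metrics named above, intersection-over-union and the fraction of the bounding box that falls inside the crop window, can be sketched for 2D boxes in plain Python. All helper names (to_xyxy, overlap_metric, and so on) are hypothetical and this is not the operator's implementation; the sketch only shows how a candidate window would be scored against one box given in relative coordinates.

```python
def to_xyxy(box, bbox_layout="xyXY"):
    """Convert a 2D box to (start_x, start_y, end_x, end_y)."""
    if bbox_layout == "xyXY":
        return tuple(box)
    if bbox_layout == "xyWH":
        x, y, w, h = box
        return (x, y, x + w, y + h)
    raise ValueError("this sketch supports only 'xyXY' and 'xyWH'")


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def intersection_area(a, b):
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    # Disjoint boxes give a negative width or height, which clamps to 0.
    return max(width, 0.0) * max(height, 0.0)


def overlap_metric(window, box, threshold_type="iou"):
    """Score a crop window against one bounding box (both in xyXY)."""
    inter = intersection_area(window, box)
    if threshold_type == "iou":
        return inter / (area(window) + area(box) - inter)
    if threshold_type == "overlap":
        # Fraction (by area) of the bounding box inside the window.
        return inter / area(box)
    raise ValueError("threshold_type must be 'iou' or 'overlap'")


window = (0.0, 0.0, 0.5, 0.5)
box = to_xyxy((0.25, 0.25, 0.5, 0.5), bbox_layout="xyWH")
print(overlap_metric(window, box, "iou"))
print(overlap_metric(window, box, "overlap"))
```

A window would be accepted for this box when the chosen metric is at least the threshold selected for the sample.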
- Supported backends
‘cpu’
- Keyword Arguments
all_boxes_above_threshold (bool, optional, default = True) –
If set to True, all bounding boxes in a sample should overlap with the cropping window as specified by thresholds.
If the bounding boxes do not overlap, the cropping window is considered to be invalid. If set to False, and at least one bounding box overlaps the window, the window is considered to be valid.
allow_no_crop (bool, optional, default = True) – If set to True, one of the possible outcomes of the random process will be to not crop, as if the outcome was one more thresholds value from which to choose.
aspect_ratio (float or list of float, optional, default = [1.0, 1.0]) –
Valid range of aspect ratio of the cropping windows.
This parameter can be specified as either two values (min, max) or six values (three pairs), depending on the dimensionality of the input.
- For 2D bounding boxes, one range of valid aspect ratios (x/y) should be provided (e.g. [min_xy, max_xy]).
- For 3D bounding boxes, three separate aspect ratio ranges may be specified, for x/y, x/z and y/z pairs of dimensions.
They are provided in the following order [min_xy, max_xy, min_xz, max_xz, min_yz, max_yz]. Alternatively, if only one aspect ratio range is provided, it will be used for all three pairs of dimensions.
The value for min should be greater than 0.0, and min should be less than or equal to the max value. By default, square windows are generated.
Note
Providing aspect_ratio and scaling is incompatible with explicitly specifying crop_shape.
bbox_layout (layout str, optional, default = ‘’) –
Determines the meaning of the coordinates of the bounding boxes.
The value of this argument is a string containing the following characters:
x (horizontal start anchor), y (vertical start anchor), z (depthwise start anchor), X (horizontal end anchor), Y (vertical end anchor), Z (depthwise end anchor), W (width), H (height), D (depth).
Note
If this value is left empty, depending on the number of dimensions, “xyXY” or “xyzXYZ” is assumed.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
crop_shape (int or list of int, optional, default = []) –
If provided, the random crop window dimensions will be fixed to this shape.
The order of dimensions is determined by the layout provided in shape_layout.
Note
crop_shape and input_shape should be provided together, and providing those arguments is incompatible with using the scaling and aspect_ratio arguments.
input_shape (int or list of int, optional, default = []) –
Specifies the shape of the original input image.
The order of dimensions is determined by the layout that is provided in shape_layout.
Note
crop_shape and input_shape should be provided together, and providing those arguments is incompatible with using the scaling and aspect_ratio arguments.
ltrb (bool, optional, default = True) –
If set to True, bboxes are returned as [left, top, right, bottom]; otherwise they are provided as [left, top, width, height].
Warning
This argument has been deprecated. To specify the bbox encoding, use bbox_layout instead. For example, ltrb=True is equal to bbox_layout="xyXY", and ltrb=False corresponds to bbox_layout="xyWH".
num_attempts (int, optional, default = 1) –
Number of attempts to get a crop window that matches the aspect_ratio and a selected value from thresholds.
After every num_attempts attempts, a different threshold is picked. This continues until the total number of attempts reaches total_num_attempts (if provided), or indefinitely otherwise.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
scaling (float or list of float, optional, default = [1.0, 1.0]) –
Range [min, max] for the crop size with respect to the original image dimensions.
The values of min and max must satisfy the condition 0.0 <= min <= max.
Note
Providing aspect_ratio and scaling is incompatible with explicitly specifying the crop_shape value.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shape_layout (layout str, optional, default = ‘’) –
Determines the meaning of the dimensions provided in crop_shape and input_shape.
The values are:
W (width)
H (height)
D (depth)
Note
If left empty, depending on the number of dimensions, "WH" or "WHD" will be assumed.
threshold_type (str, optional, default = 'iou') –
Determines the meaning of thresholds.
By default, thresholds refers to the intersection-over-union (IoU) of the bounding boxes with respect to the cropping window. Alternatively, the threshold type can be set to “overlap” to specify the fraction (by area) of the bounding box that will fall inside the crop window. For example, a threshold value of 1.0 means the entire bounding box must be contained in the resulting cropping window.
thresholds (float or list of float, optional, default = [0.0]) –
Minimum IoU, or a different metric if specified by threshold_type, of the bounding boxes with respect to the cropping window.
Each sample randomly selects one of the thresholds, and the operator makes up to the specified number of attempts to produce a random crop window that has the selected metric above that threshold. See num_attempts for more information about configuring the number of attempts.
total_num_attempts (int, optional, default = -1) –
If provided, it indicates the total maximum number of attempts to get a crop window that matches the aspect_ratio and any selected value from thresholds.
After total_num_attempts attempts, the best candidate will be selected. If this value is not specified, the crop search will continue indefinitely until a valid crop is found.
Warning
If you do not provide a total_num_attempts value, this can result in an infinite loop if the conditions imposed by the arguments cannot be satisfied.
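The difference between the two threshold_type metrics can be seen on a single box. The following is a minimal sketch in plain Python, not DALI's implementation; the helper names iou and overlap are hypothetical, and boxes are assumed to use the “xyXY” layout with relative coordinates.

```python
def _intersection_area(box, crop):
    # Both arguments are (x_start, y_start, x_end, y_end), as in the "xyXY" layout.
    w = min(box[2], crop[2]) - max(box[0], crop[0])
    h = min(box[3], crop[3]) - max(box[1], crop[1])
    return max(w, 0.0) * max(h, 0.0)


def _area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def iou(box, crop):
    """Intersection-over-union of a bounding box and a crop window."""
    inter = _intersection_area(box, crop)
    union = _area(box) + _area(crop) - inter
    return inter / union if union > 0 else 0.0


def overlap(box, crop):
    """Fraction (by area) of the bounding box that falls inside the crop window."""
    box_area = _area(box)
    return _intersection_area(box, crop) / box_area if box_area > 0 else 0.0


box = (0.25, 0.25, 0.5, 0.5)   # small box in the corner of the crop
crop = (0.0, 0.0, 0.5, 0.5)    # crop window that fully contains the box
print(iou(box, crop), overlap(box, crop))
```

In this example the box is entirely inside the crop window, so the overlap metric is at its maximum, while the IoU is much lower because the window is larger than the box. A threshold that is easy to meet with “overlap” can therefore be hard to meet with “iou”.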
- __call__(boxes, labels=None, **kwargs)
Operator call to be used in graph definition.
- Parameters
boxes (2D TensorList of float) – Relative coordinates of the bounding boxes that are represented as a 2D tensor, where the first dimension refers to the index of the bounding box, and the second dimension refers to the index of the coordinate.
labels (1D TensorList of integers, optional) – Labels that are associated with each of the bounding boxes.
- Keyword Arguments
crop_shape (TensorList of int, optional, default = []) –
If provided, the random crop window dimensions will be fixed to this shape.
The order of dimensions is determined by the layout provided in shape_layout.
Note
crop_shape and input_shape should be provided together, and providing those arguments is incompatible with using the scaling and aspect_ratio arguments.
input_shape (TensorList of int, optional, default = []) –
Specifies the shape of the original input image.
The order of dimensions is determined by the layout that is provided in shape_layout.
Note
crop_shape and input_shape should be provided together, and providing those arguments is incompatible with using the scaling and aspect_ratio arguments.
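The boxes input is indexed first by box and then by coordinate, and the meaning of each coordinate follows bbox_layout. As an illustration only, the sketch below converts between the “xyWH” and “xyXY” encodings mentioned for the deprecated ltrb argument; the function names are hypothetical and plain Python lists stand in for a TensorList.

```python
def xywh_to_xyxy(boxes):
    """Convert [x, y, width, height] rows to [x_start, y_start, x_end, y_end]."""
    return [[x, y, x + w, y + h] for x, y, w, h in boxes]


def xyxy_to_xywh(boxes):
    """Convert [x_start, y_start, x_end, y_end] rows to [x, y, width, height]."""
    return [[x0, y0, x1 - x0, y1 - y0] for x0, y0, x1, y1 in boxes]


# One box per row; coordinates are relative to the image size.
boxes_xywh = [[0.25, 0.5, 0.5, 0.25]]
boxes_xyxy = xywh_to_xyxy(boxes_xywh)
print(boxes_xyxy)
print(xyxy_to_xywh(boxes_xyxy))
```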
- class nvidia.dali.ops.RandomResizedCrop(**kwargs)
Performs a crop with a randomly selected area and aspect ratio and resizes it to the specified size.
Expects a three-dimensional input with samples in height, width, channels (HWC) layout.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
size (int or list of int) – Size of the resized image.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional) –
Output data type.
Must be the same as the input type or float. If not set, the input type is used.
interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –
Type of interpolation to be used.
Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.
mag_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.
min_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.
minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.
num_attempts (int, optional, default = 10) – Maximum number of attempts used to choose random area and aspect ratio.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_area (float or list of float, optional, default = [0.08, 1.0]) –
Range from which to choose random area fraction A.
The cropped image’s area will be equal to A * original image’s area.
random_aspect_ratio (float or list of float, optional, default = [0.75, 1.333333]) – Range from which to choose random aspect ratio (width/height).
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
temp_buffer_hint (int, optional, default = 0) –
Initial size, in bytes, of a temporary buffer for resampling.
Note
This argument is ignored for the CPU variant.
- __call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC', 'CHW')) – Input to the operator.
- Keyword Arguments
interp_type (TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –
Type of interpolation to be used.
Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.
mag_filter (TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.
min_filter (TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.
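How random_area, random_aspect_ratio, and num_attempts can interact is sketched below in plain Python. This is an assumption-laden illustration, not DALI's algorithm: the function name sample_crop_size is hypothetical, the aspect ratio is drawn uniformly in log space (one common choice), and the fallback to the full image after num_attempts failures is also an assumption.

```python
import math
import random


def sample_crop_size(height, width, random_area=(0.08, 1.0),
                     random_aspect_ratio=(0.75, 1.333333),
                     num_attempts=10, rng=None):
    """Pick a crop (height, width) whose area is A * original area."""
    rng = rng or random.Random()
    log_lo = math.log(random_aspect_ratio[0])
    log_hi = math.log(random_aspect_ratio[1])
    for _ in range(num_attempts):
        # A is the area fraction; the aspect ratio is width / height.
        area = rng.uniform(*random_area) * height * width
        ratio = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        # Accept the candidate only if it fits inside the image.
        if 0 < crop_w <= width and 0 < crop_h <= height:
            return crop_h, crop_w
    # Assumed fallback when no attempt produced a window that fits.
    return height, width


print(sample_crop_size(480, 640, rng=random.Random(0)))
```

The selected window would then be resized to the size given in the size argument.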
- class nvidia.dali.ops.Reinterpret(**kwargs)
Treats content of the input as if it had a different type, shape, and/or layout.
The buffer contents are not copied.
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) –
Output data type.
The total size, in bytes, of the output must match the input. If no shape is provided, the innermost dimension is adjusted accordingly. If the byte size of the innermost dimension is not divisible by the size of the target type, an error occurs.
layout (layout str, optional, default = ‘’) –
New layout for the data.
If a value is not specified and the number of dimensions matches the existing layout, the output layout is preserved. If the number of dimensions does not match, the layout is reset to empty. If a value is set and is not empty, the layout must match the dimensionality of the output.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
rel_shape (float or list of float, optional, default = []) –
The relative shape of the output.
Number of dimensions cannot exceed the number of dimensions of the input. There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and rel_shape = [0.5, -1] results in the shape [240, 3840].
Note
rel_shape and shape are mutually exclusive.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shape (int or list of int, optional, default = []) –
The desired shape of the output.
Number of dimensions cannot exceed the number of dimensions of the input. There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and shape = [240, -1] results in the shape [240, 3840].
Note
rel_shape and shape are mutually exclusive.
- __call__(data, shape_input=None, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Data to be reshaped.
shape_input (1D TensorList of integers, optional) – Same as the shape keyword argument.
- Keyword Arguments
rel_shape (TensorList of float, optional, default = []) –
The relative shape of the output.
Number of dimensions cannot exceed the number of dimensions of the input. There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and rel_shape = [0.5, -1] results in the shape [240, 3840].
Note
rel_shape and shape are mutually exclusive.
shape (TensorList of int, optional, default = []) –
The desired shape of the output.
Number of dimensions cannot exceed the number of dimensions of the input. There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and shape = [240, -1] results in the shape [240, 3840].
Note
rel_shape and shape are mutually exclusive.
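The dtype rule for Reinterpret (the byte size stays the same and, when no shape is given, the innermost dimension is adjusted) can be illustrated with a small calculation. The sketch below is plain Python with a hypothetical helper name; item sizes are passed in bytes rather than as DALI data types.

```python
def reinterpret_shape(shape, in_itemsize, out_itemsize):
    """Shape obtained when the same bytes are viewed as a type of another size."""
    inner_bytes = shape[-1] * in_itemsize
    if inner_bytes % out_itemsize != 0:
        # Mirrors the documented error case: the innermost dimension's byte
        # size must be divisible by the size of the target type.
        raise ValueError(
            f"innermost dimension is {inner_bytes} bytes, "
            f"not divisible by target type size {out_itemsize}"
        )
    return list(shape[:-1]) + [inner_bytes // out_itemsize]


# 4 one-byte channels viewed as a single 4-byte value per pixel.
print(reinterpret_shape([480, 640, 4], in_itemsize=1, out_itemsize=4))
# 4-byte values viewed as raw bytes.
print(reinterpret_shape([10, 2], in_itemsize=4, out_itemsize=1))
```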
- class nvidia.dali.ops.Reshape(**kwargs)
Treats content of the input as if it had a different shape and/or layout.
The buffer contents are not copied.
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
layout (layout str, optional, default = ‘’) –
New layout for the data.
If a value is not specified and the number of dimensions matches the existing layout, the output layout is preserved. If the number of dimensions does not match, the layout is reset to empty. If a value is set and is not empty, the layout must match the dimensionality of the output.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
rel_shape (float or list of float, optional, default = []) –
The relative shape of the output.
Number of dimensions cannot exceed the number of dimensions of the input. There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and rel_shape = [0.5, -1] results in the shape [240, 3840].
Note
rel_shape and shape are mutually exclusive.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shape (int or list of int, optional, default = []) –
The desired shape of the output.
Number of dimensions cannot exceed the number of dimensions of the input. There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and shape = [240, -1] results in the shape [240, 3840].
Note
rel_shape and shape are mutually exclusive.
- __call__(data, shape_input=None, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList) – Data to be reshaped.
shape_input (1D TensorList of integers, optional) – Same as the shape keyword argument.
- Keyword Arguments
rel_shape (TensorList of float, optional, default = []) –
The relative shape of the output.
Number of dimensions cannot exceed the number of dimensions of the input. There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and rel_shape = [0.5, -1] results in the shape [240, 3840].
Note
rel_shape and shape are mutually exclusive.
shape (TensorList of int, optional, default = []) –
The desired shape of the output.
Number of dimensions cannot exceed the number of dimensions of the input. There can be one negative extent that receives the size that is required to match the input volume. For example, an input of shape [480, 640, 3] and shape = [240, -1] results in the shape [240, 3840].
Note
rel_shape and shape are mutually exclusive.
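The way a negative extent is filled in for shape and rel_shape can be reproduced with a few lines of arithmetic. The following is a sketch in plain Python under stated assumptions, not the operator's code: resolve_shape is a hypothetical helper, and each rel_shape entry is assumed to scale the input extent at the same index.

```python
import math


def resolve_shape(in_shape, shape=None, rel_shape=None):
    """Resolve the output shape, filling at most one negative extent."""
    if (shape is None) == (rel_shape is None):
        raise ValueError("rel_shape and shape are mutually exclusive")
    if rel_shape is not None:
        # Assumption: a relative extent multiplies the input extent at the same index.
        shape = [-1 if r < 0 else int(round(r * in_shape[i]))
                 for i, r in enumerate(rel_shape)]
    out = list(shape)
    if len(out) > len(in_shape):
        raise ValueError("output cannot have more dimensions than the input")
    negative = [i for i, extent in enumerate(out) if extent < 0]
    if len(negative) > 1:
        raise ValueError("at most one negative extent is allowed")
    volume = math.prod(in_shape)
    if negative:
        known = math.prod(extent for extent in out if extent >= 0)
        if known == 0 or volume % known != 0:
            raise ValueError("cannot match the input volume")
        # The negative extent receives whatever is needed to match the volume.
        out[negative[0]] = volume // known
    elif math.prod(out) != volume:
        raise ValueError("output volume does not match the input volume")
    return out


print(resolve_shape([480, 640, 3], shape=[240, -1]))
print(resolve_shape([480, 640, 3], rel_shape=[0.5, -1]))
```

Both calls reproduce the documented example: the input volume is 480 * 640 * 3 = 921600 elements, and 921600 / 240 = 3840.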
- class nvidia.dali.ops.Resize(**kwargs)
Resize images.
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional) –
Output data type.
Must be the same as the input type or float. If not set, the input type is used.
interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –
Type of interpolation to be used.
Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.
mag_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.
max_size (float or list of float, optional) –
Limit of the output size.
When the operator is configured to keep aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when using the resize_shorter argument or “not_smaller” mode, or when some extents are left unspecified.
This parameter puts a limit to how big the output can become. This value can be specified per-axis or uniformly for all axes.
Note
When used with “not_smaller” mode or the resize_shorter argument, max_size takes precedence and the aspect ratio is kept. For example, resizing an image of size 1200x600 with mode="not_smaller", size=800, max_size=1400 produces a 1400x700 output.
min_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.
minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.
mode (str, optional, default = 'default') –
Resize mode.
Here is a list of supported modes:
"default" - image is resized to the specified size. Missing extents are scaled with the average scale of the provided extents.
"stretch" - image is resized to the specified size. Missing extents are not scaled at all.
"not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size. For example, a 1280x720 image with a desired output size of 640x480 actually produces a 640x360 output.
"not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified. For example, a 640x480 image with a desired output size of 1920x1080 actually produces a 1920x1440 output.
This argument is mutually exclusive with resize_longer and resize_shorter.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
resize_longer (float, optional, default = 0.0) –
The length of the longer dimension of the resized image.
This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".
resize_shorter (float, optional, default = 0.0) –
The length of the shorter dimension of the resized image.
This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See the max_size argument doc for more info.
resize_x (float, optional, default = 0.0) –
The length of the X dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_y (float, optional, default = 0.0) –
The length of the Y dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_z (float, optional, default = 0.0) –
The length of the Z dimension of the resized volume.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x and resize_y are left unspecified or 0, the operator keeps the aspect ratio of the original volume. A negative value flips the volume.
roi_end (float or list of float, optional) –
End of the input region of interest (ROI).
Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
roi_relative (bool, optional, default = False) – If true, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right.
roi_start (float or list of float, optional) –
Origin of the input region of interest (ROI).
Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
save_attrs (bool, optional, default = False) – Save reshape attributes for testing.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
size (float or list of float, optional) –
The desired output size.
Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and the mode argument.
subpixel_scale (bool, optional, default = True) –
If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.
Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.
temp_buffer_hint (int, optional, default = 0) –
Initial size, in bytes, of a temporary buffer for resampling.
Note
This argument is ignored for the CPU variant.
- __call__(data, **kwargs)
Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC', 'FHWC', 'CHW', 'FCHW', 'CFHW', 'DHWC', 'FDHWC', 'CDHW', 'FCDHW', 'CFDHW')) – Input to the operator.
- Keyword Arguments
interp_type (TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –
Type of interpolation to be used.
Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.
mag_filter (TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.
min_filter (TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.
resize_longer (TensorList of float, optional, default = 0.0) –
The length of the longer dimension of the resized image.
This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".
resize_shorter (TensorList of float, optional, default = 0.0) –
The length of the shorter dimension of the resized image.
This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See the max_size argument doc for more info.
resize_x (TensorList of float, optional, default = 0.0) –
The length of the X dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_y (TensorList of float, optional, default = 0.0) –
The length of the Y dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_z (TensorList of float, optional, default = 0.0) –
The length of the Z dimension of the resized volume.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x and resize_y are left unspecified or 0, the operator keeps the aspect ratio of the original volume. A negative value flips the volume.
roi_end (TensorList of float, optional) –
End of the input region of interest (ROI).
Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
roi_start (TensorList of float, optional) –
Origin of the input region of interest (ROI).
Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
size (TensorList of float, optional) –
The desired output size.
Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and the mode argument.
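The four resize modes and the max_size limit described for Resize can be checked against the documented examples with a short calculation. The sketch below is plain Python and only approximates the behavior described above: resized_shape is a hypothetical helper, the “average scale” is assumed to be an arithmetic mean, results are rounded to whole pixels, and negative (flipping) sizes are not handled.

```python
def resized_shape(in_shape, size, mode="default", max_size=None):
    """Estimate the output extents for the given size and resize mode."""
    n = len(in_shape)
    if isinstance(size, (int, float)):
        size = [size] * n
    # A 0 extent means "absent"; its scale is derived from the mode.
    scales = [s / e if s else None for s, e in zip(size, in_shape)]
    given = [s for s in scales if s is not None]
    if not given:
        raise ValueError("at least one extent must be specified")
    if mode == "default":
        average = sum(given) / len(given)  # assumed arithmetic mean
        scales = [average if s is None else s for s in scales]
    elif mode == "stretch":
        scales = [1.0 if s is None else s for s in scales]
    elif mode == "not_larger":
        scales = [min(given)] * n
    elif mode == "not_smaller":
        scales = [max(given)] * n
    else:
        raise ValueError(f"unsupported mode: {mode}")
    if max_size is not None:
        if isinstance(max_size, (int, float)):
            max_size = [max_size] * n
        limits = [m / e for m, e in zip(max_size, in_shape) if m]
        if limits and mode in ("not_larger", "not_smaller"):
            # One common limit for all axes keeps the aspect ratio.
            scales = [min(s, min(limits)) for s in scales]
        else:
            scales = [min(s, m / e) if m else s
                      for s, m, e in zip(scales, max_size, in_shape)]
    return [max(1, round(e * s)) for e, s in zip(in_shape, scales)]


print(resized_shape([1280, 720], [640, 480], mode="not_larger"))
print(resized_shape([640, 480], [1920, 1080], mode="not_smaller"))
print(resized_shape([1200, 600], 800, mode="not_smaller", max_size=1400))
```

The three calls correspond to the examples given for the "not_larger" mode, the "not_smaller" mode, and the max_size note.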
- class nvidia.dali.ops.ResizeCropMirror(**kwargs)
Performs a fused resize, crop, mirror operation. Both fixed and random resizing and cropping are supported.
- Supported backends
‘cpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
crop (float or list of float, optional, default = [0.0, 0.0]) –
Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).
Providing the crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.
crop_d (float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
crop_pos_y (float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
crop_pos_z (float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image, and crop_D is the depth of the cropping window.
crop_w (float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) –
Output data type.
Supported types: FLOAT, FLOAT16, and UINT8.
If not set, the input type is used.
fill_values (float or list of float, optional, default = [0.0]) –
Determines padding values and is only relevant if out_of_bounds_policy is set to “pad”.
If a scalar value is provided, it will be used for all the channels. If multiple values are provided, the number of values and channels must be identical (extent of dimension C in the layout) in the output slice.
interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.
max_size (float or list of float, optional) –
Limit of the output size.
When the operator is configured to keep aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when using the resize_shorter argument or “not_smaller” mode, or when some extents are left unspecified.
This parameter puts a limit to how big the output can become. This value can be specified per-axis or uniformly for all axes.
Note
When used with “not_smaller” mode or the resize_shorter argument, max_size takes precedence and the aspect ratio is kept. For example, resizing an image of size 1200x600 with mode="not_smaller", size=800, max_size=1400 produces a 1400x700 output.
mirror (int, optional, default = 0) –
Mask for the horizontal flip.
Supported values:
0 - Do not perform horizontal flip for this image.
1 - Performs horizontal flip for this image.
mode (str, optional, default = 'default') –
Resize mode.
Here is a list of supported modes:
"default" - image is resized to the specified size. Missing extents are scaled with the average scale of the provided extents.
"stretch" - image is resized to the specified size. Missing extents are not scaled at all.
"not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size. For example, a 1280x720 image with a desired output size of 640x480 actually produces a 640x360 output.
"not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified. For example, a 640x480 image with a desired output size of 1920x1080 actually produces a 1920x1440 output.
This argument is mutually exclusive with resize_longer and resize_shorter.
out_of_bounds_policy (str, optional, default = 'error') –
Determines the policy when slicing the out of bounds area of the input.
Here is a list of the supported values:
"error" (default): Attempting to slice outside of the bounds of the image will produce an error.
"pad": The input will be padded as needed with zeros or any other value that is specified with the fill_values argument.
"trim_to_shape": The slice window will be cut to the bounds of the input.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
resize_longer (float, optional, default = 0.0) –
The length of the longer dimension of the resized image.
This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".
resize_shorter (float, optional, default = 0.0) –
The length of the shorter dimension of the resized image.
This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See the max_size argument doc for more info.
resize_x (float, optional, default = 0.0) –
The length of the X dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_y (float, optional, default = 0.0) –
The length of the Y dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_z (float, optional, default = 0.0) –
The length of the Z dimension of the resized volume.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x and resize_y are left unspecified or 0, the operator keeps the aspect ratio of the original volume. A negative value flips the volume.
roi_end (float or list of float, optional) –
End of the input region of interest (ROI).
Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
roi_relative (bool, optional, default = False) – If true, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right.
roi_start (float or list of float, optional) –
Origin of the input region of interest (ROI).
Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
size (float or list of float, optional) –
The desired output size.
Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and
mode
argument.subpixel_scale (bool, optional, default = True) –
If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.
Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.
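The aspect-ratio rule described for resize_x and resize_y can be illustrated with a small pure-Python sketch. It does not call DALI; the function name resized_shape is hypothetical, rounding to the nearest integer is an assumption, flipping through negative values is not modeled, and returning the input size when neither argument is given is also an assumption of this sketch.

```python
def resized_shape(height, width, resize_x=0.0, resize_y=0.0):
    """Return (out_height, out_width) for the given resize_x / resize_y.

    A value of 0 means "unspecified"; the missing extent is derived so
    that the aspect ratio of the input is preserved.
    """
    if resize_x < 0 or resize_y < 0:
        raise ValueError("negative sizes (flipping) are not modeled here")
    if resize_x and resize_y:
        # Both extents given explicitly: the aspect ratio may change.
        return round(resize_y), round(resize_x)
    if resize_x:
        scale = resize_x / width
        return round(height * scale), round(resize_x)
    if resize_y:
        scale = resize_y / height
        return round(resize_y), round(width * scale)
    # Assumption of this sketch: nothing requested, keep the input size.
    return height, width


# Hypothetical 480x640 (H x W) input.
print(resized_shape(480, 640, resize_x=320))
```

Specifying only one of the two arguments scales the other extent by the same factor, which is what keeps the aspect ratio.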
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
crop_d (TensorList of float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (TensorList of float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
crop_pos_y (TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
crop_pos_z (TensorList of float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image, and crop_D is the depth of the cropping window.
crop_w (TensorList of float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
mirror (TensorList of int, optional, default = 0) –
Mask for the horizontal flip.
Supported values:
0 - Do not perform horizontal flip for this image.
1 - Performs horizontal flip for this image.
resize_longer (TensorList of float, optional, default = 0.0) –
The length of the longer dimension of the resized image.
This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".
resize_shorter (TensorList of float, optional, default = 0.0) –
The length of the shorter dimension of the resized image.
This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See the max_size argument documentation for more information.
resize_x (TensorList of float, optional, default = 0.0) –
The length of the X dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer, and size. If resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_y (TensorList of float, optional, default = 0.0) –
The length of the Y dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer, and size. If resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_z (TensorList of float, optional, default = 0.0) –
The length of the Z dimension of the resized volume.
This option is mutually exclusive with resize_shorter, resize_longer, and size. If resize_x and resize_y are left unspecified or 0, the operator keeps the aspect ratio of the original volume. A negative value flips the volume.
roi_end (TensorList of float, optional) –
End of the input region of interest (ROI).
Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
roi_start (TensorList of float, optional) –
Origin of the input region of interest (ROI).
Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
size (TensorList of float, optional) –
The desired output size.
Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and
mode
argument.
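The crop position formulas given for crop_pos_x, crop_pos_y, and crop_pos_z all have the same form, so a single helper is enough to illustrate them. The following pure-Python sketch does not call DALI; the function name crop_origin and the image and window sizes are hypothetical.

```python
def crop_origin(pos_norm, image_extent, crop_extent):
    """Map a normalized crop position (0.0 - 1.0) to an absolute offset.

    Implements crop_x = crop_x_norm * (W - crop_W) for one dimension.
    """
    if not 0.0 <= pos_norm <= 1.0:
        raise ValueError("pos_norm must be in the range [0.0, 1.0]")
    return pos_norm * (image_extent - crop_extent)


# Hypothetical 640x480 (W x H) image with a 224x224 cropping window.
crop_x = crop_origin(0.5, 640, 224)  # window centered horizontally
crop_y = crop_origin(0.0, 480, 224)  # window aligned with the top edge
print(crop_x, crop_y)
```

A normalized position of 0.5, which is the default, centers the window, while 0.0 and 1.0 push it against the opposite edges of the image.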
-
class
nvidia.dali.ops.
Rotate
(**kwargs)¶ Rotates the images by the specified angle.
This operator supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
angle (float) –
Angle, in degrees, by which the image is rotated.
For two-dimensional data, the rotation is counter-clockwise, assuming the top-left corner is at (0,0). For three-dimensional data, the angle is a positive rotation around the provided axis.
axis (float or list of float, optional, default = []) –
Applies only to three-dimensional data; the axis around which to rotate the image.
The vector does not need to be normalized, but it must have a non-zero length. Reversing the vector is equivalent to changing the sign of angle.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) –
Output data type.
If not set, the input type is used.
fill_value (float, optional, default = 0.0) –
Value used to fill areas that are outside the source image.
If a value is not specified, the source coordinates are clamped and the border pixel is repeated.
interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.
keep_size (bool, optional, default = False) –
If True, original canvas size is kept.
If set to False (default), and the size is not set, the canvas size is adjusted to accommodate the rotated image with the least padding possible.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
size (float or list of float, optional, default = []) –
Output size, in pixels/points.
Non-integer sizes are rounded to the nearest integer. The channel dimension should be excluded (for example, for RGB images, specify (480,640), not (480,640,3)).
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC', 'DHWC')) – Input to the operator.
- Keyword Arguments
angle (TensorList of float) –
Angle, in degrees, by which the image is rotated.
For two-dimensional data, the rotation is counter-clockwise, assuming the top-left corner is at (0,0). For three-dimensional data, the angle is a positive rotation around the provided axis.
axis (TensorList of float, optional, default = []) –
Applies only to three-dimensional data; the axis around which to rotate the image.
The vector does not need to be normalized, but it must have a non-zero length. Reversing the vector is equivalent to changing the sign of angle.
size (TensorList of float, optional, default = []) –
Output size, in pixels/points.
Non-integer sizes are rounded to the nearest integer. The channel dimension should be excluded (for example, for RGB images, specify (480,640), not (480,640,3))
.
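When keep_size is False and size is not set, the canvas is described as growing to fit the rotated image with the least padding possible. For two-dimensional data, one way to picture this is the axis-aligned bounding box of the rotated rectangle. The sketch below is only that geometric approximation; the function name rotated_canvas_size is hypothetical, and the exact rounding used by the operator is an assumption and may differ.

```python
import math


def rotated_canvas_size(height, width, angle_deg):
    """Bounding box (out_height, out_width) of an H x W image rotated
    by angle_deg degrees around its center."""
    theta = math.radians(angle_deg)
    c = abs(math.cos(theta))
    s = abs(math.sin(theta))
    out_w = width * c + height * s
    out_h = width * s + height * c
    # Assumption: round to the nearest integer number of pixels.
    return round(out_h), round(out_w)


print(rotated_canvas_size(480, 640, 90))
print(rotated_canvas_size(100, 100, 45))
```

A rotation by 90 degrees swaps the two extents, while a rotation by 45 degrees needs the largest canvas for a square input.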
-
class
nvidia.dali.ops.
SSDRandomCrop
(**kwargs)¶ Performs a random crop with bounding boxes where Intersection Over Union (IoU) meets a randomly selected threshold between 0 and 1.
When the IoU falls below the threshold, a new random crop is generated, up to num_attempts times. As input, the operator accepts an image, bounding boxes, and labels. As output, it returns the cropped image, the cropped and valid bounding boxes, and the valid labels.
- Supported backends
‘cpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
num_attempts (int, optional, default = 1) – Number of attempts.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(*inputs, **kwargs)¶ See
nvidia.dali.ops.SSDRandomCrop()
class for complete information.
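The Intersection Over Union (IoU) measure that SSDRandomCrop compares against its threshold can be sketched in a few lines of pure Python. This is a generic illustration, not the operator's implementation; the function name iou and the (left, top, right, bottom) box format are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection Over Union of two axis-aligned boxes.

    Each box is a (left, top, right, bottom) tuple (assumed format).
    """
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    # Degenerate boxes with zero area produce an IoU of 0.
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes that overlap in a 1x1 square.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A candidate crop whose IoU with the bounding boxes falls below the selected threshold would be rejected, and a new crop would be drawn.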
-
class
nvidia.dali.ops.
Saturation
(**kwargs)¶ Changes the saturation level of the image.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
Output data type.
If not set, the input type is used.
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the input and the output image.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
saturation (float, optional, default = 1.0) –
The saturation change factor.
Values must be non-negative.
Example values:
0 - Completely desaturated image.
1 - No change to image’s saturation.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
saturation (TensorList of float, optional, default = 1.0) –
The saturation change factor.
Values must be non-negative.
Example values:
0 - Completely desaturated image.
1 - No change to image’s saturation.
-
class
nvidia.dali.ops.
SequenceReader
(**kwargs)¶ Reads [Frame] sequences from a directory representing a collection of streams.
This operator expects file_root to contain a set of directories, where each directory represents an extracted video stream. This stream is represented by one file for each frame, sorted lexicographically. Sequences do not cross the stream boundary and only complete sequences are considered, so there is no padding.
Example directory structure:
- file_root
  - 0
    - 00001.png
    - 00002.png
    - 00003.png
    - 00004.png
    - 00005.png
    - 00006.png
    ....
  - 1
    - 00001.png
    - 00002.png
    - 00003.png
    - 00004.png
    - 00005.png
    - 00006.png
    ....
Note
This operator is an analogue of VideoReader working on video frames extracted as separate images. Its main purpose is to serve as a test baseline. For regular usage, the VideoReader is the recommended approach.
This operator allows sequence inputs.
- Supported backends
‘cpu’
- Keyword Arguments
file_root (str) – Path to a directory containing streams, where the directories represent streams.
sequence_length (int) – Length of sequence to load for each sample.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but on most network file systems it does not provide optimum performance.
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of input and output image.
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
skip_cached_images (bool, optional, default = False) –
If set to True, loading of the data will be skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
step (int, optional, default = 1) – Distance between first frames of consecutive sequences.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
stride (int, optional, default = 1) – Distance between consecutive frames in a sequence.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
-
__call__
(**kwargs)¶ Operator call to be used in graph definition. This operator doesn’t have any inputs.
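The interaction of sequence_length, step, and stride can be sketched for a single stream. The pure-Python example below is an illustration of the descriptions above (step is the distance between the first frames of consecutive sequences, stride is the distance between consecutive frames within a sequence, and only complete sequences are kept); it does not call DALI, and the function name sequence_frame_indices is hypothetical.

```python
def sequence_frame_indices(num_frames, sequence_length, step=1, stride=1):
    """List the frame indices of every complete sequence in one stream."""
    # Number of frames covered by one sequence, from first to last frame.
    span = (sequence_length - 1) * stride + 1
    sequences = []
    for first in range(0, num_frames - span + 1, step):
        sequences.append([first + k * stride for k in range(sequence_length)])
    return sequences


# Hypothetical stream with 6 frames.
print(sequence_frame_indices(6, sequence_length=3, step=2))
print(sequence_frame_indices(6, sequence_length=3, stride=2))
```

A sequence that would need frames beyond the end of the stream is dropped rather than padded, which matches the statement that sequences do not cross the stream boundary.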
-
class
nvidia.dali.ops.
SequenceRearrange
(**kwargs)¶ Rearranges frames in a sequence.
Assumes that the outermost dimension represents the frame index in the sequence. If the input has a non-empty layout description, it must start with F (frame).
This operator allows sequence inputs.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
new_order (int or list of int) –
List that describes the new order for the elements in each sample.
Output sequence at position i will contain element new_order[i] from input sequence:
out[i, ...] = in[new_order[i], ...]
Elements can be repeated or dropped, but empty output sequences are not allowed. Only indices in [0, input_outermost_extent) are allowed to be used in new_order. Can be specified per sample as 1D tensors.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
new_order (TensorList of int) –
List that describes the new order for the elements in each sample.
Output sequence at position i will contain element new_order[i] from input sequence:
out[i, ...] = in[new_order[i], ...]
Elements can be repeated or dropped, but empty output sequences are not allowed. Only indices in [0, input_outermost_extent) are allowed to be used in new_order
. Can be specified per sample as 1D tensors.
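The rule out[i, ...] = in[new_order[i], ...] can be tried out on a plain Python list standing in for a sequence of frames. This sketch does not call DALI; the function name rearrange is hypothetical, and the error types raised for invalid input are a choice made for the example.

```python
def rearrange(sequence, new_order):
    """Return a new sequence with out[i] = sequence[new_order[i]]."""
    if len(new_order) == 0:
        raise ValueError("empty output sequences are not allowed")
    extent = len(sequence)
    for index in new_order:
        # Only indices in [0, input_outermost_extent) are valid.
        if not 0 <= index < extent:
            raise IndexError(f"index {index} is outside [0, {extent})")
    return [sequence[index] for index in new_order]


frames = ["f0", "f1", "f2", "f3"]
# Repeat the last frame, then jump back to the first; f1 and f2 are dropped.
print(rearrange(frames, [3, 3, 0]))
```

The example shows that elements can be both repeated and dropped in a single call, as long as the result is not empty.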
-
class
nvidia.dali.ops.
Shapes
(**kwargs)¶ Returns the shapes of inputs.
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.INT64) – Data type to which the sizes are converted.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
-
class
nvidia.dali.ops.
Slice
(**kwargs)¶ Extracts a subtensor, or slice, with a specified shape and anchor.
Inputs must be supplied as the following separate tensors in the following order:
data
anchor
shape
The anchor and shape coordinates must be in the [0.0, 1.0] interval for normalized coordinates, or within the image shape for absolute coordinates. The anchor and shape inputs must provide as many dimensions as are specified with the axis_names or axes arguments. By default, the nvidia.dali.ops.Slice() operator uses normalized coordinates and WH order for the slice arguments.
This operator supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
axes (int or list of int, optional, default = [1, 0]) – Order of dimensions used for the anchor and shape slice inputs as dimension indices.
axis_names (layout str, optional, default = ‘WH’) –
Order of the dimensions used for the anchor and shape slice inputs, as described in layout.
If a value is provided, axis_names will have a higher priority than axes.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) –
Output data type.
Supported types: FLOAT, FLOAT16, and UINT8.
If not set, the input type is used.
fill_values (float or list of float, optional, default = [0.0]) –
Determines padding values and is only relevant if out_of_bounds_policy is set to “pad”.
If a scalar value is provided, it will be used for all the channels. If multiple values are provided, the number of values and channels must be identical (extent of dimension C in the layout) in the output slice.
normalized_anchor (bool, optional, default = True) –
Determines whether the anchor input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.
Note
This argument is only relevant when anchor data type is float. For integer types, the coordinates are always absolute.
normalized_shape (bool, optional, default = True) –
Determines whether the shape input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.
Note
This argument is only relevant when anchor data type is float. For integer types, the coordinates are always absolute.
out_of_bounds_policy (str, optional, default = 'error') –
Determines the policy when slicing the out of bounds area of the input.
Here is a list of the supported values:
"error" (default): Attempting to slice outside of the bounds of the image will produce an error.
"pad": The input will be padded as needed with zeros or any other value that is specified with the fill_values argument.
"trim_to_shape": The slice window will be cut to the bounds of the input.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, anchor, shape, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Batch that contains the input data.
anchor (1D TensorList of float or int) –
Input that contains normalized or absolute coordinates for the starting point of the slice (x0, x1, x2, …).
Integer coordinates are interpreted as absolute coordinates, while float coordinates can be interpreted as absolute or relative coordinates, depending on the value of normalized_anchor.
shape (1D TensorList of float or int) –
Input that contains normalized or absolute coordinates for the dimensions of the slice (s0, s1, s2, …).
Integer coordinates are interpreted as absolute coordinates, while float coordinates can be interpreted as absolute or relative coordinates, depending on the value of normalized_shape
.
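How the anchor and shape inputs map onto the data dimensions through axes can be sketched without DALI. In the pure-Python example below, the function name slice_window is hypothetical, a single normalized flag stands in for both normalized_anchor and normalized_shape, rounding to the nearest integer is an assumption, and out-of-bounds handling is not modeled.

```python
def slice_window(data_shape, anchor, shape, axes=(1, 0), normalized=True):
    """Return one (start, end) pair per dimension of data_shape.

    anchor[i] and shape[i] apply to dimension axes[i]; dimensions that are
    not listed in axes are kept whole.
    """
    window = [(0, extent) for extent in data_shape]
    for value_index, dim in enumerate(axes):
        extent = data_shape[dim]
        start = anchor[value_index]
        size = shape[value_index]
        if normalized:
            # Normalized coordinates are fractions of the dimension extent.
            start *= extent
            size *= extent
        start = int(round(start))
        size = int(round(size))
        window[dim] = (start, start + size)
    return window


# Hypothetical HWC image: 100 rows, 200 columns, 3 channels.
# With the default axes = [1, 0] (WH order), anchor and shape are (x, y).
print(slice_window((100, 200, 3), anchor=(0.25, 0.5), shape=(0.5, 0.25)))
```

With the default WH order, the first value of anchor and shape refers to the width (dimension 1) and the second to the height (dimension 0), which is easy to get backwards when working with HWC data.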
-
class
nvidia.dali.ops.
Spectrogram
(**kwargs)¶ Produces a spectrogram from a 1D signal (for example, audio).
Input data is expected to be one channel (shape being (nsamples,), (nsamples, 1), or (1, nsamples)) of type float32.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
center_windows (bool, optional, default = True) –
Indicates whether extracted windows should be padded so that the window function is centered at multiples of window_step.
If set to False, the signal will not be padded, that is, only windows within the input range will be extracted.
nfft (int, optional, default = -1) –
Size of the FFT.
The number of bins that are created in the output is nfft // 2 + 1.
Note
The output only represents the positive part of the spectrum.
power (int, optional, default = 2) –
Exponent of the magnitude of the spectrum.
Supported values:
1 - amplitude,
2 - power (faster to compute).
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
reflect_padding (bool, optional, default = True) –
Indicates the padding policy when sampling outside the bounds of the signal.
If set to True, the signal is mirrored with respect to the boundary, otherwise the signal is padded with zeros.
Note
When center_windows is set to False, this option is ignored.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
window_fn (float or list of float, optional, default = []) –
Samples of the window function that will be multiplied by each extracted window when calculating the STFT.
If a value is provided, it should be a list of floating point numbers of size window_length. If a value is not provided, a Hann window will be used.
window_length (int, optional, default = 512) – Window size in number of samples.
window_step (int, optional, default = 256) – Step between the STFT windows in number of samples.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
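The expected output size of a spectrogram can be estimated from the arguments above. In this pure-Python sketch, the number of bins follows the documented nfft // 2 + 1 rule. The window count formulas are the usual STFT conventions and are an assumption here, as is treating an unset nfft as equal to window_length. The function name spectrogram_shape is hypothetical.

```python
def spectrogram_shape(nsamples, nfft=None, window_length=512,
                      window_step=256, center_windows=True):
    """Estimate (number of bins, number of windows) for a 1D signal."""
    if nfft is None:
        # Assumption: an unset FFT size matches the window length.
        nfft = window_length
    nbins = nfft // 2 + 1  # only the positive part of the spectrum

    if center_windows:
        # Assumption: windows are centered at multiples of window_step,
        # with padding at both ends of the signal.
        nwindows = nsamples // window_step + 1
    elif nsamples < window_length:
        nwindows = 0
    else:
        # Assumption: only windows fully inside the signal are extracted.
        nwindows = (nsamples - window_length) // window_step + 1
    return nbins, nwindows


# Hypothetical one-second signal sampled at 16 kHz.
print(spectrogram_shape(16000))
print(spectrogram_shape(16000, center_windows=False))
```

Disabling center_windows yields fewer windows, because the windows that would extend past either end of the signal are no longer produced.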
-
class
nvidia.dali.ops.
Sphere
(**kwargs)¶ Performs a sphere augmentation.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
fill_value (float, optional, default = 0.0) – Color value that is used for padding.
interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.
mask (int, optional, default = 1) –
Determines whether to apply this augmentation to the input image.
Here are the values:
0: Do not apply this transformation.
1: Apply this transformation.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
mask (TensorList of int, optional, default = 1) –
Determines whether to apply this augmentation to the input image.
Here are the values:
0: Do not apply this transformation.
1: Apply this transformation.
-
class
nvidia.dali.ops.
TFRecordReader
(path, index_path, features, **kwargs)¶ Reads samples from a TensorFlow TFRecord file.
- Supported backends
‘cpu’
- Keyword Arguments
features (dict of (string, nvidia.dali.tfrecord.Feature)) –
A dictionary that maps the names of the TFRecord features to extract to the feature type.
Typically obtained by using the dali.tfrecord.FixedLenFeature and dali.tfrecord.VarLenFeature helper functions, which are equivalent to TensorFlow’s tf.FixedLenFeature and tf.VarLenFeature types, respectively. For additional flexibility, dali.tfrecord.VarLenFeature supports the partial_shape parameter. If provided, the data will be reshaped to match its value, and the first dimension will be inferred from the data size.
index_path (str or list of str) –
List of paths to index files. There should be one index file for every TFRecord file.
The index files can be obtained from TFRecord files by using the tfrecord2idx script that is distributed with DALI.
path (str or list of str) – List of paths to TFRecord files.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but on most network file systems it does not provide optimum performance.
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
skip_cached_images (bool, optional, default = False) –
If set to True, loading of the data will be skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
-
__call__
(**kwargs)¶ Operator call to be used in graph definition. This operator doesn’t have any inputs.
-
class
nvidia.dali.ops.
ToDecibels
(**kwargs)¶ Converts a magnitude (real, positive) to the decibel scale by using the following formula:
min_ratio = pow(10, cutoff_db / multiplier)
out[i] = multiplier * log10( max(min_ratio, input[i] / reference) )
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
cutoff_db (float, optional, default = -200.0) –
Minimum or cut-off ratio in dB.
Any value below this value will saturate. For example, a value of cutoff_db=-80 corresponds to a minimum ratio of 1e-8.
multiplier (float, optional, default = 10.0) – Factor by which the logarithm is multiplied. The value is typically 10.0 or 20.0, depending on whether the magnitude is squared.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
reference (float, optional, default = 0.0) –
Reference magnitude.
If a value is not provided, the maximum value for the input will be used as reference.
Note
The maximum of the input will be calculated on a per-sample basis.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
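The decibel formula given in the ToDecibels description can be reproduced for a single sample in pure Python. This sketch does not call DALI; the function name to_decibels is hypothetical, and treating reference = 0.0 as "not provided" (so that the per-sample maximum is used) is an assumption based on the default value listed above. An all-zero input is not handled.

```python
import math


def to_decibels(values, multiplier=10.0, reference=0.0, cutoff_db=-200.0):
    """Convert the magnitudes of one sample to the decibel scale."""
    if reference == 0.0:
        # Assumption: 0.0 means "use the maximum of this sample".
        reference = max(values)
    min_ratio = math.pow(10.0, cutoff_db / multiplier)
    return [
        multiplier * math.log10(max(min_ratio, value / reference))
        for value in values
    ]


# A magnitude of 0.0 saturates at cutoff_db instead of producing -infinity.
print(to_decibels([1.0, 0.1, 0.0], reference=1.0))
```

The min_ratio clamp is what keeps the logarithm finite: any ratio below it is reported as cutoff_db.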
-
class
nvidia.dali.ops.
TranslateTransform
(**kwargs)¶ Produces a translation affine transform matrix.
If another transform matrix is passed as an input, the operator applies translation to the matrix provided.
Note
The output of this operator can be fed directly to the MT argument of the CoordTransform operator.
- Supported backends
‘cpu’
- Keyword Arguments
offset (float or list of float) –
The translation vector.
The number of dimensions of the transform is inferred from this argument.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
reverse_order (bool, optional, default = False) –
Determines the order when combining affine transforms.
If set to False (default), the operator’s affine transform will be applied to the input transform. If set to True, the input transform will be applied to the operator’s transform.
If there’s no input, this argument is ignored.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
- Keyword Arguments
offset (TensorList of float) –
The translation vector.
The number of dimensions of the transform is inferred from this argument.
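As a rough illustration of what TranslateTransform produces, the sketch below builds a translation matrix and combines it with an optional input transform in plain Python. The helper names are hypothetical. The sketch assumes that the matrix has `ndim` rows and `ndim + 1` columns, and it reads `reverse_order=False` as "the input transform acts first, then the translation"; that reading is an interpretation of the description above, not a statement about DALI internals.

```python
def translation(offset):
    """Affine matrix (ndim rows, ndim + 1 columns) that translates by offset."""
    n = len(offset)
    return [[1.0 if r == c else 0.0 for c in range(n)] + [float(offset[r])]
            for r in range(n)]


def compose(first, second):
    """Return the affine matrix that applies `first`, then `second`."""
    n = len(first)
    out = []
    for r in range(n):
        row = [sum(second[r][k] * first[k][c] for k in range(n)) for c in range(n)]
        row.append(sum(second[r][k] * first[k][n] for k in range(n)) + second[r][n])
        out.append(row)
    return out


def translate_transform(offset, input_matrix=None, reverse_order=False):
    op = translation(offset)
    if input_matrix is None:
        return op  # reverse_order is ignored when there is no input
    if reverse_order:
        return compose(op, input_matrix)  # translation first, then the input
    return compose(input_matrix, op)      # input first, then the translation


def apply(matrix, point):
    """Apply an affine matrix to a point."""
    n = len(point)
    return [sum(matrix[r][c] * point[c] for c in range(n)) + matrix[r][n]
            for r in range(n)]


scale2 = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]  # example input: uniform scaling by 2
default_order = translate_transform([2, 3], scale2)
reversed_order = translate_transform([2, 3], scale2, reverse_order=True)
```

Under these assumptions the point (1, 1) maps to (4, 5) in the default order and to (6, 8) in the reversed order, which shows why the flag matters whenever the input transform is not itself a pure translation.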
-
class
nvidia.dali.ops.
Transpose
(**kwargs)¶ Transposes the tensors by reordering the dimensions based on the perm parameter.
Destination dimension i is obtained from source dimension perm[i].
For example, for a source image with HWC layout, shape = (100, 200, 3), and perm = [2, 0, 1], it will produce a destination image with CHW layout and shape = (3, 100, 200), holding the equality:
\[dst(x_2, x_0, x_1) = src(x_0, x_1, x_2)\]
which is equivalent to:
\[dst(x_{perm[0]}, x_{perm[1]}, x_{perm[2]}) = src(x_0, x_1, x_2)\]
for all valid coordinates.
This operator allows sequence inputs and supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
perm (int or list of int) – Permutation of the dimensions of the input, for example, [2, 0, 1].
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
output_layout (layout str, optional, default = ‘’) –
Explicitly sets the output data layout.
If this argument is specified, transpose_layout is ignored.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
transpose_layout (bool, optional, default = True) –
When set to True, the axis names in the output data layout are permuted according to perm. Otherwise, the input layout is copied to the output.
If output_layout is set, this argument is ignored.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList) – Input to the operator.
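The rule "destination dimension i is obtained from source dimension perm[i]" can be checked with a small pure-Python sketch. It operates on a flat, row-major list rather than on DALI tensors, and the function names are hypothetical.

```python
from itertools import product


def permuted(items, perm):
    """Reorder a shape tuple or a layout string so that item i is items[perm[i]]."""
    return [items[p] for p in perm]


def transpose(flat, shape, perm):
    """Transpose a flat row-major buffer; returns (new_flat, new_shape)."""
    out_shape = permuted(shape, perm)
    # Row-major strides of the source buffer.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    out = []
    for dst_idx in product(*(range(extent) for extent in out_shape)):
        # Destination axis i walks along source axis perm[i].
        src_offset = sum(dst_idx[i] * strides[perm[i]] for i in range(len(perm)))
        out.append(flat[src_offset])
    return out, out_shape


new_shape = permuted((100, 200, 3), [2, 0, 1])
new_layout = "".join(permuted("HWC", [2, 0, 1]))
flat_t, shape_t = transpose([0, 1, 2, 3, 4, 5], (2, 3), [1, 0])
```

The layout permutation mirrors what transpose_layout=True is described to do; with transpose_layout=False the layout string would stay unchanged.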
-
class
nvidia.dali.ops.
Uniform
(**kwargs)¶ Produces random numbers following a uniform distribution.
- Supported backends
‘cpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
range (float or list of float, optional, default = [-1.0, 1.0]) – Range of produced random numbers.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shape (int or list of int, optional, default = [1]) – Shape of the samples.
-
__call__
(**kwargs)¶ Operator call to be used in graph definition. This operator doesn’t have any inputs.
-
class
nvidia.dali.ops.
VideoReader
(**kwargs)¶ Loads and decodes video files using FFmpeg and NVDECODE, which is the hardware-accelerated video decoding feature in the NVIDIA(R) GPU.
The video streams can be in most of the container file formats. FFmpeg is used to parse video containers and returns a batch of sequences of sequence_length frames with shape (N, F, H, W, C), where N is the batch size and F is the number of frames. This class only supports constant frame rate videos.
- Supported backends
‘gpu’
- Keyword Arguments
sequence_length (int) – Frames to load per sequence.
additional_decode_surfaces (int, optional, default = 2) –
Additional decode surfaces to use beyond minimum required.
This argument is ignored when the decoder cannot determine the minimum number of decode surfaces.
Note
This can happen when the driver is an older version.
This parameter can be used to trade off memory usage with performance.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
channels (int, optional, default = 3) – Number of channels.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
Output data type.
Supported types: UINT8 or FLOAT.
enable_frame_num (bool, optional, default = False) – If the file_list or file_root argument is passed, returns the frame number output.
enable_timestamps (bool, optional, default = False) – If the file_list or file_root argument is passed, returns the timestamps output.
file_list (str, optional, default = '') –
Path to the file with a list of file label [start_frame [end_frame]] values.
A positive value means the exact frame, and a negative value counts as the Nth frame from the end (following the Python array indexing scheme). Equal values for the start and end frame yield an empty sequence and a warning. This option is mutually exclusive with filenames and file_root.
file_list_frame_num (bool, optional, default = False) –
If the start/end timestamps are provided in file_list, you can interpret them as frame numbers instead of as timestamps.
If floating point values have been provided, the start frame number will be rounded up and the end frame number will be rounded down.
Frame numbers start from 0.
file_root (str, optional, default = '') –
Path to a directory that contains the data files.
This option is mutually exclusive with filenames and file_list.
filenames (str or list of str, optional, default = []) –
File names of the video files to load.
This option is mutually exclusive with file_list and file_root.
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output frames (RGB or YCbCr).
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
normalized (bool, optional, default = False) – Gets the output as normalized data.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
skip_cached_images (bool, optional, default = False) –
If set to True, loading the data will be skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
skip_vfr_check (bool, optional, default = False) –
Skips the check for the variable frame rate (VFR) videos.
Use this flag to suppress false positive detection of VFR videos.
Warning
When the dataset indeed contains VFR files, setting this flag may cause the decoder to malfunction.
step (int, optional, default = -1) –
Frame interval between each sequence.
When the value is less than 0, step is set to sequence_length.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
stride (int, optional, default = 1) – Distance between consecutive frames in the sequence.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
-
__call__
(**kwargs)¶ Operator call to be used in graph definition. This operator doesn’t have any inputs.
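To make the interaction of sequence_length, step and stride more concrete, the following pure-Python sketch lists the frame indices that each sequence would contain. The function names are hypothetical, and the rule that a sequence is emitted only when its last frame exists is an assumption of this sketch, not something stated in the argument descriptions.

```python
def sequence_frames(frame_count, sequence_length, step=-1, stride=1):
    """List the frame indices of every sequence cut from one video."""
    if step < 0:
        step = sequence_length  # documented default: back-to-back sequences
    if step == 0:
        raise ValueError("step must not be 0")
    sequences = []
    start = 0
    # Assumption: incomplete trailing sequences are dropped.
    while start + stride * (sequence_length - 1) < frame_count:
        sequences.append([start + i * stride for i in range(sequence_length)])
        start += step
    return sequences


def resolve_frame(index, frame_count):
    """Resolve a file_list frame index; negative values count from the end."""
    return index if index >= 0 else frame_count + index


back_to_back = sequence_frames(10, 3)
overlapping = sequence_frames(6, 3, step=1, stride=2)
last_frame = resolve_frame(-1, 10)
```

Here step controls where consecutive sequences start, while stride controls the spacing of frames inside one sequence.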
-
class
nvidia.dali.ops.
VideoReaderResize
(**kwargs)¶ Loads, decodes and resizes video files with FFmpeg and NVDECODE, which is NVIDIA GPU’s hardware-accelerated video decoding.
The video streams can be in most of the container file formats. FFmpeg is used to parse video containers and returns a batch of sequences with shape (N, F, H, W, C), with N being the batch size, and F the number of frames in the sequence.
This operator combines the features of nvidia.dali.ops.VideoReader() and nvidia.dali.ops.Resize().
Note
The decoder supports only constant frame-rate videos.
- Supported backends
‘gpu’
- Keyword Arguments
sequence_length (int) – Frames to load per sequence.
additional_decode_surfaces (int, optional, default = 2) –
Additional decode surfaces to use beyond minimum required.
This argument is ignored when the decoder cannot determine the minimum number of decode surfaces.
Note
This can happen when the driver is an older version.
This parameter can be used to trade off memory usage with performance.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
channels (int, optional, default = 3) – Number of channels.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
Output data type.
Supported types: UINT8 or FLOAT.
enable_frame_num (bool, optional, default = False) – If the file_list or file_root argument is passed, returns the frame number output.
enable_timestamps (bool, optional, default = False) – If the file_list or file_root argument is passed, returns the timestamps output.
file_list (str, optional, default = '') –
Path to the file with a list of file label [start_frame [end_frame]] values.
A positive value means the exact frame, and a negative value counts as the Nth frame from the end (following the Python array indexing scheme). Equal values for the start and end frame yield an empty sequence and a warning. This option is mutually exclusive with filenames and file_root.
file_list_frame_num (bool, optional, default = False) –
If the start/end timestamps are provided in file_list, you can interpret them as frame numbers instead of as timestamps.
If floating point values have been provided, the start frame number will be rounded up and the end frame number will be rounded down.
Frame numbers start from 0.
file_root (str, optional, default = '') –
Path to a directory that contains the data files.
This option is mutually exclusive with filenames and file_list.
filenames (str or list of str, optional, default = []) –
File names of the video files to load.
This option is mutually exclusive with file_list and file_root.
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output frames (RGB or YCbCr).
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –
Type of interpolation to be used.
Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
mag_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.
max_size (float or list of float, optional) –
Limit of the output size.
When the operator is configured to keep the aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when using the resize_shorter argument or "not_smaller" mode, or when some extents are left unspecified.
This parameter puts a limit to how big the output can become. This value can be specified per-axis or uniformly for all axes.
Note
When used with "not_smaller" mode or the resize_shorter argument, max_size takes precedence and the aspect ratio is kept. For example, resizing an image of size 1200x600 with mode="not_smaller", size=800, max_size=1400 produces a 1400x700 output.
min_filter (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.
minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.
mode (str, optional, default = 'default') –
Resize mode.
Here is a list of supported modes:
"default" - image is resized to the specified size. Missing extents are scaled with the average scale of the provided ones.
"stretch" - image is resized to the specified size. Missing extents are not scaled at all.
"not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size. For example, a 1280x720 image, with a desired output size of 640x480, actually produces a 640x360 output.
"not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified. For example, a 640x480 image with a desired output size of 1920x1080, actually produces a 1920x1440 output.
This argument is mutually exclusive with resize_longer and resize_shorter.
normalized (bool, optional, default = False) – Gets the output as normalized data.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
Note
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
resize_longer (float, optional, default = 0.0) –
The length of the longer dimension of the resized image.
This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".
resize_shorter (float, optional, default = 0.0) –
The length of the shorter dimension of the resized image.
This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See the max_size argument documentation for more information.
resize_x (float, optional, default = 0.0) –
The length of the X dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_y (float, optional, default = 0.0) –
The length of the Y dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_z (float, optional, default = 0.0) –
The length of the Z dimension of the resized volume.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x and resize_y are left unspecified or 0, the operator keeps the aspect ratio of the original volume. A negative value flips the volume.
roi_end (float or list of float, optional) –
End of the input region of interest (ROI).
Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
roi_relative (bool, optional, default = False) – If True, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right.
roi_start (float or list of float, optional) –
Origin of the input region of interest (ROI).
Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
size (float or list of float, optional) –
The desired output size.
Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and the mode argument.
skip_cached_images (bool, optional, default = False) –
If set to True, loading the data will be skipped when the sample is in the decoder cache.
In this case, the output of the loader will be empty.
skip_vfr_check (bool, optional, default = False) –
Skips the check for the variable frame rate (VFR) videos.
Use this flag to suppress false positive detection of VFR videos.
Warning
When the dataset indeed contains VFR files, setting this flag may cause the decoder to malfunction.
step (int, optional, default = -1) –
Frame interval between each sequence.
When the value is less than 0, step is set to sequence_length.
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.
stride (int, optional, default = 1) – Distance between consecutive frames in the sequence.
subpixel_scale (bool, optional, default = True) –
If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.
Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.
temp_buffer_hint (int, optional, default = 0) –
Initial size, in bytes, of a temporary buffer for resampling.
Note
This argument is ignored for the CPU variant.
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
-
__call__
(**kwargs)¶ Operator call to be used in graph definition. This operator doesn’t have any inputs.
- Keyword Arguments
interp_type (TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –
Type of interpolation to be used.
Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.
mag_filter (TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.
min_filter (TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.
resize_longer (TensorList of float, optional, default = 0.0) –
The length of the longer dimension of the resized image.
This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".
resize_shorter (TensorList of float, optional, default = 0.0) –
The length of the shorter dimension of the resized image.
This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See the max_size argument documentation for more information.
resize_x (TensorList of float, optional, default = 0.0) –
The length of the X dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_y (TensorList of float, optional, default = 0.0) –
The length of the Y dimension of the resized image.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.
resize_z (TensorList of float, optional, default = 0.0) –
The length of the Z dimension of the resized volume.
This option is mutually exclusive with resize_shorter, resize_longer and size. If resize_x and resize_y are left unspecified or 0, the operator keeps the aspect ratio of the original volume. A negative value flips the volume.
roi_end (TensorList of float, optional) –
End of the input region of interest (ROI).
Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
roi_start (TensorList of float, optional) –
Origin of the input region of interest (ROI).
Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of the roi_relative argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.
size (TensorList of float, optional) –
The desired output size.
Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and the mode argument.
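The resize modes and the max_size limit described for this operator can be reproduced with a short pure-Python sketch. It only covers the case where every extent of size is given, the function name `resized_shape` is hypothetical, and applying max_size as a cap on the common scale factor is an assumption chosen to match the 1200x600 example in the max_size note.

```python
def resized_shape(in_shape, size, mode="default", max_size=None):
    """Compute the output extents for a resize with all extents specified."""
    scales = [s / i for s, i in zip(size, in_shape)]
    if mode == "not_larger":
        # Keep the aspect ratio; no output extent exceeds the requested size.
        scales = [min(scales)] * len(scales)
    elif mode == "not_smaller":
        # Keep the aspect ratio; no output extent is below the requested size.
        scales = [max(scales)] * len(scales)
    elif mode not in ("default", "stretch"):
        raise ValueError(f"unknown mode: {mode}")
    if max_size is not None and mode in ("not_larger", "not_smaller"):
        # Assumption: max_size caps the common scale so the aspect ratio is kept.
        limit = min(m / i for m, i in zip(max_size, in_shape))
        scales = [min(s, limit) for s in scales]
    return tuple(round(i * s) for i, s in zip(in_shape, scales))


not_larger = resized_shape((1280, 720), (640, 480), mode="not_larger")
not_smaller = resized_shape((640, 480), (1920, 1080), mode="not_smaller")
capped = resized_shape((1200, 600), (800, 800), mode="not_smaller",
                       max_size=(1400, 1400))
```

The three calls correspond to the worked examples in the mode and max_size descriptions: 640x360, 1920x1440 and 1400x700.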
-
class
nvidia.dali.ops.
WarpAffine
(**kwargs)¶ Applies an affine transformation to the images.
This operator supports volumetric data.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.NO_TYPE) –
Output data type.
If not set, the input type is used.
fill_value (float, optional, default = 0.0) –
Value used to fill areas that are outside the source image.
If a value is not specified, the source coordinates are clamped and the border pixel is repeated.
interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Type of interpolation used.
matrix (float or list of float, optional, default = []) –
Transform matrix (dst -> src).
With a list of values (M11, M12, M13, M21, M22, M23), this operation produces a new image by using the following formula:
dst(x,y) = src(M11 * x + M12 * y + M13, M21 * x + M22 * y + M23)
It is equivalent to OpenCV’s warpAffine operation with the WARP_INVERSE_MAP flag set.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
size (float or list of float, optional, default = []) –
Output size, in pixels/points.
Non-integer sizes are rounded to the nearest integer. The channel dimension should be excluded (for example, for RGB images, specify (480,640), not (480,640,3)).
-
__call__
(*inputs, **kwargs)¶ See the nvidia.dali.ops.WarpAffine() class for complete information.
- Keyword Arguments
matrix (TensorList of float, optional, default = []) –
Transform matrix (dst -> src).
With a list of values (M11, M12, M13, M21, M22, M23), this operation produces a new image by using the following formula:
dst(x,y) = src(M11 * x + M12 * y + M13, M21 * x + M22 * y + M23)
It is equivalent to OpenCV’s warpAffine operation with the WARP_INVERSE_MAP flag set.
size (TensorList of float, optional, default = []) –
Output size, in pixels/points.
Non-integer sizes are rounded to the nearest integer. The channel dimension should be excluded (for example, for RGB images, specify (480,640), not (480,640,3)).
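The dst -> src direction of the WarpAffine matrix is easy to get backwards, so here is a minimal pure-Python sketch of the formula for a single-channel image stored as a list of rows. It is an illustration only: the function name is hypothetical, it uses nearest-neighbor sampling, and it assumes that integer coordinates address pixels directly (DALI's own pixel-center convention may differ).

```python
import math


def warp_affine(src, matrix, size=None, fill_value=0):
    """Apply dst(x, y) = src(M11*x + M12*y + M13, M21*x + M22*y + M23)."""
    m11, m12, m13, m21, m22, m23 = matrix
    src_h, src_w = len(src), len(src[0])
    # size is (height, width), with the channel dimension excluded.
    dst_h, dst_w = size if size is not None else (src_h, src_w)
    dst = []
    for y in range(dst_h):
        row = []
        for x in range(dst_w):
            # Nearest-neighbor lookup of the source coordinate.
            sx = math.floor(m11 * x + m12 * y + m13 + 0.5)
            sy = math.floor(m21 * x + m22 * y + m23 + 0.5)
            inside = 0 <= sx < src_w and 0 <= sy < src_h
            row.append(src[sy][sx] if inside else fill_value)
        dst.append(row)
    return dst


image = [[1, 2, 3],
         [4, 5, 6]]
# M13 = 1 makes each destination pixel read from one column to the right,
# so the content moves one pixel to the left.
shifted = warp_affine(image, (1, 0, 1, 0, 1, 0))
```

Because the matrix maps destination coordinates to source coordinates, a positive M13 moves the picture content towards smaller x, which matches the WARP_INVERSE_MAP behavior mentioned above.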
-
class
nvidia.dali.ops.
Water
(**kwargs)¶ Performs a water augmentation, which makes the image appear to be underwater.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
ampl_x (float, optional, default = 10.0) – Amplitude of the wave in the x direction.
ampl_y (float, optional, default = 10.0) – Amplitude of the wave in the y direction.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
fill_value (float, optional, default = 0.0) – Color value that is used for padding.
freq_x (float, optional, default = 0.049087) – Frequency of the wave in the x direction.
freq_y (float, optional, default = 0.049087) – Frequency of the wave in the y direction.
interp_type (nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_NN) – Type of interpolation used.
mask (int, optional, default = 1) –
Determines whether to apply this augmentation to the input image.
Here are the values:
0: Do not apply this transformation.
1: Apply this transformation.
phase_x (float, optional, default = 0.0) – Phase of the wave in the x direction.
phase_y (float, optional, default = 0.0) – Phase of the wave in the y direction.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
-
__call__
(data, **kwargs)¶ Operator call to be used in graph definition.
- Parameters
data (TensorList ('HWC')) – Input to the operator.
- Keyword Arguments
mask (TensorList of int, optional, default = 1) –
Determines whether to apply this augmentation to the input image.
Here are the values:
0: Do not apply this transformation.
1: Apply this transformation.
-
class
nvidia.dali.plugin.pytorch.
TorchPythonFunction
(function, num_outputs=1, device='cpu', batch_processing=False, **kwargs)¶ Executes a function that is operating on Torch tensors.
This class is analogous to nvidia.dali.ops.PythonFunction() but the tensor data is handled as PyTorch tensors.
This operator allows sequence inputs and supports volumetric data.
This operator will not be optimized out of the graph.
- Supported backends
‘cpu’
‘gpu’
- Keyword Arguments
function (object) – Function object.
batch_processing (bool, optional, default = False) – Determines whether the function gets an entire batch as an input.
bytes_per_sample_hint (int, optional, default = 0) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
num_outputs (int, optional, default = 1) – Number of outputs.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
Arithmetic expressions¶
DALI allows you to use regular Python arithmetic operations in the define_graph() method on the values that are returned from invoking other operators.
The expressions that are used will be incorporated into the pipeline without needing to explicitly instantiate operators and will describe the element-wise operations on Tensors.
At least one of the inputs to the arithmetic expression must be returned by another DALI operator, that is, it must be a value of type nvidia.dali.pipeline.DataNode representing a batch of tensors. The other input can be nvidia.dali.types.Constant() or a regular Python value of type bool, int, or float. As the operations performed are element-wise, the shapes of all operands must match.
Note
If one of the operands is a batch of Tensors that represent scalars, the scalar values are broadcast to the other operand.
For details and examples, see the expressions tutorials.
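The element-wise rule and the scalar broadcasting note above can be illustrated without DALI. The following is a minimal plain-Python emulation, not DALI code: a batch is modeled as a list of samples, a sample is either a flat list of numbers or a single number (a scalar), and the helper names apply_to_sample and apply_to_batch are hypothetical.

```python
import operator
from numbers import Number


def apply_to_sample(op, a, b):
    """Apply op to one pair of samples; a scalar sample is broadcast."""
    a_scalar, b_scalar = isinstance(a, Number), isinstance(b, Number)
    if a_scalar and b_scalar:
        return op(a, b)
    if a_scalar:
        return [op(a, y) for y in b]
    if b_scalar:
        return [op(x, b) for x in a]
    if len(a) != len(b):
        raise ValueError("operand shapes must match")
    return [op(x, y) for x, y in zip(a, b)]


def apply_to_batch(op, lhs, rhs):
    """Apply op sample by sample; a plain Python number is reused for every sample."""
    if isinstance(lhs, Number) and isinstance(rhs, Number):
        raise TypeError("at least one operand must be a batch")
    size = len(rhs) if isinstance(lhs, Number) else len(lhs)
    lhs = [lhs] * size if isinstance(lhs, Number) else lhs
    rhs = [rhs] * size if isinstance(rhs, Number) else rhs
    return [apply_to_sample(op, a, b) for a, b in zip(lhs, rhs)]


images = [[1, 2, 3], [4, 5, 6]]      # batch of two 1-D samples
offsets = [[10, 20, 30], [0, 0, 0]]  # same shapes as images
gains = [2, 10]                      # batch of scalars, one per sample

summed = apply_to_batch(operator.add, images, offsets)  # matching shapes
scaled = apply_to_batch(operator.mul, images, gains)    # scalars are broadcast
halved = apply_to_batch(operator.truediv, images, 2)    # Python constant operand
```

In this emulation, operands whose per-sample shapes differ raise an error unless one of them is a scalar, which mirrors the shape requirement stated above.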
Supported arithmetic operations¶
Currently, DALI supports the following operations:
-
Unary arithmetic operators: +, -
Unary operators that implement __pos__(self) and __neg__(self). The result of a unary arithmetic operation always preserves the input type. Unary operators accept only TensorList inputs from other operators.
- Return type
TensorList of the same type
-
Binary arithmetic operations: +, -, *, /, //
Binary operators that implement __add__, __sub__, __mul__, __truediv__, and __floordiv__, respectively.
The result of an arithmetic operation between two operands is described below, with the exception of /, the __truediv__ operation, which always returns float32 or float64 type.
============  ============  ===========  ======================
Operand Type  Operand Type  Result Type  Additional Conditions
============  ============  ===========  ======================
T             T             T
floatX        T             floatX       where T is not a float
floatX        floatY        floatZ       where Z = max(X, Y)
intX          intY          intZ         where Z = max(X, Y)
uintX         uintY         uintZ        where Z = max(X, Y)
intX          uintY         int2Y        if X <= Y
intX          uintY         intX         if X > Y
============  ============  ===========  ======================
T stands for any one of the supported numerical types: bool, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32, and float64.
bool type is considered the smallest unsigned integer type and is treated as uint1 with respect to the table above.
Note
Type promotion is commutative.
Note
The only allowed arithmetic operation between two bool values is multiplication (*).
- Return type
TensorList of the type that is calculated based on the type promotion rules.
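The promotion rules can also be written as a small function, which makes the individual cases easier to check. This is a sketch that follows the rows of the promotion table and is not DALI's implementation. The promote function and the SUPPORTED mapping are hypothetical names, the / operation is not covered, and capping the int2Y result at 64 bits is an assumption that the table does not state.

```python
# Map every supported type name to (kind, bit width); bool is treated as uint1.
SUPPORTED = {"bool": ("uint", 1)}
for kind, widths in (("int", (8, 16, 32, 64)),
                     ("uint", (8, 16, 32, 64)),
                     ("float", (32, 64))):
    for width in widths:
        SUPPORTED[f"{kind}{width}"] = (kind, width)


def _name(kind, bits):
    """Turn (kind, bit width) back into a type name."""
    return "bool" if (kind, bits) == ("uint", 1) else f"{kind}{bits}"


def promote(lhs, rhs):
    """Result type of a binary arithmetic operation other than /."""
    (k1, b1), (k2, b2) = SUPPORTED[lhs], SUPPORTED[rhs]
    if "float" in (k1, k2):
        # floatX with a non-float keeps floatX; two floats give the wider one.
        bits = max(b for k, b in ((k1, b1), (k2, b2)) if k == "float")
        return _name("float", bits)
    if k1 == k2:
        # intX/intY and uintX/uintY: the wider of the two.
        return _name(k1, max(b1, b2))
    # Mixed signedness; swapping the operands gives the same answer.
    int_bits, uint_bits = (b1, b2) if k1 == "int" else (b2, b1)
    if int_bits > uint_bits:
        return _name("int", int_bits)
    # int2Y row; the cap at 64 bits is an assumption of this sketch.
    return _name("int", min(2 * uint_bits, 64))
```

For example, promote("int8", "uint8") gives "int16" under these rules, and promote("float32", "int64") gives "float32".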
-
Comparison operations: ==, !=, <, <=, >, >=
Comparison operations.
- Return type
TensorList of bool type.
-
Bitwise binary operations: &, |, ^
The bitwise binary operations follow the same type promotion rules as arithmetic binary operations, but their inputs are restricted to integral types (including bool).
Note
A bitwise operation can be applied to two boolean inputs. Those operations can be used to emulate element-wise logical operations on Tensors.
- Return type
TensorList of the type that is calculated based on the type promotion rules.
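The equivalence between bitwise operations on boolean inputs and element-wise logical operations can be checked in plain Python. The snippet below is an illustration only: Python lists of bool values stand in for boolean Tensors, and the variable names are hypothetical.

```python
in_range = [True, True, False, False]
is_labeled = [True, False, True, False]

pairs = list(zip(in_range, is_labeled))

both = [a & b for a, b in pairs]         # element-wise logical AND
either = [a | b for a, b in pairs]       # element-wise logical OR
exactly_one = [a ^ b for a, b in pairs]  # element-wise logical XOR
```

Each result is again a list of bool values, which is consistent with bool operands keeping the bool type under the promotion rules.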