nvidia.dali.fn.decoders.image_slice
nvidia.dali.fn.decoders.image_slice(*inputs, **kwargs)
Decodes images and extracts regions of interest.
The slice can be specified by providing the start and end coordinates, or the start coordinates and the shape of the slice. Both coordinates and shapes can be provided in absolute or relative terms.
The slice arguments can be specified by the following named arguments:
start: Slice start coordinates (absolute)
rel_start: Slice start coordinates (relative)
end: Slice end coordinates (absolute)
rel_end: Slice end coordinates (relative)
shape: Slice shape (absolute)
rel_shape: Slice shape (relative)
The slice can be configured by providing start and end coordinates or start and shape. Relative and absolute arguments can be mixed (for example, rel_start can be used with shape) as long as start and shape or end are uniquely defined.
Alternatively, two extra positional inputs can be provided, specifying anchor and shape. When using positional inputs, two extra boolean arguments, normalized_anchor and normalized_shape, can be used to specify the nature of the arguments provided. Using positional inputs for anchor and shape is incompatible with the named arguments specified above.
The slice arguments should provide as many dimensions as specified by the axis_names or axes arguments.
By default, the nvidia.dali.fn.decoders.image_slice() operator uses normalized coordinates and “WH” order for the slice arguments.
When possible, the operator uses the ROI decoding APIs (for example, libjpeg-turbo and nvJPEG) to optimize the decoding time and memory usage. When ROI decoding is not supported for a given image format, the operator decodes the entire image and crops the selected ROI.
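As an illustration of how the named arguments combine, the following is a minimal sketch in plain Python that resolves one axis of a slice to a pixel range. It is not DALI's implementation: the helper name resolve_axis is hypothetical, it requires a start plus exactly one of end or shape, and rounding relative values to the nearest pixel is an assumption, since the exact rounding rule is not stated here.

```python
def resolve_axis(extent, start=None, rel_start=None, end=None, rel_end=None,
                 shape=None, rel_shape=None):
    """Return (begin, end) in pixels for one axis whose size is `extent`.

    Hypothetical model of the named slice arguments, not DALI code.
    """
    def pick(name, absolute, relative):
        # Each quantity may be given in absolute or relative form, not both.
        if absolute is not None and relative is not None:
            raise ValueError(f"{name} and rel_{name} are mutually exclusive")
        if relative is not None:
            # Assumption: relative values are scaled by the extent and
            # rounded to the nearest integer (Python's round()).
            return round(relative * extent)
        return absolute

    begin = pick("start", start, rel_start)
    stop = pick("end", end, rel_end)
    size = pick("shape", shape, rel_shape)

    if begin is None:
        raise ValueError("this sketch requires a start coordinate")
    if (stop is None) == (size is None):
        raise ValueError("provide exactly one of end or shape")
    return (begin, stop) if size is None else (begin, begin + size)


# A 640x480 image; relative and absolute arguments are mixed per axis.
x_range = resolve_axis(640, rel_start=0.25, shape=200)
y_range = resolve_axis(480, start=40, rel_end=0.75)
print(x_range, y_range)  # (160, 360) (40, 360)
```

Passing both end and shape for the same axis raises ValueError in this sketch, which mirrors the requirement that the slice be uniquely defined.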
The output of the decoder is in the HWC layout.
Supported formats: JPG, BMP, PNG, TIFF, PNM, PPM, PGM, PBM, JPEG 2000, WebP.
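A typical use of the operator with named arguments might look like the sketch below. The function name build_roi_decode_pipeline, the image directory, and the batch and thread settings are assumptions chosen for illustration. DALI is imported inside the function so that the module can be imported on a machine without DALI; building and running the pipeline requires an installed nvidia-dali package and, for the “mixed” backend, a GPU.

```python
def build_roi_decode_pipeline(image_dir, batch_size=8, device="mixed"):
    """Build a pipeline that decodes only the central region of each image.

    Illustrative sketch; `image_dir` is expected to use the directory
    layout understood by fn.readers.file.
    """
    # Lazy import: requires the nvidia-dali package.
    from nvidia.dali import pipeline_def, fn

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=0)
    def roi_pipeline():
        encoded, labels = fn.readers.file(file_root=image_dir)
        rois = fn.decoders.image_slice(
            encoded,
            device=device,
            rel_start=[0.25, 0.25],  # (x, y), because axis_names is "WH"
            rel_shape=[0.5, 0.5],    # (width, height), relative to the image
            axis_names="WH",
        )
        return rois, labels

    return roi_pipeline()


# Usage (needs DALI and a dataset on disk):
#   pipe = build_roi_decode_pipeline("/path/to/images")
#   pipe.build()
#   rois, labels = pipe.run()
```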
Note
JPEG 2000 region-of-interest (ROI) decoding is not accelerated on the GPU, and will use a CPU implementation regardless of the selected backend. For a GPU-accelerated implementation, consider using separate decoders.image and slice operators.
Note
EXIF orientation metadata is disregarded.
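The JPEG 2000 note above suggests decoding and slicing in two steps. A minimal sketch of that arrangement follows; the function name build_split_decode_pipeline and the pipeline settings are assumptions, and the code needs an installed nvidia-dali package and a GPU to build and run. Unlike ROI decoding, this variant decodes the whole image before cropping.

```python
def build_split_decode_pipeline(image_dir, batch_size=8):
    """Decode full images on the GPU, then crop with a separate slice operator.

    Illustrative sketch of the two-operator alternative to image_slice.
    """
    # Lazy import: requires the nvidia-dali package.
    from nvidia.dali import pipeline_def, fn

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=0)
    def split_pipeline():
        encoded, labels = fn.readers.file(file_root=image_dir)
        # The full image is decoded first (HWC layout, GPU output).
        images = fn.decoders.image(encoded, device="mixed")
        # The crop is then taken from the decoded image.
        rois = fn.slice(
            images,
            rel_start=[0.25, 0.25],
            rel_shape=[0.5, 0.5],
            axis_names="WH",
        )
        return rois, labels

    return split_pipeline()
```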
- Supported backends
‘cpu’
‘mixed’
- Parameters
data (TensorList) – Batch that contains the input data.
anchor (1D TensorList of float or int, optional) –
Input that contains normalized or absolute coordinates for the starting point of the slice (x0, x1, x2, …).
Integer coordinates are interpreted as absolute coordinates, while float coordinates can be interpreted as absolute or relative coordinates, depending on the value of normalized_anchor.
shape (1D TensorList of float or int, optional) –
Input that contains normalized or absolute coordinates for the dimensions of the slice (s0, s1, s2, …).
Integer coordinates are interpreted as absolute coordinates, while float coordinates can be interpreted as absolute or relative coordinates, depending on the value of normalized_shape.
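The interpretation rule for the positional anchor and shape inputs can be modelled in a few lines of plain Python. This is a sketch of the rule as described above, not DALI code; the helper name to_absolute is hypothetical, and it covers only the int versus float distinction and the normalized flag.

```python
def to_absolute(values, extents, normalized=True):
    """Convert positional anchor or shape values to absolute coordinates.

    `values` and `extents` are given in the same axis order (for example
    "WH"). Hypothetical helper that mirrors the documented rule.
    """
    result = []
    for value, extent in zip(values, extents):
        if isinstance(value, int):
            # Integer inputs are always absolute; the flag is ignored.
            result.append(value)
        elif normalized:
            # Float inputs in [0.0, 1.0] are scaled by the image extent.
            result.append(value * extent)
        else:
            # Float inputs with the flag disabled are already absolute.
            result.append(value)
    return result


extents = [640, 480]  # width, height
print(to_absolute([0.25, 0.5], extents))                      # [160.0, 240.0]
print(to_absolute([100, 50], extents))                        # [100, 50]
print(to_absolute([100.0, 50.0], extents, normalized=False))  # [100.0, 50.0]
```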
- Keyword Arguments
affine (bool, optional, default = True) –
Applies only to the mixed backend type.
If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.
axes (int or list of int or TensorList of int, optional, default = [1, 0]) –
Order of dimensions used for the anchor and shape slice inputs as dimension indices.
Negative values are interpreted as counting dimensions from the back. Valid range: [-ndim, ndim-1], where ndim is the number of dimensions in the input data.
axis_names (layout str, optional, default = ‘WH’) –
Order of the dimensions used for the anchor and shape slice inputs, as described in layout.
If a value is provided, axis_names will have a higher priority than axes.
bytes_per_sample_hint (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
device_memory_padding (int, optional, default = 16777216) –
Applies only to the mixed backend type.
The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.
device_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the mixed backend type.
The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True and then copy the largest allocation value that was printed in the statistics.
end (int or list of int or TensorList of int, optional) –
End coordinates of the slice.
Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.
host_memory_padding (int, optional, default = 8388608) –
Applies only to the mixed backend type.
The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.
host_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the mixed backend type.
The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution. One way to find the ideal value is to do a complete run over the dataset with the memory_stats argument set to True, and then copy the largest allocation value that is printed in the statistics.
hw_decoder_load (float, optional, default = 0.65) –
The percentage of the image data to be processed by the HW JPEG decoder.
Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.
Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100
hybrid_huffman_threshold (int, optional, default = 1000000) –
Applies only to the mixed backend type.
Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.
Note
The hybrid Huffman decoder still largely uses the CPU.
memory_stats (bool, optional, default = False) –
Applies only to the mixed backend type.
Prints debug information about nvJPEG allocations. The information about the largest allocation might be useful to determine suitable values for device_memory_padding and host_memory_padding for a dataset.
Note
The statistics are global for the entire process, not per operator instance, and include the allocations made during construction if the padding hints are non-zero.
normalized_anchor (bool, optional, default = True) –
Determines whether the anchor positional input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.
Note
This argument is only relevant when anchor data type is float. For integer types, the coordinates are always absolute.
normalized_shape (bool, optional, default = True) –
Determines whether the shape positional input should be interpreted as normalized (range [0.0, 1.0]) or as absolute coordinates.
Note
This argument is only relevant when shape data type is float. For integer types, the coordinates are always absolute.
output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) –
The color space of the output image.
Note: When decoding to YCbCr, the image will be decoded to RGB and then converted to YCbCr, following the YCbCr definition from ITU-R BT.601.
preallocate_height_hint (int, optional, default = 0) –
Image height hint.
Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.
The hint is used to preallocate memory for the HW JPEG decoder.
preallocate_width_hint (int, optional, default = 0) –
Image width hint.
Applies only to the mixed backend type in NVIDIA Ampere GPU and newer architecture.
The hint is used to preallocate memory for the HW JPEG decoder.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
rel_end (float or list of float or TensorList of float, optional) –
End relative coordinates of the slice (range [0.0 - 1.0]).
Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.
rel_shape (float or list of float or TensorList of float, optional) –
Relative shape of the slice (range [0.0 - 1.0]).
Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.
rel_start (float or list of float or TensorList of float, optional) –
Start relative coordinates of the slice (range [0.0 - 1.0]).
Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shape (int or list of int or TensorList of int, optional) –
Shape of the slice.
Note: Providing named arguments start, end, shape, rel_start, rel_end, rel_shape is incompatible with providing positional inputs anchor and shape.
split_stages (bool) –
Warning
The argument split_stages is no longer used and will be removed in a future release.
start (int or list of int or TensorList of int, optional) –
Start coordinates of the slice.
Note: Providing named arguments start/end or start/shape is incompatible with providing positional inputs anchor and shape.
use_chunk_allocator (bool) –
Warning
The argument use_chunk_allocator is no longer used and will be removed in a future release.
use_fast_idct (bool, optional, default = False) –
Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.
According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.
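The relationship between the axes and axis_names keyword arguments can be illustrated with a small plain-Python model. The helper names below are hypothetical and the code is a sketch of the documented rules rather than DALI's implementation. It assumes the decoded image has the HWC layout stated earlier, so ndim is 3.

```python
def axes_from_names(axis_names, layout="HWC"):
    """Map axis names such as "WH" to dimension indices in `layout`."""
    return [layout.index(name) for name in axis_names]


def normalize_axes(axes, ndim=3):
    """Resolve negative indices, which count dimensions from the back."""
    resolved = []
    for axis in axes:
        if not -ndim <= axis <= ndim - 1:
            raise ValueError(f"axis {axis} is outside [-{ndim}, {ndim - 1}]")
        resolved.append(axis + ndim if axis < 0 else axis)
    return resolved


# "WH" in an HWC image refers to dimension 1 (width), then dimension 0
# (height), which matches the default axes value of [1, 0].
print(axes_from_names("WH"))     # [1, 0]
print(normalize_axes([-2, -3]))  # [1, 0]
```

In this model the default axis_names of ‘WH’ and the default axes of [1, 0] describe the same order, which is why slice arguments are given as (x, y) or (width, height) unless one of the two arguments is changed.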