nvidia.dali.fn.experimental.decoders.image_crop
- nvidia.dali.fn.experimental.decoders.image_crop(__input, /, *, adjust_orientation=True, affine=True, bytes_per_sample_hint=[0], crop=None, crop_d=0.0, crop_h=0.0, crop_pos_x=0.5, crop_pos_y=0.5, crop_pos_z=0.5, crop_w=0.0, device_memory_padding=16777216, device_memory_padding_jpeg2k=0, dtype=DALIDataType.UINT8, host_memory_padding=8388608, host_memory_padding_jpeg2k=0, hw_decoder_load=0.9, hybrid_huffman_threshold=1000000, jpeg_fancy_upsampling=False, output_type=DALIImageType.RGB, preallocate_height_hint=0, preallocate_width_hint=0, preserve=False, rounding='round', seed=-1, use_fast_idct=False, device=None, name=None)
Decodes images and extracts regions-of-interest (ROI) that are specified by fixed window dimensions and variable anchors.
Supported formats: JPEG, JPEG 2000, TIFF, PNG, BMP, PNM, PPM, PGM, PBM, WebP.
The output of the decoder is in HWC layout.
The implementation uses NVIDIA nvImageCodec to decode images. nvImageCodec must be installed separately: see https://developer.nvidia.com/nvimgcodec-downloads, or run pip install nvidia-nvimgcodec-cu${CUDA_MAJOR_VERSION}, where CUDA_MAJOR_VERSION is your CUDA major version (for example, 12).
When possible, the operator uses ROI decoding, which reduces decoding time and memory consumption.
Note
GPU-accelerated decoding is only available for a subset of the image formats (JPEG and JPEG 2000). For other formats, a CPU-based decoder is used. For JPEG, a dedicated hardware decoder is used when available.
Note
WebP decoding currently only supports the simple file format (lossy and lossless compression). For details on the different WebP file formats, see https://developers.google.com/speed/webp/docs/riff_container
- Supported backends
‘cpu’
‘mixed’
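A minimal usage sketch follows, assuming DALI and nvImageCodec are installed and a GPU is available. The function name build_pipeline, the 224x224 crop size, and the batch and thread settings are illustrative choices, not values prescribed by this page. The operator is called with the mixed backend, using only arguments listed in the signature above.

```python
# Illustrative crop settings (assumed values): a centered 224x224 window.
CROP_KWARGS = {
    "crop_h": 224,
    "crop_w": 224,
    "crop_pos_x": 0.5,
    "crop_pos_y": 0.5,
}


def build_pipeline(file_root, batch_size=8, num_threads=2, device_id=0):
    """Build a DALI pipeline that reads encoded files and decodes a fixed crop."""
    # Imported lazily so this module loads even where DALI is not installed.
    from nvidia.dali import fn, pipeline_def

    @pipeline_def(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
    def decode_and_crop():
        # fn.readers.file yields the encoded file contents and integer labels.
        encoded, labels = fn.readers.file(file_root=file_root)
        # device="mixed" takes CPU input and produces GPU output.
        images = fn.experimental.decoders.image_crop(
            encoded, device="mixed", **CROP_KWARGS
        )
        return images, labels

    return decode_and_crop()
```

Calling build_pipeline("/path/to/images") (a placeholder path) returns a pipeline object; its run() method would then produce batches of cropped images in HWC layout together with their labels.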
- Parameters:
__input (TensorList) – Input to the operator.
- Keyword Arguments:
adjust_orientation (bool, optional, default = True) – Uses EXIF orientation metadata to rectify the images.
affine (bool, optional, default = True) –
Applies only to the mixed backend type.
If set to True, each thread in the internal thread pool will be tied to a specific CPU core. Otherwise, the threads can be reassigned to any CPU core by the operating system.
bytes_per_sample_hint (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
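As a rough illustration of how such a hint might be chosen, the sketch below computes the size of one decoded sample. It assumes a fixed 224x224 crop, three channels (RGB), and one byte per element (UINT8); the helper name output_bytes_per_sample is hypothetical and not part of DALI.

```python
def output_bytes_per_sample(height, width, channels=3, bytes_per_element=1):
    """Size in bytes of one decoded sample in HWC layout."""
    return height * width * channels * bytes_per_element


# Assumed example: 224x224 RGB crop stored as UINT8.
hint = output_bytes_per_sample(224, 224)
print(hint)  # 150528
```

The resulting value could then be passed as bytes_per_sample_hint=hint when calling the operator.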
crop (float or list of float or TensorList of float, optional) –
Shape of the cropped image, specified as a list of values (for example, (crop_H, crop_W) for the 2D crop and (crop_D, crop_H, crop_W) for the volumetric crop).
Providing the crop argument is incompatible with providing separate arguments such as crop_d, crop_h, and crop_w.
crop_d (float or TensorList of float, optional, default = 0.0) –
Applies only to volumetric inputs; cropping window depth (in voxels).
crop_w, crop_h, and crop_d must be specified together. Providing values for crop_w, crop_h, and crop_d is incompatible with providing the fixed crop window dimensions (argument crop).
crop_h (float or TensorList of float, optional, default = 0.0) –
Cropping window height (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
crop_pos_x (float or TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) horizontal position of the cropping window (upper left corner).
The actual position is calculated as crop_x = crop_x_norm * (W - crop_W), where crop_x_norm is the normalized position, W is the width of the image, and crop_W is the width of the cropping window.
See the rounding argument for more details on how crop_x is converted to an integral value.
crop_pos_y (float or TensorList of float, optional, default = 0.5) –
Normalized (0.0 - 1.0) vertical position of the start of the cropping window (typically, the upper left corner).
The actual position is calculated as crop_y = crop_y_norm * (H - crop_H), where crop_y_norm is the normalized position, H is the height of the image, and crop_H is the height of the cropping window.
See the rounding argument for more details on how crop_y is converted to an integral value.
crop_pos_z (float or TensorList of float, optional, default = 0.5) –
Applies only to volumetric inputs.
Normalized (0.0 - 1.0) normal position of the cropping window (front plane). The actual position is calculated as crop_z = crop_z_norm * (D - crop_D), where crop_z_norm is the normalized position, D is the depth of the image, and crop_D is the depth of the cropping window.
See the rounding argument for more details on how crop_z is converted to an integral value.
crop_w (float or TensorList of float, optional, default = 0.0) –
Cropping window width (in pixels).
Providing values for crop_w and crop_h is incompatible with providing fixed crop window dimensions (argument crop).
device_memory_padding (int, optional, default = 16777216) –
Applies only to the mixed backend type.
The padding for nvJPEG’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates one device buffer of the requested size per thread. If the value is correctly selected, no additional allocations will occur during the pipeline execution.
device_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the mixed backend type.
The padding for nvJPEG2k’s device memory allocations, in bytes. This parameter helps to avoid reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
Output data type of the image.
Values will be converted to the dynamic range of the requested type.
host_memory_padding (int, optional, default = 8388608) –
Applies only to the mixed backend type.
The padding for nvJPEG’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates two (because of double-buffering) host-pinned buffers of the requested size per thread. If selected correctly, no additional allocations will occur during the pipeline execution.
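To make the per-thread preallocation concrete, the sketch below totals the nvJPEG buffers described for device_memory_padding (one device buffer per thread) and host_memory_padding (two host-pinned buffers per thread). It assumes the number of decoder threads equals the num_threads value given to the pipeline; the helper name preallocated_bytes is hypothetical.

```python
def preallocated_bytes(num_threads,
                       device_memory_padding=16777216,
                       host_memory_padding=8388608):
    """Return (device_total, host_total) in bytes for the nvJPEG paddings."""
    # One device buffer of the requested size per thread.
    device_total = num_threads * device_memory_padding
    # Two host-pinned buffers per thread because of double-buffering.
    host_total = 2 * num_threads * host_memory_padding
    return device_total, host_total


# Assumed example: 4 threads with the default padding values.
device_total, host_total = preallocated_bytes(4)
print(device_total, host_total)  # 67108864 67108864
```

With the defaults and 4 threads, both totals come to 64 MiB, which gives a sense of the memory cost of raising either padding.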
host_memory_padding_jpeg2k (int, optional, default = 0) –
Applies only to the mixed backend type.
The padding for nvJPEG2k’s host memory allocations, in bytes. This parameter helps to prevent the reallocation in nvJPEG2k when a larger image is encountered, and the internal buffer needs to be reallocated to decode the image.
If a value greater than 0 is provided, the operator preallocates the necessary number of buffers according to the hint provided. If the value is correctly selected, no additional allocations will occur during the pipeline execution.
hw_decoder_load (float, optional, default = 0.9) –
The percentage of the image data to be processed by the HW JPEG decoder.
Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.
Determines the percentage of the workload that will be offloaded to the hardware decoder, if available. The optimal workload depends on the number of threads that are provided to the DALI pipeline and should be found empirically. More details can be found at https://developer.nvidia.com/blog/loading-data-fast-with-dali-and-new-jpeg-decoder-in-a100
hybrid_huffman_threshold (int, optional, default = 1000000) –
Applies only to the mixed backend type.
Images with a total number of pixels (height * width) that is higher than this threshold will use the nvJPEG hybrid Huffman decoder. Images that have fewer pixels will use the nvJPEG host-side Huffman decoder.
Note
Hybrid Huffman decoder still largely uses the CPU.
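The threshold rule for hybrid_huffman_threshold can be sketched as a simple comparison. This only restates the documented condition (height * width strictly higher than the threshold); it is not the decoder's actual code, and the helper name uses_hybrid_huffman is hypothetical.

```python
def uses_hybrid_huffman(height, width, hybrid_huffman_threshold=1000000):
    """True if the image would go to the nvJPEG hybrid Huffman decoder."""
    # Images at or below the threshold use the host-side Huffman decoder.
    return height * width > hybrid_huffman_threshold


print(uses_hybrid_huffman(1000, 1000))  # False: exactly at the threshold
print(uses_hybrid_huffman(1080, 1920))  # True: 2073600 pixels
```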
jpeg_fancy_upsampling (bool, optional, default = False) –
Makes the mixed backend use the same chroma upsampling approach as the cpu one.
The option corresponds to the JPEG fancy upsampling available in libjpeg-turbo or ImageMagick.
memory_stats –
output_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) –
The color space of the output image.
Note: When decoding to YCbCr, the image will be decoded to RGB and then converted to YCbCr, following the YCbCr definition from ITU-R BT.601.
preallocate_height_hint (int, optional, default = 0) –
Image height hint.
Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.
The hint is used to preallocate memory for the HW JPEG decoder.
preallocate_width_hint (int, optional, default = 0) –
Image width hint.
Applies only to the mixed backend type in NVIDIA Ampere GPU architecture.
The hint is used to preallocate memory for the HW JPEG decoder.
preserve (bool, optional, default = False) –
Prevents the operator from being removed from the graph even if its outputs are not used.
rounding (str, optional, default = ‘round’) –
Determines the rounding function used to convert the starting coordinate of the window to an integral value (see crop_pos_x, crop_pos_y, crop_pos_z).
Possible values are:
"round" - Rounds to the nearest integer value, with halfway cases rounded away from zero.
"truncate" - Discards the fractional part of the number (truncates towards zero).
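The anchor formula given for crop_pos_x, crop_pos_y, and crop_pos_z, combined with the two rounding modes, can be sketched in plain Python as below. This follows the documented formula and rounding descriptions rather than the operator's source code; the helper name crop_anchor is hypothetical. Python's built-in round() is avoided because it rounds halfway cases to the nearest even integer, not away from zero.

```python
import math


def crop_anchor(pos_norm, image_extent, crop_extent, rounding="round"):
    """Integral start coordinate of the crop window along one axis."""
    # Documented formula: crop_x = crop_x_norm * (W - crop_W)
    pos = pos_norm * (image_extent - crop_extent)
    if rounding == "round":
        # Nearest integer, with halfway cases rounded away from zero.
        return int(math.copysign(math.floor(abs(pos) + 0.5), pos))
    if rounding == "truncate":
        # Discard the fractional part (truncate towards zero).
        return math.trunc(pos)
    raise ValueError(f"unsupported rounding mode: {rounding!r}")


# Assumed example: image width 101, crop width 50, centered window.
print(crop_anchor(0.5, 101, 50, "round"))     # 26 (25.5 rounds away from zero)
print(crop_anchor(0.5, 101, 50, "truncate"))  # 25
```

A normalized position of 0.0 places the window at the start of the axis, and 1.0 places it flush with the far edge, for example crop_anchor(1.0, 640, 224) gives 416.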
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
split_stages (bool, optional, default = False) –
Warning
The argument split_stages is now deprecated and its usage is discouraged.
use_chunk_allocator (bool, optional, default = False) –
Warning
The argument use_chunk_allocator is now deprecated and its usage is discouraged.
use_fast_idct (bool, optional, default = False) –
Enables fast IDCT in the libjpeg-turbo based CPU decoder, used when device is set to “cpu” or when it is set to “mixed” but the particular image cannot be handled by the GPU implementation.
According to the libjpeg-turbo documentation, decompression performance is improved by up to 14% with little reduction in quality.