nvidia.dali.fn.random_bbox_crop

nvidia.dali.fn.random_bbox_crop(*inputs, **kwargs)

Applies a prospective random crop to an image coordinate space while keeping the bounding boxes, and optionally labels, consistent.
This means that after applying the random crop operator to the image coordinate space, the bounding boxes will be adjusted or filtered out to match the cropped ROI. The applied random crop operation is constrained by the arguments that are provided to the operator.
The cropping window candidates are randomly selected until one matches the overlap restrictions that are specified by the thresholds argument. The thresholds values represent a minimum overlap metric that is specified by threshold_type, such as the intersection-over-union of the cropping window and the bounding boxes, or the relative overlap as a ratio of the intersection area and the bounding box area.

Additionally, if allow_no_crop is True, the cropping may be skipped entirely as one of the valid results of the operator.

The following modes of a random crop are available:
- Randomly shaped window, which is randomly placed in the original input space. The random crop window dimensions are selected to satisfy the aspect ratio and relative area restrictions. If input_shape is provided, it will be taken into account for the aspect ratio range check. Otherwise, the aspect ratios are calculated in relative terms. In other words, without input_shape, an aspect ratio of 1.0 is equivalent to the aspect ratio of the input image.
- Fixed size window, which is randomly placed in the original input space. The random crop window dimensions are taken from the crop_shape argument and the anchor is randomly selected. When providing crop_shape, input_shape is also required (these dimensions are required to scale the output bounding boxes).
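The two modes can be illustrated with a minimal pure-Python sketch. It is not DALI's implementation: the helper names random_shaped_window and fixed_size_window, the uniform sampling, the max_tries bound, and the reading of scaling as a bound on each relative side length are all assumptions made for illustration.

```python
import random


def random_shaped_window(rng, scaling=(0.3, 1.0), aspect_ratio=(0.5, 2.0), max_tries=1000):
    """Hypothetical helper: sample a relative 2D crop window (first mode).

    Assumption: each relative side length must lie in `scaling`, and the
    relative width/height ratio must lie in `aspect_ratio`.
    """
    s_min, s_max = scaling
    ar_min, ar_max = aspect_ratio
    for _ in range(max_tries):
        w = rng.uniform(s_min, s_max)
        ratio = rng.uniform(ar_min, ar_max)  # desired w / h
        h = w / ratio
        if s_min <= h <= s_max:
            # Place the window so that it stays inside the [0, 1] image space.
            anchor = (rng.uniform(0.0, 1.0 - w), rng.uniform(0.0, 1.0 - h))
            return anchor, (w, h)
    raise ValueError("scaling and aspect_ratio ranges look incompatible")


def fixed_size_window(rng, crop_shape, input_shape):
    """Hypothetical helper: fixed-size window at a random anchor (second mode).

    Returns absolute coordinates as floats, which is why input_shape is needed.
    """
    anchor = tuple(
        rng.uniform(0.0, float(full - crop))
        for crop, full in zip(crop_shape, input_shape)
    )
    shape = tuple(float(c) for c in crop_shape)
    return anchor, shape


rng = random.Random(1234)
print(random_shaped_window(rng))                      # relative anchor and shape
print(fixed_size_window(rng, (300, 200), (640, 480)))  # absolute anchor and shape
```

The first helper returns relative coordinates and the second returns absolute ones, which mirrors the distinction the operator makes between the two modes.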
The num_attempts argument can be used to control the maximum number of attempts to produce a valid crop to match a minimum overlap metric value from thresholds.

Warning
When allow_no_crop is False and thresholds does not contain 0.0, the operator might continue to loop for a long time unless you increase the num_attempts value.

Inputs: 0: bboxes, (1: labels)
The first input, bboxes, refers to the bounding boxes that are provided as a two-dimensional tensor where the first dimension refers to the index of the bounding box, and the second dimension refers to the index of the coordinate.

The coordinates are relative to the original image dimensions (that is, in the range [0.0, 1.0]) and represent the start and, depending on the value of bbox_layout, either the end of the region or its shape. For example, bbox_layout=“xyXY” means the bounding box coordinates follow the start_x, start_y, end_x, and end_y order, and bbox_layout=“xyWH” indicates that the order is start_x, start_y, width, and height. See the bbox_layout argument description for more information.

An optional input labels can be provided, representing the labels that are associated with each of the bounding boxes.

Outputs: 0: anchor, 1: shape, 2: bboxes (, 3: labels, 4: bboxes_indices)
The resulting crop parameters are provided as two separate outputs, anchor and shape, that can be fed directly to the nvidia.dali.fn.slice() operator to complete the cropping of the original image. anchor and shape contain the starting coordinates and dimensions for the crop in the [x, y, (z)] and [w, h, (d)] formats, respectively. The coordinates can be represented in absolute or relative terms, and the representation depends on whether the fixed crop_shape was used.

Note
Both anchor and shape are returned as float, even if they represent absolute coordinates because the crop_shape argument was provided. In that case, for them to be interpreted correctly by nvidia.dali.fn.slice(), normalized_anchor and normalized_shape should be set to False.

The third output contains the bounding boxes, after filtering out the ones with a centroid outside of the cropping window, and with the coordinates mapped to the new coordinate space.
The next output is optional, and it represents the labels associated with the filtered bounding boxes. The output will be present if a labels input was provided.
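The centroid filter and the coordinate remapping described above can be sketched in plain Python. This is an illustration only, not the operator's code: the function names crop_bboxes and remap are hypothetical, the boxes are assumed to be 2D in the relative “xyXY” layout, and clamping the remapped coordinates to [0.0, 1.0] is an assumption.

```python
def remap(value, start, extent):
    """Map a coordinate into the crop window's space and clamp it to [0, 1]."""
    return min(max((value - start) / extent, 0.0), 1.0)


def crop_bboxes(bboxes, labels, anchor, shape):
    """Hypothetical helper: keep boxes whose centroid lies inside the window.

    bboxes: list of (x0, y0, x1, y1) in relative coordinates.
    anchor, shape: relative (x, y) start and (w, h) size of the crop window.
    Returns the remapped boxes, their labels, and their original indices.
    """
    ax, ay = anchor
    w, h = shape
    out_boxes, out_labels, out_indices = [], [], []
    for index, ((x0, y0, x1, y1), label) in enumerate(zip(bboxes, labels)):
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        inside = ax <= cx <= ax + w and ay <= cy <= ay + h
        if not inside:
            continue  # centroid outside the window: the box is dropped
        out_boxes.append(
            (remap(x0, ax, w), remap(y0, ay, h), remap(x1, ax, w), remap(y1, ay, h))
        )
        out_labels.append(label)
        out_indices.append(index)
    return out_boxes, out_labels, out_indices


boxes = [(0.0, 0.0, 0.25, 0.25), (0.625, 0.625, 0.875, 0.875)]
print(crop_bboxes(boxes, [1, 2], anchor=(0.5, 0.5), shape=(0.5, 0.5)))
# prints ([(0.25, 0.25, 0.75, 0.75)], [2], [1])
```

The third returned list plays the role of the optional indices output: it records which input boxes survived the filter.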
The last output, also optional, corresponds to the original indices of the bounding boxes that passed the centroid filter and are present in the output. This output will be present if the option output_bbox_indices is set to True.

- Supported backends
‘cpu’
- Parameters
boxes (2D TensorList of float) – Relative coordinates of the bounding boxes that are represented as a 2D tensor, where the first dimension refers to the index of the bounding box, and the second dimension refers to the index of the coordinate.
labels (1D TensorList of integers, optional) – Labels that are associated with each of the bounding boxes.
- Keyword Arguments
all_boxes_above_threshold (bool, optional, default = True) –
If set to True, all bounding boxes in a sample should overlap with the cropping window as specified by thresholds. If the bounding boxes do not overlap, the cropping window is considered to be invalid. If set to False, and at least one bounding box overlaps the window, the window is considered to be valid.
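As a rough illustration of this check, the sketch below computes the two overlap metrics named by threshold_type and applies the all-boxes or any-box rule. The function names are hypothetical, boxes and windows are assumed to be 2D in the “xyXY” layout, and treating a metric equal to the threshold as passing is an assumption.

```python
def area(box):
    """Area of an (x0, y0, x1, y1) box; degenerate boxes have zero area."""
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def intersection_area(a, b):
    """Area shared by two (x0, y0, x1, y1) boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def overlap_metric(window, box, threshold_type="iou"):
    """Hypothetical helper: IoU, or the fraction of the box inside the window."""
    inter = intersection_area(window, box)
    if threshold_type == "iou":
        union = area(window) + area(box) - inter
        return inter / union if union > 0.0 else 0.0
    if threshold_type == "overlap":
        return inter / area(box) if area(box) > 0.0 else 0.0
    raise ValueError(f"unknown threshold_type: {threshold_type!r}")


def window_is_valid(window, boxes, threshold, threshold_type="iou",
                    all_boxes_above_threshold=True):
    """Apply the all-boxes (True) or at-least-one-box (False) rule."""
    checks = [overlap_metric(window, b, threshold_type) >= threshold for b in boxes]
    return all(checks) if all_boxes_above_threshold else any(checks)


window = (0.0, 0.0, 0.5, 0.5)
boxes = [(0.0, 0.0, 0.25, 0.25), (0.75, 0.75, 1.0, 1.0)]
print(window_is_valid(window, boxes, 0.2, all_boxes_above_threshold=True))   # False
print(window_is_valid(window, boxes, 0.2, all_boxes_above_threshold=False))  # True
```

In this example the first box has an IoU of 0.25 with the window and the second has none, so the window passes only when a single overlapping box is enough.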
allow_no_crop (bool, optional, default = True) – If set to True, one of the possible outcomes of the random process will be to not crop, as if the outcome was one more thresholds value from which to choose.
aspect_ratio (float or list of float, optional, default = [1.0, 1.0]) –
Valid range of aspect ratio of the cropping windows.
This parameter can be specified as either two values (min, max) or six values (three pairs), depending on the dimensionality of the input.
- For 2D bounding boxes, one range of valid aspect ratios (x/y) should be provided (e.g. [min_xy, max_xy]).
- For 3D bounding boxes, three separate aspect ratio ranges may be specified, for x/y, x/z and y/z pairs of dimensions. They are provided in the following order: [min_xy, max_xy, min_xz, max_xz, min_yz, max_yz]. Alternatively, if only one aspect ratio range is provided, it will be used for all three pairs of dimensions.
The value for min should be greater than 0.0, and min should be less than or equal to the max value. By default, square windows are generated.
Note
Providing aspect_ratio and scaling is incompatible with explicitly specifying crop_shape.
Note
If input_shape is provided, it will be taken into account for the calculation of the cropping window aspect ratio. Otherwise, the aspect ratio ranges are relative to the image dimensions. In other words, when input_shape is not specified, an aspect ratio of 1.0 is equivalent to the original aspect ratio of the image.
bbox_layout (layout str, optional, default = ‘’) –
Determines the meaning of the coordinates of the bounding boxes.
The value of this argument is a string containing the following characters:
x (horizontal start anchor), y (vertical start anchor), z (depthwise start anchor), X (horizontal end anchor), Y (vertical end anchor), Z (depthwise end anchor), W (width), H (height), D (depth).
Note
If this value is left empty, depending on the number of dimensions, “xyXY” or “xyzXYZ” is assumed.
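To make the layout characters concrete, the following sketch converts a box given in any such layout to the start/end form (“xyXY” or “xyzXYZ”). The function name to_start_end is hypothetical and the code only illustrates how the characters are read; it is not how the operator parses the argument.

```python
def to_start_end(coords, bbox_layout=""):
    """Hypothetical helper: convert a box to start/end (xyXY or xyzXYZ) order."""
    if not bbox_layout:
        # Empty layout: assume the default for the number of coordinates.
        bbox_layout = {4: "xyXY", 6: "xyzXYZ"}[len(coords)]
    if len(coords) != len(bbox_layout):
        raise ValueError("layout length must match the number of coordinates")
    values = dict(zip(bbox_layout, coords))
    starts, ends = [], []
    # Each axis has a start character and either an end or an extent character.
    for start_key, end_key, extent_key in (("x", "X", "W"), ("y", "Y", "H"), ("z", "Z", "D")):
        if start_key not in values:
            continue  # axis not present (for example, z in a 2D box)
        start = values[start_key]
        if end_key in values:
            end = values[end_key]
        elif extent_key in values:
            end = start + values[extent_key]
        else:
            raise ValueError(f"no end or extent given for axis {start_key!r}")
        starts.append(start)
        ends.append(end)
    return tuple(starts + ends)


print(to_start_end((0.25, 0.25, 0.5, 0.25), "xyWH"))  # (0.25, 0.25, 0.75, 0.5)
print(to_start_end((0.1, 0.2, 0.3, 0.4)))             # (0.1, 0.2, 0.3, 0.4)
```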
bytes_per_sample_hint (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
crop_shape (int or list of int or TensorList of int, optional, default = []) –
If provided, the random crop window dimensions will be fixed to this shape.
The order of dimensions is determined by the layout provided in shape_layout.
Note
When providing crop_shape, input_shape should be provided as well. Providing explicit crop_shape is incompatible with using scaling and aspect_ratio arguments.
input_shape (int or list of int or TensorList of int, optional, default = []) –
Specifies the shape of the original input image.
The order of dimensions is determined by the layout that is provided in shape_layout.
ltrb (bool, optional, default = True) –
If set to True, bboxes are returned as [left, top, right, bottom]; otherwise they are provided as [left, top, width, height].
Warning
This argument has been deprecated. To specify the bbox encoding, use bbox_layout instead. For example, ltrb=True is equal to bbox_layout=“xyXY”, and ltrb=False corresponds to bbox_layout=“xyWH”.
num_attempts (int, optional, default = 1) –
Number of attempts to get a crop window that matches the aspect_ratio and a selected value from thresholds.
After every num_attempts attempts, a different threshold will be picked, until the total number of attempts reaches total_num_attempts (if provided), or otherwise indefinitely.
output_bbox_indices (bool, optional, default = False) – If set to True, an extra output will be returned, containing the original indices of the bounding boxes that passed the centroid filter and are present in the output bounding boxes.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
scaling (float or list of float, optional, default = [1.0, 1.0]) –
Range [min, max] for the crop size with respect to the original image dimensions.
The values of min and max must satisfy the condition 0.0 <= min <= max.
Note
Providing aspect_ratio and scaling is incompatible with explicitly specifying the crop_shape value.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.
shape_layout (layout str, optional, default = ‘’) –
Determines the meaning of the dimensions provided in crop_shape and input_shape.
The values are:
- W (width)
- H (height)
- D (depth)
Note
If left empty, depending on the number of dimensions, "WH" or "WHD" will be assumed.
threshold_type (str, optional, default = ‘iou’) –
Determines the meaning of thresholds.
By default, thresholds refers to the intersection-over-union (IoU) of the bounding boxes with respect to the cropping window. Alternatively, threshold_type can be set to “overlap” to specify the fraction (by area) of the bounding box that will fall inside the crop window. For example, a threshold value of 1.0 means the entire bounding box must be contained in the resulting cropping window.
thresholds (float or list of float, optional, default = [0.0]) –
Minimum IoU or a different metric, if specified by threshold_type, of the bounding boxes with respect to the cropping window.
Each sample randomly selects one of the thresholds, and the operator will make up to the specified number of attempts to produce a random crop window that has the selected metric above that threshold. See num_attempts for more information about configuring the number of attempts.
total_num_attempts (int, optional, default = -1) –
If provided, it indicates the total maximum number of attempts to get a crop window that matches the aspect_ratio and any selected value from thresholds.
After total_num_attempts attempts, the best candidate will be selected.
If this value is not specified, the crop search will continue indefinitely until a valid crop is found.
Warning
If you do not provide a total_num_attempts value, this can result in an infinite loop if the conditions imposed by the arguments cannot be satisfied.
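The interaction between thresholds, num_attempts, total_num_attempts, and allow_no_crop can be summarized with a small control-flow sketch. It is a simplified reading of the descriptions above rather than the operator's actual algorithm: the function search_crop, its propose and score callbacks, and the returned status strings are all hypothetical.

```python
import math
import random


def search_crop(rng, thresholds, propose, score, num_attempts=1,
                total_num_attempts=-1, allow_no_crop=True):
    """Hypothetical sketch of the crop search loop.

    propose(rng) returns a candidate window; score(window) returns its
    overlap metric. A negative total_num_attempts means no overall limit.
    """
    limit = total_num_attempts if total_num_attempts >= 0 else math.inf
    # "No crop" is treated as one more option next to the threshold values.
    options = list(thresholds) + ([None] if allow_no_crop else [])
    best, best_score, total = None, -math.inf, 0
    while total < limit:
        threshold = rng.choice(options)
        if threshold is None:
            return "no_crop", None
        for _ in range(num_attempts):
            if total >= limit:
                break
            window = propose(rng)
            total += 1
            value = score(window)
            if value >= threshold:
                return "crop", window
            if value > best_score:
                best, best_score = window, value
    # The overall limit was reached: fall back to the best candidate seen.
    return "best_effort", best


# Usage with toy callbacks: windows are random numbers scored by their value.
rng = random.Random(7)
print(search_crop(rng, [0.3, 0.9], lambda r: r.random(), lambda w: w,
                  num_attempts=5, total_num_attempts=50, allow_no_crop=False))
```

With total_num_attempts left at its default, the outer loop in this sketch has no exit other than success, which is the situation the warning above describes.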