Applies a prospective random crop to an image coordinate space while keeping the bounding boxes, and optionally labels, consistent.
This means that after applying the random crop operator to the image coordinate space, the bounding boxes will be adjusted or filtered out to match the cropped ROI. The applied random crop operation is constrained by the arguments that are provided to the operator.
The cropping window candidates are randomly selected until one matches the overlap restrictions that are specified by the thresholds argument. The thresholds values represent a minimum overlap metric that is specified by threshold_type, such as the intersection-over-union of the cropping window and the bounding boxes or the relative overlap as a ratio of the intersection area and the bounding box area.
Additionally, if allow_no_crop is True, the cropping may be skipped entirely as one of the valid results of the operator.
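As a rough illustration of the two overlap metrics mentioned above, the following sketch computes both for a single 2D box and a crop window, each given as (x0, y0, x1, y1) in relative coordinates. The helper names are hypothetical and this is not the operator's actual implementation.

```python
def intersection_area(a, b):
    """Area shared by two boxes given as (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def iou(box, window):
    """Intersection area divided by the area of the union."""
    inter = intersection_area(box, window)
    union = area(box) + area(window) - inter
    return inter / union if union > 0 else 0.0


def relative_overlap(box, window):
    """Fraction of the box area that falls inside the window."""
    box_area = area(box)
    return intersection_area(box, window) / box_area if box_area > 0 else 0.0


window = (0.0, 0.0, 0.5, 0.5)
box = (0.25, 0.25, 0.75, 0.75)
iou_value = iou(box, window)                   # intersection 0.0625 over union 0.4375
overlap_value = relative_overlap(box, window)  # intersection 0.0625 over box area 0.25
```

The same box scores lower under IoU than under relative overlap here, because IoU also penalizes the part of the window that the box does not cover.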
The following modes of a random crop are available:
- Randomly shaped window, which is randomly placed in the original input space. The random crop window dimensions are selected to satisfy the aspect ratio and relative area restrictions. If input_shape is provided, it will be taken into account for the aspect ratio range check. Otherwise, the aspect ratios are calculated in relative terms. In other words, without input_shape, an aspect ratio of 1.0 is equivalent to the aspect ratio of the input image.
- Fixed size window, which is randomly placed in the original input space. The random crop window dimensions are taken from the crop_shape argument and the anchor is randomly selected. When providing crop_shape, input_shape is also required (these dimensions are required to scale the output bounding boxes).
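The effect of input_shape on the aspect ratio check can be worked through with a small calculation. The helper below is hypothetical; it only shows how a window shape in relative coordinates maps to an aspect ratio in pixels once the image size is known.

```python
def aspect_ratios(rel_w, rel_h, input_w, input_h):
    """Return (relative, absolute) x/y aspect ratios of a crop window.

    rel_w and rel_h are the window dimensions as fractions of the image;
    input_w and input_h are the image dimensions in pixels.
    """
    relative = rel_w / rel_h
    absolute = (rel_w * input_w) / (rel_h * input_h)
    return relative, absolute


# A window covering half of each dimension of a 640x480 image.
relative, absolute = aspect_ratios(0.5, 0.5, 640, 480)
# relative is 1.0: the window has the same proportions as the image.
# absolute is 320 / 240: in pixels the window is 4:3, like the image itself.
```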
The num_attempts argument can be used to control the maximum number of attempts to produce a valid crop to match a minimum overlap metric value from thresholds.
When allow_no_crop is False and thresholds does not contain 0.0, if you do not increase the num_attempts value, it might continue to loop for a long time.
Inputs: 0: bboxes, (1: labels)
The first input, bboxes, refers to the bounding boxes that are provided as a two-dimensional tensor where the first dimension refers to the index of the bounding box, and the second dimension refers to the index of the coordinate.
The coordinates are relative to the original image dimensions (that is, in the range [0.0, 1.0]) and represent the start and, depending on the value of bbox_layout, either the end of the region or its shape. For example, bbox_layout="xyXY" means the bounding box coordinates follow the start_x, start_y, end_x, and end_y order, and bbox_layout="xyWH" indicates that the order is start_x, start_y, width, and height. See the bbox_layout argument description for more information.
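To make the two layouts concrete, here is a minimal sketch that converts a single 2D box between the "xyXY" and "xyWH" orderings. The function names are hypothetical helpers, not part of the library.

```python
def xyxy_to_xywh(box):
    """[start_x, start_y, end_x, end_y] -> [start_x, start_y, width, height]."""
    x0, y0, x1, y1 = box
    return [x0, y0, x1 - x0, y1 - y0]


def xywh_to_xyxy(box):
    """[start_x, start_y, width, height] -> [start_x, start_y, end_x, end_y]."""
    x0, y0, w, h = box
    return [x0, y0, x0 + w, y0 + h]


box_xyxy = [0.25, 0.5, 0.75, 1.0]
box_xywh = xyxy_to_xywh(box_xyxy)  # start (0.25, 0.5), width 0.5, height 0.5
```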
An optional input, labels, can be provided, representing the labels that are associated with each of the bounding boxes.
Outputs: 0: anchor, 1: shape, 2: bboxes (, 3: labels, 4: bboxes_indices)
The resulting crop parameters are provided as two separate outputs, anchor and shape, that can be fed directly to the nvidia.dali.fn.slice() operator to complete the cropping of the original image. anchor and shape contain the starting coordinates and dimensions for the crop in the [x, y, (z)] and [w, h, (d)] formats, respectively. The coordinates can be represented in absolute or relative terms, and the representation depends on whether the fixed crop_shape was used.
Both anchor and shape are returned as a float, even if they represent absolute coordinates due to providing the crop_shape argument. In order for them to be interpreted correctly by nvidia.dali.fn.slice(), normalized_anchor and normalized_shape should be set to False.
The third output contains the bounding boxes, after filtering out the ones with a centroid outside of the cropping window, and with the coordinates mapped to the new coordinate space.
The next output is optional, and it represents the labels associated with the filtered bounding boxes. The output will be present if a labels input was provided.
The last output, also optional, corresponds to the original indices of the bounding boxes that passed the centroid filter and are present in the output. This output will be present if the option output_bbox_indices is set to True.
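The centroid filter and coordinate remapping described above can be sketched in plain Python for 2D boxes in the "xyXY" layout. This is an illustrative approximation under stated assumptions: the function names are hypothetical, and clipping the remapped coordinates to [0.0, 1.0] is an assumption rather than documented behavior.

```python
def _clip01(value):
    """Clamp a coordinate to the [0.0, 1.0] range (assumed behavior)."""
    return min(max(value, 0.0), 1.0)


def crop_bboxes(bboxes, anchor, shape):
    """Filter and remap xyXY boxes for a crop window.

    bboxes: list of [x0, y0, x1, y1] in relative coordinates.
    anchor: (ax, ay) start of the window; shape: (w, h) of the window.
    Returns (kept_boxes, kept_indices).
    """
    ax, ay = anchor
    w, h = shape
    kept, indices = [], []
    for i, (x0, y0, x1, y1) in enumerate(bboxes):
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        # Centroid filter: drop boxes whose center lies outside the window.
        if not (ax <= cx <= ax + w and ay <= cy <= ay + h):
            continue
        # Express the box relative to the window instead of the full image.
        kept.append([_clip01((x0 - ax) / w), _clip01((y0 - ay) / h),
                     _clip01((x1 - ax) / w), _clip01((y1 - ay) / h)])
        indices.append(i)
    return kept, indices


bboxes = [[0.25, 0.25, 0.5, 0.5], [0.75, 0.75, 1.0, 1.0]]
boxes_out, indices_out = crop_bboxes(bboxes, anchor=(0.0, 0.0), shape=(0.5, 0.5))
# Only the first box survives; it now spans the lower-right quarter of the crop.
```

The returned indices play the role of the optional bboxes_indices output: they let a caller look up per-box data that was not passed through the operator.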
- Supported backends
- Parameters
boxes (2D TensorList of float) – Relative coordinates of the bounding boxes that are represented as a 2D tensor, where the first dimension refers to the index of the bounding box, and the second dimension refers to the index of the coordinate.
labels (1D TensorList of integers, optional) – Labels that are associated with each of the bounding boxes.
- Keyword Arguments
all_boxes_above_threshold (bool, optional, default = True) –
If set to True, all bounding boxes in a sample should overlap with the cropping window as specified by thresholds.
If the bounding boxes do not overlap, the cropping window is considered to be invalid. If set to False, and at least one bounding box overlaps the window, the window is considered to be valid.
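The difference between the two settings amounts to an "all" versus "any" test over the per-box overlap metrics. A minimal sketch, with a hypothetical function name and assuming that a metric equal to the threshold counts as passing:

```python
def window_is_valid(metrics, threshold, all_boxes_above_threshold=True):
    """Decide whether a candidate window is acceptable.

    metrics: per-box overlap values (IoU or relative overlap) for one window.
    """
    passed = [m >= threshold for m in metrics]
    return all(passed) if all_boxes_above_threshold else any(passed)


metrics = [0.75, 0.25]
strict = window_is_valid(metrics, 0.5, all_boxes_above_threshold=True)
relaxed = window_is_valid(metrics, 0.5, all_boxes_above_threshold=False)
# strict is False (one box is below 0.5); relaxed is True (one box is above it).
```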
allow_no_crop (bool, optional, default = True) – If set to True, one of the possible outcomes of the random process will be to not crop, as if the outcome was one more thresholds value from which to choose.
aspect_ratio (float or list of float, optional, default = [1.0, 1.0]) –
Valid range of aspect ratio of the cropping windows.
This parameter can be specified as either two values (min, max) or six values (three pairs), depending on the dimensionality of the input.
- For 2D bounding boxes, one range of valid aspect ratios (x/y) should be provided (e.g. [min_xy, max_xy]).
- For 3D bounding boxes, three separate aspect ratio ranges may be specified, for the x/y, x/z and y/z pairs of dimensions. They are provided in the following order: [min_xy, max_xy, min_xz, max_xz, min_yz, max_yz]. Alternatively, if only one aspect ratio range is provided, it will be used for all three pairs of dimensions.
The value for min should be greater than 0.0, and min should be less than or equal to the max value. By default, square windows are generated.
Providing aspect_ratio and scaling is incompatible with explicitly specifying crop_shape.
If input_shape is provided, it will be taken into account for the calculation of the cropping window aspect ratio. Otherwise, the aspect ratio ranges are relative to the image dimensions. In other words, when input_shape is not specified, an aspect ratio of 1.0 is equivalent to the original aspect ratio of the image.
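The accepted shapes of aspect_ratio can be illustrated with a small validation sketch. The function below is hypothetical and only mirrors the rules stated above: two values for one range, six values for three ranges, and a single range reused for all pairs in 3D.

```python
def expand_aspect_ratio(aspect_ratio, ndim):
    """Return a dict of (min, max) ranges keyed by dimension pair."""
    pairs = ["xy"] if ndim == 2 else ["xy", "xz", "yz"]
    values = list(aspect_ratio)
    if len(values) == 2:
        # A single range applies to every pair of dimensions.
        values = values * len(pairs)
    if len(values) != 2 * len(pairs):
        raise ValueError("expected 2 or %d values" % (2 * len(pairs)))
    ranges = {}
    for i, pair in enumerate(pairs):
        lo, hi = values[2 * i], values[2 * i + 1]
        if not (0.0 < lo <= hi):
            raise ValueError("need 0.0 < min <= max for " + pair)
        ranges[pair] = (lo, hi)
    return ranges


# One range, reused for the x/y, x/z and y/z pairs of a 3D input.
ranges_3d = expand_aspect_ratio([0.5, 2.0], ndim=3)
# Three explicit ranges in [min_xy, max_xy, min_xz, max_xz, min_yz, max_yz] order.
ranges_explicit = expand_aspect_ratio([0.5, 2.0, 1.0, 1.0, 0.25, 4.0], ndim=3)
```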
bbox_layout (layout str, optional, default = ‘’) –
Determines the meaning of the coordinates of the bounding boxes.
The value of this argument is a string containing the following characters:
x (horizontal start anchor), y (vertical start anchor), z (depthwise start anchor), X (horizontal end anchor), Y (vertical end anchor), Z (depthwise end anchor), W (width), H (height), D (depth).
If this value is left empty, depending on the number of dimensions, “xyXY” or “xyzXYZ” is assumed.
bytes_per_sample_hint (int or list of int, optional, default = ) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
crop_shape (int or list of int or TensorList of int, optional, default = ) –
If provided, the random crop window dimensions will be fixed to this shape.
The order of dimensions is determined by the layout provided in shape_layout.
When providing crop_shape, input_shape should be provided as well. Providing explicit crop_shape is incompatible with using the scaling and aspect_ratio arguments.
input_shape (int or list of int or TensorList of int, optional, default = ) –
Specifies the shape of the original input image.
The order of dimensions is determined by the layout that is provided in shape_layout.
ltrb (bool, optional, default = True) –
If set to True, bboxes are returned as [left, top, right, bottom]; otherwise they are provided as [left, top, width, height].
This argument has been deprecated. To specify the bbox encoding, use bbox_layout instead. For example, ltrb=True is equal to bbox_layout="xyXY", and ltrb=False corresponds to bbox_layout="xyWH".
num_attempts (int, optional, default = 1) –
Number of attempts to get a crop window that matches the aspect_ratio and a selected value from thresholds.
After every num_attempts attempts, a different threshold will be picked. This repeats until total_num_attempts attempts have been made (if provided), or otherwise indefinitely.
output_bbox_indices (bool, optional, default = False) – If set to True, an extra output will be returned, containing the original indices of the bounding boxes that passed the centroid filter and are present in the output bounding boxes.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
scaling (float or list of float, optional, default = [1.0, 1.0]) –
Range [min, max] for the crop size with respect to the original image dimensions.
The values of min and max must satisfy the condition 0.0 <= min <= max.
Providing aspect_ratio and scaling is incompatible when explicitly specifying the crop_shape value.
seed (int, optional, default = -1) –
Random seed. If not provided, it will be populated based on the global seed of the pipeline.
shape_layout (layout str, optional, default = ‘’) –
Determines the meaning of the dimensions provided in crop_shape and input_shape.
The values are: W (width), H (height), D (depth).
If left empty, depending on the number of dimensions, "WH" or "WHD" will be assumed.
threshold_type (str, optional, default = ‘iou’) –
Determines the meaning of thresholds.
By default, thresholds refers to the intersection-over-union (IoU) of the bounding boxes with respect to the cropping window. Alternatively, the threshold can be set to "overlap" to specify the fraction (by area) of the bounding box that will fall inside the crop window. For example, a threshold value of 1.0 means the entire bounding box must be contained in the resulting cropping window.
thresholds (float or list of float, optional, default = [0.0]) –
Minimum IoU or a different metric, if specified by threshold_type, of the bounding boxes with respect to the cropping window.
Each sample randomly selects one of the thresholds, and the operator will complete up to the specified number of attempts to produce a random crop window that has the selected metric above that threshold. See num_attempts for more information about configuring the number of attempts.
total_num_attempts (int, optional, default = -1) –
If provided, it indicates the total maximum number of attempts to get a crop window that matches the aspect_ratio and any selected value from thresholds.
After total_num_attempts attempts, the best candidate will be selected.
If this value is not specified, the crop search will continue indefinitely until a valid crop is found.
If you do not provide a total_num_attempts value, this can result in an infinite loop if the conditions imposed by the arguments cannot be satisfied.
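The interaction between thresholds, num_attempts and total_num_attempts can be sketched as a nested search loop. This is a simplified model under stated assumptions, not the operator's code: the function name and the propose and score callbacks are hypothetical, and "best candidate" is assumed to mean the candidate with the highest metric seen so far.

```python
import random


def search_crop(thresholds, num_attempts, total_num_attempts, propose, score, rng):
    """Try candidate windows until one meets a randomly chosen threshold.

    propose(rng) returns a candidate window and score(window) returns its
    overlap metric. A negative total_num_attempts means "no overall limit".
    Returns (window, attempts_used).
    """
    best, best_score, tried = None, float("-inf"), 0
    while total_num_attempts < 0 or tried < total_num_attempts:
        threshold = rng.choice(thresholds)  # pick a threshold for this round
        for _ in range(num_attempts):
            if 0 <= total_num_attempts <= tried:
                break  # overall budget exhausted
            window = propose(rng)
            value = score(window)
            tried += 1
            if value >= threshold:
                return window, tried
            if value > best_score:
                best, best_score = window, value
    # Budget exhausted: fall back to the best candidate seen.
    return best, tried


# Toy run: "windows" are plain numbers and the score is the number itself.
candidates = iter([0.1, 0.3, 0.2, 0.05])
window, tried = search_crop(
    thresholds=[0.9], num_attempts=2, total_num_attempts=4,
    propose=lambda _rng: next(candidates), score=lambda w: w,
    rng=random.Random(0))
# No candidate reaches 0.9, so after 4 attempts the best one (0.3) is returned.
```

With a negative total_num_attempts and thresholds that can never be met, the outer loop in this sketch never terminates, which mirrors the infinite-loop caveat above.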