nvidia.dali.fn.random_bbox_crop

nvidia.dali.fn.random_bbox_crop(__boxes, __labels=None, /, *, all_boxes_above_threshold=True, allow_no_crop=True, aspect_ratio=[1.0, 1.0], bbox_layout='', bytes_per_sample_hint=[0], crop_shape=[], input_shape=[], ltrb=True, num_attempts=1, output_bbox_indices=False, preserve=False, scaling=[1.0, 1.0], seed=-1, shape_layout='', threshold_type='iou', thresholds=[0.0], total_num_attempts=-1, device=None, name=None)

Applies a prospective random crop to an image coordinate space while keeping the bounding boxes, and optionally labels, consistent.

After the random crop is applied to the image coordinate space, the bounding boxes are adjusted or filtered out to match the cropped region of interest (ROI). The random crop is constrained by the arguments that are passed to the operator.

The cropping window candidates are randomly selected until one satisfies the overlap restrictions specified by the thresholds argument. Each value in thresholds is a minimum for the overlap metric selected by threshold_type: either the intersection-over-union (IoU) of the cropping window and a bounding box, or the relative overlap, that is, the ratio of the intersection area to the bounding box area.
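The two metrics can be illustrated with a short pure-Python sketch. It is not DALI code: the helper names (intersection_area, iou, overlap) are hypothetical, and the boxes and the window are assumed to be relative tuples in the "xyXY" layout. It shows that a small box lying entirely inside a larger window has an overlap of 1.0 but a much lower IoU.

```python
# Hypothetical helpers; boxes and windows are (x0, y0, x1, y1) in relative [0, 1] units.
def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def intersection_area(a, b):
    """Area of the intersection of two xyXY boxes (0.0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def iou(box, window):
    """threshold_type='iou': intersection over union of the box and the window."""
    inter = intersection_area(box, window)
    union = area(box) + area(window) - inter
    return inter / union if union > 0 else 0.0


def overlap(box, window):
    """threshold_type='overlap': fraction of the box area inside the window."""
    box_area = area(box)
    return intersection_area(box, window) / box_area if box_area > 0 else 0.0


window = (0.0, 0.0, 0.5, 0.5)
box = (0.25, 0.25, 0.5, 0.5)  # fully contained in the window
print(iou(box, window))      # 0.25
print(overlap(box, window))  # 1.0
```

With thresholds=[0.5], this box would satisfy the window under threshold_type="overlap" but not under the default "iou".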

Additionally, if allow_no_crop is True, the cropping may be skipped entirely as one of the valid results of the operator.

The following modes of a random crop are available:

  • Randomly shaped window, which is randomly placed in the original input space.
    The random crop window dimensions are selected to satisfy the aspect ratio and relative area restrictions.
    If input_shape is provided, it will be taken into account for the aspect ratio range check.
    Otherwise, the aspect ratios are calculated in relative terms.
    In other words, without input_shape, an aspect ratio of 1.0 is equivalent to the aspect ratio of the input image.
  • Fixed size window, which is randomly placed in the original input space.
    The random crop window dimensions are taken from the crop_shape argument and the anchor is
    randomly selected.
    When providing crop_shape, input_shape is also required (these dimensions are required to
    scale the output bounding boxes).
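A rough sketch of the two window modes follows, assuming rejection sampling for the randomly shaped window; DALI's actual sampling procedure may differ, and all helper names are hypothetical. aspect_ratio_ok shows why input_shape matters: without it, the x/y ratio is computed on relative extents, so a 0.5 x 0.5 window has a ratio of 1.0 even on a 640x480 image, while with input_shape the same window is 320x240 pixels and has a ratio of 4/3.

```python
import random


def aspect_ratio_ok(rel_w, rel_h, ar_range, input_shape=None):
    """Check the x/y aspect ratio of a window given by relative extents.

    Without input_shape the ratio uses relative extents, so 1.0 means
    "same aspect ratio as the image". With input_shape=(W, H) the ratio
    uses the window size in pixels.
    """
    if input_shape is not None:
        width, height = input_shape
        ratio = (rel_w * width) / (rel_h * height)
    else:
        ratio = rel_w / rel_h
    return ar_range[0] <= ratio <= ar_range[1]


def sample_random_window(rng, scaling, ar_range, input_shape=None, max_tries=1000):
    """Randomly shaped window (assumed rejection sampling).

    Draws relative extents from the scaling range until the aspect ratio
    fits, then places the window uniformly inside [0, 1].
    Returns a relative (anchor, shape) pair.
    """
    for _ in range(max_tries):
        w = rng.uniform(*scaling)
        h = rng.uniform(*scaling)
        if w > 0 and h > 0 and aspect_ratio_ok(w, h, ar_range, input_shape):
            anchor = (rng.uniform(0.0, 1.0 - w), rng.uniform(0.0, 1.0 - h))
            return anchor, (w, h)
    raise RuntimeError("no window satisfies scaling and aspect_ratio")


def sample_fixed_window(rng, crop_shape, input_shape):
    """Fixed-size window: the shape comes from crop_shape and only the
    anchor is random. Both are in pixels, which is why input_shape is needed."""
    anchor = tuple(rng.uniform(0.0, full - crop)
                   for crop, full in zip(crop_shape, input_shape))
    return anchor, tuple(float(c) for c in crop_shape)


rng = random.Random(0)
print(aspect_ratio_ok(0.5, 0.5, (1.0, 1.0)))              # True
print(aspect_ratio_ok(0.5, 0.5, (1.0, 1.0), (640, 480)))  # False: 320x240 px
```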

The num_attempts argument controls the maximum number of attempts to produce a crop window that satisfies the minimum overlap value selected from thresholds.

Warning

When allow_no_crop is False and thresholds does not contain 0.0, the search for a valid crop window might loop for a long time unless you increase the num_attempts value.
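The interplay between thresholds, allow_no_crop, num_attempts, total_num_attempts, and all_boxes_above_threshold can be modeled with the following simplified sketch. It is not DALI's implementation: find_crop and its callbacks are hypothetical, boxes is assumed to be non-empty, and ranking candidates by the aggregated metric to pick the "best" one is an assumption. It also shows where the warning comes from: a 0.0 threshold is satisfied by any window, but with allow_no_crop=False, no 0.0 threshold, and total_num_attempts left at -1, an unsatisfiable threshold keeps the loop running forever.

```python
import random


def find_crop(rng, boxes, thresholds, metric, sample_window,
              allow_no_crop=True, all_boxes_above_threshold=True,
              num_attempts=1, total_num_attempts=-1):
    """Simplified model of the crop search (hypothetical, not DALI code).

    boxes: non-empty list of relative xyXY boxes.
    metric(box, window): overlap value, for example IoU.
    sample_window(rng): returns a candidate xyXY window.
    Returns the chosen window, or None for "no crop".
    """
    # allow_no_crop behaves like one more option next to the thresholds.
    options = list(thresholds) + (["no_crop"] if allow_no_crop else [])
    best, best_score = None, -1.0
    attempts = 0
    while total_num_attempts < 0 or attempts < total_num_attempts:
        choice = rng.choice(options)  # re-drawn after every num_attempts
        if choice == "no_crop":
            return None
        for _ in range(num_attempts):
            window = sample_window(rng)
            scores = [metric(box, window) for box in boxes]
            # True: every box must pass; False: one passing box is enough.
            score = min(scores) if all_boxes_above_threshold else max(scores)
            if score >= choice:
                return window
            if score > best_score:  # assumed ranking for the fallback
                best, best_score = window, score
            attempts += 1
            if attempts == total_num_attempts:
                break
    # Only reached once total_num_attempts is exhausted.
    return best


def inside(box, window):
    """Hypothetical metric: 1.0 if the box lies fully inside the window."""
    return float(window[0] <= box[0] and window[1] <= box[1]
                 and box[2] <= window[2] and box[3] <= window[3])


def whole_image(rng):
    return (0.0, 0.0, 1.0, 1.0)


rng = random.Random(0)
boxes = [(0.1, 0.1, 0.3, 0.3)]
print(find_crop(rng, boxes, [0.5], inside, whole_image, allow_no_crop=False))
```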

Inputs: 0: bboxes, (1: labels)

The first input, bboxes, refers to the bounding boxes that are provided as a two-dimensional tensor where the first dimension refers to the index of the bounding box, and the second dimension refers to the index of the coordinate.

The coordinates are relative to the original image dimensions (that is, in the range [0.0, 1.0]). Depending on the value of bbox_layout, they represent either the start and the end of the region, or its start and shape. For example, bbox_layout=“xyXY” means that the bounding box coordinates follow the start_x, start_y, end_x, end_y order, and bbox_layout=“xyWH” indicates that the order is start_x, start_y, width, height. See the bbox_layout argument description for more information.
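For 2D boxes, the conversion between layouts can be sketched as follows. The helpers to_xyXY and from_xyXY are hypothetical and not part of DALI; they only cover the x, y, X, Y, W, and H characters described under bbox_layout, and the coordinates stay relative.

```python
def to_xyXY(box, layout):
    """Convert one 2D box, ordered according to layout, to (x0, y0, x1, y1)."""
    values = dict(zip(layout, box))
    x0, y0 = values["x"], values["y"]
    # Use the end anchor if present, otherwise start + extent.
    x1 = values["X"] if "X" in values else x0 + values["W"]
    y1 = values["Y"] if "Y" in values else y0 + values["H"]
    return (x0, y0, x1, y1)


def from_xyXY(box, layout):
    """Inverse of to_xyXY: emit the coordinates in the order given by layout."""
    x0, y0, x1, y1 = box
    lookup = {"x": x0, "y": y0, "X": x1, "Y": y1, "W": x1 - x0, "H": y1 - y0}
    return tuple(lookup[c] for c in layout)


print(to_xyXY((0.25, 0.5, 0.25, 0.25), "xyWH"))    # (0.25, 0.5, 0.5, 0.75)
print(from_xyXY((0.25, 0.5, 0.5, 0.75), "xyWH"))  # (0.25, 0.5, 0.25, 0.25)
```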

An optional input labels can be provided, representing the labels that are associated with each of the bounding boxes.

Outputs: 0: anchor, 1: shape, 2: bboxes (, 3: labels, 4: bboxes_indices)

The resulting crop parameters are provided as two separate outputs, anchor and shape, which can be fed directly to the nvidia.dali.fn.slice() operator to complete the cropping of the original image. anchor and shape contain the starting coordinates and the dimensions of the crop in the [x, y, (z)] and [w, h, (d)] formats, respectively. The coordinates are absolute when the fixed crop_shape is used, and relative otherwise.

Note

Both anchor and shape are returned as floats, even when they represent absolute coordinates because the crop_shape argument was provided. In that case, for nvidia.dali.fn.slice() to interpret them correctly, set its normalized_anchor and normalized_shape arguments to False.
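The difference between relative and absolute outputs can be sketched by converting anchor and shape into a pixel window. pixel_window is a hypothetical helper, input_shape is assumed to be (W, H), and rounding to the nearest pixel is an assumption of this sketch; fn.slice() may round differently.

```python
def pixel_window(anchor, shape, input_shape, normalized):
    """Return (x_begin, y_begin, x_end, y_end) in pixels for anchor=[x, y], shape=[w, h].

    normalized=True:  values are fractions of input_shape=(W, H).
    normalized=False: values are already in pixels (the crop_shape case).
    """
    scale = input_shape if normalized else (1, 1)
    x0 = round(anchor[0] * scale[0])
    y0 = round(anchor[1] * scale[1])
    x1 = round((anchor[0] + shape[0]) * scale[0])
    y1 = round((anchor[1] + shape[1]) * scale[1])
    return x0, y0, x1, y1


# Randomly shaped window: relative outputs.
print(pixel_window((0.25, 0.5), (0.5, 0.25), (640, 480), normalized=True))
# Fixed crop_shape: absolute outputs, even though they are floats.
print(pixel_window((100.0, 50.0), (300.0, 300.0), (640, 480), normalized=False))
```

Treating absolute values as normalized would multiply them by the image size a second time, which is why normalized_anchor and normalized_shape must be False in the crop_shape case.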

The third output contains the bounding boxes, after filtering out the ones with a centroid outside of the cropping window, and with the coordinates mapped to the new coordinate space.

The next output is optional, and it represents the labels associated with the filtered bounding boxes. The output will be present if a labels input was provided.

The last output, also optional, corresponds to the original indices of the bounding boxes that passed the centroid filter and are present in the output. This output is present if output_bbox_indices is set to True.
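The centroid filter and the coordinate mapping behind outputs 2 to 4 can be sketched in pure Python. filter_and_remap is a hypothetical helper; treating the window boundary as inclusive and clamping the remapped coordinates to [0.0, 1.0] are assumptions of this sketch, not documented DALI behavior.

```python
def filter_and_remap(boxes, window, labels=None):
    """Keep boxes whose centroid lies inside window and map them into the
    window's coordinate space. Boxes and window are relative xyXY tuples.

    Returns (boxes, labels, indices), mirroring outputs 2, 3, and 4.
    """
    wx0, wy0, wx1, wy1 = window
    ww, wh = wx1 - wx0, wy1 - wy0
    out_boxes, out_labels, indices = [], [], []
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if not (wx0 <= cx <= wx1 and wy0 <= cy <= wy1):
            continue  # centroid outside the crop: the box is dropped
        # Express the box relative to the window, then clamp (assumption).
        remapped = tuple(
            min(max(v, 0.0), 1.0)
            for v in ((x0 - wx0) / ww, (y0 - wy0) / wh,
                      (x1 - wx0) / ww, (y1 - wy0) / wh)
        )
        out_boxes.append(remapped)
        indices.append(i)
        if labels is not None:
            out_labels.append(labels[i])
    return out_boxes, (out_labels if labels is not None else None), indices


boxes = [(0.125, 0.125, 0.375, 0.375), (0.625, 0.625, 0.875, 0.875)]
print(filter_and_remap(boxes, (0.0, 0.0, 0.5, 0.5), labels=[7, 8]))
# ([(0.25, 0.25, 0.75, 0.75)], [7], [0])
```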

Supported backends
  • ‘cpu’

Parameters:
  • __boxes (2D TensorList of float) – Relative coordinates of the bounding boxes that are represented as a 2D tensor, where the first dimension refers to the index of the bounding box, and the second dimension refers to the index of the coordinate.

  • __labels (1D TensorList of integers, optional) – Labels that are associated with each of the bounding boxes.

Keyword Arguments:
  • all_boxes_above_threshold (bool, optional, default = True) –

    If set to True, all bounding boxes in a sample must overlap with the cropping window as specified by thresholds.

    If any bounding box does not, the cropping window is considered invalid. If set to False, the window is considered valid as soon as at least one bounding box overlaps it as specified by thresholds.

  • allow_no_crop (bool, optional, default = True) – If set to True, one of the possible outcomes of the random process is to not crop at all, as if “no crop” were one more value in thresholds to choose from.

  • aspect_ratio (float or list of float, optional, default = [1.0, 1.0]) –

    Valid range of aspect ratio of the cropping windows.

    This parameter can be specified as either two values (min, max) or six values (three pairs), depending on the dimensionality of the input.

    • For 2D bounding boxes, one range of valid aspect ratios (x/y) should be provided (e.g. [min_xy, max_xy]).
    • For 3D bounding boxes, three separate aspect ratio ranges may be specified, for x/y, x/z and y/z pairs of dimensions.
      They are provided in the following order [min_xy, max_xy, min_xz, max_xz, min_yz, max_yz]. Alternatively, if only one aspect ratio range is provided, it will be used for all three pairs of dimensions.

    The value of min must be greater than 0.0 and less than or equal to max. By default, square windows are generated.

    Note

    Providing aspect_ratio and scaling is incompatible with explicitly specifying crop_shape.

    Note

    If input_shape is provided, it will be taken into account for the calculation of the cropping window aspect ratio. Otherwise, the aspect ratio ranges are relative to the image dimensions. In other words, when input_shape is not specified, an aspect ratio of 1.0 is equivalent to the original aspect ratio of the image.

  • bbox_layout (layout str, optional, default = ‘’) –

    Determines the meaning of the coordinates of the bounding boxes.

    The value of this argument is a string containing the following characters:

    x (horizontal start anchor), y (vertical start anchor), z (depthwise start anchor),
    X (horizontal end anchor),   Y (vertical end anchor),   Z (depthwise end anchor),
    W (width),                   H (height),                D (depth).
    

    Note

    If this value is left empty, depending on the number of dimensions, “xyXY” or “xyzXYZ” is assumed.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • crop_shape (int or list of int or TensorList of int, optional, default = []) –

    If provided, the random crop window dimensions will be fixed to this shape.

    The order of dimensions is determined by the layout provided in shape_layout.

    Note

    When providing crop_shape, input_shape should be provided as well. Providing explicit crop_shape is incompatible with using scaling and aspect_ratio arguments.

  • input_shape (int or list of int or TensorList of int, optional, default = []) –

    Specifies the shape of the original input image.

    The order of dimensions is determined by the layout that is provided in shape_layout.

  • ltrb (bool, optional, default = True) –

    If set to True, bboxes are returned as [left, top, right, bottom]; otherwise they are provided as [left, top, width, height].

    Warning

    This argument has been deprecated. To specify the bbox encoding, use bbox_layout instead. For example, ltrb=True is equivalent to bbox_layout=“xyXY”, and ltrb=False corresponds to bbox_layout=“xyWH”.

  • num_attempts (int, optional, default = 1) –

    Number of attempts to get a crop window that matches the aspect_ratio and a selected value from thresholds.

    After every num_attempts unsuccessful attempts, a different threshold is picked. The search continues until total_num_attempts attempts have been made (if provided), or indefinitely otherwise.

  • output_bbox_indices (bool, optional, default = False) – If set to True, an extra output will be returned, containing the original indices of the bounding boxes that passed the centroid filter and are present in the output bounding boxes.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • scaling (float or list of float, optional, default = [1.0, 1.0]) –

    Range [min, max] for the crop size with respect to the original image dimensions.

    The values of min and max must satisfy the condition 0.0 <= min <= max.

    Note

    Providing aspect_ratio and scaling is incompatible when explicitly specifying the crop_shape value.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shape_layout (layout str, optional, default = ‘’) –

    Determines the meaning of the dimensions provided in crop_shape and input_shape.

    The values are:

    • W (width)

    • H (height)

    • D (depth)

    Note

    If left empty, "WH" or "WHD" is assumed, depending on the number of dimensions.

  • threshold_type (str, optional, default = ‘iou’) –

    Determines the meaning of thresholds.

    By default, thresholds refers to the intersection-over-union (IoU) of the bounding boxes with respect to the cropping window. Alternatively, threshold_type can be set to “overlap” to specify the fraction (by area) of the bounding box that must fall inside the crop window. For example, a threshold value of 1.0 means that the entire bounding box must be contained in the resulting cropping window.

  • thresholds (float or list of float, optional, default = [0.0]) –

    Minimum IoU or a different metric, if specified by threshold_type, of the bounding boxes with respect to the cropping window.

    Each sample randomly selects one of the thresholds, and the operator will complete up to the specified number of attempts to produce a random crop window that has the selected metric above that threshold. See num_attempts for more information about configuring the number of attempts.

  • total_num_attempts (int, optional, default = -1) –

    If provided, it indicates the total maximum number of attempts to get a crop window that matches the aspect_ratio and any selected value from thresholds.

    After total_num_attempts attempts, the best candidate will be selected.

    If this value is not specified, the crop search will continue indefinitely until a valid crop is found.

    Warning

    If you do not provide a total_num_attempts value, this can result in an infinite loop if the conditions imposed by the arguments cannot be satisfied.