nvidia.dali.fn.readers.video_resize

nvidia.dali.fn.readers.video_resize(*, sequence_length, additional_decode_surfaces=2, antialias=True, bytes_per_sample_hint=[0], channels=3, dont_use_mmap=False, dtype=DALIDataType.UINT8, enable_frame_num=False, enable_timestamps=False, file_list='', file_list_frame_num=False, file_list_include_preceding_frame=False, file_root='', filenames=[], image_type=DALIImageType.RGB, initial_fill=1024, interp_type=DALIInterpType.INTERP_LINEAR, labels=None, lazy_init=False, mag_filter=DALIInterpType.INTERP_LINEAR, max_size=None, min_filter=DALIInterpType.INTERP_LINEAR, minibatch_size=32, mode='default', normalized=False, num_shards=1, pad_last_batch=False, pad_sequences=False, prefetch_queue_depth=1, preserve=False, random_shuffle=False, read_ahead=False, resize_longer=0.0, resize_shorter=0.0, resize_x=0.0, resize_y=0.0, resize_z=0.0, roi_end=None, roi_relative=False, roi_start=None, seed=-1, shard_id=0, size=None, skip_cached_images=False, skip_vfr_check=False, step=-1, stick_to_shard=False, stride=1, subpixel_scale=True, temp_buffer_hint=0, tensor_init_bytes=1048576, device=None, name=None)

Loads, decodes and resizes video files with FFmpeg and NVDECODE, which is NVIDIA GPU’s hardware-accelerated video decoding.

The video streams can be in most of the container file formats. FFmpeg is used to parse video containers and returns a batch of sequences with shape (N, F, H, W, C), with N being the batch size, and F the number of frames in the sequence.

This operator combines the features of nvidia.dali.fn.video_reader() and nvidia.dali.fn.resize().

Note

The decoder supports only constant frame-rate videos.

Supported backends
  • ‘gpu’

Keyword Arguments:
  • sequence_length (int) – Frames to load per sequence.

  • additional_decode_surfaces (int, optional, default = 2) –

    Additional decode surfaces to use beyond minimum required.

    This argument is ignored when the decoder cannot determine the minimum number of decode surfaces

    Note

    This can happen when the driver is an older version.

    This parameter can be used to trade off memory usage with performance.

  • antialias (bool, optional, default = True) –

    If enabled, it applies an antialiasing filter when scaling down.

    Note

    Nearest neighbor interpolation does not support antialiasing.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • channels (int, optional, default = 3) – Number of channels.

  • dont_use_mmap (bool, optional, default = False) –

    If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

    Mapping provides a small performance benefit when accessing a local file system, but most network file systems, do not provide optimum performance.

  • dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –

    Output data type.

    Supported types: UINT8 or FLOAT.

  • enable_frame_num (bool, optional, default = False) – If the file_list or filenames argument is passed, returns the frame number output.

  • enable_timestamps (bool, optional, default = False) – If the file_list or filenames argument is passed, returns the timestamps output.

  • file_list (str, optional, default = ‘’) –

    Path to the file with a list of file label [start_frame [end_frame]] values.

    Positive value means the exact frame, negative counts as a Nth frame from the end (it follows python array indexing schema), equal values for the start and end frame would yield an empty sequence and a warning. This option is mutually exclusive with filenames and file_root.

  • file_list_frame_num (bool, optional, default = False) –

    If the start/end timestamps are provided in file_list, you can interpret them as frame numbers instead of as timestamps.

    If floating point values have been provided, the start frame number will be rounded up and the end frame number will be rounded down.

    Frame numbers start from 0.

  • file_list_include_preceding_frame (bool, optional, default = False) –

    Changes the behavior how file_list start and end frame timestamps are translated to a frame number.

    If the start/end timestamps are provided in file_list as timestamps, the start frame is calculated as ceil(start_time_stamp * FPS) and the end as floor(end_time_stamp * FPS). If this argument is set to True, the equation changes to floor(start_time_stamp * FPS) and ceil(end_time_stamp * FPS) respectively. In effect, the first returned frame is not later, and the end frame not earlier, than the provided timestamps. This behavior is more aligned with how the visible timestamps are correlated with displayed video frames.

    Note

    When file_list_frame_num is set to True, this option does not take any effect.

    Warning

    This option is available for legacy behavior compatibility.

  • file_root (str, optional, default = ‘’) –

    Path to a directory that contains the data files.

    This option is mutually exclusive with filenames and file_list.

  • filenames (str or list of str, optional, default = []) –

    File names of the video files to load.

    This option is mutually exclusive with file_list and file_root.

  • image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output frames (RGB or YCbCr).

  • initial_fill (int, optional, default = 1024) –

    Size of the buffer that is used for shuffling.

    If random_shuffle is False, this parameter is ignored.

  • interp_type (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) –

    Type of interpolation to be used.

    Use min_filter and mag_filter to specify different filtering for downscaling and upscaling.

    Note

    Usage of INTERP_TRIANGULAR is now deprecated and it should be replaced by a combination of

    INTERP_LINEAR with antialias enabled.

  • labels (int or list of int, optional) –

    Labels associated with the files listed in filenames argument.

    If an empty list is provided, sequential 0-based indices are used as labels. If not provided, no labels will be yielded.

  • lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.

  • mag_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling up.

  • max_size (float or list of float, optional) –

    Limit of the output size.

    When the operator is configured to keep aspect ratio and only the smaller dimension is specified, the other(s) can grow very large. This can happen when using resize_shorter argument or “not_smaller” mode or when some extents are left unspecified.

    This parameter puts a limit to how big the output can become. This value can be specified per-axis or uniformly for all axes.

    Note

    When used with “not_smaller” mode or resize_shorter argument, max_size takes precedence and the aspect ratio is kept - for example, resizing with mode="not_smaller", size=800, max_size=1400 an image of size 1200x600 would be resized to 1400x700.

  • min_filter (nvidia.dali.types.DALIInterpType or TensorList of nvidia.dali.types.DALIInterpType, optional, default = DALIInterpType.INTERP_LINEAR) – Filter used when scaling down.

  • minibatch_size (int, optional, default = 32) – Maximum number of images that are processed in a kernel call.

  • mode (str, optional, default = ‘default’) –

    Resize mode.

    Here is a list of supported modes:

    • "default" - image is resized to the specified size.
      Missing extents are scaled with the average scale of the provided ones.
    • "stretch" - image is resized to the specified size.
      Missing extents are not scaled at all.
    • "not_larger" - image is resized, keeping the aspect ratio, so that no extent of the output image exceeds the specified size.
      For example, a 1280x720, with a desired output size of 640x480, actually produces a 640x360 output.
    • "not_smaller" - image is resized, keeping the aspect ratio, so that no extent of the output image is smaller than specified.
      For example, a 640x480 image with a desired output size of 1920x1080, actually produces a 1920x1440 output.

      This argument is mutually exclusive with resize_longer and resize_shorter

  • normalized (bool, optional, default = False) – Gets the output as normalized data.

  • num_shards (int, optional, default = 1) –

    Partitions the data into the specified number of parts (shards).

    This is typically used for multi-GPU or multi-node training.

  • pad_last_batch (bool, optional, default = False) –

    If set to True, pads the shard by repeating the last sample.

    Note

    If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.

  • pad_sequences (bool, optional, default = False) –

    Allows creation of incomplete sequences if there is an insufficient number of frames at the very end of the video.

    Redundant frames are zeroed. Corresponding time stamps and frame numbers are set to -1.

  • prefetch_queue_depth (int, optional, default = 1) –

    Specifies the number of batches to be prefetched by the internal Loader.

    This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • random_shuffle (bool, optional, default = False) –

    Determines whether to randomly shuffle data.

    A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.

  • read_ahead (bool, optional, default = False) –

    Determines whether the accessed data should be read ahead.

    For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.

  • resize_longer (float or TensorList of float, optional, default = 0.0) –

    The length of the longer dimension of the resized image.

    This option is mutually exclusive with resize_shorter and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_larger".

  • resize_shorter (float or TensorList of float, optional, default = 0.0) –

    The length of the shorter dimension of the resized image.

    This option is mutually exclusive with resize_longer and explicit size arguments, and the operator keeps the aspect ratio of the original image. This option is equivalent to specifying the same size for all dimensions and mode="not_smaller". The longer dimension can be bounded by setting the max_size argument. See max_size argument doc for more info.

  • resize_x (float or TensorList of float, optional, default = 0.0) –

    The length of the X dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_y is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_y (float or TensorList of float, optional, default = 0.0) –

    The length of the Y dimension of the resized image.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x is unspecified or 0, the operator keeps the aspect ratio of the original image. A negative value flips the image.

  • resize_z (float or TensorList of float, optional, default = 0.0) –

    The length of the Z dimension of the resized volume.

    This option is mutually exclusive with resize_shorter, resize_longer and size. If the resize_x and resize_y are left unspecified or 0, then the op will keep the aspect ratio of the original volume. Negative value flips the volume.

  • roi_end (float or list of float or TensorList of float, optional) –

    End of the input region of interest (ROI).

    Must be specified together with roi_start. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of relative_roi argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • roi_relative (bool, optional, default = False) – If true, ROI coordinates are relative to the input size, where 0 denotes top/left and 1 denotes bottom/right

  • roi_start (float or list of float or TensorList of float, optional) –

    Origin of the input region of interest (ROI).

    Must be specified together with roi_end. The coordinates follow the tensor shape order, which is the same as size. The coordinates can be either absolute (in pixels, which is the default) or relative (0..1), depending on the value of relative_roi argument. If the ROI origin is greater than the ROI end in any dimension, the region is flipped in that dimension.

  • seed (int, optional, default = -1) –

    Random seed.

    If not provided, it will be populated based on the global seed of the pipeline.

  • shard_id (int, optional, default = 0) – Index of the shard to read.

  • size (float or list of float or TensorList of float, optional) –

    The desired output size.

    Must be a list/tuple with one entry per spatial dimension, excluding video frames and channels. Dimensions with a 0 extent are treated as absent, and the output size will be calculated based on other extents and mode argument.

  • skip_cached_images (bool, optional, default = False) –

    If set to True, the loading data will be skipped when the sample is in the decoder cache.

    In this case, the output of the loader will be empty.

  • skip_vfr_check (bool, optional, default = False) –

    Skips the check for the variable frame rate (VFR) videos.

    Use this flag to suppress false positive detection of VFR videos.

    Warning

    When the dataset indeed contains VFR files, setting this flag may cause the decoder to malfunction.

  • step (int, optional, default = -1) –

    Frame interval between each sequence.

    When the value is less than 0, step is set to sequence_length.

  • stick_to_shard (bool, optional, default = False) –

    Determines whether the reader should stick to a data shard instead of going through the entire dataset.

    If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect accuracy of the training.

  • stride (int, optional, default = 1) – Distance between consecutive frames in the sequence.

  • subpixel_scale (bool, optional, default = True) –

    If True, fractional sizes, directly specified or calculated, will cause the input ROI to be adjusted to keep the scale factor.

    Otherwise, the scale factor will be adjusted so that the source image maps to the rounded output size.

  • temp_buffer_hint (int, optional, default = 0) –

    Initial size in bytes, of a temporary buffer for resampling.

    Note

    This argument is ignored for the CPU variant.

  • tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.