nvidia.dali.fn.experimental.decoders.video

nvidia.dali.fn.experimental.decoders.video(__encoded, /, *, affine=True, build_index=True, bytes_per_sample_hint=[0], end_frame=None, fill_value=[0], frames=None, pad_mode='constant', preserve=False, sequence_length=None, start_frame=None, stride=None, device=None, name=None)

Decodes videos from in-memory streams.

The operator supports most common video container formats using libavformat (FFmpeg). The frames are decoded with either libavcodec (FFmpeg) or the NVIDIA Video Codec SDK (NVDEC).

The following video codecs are supported by both CPU and Mixed backends:

  • H.264/AVC

  • H.265/HEVC

  • VP8

  • VP9

  • MJPEG

The following codecs are supported by the Mixed backend only:

  • AV1

  • MPEG-4
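As a quick reference, the two codec lists above can be expressed as a lookup table. This is a minimal sketch: `backends_for` and the table names are hypothetical (not part of DALI), the codec names are plain strings copied from the lists, and the keys correspond to the backends listed under "Supported backends".

```python
from typing import List

# Codec support per backend, copied from the lists above (hypothetical table).
COMMON_CODECS = {"H.264/AVC", "H.265/HEVC", "VP8", "VP9", "MJPEG"}
MIXED_ONLY_CODECS = {"AV1", "MPEG-4"}

SUPPORTED_CODECS = {
    "cpu": COMMON_CODECS,
    "mixed": COMMON_CODECS | MIXED_ONLY_CODECS,
}


def backends_for(codec: str) -> List[str]:
    """Hypothetical helper: backends (device values) that can decode `codec`."""
    return [device for device, codecs in SUPPORTED_CODECS.items() if codec in codecs]


print(backends_for("VP9"))  # ['cpu', 'mixed']
print(backends_for("AV1"))  # ['mixed']
```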

Each output sample is a sequence of frames with shape (F, H, W, C) where:

  • F is the number of frames in the sequence (can vary between samples)

  • H is the frame height in pixels

  • W is the frame width in pixels

  • C is the number of color channels

The operator provides several ways to select which frames to extract from the video:

  • Using no frame selection arguments:

    • When no frame selection arguments are provided, all frames in the video are decoded

    • Frames are extracted sequentially from start to end with stride=1

    • For example, for a 10-frame video, frames [0,1,2,3,4,5,6,7,8,9] are extracted

  • Using the frames argument:

    • Accepts a list of frame indices to extract from the video

    • Frame indices can be specified in any order and can repeat frames

    • Each index must be non-negative; an index may exceed the bounds of the video only if pad_mode is not 'none'

  • Using start_frame, end_frame and stride:

    • start_frame: First frame to extract (default: 0)

    • end_frame: Index at which extraction stops (exclusive: the frame at this index is not extracted)

    • stride: Distance between the indices of consecutive extracted frames (default: 1)

    • Extracts frames in the range [start_frame, end_frame) advancing by stride

    • For example, start_frame=0, end_frame=10, stride=2 extracts frames [0,2,4,6,8]

  • Using start_frame, sequence_length and stride:

    • start_frame: First frame to extract (default: 0)

    • sequence_length: Number of frames to extract

    • stride: Distance between the indices of consecutive extracted frames (default: 1)

    • Extracts sequence_length frames starting at start_frame, advancing by stride

    • For example, start_frame=0, sequence_length=5, stride=2 extracts frames [0,2,4,6,8]
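The frame selection rules above can be sketched in plain Python. `select_frame_indices` is a hypothetical helper, not DALI's implementation. It assumes that, when neither end_frame nor sequence_length is given, the range runs to the end of the video (as in the no-argument case), and it mirrors the argument conflicts listed under Keyword Arguments. Returned indices may exceed the video length; what happens to those is governed by pad_mode.

```python
from typing import List, Optional, Sequence


def select_frame_indices(
    num_frames: int,
    frames: Optional[Sequence[int]] = None,
    start_frame: Optional[int] = None,
    end_frame: Optional[int] = None,
    sequence_length: Optional[int] = None,
    stride: Optional[int] = None,
) -> List[int]:
    """Hypothetical helper: frame indices selected by the documented arguments."""
    range_args = (start_frame, end_frame, sequence_length, stride)
    if frames is not None:
        # Explicit indices: any order, duplicates allowed, must be non-negative.
        if any(a is not None for a in range_args):
            raise ValueError("frames cannot be combined with start_frame, "
                             "end_frame, sequence_length or stride")
        if any(i < 0 for i in frames):
            raise ValueError("frame indices must be non-negative")
        return list(frames)
    if end_frame is not None and sequence_length is not None:
        raise ValueError("end_frame and sequence_length cannot be used together")

    start = 0 if start_frame is None else start_frame
    step = 1 if stride is None else stride
    if sequence_length is not None:
        # sequence_length frames, starting at start, advancing by step
        return [start + k * step for k in range(sequence_length)]
    # Half-open range [start, end); assumed to default to the end of the video
    end = num_frames if end_frame is None else end_frame
    return list(range(start, end, step))


print(select_frame_indices(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(select_frame_indices(10, start_frame=0, end_frame=10, stride=2))  # [0, 2, 4, 6, 8]
print(select_frame_indices(10, start_frame=0, sequence_length=5, stride=2))  # [0, 2, 4, 6, 8]
print(select_frame_indices(60, frames=[0, 10, 5, 10]))  # [0, 10, 5, 10]
```

Note that the sequence_length form can request indices past the end of the video (for example, sequence_length=100 with stride=2 asks for index 198), which is exactly the situation the padding rules address.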

If the requested frames exceed the bounds of the video, the behavior depends on pad_mode. If pad_mode is 'none', an error is raised. Otherwise, the sequence is padded according to the pad_mode argument (see pad_mode for details).
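The padding modes described for pad_mode and fill_value on this page can be illustrated with a small NumPy sketch that pads an (F, H, W, C) sequence along the frame axis. `pad_frames` is a hypothetical helper, not DALI's implementation; the 'none' mode is not modeled, and the letter patterns in the comments follow the pad_mode descriptions (ABC -> ABCCCC, ABCCBA, ABCBA).

```python
import numpy as np


def pad_frames(frames, target_len, pad_mode="constant", fill_value=0):
    """Pad an (F, H, W, C) sequence along F until it has target_len frames.

    Hypothetical helper illustrating the documented pad modes; 'none' is not modeled.
    """
    num, h, w, c = frames.shape
    if target_len <= num:
        return frames  # nothing to pad
    if pad_mode == "constant":
        fill = np.asarray(fill_value, dtype=np.int64).reshape(-1)
        if fill.size not in (1, c):
            raise ValueError("fill_value needs one value or one value per channel")
        if fill.min() < 0 or fill.max() > 255:
            raise ValueError("fill_value must be in range [0, 255]")
        pad = np.empty((target_len - num, h, w, c), dtype=frames.dtype)
        pad[...] = fill.astype(frames.dtype)  # broadcasts over the channel axis
        return np.concatenate([frames, pad])
    # The other modes pick, for every output position, a source frame index.
    idx = np.arange(target_len)
    if pad_mode in ("edge", "repeat"):
        src = np.minimum(idx, num - 1)  # repeat the last valid frame
    elif pad_mode in ("reflect_1001", "symmetric"):
        m = idx % (2 * num)  # mirror including the edge frame: ABC -> ABCCBA
        src = np.where(m < num, m, 2 * num - 1 - m)
    elif pad_mode in ("reflect_101", "reflect"):
        period = max(2 * num - 2, 1)  # mirror excluding the edge frame: ABC -> ABCBA
        m = idx % period
        src = np.where(m < num, m, period - m)
    else:
        raise ValueError(f"pad_mode {pad_mode!r} is not modeled in this sketch")
    return frames[src]


# Three 1x1 single-channel frames standing in for A, B, C (values 0, 1, 2).
abc = np.arange(3, dtype=np.uint8).reshape(3, 1, 1, 1)
print(pad_frames(abc, 6, "edge").ravel().tolist())       # [0, 1, 2, 2, 2, 2]
print(pad_frames(abc, 6, "symmetric").ravel().tolist())  # [0, 1, 2, 2, 1, 0]
print(pad_frames(abc, 5, "reflect").ravel().tolist())    # [0, 1, 2, 1, 0]
```

With a three-channel video, fill_value=[118, 185, 0] fills each padded pixel per channel, while a single value such as 128 is broadcast to all channels.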

Example 1: Extract a sequence of arbitrary frames:

from nvidia.dali import fn

# encoded_video: batch of encoded video streams (positional-only input)
video = fn.experimental.decoders.video(
    encoded_video,
    frames=[0, 10, 20, 30, 40, 50, 40, 30, 20, 10, 0],
    # ... other arguments
)

Example 2: Extract a sequence of evenly spaced frames, starting from frame 0, with a stride of 2, until frame 20 (exclusive):

video = fn.experimental.decoders.video(
    encoded_video,
    start_frame=0, end_frame=20, stride=2,
    # ... other arguments
)

Example 3: Pad the sequence with the last frame in the video, until 100 frames are reached:

video = fn.experimental.decoders.video(
    encoded_video,
    start_frame=0, sequence_length=100, stride=2, pad_mode="edge",
    # ... other arguments
)

Example 4: Pad the sequence with a constant value of 128, until 100 frames are reached:

video = fn.experimental.decoders.video(
    encoded_video,
    start_frame=0, sequence_length=100, stride=2, pad_mode="constant", fill_value=128,
    # ... other arguments
)
Example 5: Pad the sequence with a constant RGB value of (118, 185, 0), until 100 frames are reached:

video = fn.experimental.decoders.video(
    encoded_video,
    start_frame=0, sequence_length=100, stride=2, pad_mode="constant", fill_value=[118, 185, 0],
    # ... other arguments
)

Supported backends
  • ‘cpu’

  • ‘mixed’

Parameters:

__encoded (TensorList) – Encoded video stream

Keyword Arguments:
  • affine (bool, optional, default = True) –

    Whether to pin threads to CPU cores (mixed backend only).

    If True, each thread in the internal thread pool will be pinned to a specific CPU core. If False, threads can migrate between cores based on OS scheduling.

  • build_index (bool, optional, default = True) –

    Controls whether to build a frame index during initialization.

    Building an index allows faster seeking to specific frames, but requires additional CPU memory to store frame metadata and a longer initialization time to scan the entire video file. For each frame, the index stores metadata such as whether it is a key frame and its presentation timestamp (PTS).

    Building an index is particularly useful when decoding a small number of frames spaced far apart, or when decoding starts from a frame far into the video.

  • bytes_per_sample_hint (int or list of int, optional, default = [0]) –

    Output size hint, in bytes per sample.

    If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.

  • end_frame (int or TensorList of int, optional) – Last frame to extract from each video (exclusive). Cannot be used with frames or sequence_length.

  • fill_value (int or list of int, optional, default = [0]) –

    Value(s) used to pad missing frames when pad_mode='constant'.

    Each value must be in range [0, 255]. If a single value is provided, it will be used for all channels. Otherwise, the number of values must match the number of channels in the video.

  • frames (int or list of int or TensorList of int, optional) –

    Specifies which frames to extract from each video by their indices.

    The indices can be provided in any order and can include duplicates. For example, [0,10,5,10] would extract:

    • Frame 0 (first frame)

    • Frame 10

    • Frame 5

    • Frame 10 (again)

    This argument cannot be used together with start_frame, end_frame, sequence_length, or stride.

  • pad_mode (str or TensorList of str, optional, default = ‘constant’) –

    How to handle videos with insufficient frames when using start_frame/sequence_length/stride:

    • 'none': Return shorter sequences if not enough frames: ABC -> ABC

    • 'constant': Pad with a fixed value (specified by fill_value): ABC -> ABCPPP

    • 'edge' or 'repeat': Repeat the last valid frame: ABC -> ABCCCC

    • 'reflect_1001' or 'symmetric': Reflect padding, including the last element: ABC -> ABCCBA

    • 'reflect_101' or 'reflect': Reflect padding, not including the last element: ABC -> ABCBA

    Not relevant when using the frames argument.

  • preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.

  • sequence_length (int or TensorList of int, optional) – Number of frames to extract from each video. Cannot be used together with frames or end_frame arguments.

  • start_frame (int or TensorList of int, optional) – Index of the first frame to extract from each video. Cannot be used together with the frames argument.

  • stride (int or TensorList of int, optional) – Distance between the indices of consecutive extracted frames. Cannot be used together with the frames argument.
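To illustrate why build_index speeds up seeking, the sketch below uses a sorted list of key-frame indices to find where decoding must begin for a target frame (decoding generally has to start at a key frame). This is a general technique, not DALI's internal code; `seek_plan` and the key-frame layout are hypothetical.

```python
import bisect
from typing import List, Tuple


def seek_plan(key_frames: List[int], target: int) -> Tuple[int, int]:
    """Hypothetical helper: (key frame to start from, frames decoded before target).

    key_frames: sorted indices of key frames, as a frame index could record them.
    """
    # Rightmost key frame whose index is <= target
    pos = bisect.bisect_right(key_frames, target) - 1
    if pos < 0:
        raise ValueError("no key frame at or before the target frame")
    start = key_frames[pos]
    return start, target - start


key_frames = [0, 250, 500, 750]  # hypothetical: one key frame every 250 frames
print(seek_plan(key_frames, 610))  # (500, 110)
```

With this hypothetical layout, reaching frame 610 means decoding 110 frames starting at key frame 500, instead of decoding sequentially from frame 0 as a worst case without any seek information.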