nvidia.dali.fn.experimental.decoders.video
- nvidia.dali.fn.experimental.decoders.video(__encoded, /, *, affine=True, build_index=True, bytes_per_sample_hint=[0], end_frame=None, fill_value=[0], frames=None, pad_mode='constant', preserve=False, sequence_length=None, start_frame=None, stride=None, device=None, name=None)
Decodes videos from in-memory streams.
The operator supports most common video container formats using libavformat (FFmpeg). The operator utilizes either libavcodec (FFmpeg) or NVIDIA Video Codec SDK (NVDEC) for decoding the frames.
The following video codecs are supported by both CPU and Mixed backends:
H.264/AVC
H.265/HEVC
VP8
VP9
MJPEG
The following codecs are supported by the Mixed backend only:
AV1
MPEG-4
Each output sample is a sequence of frames with shape (F, H, W, C), where:
F is the number of frames in the sequence (can vary between samples)
H is the frame height in pixels
W is the frame width in pixels
C is the number of color channels
The operator provides several ways to select which frames to extract from the video:
Using no frame selection arguments:
When no frame selection arguments are provided, all frames in the video are decoded
Frames are extracted sequentially from start to end with stride=1
For example, a 10-frame video would extract frames [0,1,2,3,4,5,6,7,8,9]
Using the frames argument:
Accepts a list of frame indices to extract from the video
Frame indices can be specified in any order and can repeat frames
Each index must be non-negative and may exceed the bounds of the video if pad_mode is not 'none'
Using start_frame, end_frame and stride:
start_frame: First frame to extract (default: 0)
end_frame: Last frame to extract (exclusive)
stride: Step between consecutive extracted frames (default: 1, i.e. every frame)
Extracts frames in the range [start_frame, end_frame), advancing by stride
For example, start_frame=0, end_frame=10, stride=2 extracts frames [0,2,4,6,8]
Using start_frame, sequence_length and stride:
start_frame: First frame to extract (default: 0)
sequence_length: Number of frames to extract
stride: Step between consecutive extracted frames (default: 1, i.e. every frame)
Extracts sequence_length frames starting at start_frame, advancing by stride
For example, start_frame=0, sequence_length=5, stride=2 extracts frames [0,2,4,6,8]
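The selection rules above can be modelled in plain Python to check which indices a given combination of arguments requests. The sketch below is a hypothetical helper: select_frame_indices and its num_frames parameter are not part of DALI, and it only computes the requested indices (no decoding, no bounds checks, no padding).

```python
from typing import List, Optional, Sequence, Union


def select_frame_indices(
    frames: Optional[Union[int, Sequence[int]]] = None,
    start_frame: int = 0,
    end_frame: Optional[int] = None,
    sequence_length: Optional[int] = None,
    stride: int = 1,
    num_frames: Optional[int] = None,
) -> List[int]:
    """Return the frame indices requested by the selection arguments.

    Hypothetical helper modelling the documented rules; not a DALI API.
    """
    if frames is not None:
        # Explicit indices: any order, duplicates allowed.
        return [frames] if isinstance(frames, int) else list(frames)
    if end_frame is not None:
        # Half-open range [start_frame, end_frame), advancing by stride.
        return list(range(start_frame, end_frame, stride))
    if sequence_length is not None:
        # sequence_length indices starting at start_frame, advancing by stride.
        return [start_frame + i * stride for i in range(sequence_length)]
    if num_frames is None:
        raise ValueError("num_frames is required when no selection argument is given")
    # No selection arguments: every frame of the video, in order.
    return list(range(num_frames))


print(select_frame_indices(start_frame=0, end_frame=10, stride=2))  # [0, 2, 4, 6, 8]
print(select_frame_indices(frames=[0, 10, 5, 10]))  # [0, 10, 5, 10]
```

For instance, start_frame=0, sequence_length=100, stride=2 requests indices 0 through 198, so any video with fewer than 199 frames triggers the out-of-bounds behavior described next.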
If the requested frames exceed the bounds of the video, the behavior depends on pad_mode. If pad_mode is 'none', it causes an error. Otherwise, the sequence is padded according to the pad_mode argument (see pad_mode for details).
Example 1: Extract a sequence of arbitrary frames:
video_decoder = fn.experimental.decoders.video(encoded_video, frames=[0, 10, 20, 30, 40, 50, 40, 30, 20, 10, 0])
Example 2: Extract a sequence of evenly spaced frames, starting from frame 0, with a stride of 2, until frame 20 (exclusive):
video_decoder = fn.experimental.decoders.video(encoded_video, start_frame=0, end_frame=20, stride=2)
Example 3: Pad the sequence with the last frame in the video until 100 frames are reached:
video_decoder = fn.experimental.decoders.video(encoded_video, start_frame=0, sequence_length=100, stride=2, pad_mode="edge")
Example 4: Pad the sequence with a constant value of 128 until 100 frames are reached:
video_decoder = fn.experimental.decoders.video(encoded_video, start_frame=0, sequence_length=100, stride=2, pad_mode="constant", fill_value=128)
Example 5: Pad the sequence with a constant RGB value of (118, 185, 0) until 100 frames are reached:
video_decoder = fn.experimental.decoders.video(encoded_video, start_frame=0, sequence_length=100, stride=2, pad_mode="constant", fill_value=[118, 185, 0])
- Supported backends
‘cpu’
‘mixed’
- Parameters:
__encoded (TensorList) – Encoded video stream
- Keyword Arguments:
affine (bool, optional, default = True) –
Whether to pin threads to CPU cores (mixed backend only).
If True, each thread in the internal thread pool will be pinned to a specific CPU core. If False, threads can migrate between cores based on OS scheduling.
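To make "pinning" concrete, here is a generic Linux illustration using the Python standard library's os.sched_setaffinity and os.sched_getaffinity. It is not how DALI implements its thread pool; pinned_worker is a hypothetical name, and the calls are Linux-only. Each worker restricts itself to one core, while the main thread keeps its original affinity.

```python
import os
import threading
from typing import Dict, Set


def pinned_worker(core: int, results: Dict[int, Set[int]]) -> None:
    # On Linux, pid 0 means "the calling thread", so only this thread is pinned.
    os.sched_setaffinity(0, {core})
    results[core] = os.sched_getaffinity(0)


# One worker per core this process is allowed to run on.
allowed = sorted(os.sched_getaffinity(0))
results: Dict[int, Set[int]] = {}
workers = [threading.Thread(target=pinned_worker, args=(c, results)) for c in allowed]
for w in workers:
    w.start()
for w in workers:
    w.join()

for core, mask in sorted(results.items()):
    print(f"worker pinned to core {core}: affinity {sorted(mask)}")
```

With affine=False the equivalent would simply be to skip the sched_setaffinity call and let the OS scheduler migrate threads.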
build_index (bool, optional, default = True) –
Controls whether to build a frame index during initialization.
Building an index allows faster seeking to specific frames, but requires additional CPU memory to store frame metadata and longer initialization time to scan the entire video file. The index stores per-frame metadata, such as whether the frame is a key frame and its presentation timestamp (PTS).
Building an index is particularly useful when decoding a small number of frames spaced far apart, or when starting from a frame deep into the video.
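Why an index speeds up seeking can be shown with a toy model: decoding has to start at a key frame, so knowing where the key frames are lets a decoder jump to the nearest one at or before the target instead of decoding from the beginning. FrameEntry, decode_start_for and the fixed 30-frame key-frame interval are hypothetical; this is not DALI's internal index format.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass
class FrameEntry:
    # Hypothetical per-frame metadata, mirroring what the docs say an index stores.
    pts: int
    is_keyframe: bool


def keyframe_positions(index: List[FrameEntry]) -> List[int]:
    """Frame numbers of all key frames, in ascending order."""
    return [i for i, entry in enumerate(index) if entry.is_keyframe]


def decode_start_for(target: int, keyframes: List[int]) -> int:
    """Nearest key frame at or before `target`; decoding must begin there."""
    pos = bisect.bisect_right(keyframes, target) - 1
    if pos < 0:
        raise ValueError("no key frame precedes the target frame")
    return keyframes[pos]


# Toy 300-frame stream with a key frame every 30 frames.
index = [FrameEntry(pts=i * 512, is_keyframe=(i % 30 == 0)) for i in range(300)]
keyframes = keyframe_positions(index)

target = 275
start = decode_start_for(target, keyframes)
print(f"seek to key frame {start}, then decode {target - start} frames to reach {target}")
```

Without the index, reaching frame 275 in this toy stream would mean decoding from frame 0; with it, only frames 270 to 275 are decoded.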
bytes_per_sample_hint (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
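A hint can be estimated from the expected output shape (F, H, W, C). The sketch assumes uint8 output (one byte per element), which is consistent with fill_value being limited to [0, 255] but is an assumption here; sample_size_bytes is a hypothetical helper.

```python
def sample_size_bytes(frames: int, height: int, width: int,
                      channels: int = 3, bytes_per_element: int = 1) -> int:
    """Size of one (F, H, W, C) output sample; uint8 (1 byte) elements by default."""
    return frames * height * width * channels * bytes_per_element


# 16 RGB frames at 1280x720 (width x height).
hint = sample_size_bytes(frames=16, height=720, width=1280)
print(hint)  # 44236800
```

The result could then be passed as bytes_per_sample_hint=hint; samples with a variable frame count would need the hint sized for the largest expected sequence.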
end_frame (int or TensorList of int, optional) – Last frame to extract from each video (exclusive). Cannot be used with frames or sequence_length.
fill_value (int or list of int, optional, default = [0]) –
Value(s) used to pad missing frames when pad_mode='constant'.
Each value must be in range [0, 255]. If a single value is provided, it will be used for all channels. Otherwise, the number of values must match the number of channels in the video.
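The single-value versus per-channel rule maps directly onto NumPy broadcasting. The sketch below pads an (F, H, W, C) uint8 array with constant frames; pad_constant is a hypothetical helper that mimics the documented behavior of pad_mode='constant', not DALI's implementation.

```python
import numpy as np


def pad_constant(frames: np.ndarray, target_len: int, fill_value) -> np.ndarray:
    """Append constant frames to an (F, H, W, C) uint8 array until it holds target_len frames.

    One fill value is used for every channel; a list must give one value per channel.
    """
    f, h, w, c = frames.shape
    values = np.asarray(fill_value).reshape(-1)
    if values.size not in (1, c):
        raise ValueError(f"expected 1 or {c} fill values, got {values.size}")
    if np.any(values < 0) or np.any(values > 255):
        raise ValueError("fill values must be in range [0, 255]")
    missing = max(target_len - f, 0)
    padding = np.empty((missing, h, w, c), dtype=np.uint8)
    # Broadcasting spreads the 1 or C values over every pixel of every padded frame.
    padding[...] = values.astype(np.uint8)
    return np.concatenate([frames, padding], axis=0)


video = np.zeros((3, 2, 2, 3), dtype=np.uint8)  # three tiny black RGB frames
padded = pad_constant(video, 5, [118, 185, 0])
print(padded.shape)              # (5, 2, 2, 3)
print(padded[4, 0, 0].tolist())  # [118, 185, 0]
```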
frames (int or list of int or TensorList of int, optional) –
Specifies which frames to extract from each video by their indices.
The indices can be provided in any order and can include duplicates. For example, [0,10,5,10] would extract:
Frame 0 (first frame)
Frame 10
Frame 5
Frame 10 (again)
This argument cannot be used together with start_frame, sequence_length, or stride.
pad_mode (str or TensorList of str, optional, default = ‘constant’) –
How to handle videos with insufficient frames when using start_frame/sequence_length/stride:
'none': Return shorter sequences if not enough frames: ABC -> ABC
'constant': Pad with a fixed value (specified by fill_value): ABC -> ABCPPP
'edge' or 'repeat': Repeat the last valid frame: ABC -> ABCCCC
'reflect_1001' or 'symmetric': Reflect padding, including the last element: ABC -> ABCCBA
'reflect_101' or 'reflect': Reflect padding, not including the last element: ABC -> ABCBA
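The padding modes listed above can be reproduced on strings so the output matches the ABC diagrams. pad_sequence is a hypothetical helper, not a DALI API; it leaves out 'none', and for padding longer than one reflection it assumes periodic wrap-around in the style of numpy.pad, which the diagrams do not show.

```python
def pad_sequence(valid: str, target_len: int, mode: str, fill: str = "P") -> str:
    """Pad the valid frames (one letter per frame) to target_len using pad_mode `mode`."""
    n = len(valid)
    if n == 0:
        raise ValueError("need at least one valid frame")
    out = list(valid)
    for i in range(n, target_len):
        if mode == "constant":
            out.append(fill)  # fill_value frame
        elif mode in ("edge", "repeat"):
            out.append(valid[-1])  # repeat the last valid frame
        elif mode in ("reflect_1001", "symmetric"):
            m = i % (2 * n)  # mirror that includes the edge frame
            out.append(valid[m] if m < n else valid[2 * n - 1 - m])
        elif mode in ("reflect_101", "reflect"):
            period = 2 * n - 2  # mirror that skips the edge frame
            if period == 0:
                out.append(valid[0])  # a single frame can only reflect itself
                continue
            m = i % period
            out.append(valid[m] if m < n else valid[period - m])
        else:
            raise ValueError(f"unsupported pad_mode: {mode!r}")
    return "".join(out)


for mode, length in [("constant", 6), ("edge", 6), ("symmetric", 6), ("reflect", 5)]:
    print(mode, pad_sequence("ABC", length, mode))
```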
pad_mode is not relevant when using the frames argument.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
sequence_length (int or TensorList of int, optional) – Number of frames to extract from each video. Cannot be used together with the frames or end_frame arguments.
start_frame (int or TensorList of int, optional) – Index of the first frame to extract from each video. Cannot be used together with the frames argument.
stride (int or TensorList of int, optional) – Step between consecutive extracted frame indices (stride=1 extracts every frame). Cannot be used together with the frames argument.
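The "cannot be used together" rules spread across the keyword arguments above can be collected into one table. The sketch below is a hypothetical client-side check (check_selection_args is not a DALI function; the operator performs its own validation); it only encodes the pairs stated in these argument descriptions.

```python
from typing import Any, List, Tuple

# Argument pairs documented as mutually exclusive.
CONFLICTS: List[Tuple[str, str]] = [
    ("frames", "start_frame"),
    ("frames", "end_frame"),
    ("frames", "sequence_length"),
    ("frames", "stride"),
    ("end_frame", "sequence_length"),
]


def check_selection_args(**kwargs: Any) -> None:
    """Raise ValueError if two mutually exclusive selection arguments are both set."""
    given = {name for name, value in kwargs.items() if value is not None}
    for a, b in CONFLICTS:
        if a in given and b in given:
            raise ValueError(f"{a} cannot be used together with {b}")


check_selection_args(start_frame=0, end_frame=20, stride=2)  # OK
check_selection_args(start_frame=0, sequence_length=100, stride=2)  # OK
```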