nvidia.dali.fn.experimental.readers.video

nvidia.dali.fn.experimental.readers.video(*, device=None, name=None, bytes_per_sample_hint=[0], dont_use_mmap=False, enable_frame_num='none', enable_timestamps=False, file_list='', file_list_format='timestamps', file_list_include_end=True, file_list_rounding='start_down_end_up', file_root='', filenames=[], fill_value=[0], image_type=DALIImageType.RGB, initial_fill=1024, labels=None, lazy_init=False, num_shards=1, pad_last_batch=False, pad_mode='none', prefetch_queue_depth=1, preserve=False, random_shuffle=False, read_ahead=False, seed=-1, sequence_length, shard_id=0, skip_cached_images=False, step=-1, stick_to_shard=False, stride=1, tensor_init_bytes=1048576, uniform_sample=False)

Loads and decodes video files from disk.

The operator supports most common video container formats using libavformat (FFmpeg). The operator utilizes either libavcodec (FFmpeg) or NVIDIA Video Codec SDK (NVDEC) for decoding the frames.

The following video codecs are supported by both CPU and GPU backends:

VP8
VP9
MJPEG

The following codecs are supported by the GPU backend only:

AV1
MPEG-4
H.264/AVC
H.265/HEVC

The outputs of the operator are: video, [labels], [frame_num], [timestamps].

video: A sequence of frames with shape (F, H, W, C) where F is the number of frames in the sequence (can vary between samples), H is the frame height in pixels, W is the frame width in pixels, and C is the number of color channels.
labels: Label associated with the sample. Only available when using labels with filenames, or when using file_list or file_root.
frame_num: Frame number information. Shape and content depend on enable_frame_num:
- "scalar" or True: Index of the first frame in the decoded sequence, shape (1,).
- "sequence": Frame index of each decoded frame, shape (F,). Padded frames (e.g. when using pad_mode='constant') have index -1.
timestamps: Time in seconds of each frame in the sequence. Only available when enable_timestamps=True.

Supported backends

‘cpu’
‘gpu’

Keyword Arguments:

bytes_per_sample_hint¶ (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.

If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
dont_use_mmap¶ (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.

Memory mapping provides a small performance benefit when accessing a local file system, but most network file systems do not perform optimally with it.
enable_frame_num¶ (str, optional, default = ‘none’) –
Determines what frame number information is returned as an additional output.
- "none" or False (default): No frame number output.
- "scalar" or True: Returns the index of the first frame in the decoded sequence, shape (1,).
- "sequence": Returns the frame index of each decoded frame, shape (F,). For padded frames (e.g. when using pad_mode='constant'), the index is -1.
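The three modes can be pictured with a small pure-Python sketch. It does not call DALI; the helper name `frame_num_output` and its parameters are hypothetical, and it assumes that the index of the i-th decoded frame is `first_frame + i * stride`.

```python
def frame_num_output(first_frame, stride, num_valid, num_padded, mode):
    """Illustrate the frame_num output for one sequence.

    Assumption (not stated in the docs): valid frame i has index
    first_frame + i * stride. Padded frames get index -1.
    """
    if mode in ("none", False):
        return None  # no frame_num output at all
    if mode in ("scalar", True):
        return [first_frame]  # shape (1,)
    if mode == "sequence":
        valid = [first_frame + i * stride for i in range(num_valid)]
        return valid + [-1] * num_padded  # shape (F,)
    raise ValueError("unknown enable_frame_num mode: %r" % (mode,))


# A sequence starting at frame 10 with stride 2, three decoded frames
# followed by two padded frames.
scalar_out = frame_num_output(10, 2, 3, 2, "scalar")
sequence_out = frame_num_output(10, 2, 3, 2, "sequence")
```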
enable_timestamps¶ (bool, optional, default = False) – If set, returns the timestamp of the frames in the decoded sequence as an additional output.
file_list¶ (str, optional, default = ‘’) –
Path to the file with a list of file label [start [end]] values.

start and end are optional and can be used to specify the start and end of the video to load. The values can be interpreted differently depending on the file_list_format.

This option is mutually exclusive with filenames and file_root.
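As an illustration of the line layout, the sketch below builds the text of a file list in which some entries carry the optional start and end values. The paths, labels, and the helper name `format_file_list` are hypothetical; only the `file label [start [end]]` layout comes from the description above.

```python
def format_file_list(entries):
    """Build file_list text from (path, label, start, end) tuples.

    start and end may be None; end is written only when start is present,
    mirroring the nested brackets in "file label [start [end]]".
    """
    lines = []
    for path, label, start, end in entries:
        fields = [path, str(label)]
        if start is not None:
            fields.append(str(start))
            if end is not None:
                fields.append(str(end))
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical entries; with file_list_format='timestamps' the values are seconds.
entries = [
    ("videos/cat.mp4", 0, None, None),  # no range given
    ("videos/dog.mp4", 1, 2.5, None),   # start given, end omitted
    ("videos/bird.mp4", 2, 0.0, 4.0),   # both start and end given
]
file_list_text = format_file_list(entries)
```

The resulting text would be written to a file whose path is then passed as file_list.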
file_list_format¶ (str, optional, default = ‘timestamps’) –
How to interpret start/end values in file_list:
- frames: Use exact frame numbers (0-based). Negative values count from end.
- timestamps: Use timestamps in seconds.
Default: timestamps.
file_list_include_end¶ (bool, optional, default = True) – If set to True, the end frame is included in the range.
file_list_rounding¶ (str, optional, default = ‘start_down_end_up’) –
How to handle non-exact frame matches:
- start_down_end_up (default): Round start down and end up
- start_up_end_down: Round start up and end down
- all_up: Round both up
- all_down: Round both down
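The rounding directions can be sketched in plain Python as follows. This is only an illustration under the assumption of a constant frame rate, so that a timestamp maps to the fractional frame position `t * fps`; the helper name `timestamps_to_frames` and the `fps` parameter are hypothetical, and the operator itself may resolve timestamps against the actual frame times in the file.

```python
import math


def timestamps_to_frames(start_s, end_s, fps, rounding="start_down_end_up"):
    """Map a start/end range in seconds to frame indices.

    Assumes a constant frame rate; this only shows in which direction
    each file_list_rounding mode moves a non-exact match.
    """
    round_start, round_end = {
        "start_down_end_up": (math.floor, math.ceil),
        "start_up_end_down": (math.ceil, math.floor),
        "all_up": (math.ceil, math.ceil),
        "all_down": (math.floor, math.floor),
    }[rounding]
    return round_start(start_s * fps), round_end(end_s * fps)


# 0.52 s and 1.02 s at 10 fps fall between frames (positions 5.2 and 10.2).
widest = timestamps_to_frames(0.52, 1.02, 10, "start_down_end_up")
narrowest = timestamps_to_frames(0.52, 1.02, 10, "start_up_end_down")
```

In this sketch the default mode yields the widest range that covers the requested interval, while start_up_end_down yields the narrowest range fully inside it.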
file_root¶ (str, optional, default = ‘’) –
Path to a directory that contains the data files.

This option is mutually exclusive with filenames and file_list.
filenames¶ (str or list of str, optional, default = []) –
Absolute paths to the video files to load.

This option is mutually exclusive with file_root and file_list.
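A caller-side check of this exclusivity might look like the sketch below. The helper `pick_source` is hypothetical and not part of DALI; it additionally assumes that exactly one of the three arguments has to be provided.

```python
def pick_source(file_list="", file_root="", filenames=()):
    """Return the name of the single source argument that was provided.

    Assumption: exactly one of the three mutually exclusive arguments
    must be given. Empty strings and empty lists count as "not given",
    matching the documented defaults.
    """
    provided = {
        "file_list": bool(file_list),
        "file_root": bool(file_root),
        "filenames": bool(filenames),
    }
    chosen = [name for name, given in provided.items() if given]
    if len(chosen) != 1:
        raise ValueError(
            "expected exactly one of file_list, file_root, filenames; got: "
            + (", ".join(chosen) or "none")
        )
    return chosen[0]


source = pick_source(filenames=["/data/a.mp4", "/data/b.mp4"])  # hypothetical paths
```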
fill_value¶ (int or list of int, optional, default = [0]) –
Value(s) used to pad missing frames when pad_mode='constant'.

Each value must be in range [0, 255]. If a single value is provided, it will be used for all channels. Otherwise, the number of values must match the number of channels in the video.
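The rules above can be written out as a short validation sketch. The helper `expand_fill_value` is hypothetical and only mirrors the stated constraints; it is not how the operator is implemented.

```python
def expand_fill_value(fill_value, num_channels):
    """Return one fill value per channel, following the documented rules."""
    values = [fill_value] if isinstance(fill_value, int) else list(fill_value)
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("each fill value must be in the range [0, 255]")
    if len(values) == 1:
        # A single value is used for all channels.
        return values * num_channels
    if len(values) != num_channels:
        raise ValueError(
            "expected 1 or %d values, got %d" % (num_channels, len(values))
        )
    return values


gray_pad = expand_fill_value(128, 3)
red_pad = expand_fill_value([255, 0, 0], 3)
```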
image_type¶ (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output frames (RGB or YCbCr).
initial_fill¶ (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.

If random_shuffle is False, this parameter is ignored.
labels¶ (int or list of int, optional) – Labels associated with the files listed in the filenames argument. If not provided, no labels will be yielded.
lazy_init¶ (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
num_shards¶ (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).

This is typically used for multi-GPU or multi-node training.
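As a rough picture of what partitioning into shards means, the sketch below splits a dataset of a given size into contiguous, near-equal index ranges. The boundary formula is an assumption made for illustration (this section does not state how boundaries are computed), and `shard_range` is a hypothetical helper.

```python
def shard_range(dataset_size, num_shards, shard_id):
    """Return the [start, end) sample range of one shard.

    Assumption: shards are contiguous and their boundaries are
    floor(shard_id * dataset_size / num_shards).
    """
    start = shard_id * dataset_size // num_shards
    end = (shard_id + 1) * dataset_size // num_shards
    return start, end


# 10 samples over 3 shards: sizes differ by at most one sample.
ranges = [shard_range(10, 3, shard_id) for shard_id in range(3)]
```

In a multi-GPU job each process would typically pass its own rank as shard_id and the number of processes as num_shards.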
pad_last_batch¶ (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.

Note

If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
pad_mode¶ (str or TensorList of str, optional, default = ‘none’) –
How to handle videos with insufficient frames when using start_frame/sequence_length/stride:
- 'none': Return shorter sequences if not enough frames: ABC -> ABC
- 'constant': Pad with a fixed value (specified by fill_value): ABC -> ABCPPP
- 'edge' or 'repeat': Repeat the last valid frame: ABC -> ABCCCC
- 'reflect_1001' or 'symmetric': Reflect padding, including the last element: ABC -> ABCCBA
- 'reflect_101' or 'reflect': Reflect padding, not including the last element: ABC -> ABCBA
Not relevant when using frames argument.
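The padding patterns listed above can be reproduced with a small pure-Python sketch that works on frame labels instead of decoded images. The helper `pad_frames` is hypothetical; it assumes at least one valid frame, and its behavior when more than one full reflection is needed is a choice made here, not something the description specifies.

```python
def pad_frames(frames, length, mode="none", pad="P"):
    """Extend a sequence of frame labels to `length` using a pad_mode.

    Illustrative only: `pad` stands in for a frame filled with fill_value.
    """
    n = len(frames)
    if mode == "none" or length <= n:
        return list(frames[:length])  # shorter sequences are returned as-is
    out = list(frames)
    for i in range(n, length):
        if mode == "constant":
            out.append(pad)
        elif mode in ("edge", "repeat"):
            out.append(frames[n - 1])  # repeat the last valid frame
        elif mode in ("reflect_1001", "symmetric"):
            k = i % (2 * n)  # mirror, repeating the last frame
            out.append(frames[k if k < n else 2 * n - 1 - k])
        elif mode in ("reflect_101", "reflect"):
            period = max(2 * (n - 1), 1)  # mirror without repeating it
            k = i % period
            out.append(frames[k if k < n else period - k])
        else:
            raise ValueError("unknown pad_mode: %r" % (mode,))
    return out


constant = "".join(pad_frames("ABC", 6, "constant"))
edge = "".join(pad_frames("ABC", 6, "edge"))
symmetric = "".join(pad_frames("ABC", 6, "symmetric"))
reflect = "".join(pad_frames("ABC", 5, "reflect_101"))
```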
prefetch_queue_depth¶ (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.

This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve¶ (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle¶ (bool, optional, default = False) –
Determines whether to randomly shuffle data.

A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
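A buffer-based shuffle of this kind can be sketched in plain Python. This is a generic shuffle buffer written for illustration, not DALI's implementation; the helper name `buffered_shuffle` is hypothetical.

```python
import random


def buffered_shuffle(samples, initial_fill, seed=0):
    """Shuffle a sequentially read stream through a fixed-size buffer."""
    rng = random.Random(seed)
    source = iter(samples)
    buffer = []
    for sample in source:  # sequential read fills the buffer first
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            break
    out = []
    while buffer:
        idx = rng.randrange(len(buffer))  # pick a random buffered sample
        out.append(buffer[idx])
        try:
            buffer[idx] = next(source)  # refill the slot with the next read
        except StopIteration:
            buffer[idx] = buffer[-1]  # source exhausted: shrink the buffer
            buffer.pop()
    return out


shuffled = buffered_shuffle(range(10), initial_fill=4, seed=123)
```

With such a scheme a sample can move earlier in the stream by fewer than initial_fill positions, so a larger buffer gives a more thorough shuffle at the cost of memory.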
read_ahead¶ (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.

For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed¶ (int, optional, default = -1) – Random seed; if not set, one will be assigned automatically.
sequence_length¶ (int) – Frames to load per sequence.
shard_id¶ (int, optional, default = 0) – Index of the shard to read.
skip_cached_images¶ (bool, optional, default = False) –
If set to True, loading of the data is skipped when the sample is in the decoder cache.

In this case, the output of the loader will be empty.
step¶ (int, optional, default = -1) –
Frame interval between each sequence.

When the value is less than 0, step is set to sequence_length.
stick_to_shard¶ (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.

If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect the accuracy of training.
stride¶ (int, optional, default = 1) – Distance between consecutive frames in the sequence.
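How sequence_length, stride, and step interact can be illustrated with a pure-Python sketch that lists the frame indices of each sequence cut from one video. The helper `sequence_windows` is hypothetical. It assumes the first sequence starts at frame 0 and simply drops a trailing sequence that does not fit, whereas in the operator the handling of such a sequence depends on pad_mode.

```python
def sequence_windows(num_frames, sequence_length, stride=1, step=-1):
    """List the frame indices of each sequence taken from one video."""
    if step < 0:
        step = sequence_length  # documented default for negative step
    if step == 0:
        raise ValueError("step must be non-zero")
    # Number of source frames covered by one sequence.
    span = (sequence_length - 1) * stride + 1
    windows = []
    start = 0
    while start + span <= num_frames:
        windows.append([start + i * stride for i in range(sequence_length)])
        start += step
    return windows


# Default step: consecutive, non-overlapping sequences of 3 frames.
dense = sequence_windows(10, 3)
# stride=2 skips every other frame; step=1 starts a sequence at every frame.
strided = sequence_windows(6, 3, stride=2, step=1)
```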
tensor_init_bytes¶ (int, optional, default = 1048576) – Hint for how much memory to allocate per image.
uniform_sample¶ (bool, optional, default = False) –
If set to True, uniformly samples sequence_length frames from the full video (or from the video range defined by file_list), regardless of the video length.

The sampled frame indices correspond to numpy.linspace(start, end-1, sequence_length), with each value rounded to the nearest integer using floor(x + 0.5). For the non-negative indices involved, this rounds halves away from zero, matching C++ std::round and differing from NumPy’s default banker’s rounding.

If sequence_length exceeds the number of frames in the video, frames are repeated rather than padded. For example, sampling 5 frames from a 3-frame video yields indices [0, 1, 1, 2, 2]. A single-frame video always produces a sequence of identical frames.

When enabled, each video file produces exactly one sample per epoch. The stride, step, and pad_mode arguments are ignored.
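The index computation described above can be reproduced without NumPy. The sketch below follows the stated formula; the helper name `uniform_sample_indices` is hypothetical, and `end` is taken to be exclusive, as implied by the `end-1` in the formula.

```python
import math


def uniform_sample_indices(start, end, sequence_length):
    """Frame indices for uniform sampling over the range [start, end).

    Mirrors numpy.linspace(start, end - 1, sequence_length) followed by
    floor(x + 0.5) rounding.
    """
    if sequence_length == 1:
        return [start]  # linspace with a single point returns the start
    span = (end - 1) - start
    return [
        math.floor(start + i * span / (sequence_length - 1) + 0.5)
        for i in range(sequence_length)
    ]


# The example from the text: 5 frames sampled from a 3-frame video.
indices = uniform_sample_indices(0, 3, 5)
# For comparison, banker's rounding (Python's built-in round) on the same points.
bankers = [round(x) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
```

Here `indices` is [0, 1, 1, 2, 2], while banker's rounding of the same sample points gives [0, 0, 1, 2, 2], which is why the rounding rule matters when frames are repeated.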