Loads and decodes video files using FFmpeg and NVDECODE, which is the hardware-accelerated video decoding feature in NVIDIA® GPUs.
The video streams can be in most container file formats. FFmpeg is used to parse video containers, and the operator returns a batch of sequences of sequence_length frames with shape (N, F, H, W, C), where N is the batch size, F is the number of frames, and H, W, and C are the frame height, width, and number of channels. This class only supports constant frame rate videos.
Containers that do not support indexing, such as MPEG, require DALI to seek to the sequence each time a new sequence needs to be decoded.
- Supported backends
- Keyword Arguments
sequence_length (int) – Frames to load per sequence.
additional_decode_surfaces (int, optional, default = 2) –
Additional decode surfaces to use beyond the minimum required.
This argument is ignored when the decoder cannot determine the minimum number of decode surfaces, which can happen with older driver versions.
This parameter can be used to trade off memory usage against performance.
bytes_per_sample_hint (int or list of int, optional, default = ) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
channels (int, optional, default = 3) – Number of channels.
dont_use_mmap (bool, optional, default = False) –
If set to True, the Loader will use plain file I/O instead of trying to map the file in memory.
Mapping provides a small performance benefit when accessing a local file system, but most network file systems do not provide optimum performance with it.
dtype (nvidia.dali.types.DALIDataType, optional, default = DALIDataType.UINT8) –
Output data type.
enable_frame_num (bool, optional, default = False) – If the filenames argument is passed, returns the frame number output.
enable_timestamps (bool, optional, default = False) – If the filenames argument is passed, returns the timestamps output.
file_list (str, optional, default = ‘’) –
Path to the file with a list of file label [start_frame [end_frame]] values.
A positive value means the exact frame, and a negative value counts as the Nth frame from the end (following Python array indexing). Equal values for the start and end frame yield an empty sequence and a warning. This option is mutually exclusive with
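The indexing rule above matches Python slicing, so it can be modeled with range slicing. The sketch below is illustrative only: parse_file_list_line and resolve_frame_range are hypothetical helpers, not DALI functions, and they assume whitespace-separated fields, paths without spaces, integer frame numbers (as when file_list_frame_num is True), and an omitted end meaning "until the last frame".

```python
import warnings


def parse_file_list_line(line):
    """Split a 'file label [start_frame [end_frame]]' entry (hypothetical helper).

    Assumes whitespace-separated fields and paths without spaces.
    """
    parts = line.split()
    if not 2 <= len(parts) <= 4:
        raise ValueError(f"malformed entry: {line!r}")
    path, label = parts[0], int(parts[1])
    start = int(parts[2]) if len(parts) > 2 else 0
    end = int(parts[3]) if len(parts) > 3 else None  # None = until the last frame (assumed)
    return path, label, start, end


def resolve_frame_range(start, end, total_frames):
    """Resolve start/end with Python slice semantics (end is exclusive)."""
    # Slicing a range counts negative values from the end and clamps out-of-range ones.
    frames = range(total_frames)[start:end]
    if len(frames) == 0:
        warnings.warn(f"empty frame range [{start}, {end})")
    return frames.start, frames.stop
```

For a 100-frame video, the entry "clip.mp4 3 10 -1" resolves to frames 10 through 98, and equal start and end values produce an empty range plus a warning.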
file_list_frame_num (bool, optional, default = False) –
If set to True, the start/end values provided in file_list are interpreted as frame numbers instead of timestamps.
If floating point values have been provided, the start frame number will be rounded up and the end frame number will be rounded down.
Frame numbers start from 0.
file_list_include_preceding_frame (bool, optional, default = False) –
Changes how file_list start and end frame timestamps are translated to frame numbers.
If the start/end values are provided in file_list as timestamps, the start frame is calculated as ceil(start_time_stamp * FPS) and the end frame as floor(end_time_stamp * FPS). If this argument is set to True, the equations change to floor(start_time_stamp * FPS) and ceil(end_time_stamp * FPS), respectively. In effect, the first returned frame is not later, and the end frame not earlier, than the provided timestamps. This behavior is more closely aligned with how visible timestamps correspond to displayed video frames.
If file_list_frame_num is set to True, this option has no effect.
This option is available for legacy behavior compatibility.
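The rounding rules of file_list_frame_num and file_list_include_preceding_frame can be summarized in a few lines. This is a sketch of the documented formulas, not DALI's implementation; file_list_range_to_frames is a hypothetical name, and its two flags only mirror the two reader arguments.

```python
import math


def file_list_range_to_frames(start, end, fps, frame_num=False,
                              include_preceding_frame=False):
    """Translate file_list start/end values into frame numbers (illustrative).

    frame_num mirrors file_list_frame_num and include_preceding_frame mirrors
    file_list_include_preceding_frame; this is not DALI's code.
    """
    if frame_num:
        # Values are already frame numbers: round start up and end down.
        # include_preceding_frame has no effect in this mode.
        return math.ceil(start), math.floor(end)
    if include_preceding_frame:
        # First frame not later, end frame not earlier, than the timestamps.
        return math.floor(start * fps), math.ceil(end * fps)
    # Default: ceil(start * FPS) and floor(end * FPS).
    return math.ceil(start * fps), math.floor(end * fps)
```

At 10 FPS, the timestamps 0.25 s and 0.75 s map to frames 3 and 7 by default, and to frames 2 and 8 when the preceding frame is included.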
file_root (str, optional, default = ‘’) –
Path to a directory that contains the data files.
This option is mutually exclusive with
filenames (str or list of str, optional, default = ) –
File names of the video files to load.
This option is mutually exclusive with
image_type (nvidia.dali.types.DALIImageType, optional, default = DALIImageType.RGB) – The color space of the output frames (RGB or YCbCr).
initial_fill (int, optional, default = 1024) –
Size of the buffer that is used for shuffling.
If random_shuffle is False, this parameter is ignored.
labels (int or list of int, optional) –
Labels associated with the files listed in
If an empty list is provided, sequential 0-based indices are used as labels. If not provided, no labels will be yielded.
lazy_init (bool, optional, default = False) – Parse and prepare the dataset metadata only during the first run instead of in the constructor.
normalized (bool, optional, default = False) – Gets the output as normalized data.
num_shards (int, optional, default = 1) –
Partitions the data into the specified number of parts (shards).
This is typically used for multi-GPU or multi-node training.
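One common way to split a dataset into num_shards contiguous parts, with shard_id selecting the part, is an integer-division split. The sketch below assumes that scheme; shard_bounds is a hypothetical helper, and DALI's exact partitioning may differ.

```python
def shard_bounds(dataset_size, shard_id, num_shards):
    """Return the [start, end) sample indices of one shard (even split, assumed)."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    # Integer division spreads the remainder so shard sizes differ by at most one.
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end
```

With 10 samples and 3 shards, this yields the ranges (0, 3), (3, 6), and (6, 10), so the last shard holds one more sample than the others.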
pad_last_batch (bool, optional, default = False) –
If set to True, pads the shard by repeating the last sample.
If the number of batches differs across shards, this option can cause an entire batch of repeated samples to be added to the dataset.
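The note about an entire batch of repeated samples follows from padding every shard to the same number of batches. The model below assumes that each shard is padded up to the batch count of the largest shard by repeating its last sample; pad_shard is a hypothetical helper, not part of DALI.

```python
import math


def pad_shard(samples, batch_size, shard_sizes):
    """Pad one shard by repeating its last sample (illustrative model).

    Assumes every shard is padded to the number of batches needed by the
    largest shard, so all shards yield the same number of batches.
    """
    target_batches = math.ceil(max(shard_sizes) / batch_size)
    target_len = target_batches * batch_size
    return list(samples) + [samples[-1]] * (target_len - len(samples))
```

With shard sizes 3, 3, and 4 and a batch size of 1, a 3-sample shard gains a fourth batch that consists only of its repeated last sample.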
pad_sequences (bool, optional, default = False) –
Allows creation of incomplete sequences if there is an insufficient number of frames at the very end of the video.
Redundant frames are zeroed. Corresponding time stamps and frame numbers are set to -1.
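The padding rule can be pictured as follows. In this sketch, pad_sequence is a hypothetical helper, frames are modeled as flat lists of pixel values, and timestamps are assumed to be frame_number / fps, which holds only for constant frame rate videos.

```python
def pad_sequence(frames, frame_numbers, fps, sequence_length):
    """Pad an incomplete tail sequence up to sequence_length (illustrative).

    frames: list of frames, each a flat list of pixel values (assumed layout).
    Timestamps are assumed to be frame_number / fps (constant frame rate).
    """
    missing = sequence_length - len(frames)
    if missing < 0:
        raise ValueError("sequence is already longer than sequence_length")
    frame_size = len(frames[0])
    padded_frames = frames + [[0] * frame_size for _ in range(missing)]  # zeroed frames
    padded_numbers = frame_numbers + [-1] * missing  # -1 marks padded positions
    padded_timestamps = [n / fps for n in frame_numbers] + [-1.0] * missing
    return padded_frames, padded_numbers, padded_timestamps
```

If only frames 98 and 99 remain and sequence_length is 4, the two missing positions are filled with zeroed frames whose frame numbers and timestamps are -1.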
prefetch_queue_depth (int, optional, default = 1) –
Specifies the number of batches to be prefetched by the internal Loader.
This value should be increased when the pipeline is CPU-stage bound, trading memory consumption for better interleaving with the Loader thread.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
random_shuffle (bool, optional, default = False) –
Determines whether to randomly shuffle data.
A prefetch buffer with a size equal to initial_fill is used to read data sequentially, and then samples are selected randomly to form a batch.
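A buffer-based shuffle of this kind can be modeled in a few lines: fill a buffer sequentially, then repeatedly emit a random buffered sample and replace it with the next one read. The sketch below is a simplified model of the described behavior, not DALI's code; shuffled_stream is a hypothetical name, and the seed argument only loosely corresponds to the reader's seed.

```python
import random


def shuffled_stream(samples, initial_fill, seed=None):
    """Yield samples in a randomized order using a fixed-size shuffle buffer.

    A simplified model of the behavior described above, not DALI's code.
    """
    rng = random.Random(seed)
    source = iter(samples)
    buffer = []
    # Fill the buffer sequentially up to initial_fill samples.
    for sample in source:
        buffer.append(sample)
        if len(buffer) >= initial_fill:
            break
    # For each new sample, emit a random buffered one and put the new one in its slot.
    for sample in source:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = sample
    # Drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer
```

A larger buffer mixes samples from farther apart in the input order at the cost of memory; a buffer of size 1 preserves the original order.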
read_ahead (bool, optional, default = False) –
Determines whether the accessed data should be read ahead.
For large files such as LMDB, RecordIO, or TFRecord, this argument slows down the first access but decreases the time of all of the following accesses.
seed (int, optional, default = -1) –
If not provided, it will be populated based on the global seed of the pipeline.
shard_id (int, optional, default = 0) – Index of the shard to read.
skip_cached_images (bool, optional, default = False) –
If set to True, loading is skipped for samples that are already in the decoder cache.
In this case, the output of the loader will be empty.
skip_vfr_check (bool, optional, default = False) –
Skips the check for the variable frame rate (VFR) videos.
Use this flag to suppress false positive detection of VFR videos.
When the dataset indeed contains VFR files, setting this flag may cause the decoder to malfunction.
step (int, optional, default = -1) –
Frame interval between each sequence.
When the value is less than 0, step is set to
stick_to_shard (bool, optional, default = False) –
Determines whether the reader should stick to a data shard instead of going through the entire dataset.
If decoder caching is used, it significantly reduces the amount of data to be cached, but might affect the accuracy of the training.
stride (int, optional, default = 1) – Distance between consecutive frames in the sequence.
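To see how step and stride interact, the sketch below enumerates the frame indices of every complete sequence, assuming step is the distance between the first frames of consecutive sequences and stride is the distance between frames inside a sequence. sequence_frame_indices is a hypothetical helper; it requires a positive step and does not model the reader's default for negative values.

```python
def sequence_frame_indices(num_frames, sequence_length, step, stride=1):
    """List the frame indices of every complete sequence in a video (illustrative).

    step: distance between the first frames of consecutive sequences
    (must be positive here; the reader's default for step < 0 is not modeled).
    stride: distance between consecutive frames inside one sequence.
    """
    if step <= 0 or stride <= 0 or sequence_length <= 0:
        raise ValueError("step, stride and sequence_length must be positive")
    span = (sequence_length - 1) * stride + 1  # frames covered by one sequence
    return [
        [start + i * stride for i in range(sequence_length)]
        for start in range(0, num_frames - span + 1, step)
    ]
```

For a 7-frame video with sequence_length 3 and step 3, this model produces [[0, 1, 2], [3, 4, 5]]; frame 6 could only appear in an incomplete sequence, which is the case pad_sequences addresses.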
tensor_init_bytes (int, optional, default = 1048576) – Hint for how much memory to allocate per image.