nvidia.dali.fn.experimental.inputs.video
- nvidia.dali.fn.experimental.inputs.video(*inputs, **kwargs)
Streams and decodes a video from a memory buffer. To be used with long and high resolution videos.
Returns a batch of sequences of frames, with the layout (F, H, W, C), where:
- F - number of frames in a sequence,
- H - height of the frame,
- W - width of the frame,
- C - number of channels in the frame.
When using the fn.inputs.video operator inside a DALI Pipeline, the user needs to provide the data using Pipeline.feed_input(). Once the operator is fed with data, the Pipeline can be run multiple times and the fn.inputs.video operator will return consecutive sequences, as long as there is enough data to decode. When the source of the frames (the video file) is depleted, the user needs to call feed_input again to provide the next video file to the operator. The operator has an internal queue for the data, so feed_input may be called multiple times, and when a given video file ends, the operator will fetch the next one automatically from the top of the queue. Running the pipeline while there is no data for fn.inputs.video to process results in an error.
This operator takes only one video as an input (i.e. input_batch_size=1) and will return batches of sequences. Every output batch will have max_batch_size samples, as set during Pipeline creation. When the number of frames in the video file cannot be split uniformly across batches, the last batch returned by this operator for a given video will be partial, and the last sequence in this batch will be determined by the last_sequence_policy parameter. For example:

This is a video that consists of 67 frames (every '-' is a frame):
-------------------------------------------------------------------

User decided that there shall be 5 frames per sequence and
the last_sequence_policy='partial':
-------------------------------------------------------------------
[   ][   ][   ][   ][   ][   ][   ][   ][   ][   ][   ][   ][   ][]
-------------------------------------------------------------------
Since there are not enough frames, the last sequence comprises 2 frames.

The Pipeline has max_batch_size=3, therefore the operator will return
5 batches of sequences.
First 4 batches comprise 3 sequences and the last batch is partial and
comprises 2 sequences.
---------------   ---------------   ---------------   ---------------   -------
[   ][   ][   ]   [   ][   ][   ]   [   ][   ][   ]   [   ][   ][   ]   [   ][]
---------------   ---------------   ---------------   ---------------   -------

With the last_sequence_policy='pad', the last sequence of the last batch
will be padded with 0:
---------------   ---------------   ---------------   ---------------   -------000
[   ][   ][   ]   [   ][   ][   ]   [   ][   ][   ]   [   ][   ][   ]   [   ][   ]
---------------   ---------------   ---------------   ---------------   -------000
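The arithmetic behind the 67-frame example can be reproduced with a few lines of plain Python. This is only an illustrative sketch, not DALI code: the helper name split_video and its return format (one (real_frames, padded_frames) pair per sequence) are assumptions made for this example.

```python
def split_video(num_frames, sequence_length, max_batch_size,
                last_sequence_policy="partial"):
    """Model how frames are grouped into sequences, and sequences into batches.

    Hypothetical helper for illustration only. Each sequence is described
    by a (real_frames, padded_frames) tuple.
    """
    if last_sequence_policy not in ("partial", "pad"):
        raise ValueError("last_sequence_policy must be 'partial' or 'pad'")

    sequences = []
    for start in range(0, num_frames, sequence_length):
        real = min(sequence_length, num_frames - start)
        # Only the 'pad' policy fills a short last sequence with empty frames.
        padded = sequence_length - real if last_sequence_policy == "pad" else 0
        sequences.append((real, padded))

    # Group consecutive sequences into batches of at most max_batch_size.
    return [sequences[i:i + max_batch_size]
            for i in range(0, len(sequences), max_batch_size)]


partial = split_video(67, 5, 3, "partial")
print(len(partial))               # 5 batches
print([len(b) for b in partial])  # [3, 3, 3, 3, 2]
print(partial[-1][-1])            # (2, 0): last sequence has 2 frames

padded = split_video(67, 5, 3, "pad")
print(padded[-1][-1])             # (2, 3): 2 real frames plus 3 padded frames
```

In both cases the number of batches and sequences is the same; the policy only changes the length of the final sequence.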
The difference between fn.inputs.video and fn.readers.video is that the former reads an encoded video from memory and the latter reads the encoded video from disk.
The difference between fn.inputs.video and fn.decoders.video is that the former does not decode the whole video file in one go. This behaviour is needed for longer videos. E.g. a 5-min, 4k, 30fps decoded video takes about 1.7 TB of memory.
This operator accepts most video containers and file formats. FFmpeg is used to parse the video container. When the container does not contain the required metadata (e.g. frame sizes, number of frames, etc.), the operator needs to determine it itself, which may result in a slowdown.
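A minimal usage sketch of the feed-and-run flow described above is given here. It assumes a DALI installation with GPU support; the function name run_video_from_memory, the operator name "video_input", and the num_threads and device_id values are hypothetical choices for this example, not values prescribed by the operator.

```python
def run_video_from_memory(video_path, sequence_length=5, max_batch_size=3):
    """Feed one encoded video file to the operator and return the first batch."""
    # Imports live inside the function so the sketch can be loaded without
    # DALI installed; actually calling it requires DALI and a GPU.
    import numpy as np
    from nvidia.dali import fn, pipeline_def

    @pipeline_def(batch_size=max_batch_size, num_threads=2, device_id=0)
    def video_pipeline():
        return fn.experimental.inputs.video(
            name="video_input",  # hypothetical name, referenced by feed_input
            sequence_length=sequence_length,
            last_sequence_policy="partial",
            device="mixed",
        )

    pipe = video_pipeline()
    pipe.build()

    # The whole encoded file is passed as a single 1-D uint8 sample,
    # i.e. an input batch containing exactly one video.
    encoded = np.fromfile(video_path, dtype=np.uint8)
    pipe.feed_input("video_input", [encoded])

    # Each run() returns the next batch of sequences decoded from the video.
    (sequences,) = pipe.run()
    return sequences
```

Calling pipe.run() repeatedly after a single feed_input call would yield the following batches of the same video, until its frames are exhausted.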
- Supported backends
‘cpu’
‘mixed’
- Keyword Arguments:
sequence_length (int) – Number of frames in each sequence.
affine (bool, optional, default = True) –
Applies only to the mixed backend type. If set to True, each thread in the internal thread pool will be tied to a specific CPU core.
Otherwise, the threads can be reassigned to any CPU core by the operating system.
blocking (bool, optional, default = True) – If True, this operator will block until the data is available (e.g. by calling feed_input). If False, the operator will raise an error if the data is not available.
bytes_per_sample_hint (int or list of int, optional, default = [0]) –
Output size hint, in bytes per sample.
If specified, the operator’s outputs residing in GPU or page-locked host memory will be preallocated to accommodate a batch of samples of this size.
last_sequence_policy (str, optional, default = ‘partial’) –
Specifies how to handle the last sequence in the video file.
For a given number of frames in the video file and sequence_length parameter, it might happen that the video can’t be split uniformly across sequences. If last_sequence_policy='partial', the last sequence might have fewer frames than the sequence_length value specified. If last_sequence_policy='pad', the last sequence will always have sequence_length frames and will be padded with empty frames.
Allowed values are 'partial' and 'pad'.
no_copy (bool, optional, default = False) –
Determines whether DALI should copy the buffer when feed_input is called.
If set to True, DALI passes the user’s memory directly to the pipeline, instead of copying it. It is the user’s responsibility to keep the buffer alive and unmodified until it is consumed by the pipeline.
The buffer can be modified or freed again after the outputs of the relevant iterations have been consumed. Effectively, it happens after prefetch_queue_depth or cpu_queue_depth * gpu_queue_depth (when they are not equal) iterations following the feed_input call.
The memory location must match the specified device parameter of the operator. For the CPU, the provided memory can be one contiguous buffer or a list of contiguous Tensors. For the GPU, to avoid extra copy, the provided buffer must be contiguous. If you provide a list of separate Tensors, there will be an additional copy made internally, consuming both memory and bandwidth.
preserve (bool, optional, default = False) – Prevents the operator from being removed from the graph even if its outputs are not used.
seed (int, optional, default = -1) –
Random seed.
If not provided, it will be populated based on the global seed of the pipeline.