VideoNeVA


VideoNeVA adds support for the video modality to NeVa by representing each video as a sequence of image frames.

To enable pretraining on video input data, a minor modification has been made to the MegatronNevaModel class within the nemo.collections.multimodal.models.multimodal_llm.neva.neva_model module.

Representing video input as a series of images is handled by the TarOrFolderVideoLoader class in the nemo.collections.multimodal.data.neva module. This process utilizes Decord, which offers convenient video slicing methods.
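The frame selection controlled by the num_frames and splice_single_frame parameters described below can be sketched as follows. This is a minimal, hypothetical helper, not the actual TarOrFolderVideoLoader code. It assumes that frames are sampled evenly (taking the center of each of num_frames equal segments) and that setting splice_single_frame restricts the output to that single frame. The real loader's sampling strategy may differ.

```python
# Hypothetical helper: a sketch, NOT the actual TarOrFolderVideoLoader implementation.
from typing import List, Optional


def select_frame_indices(
    total_frames: int,
    num_frames: int,
    splice_single_frame: Optional[str] = None,
) -> List[int]:
    """Return the indices of the frames used to represent one video.

    Assumption: when splice_single_frame is None, frames are sampled evenly.
    """
    if total_frames <= 0:
        raise ValueError("video has no frames")
    if splice_single_frame is not None:
        # Assumed behavior: keep exactly one frame (first, middle, or last).
        positions = {
            "first": 0,
            "middle": total_frames // 2,
            "last": total_frames - 1,
        }
        if splice_single_frame not in positions:
            raise ValueError(f"unsupported splice_single_frame: {splice_single_frame!r}")
        return [positions[splice_single_frame]]
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Split the video into num_frames equal segments and take the center of each.
    # Short videos yield repeated indices so the output length is always num_frames.
    step = total_frames / num_frames
    return [min(int(step * i + step / 2), total_frames - 1) for i in range(num_frames)]


print(select_frame_indices(80, 8))            # evenly spaced indices
print(select_frame_indices(80, 8, "middle"))  # a single middle frame
```

The resulting indices could then be passed to Decord, for example `decord.VideoReader(path).get_batch(indices).asnumpy()`, which returns the selected frames as a NumPy array of shape (len(indices), H, W, 3). Uniform sampling is used here only because it covers the whole clip with a fixed frame budget.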


data:
  media_type: video
  splice_single_frame: null
  num_frames: 8
  image_token_len: 256
  image_folder: null
  video_folder: null

  • media_type: When set to video, NeVa’s dataloader performs additional preprocessing steps to represent the input video data as a series of image frames.

  • splice_single_frame: This parameter can be set to first, middle, or last. It determines which single frame of the video is selected.

  • image_token_len: The NeVa dataloader calculates image_token_len based on the height and width of the preprocessed image frame and the patch size of the CLIP model being used.


image_token_len = (224 // 14) * (224 // 14) = 16 * 16 = 256

  • num_frames: The number of image frames used to represent each video.

  • video_folder: This parameter specifies the directory where the video files are located. This follows the same format as NeVa’s image_folder.
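The image_token_len formula above can be expressed as a small function. This is a hypothetical helper for illustration, not part of the NeMo API. It assumes a ViT-style CLIP encoder that splits each frame into non-overlapping square patches and counts one token per patch, with no extra class token.

```python
# Hypothetical helper mirroring the image_token_len formula; not a NeMo API.
def compute_image_token_len(height: int, width: int, patch_size: int) -> int:
    """Number of patch tokens produced for one preprocessed frame."""
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    # Each non-overlapping patch_size x patch_size patch becomes one token;
    # integer division drops any partial patches at the borders.
    return (height // patch_size) * (width // patch_size)


print(compute_image_token_len(224, 224, 14))  # 16 * 16 patches
```

With a 224x224 frame and a patch size of 14 this reproduces the value 256 used in the example configuration; a larger crop such as 336x336 would yield 24 * 24 = 576 tokens per frame.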

| Feature | Training | Inference |
|---|---|---|
| Data parallelism | Yes | N/A |
| Tensor parallelism | Yes | Yes |
| Pipeline parallelism | No | No |
| Sequence parallelism | No | No |
| Activation checkpointing | Yes (Uniform or Block) | No |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | No | Yes |
| AMP/BF16 | Yes | No |
| BF16 O2 | Yes | No |
| TransformerEngine/FP8 | No | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | NVIDIA Triton supported |
| SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
| NVfuser | No | N/A |
| Distributed Optimizer | No | N/A |
| TorchInductor | No | N/A |
| Flash Attention | Yes | N/A |
Last updated on May 30, 2024.