Important
NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
VideoNeVA
VideoNeVA adds support for video modality in NeVa by representing video as multiple image frames.
To enable pretraining on video input data, a minor modification has been made to the MegatronNevaModel class in the nemo.collections.multimodal.models.multimodal_llm.neva.neva_model module.
Representing video input as a series of images is handled by the TarOrFolderVideoLoader class in the nemo.collections.multimodal.data.neva module. This process uses Decord, which offers convenient video slicing methods.
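Conceptually, representing a video as a fixed number of frames reduces to choosing evenly spaced frame indices and decoding only those frames. The sketch below illustrates that index selection, including the single-frame choice exposed by splice_single_frame. The helper names are hypothetical, for illustration only; they are not the actual TarOrFolderVideoLoader or Decord API.

```python
def uniform_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Pick num_frames evenly spaced frame indices from a video.

    Hypothetical helper sketching how a video can be represented as a
    fixed number of image frames; not the real NeVa implementation.
    """
    if num_frames >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    # Take the midpoint of each of the num_frames equal segments.
    return [int(step * i + step / 2) for i in range(num_frames)]


def splice_index(total_frames: int, position: str) -> int:
    """Select a single frame index: 'first', 'middle', or 'last'."""
    return {
        "first": 0,
        "middle": total_frames // 2,
        "last": total_frames - 1,
    }[position]
```

For a 100-frame clip with num_frames: 8, this yields eight indices spread across the clip; with splice_single_frame set, only one of the first, middle, or last frames would be kept instead.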
Configure VideoNeVA
```yaml
data:
  media_type: video
  splice_single_frame: null
  num_frames: 8
  image_token_len: 256
  image_folder: null
  video_folder: null
```
media_type
: When set to video, NeVa's dataloader performs additional preprocessing steps to represent the input video data as a series of image frames.

splice_single_frame
: Can be set to first, middle, or last. It determines which single frame within the video is selected.

image_token_len
: The NeVa dataloader calculates image_token_len from the height and width of the preprocessed image frame and the patch size of the CLIP model being used. For example, with 224×224 frames and a patch size of 14:

  image_token_len = (224 // 14) * (224 // 14) = 16 * 16 = 256

num_frames
: Selects the number of image frames used to represent the video.

video_folder
: Specifies the directory where the video files are located. It follows the same format as NeVa's image_folder.
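The image_token_len arithmetic above can be checked directly. The function below is only an illustration of that calculation for an arbitrary frame size and CLIP patch size; it is not a NeVa API.

```python
def image_token_len(height: int, width: int, patch_size: int) -> int:
    """Number of CLIP patch tokens for one preprocessed frame.

    Illustrative helper (not from NeMo): each non-overlapping
    patch_size x patch_size patch becomes one image token.
    """
    return (height // patch_size) * (width // patch_size)


# For a 224x224 frame and a ViT-L/14-style patch size of 14:
# (224 // 14) * (224 // 14) = 16 * 16 = 256
```

Changing the frame resolution or the CLIP backbone's patch size changes this value, which is why image_token_len must match the vision encoder configured for the run.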
| Feature | Training | Inference |
|---|---|---|
| Data parallelism | Yes | N/A |
| Tensor parallelism | Yes | Yes |
| Pipeline parallelism | No | No |
| Sequence parallelism | No | No |
| Activation checkpointing | Yes (Uniform or Block) | No |
| FP32/TF32 | Yes | Yes (FP16 enabled by default) |
| AMP/FP16 | No | Yes |
| AMP/BF16 | Yes | No |
| BF16 O2 | Yes | No |
| TransformerEngine/FP8 | No | No |
| Multi-GPU | Yes | Yes |
| Multi-Node | Yes | Yes |
| Inference deployment | N/A | |
| SW stack support | Slurm DeepOps/Base Command Manager/Base Command Platform | Slurm DeepOps/Base Command Manager/Base Command Platform |
| NVfuser | No | N/A |
| Distributed Optimizer | No | N/A |
| TorchInductor | No | N/A |
| Flash Attention | Yes | N/A |