nemo_curator.stages.video.caption.caption_generation
Module Contents
Classes
API
Dataclass
Bases: ProcessingStage[VideoTask, VideoTask]
Stage that generates captions for video windows using the specified vision-language (VL) model.
This stage processes prepared video windows through the specified vision-language model to generate descriptive captions, with support for both synchronous and asynchronous processing.
caption_batch_size
disable_mmcache
fp8
generate_stage2_caption
max_output_tokens
model_dir
model_does_preprocess
model_variant
name
stage2_prompt_text
verbose
Download weights and initialize vLLM once per node to avoid torch.compile race conditions.
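The once-per-node idea can be illustrated in isolation. The following is a minimal sketch of one common way to guard expensive setup so that only one process per machine runs it. It is not taken from NeMo Curator, and this page does not show how the library implements the guard. The helper `run_once_per_node`, the marker-file layout, and the `vl_model` name are all hypothetical, and the sketch assumes a POSIX system where `fcntl.flock` is available.

```python
import fcntl
import os
import tempfile


def run_once_per_node(marker_dir, name, init_fn):
    """Run init_fn at most once per machine (hypothetical helper).

    Returns True if this call performed the initialization and False if an
    earlier call (possibly from another process on the same node) already did.
    """
    os.makedirs(marker_dir, exist_ok=True)
    lock_path = os.path.join(marker_dir, f"{name}.lock")
    done_path = os.path.join(marker_dir, f"{name}.done")

    with open(lock_path, "w") as lock_file:
        # Exclusive lock: other processes on this node block here until the
        # holder finishes, so two initializations never overlap.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if os.path.exists(done_path):
                return False  # setup already completed on this node
            init_fn()
            # Write the marker only after init_fn succeeds.
            with open(done_path, "w") as done_file:
                done_file.write("ok")
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Usage: the second call finds the marker and skips the setup.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    first = run_once_per_node(tmp, "vl_model", lambda: calls.append("init"))
    second = run_once_per_node(tmp, "vl_model", lambda: calls.append("init"))
```

In this sketch the placeholder `init_fn` stands in for work such as downloading weights or warming a compile cache. Holding the lock for the whole setup means late arrivals wait rather than racing, which is the property the docstring above is concerned with.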