nemo_curator.stages.video.caption.caption_enhancement
Module Contents
Classes
Functions
Data
API
Dataclass
Bases: ProcessingStage[VideoTask, VideoTask]
Stage that enhances video captions using language models.
This stage takes existing captions and uses an LLM (e.g., Qwen) to generate more detailed, refined descriptions of the video content.
fp8
max_output_tokens
model_batch_size
model_dir
model_variant
name
prompt_text
prompt_variant
verbose
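As a rough illustration of how these attributes fit together, the sketch below defines a hypothetical stand-in dataclass that reuses the attribute names listed above. The class name `CaptionEnhancementConfigSketch`, every type annotation, and every default value are assumptions made for the example; this page does not state the real types or defaults, so check the actual class before relying on any of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptionEnhancementConfigSketch:
    """Hypothetical stand-in; only the field names come from the listing above."""

    model_dir: str = "models/caption_enhancement"  # assumed placeholder path
    model_variant: str = "qwen"                    # assumed placeholder value
    prompt_variant: str = "default"                # assumed placeholder value
    prompt_text: Optional[str] = None              # assumed: optional custom prompt
    model_batch_size: int = 8                      # assumed default
    max_output_tokens: int = 512                   # assumed default
    fp8: bool = False                              # assumed: toggles FP8 weights
    verbose: bool = False                          # assumed default
    name: str = "caption_enhancement"              # assumed stage name


# Override only the fields that differ from the (assumed) defaults.
cfg = CaptionEnhancementConfigSketch(model_batch_size=4, fp8=True)
```

Keeping the configuration in a dataclass means a stage can be built with keyword overrides only, which is presumably why the real stage is declared as one.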
Generate enhanced captions and assign them to video windows.
Check if window has valid caption data.
Prepare caption inputs from video clips and windows.
Download weights and initialize vLLM once per node to avoid torch.compile race conditions.
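The caption-related steps summarized above (check each window for valid caption data, prepare inputs, generate in batches, assign results back to windows) can be sketched in a self-contained way. Everything in this block is hypothetical: `WindowSketch`, `has_valid_caption`, `prepare_inputs`, `enhance`, and the `fake_generate` callable are illustrative names, not the stage's real methods or data model, and the fake generator stands in for the vLLM-backed model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class WindowSketch:
    """Hypothetical window: captions keyed by prompt variant."""

    caption: Dict[str, str] = field(default_factory=dict)
    enhanced_caption: Dict[str, str] = field(default_factory=dict)


def has_valid_caption(window: WindowSketch, variant: str) -> bool:
    # A caption counts as valid here if it exists and is not blank.
    text = window.caption.get(variant)
    return bool(text and text.strip())


def prepare_inputs(
    windows: List[WindowSketch], variant: str, prompt_text: str
) -> List[Tuple[int, str]]:
    # Pair each usable window's index with its full prompt.
    return [
        (idx, f"{prompt_text}\n{w.caption[variant]}")
        for idx, w in enumerate(windows)
        if has_valid_caption(w, variant)
    ]


def enhance(
    windows: List[WindowSketch],
    variant: str,
    prompt_text: str,
    generate: Callable[[List[str]], List[str]],
    batch_size: int,
) -> int:
    # Run the generator batch by batch and write results back by index.
    inputs = prepare_inputs(windows, variant, prompt_text)
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start : start + batch_size]
        outputs = generate([prompt for _, prompt in batch])
        for (idx, _), text in zip(batch, outputs):
            windows[idx].enhanced_caption[variant] = text
    return len(inputs)


def fake_generate(prompts: List[str]) -> List[str]:
    # Stand-in for the language model: echoes the original caption line.
    return [f"[enhanced] {p.splitlines()[-1]}" for p in prompts]


windows = [
    WindowSketch(caption={"default": "a dog"}),
    WindowSketch(),                              # no caption at all
    WindowSketch(caption={"default": "   "}),    # blank caption
    WindowSketch(caption={"default": "a cat"}),
]
count = enhance(windows, "default", "Describe in detail:", fake_generate, batch_size=1)
```

Carrying the window index alongside each prompt lets skipped windows stay untouched while results are still written back to the right place, regardless of batch boundaries.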