nemo_curator.stages.video.caption.caption_generation


Module Contents

Classes

| Name | Description |
| --- | --- |
| `CaptionGenerationStage` | Stage that generates captions for video windows using the specified VL model. |

API

class nemo_curator.stages.video.caption.caption_generation.CaptionGenerationStage(
model_dir: str = 'models/qwen',
model_variant: str = 'qwen',
caption_batch_size: int = 16,
fp8: bool = False,
max_output_tokens: int = 512,
model_does_preprocess: bool = False,
disable_mmcache: bool = False,
verbose: bool = False,
generate_stage2_caption: bool = False,
stage2_prompt_text: str | None = None,
name: str = 'caption_generation'
)
Dataclass

Bases: ProcessingStage[VideoTask, VideoTask]

Stage that generates captions for video windows using the specified VL model.

This stage processes prepared video windows through the specified vision-language model to generate descriptive captions, with support for both synchronous and asynchronous processing.

caption_batch_size: int = 16
disable_mmcache: bool = False
fp8: bool = False
generate_stage2_caption: bool = False
max_output_tokens: int = 512
model_dir: str = 'models/qwen'
model_does_preprocess: bool = False
model_variant: str = 'qwen'
name: str = 'caption_generation'
stage2_prompt_text: str | None = None
verbose: bool = False
nemo_curator.stages.video.caption.caption_generation.CaptionGenerationStage.__post_init__() -> None
nemo_curator.stages.video.caption.caption_generation.CaptionGenerationStage._assign_captions(
video: nemo_curator.tasks.video.Video,
mapping: dict[int, tuple[int, int]],
captions: collections.abc.Iterable[tuple[int, str]]
) -> None
nemo_curator.stages.video.caption.caption_generation.CaptionGenerationStage._initialize_model() -> None
nemo_curator.stages.video.caption.caption_generation.CaptionGenerationStage.inputs() -> tuple[list[str], list[str]]
nemo_curator.stages.video.caption.caption_generation.CaptionGenerationStage.outputs() -> tuple[list[str], list[str]]
nemo_curator.stages.video.caption.caption_generation.CaptionGenerationStage.process(
task: nemo_curator.tasks.video.VideoTask
) -> nemo_curator.tasks.video.VideoTask
nemo_curator.stages.video.caption.caption_generation.CaptionGenerationStage.setup(
worker_metadata: nemo_curator.backends.base.WorkerMetadata | None = None
) -> None
nemo_curator.stages.video.caption.caption_generation.CaptionGenerationStage.setup_on_node(
node_info: nemo_curator.backends.base.NodeInfo,
worker_metadata: nemo_curator.backends.base.WorkerMetadata
) -> None

Download weights and initialize vLLM once per node to avoid torch.compile race conditions.