Prepare inputs, generate captions, optionally enhance them, and produce preview images.
Use the pipeline stages or the example script flags to prepare captions and preview images.
Prepare caption inputs from each clip window. This step splits clips into fixed windows, formats model‑ready inputs for Qwen‑VL, and optionally stores per‑window mp4 bytes for previews.
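The fixed-window split described above can be sketched as follows. This is only an illustration of the idea, not the stage's actual implementation: the names `FrameWindow`, `split_into_windows`, `window_size`, and `min_tail` are hypothetical and do not come from the library.

```python
from dataclasses import dataclass


@dataclass
class FrameWindow:
    """Hypothetical container for one fixed window of a clip."""
    start_frame: int
    end_frame: int  # exclusive


def split_into_windows(num_frames: int, window_size: int, min_tail: int = 1) -> list[FrameWindow]:
    """Split a clip of num_frames into consecutive windows of window_size frames.

    The final partial window is kept only if it has at least min_tail frames
    (an assumed policy for illustration).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    windows = []
    for start in range(0, num_frames, window_size):
        end = min(start + window_size, num_frames)
        length = end - start
        if length == window_size or length >= min_tail:
            windows.append(FrameWindow(start, end))
    return windows


if __name__ == "__main__":
    for w in split_into_windows(600, 256):
        print(w)
```

Each resulting window would then get its own model-ready input and, if previews are needed, its own mp4 bytes.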
Optionally generate .webp previews from each window’s mp4 bytes for quick QA and review.
Caption preparation parameters
Generate window‑level captions with a vision‑language model (Qwen‑VL). This stage reads clip.windows[*].qwen_llm_input created earlier and writes window.caption["qwen"].
Optionally enhance captions with a text‑based LLM (Qwen‑LM) to expand and refine descriptions. This stage reads window.caption["qwen"] and writes window.enhanced_caption["qwen_lm"].
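The read and write pattern of these two stages can be sketched with plain data classes. The field names (`windows`, `qwen_llm_input`, `caption["qwen"]`, `enhanced_caption["qwen_lm"]`) follow the text above; the classes, the function names, and the `vlm` and `llm` callables are hypothetical stand-ins for the real models and stage classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Window:
    qwen_llm_input: Any = None  # prepared by the caption preparation step
    caption: dict[str, str] = field(default_factory=dict)
    enhanced_caption: dict[str, str] = field(default_factory=dict)


@dataclass
class Clip:
    windows: list[Window] = field(default_factory=list)


def caption_windows(clip: Clip, vlm: Callable[[Any], str]) -> None:
    """Read each window's prepared input and write window.caption["qwen"]."""
    for window in clip.windows:
        if window.qwen_llm_input is None:
            continue  # nothing was prepared for this window
        window.caption["qwen"] = vlm(window.qwen_llm_input)


def enhance_captions(clip: Clip, llm: Callable[[str], str]) -> None:
    """Read window.caption["qwen"] and write window.enhanced_caption["qwen_lm"]."""
    for window in clip.windows:
        base = window.caption.get("qwen")
        if base:
            window.enhanced_caption["qwen_lm"] = llm(base)


if __name__ == "__main__":
    clip = Clip(windows=[Window(qwen_llm_input="frames-0"), Window()])
    caption_windows(clip, vlm=lambda inp: f"caption for {inp}")
    enhance_captions(clip, llm=lambda text: text + " (expanded)")
    print(clip.windows[0].enhanced_caption)
```

Because enhancement only reads what captioning wrote, a window without a base caption is simply skipped in this sketch.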
Caption generation parameters
Generate lightweight .webp previews for each caption window to support review and QA workflows. A dedicated PreviewStage reads per-window mp4 bytes and encodes WebP using ffmpeg.
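To make the encoding step concrete, here is a rough sketch of an ffmpeg invocation that such a stage might build. The source does not show the actual command, so the flag mapping below is an assumption; the function name `build_preview_command` and the file-path arguments are hypothetical. The keyword arguments reuse the stage's documented parameter names and defaults.

```python
def build_preview_command(
    input_mp4: str,
    output_webp: str,
    target_fps: float = 1.0,
    target_height: int = 240,
    compression_level: int = 6,
    quality: int = 50,
    num_cpus_per_worker: float = 4.0,
) -> list[str]:
    """Build an ffmpeg argument list that encodes an mp4 file to WebP (assumed mapping)."""
    if not 0 <= compression_level <= 6:
        raise ValueError("compression_level must be in 0..6")
    if not 0 <= quality <= 100:
        raise ValueError("quality must be in 0..100")
    threads = max(1, int(num_cpus_per_worker))
    # fps resamples the frame rate; scale=-2:H keeps the aspect ratio with an even width.
    video_filter = f"fps={target_fps},scale=-2:{target_height}"
    return [
        "ffmpeg", "-y",
        "-i", input_mp4,
        "-vf", video_filter,
        "-c:v", "libwebp",
        "-compression_level", str(compression_level),
        "-quality", str(quality),
        "-threads", str(threads),
        output_webp,
    ]


if __name__ == "__main__":
    print(" ".join(build_preview_command("window.mp4", "window.webp")))
```

The sketch works on file paths rather than in-memory bytes; a stage holding mp4 bytes could write them to a temporary file first and pass the list to `subprocess.run`.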
target_fps (default 1.0): Target frames per second for preview generation.
target_height (default 240): Output height. Width auto-scales to preserve aspect ratio.
compression_level (range 0–6, default 6): WebP compression level. 0 is lossless; higher values reduce size with lower quality.
quality (range 0–100, default 50): WebP quality. Higher values increase quality and size.
num_cpus_per_worker (default 4.0): Number of CPU threads mapped to ffmpeg -threads.
verbose (default False): Emit more logs.

Behavior notes:
If the input frame rate is lower than target_fps or the input height is lower than target_height, the stage logs a warning and preview quality can degrade.
If ffmpeg fails, the stage logs the error and skips assigning preview bytes for that window.

The stage writes .webp files under the previews/ directory that ClipWriterStage manages. Use the helper to resolve the path:
Refer to Save & Export for directory structure and file locations.
ffmpeg with WebP (libwebp) support must be available in the environment.
If the stage warns about low frame rate or height, adjust target_fps or target_height to better match inputs.
If encoding fails, check the logged ffmpeg command and output to diagnose missing encoders.
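One way to confirm that the libwebp encoder is present is to inspect the output of `ffmpeg -hide_banner -encoders`. The helper names below are hypothetical and not part of the library; the sketch only assumes that ffmpeg lists one encoder per line with the encoder name in the second column.

```python
import shutil
import subprocess


def has_libwebp_encoder(encoders_output: str) -> bool:
    """Return True if the text from `ffmpeg -encoders` lists the libwebp encoder."""
    for line in encoders_output.splitlines():
        parts = line.split()
        # Encoder rows look like: " V....D libwebp   libwebp WebP image (codec webp)"
        if len(parts) >= 2 and parts[1] == "libwebp":
            return True
    return False


def ffmpeg_supports_webp() -> bool:
    """Run ffmpeg, if it is on PATH, and check for the libwebp encoder."""
    if shutil.which("ffmpeg") is None:
        return False
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
    )
    return has_libwebp_encoder(result.stdout)


if __name__ == "__main__":
    print("libwebp available:", ffmpeg_supports_webp())
```

Running this check before a large job can surface a missing encoder earlier than per-window encoding errors would.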