Curation Parameters

The following is a description of all available JSON parameters for the curation pipeline; a complete example request body follows the list:

  • pipeline (string): The curation pipeline method, which can be either “split” or “shard”. We recommend using “split”.

  • args (JSON object): An object containing the following parameters:

    • generate_embeddings (Boolean): If true, the curator service will generate embeddings for each video. The default value is true.

    • generate_previews (Boolean): If true, the curator service will generate a preview image for each video. The default value is true.

    • generate_captions (Boolean): If true, the curator service will generate text captions for each video. The default value is true.

    • splitting_algorithm (string): Specifies the algorithm used to segment videos. The following options are available:

      • “transnetv2”: Segments videos using the TransNetV2 algorithm, which detects obvious cuts/transitions in videos.

      • “panda70m”: Segments videos using the PANDA70M algorithm, which detects more subtle transitions in videos, but is more computationally intensive than TransNetV2.

      • “fixed_stride”: Segments videos into clips of uniform length.

    • captioning_prompt_variant (string): The type of text prompt used to generate captions for each video. The following options are available:

      • “default”: A general text prompt is used.

      • “av”: A text prompt specific to recordings from an autonomous vehicle camera is used.

      • “av-surveillance”: A text prompt specific to recordings from a fixed camera (e.g. a surveillance camera) is used.

    • captioning_prompt_text (string): The text prompt used to generate captions for each video. If this string is empty, the prompt is determined by the captioning_prompt_variant parameter; a non-empty string overrides captioning_prompt_variant.

    • nvdec_for_clipping (integer): If splitting_algorithm is set to “panda70m”, this value specifies the number of GPUs to use for hardware decoding with the panda70m algorithm. The default value is 0, since hardware decoding takes GPU resources away from other GPU-intensive parts of the pipeline.

    • fixed_stride_split_duration (integer): If splitting_algorithm is set to “fixed_stride”, this value specifies the length of each video clip (in seconds). The default value is 10.

    • encoder (string): Specifies the video encoder, which can be either “libopenh264” or “h264_nvenc”. The default value is “libopenh264”.

    • use_hwaccel_for_transcoding (Boolean): If true, the curator service will use hardware acceleration for the decoding portion of transcoding. The default value is false.

    • captioning_algorithm (string): Specifies the captioning algorithm to use, which can be either “qwen” or “vila-32b”. The default value is “qwen”.

    • qwen_batch_size (integer): If captioning_algorithm is set to “qwen”, this value specifies the batch size to use for the Qwen algorithm. The default value is 8.

    • fp8_weights_for_qwen (Boolean): If captioning_algorithm is set to “qwen”, this value indicates whether to enable FP8 weight quantization for the Qwen algorithm.

    • limit (integer): Specifies the maximum number of videos to process. The default value is 0 (i.e. there is no limit on the number of videos).

    • limit_clips (integer): Specifies the maximum number of clips to process. The default value is 0 (i.e. there is no limit on the number of clips).

    • num_cpu_workers_download (integer): (Only applicable for S3 storage) Specifies the number of CPU workers used to download raw data into the pipeline. The default value is 4.

    • num_cpu_workers_clipwriter (integer): (Only applicable for S3 storage) Specifies the number of CPU workers for writing data to S3 storage. The default value is 8.
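
For reference, the following is an example request body that combines the parameters above. This is an illustrative sketch, not canonical output: every value shown is a default stated in this section, except fp8_weights_for_qwen, whose default is not documented here and is assumed to be false. The algorithm-specific parameters nvdec_for_clipping and fixed_stride_split_duration are omitted because the example uses the “transnetv2” splitting algorithm, which does not consume them.

{
  "pipeline": "split",
  "args": {
    "generate_embeddings": true,
    "generate_previews": true,
    "generate_captions": true,
    "splitting_algorithm": "transnetv2",
    "captioning_prompt_variant": "default",
    "captioning_prompt_text": "",
    "encoder": "libopenh264",
    "use_hwaccel_for_transcoding": false,
    "captioning_algorithm": "qwen",
    "qwen_batch_size": 8,
    "fp8_weights_for_qwen": false,
    "limit": 0,
    "limit_clips": 0,
    "num_cpu_workers_download": 4,
    "num_cpu_workers_clipwriter": 8
  }
}

To caption footage from a fixed surveillance camera in uniform 10-second clips instead, you would set splitting_algorithm to “fixed_stride”, add fixed_stride_split_duration, and set captioning_prompt_variant to “av-surveillance”.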