Curation Parameters

The following is a description of all available JSON parameters for the curation pipeline; a complete example request body follows the list:

  • pipeline (string): The curation pipeline method, which can be either “split” or “shard”. We recommend using “split”.

  • args (JSON object): An object containing the following parameters:

    • generate_embeddings (Boolean): If true, the curator service will generate embeddings for each video. The default value is true.

    • generate_previews (Boolean): If true, the curator service will generate a preview image for each video. The default value is true.

    • generate_captions (Boolean): If true, the curator service will generate text captions for each video. The default value is true.

    • splitting_algorithm (string): Specifies the algorithm used to segment videos. The following options are available:

      • “transnetv2”: Segments videos using the TransNetV2 algorithm, which detects obvious cuts/transitions in videos.

      • “panda70m”: Segments videos using the PANDA70M algorithm, which detects more subtle transitions in videos, but is more computationally intensive than TransNetV2.

      • “fixed_stride”: Segments videos into clips of uniform length.

    • captioning_prompt_variant (string): The type of text prompt used to generate captions for each video. The following options are available:

      • “default”: A general text prompt is used.

      • “av”: A text prompt specific to recordings from an autonomous vehicle camera is used.

      • “av-surveillance”: A text prompt specific to recordings from a fixed camera (e.g. a surveillance camera) is used.

    • captioning_prompt_text (string): The text prompt used to generate captions for each video. If this string is empty, the prompt is determined by the captioning_prompt_variant parameter; a non-empty string overrides captioning_prompt_variant.

    • nvdec_for_clipping (integer): If splitting_algorithm is set to “panda70m”, this value specifies the number of GPUs to use for hardware decoding with the panda70m algorithm. The default value is 0, since hardware decoding takes GPU resources away from other GPU-intensive parts of the pipeline.

    • fixed_stride_split_duration (integer): If splitting_algorithm is set to “fixed_stride”, this value specifies the length of each video clip (in seconds). The default value is 10.

    • encoder (string): Specifies the video encoder, which can be either “libopenh264” or “h264_nvenc”. The default value is “libopenh264”.

    • use_hwaccel_for_transcoding (Boolean): If true, the curator service will use hardware acceleration for the decoding portion of transcoding. The default value is false.

    • captioning_algorithm (string): Specifies the captioning algorithm to use, which can be either “qwen” or “vila-32b”. The default value is “qwen”.

    • qwen_batch_size (integer): If captioning_algorithm is set to “qwen”, this value specifies the batch size to use for the Qwen algorithm. The default value is 8.

    • fp8_weights_for_qwen (Boolean): If captioning_algorithm is set to “qwen”, this value indicates whether to enable FP8 weight quantization for the Qwen algorithm.

    • limit (integer): Specifies the maximum number of videos to process. The default value is 0 (i.e. there is no limit on the number of videos).

    • limit_clips (integer): Specifies the maximum number of clips to process. The default value is 0 (i.e. there is no limit on the number of clips).

    • num_cpu_workers_download (integer): (Only applicable for S3 storage) Specifies the number of CPU workers used to download raw data into the pipeline. The default value is 4.

    • num_cpu_workers_clipwriter (integer): (Only applicable for S3 storage) Specifies the number of CPU workers for writing data to S3 storage. The default value is 8.
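
For reference, the following is an example request body that combines the parameters above. This is an illustrative sketch, not canonical output: every value shown is a default stated in this section, except fp8_weights_for_qwen, whose default is not documented here and is assumed to be false. The algorithm-specific parameters nvdec_for_clipping and fixed_stride_split_duration are omitted because the example uses the “transnetv2” splitting algorithm, which does not consume them.

{
  "pipeline": "split",
  "args": {
    "generate_embeddings": true,
    "generate_previews": true,
    "generate_captions": true,
    "splitting_algorithm": "transnetv2",
    "captioning_prompt_variant": "default",
    "captioning_prompt_text": "",
    "encoder": "libopenh264",
    "use_hwaccel_for_transcoding": false,
    "captioning_algorithm": "qwen",
    "qwen_batch_size": 8,
    "fp8_weights_for_qwen": false,
    "limit": 0,
    "limit_clips": 0,
    "num_cpu_workers_download": 4,
    "num_cpu_workers_clipwriter": 8
  }
}

To caption footage from a fixed surveillance camera in uniform 10-second clips instead, you would set splitting_algorithm to “fixed_stride”, add fixed_stride_split_duration, and set captioning_prompt_variant to “av-surveillance”.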