nemo_curator.models.prompt_formatter
Module Contents
Classes
Data
API
Create a message.
Parameters:
The text input to create a message for.
Returns: list[dict[str, Any]]
List of messages for the VLM model including the text prompt and video.
Generate inputs for video and text data based on prompt_variant.
Processes video and text inputs to create the input for the model. It handles both video and image inputs, decoding video and applying preprocessing if needed, and creates a structured input dictionary containing the processed prompt and multimodal data.
Parameters:
Text prompt to be included with the input.
Frames per second of the input video.
Data type to use for preprocessing the video/image inputs.
Number of frames to extract from the video. If 0, uses all frames.
Whether to flip the input video/image horizontally.
Pre-processed video inputs. If None, and video data is to be passed to the model, then video cannot be None.
whether the text prompt should be overridden
Returns: dict[str, Any]
dict containing:
- “prompt”: The processed text prompt with chat template applied
- “multi_modal_data”: Dictionary containing processed “image” and/or “video” inputs