bridge.models.nemotron_vl.nemotron_vl_utils#
Module Contents#
Functions#
| `encode_pil_to_jpeg_data_url` | Encode a PIL image to a base64-encoded data URL. |
| `sample_video_frames_to_data_urls` | Sample frames from a video and return base64-encoded data URLs along with metadata. |
| `maybe_path_or_url_to_data_urls` | Convert a path or URL to data URLs, handling videos, images, and remote files. |
| `pil_image_from_base64` | Decode a base64-encoded image to a PIL image. |
| `adjust_image_tokens` | Ensure the input_ids tensor contains the correct number of image tokens. |
API#
- bridge.models.nemotron_vl.nemotron_vl_utils.encode_pil_to_jpeg_data_url(pil_image)#
Encode a PIL image to a base64-encoded data URL.
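The encoding step can be sketched with Pillow and the standard library. `encode_jpeg_data_url` below is an illustrative stand-in, not the module's actual implementation:

```python
import base64
import io

from PIL import Image


def encode_jpeg_data_url(pil_image: Image.Image, quality: int = 90) -> str:
    """Serialize a PIL image as JPEG and wrap it in a base64 data URL."""
    buf = io.BytesIO()
    # JPEG has no alpha channel, so convert RGBA/P images first.
    pil_image.convert("RGB").save(buf, format="JPEG", quality=quality)
    b64 = base64.b64encode(buf.getvalue()).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"


url = encode_jpeg_data_url(Image.new("RGB", (4, 4), color=(255, 0, 0)))
print(url[:23])  # data:image/jpeg;base64,
```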
- bridge.models.nemotron_vl.nemotron_vl_utils.sample_video_frames_to_data_urls(
- video_path_local,
- fps=1,
- nframe=0,
- nframe_max=-1,
- )#
Sample frames from a video and return base64-encoded data URLs along with metadata.
- Parameters:
video_path_local – Path to the video file
fps – Target frames per second for sampling (if > 0, uses fps-based sampling)
nframe – Number of frames to sample (used if fps <= 0)
nframe_max – Maximum number of frames to sample
- Returns:
(frame_data_urls, metadata)
frame_data_urls: List of base64-encoded frame images
metadata: VideoMetadata dataclass containing info about the sampled frames:
total_num_frames: Number of sampled frames
fps: Effective frame rate of the sampled frames
duration: Duration covered by the sampled frames (in seconds)
video_backend: Backend used for video processing ("decord")
- Return type:
tuple
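The choice between fps-based and count-based sampling described above can be sketched as pure index arithmetic. `select_frame_indices` is a hypothetical helper; the real function additionally decodes the selected frames with the decord backend:

```python
def select_frame_indices(total_frames: int, native_fps: float,
                         fps: float = 1, nframe: int = 0,
                         nframe_max: int = -1) -> list[int]:
    """Pick frame indices: fps-based when fps > 0, else evenly spaced nframe."""
    if fps > 0:
        # Sample one frame every native_fps / fps source frames.
        step = native_fps / fps
        indices = [int(i * step) for i in range(int(total_frames / step))]
    else:
        # Spread nframe samples evenly across the whole video.
        indices = [int(i * total_frames / nframe) for i in range(nframe)]
    if nframe_max > 0 and len(indices) > nframe_max:
        # Thin the list uniformly down to the cap.
        keep = [int(i * len(indices) / nframe_max) for i in range(nframe_max)]
        indices = [indices[k] for k in keep]
    return indices


# A 3-second clip at 30 fps, sampled at 1 fps -> one frame per second.
print(select_frame_indices(90, 30.0, fps=1))  # [0, 30, 60]
```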
- bridge.models.nemotron_vl.nemotron_vl_utils.maybe_path_or_url_to_data_urls(
- path_or_url,
- fps=1,
- nframe=0,
- nframe_max=-1,
- )#
Convert a path or URL to data URLs, handling videos, images, and remote files.
- Parameters:
path_or_url – Path or URL to the media file
fps – Target frames per second for video sampling (if > 0, uses fps-based sampling)
nframe – Number of frames to sample from video (used if fps <= 0)
nframe_max – Maximum number of frames to sample
- Returns:
(data_urls, metadata)
data_urls: List of base64-encoded data URLs
metadata: VideoMetadata dataclass with video metadata or None for images
- Return type:
tuple
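The dispatch between videos, images, and remote files can be sketched as a scheme and suffix check. The helper name and extension sets below are assumptions for illustration, not the module's actual code:

```python
from urllib.parse import urlparse

VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def classify_path_or_url(path_or_url: str) -> str:
    """Return 'remote', 'video', or 'image' for a media path or URL."""
    parsed = urlparse(path_or_url)
    if parsed.scheme in ("http", "https"):
        return "remote"  # would be downloaded first, then re-classified
    suffix = "." + path_or_url.rsplit(".", 1)[-1].lower()
    if suffix in VIDEO_EXTS:
        return "video"   # -> frame sampling, VideoMetadata returned
    if suffix in IMAGE_EXTS:
        return "image"   # -> single data URL, metadata is None
    raise ValueError(f"Unsupported media type: {path_or_url}")
```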
- bridge.models.nemotron_vl.nemotron_vl_utils.pil_image_from_base64(b64_str: str) → PIL.Image.Image#
Decode a base64-encoded image to a PIL image.
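The decoding counterpart can be sketched as follows; accepting either a bare base64 string or a full `data:` URL is an assumption about the input, not a documented guarantee:

```python
import base64
import io

from PIL import Image


def image_from_base64(b64_str: str) -> Image.Image:
    """Decode a base64 string (optionally a full data URL) to a PIL image."""
    if b64_str.startswith("data:"):
        # Strip the "data:image/...;base64," header, keep the payload.
        b64_str = b64_str.split(",", 1)[1]
    return Image.open(io.BytesIO(base64.b64decode(b64_str)))


# Round-trip a small PNG (lossless, so dimensions survive exactly).
buf = io.BytesIO()
Image.new("RGB", (2, 3)).save(buf, format="PNG")
img = image_from_base64(base64.b64encode(buf.getvalue()).decode("ascii"))
print(img.size)  # (2, 3)
```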
- bridge.models.nemotron_vl.nemotron_vl_utils.adjust_image_tokens(
- input_ids: torch.Tensor | Dict[str, torch.Tensor],
- num_tiles: int | List[int],
- img_start_token_id: int,
- img_end_token_id: int,
- )#
Ensure the input_ids tensor contains the correct number of image tokens as specified by num_tiles. This adjustment is necessary to bridge the gap between the HF processor output and the Megatron LLaVAModel.

Example: the decoded input_ids may look like `System: … User: … Image 1: … Image 2: …`, where the run of image tokens after "Image 1:" is adjusted to num_tiles[0], the run after "Image 2:" to num_tiles[1], and so on.
- Parameters:
input_ids – The input_ids tensor (output of the HF processor) or a dictionary of tensors, one key of which must be "input_ids"; the other tensors must have the same shape as input_ids
num_tiles – The number of image tokens to ensure, either a single int or a list of ints
img_start_token_id – The token id marking the start of an image
img_end_token_id – The token id marking the end of an image
- Returns:
The input_ids tensor with the correct number of image tokens, or a dictionary of tensors each with the same shape as input_ids
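The adjustment can be sketched on plain Python lists (the real function operates on torch tensors or dicts of tensors); the token ids used here are arbitrary placeholders:

```python
def adjust_image_tokens_list(ids, num_tiles, img_start, img_end, img_token):
    """Resize each img_start..img_end span to hold num_tiles[i] image tokens."""
    if isinstance(num_tiles, int):
        num_tiles = [num_tiles]
    out, i, span = [], 0, 0
    while i < len(ids):
        out.append(ids[i])
        if ids[i] == img_start:
            # Emit the requested number of image tokens for this span...
            out.extend([img_token] * num_tiles[span])
            span += 1
            i += 1
            # ...and skip whatever image tokens the processor produced.
            while i < len(ids) and ids[i] != img_end:
                i += 1
            continue
        i += 1
    return out


# Placeholder ids: 9 = image start, 10 = image end, 7 = image token.
# The span initially holds 2 image tokens; num_tiles asks for 3.
print(adjust_image_tokens_list([1, 9, 7, 7, 10, 2], [3], 9, 10, 7))
# [1, 9, 7, 7, 7, 10, 2]
```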