nemo_deploy.multimodal.query_multimodal#
Module Contents#
Classes#
NemoQueryMultimodal: Sends a query to Triton for Multimodal inference.
NemoQueryMultimodalPytorch: Sends a query to Triton for Multimodal inference using PyTorch deployment.
API#
- class nemo_deploy.multimodal.query_multimodal.NemoQueryMultimodal(url, model_name, model_type)#
Sends a query to Triton for Multimodal inference.
.. rubric:: Example
from nemo_deploy.multimodal import NemoQueryMultimodal

nq = NemoQueryMultimodal(url="localhost", model_name="neva", model_type="neva")

input_text = "Hi! What is in this image?"
output = nq.query(
    input_text=input_text,
    input_media="/path/to/image.jpg",
    max_output_len=30,
    top_k=1,
    top_p=0.0,
    temperature=1.0,
)
print("output: ", output)
Initialization
- setup_media(input_media)#
Set up the input media.
- frame_len(frames)#
Return the length of the frame sequence.
- get_subsampled_frames(frames, subsample_len)#
Return subsample_len frames subsampled from the frame sequence.
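The frame helpers above can be illustrated with a minimal, self-contained sketch. `evenly_subsample` below is a hypothetical stand-in, not the library's implementation; it shows one common way to pick `subsample_len` evenly spaced frames from a longer sequence:

```python
def evenly_subsample(frames, subsample_len):
    """Pick subsample_len evenly spaced items from frames (illustrative only)."""
    if subsample_len >= len(frames):
        # Nothing to drop; return the sequence unchanged.
        return list(frames)
    step = len(frames) / subsample_len
    return [frames[int(i * step)] for i in range(subsample_len)]

frames = list(range(10))            # stand-in for decoded video frames
print(evenly_subsample(frames, 4))  # [0, 2, 5, 7]
```

The actual subsampling strategy used by `get_subsampled_frames` may differ; this only conveys the shape of the operation.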
- query(
- input_text,
- input_media,
- batch_size=1,
- max_output_len=30,
- top_k=1,
- top_p=0.0,
- temperature=1.0,
- repetition_penalty=1.0,
- num_beams=1,
- init_timeout=60.0,
- lora_uids=None,
- )#
Run query.
- class nemo_deploy.multimodal.query_multimodal.NemoQueryMultimodalPytorch(url, model_name)#
Sends a query to Triton for Multimodal inference using PyTorch deployment.
.. rubric:: Example
from nemo_deploy.multimodal import NemoQueryMultimodalPytorch
from PIL import Image

nq = NemoQueryMultimodalPytorch(url="localhost", model_name="qwen")

prompts = ["What is in this image?"]
images = [Image.open("/path/to/image.jpg")]
output = nq.query_multimodal(
    prompts=prompts,
    images=images,
    max_length=100,
    top_k=1,
    top_p=0.0,
    temperature=1.0,
)
print("output: ", output)
Initialization
- query_multimodal(
- prompts: List[str],
- images: List[PIL.Image.Image],
- max_length: Optional[int] = None,
- max_batch_size: Optional[int] = None,
- top_k: Optional[int] = None,
- top_p: Optional[float] = None,
- temperature: Optional[float] = None,
- random_seed: Optional[int] = None,
- init_timeout: float = 60.0,
- )#
Query the Triton server synchronously for multimodal inference.
- Parameters:
prompts (List[str]) – List of input text prompts.
images (List[Image.Image]) – List of input PIL Images.
max_length (Optional[int]) – Maximum number of tokens to generate.
max_batch_size (Optional[int]) – Maximum batch size for inference.
top_k (Optional[int]) – Limits sampling to the K most likely tokens at each step.
top_p (Optional[float]) – Limits sampling to the smallest set of tokens whose cumulative probability reaches p (nucleus sampling).
temperature (Optional[float]) – Sampling temperature.
random_seed (Optional[int]) – Random seed for generation.
init_timeout (float) – Timeout for the connection.
- Returns:
Dictionary containing generated sentences.
- Return type:
dict
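The `top_k` and `top_p` parameters above both restrict which tokens the server may sample at each step. The sketch below is an illustrative toy, not the server's implementation, showing how each filter narrows a candidate token distribution:

```python
def filter_candidates(probs, top_k=None, top_p=None):
    """Return the tokens that survive top-k / top-p (nucleus) filtering.

    probs: dict mapping token -> probability. Illustrative only.
    """
    # Order candidates from most to least likely.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        # Keep only the K most likely tokens.
        items = items[:top_k]
    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches p.
        kept, cum = [], 0.0
        for tok, p in items:
            kept.append((tok, p))
            cum += p
            if cum >= top_p:
                break
        items = kept
    return [tok for tok, _ in items]

probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(filter_candidates(probs, top_k=3))   # ['a', 'b', 'c']
print(filter_candidates(probs, top_p=0.8)) # ['a', 'b']
```

In practice the sampled token is then drawn from the surviving set after renormalization, with `temperature` flattening or sharpening the distribution beforehand.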