nemo_deploy.multimodal.query_multimodal#

Module Contents#

Classes#

NemoQueryMultimodal

Sends a query to Triton for multimodal inference.

NemoQueryMultimodalPytorch

Sends a query to Triton for multimodal inference using PyTorch deployment.

API#

class nemo_deploy.multimodal.query_multimodal.NemoQueryMultimodal(url, model_name, model_type)#

Sends a query to Triton for multimodal inference.

.. rubric:: Example

from nemo_deploy.multimodal import NemoQueryMultimodal

nq = NemoQueryMultimodal(url="localhost", model_name="neva", model_type="neva")

input_text = "Hi! What is in this image?"
output = nq.query(
    input_text=input_text,
    input_media="/path/to/image.jpg",
    max_output_len=30,
    top_k=1,
    top_p=0.0,
    temperature=1.0,
)
print("output: ", output)

Initialization

setup_media(input_media)#

Set up the input media.

frame_len(frames)#

Get the frame length.

get_subsampled_frames(frames, subsample_len)#

Get subsampled frames.
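The library does not document its subsampling strategy here, but uniform subsampling of video frames can be sketched as follows (a minimal, illustrative stand-in, not the actual `NemoQueryMultimodal.get_subsampled_frames` implementation):

```python
# Illustrative sketch of uniform frame subsampling; not the library's code.
import numpy as np

def get_subsampled_frames(frames, subsample_len):
    """Pick `subsample_len` frames evenly spaced across the input sequence."""
    indices = np.round(np.linspace(0, len(frames) - 1, subsample_len)).astype(int)
    return [frames[i] for i in indices]

frames = list(range(100))   # stand-in for decoded video frames
subset = get_subsampled_frames(frames, 8)
```

This keeps the first and last frames and spaces the rest evenly, which is a common default for fitting a long video into a fixed-size frame budget.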

query(
input_text,
input_media,
batch_size=1,
max_output_len=30,
top_k=1,
top_p=0.0,
temperature=1.0,
repetition_penalty=1.0,
num_beams=1,
init_timeout=60.0,
lora_uids=None,
)#

Run the query.

class nemo_deploy.multimodal.query_multimodal.NemoQueryMultimodalPytorch(url, model_name)#

Sends a query to Triton for multimodal inference using PyTorch deployment.

.. rubric:: Example

from nemo_deploy.multimodal import NemoQueryMultimodalPytorch

nq = NemoQueryMultimodalPytorch(url="localhost", model_name="qwen")

output = nq.query_multimodal(
    prompts=prompts,
    images=images,
    max_length=100,
    top_k=1,
    top_p=0.0,
    temperature=1.0,
)
print("output: ", output)

Initialization

query_multimodal(
prompts: List[str],
images: List[PIL.Image.Image],
max_length: Optional[int] = None,
max_batch_size: Optional[int] = None,
top_k: Optional[int] = None,
top_p: Optional[float] = None,
temperature: Optional[float] = None,
random_seed: Optional[int] = None,
init_timeout: float = 60.0,
)#

Query the Triton server synchronously for multimodal inference.

Parameters:
  • prompts (List[str]) – List of input text prompts.

  • images (List[Image.Image]) – List of input PIL Images.

  • max_length (Optional[int]) – Maximum number of tokens to generate.

  • max_batch_size (Optional[int]) – Maximum batch size for inference.

  • top_k (Optional[int]) – Limits to the top K tokens to consider at each step.

  • top_p (Optional[float]) – Limits to the top tokens within cumulative probability p.

  • temperature (Optional[float]) – Sampling temperature.

  • random_seed (Optional[int]) – Random seed for generation.

  • init_timeout (float) – Timeout for the connection.

Returns:

Dictionary containing generated sentences.

Return type:

dict
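As the signature above shows, `query_multimodal` takes parallel lists of prompts and PIL images. A minimal sketch of preparing those inputs follows; the synthetic image is a placeholder, and the final call is shown commented out because it requires a running Triton endpoint:

```python
from PIL import Image

# Build the parallel input lists expected by query_multimodal:
# one prompt per image.
prompts = ["Hi! What is in this image?"]
images = [Image.new("RGB", (224, 224), color="white")]  # stand-in for a real image

assert len(prompts) == len(images)

# With a Triton server running, the query would look like:
# nq = NemoQueryMultimodalPytorch(url="localhost", model_name="qwen")
# output = nq.query_multimodal(prompts=prompts, images=images, max_length=100)
```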