nemo_deploy.multimodal.query_multimodal#
Module Contents#
Classes#
NemoQueryMultimodal: Sends a query to Triton for Multimodal inference.
NemoQueryMultimodalPytorch: Sends a query to Triton for Multimodal inference using PyTorch deployment.
API#
- class nemo_deploy.multimodal.query_multimodal.NemoQueryMultimodal(url, model_name, model_type)#
Sends a query to Triton for Multimodal inference.
.. rubric:: Example
from nemo_deploy.multimodal import NemoQueryMultimodal

nq = NemoQueryMultimodal(url="localhost", model_name="neva", model_type="neva")

input_text = "Hi! What is in this image?"
output = nq.query(
    input_text=input_text,
    input_media="/path/to/image.jpg",
    max_output_len=30,
    top_k=1,
    top_p=0.0,
    temperature=1.0,
)
print("output: ", output)
Initialization
- setup_media(input_media)#
Set up the input media.
- frame_len(frames)#
Return the length of the frame sequence.
- get_subsampled_frames(frames, subsample_len)#
Return subsample_len frames subsampled from the frame sequence.
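The frame helpers above can be illustrated with a minimal, self-contained sketch. `evenly_subsample` below is a hypothetical stand-in, not the library's implementation; it shows one common way to pick `subsample_len` evenly spaced frames from a longer sequence:

```python
def evenly_subsample(frames, subsample_len):
    """Pick subsample_len evenly spaced items from frames (illustrative only)."""
    if subsample_len >= len(frames):
        # Nothing to drop; return the sequence unchanged.
        return list(frames)
    step = len(frames) / subsample_len
    return [frames[int(i * step)] for i in range(subsample_len)]

frames = list(range(10))            # stand-in for decoded video frames
print(evenly_subsample(frames, 4))  # [0, 2, 5, 7]
```

The actual subsampling strategy used by `get_subsampled_frames` may differ; this only conveys the shape of the operation.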
- query(
- input_text,
- input_media,
- batch_size=1,
- max_output_len=30,
- top_k=1,
- top_p=0.0,
- temperature=1.0,
- repetition_penalty=1.0,
- num_beams=1,
- init_timeout=60.0,
- lora_uids=None,
- )#
Run query.
- class nemo_deploy.multimodal.query_multimodal.NemoQueryMultimodalPytorch(url, model_name)#
Sends a query to Triton for Multimodal inference using PyTorch deployment.
.. rubric:: Example
from nemo_deploy.multimodal import NemoQueryMultimodalPytorch
from PIL import Image

nq = NemoQueryMultimodalPytorch(url="localhost", model_name="qwen")

prompts = ["What is in this image?"]
images = [Image.open("/path/to/image.jpg")]
output = nq.query_multimodal(
    prompts=prompts,
    images=images,
    max_length=100,
    top_k=1,
    top_p=0.0,
    temperature=1.0,
)
print("output: ", output)
Initialization
- query_multimodal(
- prompts: List[str],
- images: List[PIL.Image.Image],
- max_length: Optional[int] = None,
- max_batch_size: Optional[int] = None,
- top_k: Optional[int] = None,
- top_p: Optional[float] = None,
- temperature: Optional[float] = None,
- random_seed: Optional[int] = None,
- init_timeout: float = 60.0,
- )#
Query the Triton server synchronously for multimodal inference.
- Parameters:
prompts (List[str]) – List of input text prompts.
images (List[Image.Image]) – List of input PIL Images.
max_length (Optional[int]) – Maximum number of tokens to generate.
max_batch_size (Optional[int]) – Maximum batch size for inference.
top_k (Optional[int]) – Limits sampling to the K most likely tokens at each step.
top_p (Optional[float]) – Limits sampling to the smallest set of tokens whose cumulative probability reaches p (nucleus sampling).
temperature (Optional[float]) – Sampling temperature.
random_seed (Optional[int]) – Random seed for generation.
init_timeout (float) – Timeout for the connection.
- Returns:
Dictionary containing generated sentences.
- Return type:
dict
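The `top_k` and `top_p` parameters above both restrict which tokens the server may sample at each step. The sketch below is an illustrative toy, not the server's implementation, showing how each filter narrows a candidate token distribution:

```python
def filter_candidates(probs, top_k=None, top_p=None):
    """Return the tokens that survive top-k / top-p (nucleus) filtering.

    probs: dict mapping token -> probability. Illustrative only.
    """
    # Order candidates from most to least likely.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        # Keep only the K most likely tokens.
        items = items[:top_k]
    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches p.
        kept, cum = [], 0.0
        for tok, p in items:
            kept.append((tok, p))
            cum += p
            if cum >= top_p:
                break
        items = kept
    return [tok for tok, _ in items]

probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(filter_candidates(probs, top_k=3))   # ['a', 'b', 'c']
print(filter_candidates(probs, top_p=0.8)) # ['a', 'b']
```

In practice the sampled token is then drawn from the surviving set after renormalization, with `temperature` flattening or sharpening the distribution beforehand.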