nemo_deploy.multimodal.nemo_multimodal_deployable#
Module Contents#
Classes#
- NeMoMultimodalDeployable: Triton inference server compatible deploy class for a NeMo multimodal model file.
Functions#
- dict_to_str: Serializes a dict to a str.
Data#
- LOGGER
API#
- nemo_deploy.multimodal.nemo_multimodal_deployable.LOGGER = getLogger(...)#
- nemo_deploy.multimodal.nemo_multimodal_deployable.dict_to_str(messages)#
Serializes a dict to a str.
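A hypothetical usage sketch; the chat-style shape of the messages payload is illustrative only:

```python
from nemo_deploy.multimodal.nemo_multimodal_deployable import dict_to_str

# Illustrative chat-style payload; any serializable dict structure works the same way.
messages = [{"role": "user", "content": "Describe this image."}]
payload = dict_to_str(messages)  # str representation of `messages`
```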
- class nemo_deploy.multimodal.nemo_multimodal_deployable.NeMoMultimodalDeployable(
- nemo_checkpoint_filepath: str = None,
- tensor_parallel_size: int = 1,
- pipeline_parallel_size: int = 1,
- params_dtype: torch.dtype = torch.bfloat16,
- inference_batch_times_seqlen_threshold: int = 1000,
)#
Bases:
nemo_deploy.ITritonDeployable
Triton inference server compatible deploy class for a NeMo multimodal model file.
- Parameters:
nemo_checkpoint_filepath (str) – Path to the NeMo checkpoint file.
tensor_parallel_size (int) – Tensor parallelism size.
pipeline_parallel_size (int) – Pipeline parallelism size.
params_dtype (torch.dtype) – Data type for model parameters.
inference_batch_times_seqlen_threshold (int) – Batch size × sequence length threshold above which inference is pipelined.
Initialization
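A minimal construction sketch; the checkpoint path is a placeholder, and the parallelism sizes are assumptions that must match the available GPUs:

```python
import torch
from nemo_deploy.multimodal.nemo_multimodal_deployable import NeMoMultimodalDeployable

# Placeholder checkpoint path; adjust parallelism sizes to your GPU topology.
deployable = NeMoMultimodalDeployable(
    nemo_checkpoint_filepath="/models/multimodal_model.nemo",
    tensor_parallel_size=1,
    pipeline_parallel_size=1,
    params_dtype=torch.bfloat16,
)
```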
- generate(
- prompts: List[str],
- images: List[PIL.Image.Image],
- inference_params: Optional[megatron.core.inference.common_inference_params.CommonInferenceParams] = None,
- max_batch_size: int = 4,
- random_seed: Optional[int] = None,
- apply_chat_template: bool = False,
)#
Generates text based on the provided input prompts and images.
- Parameters:
prompts (List[str]) – A list of input strings.
images (List[Union[Image, List[Image]]]) – A list of input images (one image, or a list of images, per prompt).
inference_params (Optional[CommonInferenceParams]) – Parameters for controlling the inference process.
max_batch_size (int) – Maximum batch size for inference. Defaults to 4.
random_seed (Optional[int]) – Random seed for inference. Defaults to None.
apply_chat_template (bool) – Whether to apply the chat template. Defaults to False.
- Returns:
A dictionary containing the generated results.
- Return type:
dict
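An illustrative call, reusing the deployable instance from the construction sketch above; the image path and the sampling values are placeholders:

```python
from PIL import Image
from megatron.core.inference.common_inference_params import CommonInferenceParams

image = Image.open("example.jpg")  # placeholder image path
results = deployable.generate(
    prompts=["What is shown in this image?"],
    images=[image],
    inference_params=CommonInferenceParams(
        temperature=1.0, top_k=1, num_tokens_to_generate=64
    ),
    max_batch_size=4,
)
```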
- apply_chat_template(messages, add_generation_prompt=True)#
Apply the chat template using the processor.
Works when the model's processor has a chat template (typical for chat models).
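An illustrative messages payload; the HF-style chat schema shown here is an assumption, since the exact format depends on the model's processor template:

```python
# Hypothetical chat-format input; the actual schema depends on the
# model's processor template.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = deployable.apply_chat_template(messages, add_generation_prompt=True)
```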
- base64_to_image(image_base64)#
Convert a base64-encoded image to a PIL Image.
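A round-trip sketch: encode an image file to base64 (as a client would) and decode it back with this helper; the file path is a placeholder:

```python
import base64

with open("example.jpg", "rb") as f:  # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

pil_image = deployable.base64_to_image(image_b64)  # back to a PIL Image
```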
- property get_triton_input#
- property get_triton_output#
- triton_infer_fn(**inputs: numpy.ndarray)#
- _infer_fn(
- prompts,
- images,
- temperature=1.0,
- top_k=1,
- top_p=0.0,
- num_tokens_to_generate=256,
- random_seed=None,
- max_batch_size=4,
- apply_chat_template=False,
)#
Private helper that handles the core inference logic shared between Triton and Ray inference.
- Parameters:
prompts (List[str]) – List of input prompts.
images (List[str]) – List of input base64-encoded images.
temperature (float) – Sampling temperature.
top_k (int) – Top-k sampling parameter.
top_p (float) – Top-p sampling parameter.
num_tokens_to_generate (int) – Maximum number of tokens to generate.
random_seed (Optional[int]) – Random seed for inference.
max_batch_size (int) – Maximum batch size for inference.
apply_chat_template (bool) – Whether to apply the chat template.
- Returns:
Dictionary with the generated sentences under the sentences key.
- Return type:
dict
- ray_infer_fn(inputs: dict)#
Ray-compatible inference function that takes a dictionary of inputs and returns a dictionary of outputs.
- Parameters:
inputs (dict) –
Dictionary containing the following optional keys:
prompts (List[str]): List of input prompts
images (List[str]): List of input base64-encoded images
temperature (float): Sampling temperature (default: 1.0)
top_k (int): Top-k sampling parameter (default: 1)
top_p (float): Top-p sampling parameter (default: 0.0)
max_length (int): Maximum number of tokens to generate (default: 50)
random_seed (Optional[int]): Random seed for reproducibility (default: None)
max_batch_size (int): Maximum batch size for inference (default: 4)
apply_chat_template (bool): Whether to apply chat template (default: False)
- Returns:
Dictionary containing:
sentences (List[str]): List of generated texts
- Return type:
dict
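A request/response sketch using the keys and defaults documented above; the base64 image payload reuses the earlier encoding sketch:

```python
# Keys follow the docstring above; image_b64 comes from the base64_to_image
# round-trip sketch earlier on this page.
request = {
    "prompts": ["What is in this picture?"],
    "images": [image_b64],
    "temperature": 1.0,
    "top_k": 1,
    "max_length": 64,
    "apply_chat_template": False,
}
response = deployable.ray_infer_fn(request)
print(response["sentences"])  # List[str] of generated texts
```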