nemo_deploy.multimodal.nemo_multimodal_deployable
Module Contents
Classes
NeMoMultimodalDeployable – Triton inference server compatible deploy class for a NeMo multimodal model file.
Functions
dict_to_str – Serializes a dict to a string.
Data
LOGGER
API
- nemo_deploy.multimodal.nemo_multimodal_deployable.LOGGER = logging.getLogger(...)
- nemo_deploy.multimodal.nemo_multimodal_deployable.dict_to_str(messages)
Serializes a dict to a string.
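The implementation is not shown in this reference; below is a minimal sketch, assuming the messages dict is serialized with json.dumps:

```python
import json

def dict_to_str(messages):
    """Serialize a dict to a string (illustrative sketch via json.dumps;
    the actual implementation may differ)."""
    return json.dumps(messages)
```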
- class nemo_deploy.multimodal.nemo_multimodal_deployable.NeMoMultimodalDeployable(
- nemo_checkpoint_filepath: str = None,
- tensor_parallel_size: int = 1,
- pipeline_parallel_size: int = 1,
- params_dtype: torch.dtype = torch.bfloat16,
- inference_batch_times_seqlen_threshold: int = 1000,
- )
Bases:
nemo_deploy.ITritonDeployable
Triton inference server compatible deploy class for a NeMo multimodal model file.
- Parameters:
nemo_checkpoint_filepath (str) – Path to the NeMo checkpoint file.
tensor_parallel_size (int) – Tensor parallelism size.
pipeline_parallel_size (int) – Pipeline parallelism size.
params_dtype (torch.dtype) – Data type for model parameters.
inference_batch_times_seqlen_threshold (int) – Threshold on batch size times sequence length; inference batches below it skip pipelining.
Initialization
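A construction sketch using the parameters documented above; the checkpoint path is a placeholder:

```python
import torch
from nemo_deploy.multimodal.nemo_multimodal_deployable import NeMoMultimodalDeployable

# Checkpoint path is a placeholder -- point this at a real .nemo file.
deployable = NeMoMultimodalDeployable(
    nemo_checkpoint_filepath="/models/multimodal.nemo",
    tensor_parallel_size=1,
    pipeline_parallel_size=1,
    params_dtype=torch.bfloat16,
    inference_batch_times_seqlen_threshold=1000,
)
```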
- generate(
- prompts: List[str],
- images: List[PIL.Image.Image],
- inference_params: Optional[megatron.core.inference.common_inference_params.CommonInferenceParams] = None,
- max_batch_size: int = 4,
- random_seed: Optional[int] = None,
- )
Generates text based on the provided input prompts and images.
- Parameters:
prompts (List[str]) – A list of input strings.
images (List[Union[Image, List[Image]]]) – A list of input images.
inference_params (Optional[CommonInferenceParams]) – Parameters for controlling the inference process.
max_batch_size (int) – Maximum batch size for inference. Defaults to 4.
random_seed (Optional[int]) – Random seed for inference. Defaults to None.
- Returns:
A dictionary containing the generated results.
- Return type:
dict
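A usage sketch for generate, with an illustrative image path; the CommonInferenceParams fields shown (temperature, top_k, top_p, num_tokens_to_generate) follow the Megatron-Core dataclass:

```python
from PIL import Image
from megatron.core.inference.common_inference_params import CommonInferenceParams

prompts = ["Describe this image."]
images = [Image.open("example.jpg")]  # illustrative path

# Sampling controls for the generation loop.
params = CommonInferenceParams(
    temperature=1.0,
    top_k=1,
    top_p=0.0,
    num_tokens_to_generate=256,
)

# `deployable` is the NeMoMultimodalDeployable constructed earlier.
results = deployable.generate(
    prompts=prompts,
    images=images,
    inference_params=params,
    max_batch_size=4,
    random_seed=42,
)
```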
- property get_triton_input
- property get_triton_output
- triton_infer_fn(**inputs: numpy.ndarray)
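To stand the model up behind Triton, the deployable is typically handed to a serving wrapper. A hedged sketch assuming nemo_deploy's DeployPyTriton helper; verify the import path and method names against your installed version:

```python
from nemo_deploy import DeployPyTriton  # assumption: helper exists at this path

server = DeployPyTriton(
    model=deployable,                # the NeMoMultimodalDeployable instance
    triton_model_name="multimodal",  # illustrative model name
    port=8000,                       # illustrative port
)
server.deploy()  # register the model with Triton
server.serve()   # block and serve requests
```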
- _infer_fn(
- prompts,
- images,
- temperature=1.0,
- top_k=1,
- top_p=0.0,
- num_tokens_to_generate=256,
- random_seed=None,
- max_batch_size=4,
- )
Private helper that handles the core inference logic shared between Triton and Ray inference.
- Parameters:
prompts (List[str]) – List of input prompts.
images (List[Union[Image, List[Image]]]) – List of input images.
temperature (float) – Sampling temperature.
top_k (int) – Top-k sampling parameter.
top_p (float) – Top-p sampling parameter.
num_tokens_to_generate (int) – Maximum number of tokens to generate.
random_seed (Optional[int]) – Random seed for inference.
max_batch_size (int) – Maximum batch size for inference.
- Returns:
A dictionary containing the generated sentences.
- Return type:
dict
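For reference, a hedged sketch of how _infer_fn plausibly bundles the sampling arguments into CommonInferenceParams and delegates to generate; the actual implementation may differ:

```python
from megatron.core.inference.common_inference_params import CommonInferenceParams

def _infer_fn_sketch(deployable, prompts, images, temperature=1.0, top_k=1,
                     top_p=0.0, num_tokens_to_generate=256, random_seed=None,
                     max_batch_size=4):
    # Bundle the sampling knobs into Megatron-Core inference params.
    params = CommonInferenceParams(
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        num_tokens_to_generate=num_tokens_to_generate,
    )
    # Delegate to the public generate() API; returns a dict of sentences.
    return deployable.generate(
        prompts=prompts,
        images=images,
        inference_params=params,
        max_batch_size=max_batch_size,
        random_seed=random_seed,
    )
```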