`nemo_deploy.service.fastapi_interface_to_pytriton_multimodal`#

Module Contents#

Classes#

`TritonSettings`	TritonSettings class that gets the values of TRITON_HTTP_ADDRESS and TRITON_PORT.
`BaseMultimodalRequest`	Common parameters for multimodal completions and chat requests.
`MultimodalCompletionRequest`	Represents a request for multimodal text completion.
`ImageContent`	Image content in chat messages.
`TextContent`	Text content in chat messages.
`MultimodalChatCompletionRequest`	Represents a request for multimodal chat completion.

Functions#

`health_check`	Health check endpoint to verify that the API is running.
`check_triton_health`	This method exposes endpoint “/triton_health”.
`convert_numpy`	Convert NumPy arrays in output to lists.
`_helper_fun`	run_in_executor doesn’t allow to pass kwargs, so we have this helper function to pass args as a list.
`query_multimodal_async`	Sends requests to `NemoQueryMultimodalPytorch.query_multimodal` in a non-blocking way.
`completions_v1`	Defines the multimodal completions endpoint and queries the model deployed on PyTriton server.
`dict_to_str`	Serializes dict to str.
`chat_completions_v1`	Defines the multimodal chat completions endpoint and queries the model deployed on PyTriton server.

Data#

`app`
`triton_settings`

API#

class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.TritonSettings#

Bases: pydantic_settings.BaseSettings

TritonSettings class that gets the values of TRITON_HTTP_ADDRESS and TRITON_PORT.

Initialization

_triton_service_port: int = None#

_triton_service_ip: str = None#

property triton_service_port#: Returns the port number for the Triton service.

property triton_service_ip#: Returns the IP address for the Triton service.

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.app = 'FastAPI(...)'#

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.triton_settings = 'TritonSettings(...)'#

class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.BaseMultimodalRequest#

Bases: pydantic.BaseModel

Common parameters for multimodal completions and chat requests.

.. attribute:: model

The name of the model to use for completion.

Type:: str

.. attribute:: max_tokens

The maximum number of tokens to generate in the response.

Type:: int

.. attribute:: temperature

Sampling temperature for randomness in generation.

Type:: float

.. attribute:: top_p

Cumulative probability for nucleus sampling.

Type:: float

.. attribute:: top_k

Number of highest-probability tokens to consider for sampling.

Type:: int

.. attribute:: random_seed

Random seed for generation.

Type:: Optional[int]

.. attribute:: max_batch_size

Maximum batch size for inference.

Type:: int

model: str = None#

max_tokens: int = 50#

temperature: float = 1.0#

top_p: float = 0.0#

top_k: int = 1#

random_seed: Optional[int] = None#

max_batch_size: int = 4#

set_greedy_params()#: Validate parameters for greedy decoding.

class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalCompletionRequest#

Bases: nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.BaseMultimodalRequest

Represents a request for multimodal text completion.

.. attribute:: prompt

The input text to generate a response from.

Type:: str

.. attribute:: image

Base64-encoded image or image URL.

Type:: Optional[str]

.. attribute:: apply_chat_template

Whether to apply chat template.

Type:: bool

prompt: str = None#

image: Optional[str] = None#

apply_chat_template: bool = False#

class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.ImageContent#

Bases: pydantic.BaseModel

Image content in chat messages.

type: str = 'image_url'#

image_url: dict = None#

class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.TextContent#

Bases: pydantic.BaseModel

Text content in chat messages.

type: str = 'text'#

text: str = None#

class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalChatCompletionRequest#

Bases: nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.BaseMultimodalRequest

Represents a request for multimodal chat completion.

.. attribute:: messages

A list of message dictionaries for chat completion.

Type:: List[dict]

messages: List[dict] = None#

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.health_check()#

Health check endpoint to verify that the API is running.

Returns:: A dictionary indicating the status of the application.
Return type:: dict

async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.check_triton_health()#

This method exposes endpoint “/triton_health”.

This can be used to verify if Triton server is accessible while running the REST or FastAPI application. Verify by running: curl http://service_http_address:service_port/v1/triton_health and the returned status should inform if the server is accessible.

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.convert_numpy(obj)#: Convert NumPy arrays in output to lists.

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal._helper_fun( url, model, prompts, images, temperature, top_k, top_p, max_length, random_seed, max_batch_size, apply_chat_template, )#: run_in_executor doesn’t allow to pass kwargs, so we have this helper function to pass args as a list.

async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.query_multimodal_async( *, url, model, prompts, images, temperature, top_k, top_p, max_length, random_seed, max_batch_size, apply_chat_template, )#

Sends requests to NemoQueryMultimodalPytorch.query_multimodal in a non-blocking way.

This allows the server to process concurrent requests. This way enables batching of requests in the underlying Triton server.

async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.completions_v1( request: nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalCompletionRequest, )#: Defines the multimodal completions endpoint and queries the model deployed on PyTriton server.

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.dict_to_str(messages)#: Serializes dict to str.

async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.chat_completions_v1( request: nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalChatCompletionRequest, )#: Defines the multimodal chat completions endpoint and queries the model deployed on PyTriton server.

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal#

Module Contents#

Classes#

Functions#

Data#

API#

`nemo_deploy.service.fastapi_interface_to_pytriton_multimodal`#