nemo_deploy.service.fastapi_interface_to_pytriton_multimodal#

Module Contents#

Classes#

TritonSettings

TritonSettings class that gets the values of TRITON_HTTP_ADDRESS and TRITON_PORT.

BaseMultimodalRequest

Common parameters for multimodal completions and chat requests.

MultimodalCompletionRequest

Represents a request for multimodal text completion.

ImageContent

Image content in chat messages.

TextContent

Text content in chat messages.

MultimodalChatCompletionRequest

Represents a request for multimodal chat completion.

Functions#

health_check

Health check endpoint to verify that the API is running.

check_triton_health

This method exposes endpoint “/triton_health”.

convert_numpy

Convert NumPy arrays in output to lists.

_helper_fun

run_in_executor doesn’t allow to pass kwargs, so we have this helper function to pass args as a list.

query_multimodal_async

Sends requests to NemoQueryMultimodalPytorch.query_multimodal in a non-blocking way.

completions_v1

Defines the multimodal completions endpoint and queries the model deployed on PyTriton server.

dict_to_str

Serializes dict to str.

chat_completions_v1

Defines the multimodal chat completions endpoint and queries the model deployed on PyTriton server.

Data#

API#

class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.TritonSettings#

Bases: pydantic_settings.BaseSettings

TritonSettings class that gets the values of TRITON_HTTP_ADDRESS and TRITON_PORT.

Initialization

_triton_service_port: int = None#
_triton_service_ip: str = None#
property triton_service_port#

Returns the port number for the Triton service.

property triton_service_ip#

Returns the IP address for the Triton service.

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.app = 'FastAPI(...)'#
nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.triton_settings = 'TritonSettings(...)'#
class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.BaseMultimodalRequest#

Bases: pydantic.BaseModel

Common parameters for multimodal completions and chat requests.

.. attribute:: model

The name of the model to use for completion.

Type:

str

.. attribute:: max_tokens

The maximum number of tokens to generate in the response.

Type:

int

.. attribute:: temperature

Sampling temperature for randomness in generation.

Type:

float

.. attribute:: top_p

Cumulative probability for nucleus sampling.

Type:

float

.. attribute:: top_k

Number of highest-probability tokens to consider for sampling.

Type:

int

.. attribute:: random_seed

Random seed for generation.

Type:

Optional[int]

.. attribute:: max_batch_size

Maximum batch size for inference.

Type:

int

model: str = None#
max_tokens: int = 50#
temperature: float = 1.0#
top_p: float = 0.0#
top_k: int = 1#
random_seed: Optional[int] = None#
max_batch_size: int = 4#
set_greedy_params()#

Validate parameters for greedy decoding.

class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalCompletionRequest#

Bases: nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.BaseMultimodalRequest

Represents a request for multimodal text completion.

.. attribute:: prompt

The input text to generate a response from.

Type:

str

.. attribute:: image

Base64-encoded image or image URL.

Type:

Optional[str]

.. attribute:: apply_chat_template

Whether to apply chat template.

Type:

bool

prompt: str = None#
image: Optional[str] = None#
apply_chat_template: bool = False#
class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.ImageContent#

Bases: pydantic.BaseModel

Image content in chat messages.

type: str = 'image_url'#
image_url: dict = None#
class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.TextContent#

Bases: pydantic.BaseModel

Text content in chat messages.

type: str = 'text'#
text: str = None#
class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalChatCompletionRequest#

Bases: nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.BaseMultimodalRequest

Represents a request for multimodal chat completion.

.. attribute:: messages

A list of message dictionaries for chat completion.

Type:

List[dict]

messages: List[dict] = None#
nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.health_check()#

Health check endpoint to verify that the API is running.

Returns:

A dictionary indicating the status of the application.

Return type:

dict

async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.check_triton_health()#

This method exposes endpoint “/triton_health”.

This can be used to verify if Triton server is accessible while running the REST or FastAPI application. Verify by running: curl http://service_http_address:service_port/v1/triton_health and the returned status should inform if the server is accessible.

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.convert_numpy(obj)#

Convert NumPy arrays in output to lists.

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal._helper_fun(
url,
model,
prompts,
images,
temperature,
top_k,
top_p,
max_length,
random_seed,
max_batch_size,
apply_chat_template,
)#

run_in_executor doesn’t allow to pass kwargs, so we have this helper function to pass args as a list.

async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.query_multimodal_async(
*,
url,
model,
prompts,
images,
temperature,
top_k,
top_p,
max_length,
random_seed,
max_batch_size,
apply_chat_template,
)#

Sends requests to NemoQueryMultimodalPytorch.query_multimodal in a non-blocking way.

This allows the server to process concurrent requests. This way enables batching of requests in the underlying Triton server.

async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.completions_v1(
request: nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalCompletionRequest,
)#

Defines the multimodal completions endpoint and queries the model deployed on PyTriton server.

nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.dict_to_str(messages)#

Serializes dict to str.

async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.chat_completions_v1(
request: nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalChatCompletionRequest,
)#

Defines the multimodal chat completions endpoint and queries the model deployed on PyTriton server.