nemo_deploy.service.fastapi_interface_to_pytriton_multimodal#
Module Contents#
Classes#
TritonSettings class that gets the values of TRITON_HTTP_ADDRESS and TRITON_PORT. |
|
Common parameters for multimodal completions and chat requests. |
|
Represents a request for multimodal text completion. |
|
Image content in chat messages. |
|
Text content in chat messages. |
|
Represents a request for multimodal chat completion. |
Functions#
Health check endpoint to verify that the API is running. |
|
This method exposes endpoint “/triton_health”. |
|
Convert NumPy arrays in output to lists. |
|
run_in_executor doesn’t allow to pass kwargs, so we have this helper function to pass args as a list. |
|
Sends requests to |
|
Defines the multimodal completions endpoint and queries the model deployed on PyTriton server. |
|
Serializes dict to str. |
|
Defines the multimodal chat completions endpoint and queries the model deployed on PyTriton server. |
Data#
API#
- class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.TritonSettings#
Bases:
pydantic_settings.BaseSettingsTritonSettings class that gets the values of TRITON_HTTP_ADDRESS and TRITON_PORT.
Initialization
- _triton_service_port: int = None#
- _triton_service_ip: str = None#
- property triton_service_port#
Returns the port number for the Triton service.
- property triton_service_ip#
Returns the IP address for the Triton service.
- nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.app = 'FastAPI(...)'#
- nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.triton_settings = 'TritonSettings(...)'#
- class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.BaseMultimodalRequest#
Bases:
pydantic.BaseModelCommon parameters for multimodal completions and chat requests.
.. attribute:: model
The name of the model to use for completion.
- Type:
str
.. attribute:: max_tokens
The maximum number of tokens to generate in the response.
- Type:
int
.. attribute:: temperature
Sampling temperature for randomness in generation.
- Type:
float
.. attribute:: top_p
Cumulative probability for nucleus sampling.
- Type:
float
.. attribute:: top_k
Number of highest-probability tokens to consider for sampling.
- Type:
int
.. attribute:: random_seed
Random seed for generation.
- Type:
Optional[int]
.. attribute:: max_batch_size
Maximum batch size for inference.
- Type:
int
- model: str = None#
- max_tokens: int = 50#
- temperature: float = 1.0#
- top_p: float = 0.0#
- top_k: int = 1#
- random_seed: Optional[int] = None#
- max_batch_size: int = 4#
- set_greedy_params()#
Validate parameters for greedy decoding.
- class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalCompletionRequest#
Bases:
nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.BaseMultimodalRequestRepresents a request for multimodal text completion.
.. attribute:: prompt
The input text to generate a response from.
- Type:
str
.. attribute:: image
Base64-encoded image or image URL.
- Type:
Optional[str]
.. attribute:: apply_chat_template
Whether to apply chat template.
- Type:
bool
- prompt: str = None#
- image: Optional[str] = None#
- apply_chat_template: bool = False#
- class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.ImageContent#
Bases:
pydantic.BaseModelImage content in chat messages.
- type: str = 'image_url'#
- image_url: dict = None#
- class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.TextContent#
Bases:
pydantic.BaseModelText content in chat messages.
- type: str = 'text'#
- text: str = None#
- class nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalChatCompletionRequest#
Bases:
nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.BaseMultimodalRequestRepresents a request for multimodal chat completion.
.. attribute:: messages
A list of message dictionaries for chat completion.
- Type:
List[dict]
- messages: List[dict] = None#
- nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.health_check()#
Health check endpoint to verify that the API is running.
- Returns:
A dictionary indicating the status of the application.
- Return type:
dict
- async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.check_triton_health()#
This method exposes endpoint “/triton_health”.
This can be used to verify if Triton server is accessible while running the REST or FastAPI application. Verify by running: curl http://service_http_address:service_port/v1/triton_health and the returned status should inform if the server is accessible.
- nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.convert_numpy(obj)#
Convert NumPy arrays in output to lists.
- nemo_deploy.service.fastapi_interface_to_pytriton_multimodal._helper_fun(
- url,
- model,
- prompts,
- images,
- temperature,
- top_k,
- top_p,
- max_length,
- random_seed,
- max_batch_size,
- apply_chat_template,
run_in_executor doesn’t allow to pass kwargs, so we have this helper function to pass args as a list.
- async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.query_multimodal_async(
- *,
- url,
- model,
- prompts,
- images,
- temperature,
- top_k,
- top_p,
- max_length,
- random_seed,
- max_batch_size,
- apply_chat_template,
Sends requests to
NemoQueryMultimodalPytorch.query_multimodalin a non-blocking way.This allows the server to process concurrent requests. This way enables batching of requests in the underlying Triton server.
- async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.completions_v1( )#
Defines the multimodal completions endpoint and queries the model deployed on PyTriton server.
- nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.dict_to_str(messages)#
Serializes dict to str.
- async nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.chat_completions_v1(
- request: nemo_deploy.service.fastapi_interface_to_pytriton_multimodal.MultimodalChatCompletionRequest,
Defines the multimodal chat completions endpoint and queries the model deployed on PyTriton server.