nemo_deploy.service.rest_model_api#

API#

class nemo_deploy.service.rest_model_api.TritonSettings[source]#

Bases: pydantic_settings.BaseSettings

_triton_service_port: int = None#
_triton_service_ip: str = None#
_triton_request_timeout: str = None#
property triton_service_port#
property triton_service_ip#
property triton_request_timeout#
property openai_format_response#

Returns the response from the Triton server in an OpenAI-compatible format if set to True.

property output_generation_logits#

Returns the generation logits along with the text in the Triton server output if set to True.

nemo_deploy.service.rest_model_api.app = 'FastAPI(...)'#
nemo_deploy.service.rest_model_api.triton_settings = 'TritonSettings(...)'#
class nemo_deploy.service.rest_model_api.CompletionRequest(/, **data: typing.Any)[source]#

Bases: pydantic.BaseModel

model: str = None#
prompt: str = None#
max_tokens: int = 512#
temperature: float = 1.0#
top_p: float = 0.0#
top_k: int = 1#
stream: bool = False#
stop: str | None = None#
frequency_penalty: float = 1.0#
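
As a rough illustration of how these fields fit together, the sketch below builds a JSON request body using the defaults listed above. It uses only the standard library rather than the actual `CompletionRequest` class, and the helper name `build_completion_payload`, the model name, and the prompt are hypothetical placeholders, not part of `nemo_deploy`.

```python
import json

# Defaults copied from the CompletionRequest field list above.
COMPLETION_DEFAULTS = {
    "max_tokens": 512,
    "temperature": 1.0,
    "top_p": 0.0,
    "top_k": 1,
    "stream": False,
    "stop": None,
    "frequency_penalty": 1.0,
}


def build_completion_payload(model: str, prompt: str, **overrides) -> dict:
    """Hypothetical helper: merge caller overrides into the documented defaults."""
    unknown = set(overrides) - set(COMPLETION_DEFAULTS)
    if unknown:
        # Reject field names that are not listed on CompletionRequest.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"model": model, "prompt": prompt, **COMPLETION_DEFAULTS}
    payload.update(overrides)
    return payload


# "my-model" and the prompt text are placeholders for illustration only.
payload = build_completion_payload("my-model", "Hello", max_tokens=32)
body = json.dumps(payload)  # JSON string suitable for an HTTP request body
```

Keeping the defaults in one mapping makes it easy to see which values a request would carry if the caller sets only `model` and `prompt`.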
nemo_deploy.service.rest_model_api.health_check()[source]#
async nemo_deploy.service.rest_model_api.check_triton_health()[source]#

Check whether the Triton server is reachable.

This method exposes the endpoint “/v1/triton_health”, which can be used to verify that the Triton server is accessible while the REST (FastAPI) application is running. To verify, run: curl http://service_http_address:service_port/v1/triton_health. The returned status indicates whether the server is accessible.
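
The curl check above could also be done from Python. The following is a minimal sketch using only the standard library; the helper names `triton_health_url` and `check_health` are hypothetical, and the host and port in the usage section are placeholders for your own service address. The shape of the response body is not documented here, so the sketch returns it as raw text.

```python
import urllib.error
import urllib.request


def triton_health_url(host: str, port: int) -> str:
    """Build the health-check URL used in the curl example above."""
    return f"http://{host}:{port}/v1/triton_health"


def check_health(host: str, port: int, timeout: float = 5.0):
    """Return (status_code, body_text); status_code is None if unreachable."""
    url = triton_health_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The service answered, but with a 4xx/5xx status code.
        return exc.code, exc.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as exc:
        # The service could not be reached at all (DNS, refused, timeout).
        return None, str(exc)


if __name__ == "__main__":
    # Placeholder address: replace with your service_http_address and service_port.
    status, text = check_health("localhost", 8080)
    print(status, text)
```

Returning the status code and body separately lets the caller distinguish a service that is down from one that is up but reports Triton as unavailable.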

nemo_deploy.service.rest_model_api.completions_v1(
request: nemo_deploy.service.rest_model_api.CompletionRequest,
)[source]#