nemo_deploy.service.rest_model_api#
Module Contents#
Classes#
Functions#
check_triton_health
Data#
API#
- class nemo_deploy.service.rest_model_api.TritonSettings[source]#
Bases: pydantic_settings.BaseSettings
- _triton_service_port: int = None#
- _triton_service_ip: str = None#
- _triton_request_timeout: str = None#
- property triton_service_port#
- property triton_service_ip#
- property triton_request_timeout#
- property openai_format_response#
Returns the response from the Triton server in OpenAI-compatible format if set to True.
- property output_generation_logits#
Returns the generation logits along with the text in the Triton server output if set to True.
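The class above pairs underscore-prefixed fields with same-named properties. The following is a minimal sketch of that read-only pattern using only the standard library. `TritonSettingsSketch`, its constructor arguments, and `base_url` are hypothetical stand-ins, not part of `nemo_deploy`. This reference does not say where the real class obtains its values, so here they are simply passed in.

```python
class TritonSettingsSketch:
    """Hypothetical stand-in for illustration; not the real TritonSettings."""

    def __init__(self, port: int, ip: str, request_timeout: str,
                 openai_format_response: bool = False,
                 output_generation_logits: bool = False) -> None:
        # Values are stored on underscore-prefixed attributes, mirroring
        # the field names listed in the reference above.
        self._triton_service_port = port
        self._triton_service_ip = ip
        self._triton_request_timeout = request_timeout
        self._openai_format_response = openai_format_response
        self._output_generation_logits = output_generation_logits

    # Each property has a getter only, so callers can read but not assign.
    @property
    def triton_service_port(self) -> int:
        return self._triton_service_port

    @property
    def triton_service_ip(self) -> str:
        return self._triton_service_ip

    @property
    def triton_request_timeout(self) -> str:
        return self._triton_request_timeout

    @property
    def openai_format_response(self) -> bool:
        return self._openai_format_response

    @property
    def output_generation_logits(self) -> bool:
        return self._output_generation_logits


def base_url(settings: TritonSettingsSketch) -> str:
    """Hypothetical helper: build an HTTP base URL from the settings."""
    return f"http://{settings.triton_service_ip}:{settings.triton_service_port}"
```

Because no setters are defined, assigning to `settings.triton_service_port` raises `AttributeError`, which keeps the values fixed after construction.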
- nemo_deploy.service.rest_model_api.app = 'FastAPI(...)'#
- nemo_deploy.service.rest_model_api.triton_settings = 'TritonSettings(...)'#
- class nemo_deploy.service.rest_model_api.CompletionRequest(/, **data: typing.Any)[source]#
Bases: pydantic.BaseModel
- model: str = None#
- prompt: str = None#
- max_tokens: int = 512#
- temperature: float = 1.0#
- top_p: float = 0.0#
- top_k: int = 1#
- stream: bool = False#
- stop: str | None = None#
- frequency_penalty: float = 1.0#
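To illustrate how the fields and defaults listed above might translate into a JSON request body, here is a small standard-library sketch. `DOCUMENTED_DEFAULTS` and `build_payload` are hypothetical names introduced for this example, and treating `model` and `prompt` as caller-supplied values is an assumption. The route that accepts this body is not covered in this section, so the sketch only builds the payload.

```python
import json

# Defaults copied from the CompletionRequest field list above.
DOCUMENTED_DEFAULTS = {
    "max_tokens": 512,
    "temperature": 1.0,
    "top_p": 0.0,
    "top_k": 1,
    "stream": False,
    "stop": None,
    "frequency_penalty": 1.0,
}


def build_payload(model: str, prompt: str, **overrides) -> str:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        # Reject names that are not in the documented field list.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    body = {"model": model, "prompt": prompt, **DOCUMENTED_DEFAULTS, **overrides}
    return json.dumps(body)


# "example-model" is a placeholder, not a real model name.
payload = build_payload("example-model", "Hello", max_tokens=16)
```

Only the overridden field changes; every other field keeps the default shown in the reference.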
- async nemo_deploy.service.rest_model_api.check_triton_health()[source]#
check_triton_health.
This method exposes the “/v1/triton_health” endpoint, which can be used to verify that the Triton server is accessible while the REST (FastAPI) application is running. To verify, run: curl http://service_http_address:service_port/v1/triton_health. The returned status indicates whether the server is accessible.
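The same check can be made from Python instead of curl. The sketch below uses only the standard library. `health_url` and `triton_is_reachable` are hypothetical helper names, the host and port are placeholders for your own service address, and treating HTTP 200 as "accessible" is an assumption, since this reference does not specify the response format.

```python
import urllib.error
import urllib.request


def health_url(host: str, port: int) -> str:
    """Build the health-check URL used in the curl example above."""
    return f"http://{host}:{port}/v1/triton_health"


def triton_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200 (assumed healthy)."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection failures, timeouts, and non-2xx responses
        # (urllib raises HTTPError, a URLError subclass, for those).
        return False
```

For example, `triton_is_reachable("localhost", 8080)` returns `False` when nothing is listening on that port, which makes the helper convenient for readiness checks in deployment scripts.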