nemo_deploy.deploy_pytriton
Module Contents
Classes
DeployPyTriton | Deploys any model that implements the ITritonDeployable interface in nemo_deploy to Triton Inference Server.
Data
API
- nemo_deploy.deploy_pytriton.LOGGER = 'getLogger(...)'
- class nemo_deploy.deploy_pytriton.DeployPyTriton(
    triton_model_name: str,
    triton_model_version: int = 1,
    model=None,
    max_batch_size: int = 128,
    http_port: int = 8000,
    grpc_port: int = 8001,
    address='0.0.0.0',
    allow_grpc=True,
    allow_http=True,
    streaming=False,
    pytriton_log_verbose=0,
  )
Bases: nemo_deploy.deploy_base.DeployBase
Deploys any model that implements the ITritonDeployable interface in nemo_deploy to Triton Inference Server.
Initialization
A NeMo checkpoint or model is expected for serving on Triton Inference Server.
- Parameters:
triton_model_name (str) – Name of the service.
triton_model_version (int) – Version of the service.
checkpoint_path (str) – Path to the NeMo checkpoint file (documented here, but not listed in the signature above).
model (ITritonDeployable) – A model that implements the ITritonDeployable interface (from nemo_deploy import ITritonDeployable).
max_batch_size (int) – Maximum batch size.
port (int) – Port for the Triton server (the signature above lists http_port and grpc_port instead).
address (str) – HTTP address for the Triton server to bind to.
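The parameters above can be collected into keyword arguments before constructing the class. The following is a minimal sketch, not part of nemo_deploy: `build_deploy_kwargs` is a hypothetical helper, the key names and default values are copied from the signature shown earlier, and the port check is an assumption about sensible usage rather than documented behaviour.

```python
def build_deploy_kwargs(model, name, http_port=8000, grpc_port=8001):
    """Hypothetical helper: collect keyword arguments for DeployPyTriton.

    Key names and defaults mirror the documented signature; the helper
    itself is illustrative and not part of nemo_deploy.
    """
    # Assumption: the HTTP and gRPC endpoints need distinct ports on one address.
    if http_port == grpc_port:
        raise ValueError("http_port and grpc_port must differ")
    return {
        "triton_model_name": name,
        "triton_model_version": 1,
        "model": model,
        "max_batch_size": 128,
        "http_port": http_port,
        "grpc_port": grpc_port,
        "address": "0.0.0.0",
    }


# `model` would normally be an object implementing ITritonDeployable;
# None is used here only so the sketch runs without nemo_deploy installed.
kwargs = build_deploy_kwargs(model=None, name="my_model")
print(sorted(kwargs))

# With nemo_deploy installed and a real model, the call would look like:
#   from nemo_deploy import DeployPyTriton
#   deployer = DeployPyTriton(**kwargs)
```

Keeping the arguments in a dictionary makes it easy to log or validate the configuration before the server is started.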
- deploy()
Deploys the model to Triton Inference Server.
- serve()
Starts serving the model and waits for incoming requests.
- run()
Starts serving the model asynchronously.
- stop()
Stops serving the model.
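The four methods above suggest a lifecycle of deploy, then run (or serve), then stop. The sketch below illustrates one way to guarantee that `stop()` is always called. It assumes that `deploy()` must precede `run()`, which this page does not state explicitly. `running` is a hypothetical helper and `FakeDeployer` is a stand-in that only records calls, so the example runs without nemo_deploy or a Triton server.

```python
from contextlib import contextmanager


@contextmanager
def running(deployer):
    """Hypothetical helper: deploy, serve in the background, always stop."""
    deployer.deploy()
    deployer.run()  # asynchronous variant; serve() would wait for requests instead
    try:
        yield deployer
    finally:
        deployer.stop()


class FakeDeployer:
    """Stand-in exposing the same method names as DeployPyTriton."""

    def __init__(self):
        self.calls = []

    def deploy(self):
        self.calls.append("deploy")

    def serve(self):
        self.calls.append("serve")

    def run(self):
        self.calls.append("run")

    def stop(self):
        self.calls.append("stop")


fake = FakeDeployer()
with running(fake):
    pass  # client queries would be sent here while the model is being served

print(fake.calls)  # ['deploy', 'run', 'stop']
```

Any object with `deploy()`, `run()`, and `stop()` methods can be passed to `running`, so a real DeployPyTriton instance could be substituted for the stand-in under the same assumption about call order.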