`nemo_deploy.deploy_pytriton`#

Module Contents#

Classes#

DeployPyTriton

Deploys any model that implements the ITritonDeployable interface in nemo_deploy to Triton Inference Server.

Data#

LOGGER

API#

nemo_deploy.deploy_pytriton.LOGGER = 'getLogger(...)'#

class nemo_deploy.deploy_pytriton.DeployPyTriton( triton_model_name: str, triton_model_version: int = 1, model=None, max_batch_size: int = 128, http_port: int = 8000, grpc_port: int = 8001, address='0.0.0.0', allow_grpc=True, allow_http=True, streaming=False, pytriton_log_verbose=0, )#

Bases: nemo_deploy.deploy_base.DeployBase

Deploys any model that implements the ITritonDeployable interface in nemo_deploy to Triton Inference Server.

Initialization

Expects a NeMo checkpoint or a model to serve on Triton Inference Server.

Parameters:

triton_model_name (str) – Name of the service.
triton_model_version (int) – Version of the service.
checkpoint_path (str) – Path to the NeMo checkpoint file.
model (ITritonDeployable) – A model that implements the ITritonDeployable interface (from nemo_deploy import ITritonDeployable).
max_batch_size (int) – Maximum batch size.
http_port (int) – HTTP port for the Triton server.
grpc_port (int) – gRPC port for the Triton server.
address (str) – HTTP address for the Triton server to bind to.
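
As a rough illustration of the constructor signature shown above, the sketch below collects the keyword arguments and their defaults into a dictionary before handing them to DeployPyTriton. The helper names `pytriton_kwargs` and `make_deployer` are hypothetical and not part of nemo_deploy; the argument names and default values are copied from the signature. The import is deferred so that the first helper can be tried without nemo_deploy installed.

```python
from typing import Any, Dict


def pytriton_kwargs(triton_model_name: str, model: Any = None, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: build DeployPyTriton keyword arguments.

    Starts from the defaults listed in the class signature and applies
    any overrides, rejecting names the signature does not list.
    """
    kwargs: Dict[str, Any] = {
        "triton_model_name": triton_model_name,
        "triton_model_version": 1,
        "model": model,
        "max_batch_size": 128,
        "http_port": 8000,
        "grpc_port": 8001,
        "address": "0.0.0.0",
        "allow_grpc": True,
        "allow_http": True,
        "streaming": False,
        "pytriton_log_verbose": 0,
    }
    unknown = set(overrides) - set(kwargs)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    kwargs.update(overrides)
    return kwargs


def make_deployer(triton_model_name: str, model: Any, **overrides: Any):
    """Hypothetical helper: construct a DeployPyTriton instance."""
    # Imported lazily so pytriton_kwargs works without nemo_deploy installed.
    from nemo_deploy.deploy_pytriton import DeployPyTriton

    return DeployPyTriton(**pytriton_kwargs(triton_model_name, model, **overrides))
```

For example, `make_deployer("my_model", model, http_port=9000)` would request a non-default HTTP port, where `"my_model"` and `model` are placeholders for a real service name and an ITritonDeployable instance.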

deploy()#: Deploys the model to Triton Inference Server.

serve()#: Starts serving the model and waits for requests.

run()#: Starts serving the model asynchronously.

stop()#: Stops serving the model.
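
The four methods above suggest two usage patterns: a blocking one built on serve() and a background one built on run() and stop(). The sketch below is a minimal illustration that uses only those method names. It assumes, without confirmation from this page, that deploy() is called before serve() or run(); the helper names `serve_blocking` and `running` are hypothetical.

```python
from contextlib import contextmanager


def serve_blocking(deployer):
    """Deploy, then serve in the foreground until serve() returns."""
    deployer.deploy()  # assumption: deploy() comes before serve()
    deployer.serve()


@contextmanager
def running(deployer):
    """Deploy and serve in the background; stop on exit from the with-block."""
    deployer.deploy()  # assumption: deploy() comes before run()
    deployer.run()
    try:
        yield deployer
    finally:
        # stop() is called even if the body of the with-block raises.
        deployer.stop()
```

With a DeployPyTriton instance named `deployer`, `with running(deployer): ...` would keep the model available for the duration of the block, for example while sending test requests from the same process.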
