# Model Configurations
The NVIDIA ACE Agent NLP server uses the model information in `model_config.yaml` to host model servers and to provide a unified API server for integrating models into the dialog pipeline. `model_config.yaml` groups model information under model server blocks. At the top level, `model_config.yaml` has a single `model_servers` key containing a list of maps, where each map represents a single `model_server` instance. Every model server map must contain the `name` field; the available options are `triton`, `riva`, and `custom`.
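The structure described above can be sketched as follows; only the mandatory `name` field is shown, with one entry per supported server type:

```yaml
model_servers:
  - name: triton   # Triton Inference Server block
  - name: riva     # Riva Skills server block
  - name: custom   # custom model inference clients
```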
## Triton Inference Server
Triton Inference Server is open-source inference serving software that enables teams to deploy AI models from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, Python, and more.
| Key | Description |
|---|---|
| | Triton gRPC URL [Mandatory] |
| | Triton HTTP URL [Optional] |
| | List of NLP models for deployment; skip if using an already hosted Triton server. |
| | List of ASR and TTS models; will only be deployed if the |
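A hedged sketch of a Triton server block follows. The key names `url`, `nlp_models`, and `speech_models` are assumptions, since this excerpt describes the fields but does not show their exact names (verify them against your version of the documentation); the URL and model path are placeholders:

```yaml
model_servers:
  - name: triton
    url: localhost:8001              # assumed key for the Triton gRPC URL (mandatory)
    nlp_models:                      # assumed key; skip if Triton server is already hosted
      - ./model_utils/example_model  # placeholder model path
    speech_models: []                # assumed key; ASR and TTS models, deployed conditionally
```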
## Riva Skills Server
NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that are customized for your use case and deliver real-time performance.
| Key | Description |
|---|---|
| | Riva Triton gRPC URL [Mandatory] |
| | Riva API gRPC server URL; needed if using Riva translation models. |
| | Riva Triton HTTP URL [Optional] |
| | List of NMT models for deployment; skip if using an already hosted Riva Skills server. |
| | List of ASR and TTS models; will only be deployed if the |

### Recommended Speech Models

- ASR model -
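A hedged sketch of a Riva server block follows. As with the Triton example, the key names `url`, `nmt_models`, and `speech_models` are assumptions rather than confirmed by this excerpt, and the values are placeholders:

```yaml
model_servers:
  - name: riva
    url: localhost:8002        # assumed key for the Riva Triton gRPC URL (mandatory)
    nmt_models: []             # assumed key; skip if Riva Skills server is already hosted
    speech_models:             # assumed key; ASR and TTS models, deployed conditionally
      - example_asr_model      # placeholder model name
```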
## Custom Models
The NLP server allows you to easily deploy any Hugging Face, NeMo, or other custom model by creating inference clients with the `@model_api` and `@pytriton` decorators.
| Key | Description |
|---|---|
| | |
| | Speech model inference clients will only be deployed if the |