Model Configurations

The NVIDIA ACE Agent NLP server uses the model information defined in model_config.yaml to host model servers and to provide a unified API server for integrating models into the dialog pipeline. model_config.yaml groups model information under separate model server blocks.

At the top level, model_config.yaml has a single model_servers key containing a list of maps, where each map represents a single model server instance. Each model server map must contain the name field. Supported values for name are triton, riva, nemo_llm, openai_llm, and custom.
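As a minimal sketch of the top-level layout described above, a file with two model server blocks could look like the following. The host names and ports are hypothetical placeholders, not values taken from this section.

```yaml
# model_config.yaml: top-level layout (illustrative sketch only)
model_servers:              # the single top-level key; its value is a list of maps
  - name: triton            # name is mandatory in every model server map
    url: triton-host:8001   # hypothetical address, replace with your own
  - name: riva              # a second model server block in the same list
    url: riva-host:8001     # hypothetical address, replace with your own
```

Each list item is one model server instance, so adding another server means appending another map with its own name.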

Triton Inference Server

Triton Inference Server is open-source inference serving software that enables teams to deploy AI models from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, Python, and more.

Triton Inference Server Keys

Key             Description
url             Triton gRPC URL [Mandatory]
http_url        Triton HTTP URL [Optional]
nlp_models      List of NLP models to deploy. Skip if using an already hosted Triton server.
speech_models   List of ASR and TTS models. Deployed only if the --speech flag is used.
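The keys above could be combined into a triton block as sketched below. This is an illustration under assumptions: the addresses assume a local server on Triton's default ports (8001 for gRPC, 8000 for HTTP), and the model entries are placeholders rather than real model identifiers.

```yaml
# Illustrative triton block; addresses and model entries are assumptions
model_servers:
  - name: triton
    url: localhost:8001        # Triton gRPC URL (mandatory)
    http_url: localhost:8000   # Triton HTTP URL (optional)
    nlp_models:                # omit when using an already hosted Triton server
      - <nlp-model-name>       # placeholder, not a real model identifier
    speech_models:             # deployed only if the --speech flag is used
      - <speech-model-name>    # placeholder, not a real model identifier
```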

Riva Skills Server

NVIDIA Riva is a GPU-accelerated SDK for building speech AI applications that are customized for your use case and deliver real-time performance.

Riva Skills Server Keys

Key             Description
url             Riva Triton gRPC URL [Mandatory]
riva_url        Riva API gRPC server URL. Needed if using Riva translation models.
http_url        Riva Triton HTTP URL [Optional]
nlp_models      List of NLP models to deploy. Skip if using an already hosted Riva Skills server.
speech_models   List of ASR and TTS models. Deployed only if the --speech flag is used.

Recommended speech models:
- ASR model: nvidia/ucs-ms/rmir_asr_parakeet_1-1b_en_us_str_vad:2.15.0
- TTS model: nvidia/riva/rmir_tts_fastpitch_hifigan_en_us:2.13.0
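A riva block using the recommended speech models might look like the sketch below. The addresses are assumptions for a local deployment (50051 is the usual Riva API gRPC port), and the sketch assumes that speech_models entries are plain model identifiers.

```yaml
# Illustrative riva block; addresses are assumptions for a local deployment
model_servers:
  - name: riva
    url: localhost:8001         # Riva Triton gRPC URL (mandatory)
    riva_url: localhost:50051   # Riva API gRPC URL, needed for Riva translation models
    speech_models:              # deployed only if the --speech flag is used
      - nvidia/ucs-ms/rmir_asr_parakeet_1-1b_en_us_str_vad:2.15.0   # ASR
      - nvidia/riva/rmir_tts_fastpitch_hifigan_en_us:2.13.0         # TTS
```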

Custom Models

The NLP server lets you deploy Hugging Face, NeMo, or any other custom models by creating inference clients decorated with @model_api and @pytriton.

Custom Models Keys

Key             Description
nlp_models      Inference model clients decorated with @model_api and @pytriton.
speech_models   Speech model inference clients. Deployed only if the --speech flag is used.
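A custom block could be sketched as follows. This section does not specify the form of each list entry, so the sketch assumes that entries are paths to Python files containing the decorated clients. Both file paths are hypothetical.

```yaml
# Illustrative custom block; the entry format and paths are assumptions
model_servers:
  - name: custom
    nlp_models:
      - ./model_clients/my_nlp_client.py      # hypothetical file with @model_api and @pytriton clients
    speech_models:                            # deployed only if the --speech flag is used
      - ./model_clients/my_speech_client.py   # hypothetical speech inference client
```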