Architecture Introduction

ACE Agent is a collection of microservices that help you build scalable, customizable, LLM-driven conversational AI agents. ACE Agent microservices, along with the other ACE microservices, provide the ability to orchestrate pipelines, including complex multimodal conversational use cases. Let’s first introduce the ACE Agent microservices.

Chat Controller

The Chat Controller orchestrates the end-to-end bot pipeline for a speech-IO-based bot. It exposes a gRPC API that you can use to build conversational AI client applications. The Chat Controller creates a pipeline consisting of Automatic Speech Recognition (ASR), Chat Engine, Text-To-Speech (TTS), and the NVIDIA Omniverse Audio2Face client, and manages the flow of audio or text data between these modules. It can use a Redis message broker to send metadata and to send and receive events. Optionally, it can connect to the Plugin server instead of the Chat Engine to allow direct integration of agents built with LangChain or other frameworks.
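
As an illustration only, the sketch below shows what a gRPC client for the Chat Controller might look like. The generated stub module, service, and RPC names are hypothetical placeholders, not the actual ACE Agent API; generate the real stubs from the .proto files shipped with ACE Agent.

```python
# Hypothetical sketch of a gRPC client for the Chat Controller.
# The stub modules, service, and RPC names below are placeholders;
# generate the real ones from ACE Agent's .proto files.
import grpc

import chat_controller_pb2 as pb2            # hypothetical generated module
import chat_controller_pb2_grpc as pb2_grpc  # hypothetical generated module


def main() -> None:
    # The Chat Controller's gRPC endpoint; host and port are deployment-specific.
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = pb2_grpc.ChatControllerStub(channel)  # hypothetical service name
        # Send a text query and print the bot response (illustrative RPC).
        response = stub.Chat(pb2.ChatRequest(user_id="user-1", text="Hello!"))
        print(response.text)


if __name__ == "__main__":
    main()
```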

Chat Engine

The Chat Engine is the key component of the NVIDIA ACE Agent pipeline that drives the conversation. It interacts with other components in the ACE Agent ecosystem to formulate a response to the user request, and it is responsible for maintaining user context and conversational history. It can be deployed with a CLI, server, or event interface; choose the interface based on how your use case interacts with the bot.

The Chat Engine is based on NVIDIA NeMo Guardrails, whose architecture is described here. You design conversational flows using Colang, an event-based dialog modeling language that enables highly flexible conversational interactions between a human and a bot. Since learning a new language is not an easy task, Colang was designed as a mix of natural language and Python. Under the hood, Colang scripts are interpreted by a Python runtime that is currently part of NeMo Guardrails (0.8.1).
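
For instance, a Colang configuration can be exercised directly through the NeMo Guardrails Python runtime that the Chat Engine builds on. This is a minimal sketch, assuming a ./bot_config directory containing a config.yml (with an LLM configured) and Colang (.co) files:

```python
# Minimal sketch of driving a Colang bot configuration through the
# NeMo Guardrails Python runtime that underlies the Chat Engine.
from nemoguardrails import LLMRails, RailsConfig

# Assumes ./bot_config holds a config.yml and Colang files.
config = RailsConfig.from_path("./bot_config")
rails = LLMRails(config)

# Send a user message through the Colang runtime and print the bot reply.
response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
print(response["content"])
```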

Plugin Server

The Plugin server allows you to add use case- or domain-specific business logic to your bots. For example, a weather bot needs to call weather APIs to answer weather-related queries; similarly, a bot may need to access an internal database, hosted locally or remotely, to answer a query.

The Plugin server is a FastAPI-based server that enables ACE Agent microservices to interact with custom business logic using a REST interface. It exposes a Swagger endpoint, which allows developers to easily write and validate plugins in a sandbox environment.
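
As a minimal sketch of this pattern, a plugin can be written as a standard FastAPI router. The route and the weather helper below are hypothetical examples of business logic, not part of the ACE Agent API:

```python
# Minimal sketch of a Plugin server plugin as a FastAPI router.
# The endpoint and helper below are illustrative, not an ACE Agent API.
from fastapi import APIRouter

router = APIRouter()


def fetch_weather(location: str) -> str:
    # Placeholder for a call to a real weather API.
    return f"It is sunny in {location}."


@router.get("/weather")
def weather(location: str) -> dict:
    """Return a weather answer for the given location."""
    return {"answer": fetch_weather(location)}
```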

The Plugin server lets you integrate your own agent, built with LangChain, LlamaIndex, or any other framework, through a simple interface, and lets you add Speech AI and Avatar AI using ACE microservices.

NLP Server

The ACE Agent NLP server exposes unified RESTful interfaces for integrating various NLP models and tasks. The NLP server can deploy models using the NVIDIA Triton Inference Server and supports NVIDIA TensorRT, PyTorch, TensorFlow, ONNX, and Python backends. You can also deploy Hugging Face supported models using PyTriton, or integrate externally deployed models by writing a custom model client with the @model_api decorator.
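
The following is a hypothetical sketch of the custom model client pattern. The @model_api decorator is part of the NLP server, but the import path and parameters shown here are assumptions for illustration only:

```python
# Hypothetical sketch of a custom model client for the NLP server.
# The import path and decorator parameters below are assumptions;
# refer to the NLP server documentation for the actual signature.
import requests

from nlp_server.decorators import model_api  # assumed import path


@model_api(endpoint="/nlp/model/text_classification",  # assumed parameters
           model_name="external-intent-model")
def classify(text: str) -> dict:
    # Forward the request to an externally deployed model server.
    response = requests.post("http://external-model:8000/classify",
                             json={"text": text}, timeout=10)
    return response.json()
```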

ACE Agent microservices can be used in different pipelines based on the use case. Let’s go over a few possible architecture choices.

Chat Engine Server Architecture

In this architecture, the Chat Controller microservice has a state machine, also referred to as the speech pipeline manager, which manages the flow of audio or text data between the different components. The Chat Controller generates a user transcript using the ASR extension, passes it to the Chat Engine to generate the bot response, and generates audio for the bot response using the TTS extension. Optionally, it can connect to Vision AI, a Redis message broker, and Avatar AI, depending on the Chat Controller pipeline configuration.

[Figure: Chat Engine Server Architecture]

The Chat Controller utilizes the REST APIs exposed by the Chat Engine microservice in the server interface. The Chat Controller microservice can be deployed with the speech_lite, speech, or avatar pipeline configs. For more details on pipeline configs, refer to Speech AI.

The Stock bot deployed in the Quick Start Guide follows this architecture. To build a bot using this architecture, refer to the Building a bot using Colang 1.0 tutorial.

Plugin Server Architecture

This architecture differs from the Chat Engine Server Architecture described above in only one way: the Chat Engine microservice is replaced with the Plugin server microservice, allowing easy integration of agents and bots built using third-party frameworks such as LangChain, LlamaIndex, and so on.

[Figure: Plugin Server Architecture]

The Plugin server deploys agents and bots built using other frameworks behind a predefined REST API schema for interaction with the Chat Controller. The Chat Controller microservice can be deployed with the speech_lite, speech, or avatar pipeline configs. For more details on pipeline configs, refer to Speech AI.
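
As a hedged sketch, a plugin can front a LangChain agent behind a chat endpoint. The /chat route and its request and response fields below are illustrative stand-ins for ACE Agent’s predefined schema, and agent_executor is a hypothetical LangChain AgentExecutor built elsewhere with your tools and LLM:

```python
# Minimal sketch of a plugin that fronts a LangChain agent for the
# Chat Controller. The /chat route and its fields are illustrative
# stand-ins for ACE Agent's predefined REST API schema.
from fastapi import APIRouter
from pydantic import BaseModel

from my_bot.agent import agent_executor  # hypothetical LangChain AgentExecutor

router = APIRouter()


class ChatRequest(BaseModel):
    user_id: str
    question: str


@router.post("/chat")
def chat(request: ChatRequest) -> dict:
    # Delegate the user query to the LangChain agent and return its answer.
    result = agent_executor.invoke({"input": request.question})
    return {"answer": result["output"]}
```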

The DuckDuckGo LangChain bot follows this architecture. To build bots using this architecture, follow the Building LangChain based bots or Building a bot using NVIDIA RAG examples tutorials.

ACE Agent Event Architecture

[Figure: ACE Agent Event Architecture]

The ACE Agent event interface provides an asynchronous, event-based interface for interacting with bots written in Colang 2.0, allowing bots to make full use of all features in UMIM (Unified Multimodal Interaction Management).

What is UMIM? UMIM was designed to help manage complex multimodal interactions between a bot and a user, where multiple things can happen simultaneously, the temporal alignment of bot, avatar, and user actions matters, and there is no strict turn-taking. Furthermore, UMIM provides a layer of abstraction between the interaction design and the actual system implementing the necessary components. For more information, refer to the UMIM Documentation.

How does the ACE Agent event interface work? The event interface provides a UMIM-compatible asynchronous, event-based interface (currently, only Redis is supported). It manages interactions over Redis streams by forwarding UMIM events to the Colang 2.0 runtime and publishing events from the runtime back to the event streams. In this mode, the Chat Controller does not use the speech pipeline manager/state machine; instead, it publishes ASR transcripts on Redis and retrieves bot responses for TTS generation asynchronously.
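
A minimal sketch of this event flow using the redis Python client is shown below. The stream name and event payload are illustrative; the actual event schema is defined by UMIM:

```python
# Minimal sketch of publishing and reading UMIM-style events on a
# Redis stream, as the event interface does. Stream name and payload
# are illustrative; see the UMIM documentation for the real schema.
import json

import redis

client = redis.Redis(host="localhost", port=6379)
stream = "umim_events_user-1"  # assumed per-user stream name

# Publish a user utterance event (fields modeled loosely on UMIM).
event = {"type": "UtteranceUserActionFinished", "final_transcript": "Hello!"}
client.xadd(stream, {"event": json.dumps(event)})

# Read back any events published on the stream, e.g. by the Colang 2.0 runtime.
for _, entries in client.xread({stream: "0-0"}, block=1000) or []:
    for entry_id, fields in entries:
        print(entry_id, json.loads(fields[b"event"]))
```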

The Chat Engine microservice is deployed with the event interface. The Chat Controller microservice can be deployed with the speech_umim or avatar_umim pipeline configs. For more details on pipeline configs, refer to Speech AI.

The Colang 2.0 sample bot follows this architecture. To build a bot using this architecture, follow the Building a Bot using Colang 2.0 and Event Interface tutorial.