Basic Concepts

Bot

A bot is a virtual agent that can conduct natural conversations with humans in a given domain. A bot uses natural language understanding (NLU) to comprehend what the user is asking for and then uses dialog management and a domain-specific Plugin Engine to either extract the corresponding information and present it to the user or execute an action to satisfy the user’s command.

In ACE Agent, the primary bot-related configurations are stored in a <bot_name>_bot_config.yaml file.

Chat Controller

The Chat Controller orchestrates the end-to-end bot pipeline for a speech-IO-based bot. It exposes a gRPC API that you can use to build conversational AI client applications, and it also supports a Redis interface that can be used to build applications. The Chat Controller creates a pipeline consisting of Automatic Speech Recognition (ASR), Chat Engine, and Text-To-Speech (TTS), and manages the flow of audio and text data between these modules. The Chat Controller also supports storage of incoming and outgoing audio, text data, and relevant metadata, and serves as the interface to the NVIDIA Omniverse Audio2Face and Omniverse Animation microservices.
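
As a rough illustration of the Redis path, the sketch below publishes a text request onto a Redis stream and waits for a response entry. The stream and field names (chat_controller_request, chat_controller_response, stream_id, text) are hypothetical placeholders, not the actual ACE Agent message schema; refer to the Chat Controller API reference for the real format.

```python
# Minimal sketch of talking to a Chat Controller style service over its Redis interface.
# Stream and field names are hypothetical placeholders, not the real ACE Agent schema.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Publish a user utterance as a stream entry (hypothetical stream/fields).
r.xadd("chat_controller_request", {"stream_id": "user-123", "text": "How is the weather in Paris?"})

# Block for up to 5 seconds waiting for a response entry (hypothetical stream).
entries = r.xread({"chat_controller_response": "$"}, count=1, block=5000)
for stream_name, messages in entries:
    for message_id, fields in messages:
        print(fields.get("text"))
```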

Chat Engine

The Chat Engine is the key component of the NVIDIA ACE Agent pipeline that drives the conversation. It interacts with other components in the ACE Agent ecosystem to formulate a response to the user’s request, and it is responsible for maintaining user context and conversational history. It exposes different interfaces for interacting with your bot, depending on the use case.

The Chat Engine is based on NVIDIA NeMo Guardrails. The architecture is described here.

Colang

Colang is a modeling language enabling the design of guardrails for conversational systems.

Configuring dialogs involves defining the rules that the Chat Engine must follow to generate a response. This dialog can be modeled using Colang, NVIDIA’s proprietary modeling language. The same language is also used by NeMo Guardrails to build programmable rails for your LLM applications.

Colang-based files have a .co extension.

Context

In natural language, a conversation is a continuous process; in many cases a single sentence does not convey the full meaning of the user’s intention on its own and depends on previous sentences as well. The same is true for dialog management.

Users often rely on past context when asking a query, and ACE Agent needs to understand that context to provide a valid response. ACE Agent supports three types of context:

Multi-Turn - A multi-turn conversation is one where the dialog continues on the same topic for multiple turns or iterations. In tech-support bots, multi-turn dialog is a very common scenario: the bot either asks the user multiple questions to understand the problem or asks the user to go through multiple steps to resolve an issue. For example:

[Figure: multi-turn conversation example]

Missing Entity - In some cases, the user may not provide all the required entities in a query and instead refers to an entity used earlier in the conversation. In the following example, the second query refers to the entity LA, which was provided in the first query.

[Figure: missing entity conversation example]

Missing Intent - Another scenario is when the bot cannot determine the intent from the current user query and needs to look for it in earlier queries. In the following example, the user does not say what they are asking about for tomorrow; it can only be understood from the first query.

[Figure: missing intent conversation example]

Entity

To understand a user query and provide an answer, the bot may need to extract a few keywords, typically nouns, from the query or response. These are called entities. For a weather bot, a location name or date can be an entity, because that information is needed to query any weather service API for a forecast.

ACE Agent uses the Joint Intent classification & Slot recognition model as well as the Named Entity Recognizer (NER) model to recognize entities. The models need to be trained with data containing examples of entities. Apart from the Joint Intent classification & Slot recognition and NER models, entities can also be tagged using lookup tables or regular expressions (RegEx).
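
As a simple illustration of lookup- and RegEx-based tagging (independent of the trained Riva models), the sketch below tags a date pattern with a regular expression and a city name from a small lookup list. The entity names and lookup entries are made up for the example.

```python
# Toy lookup/RegEx entity tagging, independent of the trained Riva models.
import re

CITY_LOOKUP = {"paris", "los angeles", "la", "new york"}  # example lookup list
DATE_PATTERN = re.compile(r"\b(today|tomorrow|\d{4}-\d{2}-\d{2})\b", re.IGNORECASE)

def tag_entities(query: str) -> dict:
    entities = {}
    # RegEx-based tagging for dates.
    date_match = DATE_PATTERN.search(query)
    if date_match:
        entities["date"] = date_match.group(0)
    # Lookup-based tagging for city names (naive substring match).
    lowered = query.lower()
    for city in CITY_LOOKUP:
        if city in lowered:
            entities["location"] = city
            break
    return entities

print(tag_entities("Will it rain in Paris tomorrow?"))
# {'date': 'tomorrow', 'location': 'paris'}
```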

Intent

An intent represents the intention of the user expressed in a single query. For example, a weather bot might support intents such as weather_query, temperature_query, or humidity_query.

To build a bot, you first need to identify all the intents required to cover your target use cases. When a query is issued, ACE Agent passes it through the domain-specific Riva Joint Intent & Slot Classification model or LLM models to recognize the intent, and then decides what action to take next.

Knowledge Base

A knowledge base is a collection of documents that can be provided to ACE Agent so that, when a user asks a relevant query, ACE Agent can automatically find an appropriate answer from the knowledge base using information retrieval and large language models.

NeMo Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems. Guardrails (or “rails” for short) are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, extracting structured data, and more.

The ACE Agent Chat Engine is based on NVIDIA NeMo Guardrails. The architecture is described here.
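
Outside of ACE Agent, the NeMo Guardrails toolkit can also be used directly from Python. This is a minimal sketch assuming a local guardrails configuration directory (./config containing the YAML and Colang .co files) and valid LLM API credentials; it is not the ACE Agent integration itself.

```python
# Minimal sketch of using the NeMo Guardrails toolkit directly from Python.
# Assumes a guardrails configuration in ./config and valid LLM credentials.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")   # YAML + Colang (.co) files
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "How is the weather in Paris?"}
])
print(response["content"])
```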

NLP Server

The NLP server provides a single unified interface for integrating different NLP models into the dialog pipeline. It utilizes production-tested model servers such as the NVIDIA Triton Inference Server, while also allowing you to easily integrate experimental custom models into the pipeline.
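
As a hedged illustration of the kind of model server the NLP server fronts, the sketch below uses the standard Triton Python HTTP client to check whether a model is loaded and ready; the model name is a hypothetical example, and this is not the NLP server's own API.

```python
# Checking a Triton-hosted NLP model with the standard Triton HTTP client.
# The model name below is a hypothetical example.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

print("Server ready:", client.is_server_ready())
print("Model ready:", client.is_model_ready("riva_intent_weather"))
print(client.get_model_metadata("riva_intent_weather"))
```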

Plugin Server

For certain domains, the bot may need to get information from third-party services to provide an accurate answer to a user query, for example, a weather query or a restaurant search.

For some domains, the bot may need to access an internal database, hosted either locally or remotely, to answer a query, for example, an order status query.

The Plugin server is a FastAPI-based server that enables ACE Agent to interact with third-party applications or APIs over a REST interface. It exposes a Swagger endpoint, which allows developers to easily write and validate Plugin servers in a sandbox environment.
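
As a rough sketch of what such a plugin might look like, the FastAPI route below wraps a hypothetical weather REST API; the route path, parameters, and upstream URL are illustrative placeholders, not the actual ACE Agent plugin interface.

```python
# Sketch of a FastAPI-based plugin that wraps a hypothetical weather REST API.
# The route, parameters, and upstream URL are illustrative placeholders.
from fastapi import FastAPI
import requests

app = FastAPI()

WEATHER_API_URL = "https://api.example.com/v1/forecast"  # hypothetical third-party service

@app.get("/weather")
def get_weather(location: str, date: str = "today") -> dict:
    """Fetch a forecast for the given location and date from the upstream service."""
    response = requests.get(WEATHER_API_URL, params={"location": location, "date": date}, timeout=10)
    response.raise_for_status()
    data = response.json()
    return {"location": location, "date": date, "forecast": data.get("forecast", "unavailable")}
```

Such a server can be run with, for example, uvicorn plugin:app, and FastAPI's automatically generated Swagger UI at /docs can be used to try out and validate the endpoint, matching the sandbox workflow described above.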

Prompt

A prompt is an instruction to an LLM. If you have interacted with an LLM like ChatGPT, you have used prompts. Ideally, a prompt elicits an answer that is correct, adequate in form and content, and has the right length.

Essentially, prompting is about packaging your intent in a natural-language query that will cause the model to return the desired response.

RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is the task of accurately answering questions using some specified documents as a knowledge base (KB). RAG typically involves the following steps:

  1. Retrieval: A component called a retriever is responsible for fetching paragraphs or documents from the knowledge base that are relevant to the user’s question. The retrieved text is typically called the context.

  2. Generation: In this step, we create a prompt that includes the user’s question, the retrieved context, and instructions on how to answer the question. An LLM uses this prompt to formulate an answer that is informative and accurate.

In the context of ACE Agent, RAG is often used to answer out-of-domain questions, or questions for which the bot needs to fetch information from a set of ingested documents.
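
The two steps can be sketched with a simple TF-IDF retriever and a prompt template. This is only a minimal illustration of the pattern; the final LLM call is omitted, since any LLM backend could be substituted to generate the answer from the printed prompt.

```python
# Sketch of the two RAG steps: retrieval (TF-IDF here) and prompt construction for generation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "ACE Agent bots are configured with a bot config YAML file.",
    "The Plugin server lets a bot call third-party REST APIs.",
    "Colang files use the .co extension.",
]

def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Step 1: fetch the documents most relevant to the question (the context)."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(documents)
    question_vector = vectorizer.transform([question])
    scores = cosine_similarity(question_vector, doc_vectors)[0]
    ranked = sorted(zip(scores, documents), reverse=True)
    return [doc for _, doc in ranked[:top_k]]

def build_prompt(question: str, context: list[str]) -> str:
    """Step 2: combine question, context, and instructions into a prompt for the LLM."""
    return (
        "Answer the question using only the context below.\n"
        f"Context: {' '.join(context)}\n"
        f"Question: {question}\nAnswer:"
    )

question = "Which file extension do Colang files use?"
prompt = build_prompt(question, retrieve(question))
print(prompt)  # This prompt would then be sent to an LLM to generate the final answer.
```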

Slot

A slot is a key-value pair in which ACE Agent stores information in its memory. ACE Agent retrieves relevant slots from memory whenever needed to efficiently understand and answer user queries.

For example, if you ask a weather bot “How is the weather in Paris?”, it will store Paris as the location-name slot in its memory. If the next user query is “Will it rain tomorrow?”, ACE Agent will fetch the location-name slot from its memory and use it, since the location information is not present in the latest query.

In many cases, entities detected from user queries are stored by ACE Agent as slots, as shown in the above example. However, slot information can also come from slot rules or other sources; for example, the bot can use the GPS location of your mobile phone or car as a slot, and if you register your name or address with the bot, those can be used as slots as well.

ACE Agent has two types of memory: short-term and long-term. Short-term slots are removed from ACE Agent’s memory after a specified number of turns or a specified duration; for example, for a weather query, location-name and date can be considered short-term memory slots, because once the relevant conversation is over those slots are no longer useful. Long-term slots, on the other hand, are kept in memory throughout the session or even for the lifetime of the bot; for example, the user’s name or address has a very low chance of changing over time.
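
As a conceptual illustration only (not ACE Agent’s internal implementation), the sketch below keeps short-term slots for a fixed number of turns and long-term slots indefinitely.

```python
# Conceptual slot memory: short-term slots expire after a number of turns, long-term slots persist.
# This illustrates the concept only; it is not ACE Agent's internal implementation.

class SlotMemory:
    def __init__(self, short_term_turns: int = 3):
        self.short_term_turns = short_term_turns
        self.long_term: dict[str, str] = {}               # e.g. user name, address
        self.short_term: dict[str, tuple[str, int]] = {}  # value plus the turn it was set

    def set_slot(self, name: str, value: str, turn: int, long_term: bool = False) -> None:
        if long_term:
            self.long_term[name] = value
        else:
            self.short_term[name] = (value, turn)

    def get_slot(self, name: str, turn: int):
        if name in self.long_term:
            return self.long_term[name]
        if name in self.short_term:
            value, set_turn = self.short_term[name]
            if turn - set_turn <= self.short_term_turns:
                return value
            del self.short_term[name]  # expired
        return None

memory = SlotMemory()
memory.set_slot("location-name", "Paris", turn=1)   # "How is the weather in Paris?"
print(memory.get_slot("location-name", turn=2))     # reused for "Will it rain tomorrow?" -> Paris
print(memory.get_slot("location-name", turn=10))    # expired once the conversation moves on -> None
```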

ACE Agent facilitates the detection and handling of slots using a file called slots.yaml. You can define the slots you want to support, as well as attributes such as memory and rules for slot tagging and/or slot manipulation.