General Configurations#

The bot configuration file is the entrypoint for your bot. This section lists the options you can configure in the bot_config.yaml file, which contains all the key configurations needed for building different use cases. It is an extension of the configurations supported by NeMo Guardrails; all configurations supported by NeMo Guardrails are also supported by NVIDIA ACE Agent.

In addition, the following configurations are supported exclusively by NVIDIA ACE Agent.

Bot Name and Version#

Attention

It is mandatory to specify a bot name in the bot configuration file.

The bot name and version combination must be unique; it is used to identify your bot when multiple bots are deployed together. The bot name with a version should look like {bot_name}_v{bot_version}. For example:

bot: llm_bot_v1

It is optional to specify a bot version as part of the bot configuration file. Without a version, the bot name should look like {bot_name}. For example:

bot: llm_bot

Configuring the NLP and Plugin Server Endpoints#

These configurations can be specified under the configs field in the bot_config.yaml file. The available configurations are listed below. All of them have default values, which are used unless explicitly overridden.

configs:
  # The endpoint URL of the NLP server
  nlp_server_url: http://localhost:9000

  # The endpoint URL of the Plugin server
  plugin_server_url: http://localhost:9002

  # The interval in seconds at which the Chat Engine polls the NLP server and
  # the Plugin server to check their availability. This polling ensures a
  # smooth fallback if either service goes down in the middle of a conversation.
  health_check_interval: 5

Multilingual Configurations#

NVIDIA ACE Agent supports multilingual input and output.

If you are using Riva’s Neural Machine Translation model to perform translation at the query level, the response level, or both, you need to provide the language, request_language, and response_language codes as shown below. The default value for all three is en-US.

language: Optional[str] = en-US
request_language: Optional[str] = en-US
response_language: Optional[str] = en-US

If the request_language is not equal to language, then the NMT model (if deployed) will be called to translate the query, with the source language set to request_language and the target language set to language. For example, set the following configs to translate all incoming queries from Spanish to English:

language: Optional[str] = en-US
request_language: Optional[str] = es-US

If the response_language is not equal to language, then the NMT model (if deployed) will be called to translate the response text, with the source language set to language and the target language set to response_language. For example, set the following configs to translate all outgoing responses from English to Spanish:

language: Optional[str] = en-US
response_language: Optional[str] = es-US
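
The two settings can be combined. As a sketch, the following keeps the bot’s internal language as English while translating Spanish queries in and Spanish responses out:

language: Optional[str] = en-US
request_language: Optional[str] = es-US
response_language: Optional[str] = es-US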

Tip

  • Refer to Spanish Weather Bot to see the above configurations in action.

  • The request and response language can also be provided at runtime with every query. Refer to the /chat endpoint schema.

LLM Model Configurations#

To configure the backbone LLM model that will be used by the guardrails configuration, set the models key:

models:
  - type: main
    engine: openai
    model: gpt-4-turbo

The meanings of the attributes are as follows:

  • type is set to main, indicating the main LLM model.

  • engine is the LLM provider.

  • model is the name of the model. The recommended option is gpt-4-turbo.

  • parameters are any additional parameters, for example, temperature, top_k, and so on (see the example after this list).

Only one LLM model should be specified in the models key.
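
For example, a configuration that also passes additional parameters might look like the following (the temperature value is illustrative; valid parameters depend on the engine):

models:
  - type: main
    engine: openai
    model: gpt-4-turbo
    parameters:
      temperature: 0.2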

All NeMo Guardrails supported LLM models and their configurations are also supported by NVIDIA ACE Agent.

LangChain Models#

You can use any cloud LLM provider supported by LangChain by specifying its name in the engine field. For example:

  • ai21

  • aleph_alpha

  • anthropic

  • anyscale

  • azure

  • cohere

  • huggingface_endpoint

  • huggingface_hub

  • openai

  • self_hosted

  • self_hosted_hugging_face

Refer to the LangChain documentation for the complete list.
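
As a sketch, a configuration using the anthropic engine might look like the following (the model name is illustrative; check the provider’s documentation for valid values):

models:
  - type: main
    engine: anthropic
    model: claude-3-opus-20240229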

NVIDIA AI Endpoints#

You can use any available NVIDIA API Catalog LLM as part of the NVIDIA AI Endpoints model type. Set the engine field to nim to plug it into your bot, and provide the model name along with any parameters needed.

models:
  - type: main
    engine: nim
    model: <MODEL_NAME>

Visit the NVIDIA API Catalog website to browse the LLMs and try them out.

To use a local NIM deployment:

models:
  - type: main
    engine: nim
    model: <MODEL_NAME>
    parameters:
      base_url: <NIM_ENDPOINT_URL>

To use any of the providers, you will need to install additional packages. When you first try to use a configuration with a new provider, you will typically receive an error from LangChain that tells you which package to install.

Important

You can instantiate any of the LLM providers above. Depending on the capabilities of the model, some will work better than others with the NVIDIA ACE Agent toolkit. The toolkit includes prompts that have been optimized for certain types of models (for example, openai). For others, you can optimize the prompts yourself.

The following models have been tested with the Colang 2.0-beta version.

OpenAI models:

  • gpt-3.5-turbo-instruct

  • gpt-3.5-turbo

  • gpt-4-turbo

  • gpt-4o

  • gpt-4o-mini

NIM models:

  • meta/llama3-8b-instruct

  • meta/llama3-70b-instruct

  • meta/llama-3.1-8b-instruct

  • meta/llama-3.1-70b-instruct
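
For example, to use one of the tested NIM models listed above, replace <MODEL_NAME> with its identifier:

models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct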

Custom LLM Models using the LangChain Wrapper#

To register a custom LLM provider using the LangChain wrapper, you need to create a class that inherits from the BaseLanguageModel class provided by LangChain and register it using the register_llm_provider utility method provided by NeMo Guardrails. This code can be provided in the initialization file named config.py. NVIDIA ACE Agent will automatically execute it before starting the bot.

from langchain.base_language import BaseLanguageModel
from nemoguardrails.llm.providers import register_llm_provider


class CustomLLM(BaseLanguageModel):
    """A custom LLM."""


register_llm_provider("custom_llm", CustomLLM)

You can then use the custom LLM provider in your bot configuration file:

models:
  - type: main
    engine: custom_llm

LLM Prompts#

You can customize the prompts that are used for the various LLM tasks (for example, generate user intent, generate next step, generate bot message) using the prompts key. For example, to override the prompt used for the generate_user_intent task for the OpenAI gpt-4-turbo model:

prompts:
  - task: generate_user_intent
    models:
      - gpt-4-turbo
    content: |-
      <<This is a placeholder for a custom prompt for generating the user intent>>

The full list of predefined tasks used by the NVIDIA ACE Agent toolkit includes the following:

  • general generates the next bot message when no canonical forms are used.

  • generate_user_intent generates the canonical user message.

  • generate_next_steps generates the next thing the bot should do or say.

  • generate_bot_message generates the next bot message.

  • generate_value generates the value for a context variable (in other words, it extracts the user-provided values).

  • fact_checking checks the facts from the bot response against the provided evidence.

  • jailbreak_check checks if there is an attempt to break moderation policies.

  • output_moderation checks if the bot response is harmful, unethical, or illegal.

  • check_hallucination checks if the bot response is a hallucination.
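
The same pattern applies to other tasks and models. For example, a sketch that overrides the prompt for the generate_bot_message task for one of the NIM models listed earlier:

prompts:
  - task: generate_bot_message
    models:
      - meta/llama3-8b-instruct
    content: |-
      <<This is a placeholder for a custom prompt for generating the bot message>>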

You can check the default prompts in the prompts folder of the NeMo Guardrails repository.