Restrict Topics with Llama 3.1 NemoGuard 8B TopicControl NIM#

Learn how to restrict conversations to allowed topics using Llama 3.1 NemoGuard 8B TopicControl NIM.

By following this tutorial, you learn how to configure a set of allowed topics and test the guardrail with both on-topic and off-topic requests.

Prerequisites#

Configure Guardrails#

  1. Create a configuration directory:

    mkdir config
    
  2. Create a config/config.yml file and add the following content.

    models:
      - type: main
        engine: nim
        model: meta/llama-3.3-70b-instruct
    
      - type: topic_control
        engine: nim
        model: nvidia/llama-3.1-nemoguard-8b-topic-control
    
    rails:
      input:
        flows:
          - topic safety check input $model=topic_control
    

    The config.yml file defines the models that Guardrails uses in the models section and the rails that control when those models are invoked. Each entry in the models section specifies the model type and name, along with the engine used to perform LLM inference. The model with type main generates responses to user queries. The rails section configures input and output rails. Topic control operates only on user input, so no output rail flow is needed. For more information about guardrail configurations, refer to Configure Rails.

  3. Create a config/prompts.yml file with the topic control prompt template.

    prompts:
      - task: topic_safety_check_input $model=topic_control
        content: |
          You are to act as a customer service agent, providing users with factual information in accordance with the knowledge base. Your role is to ensure that you respond only to relevant queries and adhere to the following guidelines.
    
          Guidelines for the user messages:
          - Do not answer questions related to personal opinions or advice on user's order, future recommendations
          - Do not provide any information on non-company products or services.
          - Do not answer enquiries unrelated to the company policies.
          - Do not answer questions asking for personal details about the agent or its creators.
          - Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects.
          - If a user asks topics irrelevant to the company's customer service relations, politely redirect the conversation or end the interaction.
          - Your responses should be professional, accurate, and compliant with customer relations guidelines, focusing solely on providing transparent, up-to-date information about the company that is already publicly available.
          - Allow user comments that are related to small talk and chit-chat.
    

    You can customize the guidelines to match your specific use case and allowed topics. These guidelines are passed to the topic control model in the system prompt. The user request is placed in the user prompt. The topic control model responds with either on-topic or off-topic depending on whether the user input matches one of the topics in the prompt.
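
    To see what the topic control model receives at inference time, you can call it directly through an OpenAI-compatible chat endpoint, placing the guidelines in the system message and the user request in the user message. The following is a minimal sketch, assuming the hosted model is reachable at https://integrate.api.nvidia.com/v1 with your NVIDIA_API_KEY and that the requests package is installed; the exact verdict string can vary, but an off-topic request is expected to return off-topic.

    import os
    import requests

    # Assumption: the hosted model is served behind the OpenAI-compatible
    # endpoint used by build.nvidia.com.
    URL = "https://integrate.api.nvidia.com/v1/chat/completions"

    # In practice, use the full guidelines from config/prompts.yml; a shortened
    # stand-in keeps this sketch compact.
    system_prompt = (
        "You are to act as a customer service agent. "
        "Do not answer questions about politics, religion, or other sensitive subjects."
    )

    payload = {
        "model": "nvidia/llama-3.1-nemoguard-8b-topic-control",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Which party should I vote for in the next election?"},
        ],
    }

    headers = {"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"}
    response = requests.post(URL, headers=headers, json=payload, timeout=60)
    response.raise_for_status()

    # Prints the model's verdict for the user message, for example "off-topic".
    print(response.json()["choices"][0]["message"]["content"])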

Run the Guardrails Chat Application#

  1. Set the NVIDIA_API_KEY environment variable. Guardrails uses this to access models hosted on https://build.nvidia.com/.

    $ export NVIDIA_API_KEY="..."
    
  2. Run the interactive chat application.

      $ nemoguardrails chat --config config
    
      Starting the chat (Press Ctrl + C twice to quit) ...
    
      > _
    
  3. Enter an off-topic request.

    The prompt specifically instructs the model not to respond to questions about politics. The topic control input rail detects a policy violation and responds with the refusal text I'm sorry, I can't respond to that. Because this input rail blocks the user input, no LLM response is generated.

      > Which party should I vote for in the next election?
      I'm sorry, I can't respond to that.
    
  4. Enter an on-topic request.

    This request is in line with the topics in the prompt, so the topic control rail does not block the user input. The user input is passed to the Application LLM for generation.

    > I'd like to cancel my subscription. Can I do this by phone or on the website?
    I'd be happy to help you with canceling your subscription. You have a couple of options to do so, and I'll walk you
    through them.
    
    [The NeMo Guardrails toolkit responds with instructions and information on subscription cancellations]
    

Import the NeMo Guardrails Library in Python#

Follow these steps to use the IPython REPL to import the NeMo Guardrails library and issue some requests:

  1. Install the IPython REPL and run it to interpret the Python code below.

    $ pip install ipython
    $ ipython
    
    In [1]:
    
  2. Load the guardrails configuration you created earlier.

    import asyncio
    from nemoguardrails import LLMRails, RailsConfig
    
    config = RailsConfig.from_path("./config")
    rails = LLMRails(config)
    
  3. Verify the guardrails with an off-topic political question.

    messages = [{"role": "user", "content": "Which party should I vote for in the next election?"}]
    response = await rails.generate_async(messages=messages)
    print(response['content'])
    

    The topic control rail blocks the Application LLM from generating a response.

    "I'm sorry, I can't respond to that."
    
  4. Verify the guardrails with an on-topic question.

    messages = [{"role": "user", "content": "I'd like to cancel my subscription. Can I do this by phone or on the website?"}]
    response = await rails.generate_async(messages=messages)
    print(response['content'])
    

    The Application LLM responds with instructions for canceling a subscription by phone or on the website.
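
    To spot-check the configuration against more inputs, you can loop over a small set of test prompts and compare each response with the refusal text. The following is a minimal sketch that reuses the rails object loaded above and assumes the default refusal message shown earlier; a response that differs from the refusal text is treated as allowed.

    # Default refusal text, assumed from the output shown above.
    REFUSAL = "I'm sorry, I can't respond to that."

    test_prompts = [
        "Which party should I vote for in the next election?",          # expected: blocked
        "I'd like to cancel my subscription. Can I do this by phone?",   # expected: allowed
    ]

    for prompt in test_prompts:
        response = await rails.generate_async(messages=[{"role": "user", "content": prompt}])
        verdict = "blocked" if response["content"].strip() == REFUSAL else "allowed"
        print(f"{verdict}: {prompt}")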

Deploy Llama 3.1 NemoGuard 8B TopicControl NIM Locally#

This section shows how to run the NemoGuard 8B TopicControl model locally while still using the main model hosted on build.nvidia.com. The prerequisites are an NGC account with a personal API key, Docker with the NVIDIA Container Toolkit, and a GPU supported by the NIM container.

To run the Llama 3.1 NemoGuard 8B TopicControl NIM in a Docker container, follow these steps:

  1. Update the config.yml file you created earlier to point to a local NIM deployment rather than build.nvidia.com. The following configuration adds base_url and model_name fields under parameters, which tell the NeMo Guardrails toolkit to send requests for the nvidia/llama-3.1-nemoguard-8b-topic-control model to http://localhost:8123/v1. The Guardrails configuration must match the NIM Docker container configuration for the two to communicate; a quick check is sketched after these steps.

    models:
      - type: main
        engine: nim
        model: meta/llama-3.3-70b-instruct
    
      - type: topic_control
        engine: nim
        model: nvidia/llama-3.1-nemoguard-8b-topic-control
        parameters:
          base_url: "http://localhost:8123/v1"
          model_name: "nvidia/llama-3.1-nemoguard-8b-topic-control"
    
    rails:
      input:
        flows:
          - topic safety check input $model=topic_control
    
  2. Start the Llama 3.1 NemoGuard 8B TopicControl NIM Docker container. Store your personal NGC API key in the NGC_API_KEY environment variable, then pull and run the NIM Docker image locally.

    1. Log in to your NVIDIA NGC account.

      Export your personal NGC API key to an environment variable.

      $ export NGC_API_KEY="..."
      

      Log in to the NGC registry by running the following command.

      $ docker login nvcr.io --username '$oauthtoken' --password-stdin <<< $NGC_API_KEY
      
    2. Download the container.

      $ docker pull nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control:1.10.1
      
    3. Create a model cache directory on the host machine.

      $ export LOCAL_NIM_CACHE=~/.cache/llama-nemotron-topic-guard
      $ mkdir -p "${LOCAL_NIM_CACHE}"
      $ chmod 700 "${LOCAL_NIM_CACHE}"
      
    4. Run the container with the cache directory mounted.

      The -p argument maps port 8000 inside the container to port 8123 on the host to avoid conflicts with other servers running locally.

      $ docker run -d \
        --name llama-nemotron-topic-guard \
        --gpus=all --runtime=nvidia \
        --shm-size=64GB \
        -e NGC_API_KEY \
        -u $(id -u) \
        -v "${LOCAL_NIM_CACHE}:/opt/nim/.cache/" \
        -p 8123:8000 \
        nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control:1.10.1
      

      The container requires several minutes to start and download the model from NGC. You can monitor the progress by running the docker logs llama-nemotron-topic-guard command.

    5. Confirm the service is ready to respond to inference requests.

      $ curl -X GET http://localhost:8123/v1/health/ready
      

      When the service is ready, the command returns the following response.

      {"object":"health-response","message":"ready"}
      
  3. Follow the steps in Run the Guardrails Chat Application and Import the NeMo Guardrails Library in Python to run Guardrails with the local model.
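
    Optionally, before rerunning Guardrails, confirm that the model name served by the container matches the model_name field you set in config.yml. The following is a minimal sketch that queries the NIM's OpenAI-compatible /v1/models endpoint at the base_url from step 1; it assumes the requests package is installed and the health check above has reported ready.

    import requests

    # base_url and model name from the updated config.yml in step 1.
    BASE_URL = "http://localhost:8123/v1"
    EXPECTED_MODEL = "nvidia/llama-3.1-nemoguard-8b-topic-control"

    # List the models served by the local NIM and check for the expected name.
    served = [m["id"] for m in requests.get(f"{BASE_URL}/models", timeout=30).json()["data"]]
    print(served)
    assert EXPECTED_MODEL in served, "config.yml model_name does not match the local NIM"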

Next Steps#