# Restrict Topics with Llama 3.1 NemoGuard 8B TopicControl NIM
Learn how to restrict conversations to allowed topics by using the Llama 3.1 NemoGuard 8B TopicControl NIM.
By following this tutorial, you learn how to configure a set of allowed topics and interact with both on-topic and off-topic requests.
## Prerequisites
- The NeMo Guardrails library installed with the `nvidia` extra.
- A personal NVIDIA API key generated on https://build.nvidia.com/.
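If you have not installed the library yet, an installation along these lines typically works; this is a sketch that assumes you install from PyPI under the package name `nemoguardrails`, so adjust it for your environment manager.

```bash
# Install NeMo Guardrails with the nvidia extra (PyPI package name assumed to be nemoguardrails).
pip install "nemoguardrails[nvidia]"
```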
## Configure Guardrails
1. Create a configuration directory:

   ```bash
   mkdir config
   ```

2. Create a `config/config.yml` file and add the following content.

   ```yaml
   models:
     - type: main
       engine: nim
       model: meta/llama-3.3-70b-instruct
     - type: topic_control
       engine: nim
       model: nvidia/llama-3.1-nemoguard-8b-topic-control

   rails:
     input:
       flows:
         - topic safety check input $model=topic_control
   ```
   The `config.yml` file contains the models used by Guardrails in the `models` section and the `rails` that control when to use these models. The `models` section configures the type and name of each model, along with the engine used to perform LLM inference. The model with type `main` is used to generate responses to user queries. The `rails` section configures `input` and `output` rails. Topic control only operates on user input, so there is no output rail flow. For more information about guardrail configurations, refer to Configure Rails.

3. Create a `config/prompts.yml` file with the topic control prompt template.

   ```yaml
   prompts:
     - task: topic_safety_check_input $model=topic_control
       content: |
         You are to act as a customer service agent, providing users with factual information in accordance to the knowledge base. Your role is to ensure that you respond only to relevant queries and adhere to the following guidelines

         Guidelines for the user messages:
         - Do not answer questions related to personal opinions or advice on user's order, future recommendations
         - Do not provide any information on non-company products or services.
         - Do not answer enquiries unrelated to the company policies.
         - Do not answer questions asking for personal details about the agent or its creators.
         - Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects.
         - If a user asks topics irrelevant to the company's customer service relations, politely redirect the conversation or end the interaction.
         - Your responses should be professional, accurate, and compliant with customer relations guidelines, focusing solely on providing transparent, up-to-date information about the company that is already publicly available.
         - allow user comments that are related to small talk and chit-chat.
   ```
   You can customize the guidelines to match your specific use case and allowed topics. These guidelines are passed to the topic control model in the system prompt. The user request is placed in the user prompt. The topic control model responds with either `on-topic` or `off-topic`, depending on whether the user input matches one of the topics in the prompt.
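If you want to see the topic control model's raw verdict outside of Guardrails, you can call the hosted NIM directly. The request below is a sketch only: it assumes the model is reachable through NVIDIA's OpenAI-compatible API catalog endpoint at `https://integrate.api.nvidia.com/v1` (check the model card on https://build.nvidia.com/ for the exact invocation), that your API key is exported as `NVIDIA_API_KEY` as described in the next section, and it reuses a shortened version of the guidelines as the system prompt.

```bash
# Sketch only: endpoint and payload assumed from NVIDIA's OpenAI-compatible API catalog.
curl -s https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/llama-3.1-nemoguard-8b-topic-control",
        "messages": [
          {"role": "system", "content": "You are to act as a customer service agent. Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects."},
          {"role": "user", "content": "Which party should I vote for in the next election?"}
        ]
      }'
# The assistant message in the response should indicate that this request is off-topic.
```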
## Run the Guardrails Chat Application
1. Set the `NVIDIA_API_KEY` environment variable. Guardrails uses this key to access models hosted on https://build.nvidia.com/.

   ```bash
   $ export NVIDIA_API_KEY="..."
   ```

2. Run the interactive chat application.

   ```bash
   $ nemoguardrails chat --config config
   ```

   ```text
   Starting the chat (Press Ctrl + C twice to quit) ...

   > _
   ```
3. Enter an off-topic request.

   The prompt specifically instructs the model not to respond to questions about politics. The topic control input rail detects a policy violation and responds with the `I'm sorry, I can't respond to that.` refusal text. Because this input rail blocked the user input, an LLM response is not generated.

   ```text
   > Which party should I vote for in the next election?

   I'm sorry, I can't respond to that.
   ```
4. Enter an on-topic request.

   This request is in line with the topics in the prompt, so the topic control rail does not block the user input. The user input is passed to the Application LLM for generation.

   ```text
   > I'd like to cancel my subscription. Can I do this by phone or on the website?

   I'd be happy to help you with canceling your subscription. You have a couple of options to do so, and I'll walk you through them.

   [The NeMo Guardrails toolkit responds with instructions and information on subscription cancellations]
   ```
## Import the NeMo Guardrails Library in Python
Follow these steps to use the IPython REPL to import the NeMo Guardrails library and issue some requests:
1. Install the IPython REPL and run it to interpret the Python code below.

   ```bash
   $ pip install ipython
   $ ipython
   In [1]:
   ```
2. Load the guardrails configuration you created earlier.

   ```python
   import asyncio

   from nemoguardrails import LLMRails, RailsConfig

   config = RailsConfig.from_path("./config")
   rails = LLMRails(config)
   ```
3. Verify the guardrails with an off-topic political question.

   ```python
   messages = [{"role": "user", "content": "Which party should I vote for in the next election?"}]
   response = await rails.generate_async(messages=messages)
   print(response['content'])
   ```
   The model blocks the Application LLM from generating a response.

   ```text
   I'm sorry, I can't respond to that.
   ```

4. Verify the guardrails with an on-topic question.

   ```python
   messages = [{"role": "user", "content": "I'd like to cancel my subscription. Can I do this by phone or on the website?"}]
   response = await rails.generate_async(messages=messages)
   print(response['content'])
   ```
   The model responds with advice on how to cancel a subscription by phone or website.
## Deploy Llama 3.1 NemoGuard 8B TopicControl NIM Locally
This section shows how to run the NemoGuard 8B TopicControl model locally while still using the main model hosted on build.nvidia.com. The prerequisites are:
- The NeMo Guardrails library installed.
- A personal NVIDIA NGC API key with access to the NVIDIA NGC Catalog and NVIDIA Public API Endpoints services. For more information, refer to NGC API Keys in the NVIDIA GPU Cloud documentation.
- Docker installed.
- The NVIDIA Container Toolkit installed.
- GPUs meeting the memory requirement specified in the NVIDIA Llama 3.1 NemoGuard 8B TopicControl NIM Model Profiles.
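Before pulling the NIM image, you can optionally sanity-check the GPU and container prerequisites with standard NVIDIA tooling. These are generic checks, not commands specific to this NIM.

```bash
# Report each GPU's name and total memory to compare against the model profile requirements.
nvidia-smi --query-gpu=name,memory.total --format=csv

# Confirm that Docker can see the GPUs through the NVIDIA Container Toolkit.
docker run --rm --gpus all ubuntu nvidia-smi
```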
To run the Llama 3.1 NemoGuard 8B TopicControl in a Docker container, follow these steps:
1. Update the `config.yml` file you created earlier to point to a local NIM deployment rather than build.nvidia.com. The following configuration adds a `base_url` and a `model_name` field under `parameters`, which tells the NeMo Guardrails toolkit to make requests to the `nvidia/llama-3.1-nemoguard-8b-topic-control` model hosted at `http://localhost:8123/v1`. The Guardrails configuration must match the NIM Docker container configuration for them to communicate.

   ```yaml
   models:
     - type: main
       engine: nim
       model: meta/llama-3.3-70b-instruct
     - type: topic_control
       engine: nim
       model: nvidia/llama-3.1-nemoguard-8b-topic-control
       parameters:
         base_url: "http://localhost:8123/v1"
         model_name: "nvidia/llama-3.1-nemoguard-8b-topic-control"

   rails:
     input:
       flows:
         - topic safety check input $model=topic_control
   ```
2. Start the Llama 3.1 TopicControl NIM Docker container. Store your personal NGC API key in the `NGC_API_KEY` environment variable, then pull and run the NIM Docker image locally.

   1. Log in to your NVIDIA NGC account.
   2. Export your personal NGC API key to an environment variable.

      ```bash
      $ export NGC_API_KEY="..."
      ```
   3. Log in to the NGC registry by running the following command.

      ```bash
      $ docker login nvcr.io --username '$oauthtoken' --password-stdin <<< $NGC_API_KEY
      ```
   4. Download the container.

      ```bash
      $ docker pull nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control:1.10.1
      ```
   5. Create a model cache directory on the host machine.

      ```bash
      $ export LOCAL_NIM_CACHE=~/.cache/llama-nemotron-topic-guard
      $ mkdir -p "${LOCAL_NIM_CACHE}"
      $ chmod 700 "${LOCAL_NIM_CACHE}"
      ```
   6. Run the container with the cache directory mounted.

      The `-p` argument maps the Docker container port 8000 to 8123 to avoid conflicts with other servers running locally.

      ```bash
      $ docker run -d \
          --name llama-nemotron-topic-guard \
          --gpus=all --runtime=nvidia \
          --shm-size=64GB \
          -e NGC_API_KEY \
          -u $(id -u) \
          -v "${LOCAL_NIM_CACHE}:/opt/nim/.cache/" \
          -p 8123:8000 \
          nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control:1.10.1
      ```
      The container requires several minutes to start and download the model from NGC. You can monitor the progress by running the `docker logs llama-nemotron-topic-guard` command.

3. Confirm the service is ready to respond to inference requests.
   ```bash
   $ curl -X GET http://localhost:8123/v1/health/ready
   ```

   This returns the following response.

   ```json
   {"object":"health-response","message":"ready"}
   ```
4. Follow the steps in Run the Guardrails Chat Application and Import the NeMo Guardrails Library in Python to run Guardrails with the local model.
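Optionally, you can also send a request directly to the local NIM to confirm that it classifies input as expected before routing traffic through Guardrails. The following is a sketch that assumes the NIM exposes the standard OpenAI-compatible `/v1/chat/completions` route on the mapped port; the exact response wording can vary, but an off-topic question such as the one below should be classified as off-topic.

```bash
# Sketch only: direct request to the local NIM on the mapped port 8123.
curl -s http://localhost:8123/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/llama-3.1-nemoguard-8b-topic-control",
        "messages": [
          {"role": "system", "content": "Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects."},
          {"role": "user", "content": "Which party should I vote for in the next election?"}
        ]
      }'
```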