Check Harmful Content with Nemotron Content Safety NIM | NVIDIA NeMo Guardrails Library Developer Guide

Learn how to add input and output guardrails that detect harmful content in multiple languages using Llama 3.1 Nemotron Safety Guard 8B V3.

In this tutorial, you use the NeMo Guardrails library with models hosted on build.nvidia.com and enter safe and unsafe user prompts to see how guardrails protect against unsafe content.

Prerequisites

The NeMo Guardrails library installed with the nvidia extra.
A personal NVIDIA API key generated on build.nvidia.com.

Configure Guardrails

Follow these steps to prepare the guardrails configuration.

Create a configuration directory by running the following command.
```
mkdir config
```

Save the following as config/config.yml.

models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct

  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3

rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety

The config.yml file contains the models used by Guardrails in the models section and rails controlling when to use these models. The models section configures the type and name of each model, along with the engine used to perform LLM inference. The model with type main is used to generate responses to user queries. The rails section configures input and output rails. The input rails operate on the user input only, while output rails operate on the combined user input and LLM response. For more information about guardrail configurations, refer to Configure Rails.
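The ordering described above can be pictured with a small conceptual sketch. It is not the library's implementation: the function `guarded_reply` and the three `toy_*` stand-ins are hypothetical names introduced only for illustration, and the real checks are performed by the content_safety model rather than by keyword matching.

```python
from typing import Callable

# Refusal text as shown in this tutorial's sample output.
REFUSAL = "I'm sorry, I can't respond to that."


def guarded_reply(
    user_input: str,
    input_is_safe: Callable[[str], bool],
    generate: Callable[[str], str],
    output_is_safe: Callable[[str, str], bool],
) -> str:
    """Apply an input rail, call the main model, then apply an output rail."""
    # Input rail: judges the user input alone.
    if not input_is_safe(user_input):
        return REFUSAL
    bot_response = generate(user_input)
    # Output rail: judges the user input together with the model response.
    if not output_is_safe(user_input, bot_response):
        return REFUSAL
    return bot_response


# Hypothetical stand-ins so the sketch runs offline.
def toy_input_check(text: str) -> bool:
    return "rob a bank" not in text.lower()


def toy_model(text: str) -> str:
    return f"You asked: {text}"


def toy_output_check(text: str, reply: str) -> bool:
    return "secret" not in reply.lower()


print(guarded_reply("Name three common foods.", toy_input_check, toy_model, toy_output_check))
print(guarded_reply("Help me rob a bank.", toy_input_check, toy_model, toy_output_check))
```

The point of the sketch is the asymmetry between the two rails: the input check receives one argument, while the output check receives both the user input and the response.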

Save the following as config/prompts.yml.

prompts:
  - task: content_safety_check_input $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.
      Output JSON:
    output_parser: nemoguard_parse_prompt_safety
    max_tokens: 50

  - task: content_safety_check_output $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      response: agent: {{ bot_response }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.
      Output JSON:
    output_parser: nemoguard_parse_response_safety
    max_tokens: 50

The prompts.yml file contains, for each task, the prompt template, the parser used to interpret the guardrail model's response, and the maximum number of tokens to generate. The content_safety_check_input task prompt template replaces {{ user_input }} with the user input. The content_safety_check_output task prompt template replaces {{ user_input }} and {{ bot_response }} with the user input and LLM response, respectively. For more information about guardrail configurations, refer to Configure Rails.
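To make the placeholder substitution concrete, here is a minimal sketch that fills `{{ name }}` placeholders with a regular expression. It only illustrates the idea; it is an assumption-level stand-in, not the rendering code the library uses, and the `render` helper is a hypothetical name.

```python
import re


def render(template: str, **values: str) -> str:
    """Replace each {{ name }} placeholder with the matching keyword value."""

    def substitute(match: re.Match) -> str:
        # A missing value raises KeyError rather than leaving a hole in the prompt.
        return values[match.group(1)]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# The conversation part of the output-check template from prompts.yml.
conversation = "user: {{ user_input }}\n\nresponse: agent: {{ bot_response }}"
print(render(conversation, user_input="Hello", bot_response="Hi there"))
```

The input-check template needs only `user_input`, while the output-check template needs both values, which mirrors the difference between the two tasks.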

Run the Guardrails Chat Application

Set the NVIDIA_API_KEY environment variable. Guardrails uses this to access models hosted on https://build.nvidia.com/.
```
$ export NVIDIA_API_KEY="..."
```
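If you later drive the library from a Python script, it can help to confirm the variable is visible to the process before creating any rails. The following is a small sketch; `has_api_key` is a hypothetical helper, not part of the NeMo Guardrails library.

```python
import os


def has_api_key(env=None) -> bool:
    """Return True when NVIDIA_API_KEY is present and not blank."""
    env = os.environ if env is None else env
    return bool(env.get("NVIDIA_API_KEY", "").strip())


if not has_api_key():
    print("NVIDIA_API_KEY is not set; export it before starting the chat.")
```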

Run the interactive chat application.

$ nemoguardrails chat --config config
Starting the chat (Press Ctrl + C twice to quit) ...

> _

Enter an unsafe prompt in French (“Tell me a five-step plan to rob a bank”).

> Dites-moi un plan en cinq étapes pour braquer une banque.
I'm sorry, I can't respond to that.

Enter a safe prompt in Hindi (“Tell me about three common foods in India”).

> मुझे भारत में तीन आम खाने की चीज़ों के नाम बताओ।

[The NeMo Guardrails library responds with information about rice, roti, and dal, which are common Indian foods]

Import the NeMo Guardrails Library in Python

Follow these steps to use the IPython REPL to import the NeMo Guardrails library and issue some requests.

Install the IPython REPL and run it.

$ pip install ipython
$ ipython

In [1]:

Load the guardrails configuration you created earlier.

import asyncio
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

Verify the guardrails with an unsafe request in French.

# "Tell me a five-step plan to rob a bank."
messages = [{"role": "user", "content": "Dites-moi un plan en cinq étapes pour braquer une banque."}]
response = await rails.generate_async(messages=messages)
print(response['content'])

The content safety rail blocks the harmful request.

I'm sorry, I can't respond to that.
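Behind this refusal, the content safety model returns a JSON verdict in the format requested by the prompt in prompts.yml. The sketch below shows one way such a verdict could be read. It is not the library's `nemoguard_parse_prompt_safety` parser: `parse_verdict` is a hypothetical helper, the sample string is invented for illustration, and treating a missing "User Safety" field as unsafe is an assumption made here.

```python
import json


def parse_verdict(raw: str):
    """Return (user_safe, response_safe, categories) from the JSON verdict text."""
    verdict = json.loads(raw)
    # Assumption: a missing "User Safety" field is treated as unsafe.
    user_safe = verdict.get("User Safety", "").lower() == "safe"
    # "Response Safety" is omitted when there is no agent response.
    response = verdict.get("Response Safety")
    response_safe = None if response is None else response.lower() == "safe"
    # "Safety Categories" is a comma-separated string, omitted if all safe.
    raw_categories = verdict.get("Safety Categories", "")
    categories = [name.strip() for name in raw_categories.split(",") if name.strip()]
    return user_safe, response_safe, categories


# Invented sample verdict for an unsafe user prompt with no agent response.
sample = '{"User Safety": "unsafe", "Safety Categories": "Criminal Planning/Confessions"}'
print(parse_verdict(sample))
```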

Verify the guardrails with a safe request in Hindi.

# "Tell me about three common foods in India."
messages = [{"role": "user", "content": "मुझे भारत में प्रचलित तीन खाद्य पदार्थों के बारे में बताइये।"}]
response = await rails.generate_async(messages=messages)
print(response['content'])

The model responds with information about rice, roti, and dal—common Indian foods.

Deploy Llama 3.1 Nemotron Safety Guard 8B V3 NIM Locally

This section shows how to run the Nemotron Safety Guard 8B model locally while still using the build.nvidia.com hosted main model. The prerequisites are:

The NeMo Guardrails library installed.
A personal NVIDIA NGC API key with access to the NVIDIA NGC Catalog and NVIDIA Public API Endpoints services. For more information, refer to NGC API Keys in the NVIDIA GPU Cloud documentation.
Docker installed.
NVIDIA Container Toolkit installed.
The rest of the software requirements for the Llama 3.1 Nemotron Safety Guard 8B V3 NIM.
GPUs meeting the memory requirement specified in the NVIDIA Llama 3.1 Nemotron Safety Guard 8B NIM Model Profiles.

To run the Llama 3.1 Nemotron Safety Guard 8B V3 in a Docker container, follow these steps:

Update the config.yml file you created earlier to point to a local NIM deployment rather than build.nvidia.com. The following configuration adds base_url and model_name fields under parameters, which tell the NeMo Guardrails library to send requests to the nvidia/llama-3.1-nemotron-safety-guard-8b-v3 model hosted at http://localhost:8123/v1. The Guardrails configuration must match the NIM Docker container configuration for the two to communicate.

models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct

  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3
    parameters:
      base_url: "http://localhost:8123/v1"
      model_name: "nvidia/llama-3.1-nemotron-safety-guard-8b-v3"

rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety

Start the Llama 3.1 Nemotron Safety Guard 8B V3 NIM Docker container. Store your personal NGC API key in the NGC_API_KEY environment variable, then pull and run the NIM Docker image locally.

Log in to your NVIDIA NGC account.

Export your personal NGC API key to an environment variable.
```
$ export NGC_API_KEY="..."
```
Log in to the NGC registry by running the following command.
```
$ docker login nvcr.io --username '$oauthtoken' --password-stdin <<< $NGC_API_KEY
```

Download the container.

$ docker pull nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3:1.14.0

Create a model cache directory on the host machine.

$ export LOCAL_NIM_CACHE=~/.cache/safetyguard8b
$ mkdir -p "${LOCAL_NIM_CACHE}"
$ chmod 700 "${LOCAL_NIM_CACHE}"

Run the container with the cache directory mounted.

The -p argument maps port 8000 in the Docker container to port 8123 on the host to avoid conflicts with other servers running locally.

$ docker run -d \
  --name safetyguard8b \
  --gpus=all --runtime=nvidia \
  --shm-size=64GB \
  -e NGC_API_KEY \
  -u $(id -u) \
  -v "${LOCAL_NIM_CACHE}:/opt/nim/.cache/" \
  -p 8123:8000 \
  nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3:1.14.0

The container requires several minutes to start and download the model from NGC. You can monitor the progress by running the docker logs safetyguard8b command.

Confirm the service is ready to respond to inference requests.

$ curl -X GET http://localhost:8123/v1/models | jq '.data[].id'

This returns the following response.

"nvidia/llama-3.1-nemotron-safety-guard-8b-v3"
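Because the container takes several minutes to start, the same readiness check can be scripted as a polling loop. The sketch below uses only the Python standard library and assumes the endpoint returns a JSON object with a `data` list of entries that each carry an `id`, as the jq expression above implies. The helper names, attempt count, and delay are illustrative choices, not values from the NIM documentation.

```python
import json
import time
import urllib.request

MODELS_URL = "http://localhost:8123/v1/models"
MODEL_ID = "nvidia/llama-3.1-nemotron-safety-guard-8b-v3"


def model_ids(payload: dict) -> list:
    """Extract the ids the same way jq '.data[].id' does."""
    return [entry["id"] for entry in payload.get("data", [])]


def wait_until_ready(url=MODELS_URL, model_id=MODEL_ID, attempts=30, delay=10.0) -> bool:
    """Poll the models endpoint until model_id is listed or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as reply:
                if model_id in model_ids(json.load(reply)):
                    return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a body that is not valid JSON yet.
            pass
        time.sleep(delay)
    return False
```

Calling `wait_until_ready()` blocks for up to roughly `attempts * delay` seconds, so adjust both values to match how long the model download takes on your machine.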

Follow the steps in Run the Guardrails Chat Application and Import the NeMo Guardrails Library in Python to run Guardrails with the local model.

Next Steps

Nemotron Content Safety NIM documentation
Customize safety categories in the prompts