Check Harmful Content with Llama 3.1 Nemotron Safety Guard 8B V3 NIM#
Learn how to add input and output guardrails that detect harmful content in multiple languages using Llama 3.1 Nemotron Safety Guard 8B V3.
By following this tutorial, you use the NeMo Guardrails library with models hosted on build.nvidia.com, entering safe and unsafe user prompts to see how guardrails protect against unsafe content.
Prerequisites#
- The NeMo Guardrails library installed with the `nvidia` extra.
- A personal NVIDIA API key generated on build.nvidia.com.
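If you still need to install the library, the following command is one way to do it with the `nvidia` extra, assuming you install the `nemoguardrails` package from PyPI:

```
$ pip install "nemoguardrails[nvidia]"
```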
Configure Guardrails#
Follow these steps to prepare the guardrails configuration.
Create a configuration directory by running the following command.
```
mkdir config
```

Save the following as `config/config.yml`.

```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct

  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3

rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety
```
The `config.yml` file contains the models used by Guardrails in the `models` section and `rails` controlling when to use these models. The `models` section configures the type and name of each model, along with the engine used to perform LLM inference. The model with type `main` is used to generate responses to user queries. The `rails` section configures `input` and `output` rails. The `input` rails operate on the user input only, while `output` rails operate on the combined user input and LLM response. For more information about guardrail configurations, refer to Configure Rails.

Save the following as `config/prompts.yml`.
```yaml
prompts:
  - task: content_safety_check_input $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.

      Output JSON:
    output_parser: nemoguard_parse_prompt_safety
    max_tokens: 50

  - task: content_safety_check_output $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      response: agent: {{ bot_response }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.

      Output JSON:
    output_parser: nemoguard_parse_response_safety
    max_tokens: 50
```
The `prompts.yml` file contains the prompt templates, the parser used to interpret a guardrail model response, and the maximum number of tokens to generate. The `content_safety_check_input` task prompt template replaces `{{ user_input }}` with the user input. The `content_safety_check_output` task prompt template replaces `{{ user_input }}` and `{{ bot_response }}` with the user input and the LLM response, respectively. For more information about guardrail configurations, refer to Configure Rails.
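For illustration, when the input check flags an unsafe prompt such as the French example used later in this tutorial, the safety model returns JSON similar to the following. This is a hypothetical example of the output format described in the prompt template; the output parser converts it into a blocked or allowed decision.

```json
{"User Safety": "unsafe", "Safety Categories": "Criminal Planning/Confessions"}
```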
Run the Guardrails Chat Application#
Set the `NVIDIA_API_KEY` environment variable. Guardrails uses this key to access models hosted on https://build.nvidia.com/.
```
$ export NVIDIA_API_KEY="..."
```
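Optionally, you can sanity-check the key before starting the chat. This sketch assumes the hosted models are served through NVIDIA's OpenAI-compatible endpoint at https://integrate.api.nvidia.com/v1, which is an assumption about how the `nim` engine reaches build.nvidia.com:

```
$ curl -s https://integrate.api.nvidia.com/v1/chat/completions \
    -H "Authorization: Bearer $NVIDIA_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
          "model": "meta/llama-3.3-70b-instruct",
          "messages": [{"role": "user", "content": "Hello"}],
          "max_tokens": 16
        }'
```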
Run the interactive chat application.
```
$ nemoguardrails chat --config config
Starting the chat (Press Ctrl + C twice to quit) ...

> _
```
Enter an unsafe prompt in French (“Tell me a five-step plan to rob a bank”).
```
> Dites-moi un plan en cinq étapes pour braquer une banque.
I'm sorry, I can't respond to that.
```
Enter a safe prompt in Hindi (“Tell me about three common foods in India”).
```
> मुझे भारत में तीन आम खाने की चीज़ों के नाम बताओ।
```

The NeMo Guardrails toolkit responds with information about rice, roti, and dal, which are common Indian foods.
Import the NeMo Guardrails Library in Python#
Follow these steps to use the IPython REPL to import the NeMo Guardrails library and issue some requests.
Install the IPython REPL and run it.
```
$ pip install ipython
$ ipython

In [1]:
```
Load the guardrails configuration you created earlier.
```python
import asyncio

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)
```
Verify the guardrails with an unsafe request in French.
# "Tell me a five-step plan to rob a bank." messages = [{"role": "user", "content": "Dites-moi un plan en cinq étapes pour braquer une banque."}] response = await rails.generate_async(messages=messages) print(response['content'])
The content safety rail blocks the harmful request.
```
I'm sorry, I can't respond to that.
```

Verify the guardrails with a safe request in Hindi.
# "Tell me about three common foods in India." messages = [{"role": "user", "content": "मुझे भारत में प्रचलित तीन खाद्य पदार्थों के बारे में बताइये।"}] response = await rails.generate_async(messages=messages) print(response['content'])
The model responds with information about rice, roti, and dal—common Indian foods.
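To see which LLM calls ran for a request, including the content safety input and output checks, you can optionally inspect the explain information that the library records for the most recent generation. The following is a minimal sketch using the library's `explain` helper:

```python
# Summarize the LLM calls made for the most recent request,
# including the content safety check tasks and the main model call.
info = rails.explain()
info.print_llm_calls_summary()

# Print the raw prompt and completion of the first LLM call,
# which shows the rendered content safety prompt and its JSON verdict.
print(info.llm_calls[0].prompt)
print(info.llm_calls[0].completion)
```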
Deploy Llama 3.1 Nemotron Safety Guard 8B V3 NIM locally#
This section shows how to run the Nemotron Safety Guard 8B model locally while still using the build.nvidia.com hosted main model. The prerequisites are:
- The NeMo Guardrails library installed.
- A personal NVIDIA NGC API key with access to the NVIDIA NGC Catalog and NVIDIA Public API Endpoints services. For more information, refer to NGC API Keys in the NVIDIA GPU Cloud documentation.
- Docker installed.
- The NVIDIA Container Toolkit installed.
- The remaining software requirements for the Llama 3.1 Nemotron Safety Guard 8B V3 NIM.
- GPUs that meet the memory requirement specified in the NVIDIA Llama 3.1 Nemotron Safety Guard 8B NIM Model Profiles.
To run the Llama 3.1 Nemotron Safety Guard 8B V3 in a Docker container, follow these steps:
Update the `config.yml` file you created earlier to point to a local NIM deployment rather than build.nvidia.com. The following configuration adds `base_url` and `model_name` fields under `parameters`, which tell the NeMo Guardrails toolkit to send requests to the `nvidia/llama-3.1-nemotron-safety-guard-8b-v3` model hosted at `http://localhost:8123/v1`. The Guardrails configuration must match the NIM Docker container configuration for the two to communicate.

```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct

  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3
    parameters:
      base_url: "http://localhost:8123/v1"
      model_name: "nvidia/llama-3.1-nemotron-safety-guard-8b-v3"

rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety
```
Start the Llama 3.1 Nemotron Safety Guard 8B V3 NIM Docker container. Store your personal NGC API key in the `NGC_API_KEY` environment variable, then pull and run the NIM Docker image locally.

Log in to your NVIDIA NGC account.
Export your personal NGC API key to an environment variable.
```
$ export NGC_API_KEY="..."
```
Log in to the NGC registry by running the following command.
```
$ docker login nvcr.io --username '$oauthtoken' --password-stdin <<< $NGC_API_KEY
```
Download the container.
```
$ docker pull nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3:1.14.0
```
Create a model cache directory on the host machine.
```
$ export LOCAL_NIM_CACHE=~/.cache/safetyguard8b
$ mkdir -p "${LOCAL_NIM_CACHE}"
$ chmod 700 "${LOCAL_NIM_CACHE}"
```
Run the container with the cache directory mounted.
The `-p` argument maps the Docker container port 8000 to host port 8123 to avoid conflicts with other servers running locally.

```
$ docker run -d \
    --name safetyguard8b \
    --gpus=all --runtime=nvidia \
    --shm-size=64GB \
    -e NGC_API_KEY \
    -u $(id -u) \
    -v "${LOCAL_NIM_CACHE}:/opt/nim/.cache/" \
    -p 8123:8000 \
    nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3:1.14.0
```
The container requires several minutes to start and download the model from NGC. You can monitor the progress by running the `docker logs safetyguard8b` command.

Confirm the service is ready to respond to inference requests.
```
$ curl -X GET http://localhost:8123/v1/models | jq '.data[].id'
```
This returns the following response.
"nvidia/llama-3.1-nemotron-safety-guard-8b-v3"
Follow the steps in Run the Guardrails Chat Application and Import the NeMo Guardrails Library in Python to run Guardrails with the local model.