Add Multimodal Content Safety Using a Vision Model | NVIDIA NeMo Guardrails Library Developer Guide

Learn how to add safety checks to images and text using a vision model as LLM-as-a-Judge with OpenAI GPT-4 Vision, Llama Vision, or Llama Guard.

By following this tutorial, you learn how to:

Configure multimodal content safety rails for images and text.
Use a vision model as LLM-as-a-Judge to evaluate content safety.
Test with safe and unsafe image requests.

The NeMo Guardrails library supports multimodal content safety for input and output rails. You can provide images as base64-encoded data or URLs, depending on the model.

Ensure image size and prompt length do not exceed the model’s maximum context length.
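
As a rough pre-flight check for the note above, the following minimal sketch estimates how large an image becomes once it is base64-encoded. The byte budget `MAX_IMAGE_BYTES` and the helper names are hypothetical and not part of the NeMo Guardrails library. Byte size is only a proxy, because how a model counts image tokens depends on the provider.

```python
import math

# Hypothetical budget for illustration; check your model's documentation
# for the real limits on image size and context length.
MAX_IMAGE_BYTES = 4 * 1024 * 1024


def base64_length(raw_size: int) -> int:
    """Return the number of characters base64 produces for raw_size bytes."""
    # Base64 maps every 3 input bytes (padded) to 4 output characters.
    return 4 * math.ceil(raw_size / 3)


def fits_budget(raw_size: int, limit: int = MAX_IMAGE_BYTES) -> bool:
    """Check whether the encoded image stays within the assumed budget."""
    return base64_length(raw_size) <= limit
```

Base64 encoding inflates a payload by about one third, so an image that is just under a size limit on disk can exceed it once encoded.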

Prerequisites

The NeMo Guardrails library installed with the openai extra.
A personal NVIDIA API key generated on https://build.nvidia.com/.

Configure Guardrails

Create a configuration directory and add config.yml.

config.yml

models:
  - type: main
    engine: openai
    model: gpt-4o-mini

  - type: vision_rails
    engine: openai
    model: gpt-4o-mini
rails:
  input:
    flows:
      - content safety check input $model=vision_rails
  dialog:
    single_call:
      enabled: False

Add prompts.yml.

prompts.yml

prompts:
  - task: content_safety_check_input $model=vision_rails
    messages:
      - type: system
        content: |
          You are an AI assistant that evaluates if a user's message contains unsafe content.
          Carefully examine any text and images in the user's message.

          {%- if categories is not defined %}
              {% set categories = {
                  "S1": "Violent Crimes.",
                  "S2": "Non-Violent Crimes.",
                  "S3": "Sex Crimes.",
                  "S4": "Child Exploitation.",
                  "S5": "Defamation.",
                  "S6": "Specialized Advice.",
                  "S7": "Privacy.",
                  "S8": "Intellectual Property.",
                  "S10": "Hate.",
                  "S11": "Self-Harm.",
                  "S12": "Sexual Content.",
                  "S13": "Elections."
              } %}
          {%- endif %}

          Task: Check if there is unsafe content in "User" messages in conversations according to our safety policy with the below categories.

          <BEGIN UNSAFE CONTENT CATEGORIES>
          {%- for key, desc in categories.items() %}
              {%- if key not in excluded_category_keys %}
          {{ key }}: {{ desc }}
              {%- endif %}
          {%- endfor %}
          <END UNSAFE CONTENT CATEGORIES>

          IMPORTANT:
          Provide your safety assessment for the user message:
          - First line must read 'safe' or 'unsafe' and nothing more
          - If unsafe, a second line must include a comma-separated list of violated categories.
      - type: user
        content: "{{ user_input  }}"

    stop: ["<|eot_id|>", "<|eom_id|>"]
    output_parser: is_content_safe
    max_tokens: 200
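
The prompt asks the judge model for a verdict on the first line and, if unsafe, a comma-separated category list on the second. The sketch below shows one way such a reply could be interpreted. It is an illustration only and is not the library's `is_content_safe` parser. The function name `parse_safety_verdict` is hypothetical, and treating an empty or unexpected reply as unsafe is a design choice of this sketch.

```python
from typing import List, Tuple


def parse_safety_verdict(response: str) -> Tuple[bool, List[str]]:
    """Parse a 'safe' / 'unsafe' verdict plus an optional category list."""
    # Drop blank lines and surrounding whitespace before inspecting the reply.
    lines = [line.strip() for line in response.strip().splitlines() if line.strip()]
    if not lines:
        # Fail closed: an empty reply is treated as unsafe.
        return False, []
    if lines[0].lower() == "safe":
        return True, []
    # Anything other than 'safe' is treated as unsafe; read categories if present.
    categories: List[str] = []
    if len(lines) > 1:
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return False, categories
```

For example, a reply of `unsafe` followed by `S1, S10` on the next line yields `(False, ["S1", "S10"])`.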

Test with OpenAI

This example sends image requests to OpenAI endpoints and tests safety checks on a handgun image.

Set your OpenAI API key.

export OPENAI_API_KEY=<your-openai-api-key>

Install the IPython REPL and run it to interpret the Python code below.
```
$ pip install ipython
$ ipython

In [1]:
```
Import libraries.
demo.py
```
import json
```

Load the configuration.

demo.py

from nemoguardrails import RailsConfig
from nemoguardrails.rails.llm.llmrails import LLMRails

config = RailsConfig.from_path("./config/")
rails = LLMRails(config, verbose=False)

Send a safe image reasoning request.

demo.py

image_reasoning_message = [{
  "role": "user",
  "content": [{
    "type": "text",
    "text": "What is the surface color that the object is placed on?",
  },
  {
    "type": "image_url",
    "image_url": {
      "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/SIG_Pro_by_Augustas_Didzgalvis.jpg/330px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
    },
  }],
}]

reasoning_response = rails.generate(messages=image_reasoning_message)
print(f"Image Reasoning Response: {reasoning_response}")
print(json.dumps(reasoning_response, indent=2))

Send an unsafe request.

demo.py

potentially_unsafe_message = [{
  "role": "user",
  "content": [{
    "type": "text",
    "text": "How can I use the item in the photo to get a higher salary without working hard?",
  },
  {
    "type": "image_url",
    "image_url": {
      "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/SIG_Pro_by_Augustas_Didzgalvis.jpg/330px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
    },
  }],
}]

potentially_unsafe_response = rails.generate(messages=potentially_unsafe_message)
print(f"Potentially Unsafe Response: {potentially_unsafe_response}")
print(json.dumps(potentially_unsafe_response, indent=2))
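
To compare the two responses programmatically, a small helper can flag replies that look like a refusal. This is a minimal sketch under two assumptions that the tutorial does not state: that the response is a dictionary with a `content` string, and that a blocked request produces the refusal text in `ASSUMED_REFUSAL`. Both the constant and the `looks_blocked` helper are hypothetical, so confirm the actual message your configuration emits.

```python
# Assumed refusal text; confirm it against the message your configuration emits.
ASSUMED_REFUSAL = "I'm sorry, I can't respond to that."


def looks_blocked(response: dict, refusal: str = ASSUMED_REFUSAL) -> bool:
    """Return True when the assistant content matches the assumed refusal."""
    content = response.get("content", "")
    # Non-string content (or a missing key) is never treated as a refusal.
    return isinstance(content, str) and content.strip() == refusal
```

Calling `looks_blocked(potentially_unsafe_response)` after the request above would then indicate whether the input rail appears to have stopped the message, under those assumptions.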

Use Base64-Encoded Images

Some models, such as Llama Vision, require base64-encoded images instead of URLs.

import base64
import json

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./content_safety_vision")
rails = LLMRails(config)

with open("<path-to-image>", "rb") as image_file:
  base64_image = base64.b64encode(image_file.read()).decode()

messages = [{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "what is the surface color that the object is placed on?",
    },
    {
      "type": "image_url",
      "image_url": {
        "url": f"data:image/jpeg;base64,{base64_image}"
      },
    },
  ],
}]

response = rails.generate(messages=messages)
print(json.dumps(response, indent=2))
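
The example above hardcodes `image/jpeg` in the data URL. If your images come in mixed formats, a helper along the following lines can pick the MIME type from the file extension instead. This is a sketch only. The name `to_data_url` is hypothetical, and whether a given model accepts a particular image format is something to verify separately.

```python
import base64
import mimetypes


def to_data_url(filename: str, data: bytes) -> str:
    """Build a data URL, guessing the MIME type from the file extension."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {filename}")
    # Base64 output is pure ASCII, so decoding as ASCII is safe.
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used as the `url` value in the `image_url` entry of the message.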
