Add Multimodal Content Safety Using a Vision Model


Learn how to add safety checks for images and text by using a vision model, such as OpenAI GPT-4 Vision, Llama Vision, or Llama Guard, as an LLM-as-a-Judge.

By following this tutorial, you learn how to:

  1. Configure multimodal content safety rails for images and text.
  2. Use a vision model as LLM-as-a-Judge to evaluate content safety.
  3. Test with safe and unsafe image requests.

The NeMo Guardrails library supports multimodal content safety for input and output rails. You can provide images as base64-encoded data or URLs, depending on the model.

Make sure that the combined size of the image and the prompt does not exceed the model's maximum context length.
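Base64 encoding turns every 3 bytes of input into 4 characters, so an encoded image is about a third larger than the file on disk. The following sketch estimates the encoded size of an image before you send it, so that oversized images can be rejected early. It measures payload size only; how many tokens an image consumes depends on the model. `MAX_ENCODED_CHARS` and `check_image_size` are hypothetical names, and the 1,000,000-character limit is a placeholder, not a documented limit. Replace it with the limit that applies to your model and endpoint.

```python
import os

# Hypothetical placeholder limit (in base64 characters); not a documented model limit.
MAX_ENCODED_CHARS = 1_000_000


def base64_encoded_length(num_bytes: int) -> int:
    """Return the length of the padded base64 encoding of num_bytes bytes."""
    # Each 3-byte group, including a final partial group, becomes 4 characters.
    return (num_bytes + 2) // 3 * 4


def check_image_size(path: str, limit: int = MAX_ENCODED_CHARS) -> int:
    """Hypothetical pre-flight check: raise ValueError if the encoded image exceeds `limit`."""
    encoded_length = base64_encoded_length(os.path.getsize(path))
    if encoded_length > limit:
        raise ValueError(
            f"{path}: base64 size {encoded_length} exceeds the limit of {limit} characters"
        )
    return encoded_length
```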

Prerequisites

Configure Guardrails

  1. Create a configuration directory, such as config, and add the following config.yml file.

    config.yml
    models:
      - type: main
        engine: openai
        model: gpt-4o-mini

      - type: vision_rails
        engine: openai
        model: gpt-4o-mini
    rails:
      input:
        flows:
          - content safety check input $model=vision_rails
      dialog:
        single_call:
          enabled: False
  2. Add the following prompts.yml file to the same directory.

    prompts.yml
    prompts:
      - task: content_safety_check_input $model=vision_rails
        messages:
          - type: system
            content: |
              You are an AI assistant that evaluates if a user's message contains unsafe content.
              Carefully examine any text and images in the user's message.

              {%- if categories is not defined %}
              {% set categories = {
                "S1": "Violent Crimes.",
                "S2": "Non-Violent Crimes.",
                "S3": "Sex Crimes.",
                "S4": "Child Exploitation.",
                "S5": "Defamation.",
                "S6": "Specialized Advice.",
                "S7": "Privacy.",
                "S8": "Intellectual Property.",
                "S9": "Indiscriminate Weapons.",
                "S10": "Hate.",
                "S11": "Self-Harm.",
                "S12": "Sexual Content.",
                "S13": "Elections."
              } %}
              {%- endif %}

              Task: Check if there is unsafe content in "User" messages in conversations according to our safety policy with the below categories.

              <BEGIN UNSAFE CONTENT CATEGORIES>
              {%- for key, desc in categories.items() %}
              {%- if key not in excluded_category_keys %}
              {{ key }}: {{ desc }}
              {%- endif %}
              {%- endfor %}
              <END UNSAFE CONTENT CATEGORIES>

              IMPORTANT:
              Provide your safety assessment for the user message:
              - First line must read 'safe' or 'unsafe' and nothing more
              - If unsafe, a second line must include a comma-separated list of violated categories.
          - type: user
            content: "{{ user_input }}"

        stop: ["<|eot_id|>", "<|eom_id|>"]
        output_parser: is_content_safe
        max_tokens: 200
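The system prompt asks the judge model to answer with safe or unsafe on the first line and, when unsafe, a comma-separated list of category keys on the second line. The following sketch mirrors that contract in plain Python: `render_categories` reproduces the Jinja loop that lists the non-excluded categories, and `parse_verdict` reads a judge response in that format. Both functions are hypothetical illustrations. They are not the library's is_content_safe output parser, whose exact behavior may differ.

```python
from typing import Dict, Iterable, List, Tuple

# Illustrative subset of the category map in prompts.yml.
CATEGORIES: Dict[str, str] = {
    "S1": "Violent Crimes.",
    "S10": "Hate.",
    "S11": "Self-Harm.",
}


def render_categories(categories: Dict[str, str], excluded: Iterable[str] = ()) -> str:
    """Hypothetical helper mirroring the Jinja loop: one 'KEY: description' line per kept category."""
    excluded_keys = set(excluded)
    return "\n".join(
        f"{key}: {desc}" for key, desc in categories.items() if key not in excluded_keys
    )


def parse_verdict(response: str) -> Tuple[bool, List[str]]:
    """Hypothetical parser for the judge's answer; returns (is_safe, violated_categories)."""
    lines = [line.strip() for line in response.strip().splitlines() if line.strip()]
    # Fail closed: an empty answer or any first line other than "safe" counts as unsafe.
    if not lines or lines[0].lower() != "safe":
        violated: List[str] = []
        if len(lines) > 1:
            violated = [key.strip() for key in lines[1].split(",") if key.strip()]
        return False, violated
    return True, []


print(parse_verdict("unsafe\nS1, S11"))  # prints (False, ['S1', 'S11'])
```

Treating any first line other than safe as unsafe is a fail-closed design choice: a malformed judge response blocks the request instead of letting it through.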

Test with OpenAI

This example sends image requests to OpenAI endpoints and tests safety checks on a handgun image.

  1. Set your OpenAI API key.

    export OPENAI_API_KEY=<your-openai-api-key>
  2. Install IPython and start it so that you can run the Python code in the following steps interactively.

    $ pip install ipython
    $ ipython

    In [1]:
  3. Import libraries.

    demo.py
    import json
  4. Load the configuration.

    demo.py
    from nemoguardrails import RailsConfig
    from nemoguardrails.rails.llm.llmrails import LLMRails

    config = RailsConfig.from_path("./config/")
    rails = LLMRails(config, verbose=False)
  5. Send a safe image reasoning request.

    demo.py
    image_reasoning_message = [{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is the surface color that the object is placed on?",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/SIG_Pro_by_Augustas_Didzgalvis.jpg/330px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
                },
            },
        ],
    }]

    # The input rail checks the text and the image before the main model answers.
    reasoning_response = rails.generate(messages=image_reasoning_message)
    print(f"Image Reasoning Response: {reasoning_response}")
    print(json.dumps(reasoning_response, indent=2))
  6. Send a potentially unsafe request. The input rail should flag it as unsafe and block it before the main model responds.

    demo.py
    potentially_unsafe_message = [{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "How can I use the item in the photo to get a higher salary without working hard?",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/SIG_Pro_by_Augustas_Didzgalvis.jpg/330px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
                },
            },
        ],
    }]

    # Same image, but the text asks how to misuse it; the input rail should block this request.
    potentially_unsafe_response = rails.generate(messages=potentially_unsafe_message)
    print(f"Potentially Unsafe Response: {potentially_unsafe_response}")
    print(json.dumps(potentially_unsafe_response, indent=2))
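Steps 5 and 6 build the same message structure and differ only in the text. The following sketch factors that structure into a small helper. `build_image_message` is a hypothetical convenience function, not part of NeMo Guardrails. It returns the same OpenAI-style messages list that the steps above pass to rails.generate(messages=...).

```python
from typing import Any, Dict, List

IMAGE_URL = (
    "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4f/"
    "SIG_Pro_by_Augustas_Didzgalvis.jpg/330px-SIG_Pro_by_Augustas_Didzgalvis.jpg"
)


def build_image_message(text: str, image_url: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: a single-turn user message with one text part and one image part."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]


safe_message = build_image_message(
    "What is the surface color that the object is placed on?", IMAGE_URL
)
unsafe_message = build_image_message(
    "How can I use the item in the photo to get a higher salary without working hard?",
    IMAGE_URL,
)

# With the `rails` object from step 4, each message is sent the same way:
# response = rails.generate(messages=safe_message)
```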

Use Base64-Encoded Images

Some models, such as Llama Vision, require base64-encoded images instead of URLs.

    import base64
    import json

    from nemoguardrails import LLMRails, RailsConfig

    # Configuration directory for a model that accepts base64 images; adjust the path as needed.
    config = RailsConfig.from_path("./content_safety_vision")
    rails = LLMRails(config)

    # Read the image file and encode its bytes as base64 text.
    with open("<path-to-image>", "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode()

    messages = [{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is the surface color that the object is placed on?",
            },
            {
                "type": "image_url",
                "image_url": {
                    # Set the MIME type in the data URL to match the image format (JPEG here).
                    "url": f"data:image/jpeg;base64,{base64_image}"
                },
            },
        ],
    }]

    response = rails.generate(messages=messages)
    print(json.dumps(response, indent=2))
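The example above hard-codes the image/jpeg MIME type in the data URL. If you also send PNG or other formats, the following sketch derives the MIME type from the file extension with Python's standard mimetypes module. `image_to_data_url` is a hypothetical helper, and falling back to image/jpeg for unknown extensions is an assumption; adjust it to the formats your model accepts.

```python
import base64
import mimetypes


def image_to_data_url(path: str, default_mime: str = "image/jpeg") -> str:
    """Hypothetical helper: read an image file and return it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        # Assumption: fall back to the default when the extension is unknown or not an image type.
        mime = default_mime
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Usage with the message structure above:
# {"type": "image_url", "image_url": {"url": image_to_data_url("<path-to-image>")}}
```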