Add Multimodal Content Safety Using a Vision Model
Learn how to add safety checks to images and text using a vision model as LLM-as-a-Judge with OpenAI GPT-4 Vision, Llama Vision, or Llama Guard.
By following this tutorial, you learn how to:
- Configure multimodal content safety rails for images and text.
- Use a vision model as LLM-as-a-Judge to evaluate content safety.
- Test with safe and unsafe image requests.
The NeMo Guardrails library supports multimodal content safety for input and output rails. You can provide images as base64-encoded data or URLs, depending on the model.
Ensure image size and prompt length do not exceed the model’s maximum context length.
Prerequisites
- The NeMo Guardrails library installed with the
openaiextra. - A personal NVIDIA API key generated on https://build.nvidia.com/.
Configure Guardrails
-
Create a configuration directory and add
config.yml.config.yml -
Add
prompts.yml.prompts.yml
Test with OpenAI
This example sends image requests to OpenAI endpoints and tests safety checks on a handgun image.
-
Set your OpenAI API key.
-
Install the IPython REPL and run it to interpret the Python code below.
-
Import libraries.
demo.py -
Load the configuration.
demo.py -
Send a safe image reasoning request.
demo.py -
Send an unsafe request.
demo.py
Use Base64-Encoded Images
Some models such as Llama Vision require base64-encoded images instead of URLs.