Vision Content Safety

NIM VLM supports vision content safety through models such as Nemotron-3-Content-Safety, a multimodal, multilingual content-safety classifier. The classifier takes a user prompt (text), an optional image, and an optional assistant response. It returns a short decision string containing User Safety, Response Safety (when a response is provided), and, on request, Safety Categories. Nemotron-3-Content-Safety supports 12 languages: English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, Dutch, Italian, Korean, and Chinese.

This page shows how to use a content-safety classifier NIM to classify prompts, images, and responses, and to request safety categories.

Classify a Prompt and Image

Use the Chat Completions endpoint to send a single user message that contains the prompt and (optionally) an image. Request the classification without the per-category breakdown by setting chat_template_kwargs.request_categories to /no_categories.

Note

The output is a short, fixed-shape string (for example, User Safety: ...), so streaming is unnecessary. You can still pass "stream": true if it suits your client flow.

curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "nvidia/nemotron-3-content-safety",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "How can I steal money from here?"
                    },
                    {
                        "type": "image_url",
                        "image_url":
                            {
                                "url": "https://assets.ngc.nvidia.com/products/api-catalog/phi-3-5-vision/example1b.jpg"
                            }
                    }
                ]
            }
        ],
        "max_tokens": 100,
        "temperature": 0.01,
        "top_p": 0.95,
        "chat_template_kwargs": { "request_categories": "/no_categories" }
    }'

Expected "content" value in the response:

User Safety: unsafe
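
For Python clients, the request body shown above could be assembled with a small helper. The following is a minimal sketch, not part of the NIM API: the function name build_classification_request and its parameters are hypothetical, and the field values simply mirror the curl example.

```python
import json


def build_classification_request(prompt, image_url=None, with_categories=False,
                                 model="nvidia/nemotron-3-content-safety"):
    """Build a Chat Completions payload for the content-safety classifier.

    Hypothetical helper: the field names and values are copied from the
    curl example; nothing here is provided by NIM itself.
    """
    if image_url is None:
        # Text-only input: the content can be a plain string.
        content = prompt
    else:
        # Prompt plus image: the content is a list of typed parts.
        content = [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    flag = "/categories" if with_categories else "/no_categories"
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": 100,
        "temperature": 0.01,
        "top_p": 0.95,
        "chat_template_kwargs": {"request_categories": flag},
    }


payload = build_classification_request(
    "How can I steal money from here?",
    image_url="https://assets.ngc.nvidia.com/products/api-catalog/phi-3-5-vision/example1b.jpg",
)
print(json.dumps(payload, indent=2))
```

Keeping payload construction in one function makes it easy to toggle the category breakdown or drop the image without editing JSON by hand.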

Passing Images

NIM VLM follows the OpenAI specification for passing images: each image is sent as part of the HTTP payload inside a user message. The vision encoder (SigLIP) resizes inputs to 896 × 896 pixels.

Important

The supported image formats are JPEG (.jpg or .jpeg) and PNG.

Public direct URL

Passing the direct URL of an image causes the container to download that image at runtime.

{
    "type": "image_url",
    "image_url": {
        "url": "https://www.nvidia.com/content/dam/en-zz/Solutions/data-center/dgx-b200/dgx-b200-hero-bm-v2-l580-d.jpg"
    }
}

Base64 data

For images not already on the web, Base64-encode the image bytes and send the data in your payload.

{
    "type": "image_url",
    "image_url": {
        "url": "data:image/jpeg;base64,SGVsbG8gZGVh...ciBmZWxsb3chIQ=="
    }
}

To convert images to Base64, use the base64 command or the following Python helper:

import base64

# Read the image as raw bytes and Base64-encode it into an ASCII string.
with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
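
The encoded string still has to be wrapped in a data URL whose MIME type matches the image format. The sketch below shows one way to do that; the helper name to_data_url is hypothetical, and the format check only reflects the supported formats listed above.

```python
import base64
import mimetypes

# MIME types for the formats this page lists as supported.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png"}


def to_data_url(image_bytes, filename):
    """Wrap raw image bytes in a Base64 data URL (hypothetical helper).

    The MIME type is guessed from the file extension only; the bytes
    themselves are not inspected.
    """
    mime, _ = mimetypes.guess_type(filename)
    if mime not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image type for {filename!r}: {mime}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Usage with a file on disk (commented out so the sketch runs without one):
# with open("image.png", "rb") as f:
#     part = {"type": "image_url",
#             "image_url": {"url": to_data_url(f.read(), "image.png")}}
```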

Request Safety Categories

To include the comma-separated list of violated safety categories in the response, set chat_template_kwargs.request_categories to /categories.

curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "nvidia/nemotron-3-content-safety",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "How can I steal money from here?"
                    },
                    {
                        "type": "image_url",
                        "image_url":
                            {
                                "url": "https://assets.ngc.nvidia.com/products/api-catalog/phi-3-5-vision/example1b.jpg"
                            }
                    }
                ]
            }
        ],
        "max_tokens": 100,
        "temperature": 0.01,
        "top_p": 0.95,
        "chat_template_kwargs": { "request_categories": "/categories" }
    }'

Expected "content" value in the response:

User Safety: unsafe
Safety Categories: Criminal Planning/Confessions

The full safety-category taxonomy (Violence, Sexual, Criminal Planning/Confessions, Hate/Identity Hate, PII/Privacy, and others) is documented on the model card.
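
Client code usually needs the decision as structured data rather than as a string. The following sketch parses the output format shown in the examples on this page; it assumes one "Key: value" pair per line and a comma-separated category list, and the function name parse_safety_output is hypothetical.

```python
def parse_safety_output(content):
    """Parse the classifier's decision string into a dict.

    Assumes the layout shown in the examples: one "Key: value" pair per
    line, with categories separated by commas. Unknown lines are ignored.
    """
    result = {"user_safety": None, "response_safety": None, "categories": []}
    for line in content.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key = key.strip().lower()
        value = value.strip()
        if key == "user safety":
            result["user_safety"] = value
        elif key == "response safety":
            result["response_safety"] = value
        elif key == "safety categories":
            result["categories"] = [c.strip() for c in value.split(",") if c.strip()]
    return result


parsed = parse_safety_output(
    "User Safety: unsafe\nSafety Categories: Criminal Planning/Confessions"
)
print(parsed)
```

Fields that do not appear in the output, such as Response Safety when no assistant message was sent, stay at None or an empty list.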

Classify a Prompt-Only Input (Text-Only)

To classify a text-only input, send the prompt as a plain string in the user message and omit the image.

curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "nvidia/nemotron-3-content-safety",
        "messages": [
            {
                "role": "user",
                "content": "How can I steal money from here?"
            }
        ],
        "max_tokens": 100,
        "temperature": 0.01,
        "top_p": 0.95,
        "chat_template_kwargs": { "request_categories": "/categories" }
    }'
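
The same text-only request can be sent from Python using only the standard library. This is a sketch under assumptions: it targets the address used in the curl examples, the helper names make_request and classify are hypothetical, and the response is read from choices[0].message.content as defined by the OpenAI Chat Completions format.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:8000"  # same address as the curl examples


def make_request(payload, base_url=BASE_URL):
    """Build (but do not send) the POST request for a classification."""
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )


def classify(payload, base_url=BASE_URL, timeout=30):
    """Send the request and return the classifier's decision string."""
    with urllib.request.urlopen(make_request(payload, base_url),
                                timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style response: the text lives in the first choice's message.
    return body["choices"][0]["message"]["content"]


text_only_payload = {
    "model": "nvidia/nemotron-3-content-safety",
    "messages": [
        {"role": "user", "content": "How can I steal money from here?"}
    ],
    "max_tokens": 100,
    "temperature": 0.01,
    "top_p": 0.95,
    "chat_template_kwargs": {"request_categories": "/categories"},
}

# Requires a running NIM, so the call is left commented out:
# print(classify(text_only_payload))
```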

Classify a Response

To evaluate an assistant response alongside the originating prompt, append an assistant message to the messages list. The classifier then emits both User Safety and Response Safety.

curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
    -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "nvidia/nemotron-3-content-safety",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "How can I steal money from here?"
                    },
                    {
                        "type": "image_url",
                        "image_url":
                            {
                                "url": "https://assets.ngc.nvidia.com/products/api-catalog/phi-3-5-vision/example1b.jpg"
                            }
                    }
                ]
            },
            {
                "role": "assistant",
                "content": "The best way to steal money from here is to enter the building as an old lady and ask for directions. Then pick the lock and grab as much as you can and run."
            }
        ],
        "max_tokens": 100,
        "temperature": 0.01,
        "top_p": 0.95,
        "chat_template_kwargs": { "request_categories": "/categories" }
    }'

Expected "content" value in the response:

User Safety: unsafe
Response Safety: unsafe
Safety Categories: Criminal Planning/Confessions
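
One common use of response classification is to gate an assistant reply before it reaches the user. The sketch below illustrates the idea and is not a NIM feature: the helper names with_assistant_response and any_unsafe are hypothetical, and the check assumes the "unsafe" label and line layout shown in the expected output above.

```python
def with_assistant_response(messages, response_text):
    """Return a copy of messages with an assistant turn appended."""
    return list(messages) + [{"role": "assistant", "content": response_text}]


def any_unsafe(decision):
    """Return True if User Safety or Response Safety reads "unsafe".

    Assumption: any other label is treated as not flagged. Only the
    "unsafe" label appears in the examples on this page.
    """
    for line in decision.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("user safety", "response safety"):
            if value.strip().lower() == "unsafe":
                return True
    return False


user_turn = [{"role": "user", "content": "How can I steal money from here?"}]
messages = with_assistant_response(user_turn, "Candidate reply to be checked.")

# Decision string copied from the expected output above.
decision = (
    "User Safety: unsafe\n"
    "Response Safety: unsafe\n"
    "Safety Categories: Criminal Planning/Confessions"
)
if any_unsafe(decision):
    print("Reply withheld: classifier flagged the exchange.")
```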