Deploy Nemotron Content Safety Reasoning 4B
Overview
Nemotron-Content-Safety-Reasoning-4B is a Large Language Model (LLM) classifier designed to function as a dynamic and adaptable guardrail for content safety and dialogue moderation.
Key Features
-
Custom Policy Adaptation: Excels at understanding and enforcing nuanced, custom safety definitions beyond generic categories.
-
Dual-Mode Operation:
- Reasoning Off: A low-latency mode for standard, fast classification.
- Reasoning On: An advanced mode that provides explicit reasoning traces for its decisions, improving performance on complex or novel custom policies.
- Examples: Reasoning On and Reasoning Off on HuggingFace.
-
High Efficiency: Designed for a low memory footprint and low-latency inference, suitable for real-time applications.
Model Details
See the full Model Architecture on HuggingFace.
Prerequisites
-
Python 3.10 or later
-
GPU with at least 16GB VRAM (see Hardware Requirements on HuggingFace)
-
vLLM installed:
-
HuggingFace access to the model (accept the license at HuggingFace)
Deploying the Content Safety Model with vLLM
Start a vLLM server for the Nemotron-Content-Safety-Reasoning-4B model. See also Serving with vLLM on HuggingFace for additional options.
Verify the server is ready:
Configuring NeMo Guardrails
Step 1: Create Configuration Directory
Create a configuration directory for your guardrails setup:
Step 2: Create config.yml
Save the following as config/config.yml:
You can use any LLM provider for the main model (OpenAI, NIM, Anthropic, etc.). See the Model Configuration guide for available engines.
Step 3: Create prompts.yml
Save the following as config/prompts.yml. This uses the Recommended Prompt Template from HuggingFace:
The reasoning_enabled variable is automatically passed to prompt templates by the content safety action, based on the rails.config.content_safety.reasoning.enabled setting.
Running Inference
Load the Configuration
Test with a Safe Request
Example Output
When reasoning mode is disabled, the model generates a safety prediction directly:
Test with an Unsafe Request
Example Output
When reasoning mode is enabled, the model generates a reasoning trace followed by the safety prediction:
Configuration Options
Reasoning Mode
Toggle between reasoning modes in config.yml:
Reasoning On (/think): Provides explicit reasoning traces for decisions. Better for complex or novel custom policies. Higher latency. See example.
Reasoning Off (/no_think): Fast classification without reasoning. Suitable for standard content safety policies. Lower latency. See example.
Custom Safety Policies
Nemotron-Content-Safety-Reasoning-4B excels at custom policy enforcement. You can modify the taxonomy in prompts.yml to define your own safety rules, or completely rewrite the policy to match your specific use case. See the Topic Following for Custom Safety example on HuggingFace.
Adding Categories
Add new categories to the existing taxonomy:
Replacing the Entire Policy
You can completely replace the default taxonomy with your own custom policy. For example, for a customer service bot that should only discuss product-related topics:
This flexibility allows you to adapt the model for topic-following, dialogue moderation, or any custom content filtering scenario.
Custom Output Parsers
If you need to customize how the model output is parsed (e.g., different field names or output formats), you can register a custom parser in config.py.
Example: Parsing Custom Field Names
If you’ve customized your prompt to use different output fields like “User request: safe/unsafe”, create a parser to handle it:
Then reference it in prompts.yml:
Next Steps
- Explore how to use custom safety policies to adapt the model to your specific use case
- Learn about topic following for dialogue moderation
- Read the paper that describes how we built Nemotron-Content-Safety-Reasoning-4B: “Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models”