Download this tutorial as a Jupyter notebook

Deploy NemoGuard NIMs#

NemoGuard NIMs are specialized models built for specific use cases supported by the Guardrails service. Learn how to deploy NemoGuard NIMs in your environment and apply them to a guardrail configuration.

NIM	Use Case
`nvidia/llama-3.1-nemotron-safety-guard-8b-v3`	Content safety: classifies inputs and outputs as safe or unsafe across 23 content categories
`nvidia/llama-3.1-nemoguard-8b-topic-control`	Topic control: restricts conversations to a defined set of allowed topics
`nvidia/nemoguard-jailbreak-detect`	Jailbreak detection: detects prompt injection and jailbreak attempts

Note

The content-safety and topic-control NIMs expose an OpenAI-compatible /v1/chat/completions endpoint and are referenced in guardrail configurations by their Model Entity name (workspace/model_name). The jailbreak-detection NIM uses the /v1/classify endpoint, so once deployed, it does not register a Model Entity. Instead, reference it by setting the rails.config.jailbreak_detection.nim_base_url field to the Inference Gateway URL for the deployed Model Provider.

Prerequisites#

Before you begin:

You have access to a running NeMo Platform.
NMP_BASE_URL is set to the NeMo Platform base URL.
Your infrastructure has 1 GPU available per NIM deployment.

Step 1: Configure the Client#

Instantiate the NeMoPlatform SDK.

import os
from nemo_platform import NeMoPlatform

sdk = NeMoPlatform(base_url=os.environ["NMP_BASE_URL"], workspace="default")

Step 2: Deploy the NIMs#

Use the Platform’s Inference Gateway service to deploy each NIM. This process creates a DeploymentConfig that specifies the NIM image, and a Deployment that runs it.

Tip

Enabling KV cache reuse on the LLM-based NIMs could improve inference speed. These examples enable this feature by setting NIM_ENABLE_KV_CACHE_REUSE=1 via the nim_deployment.additional_envs option.

Deploy a Content-Safety NIM#

CLI

nmp inference deployment-configs create \
    --name "nemotron-safety-guard-config" \
    --nim-deployment '{
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
        "image_tag": "1.14.0",
        "additional_envs": {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
    }'

nmp inference deployments create \
    --name "nemotron-safety-guard" \
    --config "nemotron-safety-guard-config"

nmp wait inference deployment nemotron-safety-guard

Python SDK

sdk.inference.deployment_configs.create(
    name="nemotron-safety-guard-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemotron-safety-guard-8b-v3",
        "image_tag": "1.14.0",
        "additional_envs": {
            "NIM_ENABLE_KV_CACHE_REUSE": "1",
        }
    },
)

sdk.inference.deployments.create(
    name="nemotron-safety-guard",
    config="nemotron-safety-guard-config",
)

sdk.models.wait_for_status(
    deployment_name="nemotron-safety-guard",
    desired_status="READY",
)

print("Content safety NIM ready")

Deploy a Topic-Control NIM#

CLI

nmp inference deployment-configs create \
    --name "nemoguard-topic-control-config" \
    --nim-deployment '{
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control",
        "image_tag": "1.10.1",
        "additional_envs": {"NIM_ENABLE_KV_CACHE_REUSE": "1"}
    }'

nmp inference deployments create \
    --name "nemoguard-topic-control" \
    --config "nemoguard-topic-control-config"

nmp wait inference deployment nemoguard-topic-control

Python SDK

sdk.inference.deployment_configs.create(
    name="nemoguard-topic-control-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-topic-control",
        "image_tag": "1.10.1",
        "additional_envs": {
            "NIM_ENABLE_KV_CACHE_REUSE": "1",
        }
    },
)

sdk.inference.deployments.create(
    name="nemoguard-topic-control",
    config="nemoguard-topic-control-config",
)

sdk.models.wait_for_status(
    deployment_name="nemoguard-topic-control",
    desired_status="READY",
)

print("Topic control NIM ready")

Deploy a Jailbreak-Detection NIM#

CLI

nmp inference deployment-configs create \
    --name "nemoguard-jailbreak-config" \
    --nim-deployment '{
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect",
        "image_tag": "1.10.1"
    }'

nmp inference deployments create \
    --name "nemoguard-jailbreak" \
    --config "nemoguard-jailbreak-config"

nmp wait inference deployment nemoguard-jailbreak

Python SDK

sdk.inference.deployment_configs.create(
    name="nemoguard-jailbreak-config",
    nim_deployment={
        "gpu": 1,
        "image_name": "nvcr.io/nim/nvidia/nemoguard-jailbreak-detect",
        "image_tag": "1.10.1",
    },
)

sdk.inference.deployments.create(
    name="nemoguard-jailbreak",
    config="nemoguard-jailbreak-config",
)

sdk.models.wait_for_status(
    deployment_name="nemoguard-jailbreak",
    desired_status="READY",
)

print("Jailbreak detection NIM ready")

Step 3: Verify the Model Entity Names#

After the content safety and topic control NIMs are deployed, the Inference Gateway discovers the models served by each NIM and registers them as Model Entities in your workspace. Use these entities in guardrail configurations with the workspace/model_name format.

List all Model Entities in your workspace to find the names:

models = sdk.models.list(workspace="default")
for model in models:
    print(f"{model.workspace}/{model.name}")

The NemoGuard NIMs register Model Entities with the following default names:

NIM	Model Entity Reference
`llama-3.1-nemotron-safety-guard-8b-v3`	`default/nvidia-llama-3-1-nemotron-safety-guard-8b-v3`
`llama-3.1-nemoguard-8b-topic-control`	`default/nvidia-llama-3-1-nemoguard-8b-topic-control`

Note

The jailbreak detection NIM exposes a /v1/classify endpoint rather than an OpenAI-compatible chat completions endpoint, so it does not register a Model Entity. Reference the NIM by setting nim_base_url to its Inference Gateway URL — see Step 4 below.

Step 4: Use the NIMs in Guardrail Configurations#

Content Safety and Topic Control#

Reference the Model Entities in your guardrail configuration using the workspace/model_name format. For a complete example combining content safety and topic control rails, see Executing Input and Output Rails in Parallel.

Jailbreak Detection#

Configure the jailbreak detection NIM using the rails.config.jailbreak_detection field. Set nim_base_url to the Inference Gateway URL for the Model Provider created when you deployed the NIM in Step 2. The URL follows the pattern /apis/inference-gateway/v2/workspaces/{workspace}/provider/{provider_name}/-/v1, where the provider_name matches the model deployment name in Step 2.

config = sdk.guardrail.configs.create(
    name="nemoguard-jailbreak-config",
    description="Jailbreak detection using self-hosted NemoGuard NIM",
    data={
        "rails": {
            "config": {
                "jailbreak_detection": {
                    "nim_base_url": f"{os.environ['NMP_BASE_URL']}/apis/inference-gateway/v2/workspaces/default/provider/nemoguard-jailbreak/-/v1",
                }
            },
            "input": {
                "flows": ["jailbreak detection model"],
            },
        },
    },
)
print(f"Created config: {config.name}")

Cleanup#

CLI

nmp guardrail configs delete nemoguard-jailbreak-config

# Note: Deleting the deployment will free up its GPU(s) when complete
nmp inference deployments delete nemotron-safety-guard
nmp inference deployments delete nemoguard-topic-control
nmp inference deployments delete nemoguard-jailbreak

nmp wait inference deployment nemotron-safety-guard --status DELETED
nmp wait inference deployment nemoguard-topic-control --status DELETED
nmp wait inference deployment nemoguard-jailbreak --status DELETED

nmp inference deployment-configs delete nemotron-safety-guard-config
nmp inference deployment-configs delete nemoguard-topic-control-config
nmp inference deployment-configs delete nemoguard-jailbreak-config

Python SDK

sdk.guardrail.configs.delete(name="nemoguard-jailbreak-config")

# Note: Deleting the deployment will free up its GPU(s) when complete
sdk.inference.deployments.delete(name="nemotron-safety-guard")
sdk.inference.deployments.delete(name="nemoguard-topic-control")
sdk.inference.deployments.delete(name="nemoguard-jailbreak")

sdk.models.wait_for_status(deployment_name="nemotron-safety-guard", desired_status="DELETED")
sdk.models.wait_for_status(deployment_name="nemoguard-topic-control", desired_status="DELETED")
sdk.models.wait_for_status(deployment_name="nemoguard-jailbreak", desired_status="DELETED")

sdk.inference.deployment_configs.delete(name="nemotron-safety-guard-config")
sdk.inference.deployment_configs.delete(name="nemoguard-topic-control-config")
sdk.inference.deployment_configs.delete(name="nemoguard-jailbreak-config")

print("Cleanup complete")

Next Steps#

Improving Content Safety with NemoGuard NIMs - Full content safety tutorial using build.nvidia.com-hosted NIMs
Executing Input and Output Rails in Parallel - Combine multiple rails for comprehensive safety coverage