
Executing Input and Output Rails in Parallel#

Run input and output rails in parallel to improve the response time of guardrail checks. This tutorial shows how to enable parallel rails using the NeMo Platform Python SDK.

When to Use Parallel Rails Execution#

Parallel execution is most effective for the following:

  • I/O-bound rails, such as external API calls to models or third-party integrations.

  • Independent input or output rails without shared state dependencies.

  • Production environments where response latency affects user experience and business metrics.

Note

Input rail mutations can lead to erroneous results during parallel execution because of race conditions that arise from the execution order and timing of parallel operations. This can result in output divergence compared to sequential execution. For such cases, use sequential mode.
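The divergence can be illustrated with a toy example. The two "rails" below are hypothetical stand-ins, not NeMo APIs: run sequentially, the second rail sees the first rail's mutation; run in parallel, both start from the original input and their results must be reconciled.

```python
import asyncio

async def mask_digits(text: str) -> str:
    # A mutating rail: redact digits (simulated I/O-bound check).
    await asyncio.sleep(0)
    return "".join("*" if c.isdigit() else c for c in text)

async def uppercase(text: str) -> str:
    # A second mutating rail.
    await asyncio.sleep(0)
    return text.upper()

async def sequential(text: str) -> str:
    # Each rail operates on the previous rail's output.
    text = await mask_digits(text)
    return await uppercase(text)

async def parallel(text: str) -> tuple:
    # Both rails start from the ORIGINAL text; mutations diverge.
    return tuple(await asyncio.gather(mask_digits(text), uppercase(text)))

msg = "My PIN is 1234"
print(asyncio.run(sequential(msg)))  # MY PIN IS ****
print(asyncio.run(parallel(msg)))    # ('My PIN is ****', 'MY PIN IS 1234')
```

In the parallel case the uppercased variant still contains the unmasked digits, which is why mutating input rails should run in sequential mode.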

When Not to Use Parallel Rails Execution#

Sequential execution is recommended for the following:

  • CPU-bound rails, where parallel execution might not improve performance and can add scheduling overhead.

  • Development and testing, where sequential execution simplifies debugging and keeps workflows easy to trace.


Prerequisites#

Before you begin:

  • You have access to a running NeMo Platform.

  • NMP_BASE_URL is set to the NeMo Platform base URL.

  • A ModelProvider is configured to use NIMs hosted at build.nvidia.com for inference. Follow Using an External Endpoint if you haven’t done this yet.
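For example, you can set the environment variable in your shell before launching the notebook (the URL below is a placeholder; use your deployment's address):

```shell
# Point the SDK at your NeMo Platform deployment (placeholder URL shown).
export NMP_BASE_URL="http://localhost:8080"
echo "Using NeMo Platform at: $NMP_BASE_URL"
```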

This tutorial uses the following NIMs, available on build.nvidia.com:

  • main model: meta/llama-3.1-8b-instruct

  • content_safety model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3

  • topic_control model: nvidia/llama-3.1-nemoguard-8b-topic-control


Step 1: Configure the Client#

Install the required packages.

%pip install -q nemo-platform

Instantiate the NeMoPlatform SDK.

import os
from nemo_platform import NeMoPlatform, ConflictError

sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

Step 2: Create a Guardrail Configuration#

Create a configuration that enables parallel execution for input rails. This example runs both content safety and topic safety checks in parallel.

You can customize the topic safety check prompt based on your specific use case and allowed topics. The prompt is used as the system prompt for the topic control model to determine if the user message is on-topic or off-topic.

guardrails_config = {
    "models": [
        {
            "type": "main",
            "engine": "nim",
        },
        {
            "type": "content_safety",
            "engine": "nim",
            "model": "system/nvidia-llama-3-1-nemotron-safety-guard-8b-v3",
        },
        {
            "type": "topic_control",
            "engine": "nim",
            "model": "system/nvidia-llama-3-1-nemoguard-8b-topic-control",
        },
    ],
    "rails": {
        "input": {
            "parallel": True,
            "flows": [
                "content safety check input $model=content_safety",
                "topic safety check input $model=topic_control",
            ],
        },
        "output": {
            "flows": [
                "content safety check output $model=content_safety",
            ],
        },
    },
    "prompts": [
        {
            "task": "content_safety_check_input $model=content_safety",
            "content": "Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.\n\n<BEGIN UNSAFE CONTENT CATEGORIES>\nS1: Violence.\nS2: Sexual.\nS3: Criminal Planning/Confessions.\nS4: Guns and Illegal Weapons.\nS5: Controlled/Regulated Substances.\nS6: Suicide and Self Harm.\nS7: Sexual (minor).\nS8: Hate/Identity Hate.\nS9: PII/Privacy.\nS10: Harassment.\nS11: Threat.\nS12: Profanity.\nS13: Needs Caution.\nS14: Other.\nS15: Manipulation.\nS16: Fraud/Deception.\nS17: Malware.\nS18: High Risk Gov Decision Making.\nS19: Political/Misinformation/Conspiracy.\nS20: Copyright/Trademark/Plagiarism.\nS21: Unauthorized Advice.\nS22: Illegal Activity.\nS23: Immoral/Unethical.\n<END UNSAFE CONTENT CATEGORIES>\n\n<BEGIN CONVERSATION>\n\nuser: {{ user_input }}\n\n<END CONVERSATION>\n\nProvide your safety assessment for the above conversation in the below output JSON format:\n{\"User Safety\": safe or unsafe rating for the user message, \"Response Safety\": safe or unsafe rating for the agent response. Omit if no agent response present. \"Safety Categories\": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}\n\nDo not include anything other than the output JSON in your response.\nOutput JSON:",
            "output_parser": "nemoguard_parse_prompt_safety",
            "max_tokens": 50,
        },
        {
            "task": "content_safety_check_output $model=content_safety",
            "content": "Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.\n\n<BEGIN UNSAFE CONTENT CATEGORIES>\nS1: Violence.\nS2: Sexual.\nS3: Criminal Planning/Confessions.\nS4: Guns and Illegal Weapons.\nS5: Controlled/Regulated Substances.\nS6: Suicide and Self Harm.\nS7: Sexual (minor).\nS8: Hate/Identity Hate.\nS9: PII/Privacy.\nS10: Harassment.\nS11: Threat.\nS12: Profanity.\nS13: Needs Caution.\nS14: Other.\nS15: Manipulation.\nS16: Fraud/Deception.\nS17: Malware.\nS18: High Risk Gov Decision Making.\nS19: Political/Misinformation/Conspiracy.\nS20: Copyright/Trademark/Plagiarism.\nS21: Unauthorized Advice.\nS22: Illegal Activity.\nS23: Immoral/Unethical.\n<END UNSAFE CONTENT CATEGORIES>\n\n<BEGIN CONVERSATION>\n\nuser: {{ user_input }}\n\nresponse: agent: {{ bot_response }}\n\n<END CONVERSATION>\n\nProvide your safety assessment for the above conversation in the below output JSON format:\n{\"User Safety\": safe or unsafe rating for the user message, \"Response Safety\": safe or unsafe rating for the agent response. Omit if no agent response present. \"Safety Categories\": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}\n\nDo not include anything other than the output JSON in your response.\nOutput JSON:",
            "output_parser": "nemoguard_parse_response_safety",
            "max_tokens": 50,
        },
        {
            "task": "topic_safety_check_input $model=topic_control",
            "content": "You are to act as a customer service agent, providing users with factual information in accordance to the knowledge base. Your role is to ensure that you respond only to relevant queries and adhere to the following guidelines\n\nGuidelines for the user messages:\n- Do not answer questions related to personal opinions or advice on user's order, future recommendations\n- Do not provide any information on non-company products or services.\n- Do not answer enquiries unrelated to the company policies.\n- Do not answer questions asking for personal details about the agent or its creators.\n- Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects.\n- If a user asks topics irrelevant to the company's customer service relations, politely redirect the conversation or end the interaction.\n- Your responses should be professional, accurate, and compliant with customer relations guidelines, focusing solely on providing transparent, up-to-date information about the company that is already publicly available.\n- allow user comments that are related to small talk and chit-chat.\n\nUser message: \"{{ user_input }}\"",
            "max_tokens": 50,
        },
    ],
}

config_name = "parallel-rails-config"
try:
    config = sdk.guardrail.configs.create(
        name=config_name,
        description="Parallel rails guardrail configuration",
        data=guardrails_config,
    )
except ConflictError:
    print(f"Config {config_name} already exists, continuing...")
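As a local sanity check before creating the configuration, you can verify that every `$model=` reference in the flow names matches a model type declared in the `models` list. This helper is a sketch for illustration, not an SDK feature:

```python
import re

def undeclared_models(config: dict) -> set:
    """Return $model= references in rail flows with no matching models entry."""
    declared = {m["type"] for m in config.get("models", [])}
    referenced = set()
    for section in config.get("rails", {}).values():
        for flow in section.get("flows", []):
            referenced.update(re.findall(r"\$model=(\S+)", flow))
    return referenced - declared

# Trimmed example mirroring the structure of guardrails_config above.
sample = {
    "models": [{"type": "content_safety"}, {"type": "topic_control"}],
    "rails": {
        "input": {"flows": ["content safety check input $model=content_safety",
                            "topic safety check input $model=topic_control"]},
        "output": {"flows": ["content safety check output $model=content_safety"]},
    },
}
print(undeclared_models(sample))  # set() -> every reference is declared
```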

Step 3: Run Chat Completions via Guardrails#

Test the parallel rails configuration by making both safe and off-topic requests.

Make a safe, on-topic request and verify the response is allowed.

response = sdk.guardrail.chat.completions.create(
    model="system/meta-llama-3-1-8b-instruct",
    messages=[{"role": "user", "content": "What is your return policy?"}],
    guardrails={"config_id": "parallel-rails-config"},
    max_tokens=200,
)

print(response.model_dump_json(indent=2))

Make an off-topic request that the topic control input rail blocks.

response = sdk.guardrail.chat.completions.create(
    model="system/meta-llama-3-1-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a joke about quantum gravity."}],
    guardrails={"config_id": "parallel-rails-config"},
    max_tokens=200,
)

print(response.model_dump_json(indent=2))

The off-topic request returns the denial message: "I'm sorry, I can't respond to that."
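If you need to detect blocked requests programmatically, you can compare the returned message content against the denial string. The `choices[0].message.content` access below assumes an OpenAI-style response layout and uses a mocked payload; adapt the field access to your SDK's actual response object:

```python
DENIAL_MESSAGE = "I'm sorry, I can't respond to that."

def was_blocked(message_content: str) -> bool:
    """True if the guardrail returned its denial message instead of an answer."""
    return message_content.strip() == DENIAL_MESSAGE

# Mocked response payload (OpenAI-style layout assumed for illustration).
mock_response = {
    "choices": [
        {"message": {"role": "assistant",
                     "content": "I'm sorry, I can't respond to that."}}
    ]
}
content = mock_response["choices"][0]["message"]["content"]
print(was_blocked(content))  # True
```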


Step 4: Check Messages#

Check messages with parallel rails using the check endpoint.

check_result = sdk.guardrail.check(
    model="system/meta-llama-3-1-8b-instruct",
    messages=[
        {"role": "user", "content": "What is your return policy?"}
    ],
    guardrails={"config_id": "parallel-rails-config"},
)

print(check_result.model_dump_json(indent=2))
Output
{
  "status": "success",
  "rails_status": {
    "content safety check input $model=content_safety": {
      "status": "success"
    },
    "topic safety check input $model=topic_control": {
      "status": "success"
    }
  }
}
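To act on the check result, you can inspect `rails_status` for any rail that did not succeed. The helper below is a local sketch over the JSON payload shown above, not an SDK method:

```python
def failed_rails(check_payload: dict) -> list:
    """Return the names of rails whose status is not 'success'."""
    return [
        name
        for name, result in check_payload.get("rails_status", {}).items()
        if result.get("status") != "success"
    ]

# Payload matching the sample output above.
payload = {
    "status": "success",
    "rails_status": {
        "content safety check input $model=content_safety": {"status": "success"},
        "topic safety check input $model=topic_control": {"status": "success"},
    },
}
print(failed_rails(payload))  # []
```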

Cleanup#

sdk.guardrail.configs.delete(name=config_name)
print("Cleanup complete")