Executing Input and Output Rails in Parallel

Run input and output rails in parallel to improve the response time of guardrail checks. This tutorial shows how to enable parallel rails using the NeMo Platform Python SDK.

When to Use Parallel Rails Execution

Parallel execution is most effective for the following:

I/O-bound rails, such as external API calls to models or third-party integrations.
Independent input or output rails without shared state dependencies.
Production environments where response latency affects user experience and business metrics.
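
To see why I/O-bound rails benefit, consider the following minimal sketch. It does not call the NeMo Platform: `fake_rail` is a hypothetical stand-in that simulates a model call with `asyncio.sleep`, and the delays are made-up values. Run one after another, the checks take roughly the sum of their latencies; run concurrently, they take roughly as long as the slowest one.

```python
import asyncio
import time

# Hypothetical rails with made-up latencies (seconds).
RAILS = [("content_safety", 0.2), ("topic_control", 0.3)]


async def fake_rail(name: str, delay_s: float) -> str:
    """Simulate an I/O-bound rail, such as a call to a safety model."""
    await asyncio.sleep(delay_s)
    return f"{name}: allowed"


async def run_sequential(rails):
    # Each rail starts only after the previous one has finished.
    return [await fake_rail(name, delay) for name, delay in rails]


async def run_parallel(rails):
    # All rails start together; gather preserves the input order.
    return list(await asyncio.gather(*(fake_rail(n, d) for n, d in rails)))


def timed(runner, rails):
    """Run an async runner to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(runner(rails))
    return results, time.perf_counter() - start


seq_results, seq_seconds = timed(run_sequential, RAILS)
par_results, par_seconds = timed(run_parallel, RAILS)
print(f"sequential: {seq_seconds:.2f}s, parallel: {par_seconds:.2f}s")
```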

Input rail mutations can lead to erroneous results during parallel execution because of race conditions that arise from the execution order and timing of parallel operations. This can result in output divergence compared to sequential execution. For such cases, use sequential mode.
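
The following sketch illustrates how a mutating rail can make parallel results diverge from sequential ones. It is a simplified model, not the platform's implementation: `mask_email` and `reject_if_email` are hypothetical rails, and the sketch assumes that rails running in parallel each see the original message rather than the output of an earlier rail.

```python
import re


def mask_email(message: str) -> str:
    """Hypothetical mutating rail: replaces email addresses with a placeholder."""
    return re.sub(r"\S+@\S+", "<EMAIL>", message)


def reject_if_email(message: str) -> str:
    """Hypothetical checking rail: blocks messages that still contain an '@'."""
    return "blocked" if "@" in message else "allowed"


def run_sequential(message: str) -> str:
    # The checking rail sees the message after the mutation.
    return reject_if_email(mask_email(message))


def run_parallel(message: str) -> str:
    # Assumption: every rail receives the same original message,
    # so the checking rail never sees the masked text.
    mask_email(message)
    return reject_if_email(message)


message = "Send the invoice to jane@example.com"
print(run_sequential(message))  # allowed
print(run_parallel(message))    # blocked
```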

When Not to Use Parallel Rails Execution

Sequential execution is recommended for the following:

CPU-bound rails, where parallel execution might not improve performance and can introduce overhead.
Development and testing, where sequential execution simplifies debugging and keeps workflows easier to follow.

Prerequisites

Before you begin:

You have access to a running NeMo Platform.
NMP_BASE_URL is set to the NeMo Platform base URL.
A ModelProvider is configured with an LLM provider. Follow Setup if you haven’t done this yet.

This tutorial uses the following NIMs, available on build.nvidia.com:

main model: meta/llama-3.1-8b-instruct
content_safety model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3
topic_control model: nvidia/llama-3.1-nemoguard-8b-topic-control

Step 1: Configure the Client

Instantiate the platform client.

import os
from nemo_platform import NeMoPlatform, ConflictError

client = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)

Step 2: Create a Guardrail Configuration

Create a configuration that enables parallel execution for input rails. This example runs both content safety and topic safety checks in parallel.

You can customize the topic safety check prompt based on your specific use case and allowed topics. The prompt is used as the system prompt for the topic control model to determine if the user message is on-topic or off-topic.
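
One way to keep the topic prompt maintainable is to assemble it from a list of guidelines. The sketch below is illustrative only: `build_topic_prompt`, the role text, and the example guidelines are hypothetical and not part of the SDK. The `{{ user_input }}` placeholder matches the one used in this tutorial's prompts.

```python
def build_topic_prompt(role: str, guidelines: list[str]) -> str:
    """Assemble a topic control system prompt from a role and guidelines."""
    bullet_list = "\n".join(f"- {item}" for item in guidelines)
    return (
        f"{role}\n\n"
        "Guidelines for the user messages:\n"
        f"{bullet_list}\n\n"
        # Template placeholder; this is a plain string, so the braces stay literal.
        'User message: "{{ user_input }}"'
    )


# Made-up role and guidelines for illustration.
topic_prompt = build_topic_prompt(
    role="You are a customer service agent for a bookstore.",
    guidelines=[
        "Do not answer questions unrelated to orders, shipping, or returns.",
        "Allow small talk and chit-chat.",
    ],
)
print(topic_prompt)
```

The resulting string can serve as the `content` of the prompt entry whose task is `topic_safety_check_input $model=topic_control`.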

guardrails_config = {
    "models": [
        {
            "type": "content_safety",
            "engine": "nim",
            "model": "default/nvidia-llama-3-1-nemotron-safety-guard-8b-v3",
        },
        {
            "type": "topic_control",
            "engine": "nim",
            "model": "default/nvidia-llama-3-1-nemoguard-8b-topic-control",
        },
    ],
    "rails": {
        "input": {
            "parallel": True,
            "flows": [
                "content safety check input $model=content_safety",
                "topic safety check input $model=topic_control",
            ],
        },
        "output": {
            "flows": [
                "content safety check output $model=content_safety",
            ],
        },
    },
    "prompts": [
        {
            "task": "content_safety_check_input $model=content_safety",
            "content": 'Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a \'safe\' or \'unsafe\' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.\n\n<BEGIN UNSAFE CONTENT CATEGORIES>\nS1: Violence.\nS2: Sexual.\nS3: Criminal Planning/Confessions.\nS4: Guns and Illegal Weapons.\nS5: Controlled/Regulated Substances.\nS6: Suicide and Self Harm.\nS7: Sexual (minor).\nS8: Hate/Identity Hate.\nS9: PII/Privacy.\nS10: Harassment.\nS11: Threat.\nS12: Profanity.\nS13: Needs Caution.\nS14: Other.\nS15: Manipulation.\nS16: Fraud/Deception.\nS17: Malware.\nS18: High Risk Gov Decision Making.\nS19: Political/Misinformation/Conspiracy.\nS20: Copyright/Trademark/Plagiarism.\nS21: Unauthorized Advice.\nS22: Illegal Activity.\nS23: Immoral/Unethical.\n<END UNSAFE CONTENT CATEGORIES>\n\n<BEGIN CONVERSATION>\n\nuser: {{ user_input }}\n\n<END CONVERSATION>\n\nProvide your safety assessment for the above conversation in the below output JSON format:\n{"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}\n\nDo not include anything other than the output JSON in your response.\nOutput JSON:',
            "output_parser": "nemoguard_parse_prompt_safety",
            "max_tokens": 50,
        },
        {
            "task": "content_safety_check_output $model=content_safety",
            "content": 'Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a \'safe\' or \'unsafe\' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.\n\n<BEGIN UNSAFE CONTENT CATEGORIES>\nS1: Violence.\nS2: Sexual.\nS3: Criminal Planning/Confessions.\nS4: Guns and Illegal Weapons.\nS5: Controlled/Regulated Substances.\nS6: Suicide and Self Harm.\nS7: Sexual (minor).\nS8: Hate/Identity Hate.\nS9: PII/Privacy.\nS10: Harassment.\nS11: Threat.\nS12: Profanity.\nS13: Needs Caution.\nS14: Other.\nS15: Manipulation.\nS16: Fraud/Deception.\nS17: Malware.\nS18: High Risk Gov Decision Making.\nS19: Political/Misinformation/Conspiracy.\nS20: Copyright/Trademark/Plagiarism.\nS21: Unauthorized Advice.\nS22: Illegal Activity.\nS23: Immoral/Unethical.\n<END UNSAFE CONTENT CATEGORIES>\n\n<BEGIN CONVERSATION>\n\nuser: {{ user_input }}\n\nresponse: agent: {{ bot_response }}\n\n<END CONVERSATION>\n\nProvide your safety assessment for the above conversation in the below output JSON format:\n{"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}\n\nDo not include anything other than the output JSON in your response.\nOutput JSON:',
            "output_parser": "nemoguard_parse_response_safety",
            "max_tokens": 50,
        },
        {
            "task": "topic_safety_check_input $model=topic_control",
            "content": "You are to act as a customer service agent, providing users with factual information in accordance to the knowledge base. Your role is to ensure that you respond only to relevant queries and adhere to the following guidelines\n\nGuidelines for the user messages:\n- Do not answer questions related to personal opinions or advice on user's order, future recommendations\n- Do not provide any information on non-company products or services.\n- Do not answer enquiries unrelated to the company policies.\n- Do not answer questions asking for personal details about the agent or its creators.\n- Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects.\n- If a user asks topics irrelevant to the company's customer service relations, politely redirect the conversation or end the interaction.\n- Your responses should be professional, accurate, and compliant with customer relations guidelines, focusing solely on providing transparent, up-to-date information about the company that is already publicly available.\n- allow user comments that are related to small talk and chit-chat.\n\nUser message: \"{{ user_input }}\"",
            "max_tokens": 50,
        },
    ],
}

config_name = "parallel-rails-config"
try:
    config = client.guardrail.configs.create(
        name=config_name,
        description="Parallel rails guardrail configuration",
        data=guardrails_config,
    )
except ConflictError:
    print(f"Config {config_name} already exists, continuing...")
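
The configuration above enables `parallel` only for input rails. As a hedged sketch, output rails could be switched to parallel mode the same way, assuming the `parallel` key is honored under `rails.output` as it is under `rails.input`. `enable_parallel` is a hypothetical helper, and `example_config` is a trimmed stand-in for the tutorial's `guardrails_config`.

```python
import copy


def enable_parallel(config: dict, directions=("input", "output")) -> dict:
    """Return a copy of config with parallel execution enabled for the given rails."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    for direction in directions:
        rails = updated.get("rails", {}).get(direction)
        if rails is not None:
            rails["parallel"] = True
    return updated


# Trimmed stand-in for the tutorial's guardrails_config.
example_config = {
    "rails": {
        "input": {"flows": ["content safety check input $model=content_safety"]},
        "output": {"flows": ["content safety check output $model=content_safety"]},
    }
}

parallel_config = enable_parallel(example_config)
print(parallel_config["rails"]["output"])
```

With a single output flow, as in this tutorial, parallel mode has little to gain; it matters once several independent output rails are configured.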

Step 3: Create a VirtualModel

Create a VirtualModel that routes inference through the guardrails middleware. The guardrails configuration is applied as both request and response middleware.
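
The CLI takes each middleware list as a JSON array, which is easy to misquote in a shell. A minimal sketch for generating that string in Python follows. `middleware_json` is a hypothetical helper, and the field names are copied from the command in this step.

```python
import json
import shlex


def middleware_json(config_id: str) -> str:
    """Build the JSON array passed to --request-middleware or --response-middleware."""
    entry = {
        "name": "nemo-guardrails",
        "config_type": "guardrail_config",
        "config_id": config_id,
    }
    # Compact separators keep the string free of spaces.
    return json.dumps([entry], separators=(",", ":"))


payload = middleware_json("default/parallel-rails-config")
print(shlex.quote(payload))  # quoted so it can be pasted as a single shell argument
```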

CLI

$ nemo inference virtual-models create guarded-parallel-rails \
>   --default-model-entity default/meta-llama-3-1-8b-instruct \
>   --request-middleware '[{"name":"nemo-guardrails","config_type":"guardrail_config","config_id":"default/parallel-rails-config"}]' \
>   --response-middleware '[{"name":"nemo-guardrails","config_type":"guardrail_config","config_id":"default/parallel-rails-config"}]'

Step 4: Run Chat Completions via Guardrails

Test the parallel rails configuration by making both safe and off-topic requests. Inference calls go through the standard IGW endpoint using the VirtualModel.

Get a pre-configured OpenAI client from the SDK, then make a safe, on-topic request and verify the response is allowed.

oai_client = client.models.get_openai_client()

response = oai_client.chat.completions.create(
    model="default/guarded-parallel-rails",
    messages=[{"role": "user", "content": "What is your return policy?"}],
    max_tokens=200,
)

print(response.model_dump_json(indent=2))

Make an off-topic request that the topic control input rail blocks.

response = oai_client.chat.completions.create(
    model="default/guarded-parallel-rails",
    messages=[{"role": "user", "content": "Tell me a joke about quantum gravity."}],
    max_tokens=200,
)

print(response.model_dump_json(indent=2))

The off-topic request returns the denial message "I'm sorry, I can't respond to that."
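
Client code can detect a blocked request by comparing the reply text with the denial message. The sketch below assumes the denial text is exactly the message quoted above and that it arrives in `response.choices[0].message.content`, the usual location for reply text in the OpenAI client. `is_refusal` is a hypothetical helper.

```python
REFUSAL_MESSAGE = "I'm sorry, I can't respond to that."


def is_refusal(reply_text) -> bool:
    """Return True when the reply matches the guardrails denial message."""
    # Treat a missing reply (None) as an empty string.
    return (reply_text or "").strip() == REFUSAL_MESSAGE


# With a live response: is_refusal(response.choices[0].message.content)
print(is_refusal("I'm sorry, I can't respond to that."))
print(is_refusal("Some other reply"))
```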

Step 5: Inspect Activated Rails

Ask the plugin to include rail-activation diagnostics in the response by setting guardrails.options.log.activated_rails. The OpenAI client forwards request fields it doesn’t natively know about through extra_body, so no SDK change is needed.

response = oai_client.chat.completions.create(
    model="default/guarded-parallel-rails",
    messages=[{"role": "user", "content": "What is your return policy?"}],
    max_tokens=200,
    extra_body={"guardrails": {"options": {"log": {"activated_rails": True}}}},
)

print(response.model_dump_json(indent=2))

Inspect guardrails_data.log.activated_rails in the response. It is a list with one entry per rail that ran. Each entry carries the rail's name, its type, the decisions it made, and a stop flag indicating whether the rail terminated the request.

Example Response

{
  "guardrails_data": {
    "config_ids": [
      "default/parallel-rails-config"
    ],
    "log": {
      "activated_rails": [
        {
          "name": "content safety check input $model=content_safety",
          "type": "input",
          "decisions": [
            "continue"
          ],
          "stop": false
        },
        {
          "name": "topic safety check input $model=topic_control",
          "type": "input",
          "decisions": [
            "continue"
          ],
          "stop": false
        }
      ]
    }
  }
}
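
To act on this data programmatically, the list can be reduced to a short summary. The sketch below works on a plain dictionary shaped like the example response; `summarize_rails` is a hypothetical helper. With a live response, `response.model_dump()` should yield such a dictionary, assuming the client preserves the extra `guardrails_data` field.

```python
def summarize_rails(payload: dict) -> list[tuple[str, str, bool]]:
    """Return (name, type, stop) for every rail listed in the payload."""
    log = payload.get("guardrails_data", {}).get("log", {})
    return [
        (rail["name"], rail["type"], rail["stop"])
        for rail in log.get("activated_rails", [])
    ]


# Sample data copied from the example response.
sample = {
    "guardrails_data": {
        "log": {
            "activated_rails": [
                {
                    "name": "content safety check input $model=content_safety",
                    "type": "input",
                    "decisions": ["continue"],
                    "stop": False,
                },
                {
                    "name": "topic safety check input $model=topic_control",
                    "type": "input",
                    "decisions": ["continue"],
                    "stop": False,
                },
            ]
        }
    }
}

summary = summarize_rails(sample)
# Names of rails whose stop flag is set, that is, rails that ended the request.
blocking = [name for name, _, stop in summary if stop]
print(summary)
print("blocked by:", blocking or "nothing")
```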

Cleanup

client.inference.virtual_models.delete(name="guarded-parallel-rails")
client.guardrail.configs.delete(name=config_name)
print("Cleanup complete")