Overview of the NeMo Guardrails Python APIs

The NeMo Guardrails library Python API provides two core classes for running guardrails:

  • RailsConfig: Loads and manages guardrails configuration from files or content.
  • LLMRails: The main interface for generating responses with guardrails applied.

When you initialize the core classes (RailsConfig and LLMRails), the library loads the configuration files you created in the previous chapter, About Configuring Guardrails.

Quick Start

The following steps show how to run a sample guardrailed chat request using the NeMo Guardrails library Python API.

Prerequisites

Meet the following prerequisites to use the NeMo Guardrails library Python API.

  1. If you haven’t already, install the NeMo Guardrails library with the nvidia extra, following the instructions in Installation.

  2. Set up an environment variable for your NVIDIA API key.

    export NVIDIA_API_KEY="your-nvidia-api-key"

    This is required to access NVIDIA-hosted models on build.nvidia.com. The provided example configurations (examples/configs) and code examples throughout the documentation use NVIDIA-hosted models.
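If the key is missing, the problem may only surface when the first model call fails, so it can help to check for it at startup. The following is a minimal sketch, not part of the library; `require_env` is a hypothetical helper name introduced here for illustration.

```python
import os
from typing import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; it is not part of NeMo Guardrails.
    """
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it before starting the application."
        )
    return value
```

Calling `require_env("NVIDIA_API_KEY")` before constructing LLMRails turns a missing key into an immediate, readable error.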

Run a Sample Guardrailed Chat Request

The following example shows the minimal code to load the prepared configuration files in the config directory and generate a response using the LLMRails class.

from nemoguardrails import LLMRails, RailsConfig

# Load configuration from the config directory
config = RailsConfig.from_path("examples/configs")

# Create the LLMRails instance
rails = LLMRails(config)

# Generate a response
response = rails.generate(messages=[
    {
        "role": "user",
        "content": "What is the capital of France?",
        "config_id": "content_safety"
    }
])
print(response["content"])

Lifetime and Performance

Construct LLMRails once per process and reuse it across requests. Initialization validates configuration, compiles prompt templates, and loads the embedding model (FastEmbed). On a typical developer machine, this work takes on the order of several hundred milliseconds.

After construction, individual generate() / generate_async() calls do not repeat that startup work, so reusing a single LLMRails instance is significantly faster than building a new one per request.

For serverless or FaaS handlers, this initialization cost is paid once per cold start, not per request. Cache the LLMRails instance in module scope (or a singleton) so warm invocations skip the setup:

from nemoguardrails import LLMRails, RailsConfig

_rails = LLMRails(RailsConfig.from_path("./config"))

def handler(event, context):
    return _rails.generate(messages=event["messages"])
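The module-scope pattern above builds the instance at import time. A lazy variant defers construction until the first request and then reuses the cached object. The sketch below is one way to do this with the standard library; `make_cached_factory`, `build_rails`, and `get_rails` are hypothetical names, and the `./config` path is taken from the example above.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_factory(build: Callable[[], Any]) -> Callable[[], Any]:
    """Wrap an expensive constructor so later calls reuse the first result."""

    @lru_cache(maxsize=1)
    def get_instance() -> Any:
        return build()

    return get_instance


def build_rails() -> Any:
    # Imported lazily so the import cost is also deferred to first use.
    from nemoguardrails import LLMRails, RailsConfig

    return LLMRails(RailsConfig.from_path("./config"))


# Hypothetical accessor: the first call constructs LLMRails, later calls reuse it.
get_rails = make_cached_factory(build_rails)


def handler(event, context):
    return get_rails().generate(messages=event["messages"])
```

Eager construction keeps the first request fast at the price of a slower cold start, while the lazy form does the opposite. Note that `lru_cache` does not guard against two threads racing on the very first call, so a multi-threaded server may prefer the eager form or an explicit lock.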

When to Use Each API

API                                            Use Case
generate() / generate_async()                  Standard chat interactions with messages
stream_async()                                 Real-time token streaming
generate_events() / generate_events_async()    Low-level event control for custom integrations
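As an illustration of the streaming row, the sketch below consumes stream_async() with `async for`, printing each chunk as it arrives and returning the assembled text. It assumes that chunks are plain strings and that streaming is supported by the configured model; depending on the library version, streaming may also need to be enabled in the configuration. `stream_reply` is a hypothetical helper name.

```python
from typing import Any, Dict, List


async def stream_reply(rails: Any, messages: List[Dict[str, str]]) -> str:
    """Print streamed chunks as they arrive and return the full reply text.

    `rails` is expected to be an LLMRails instance.
    """
    parts: List[str] = []
    async for chunk in rails.stream_async(messages=messages):
        # Assumption: each chunk is a plain string of generated text.
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

From synchronous code with no event loop running, this could be driven with `asyncio.run(stream_reply(rails, messages))`.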

Synchronous vs Asynchronous

The NeMo Guardrails library provides both synchronous and asynchronous methods:

Synchronous          Asynchronous               Description
generate()           generate_async()           Generate responses from messages
generate_events()    generate_events_async()    Generate events from event history
-                    stream_async()             Stream tokens asynchronously

In async code, use the asynchronous methods (generate_async(), stream_async()) so that guardrail and LLM calls do not block the event loop. The synchronous generate() method cannot be called from within an async context; use generate_async() there instead.
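The sketch below shows one way the asynchronous API might be used to serve several prompts concurrently from a single shared instance. It assumes generate_async() accepts the same `messages` argument as generate() and returns a message dictionary with a "content" key, as in the Quick Start example. The helper names `ask`, `ask_many`, and `ask_many_sync` are hypothetical.

```python
import asyncio
from typing import Any, List


async def ask(rails: Any, prompt: str) -> str:
    """Send one user message through generate_async() and return the reply text."""
    response = await rails.generate_async(
        messages=[{"role": "user", "content": prompt}]
    )
    return response["content"]


async def ask_many(rails: Any, prompts: List[str]) -> List[str]:
    """Run several requests concurrently against one shared instance."""
    # gather() preserves the order of the prompts in its result list.
    return await asyncio.gather(*(ask(rails, p) for p in prompts))


def ask_many_sync(rails: Any, prompts: List[str]) -> List[str]:
    """Entry point for synchronous code with no event loop already running."""
    return asyncio.run(ask_many(rails, prompts))
```

Inside an existing event loop (for example, a web framework's async request handler), call `await ask_many(rails, prompts)` directly rather than `ask_many_sync`, since `asyncio.run()` cannot be used while a loop is already running.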