# Overview of the NeMo Guardrails Python APIs

> RailsConfig and LLMRails core classes for generating guarded responses.

The NeMo Guardrails library Python API provides two core classes for running guardrails:

* **`RailsConfig`**: Loads and manages guardrails configuration from files or content.
* **`LLMRails`**: The main interface for generating responses with guardrails applied.

When you create these objects, the library loads the configuration files you created in the previous chapter, [About Configuring Guardrails](/configure-guardrails/configure-rails).
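`RailsConfig` can also be built from in-memory content instead of files, which is convenient for tests and small experiments. The following is a minimal sketch using `RailsConfig.from_content()`. The `nim` engine and the model name are placeholder values chosen for illustration, not values taken from this page, and `build_config_from_content` is a hypothetical helper name.

```python
# Minimal in-memory configuration (placeholder values, adjust to your setup).
YAML_CONTENT = """
models:
  - type: main
    engine: nim
    model: meta/llama-3.1-8b-instruct
"""


def build_config_from_content(yaml_content: str = YAML_CONTENT):
    """Build a RailsConfig from a YAML string instead of a directory."""
    # Deferred import: the library is loaded only when a config is built.
    from nemoguardrails import RailsConfig

    return RailsConfig.from_content(yaml_content=yaml_content)
```

The returned object can be passed to `LLMRails` in the same way as a configuration loaded with `RailsConfig.from_path()`.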

## Quick Start

The following steps show how to run a sample guardrailed chat request using the NeMo Guardrails library Python API.

### Prerequisites

Meet the following prerequisites to use the NeMo Guardrails library Python API.

1. If you haven't already, install the NeMo Guardrails library with the `nvidia` extra, following the instructions in [Installation](/get-started/installation-guide).

2. Set up an environment variable for your NVIDIA API key.

   ```console
   export NVIDIA_API_KEY="your-nvidia-api-key"
   ```

   This is required to access NVIDIA-hosted models on [build.nvidia.com](https://build.nvidia.com). The provided example configurations ([examples/configs](https://github.com/NVIDIA-NeMo/Guardrails/tree/develop/examples/configs)) and code examples throughout the documentation use NVIDIA-hosted models.

### Run a Sample Guardrailed Chat Request

The following example shows the minimal code to load a prepared configuration directory and generate a response using the `LLMRails` class. The path assumes that you run the script from the root of a clone of the Guardrails repository, so that the `content_safety` example under `examples/configs` is available.

```python
from nemoguardrails import LLMRails, RailsConfig

# Load the content_safety example configuration
# (path relative to the root of the Guardrails repository)
config = RailsConfig.from_path("examples/configs/content_safety")

# Create the LLMRails instance
rails = LLMRails(config)

# Generate a response; each message needs only "role" and "content"
response = rails.generate(messages=[
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
])
print(response["content"])
```
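For a multi-turn conversation, pass the accumulated history in `messages` on every call. The helper below is a hypothetical sketch (the name `chat_turn` is not part of the library). It assumes `rails` is an `LLMRails` instance and that `generate()` returns a message dictionary with `role` and `content` keys, as in the example above.

```python
from typing import Dict, List

Message = Dict[str, str]


def chat_turn(rails, history: List[Message], user_text: str) -> str:
    """Send one user turn and record both sides of it in `history`."""
    history.append({"role": "user", "content": user_text})
    # `rails` is assumed to be an already constructed LLMRails instance.
    response = rails.generate(messages=history)
    history.append({
        "role": response.get("role", "assistant"),
        "content": response["content"],
    })
    return response["content"]
```

Keeping the history in a plain list owned by the caller means the same `LLMRails` instance can serve many independent conversations.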

## Lifetime and Performance

Construct `LLMRails` once per process and reuse it across requests. Initialization validates configuration, compiles prompt templates, and loads the embedding model (FastEmbed). On a typical developer machine, this work takes on the order of several hundred milliseconds.

After construction, individual `generate()` / `generate_async()` calls do not repeat that startup work, so reusing a single `LLMRails` instance is significantly faster than building a new one per request.

For serverless or FaaS handlers, this initialization cost is paid once per cold start, not per request. Cache the `LLMRails` instance in module scope (or a singleton) so warm invocations skip the setup:

```python
from nemoguardrails import LLMRails, RailsConfig

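# Built once at import time, so the cost is paid only on a cold start.
_rails = LLMRails(RailsConfig.from_path("./config"))

def handler(event, context):
    # Warm invocations reuse the module-level instance.
    return _rails.generate(messages=event["messages"])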
```
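If building the instance at import time is undesirable (for example, when the module is also imported by unit tests), a lazily initialized singleton gives the same reuse. The following is a sketch based on `functools.lru_cache`. The name `get_rails` and the `./config` path are illustrative and not part of the library.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_rails(config_path: str = "./config"):
    """Build LLMRails on first use, then return the cached instance."""
    # Deferred import so the startup cost is paid on the first call only.
    from nemoguardrails import LLMRails, RailsConfig

    return LLMRails(RailsConfig.from_path(config_path))


def handler(event, context):
    # Warm invocations reuse the instance cached by get_rails().
    return get_rails().generate(messages=event["messages"])
```

Note that `lru_cache` keys on the call arguments, so call `get_rails()` the same way everywhere to avoid building a second instance.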

## When to Use Each API

| API                                             | Use Case                                        |
| ----------------------------------------------- | ----------------------------------------------- |
| `generate()` / `generate_async()`               | Standard chat interactions with messages        |
| `stream_async()`                                | Real-time token streaming                       |
| `generate_events()` / `generate_events_async()` | Low-level event control for custom integrations |
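
As a sketch of the streaming row in the table, the coroutine below consumes `stream_async()` chunk by chunk. It assumes that `rails` is an `LLMRails` instance whose configuration and model support streaming and that each chunk is a string. The helper name `stream_reply` is hypothetical.

```python
async def stream_reply(rails, user_text: str) -> str:
    """Print chunks as they arrive and return the full reply."""
    chunks = []
    messages = [{"role": "user", "content": user_text}]
    # stream_async() is consumed as an async iterator of text chunks.
    async for chunk in rails.stream_async(messages=messages):
        print(chunk, end="", flush=True)
        chunks.append(chunk)
    print()
    return "".join(chunks)
```

From synchronous code, the coroutine can be driven with `asyncio.run(stream_reply(rails, "Tell me a joke"))`.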

## Synchronous vs Asynchronous

The NeMo Guardrails library provides both synchronous and asynchronous methods:

| Synchronous         | Asynchronous              | Description                        |
| ------------------- | ------------------------- | ---------------------------------- |
| `generate()`        | `generate_async()`        | Generate responses from messages   |
| `generate_events()` | `generate_events_async()` | Generate events from event history |
| -                   | `stream_async()`          | Stream tokens asynchronously       |

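In async code, use the asynchronous methods (`generate_async`, `stream_async`) so that LLM and guardrail calls do not block the event loop. The synchronous `generate()` method cannot be called from code that is already running inside an event loop; use `await generate_async()` there instead.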
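
The sketch below shows the asynchronous path: several questions are sent concurrently with `asyncio.gather()` through a single shared `LLMRails` instance. The helper names `ask` and `ask_many` are hypothetical, and `rails` is assumed to be an already constructed `LLMRails` object.

```python
import asyncio
from typing import List


async def ask(rails, user_text: str) -> str:
    """Await one guarded response and return its text."""
    response = await rails.generate_async(
        messages=[{"role": "user", "content": user_text}]
    )
    return response["content"]


async def ask_many(rails, questions: List[str]) -> List[str]:
    """Run several requests concurrently; results keep the input order."""
    return await asyncio.gather(*(ask(rails, q) for q in questions))
```

Because the requests overlap while each one waits on the model, the total wall-clock time is typically closer to the slowest request than to the sum of all of them.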