> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://docs.nvidia.com/nemo/guardrails/llms.txt.
> For full documentation content, see https://docs.nvidia.com/nemo/guardrails/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://docs.nvidia.com/nemo/guardrails/_mcp/server.

# Streaming Configuration

> Configure streaming for LLM token generation and output rail processing in config.yml.

The NeMo Guardrails library supports streaming out of the box when using the `stream_async()` method. No configuration is required to enable basic streaming.

When you have **output rails** configured, you need to explicitly enable streaming for them to process tokens in chunked mode.

## Quick Example

When using streaming with output rails:

```yaml
rails:
  output:
    flows:
      - self check output
    streaming:
      enabled: True
      chunk_size: 200
      context_size: 50
```

## Streaming Configuration Details

The following guides provide detailed documentation for streaming configuration.

Enable and use streaming mode for LLM responses in real-time in the NeMo Guardrails library.

How To

Configure how output rails process streamed tokens in chunked mode.

Reference