# nemoguardrails.rails.llm.buffer

## Module Contents

### Classes

| Name                                                                | Description                                                          |
| ------------------------------------------------------------------- | -------------------------------------------------------------------- |
| [`BufferStrategy`](#nemoguardrails-rails-llm-buffer-BufferStrategy) | Abstract base class for buffer strategies in streaming output rails. |
| [`ChunkBatch`](#nemoguardrails-rails-llm-buffer-ChunkBatch)         | Represents a batch of processed chunks from a buffer strategy.       |
| [`RollingBuffer`](#nemoguardrails-rails-llm-buffer-RollingBuffer)   | A rolling buffer strategy for streaming output rails processing.     |

### Functions

| Name                                                                          | Description                                            |
| ----------------------------------------------------------------------------- | ------------------------------------------------------ |
| [`get_buffer_strategy`](#nemoguardrails-rails-llm-buffer-get_buffer_strategy) | Create a buffer strategy from the given configuration. |

### Data

[`__all__`](#nemoguardrails-rails-llm-buffer-__all__)

### API

```python
class nemoguardrails.rails.llm.buffer.BufferStrategy()
```

abstract

Abstract base class for buffer strategies in streaming output rails.

This class defines the interface for buffer strategies that manage how
streaming chunks are buffered and processed for output rails.
Concrete implementations should handle the accumulation and yielding of
chunks in a way that optimizes output rails processing while maintaining
streaming performance.

The interface separates concerns:

* Buffer management logic (`process_stream`)
* Chunk representation formatting (`format_chunks`)

```python
nemoguardrails.rails.llm.buffer.BufferStrategy.__call__(
    streaming_handler
) -> typing.AsyncGenerator[nemoguardrails.rails.llm.buffer.ChunkBatch, None]
```

async

Callable interface that delegates to `process_stream`.

Subclasses can extend it to add common functionality such as
validation, logging, or error handling.

**Parameters:**

`streaming_handler`: An async iterator that yields individual string
chunks from the LLM stream.
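The delegation pattern described here can be illustrated with a minimal, self-contained sketch. It does not import the library: `SketchStrategy`, `PassThrough`, and `fake_llm_stream` are hypothetical stand-ins, and batches are plain lists rather than `ChunkBatch` objects.

```python
import asyncio
from abc import ABC, abstractmethod


class SketchStrategy(ABC):
    """Hypothetical stand-in for a buffer strategy base class."""

    @abstractmethod
    def process_stream(self, streaming_handler):
        """Strategy-specific buffering; implemented by subclasses."""

    async def __call__(self, streaming_handler):
        # Shared behavior (validation, logging, error handling) could be
        # added here before handing off to the strategy-specific logic.
        async for batch in self.process_stream(streaming_handler):
            yield batch


class PassThrough(SketchStrategy):
    """Toy strategy: every incoming chunk becomes its own batch."""

    async def process_stream(self, streaming_handler):
        async for chunk in streaming_handler:
            yield [chunk]


async def fake_llm_stream():
    # Stand-in for the async iterator of string chunks from the LLM.
    for token in ["Hello", ",", " world"]:
        yield token


async def collect(strategy, stream):
    # Calling the strategy object returns an async generator of batches.
    return [batch async for batch in strategy(stream)]


batches = asyncio.run(collect(PassThrough(), fake_llm_stream()))
print(batches)  # [['Hello'], [','], [' world']]
```

Callers only ever invoke the strategy object itself, so behavior added in `__call__` applies to every concrete strategy without changes to `process_stream`.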

```python
nemoguardrails.rails.llm.buffer.BufferStrategy.format_chunks(
    chunks: typing.List[str]
) -> str
```

abstract

Format chunks into a string representation for user consumption.

This method defines how chunks should be formatted into a string
representation. Different strategies might join chunks differently
(e.g., preserving spaces, adding separators, etc.).

**Parameters:**

`chunks`: List of chunk tokens to be formatted.

**Returns:** `str`

String representation of the chunks ready for consumers.

```python
nemoguardrails.rails.llm.buffer.BufferStrategy.from_config(
    config: nemoguardrails.rails.llm.config.OutputRailsStreamingConfig
) -> nemoguardrails.rails.llm.buffer.BufferStrategy
```

classmethod

abstract

Create a buffer strategy instance from configuration.

**Parameters:**

`config`: Configuration object containing
buffer strategy parameters.

**Returns:** `BufferStrategy`

A configured buffer strategy instance.

```python
nemoguardrails.rails.llm.buffer.BufferStrategy.process_stream(
    streaming_handler
) -> typing.AsyncGenerator[nemoguardrails.rails.llm.buffer.ChunkBatch, None]
```

async

abstract

Process streaming chunks and yield chunk batches.

This is the main method that concrete buffer strategies must implement.
It defines how chunks from the streaming handler should be buffered,
processed, and yielded as ChunkBatch objects.

**Parameters:**

`streaming_handler`: An async iterator that yields individual string
chunks from the LLM stream.

```python
class nemoguardrails.rails.llm.buffer.ChunkBatch()
```

**Bases:** `NamedTuple`

Represents a batch of processed chunks from a buffer strategy.

This class contains the raw chunk data from buffer processing. For a string
representation of the chunks, use the buffer strategy's `format_chunks()` method.
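This reference does not list the fields of `ChunkBatch`, so the sketch below only illustrates the general shape of a `NamedTuple` that carries raw chunk lists. The class name `ChunkBatchSketch` and both field names are assumptions for illustration and may not match the library.

```python
from typing import List, NamedTuple


class ChunkBatchSketch(NamedTuple):
    """Hypothetical stand-in; the real ChunkBatch fields may differ."""

    processing_context: List[str]  # assumed: context tokens plus new tokens
    user_output_chunks: List[str]  # assumed: only the tokens not yet emitted


batch = ChunkBatchSketch(
    processing_context=["Hello", ",", " world"],
    user_output_chunks=[" world"],
)

# NamedTuple instances unpack like plain tuples and expose named fields.
context, new_chunks = batch
print("".join(context))     # Hello, world
print(repr("".join(new_chunks)))  # ' world'
```

Keeping the batch as raw token lists leaves the choice of string formatting to the strategy that produced it.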

```python
class nemoguardrails.rails.llm.buffer.RollingBuffer(
    buffer_context_size: int = 5,
    buffer_chunk_size: int = 10
)
```

**Bases:** [BufferStrategy](#nemoguardrails-rails-llm-buffer-BufferStrategy)

A rolling buffer strategy for streaming output rails processing.

This strategy accumulates incoming chunks in a buffer and yields them in
batches when the buffer reaches the specified chunk size. It maintains
context from previous chunks to ensure continuity in processing output rails.

The buffer operates by:

1. Accumulating incoming chunks until reaching the chunk size threshold
2. Yielding a processing buffer (with context) and new chunks to process
3. Retaining context tokens for the next processing round
4. Yielding any remaining chunks at the end of the stream

**Parameters:**

`buffer_context_size`: Number of tokens carried over from
previous chunks to provide context for continuity. Defaults to 5.

`buffer_chunk_size`: Number of tokens in each processing
chunk. This determines the size of token blocks on which output
rails are applied. Defaults to 10.
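The accumulate, yield, retain, and flush cycle described for this class can be sketched in plain Python. This is a simplified illustration and not the library's implementation: `rolling_batches`, its parameter names, and the exact slicing are assumptions, and it yields plain tuples instead of `ChunkBatch` objects.

```python
import asyncio
from typing import List


async def rolling_batches(stream, context_size: int = 2, chunk_size: int = 4):
    """Yield (processing_buffer, new_chunks) pairs from a token stream."""
    context: List[str] = []  # tail of tokens that were already yielded
    pending: List[str] = []  # tokens received but not yet yielded

    async for token in stream:
        pending.append(token)
        if len(pending) >= chunk_size:
            # Steps 1-2: threshold reached, yield context plus new tokens.
            yield context + pending, pending
            # Step 3: keep the last `context_size` tokens for the next round.
            context = (context + pending)[-context_size:] if context_size else []
            pending = []

    # Step 4: flush whatever is left when the stream ends.
    if pending:
        yield context + pending, pending


async def letters():
    # Stand-in for an LLM stream: ten single-character tokens.
    for ch in "abcdefghij":
        yield ch


async def collect():
    return [pair async for pair in rolling_batches(letters())]


batches = asyncio.run(collect())
for window, new in batches:
    print(window, new)
# ['a', 'b', 'c', 'd'] ['a', 'b', 'c', 'd']
# ['c', 'd', 'e', 'f', 'g', 'h'] ['e', 'f', 'g', 'h']
# ['g', 'h', 'i', 'j'] ['i', 'j']
```

Each window after the first begins with the last two tokens of the previous one, so a rail that inspects the window can catch content that spans a batch boundary, while the second element lists only the tokens that have not been emitted yet.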

```python
nemoguardrails.rails.llm.buffer.RollingBuffer.format_chunks(
    chunks: typing.List[str]
) -> str
```

Generate string representation of chunks preserving original token format.

The RollingBuffer strategy preserves the original token format by
joining chunks without modification, maintaining spaces and formatting
as they appeared in the original LLM output.

**Parameters:**

`chunks`: List of chunk tokens to be formatted.

**Returns:** `str`

String representation preserving original token spacing and format.
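As a rough illustration of why joining without modification preserves the original format, the sketch below concatenates tokens with an empty separator and contrasts that with a naive space join. `format_chunks_sketch` is a hypothetical helper, assumed here to behave like a plain string concatenation.

```python
from typing import List


def format_chunks_sketch(chunks: List[str]) -> str:
    # Join with no separator so whitespace carried by the tokens survives.
    return "".join(chunks)


tokens = ["The", " quick", " brown", " fox", "."]

preserved = format_chunks_sketch(tokens)
naive = " ".join(tokens)

print(preserved)  # The quick brown fox.
print(naive)      # The  quick  brown  fox .
```

LLM tokens usually carry their own leading spaces, so adding a separator doubles the spacing and detaches punctuation.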

```python
nemoguardrails.rails.llm.buffer.RollingBuffer.from_config(
    config: nemoguardrails.rails.llm.config.OutputRailsStreamingConfig
)
```

classmethod

Create a RollingBuffer instance from a streaming configuration.

**Parameters:**

`config`: Configuration object containing the
`context_size` and `chunk_size` parameters.

**Returns:** `RollingBuffer`

A new `RollingBuffer` instance configured with the
provided parameters.
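The mapping from configuration fields to constructor arguments can be sketched without the library. Both classes below are hypothetical stand-ins: `StreamingConfigSketch` models only the two fields named above, and `RollingBufferSketch` only stores the two sizes.

```python
from dataclasses import dataclass


@dataclass
class StreamingConfigSketch:
    """Hypothetical stand-in for OutputRailsStreamingConfig."""

    context_size: int
    chunk_size: int


class RollingBufferSketch:
    """Hypothetical stand-in that only stores the two sizes."""

    def __init__(self, buffer_context_size: int = 5, buffer_chunk_size: int = 10):
        self.buffer_context_size = buffer_context_size
        self.buffer_chunk_size = buffer_chunk_size

    @classmethod
    def from_config(cls, config: StreamingConfigSketch) -> "RollingBufferSketch":
        # Map the config field names onto the constructor argument names.
        return cls(
            buffer_context_size=config.context_size,
            buffer_chunk_size=config.chunk_size,
        )


buffer = RollingBufferSketch.from_config(
    StreamingConfigSketch(context_size=3, chunk_size=8)
)
print(buffer.buffer_context_size, buffer.buffer_chunk_size)  # 3 8
```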

```python
nemoguardrails.rails.llm.buffer.RollingBuffer.process_stream(
    streaming_handler
) -> typing.AsyncGenerator[nemoguardrails.rails.llm.buffer.ChunkBatch, None]
```

async

Process streaming chunks using rolling buffer strategy.

This method implements the rolling buffer logic, accumulating chunks
and yielding them in batches with context for output rails processing.
The buffer maintains a sliding window of context tokens for continuity.

**Parameters:**

`streaming_handler`: An async iterator that yields individual string
chunks from the LLM stream.
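One way a consumer might use the yielded batches is to check each context window before releasing the new tokens to the user. The sketch below is an assumption about usage, not library code: the batches are canned tuples, and `passes_rail` is a hypothetical check that stands in for an output rail.

```python
import asyncio
from typing import List


async def canned_batches():
    # Stand-in for rolling-buffer output: (context + new tokens, new tokens).
    yield ["The", " sky"], ["The", " sky"]
    yield [" sky", " is", " SECRET"], [" is", " SECRET"]
    yield [" SECRET", " today"], [" today"]


def passes_rail(text: str) -> bool:
    # Hypothetical output rail: block any window that mentions "SECRET".
    return "SECRET" not in text


async def guarded_output(batches) -> List[str]:
    emitted: List[str] = []
    async for window, new_tokens in batches:
        if not passes_rail("".join(window)):
            break  # stop streaming as soon as a window fails the check
        emitted.extend(new_tokens)
    return emitted


result = asyncio.run(guarded_output(canned_batches()))
print(result)  # ['The', ' sky']
```

The check runs on the full window, including carried-over context, while only the new tokens are emitted, so no token reaches the user twice.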

```python
nemoguardrails.rails.llm.buffer.get_buffer_strategy(
    config: nemoguardrails.rails.llm.config.OutputRailsStreamingConfig
) -> nemoguardrails.rails.llm.buffer.BufferStrategy
```

Create a buffer strategy from the given configuration.

**Parameters:**

`config`: Configuration object specifying
the buffer strategy parameters.

**Returns:** `BufferStrategy`

A configured buffer strategy instance. Currently
returns a RollingBuffer instance.

```python
nemoguardrails.rails.llm.buffer.__all__ = ['ChunkBatch', 'BufferStrategy', 'RollingBuffer', 'get_buffer_strategy']
```