
# nemoguardrails.integrations.langchain.providers.trtllm.llm

A LangChain LLM component for connecting to a Triton + TensorRT-LLM backend.

## Module Contents

### Classes

| Name                                                                           | Description                                                             |
| ------------------------------------------------------------------------------ | ----------------------------------------------------------------------- |
| [`TRTLLM`](#nemoguardrails-integrations-langchain-providers-trtllm-llm-TRTLLM) | A custom LangChain LLM class that integrates with TRT-LLM Triton models. |

### Data

[`BAD_WORDS`](#nemoguardrails-integrations-langchain-providers-trtllm-llm-BAD_WORDS)

[`RANDOM_SEED`](#nemoguardrails-integrations-langchain-providers-trtllm-llm-RANDOM_SEED)

[`STOP_WORDS`](#nemoguardrails-integrations-langchain-providers-trtllm-llm-STOP_WORDS)

### API

```python
class nemoguardrails.integrations.langchain.providers.trtllm.llm.TRTLLM()
```

**Bases:** `BaseLLM`

A custom LangChain LLM class that integrates with TRT-LLM Triton models.

Arguments:

| Name                 | Description                                                              |
| -------------------- | ------------------------------------------------------------------------ |
| `server_url`         | (str) The URL of the Triton inference server to use.                     |
| `model_name`         | (str) The name of the Triton TRT model to use.                           |
| `temperature`        | (float) The temperature to use for sampling.                             |
| `top_p`              | (float) The top-p value to use for sampling.                             |
| `top_k`              | (int) The top-k value to use for sampling.                               |
| `beam_width`         | (int) The number of beams to use for beam search.                        |
| `repetition_penalty` | (float) The penalty to apply to repeated tokens.                         |
| `length_penalty`     | (float) The penalty to apply to the length of the generated sequence.    |
| `tokens`             | (int) The maximum number of tokens to generate.                          |
| `client`             | The client object used to communicate with the inference server.         |
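
As a rough illustration of how the sampling arguments above fit together, the sketch below collects them in a small container and checks that each value is in a sensible range. `SamplingParams`, its default values, and the range checks are hypothetical and are not part of `nemoguardrails`; the sketch assumes the numeric types one would normally expect for sampling controls.

```python
from dataclasses import dataclass, asdict


@dataclass
class SamplingParams:
    """Hypothetical container mirroring the documented sampling arguments."""

    # Default values are illustrative only, not the library's defaults.
    temperature: float = 1.0
    top_p: float = 0.9
    top_k: int = 40
    beam_width: int = 1
    repetition_penalty: float = 1.0
    length_penalty: float = 1.0
    tokens: int = 100

    def validate(self) -> None:
        """Raise ValueError if a value falls outside its usual range."""
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not 0.0 <= self.top_p <= 1.0:
            raise ValueError("top_p must be in [0, 1]")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        if self.beam_width < 1:
            raise ValueError("beam_width must be at least 1")
        if self.tokens < 1:
            raise ValueError("tokens must be at least 1")


params = SamplingParams(temperature=0.2, top_k=1, tokens=64)
params.validate()

# Plain dict of keyword arguments that could be forwarded to an LLM constructor.
llm_kwargs = asdict(params)
```

Validating up front keeps malformed values from reaching the inference server, where the resulting error is usually harder to interpret.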

Get all the identifying parameters.

```python
nemoguardrails.integrations.langchain.providers.trtllm.llm.TRTLLM._acall(
    args = (),
    kwargs = {}
)
```

async

Asynchronous version of `_call`.
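
This page does not describe how the asynchronous variant runs. As a general illustration only, and not the library's implementation, one common way to expose a blocking inference call to async code is to run it in a worker thread. `blocking_generate` and `generate_async` are hypothetical names, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def blocking_generate(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous inference call."""
    time.sleep(0.01)  # simulate network latency
    return prompt.upper()


async def generate_async(prompt: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_generate, prompt)


result = asyncio.run(generate_async("hello"))
```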

```python
nemoguardrails.integrations.langchain.providers.trtllm.llm.TRTLLM._call(
    prompt: str,
    stop: typing.Optional[typing.List[str]] = None,
    run_manager: typing.Optional[langchain_core.callbacks.manager.CallbackManagerForLLMRun] = None,
    kwargs: typing.Any = {}
) -> str
```

Execute an inference request.

**Parameters:**

`prompt`: The prompt to pass into the model.

`stop`: A list of strings that stop generation when encountered.

**Returns:** `str`

The string generated by the model.
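
To make the role of a stop list concrete, the sketch below cuts a generated string at the earliest occurrence of any stop string. This shows the general technique only; whether `TRTLLM` applies stop strings on the client or on the server is not stated here, and `truncate_at_stop` is a hypothetical helper.

```python
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Return text cut at the earliest occurrence of any stop string."""
    if not stop:
        return text
    cut = len(text)
    for marker in stop:
        if not marker:
            continue  # an empty string would match at index 0
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


print(truncate_at_stop("Hello there.</s>ignored", ["</s>"]))
```

Empty strings in the stop list are skipped because `str.find("")` returns 0, which would otherwise truncate every output to nothing.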

```python
nemoguardrails.integrations.langchain.providers.trtllm.llm.TRTLLM.validate_environment(
    values: typing.Dict[str, typing.Any]
) -> typing.Dict[str, typing.Any]
```

classmethod

Validate that the required Python package is installed in the environment.
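
A check of this kind can be sketched with the standard library alone. The helper below is hypothetical and only shows the general pattern of failing early with a clear message when a dependency is missing; the package that `validate_environment` actually looks for is not named on this page.

```python
import importlib.util


def require_package(module_name: str) -> None:
    """Raise ImportError with a hint if module_name cannot be imported."""
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Could not import '{module_name}'. Install it before using this provider."
        )


require_package("json")  # standard library module, so this passes
```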

```python
nemoguardrails.integrations.langchain.providers.trtllm.llm.BAD_WORDS = ['']
```

```python
nemoguardrails.integrations.langchain.providers.trtllm.llm.RANDOM_SEED = 0
```

```python
nemoguardrails.integrations.langchain.providers.trtllm.llm.STOP_WORDS = ['</s>']
```