# nemoguardrails.integrations.langchain.providers.trtllm.client

## Module Contents

### Classes

| Name                                                                                          | Description                                                    |
| --------------------------------------------------------------------------------------------- | -------------------------------------------------------------- |
| [`TritonClient`](#nemoguardrails-integrations-langchain-providers-trtllm-client-TritonClient) | An abstraction of the connection to a Triton Inference Server. |

### Data

[`BAD_WORDS`](#nemoguardrails-integrations-langchain-providers-trtllm-client-BAD_WORDS)

[`RANDOM_SEED`](#nemoguardrails-integrations-langchain-providers-trtllm-client-RANDOM_SEED)

[`STOP_WORDS`](#nemoguardrails-integrations-langchain-providers-trtllm-client-STOP_WORDS)

### API

```python
class nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient(
    server_url: str
)
```

An abstraction of the connection to a Triton Inference Server.

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.close_streaming() -> None
```

Close the streaming connection.

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.generate_inputs(
    prompt: str,
    tokens: int = 32,
    temperature: float = 0.5,
    top_k: float = 0,
    top_p: float = 0.9,
    beam_width: int = 1,
    repetition_penalty: float = 1,
    length_penalty: float = 1.0
) -> typing.List['grpcclient.InferInput']
```

staticmethod

Create the inputs for the Triton Inference Server.
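The defaults in the signature above can be gathered in one place before a request is built. The following is a minimal sketch using only the standard library; `DOCUMENTED_DEFAULTS` and `build_sampling_kwargs` are hypothetical names introduced for illustration and are not part of this module.

```python
from typing import Any, Dict

# Defaults copied from the documented generate_inputs signature.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "tokens": 32,
    "temperature": 0.5,
    "top_k": 0,
    "top_p": 0.9,
    "beam_width": 1,
    "repetition_penalty": 1,
    "length_penalty": 1.0,
}


def build_sampling_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        # Reject names that generate_inputs does not document.
        raise TypeError(f"unexpected parameter(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


kwargs = build_sampling_kwargs(temperature=0.2, tokens=64)
# With the package installed, these could presumably be passed as
# TritonClient.generate_inputs(prompt="...", **kwargs).
```

Rejecting unknown names early keeps a misspelled parameter from being silently dropped before the request reaches the server.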

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.generate_outputs() -> typing.List['grpcclient.InferRequestedOutput']
```

staticmethod

Generate the expected output structure.

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.get_model_concurrency(
    model_name: str,
    timeout: int = 1000
) -> int
```

Get the model concurrency.

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.get_model_list() -> typing.List[str]
```

Get a list of models loaded in the Triton server.

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.load_model(
    model_name: str,
    timeout: int = 1000
) -> None
```

Load a model into the server.

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.prepare_tensor(
    name: str,
    input_data: typing.Any
) -> 'grpcclient.InferInput'
```

staticmethod

Prepare an input data structure.

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.process_result(
    result: typing.Dict[str, str]
) -> typing.Dict[str, str]
```

staticmethod

Post-process the result from the server.

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.request_streaming(
    model_name: str,
    result_queue: queue.Queue[typing.Union[typing.Optional[typing.Dict[str, str]], str]],
    params: typing.Any = {}
) -> None
```

Request a streaming connection.

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.send_prompt_streaming(
    model_name: str,
    request_inputs: typing.Any,
    request_outputs: typing.Optional[typing.Any],
    result_queue: queue.Queue[typing.Union[typing.Optional[typing.Dict[str, str]], str]]
) -> None
```

Send the prompt and start streaming the result.

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.stream_callback(
    result_queue: queue.Queue[typing.Union[typing.Optional[typing.Dict[str, str]], str]],
    result: typing.Any,
    error: str
) -> None
```

Add streamed result to queue.
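The streaming methods share a `result_queue` whose items are typed as a dictionary, `None`, or a string. A minimal sketch of the consumer side of that pattern follows, using only the standard library. It rests on assumptions the reference does not confirm: that `None` marks the end of the stream, that a bare string carries an error message, and that dictionaries hold a `"text"` key. `fake_stream` and `drain` are hypothetical stand-ins, not functions of this module.

```python
import queue
import threading
from typing import Dict, Iterator, Optional, Union

Item = Union[Optional[Dict[str, str]], str]


def fake_stream(result_queue: "queue.Queue[Item]") -> None:
    """Hypothetical producer standing in for the server callback."""
    for piece in ("Hello", ", ", "world"):
        result_queue.put({"text": piece})  # assumed payload shape
    result_queue.put(None)  # assumed end-of-stream sentinel


def drain(result_queue: "queue.Queue[Item]", timeout: float = 5.0) -> Iterator[str]:
    """Yield text chunks until the assumed None sentinel arrives."""
    while True:
        item = result_queue.get(timeout=timeout)
        if item is None:
            return
        if isinstance(item, str):
            # Assumption: a bare string in the queue is an error message.
            raise RuntimeError(item)
        yield item.get("text", "")


results: "queue.Queue[Item]" = queue.Queue()
worker = threading.Thread(target=fake_stream, args=(results,))
worker.start()
text = "".join(drain(results))
worker.join()
print(text)  # Hello, world
```

The timeout on `get` keeps the consumer from blocking forever if the producer stops without sending the sentinel; in that case `queue.Empty` is raised.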

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.BAD_WORDS = ['']
```

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.RANDOM_SEED = 0
```

```python
nemoguardrails.integrations.langchain.providers.trtllm.client.STOP_WORDS = ['</s>']
```
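
This reference does not describe how the module applies these constants. As an illustration only, the sketch below shows one way a stop word such as `</s>` could be trimmed from generated text on the client side. `trim_at_stop_word` is a hypothetical helper, and the constant is redefined locally so the example runs without the package.

```python
from typing import Sequence

STOP_WORDS = ["</s>"]  # value copied from the documented constant


def trim_at_stop_word(text: str, stop_words: Sequence[str] = STOP_WORDS) -> str:
    """Hypothetical helper: cut text at the earliest stop word, if any."""
    cut = len(text)
    for word in stop_words:
        if not word:
            continue  # skip empty entries such as the one in BAD_WORDS
        index = text.find(word)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


print(trim_at_stop_word("The answer is 42.</s> extra"))  # The answer is 42.
```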