nemoguardrails.integrations.langchain.providers.trtllm.llm

A LangChain LLM component for connecting to a Triton + TensorRT-LLM backend.

Module Contents

Classes

| Name | Description |
| --- | --- |
| `TRTLLM` | A custom LangChain LLM class that integrates with TRT-LLM Triton models. |

Data

BAD_WORDS

RANDOM_SEED

STOP_WORDS

API

class nemoguardrails.integrations.langchain.providers.trtllm.llm.TRTLLM()

Bases: BaseLLM

A custom LangChain LLM class that integrates with TRT-LLM Triton models.

Arguments:

- server_url (str): The URL of the Triton inference server to use.
- model_name (str): The name of the Triton TRT model to use.
- temperature (float): The temperature to use for sampling.
- top_p (float): The top-p value to use for sampling.
- top_k (int): The top-k value to use for sampling.
- beam_width (int): The beam width to use for beam search.
- repetition_penalty (float): The penalty to apply to repeated tokens.
- length_penalty (float): The length penalty to apply during generation.
- tokens (int): The maximum number of tokens to generate.
- client: The client object used to communicate with the inference server.

_get_model_default_parameters

Dict[str, Any]

_identifying_params

Dict[str, Any]

Get all the identifying parameters.

_invocation_params

Dict[str, Any]

_llm_type

str

beam_width

Optional[int] = 1

client

Any = Field(default=None, exclude=True)

length_penalty

Optional[float] = 1.0

model_name

str = 'ensemble'

repetition_penalty

Optional[float] = 1.0

server_url

str = Field(None, alias='server_url')

streaming

Optional[bool] = True

temperature

Optional[float] = 1.0

tokens

Optional[int] = 100

top_k

Optional[int] = 1

top_p

Optional[float] = 0
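
The attribute defaults listed above can be collected into a settings dictionary before constructing the class. The following is a minimal sketch and not part of the library: `DOCUMENTED_DEFAULTS` and `trtllm_settings` are hypothetical names, the server address is a placeholder, and the commented-out constructor call assumes that `nemoguardrails` and a Triton client are installed.

```python
from typing import Any, Dict

# Defaults copied from the attribute list in this reference.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "model_name": "ensemble",
    "temperature": 1.0,
    "top_p": 0,
    "top_k": 1,
    "beam_width": 1,
    "repetition_penalty": 1.0,
    "length_penalty": 1.0,
    "tokens": 100,
    "streaming": True,
}


def trtllm_settings(server_url: str, **overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown TRTLLM fields: {sorted(unknown)}")
    return {"server_url": server_url, **DOCUMENTED_DEFAULTS, **overrides}


# "localhost:8001" is a placeholder address, not a documented default.
settings = trtllm_settings("localhost:8001", temperature=0.2, tokens=256)

# With the package installed, the settings could be passed to the class:
# from nemoguardrails.integrations.langchain.providers.trtllm.llm import TRTLLM
# llm = TRTLLM(**settings)
```

Rejecting unknown keys up front is a design choice of this sketch; it catches misspelled field names before they reach the model class.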

nemoguardrails.integrations.langchain.providers.trtllm.llm.TRTLLM._acall(
    *args,
    **kwargs
)

async

Async version.

nemoguardrails.integrations.langchain.providers.trtllm.llm.TRTLLM._call(
    prompt: str,
    stop: typing.Optional[typing.List[str]] = None,
    run_manager: typing.Optional[langchain_core.callbacks.manager.CallbackManagerForLLMRun] = None,
    **kwargs: typing.Any
) -> str

Execute an inference request.

Parameters:

prompt

str

The prompt to pass into the model.

stop

Optional[List[str]], defaults to None

A list of strings that stop generation when encountered.

Returns: str

The string generated by the model.
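
To illustrate what a `stop` list means for the returned string, the sketch below truncates text at the earliest stop sequence. It is an illustration only: `truncate_at_stop` is a hypothetical helper, and this reference does not state whether the class applies stop sequences on the server or on the client. The fallback to `['</s>']` mirrors the documented `STOP_WORDS` value and is likewise an assumption about how a default might be chosen.

```python
from typing import List, Optional

STOP_WORDS = ["</s>"]  # same value as the module-level constant in this reference


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Cut text at the earliest occurrence of any stop sequence."""
    sequences = stop if stop else STOP_WORDS
    cut = len(text)
    for seq in sequences:
        if not seq:
            continue  # an empty string would match at index 0
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Falls back to STOP_WORDS when no stop list is given.
trimmed = truncate_at_stop("Hello</s>ignored")
```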

nemoguardrails.integrations.langchain.providers.trtllm.llm.TRTLLM.validate_environment() -> 'TRTLLM'

Validate that the required Python package exists in the environment.
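
A check of this kind can be sketched with the standard library. This is not the class's actual validator: `require_package` is a hypothetical helper, and this reference does not name the package being checked (a Triton client package is a plausible candidate, but that is an assumption).

```python
import importlib.util


def require_package(name: str, install_hint: str = "") -> None:
    """Raise ImportError if a top-level package cannot be found."""
    if importlib.util.find_spec(name) is None:
        message = f"Could not find the '{name}' package."
        if install_hint:
            message += f" {install_hint}"
        raise ImportError(message)


# "json" ships with Python, so this call passes silently.
require_package("json")
```

Using `importlib.util.find_spec` locates the package without importing it, so the check has no import-time side effects.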

nemoguardrails.integrations.langchain.providers.trtllm.llm.BAD_WORDS = ['']

nemoguardrails.integrations.langchain.providers.trtllm.llm.RANDOM_SEED = 0

nemoguardrails.integrations.langchain.providers.trtllm.llm.STOP_WORDS = ['</s>']