nemoguardrails.integrations.langchain.providers.trtllm.llm
A LangChain LLM component for connecting to a Triton Inference Server running the TensorRT-LLM backend.
Module Contents
Classes
Data
API
Bases: BaseLLM
A custom LangChain LLM class that integrates with TensorRT-LLM models served by Triton.
Arguments:
server_url: (str) The URL of the Triton inference server to use.
model_name: (str) The name of the Triton TensorRT-LLM model to use.
temperature: (float) The temperature to use for sampling.
top_p: (float) The top-p (nucleus) value to use for sampling.
top_k: (int) The number of highest-probability tokens to consider when sampling.
beam_width: (int) The number of beams to use for beam search.
repetition_penalty: (float) The penalty applied to tokens that have already been generated.
length_penalty: (float) The penalty applied based on sequence length during beam search.
tokens: (int) The maximum number of tokens to generate.
client: The client object used to communicate with the inference server.
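To make the argument types concrete, here is a minimal sketch of a settings container holding these values. `TRTLLMSettings`, its default values, and its range checks are hypothetical assumptions for illustration; they are not the class's actual fields, defaults, or validation logic.

```python
from dataclasses import dataclass


@dataclass
class TRTLLMSettings:
    """Hypothetical container mirroring the documented arguments.

    The defaults below are illustrative assumptions, not the library's defaults.
    """

    server_url: str
    model_name: str
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = 1
    beam_width: int = 1
    repetition_penalty: float = 1.0
    length_penalty: float = 1.0
    tokens: int = 100

    def __post_init__(self) -> None:
        # Basic sanity checks on the generation parameters (assumed ranges).
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not 0.0 <= self.top_p <= 1.0:
            raise ValueError("top_p must be in [0, 1]")
        if self.top_k < 0:
            raise ValueError("top_k must be non-negative")
        if self.beam_width < 1:
            raise ValueError("beam_width must be at least 1")
        if self.tokens < 1:
            raise ValueError("tokens must be at least 1")
```

For example, `TRTLLMSettings(server_url="localhost:8001", model_name="ensemble")` builds a configuration with the illustrative defaults; both the URL and the model name are placeholders.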
Get all the identifying parameters.
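In LangChain, the identifying parameters are typically a mapping that describes the model instance, for example in tracing output and when deriving LLM cache keys. The sketch below assumes that use. `identifying_params` and `cache_key` are hypothetical helpers, not part of this module.

```python
from typing import Any, Dict, Mapping


def identifying_params(server_url: str, model_name: str, **sampling: Any) -> Dict[str, Any]:
    """Hypothetical helper: gather the values that distinguish one model configuration."""
    return {"server_url": server_url, "model_name": model_name, **sampling}


def cache_key(params: Mapping[str, Any]) -> str:
    # Sort by parameter name so that keyword order does not affect the key.
    return repr(sorted(params.items()))
```

Two instances configured with the same values produce the same key, and changing any sampling value produces a different one.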
Async version.
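One common way to provide an async counterpart to a blocking call is to run it in a worker thread. The sketch assumes that approach, which is not necessarily how this class implements its async path. `call_model` is a hypothetical stand-in for the synchronous inference request.

```python
import asyncio
import time
from typing import List


def call_model(prompt: str) -> str:
    """Hypothetical blocking inference call standing in for the sync method."""
    time.sleep(0.01)  # simulate network latency
    return prompt.upper()


async def acall_model(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(call_model, prompt)


async def main() -> List[str]:
    # Several requests can be awaited concurrently; results keep input order.
    return await asyncio.gather(*(acall_model(p) for p in ["a", "b"]))
```

`asyncio.to_thread` requires Python 3.9 or later.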
Execute an inference request.
Parameters:
The prompt to pass into the model.
A list of strings; generation stops when any of them is encountered.
Returns: str
The string generated by the model.
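A minimal sketch of how a stop list is commonly honored is to truncate the output at the earliest occurrence of any stop string. This page does not say whether the class truncates on the client side or passes the list to the server. `enforce_stop` is a hypothetical helper.

```python
from typing import List, Optional


def enforce_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Truncate text at the earliest occurrence of any stop string."""
    if not stop:
        return text
    cut = len(text)
    for s in stop:
        if not s:
            continue  # an empty stop string would truncate everything; skip it
        idx = text.find(s)
        if idx != -1 and idx < cut:
            cut = idx
    return text[:cut]
```

Taking the earliest match across all stop strings, rather than the first string in the list that matches, ensures that no stop sequence is ever left in the returned text.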
Validate that the required Python package is installed in the environment.
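This check can be sketched with `importlib.util.find_spec`, which returns `None` when a top-level package cannot be found. The page does not name the required package, so it is left as a parameter. `require_package` is a hypothetical helper, not this module's validator.

```python
import importlib.util


def require_package(name: str, install_hint: str = "") -> None:
    """Raise ImportError if the top-level package `name` is not installed."""
    if importlib.util.find_spec(name) is None:
        msg = f"Could not find the '{name}' package."
        if install_hint:
            msg += f" {install_hint}"
        raise ImportError(msg)
```

Checking with `find_spec` avoids importing the package just to confirm that it exists.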