nemoguardrails.integrations.langchain.providers.trtllm.client


Module Contents

Classes

Name | Description
TritonClient | An abstraction of the connection to a Triton Inference Server.

Data

BAD_WORDS

RANDOM_SEED

STOP_WORDS

API

class nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient(
server_url: str
)

An abstraction of the connection to a Triton Inference Server.

client = grpcclient.InferenceServerClient(server_url)

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.close_streaming() -> None

Close the streaming connection.

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.generate_inputs(
prompt: str,
tokens: int = 32,
temperature: float = 0.5,
top_k: float = 0,
top_p: float = 0.9,
beam_width: int = 1,
repetition_penalty: float = 1,
length_penalty: float = 1.0
) -> typing.List['grpcclient.InferInput']
staticmethod

Create the input tensors for the Triton Inference Server.
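
generate_inputs turns a prompt and its sampling parameters into the list of inputs sent with a request. The sketch below mirrors that step with only the standard library, reusing the documented defaults and the module constants RANDOM_SEED, BAD_WORDS and STOP_WORDS. It is a hypothetical illustration: the tensor names (text_input, max_tokens, and so on) are assumptions, not confirmed by this reference, and plain (name, value) tuples stand in for grpcclient.InferInput objects.

```python
from typing import Any, List, Tuple

# Module-level defaults copied from the documented Data section.
BAD_WORDS = [""]
RANDOM_SEED = 0
STOP_WORDS = ["</s>"]

# (tensor name, value) pairs stand in for grpcclient.InferInput objects.
NamedInput = Tuple[str, Any]


def sketch_generate_inputs(
    prompt: str,
    tokens: int = 32,
    temperature: float = 0.5,
    top_k: float = 0,
    top_p: float = 0.9,
    beam_width: int = 1,
    repetition_penalty: float = 1,
    length_penalty: float = 1.0,
) -> List[NamedInput]:
    """Hypothetical stand-in for TritonClient.generate_inputs.

    The tensor names are illustrative assumptions. The real method
    presumably wraps each value in a grpcclient.InferInput, for example
    via prepare_tensor().
    """
    return [
        ("text_input", [prompt]),
        ("max_tokens", [tokens]),
        ("temperature", [temperature]),
        ("top_k", [top_k]),
        ("top_p", [top_p]),
        ("beam_width", [beam_width]),
        ("repetition_penalty", [repetition_penalty]),
        ("length_penalty", [length_penalty]),
        ("random_seed", [RANDOM_SEED]),
        ("bad_words", BAD_WORDS),
        ("stop_words", STOP_WORDS),
    ]


inputs = sketch_generate_inputs("Hello", tokens=16)
print([name for name, _ in inputs][:3])  # ['text_input', 'max_tokens', 'temperature']
```

Wrapping each scalar in a one-element list reflects that inference servers generally expect batched tensors rather than bare scalars.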

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.generate_outputs() -> typing.List['grpcclient.InferRequestedOutput']
staticmethod

Generate the expected output structure.

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.get_model_concurrency(
model_name: str,
timeout: int = 1000
) -> int

Get the model concurrency.

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.get_model_list() -> typing.List[str]

Get a list of models loaded in the Triton server.

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.load_model(
model_name: str,
timeout: int = 1000
) -> None

Load a model into the server.
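
get_model_list and load_model can be combined so a model is loaded only when the server does not already report it. The helper below, ensure_model_loaded, is a hypothetical sketch that depends only on the two documented method signatures; FakeClient is an in-memory stand-in used for illustration, and the model names are made up.

```python
from typing import List, Protocol


class ModelRepositoryClient(Protocol):
    """The two TritonClient methods this helper relies on."""

    def get_model_list(self) -> List[str]: ...

    def load_model(self, model_name: str, timeout: int = 1000) -> None: ...


def ensure_model_loaded(client: ModelRepositoryClient, model_name: str) -> bool:
    """Load model_name unless the server already lists it.

    Returns True if a load was requested, False if the model was present.
    """
    if model_name in client.get_model_list():
        return False
    client.load_model(model_name)
    return True


class FakeClient:
    """In-memory stand-in for TritonClient, for illustration only."""

    def __init__(self) -> None:
        self.models: List[str] = ["ensemble"]

    def get_model_list(self) -> List[str]:
        return list(self.models)

    def load_model(self, model_name: str, timeout: int = 1000) -> None:
        self.models.append(model_name)


fake = FakeClient()
print(ensure_model_loaded(fake, "ensemble"))  # False
print(ensure_model_loaded(fake, "llama"))  # True
```

Typing the helper against a Protocol keeps it testable without a running server.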

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.prepare_tensor(
name: str,
input_data: typing.Any
) -> 'grpcclient.InferInput'
staticmethod

Prepare an input data structure.

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.process_result(
result: typing.Dict[str, str]
) -> typing.Dict[str, str]
staticmethod

Post-process the result from the server.
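
This reference does not say what post-processing process_result applies. Purely as an illustration, the sketch below assumes the generated text sits under a hypothetical key "text_output" and that cleanup means removing a trailing marker from the documented STOP_WORDS list (['</s>']) and trimming whitespace. Both the key and the rule are assumptions.

```python
from typing import Dict, List

STOP_WORDS: List[str] = ["</s>"]  # documented module default


def sketch_process_result(result: Dict[str, str], key: str = "text_output") -> Dict[str, str]:
    """Hypothetical post-processing: trim STOP_WORDS from the end of result[key].

    The key name and the trimming rule are illustrative assumptions.
    """
    text = result.get(key, "")
    for stop in STOP_WORDS:
        # Skip empty markers, which would otherwise match every string.
        if stop and text.endswith(stop):
            text = text[: -len(stop)]
    # Return a new dict, leaving the caller's result untouched.
    return {**result, key: text.strip()}


print(sketch_process_result({"text_output": " Paris</s>"}))  # {'text_output': 'Paris'}
```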

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.request_streaming(
model_name: str,
result_queue: queue.Queue[typing.Union[typing.Optional[typing.Dict[str, str]], str]],
params: typing.Any = {}
) -> None

Request a streaming connection.
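
request_streaming delivers results through result_queue, whose element type is Union[Optional[Dict[str, str]], str]. A caller typically drains that queue on its own thread. The sketch below assumes, without confirmation from this reference, that None marks the end of the stream and that a str item carries an error message. The "text_output" key and the simulated producer thread are hypothetical.

```python
import queue
import threading
from typing import Dict, List, Optional, Union

# Matches the documented result_queue element type.
StreamItem = Union[Optional[Dict[str, str]], str]


def drain_stream(result_queue: "queue.Queue[StreamItem]", timeout: float = 5.0) -> List[Dict[str, str]]:
    """Collect streamed results until a None sentinel arrives.

    Assumptions: None ends the stream and a str item is an error message.
    queue.Empty is raised if no item arrives within timeout seconds.
    """
    chunks: List[Dict[str, str]] = []
    while True:
        item = result_queue.get(timeout=timeout)
        if item is None:
            return chunks
        if isinstance(item, str):
            raise RuntimeError(f"stream error: {item}")
        chunks.append(item)


# Simulated producer standing in for request_streaming().
q: "queue.Queue[StreamItem]" = queue.Queue()


def fake_producer() -> None:
    for piece in ["Hel", "lo"]:
        q.put({"text_output": piece})
    q.put(None)


threading.Thread(target=fake_producer).start()
print("".join(c["text_output"] for c in drain_stream(q)))  # Hello
```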

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.send_prompt_streaming(
model_name: str,
request_inputs: typing.Any,
request_outputs: typing.Optional[typing.Any],
result_queue: queue.Queue[typing.Union[typing.Optional[typing.Dict[str, str]], str]]
) -> None

Send the prompt and start streaming the result.

nemoguardrails.integrations.langchain.providers.trtllm.client.TritonClient.stream_callback(
result_queue: queue.Queue[typing.Union[typing.Optional[typing.Dict[str, str]], str]],
result: typing.Any,
error: str
) -> None

Add streamed result to queue.
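
The tritonclient gRPC streaming API invokes its callback with (result, error) as the last two arguments. That fits stream_callback's signature once result_queue is bound first, for example with functools.partial. The function below is a hypothetical stand-in that queues the error if one is reported and the result otherwise. The real method presumably post-processes the result before queueing it, and the "text_output" key is illustrative.

```python
import functools
import queue
from typing import Any, Dict, Optional, Union

StreamItem = Union[Optional[Dict[str, str]], str]


def sketch_stream_callback(result_queue: "queue.Queue[StreamItem]", result: Any, error: Optional[str]) -> None:
    """Hypothetical stand-in for TritonClient.stream_callback.

    Queues the error message if one was reported, otherwise the result.
    """
    if error:
        result_queue.put(str(error))
    else:
        result_queue.put(result)


q: "queue.Queue[StreamItem]" = queue.Queue()
# Bind the queue so the callback only needs (result, error).
callback = functools.partial(sketch_stream_callback, q)
callback({"text_output": "Hi"}, None)
callback(None, "model not ready")
print(q.get(), q.get())  # {'text_output': 'Hi'} model not ready
```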

nemoguardrails.integrations.langchain.providers.trtllm.client.BAD_WORDS = ['']
nemoguardrails.integrations.langchain.providers.trtllm.client.RANDOM_SEED = 0
nemoguardrails.integrations.langchain.providers.trtllm.client.STOP_WORDS = ['</s>']