nemoguardrails.integrations.langchain.providers.trtllm.client
Module Contents
Classes
Data
API
An abstraction of the connection to a Triton Inference Server.
client
Close the streaming connection.
staticmethod
Create the input for the Triton Inference Server.
staticmethod
Generate the expected output structure.
Get the model concurrency.
Get a list of models loaded in the Triton server.
Load a model into the server.
staticmethod
Prepare an input data structure.
staticmethod
Post-process the result from the server.
Request a streaming connection.
Send the prompt and start streaming the result.
Add streamed result to queue.
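The "add streamed result to queue" step can be illustrated with a minimal, standard-library-only sketch. It is not this module's implementation: the names `on_stream_result`, `drain`, and `END_OF_STREAM` are hypothetical, and the `(result, error)` callback shape is an assumption made for illustration. The idea is that a streaming callback pushes each partial result (or an error) onto a queue, and a consumer reads items off the queue until a sentinel marks the end of the stream.

```python
import queue
from functools import partial

# Hypothetical sentinel marking the end of the stream (an assumption,
# not something defined by the documented module).
END_OF_STREAM = None


def on_stream_result(result_queue, result, error):
    """Hypothetical callback: enqueue the error if there is one, else the result."""
    if error is not None:
        result_queue.put(error)
    else:
        result_queue.put(result)


def drain(result_queue):
    """Yield queued items until the sentinel; re-raise any queued exception."""
    while True:
        item = result_queue.get()
        if item is END_OF_STREAM:
            return
        if isinstance(item, Exception):
            raise item
        yield item


# Simulate a server streaming three chunks, then closing the stream.
results = queue.Queue()
callback = partial(on_stream_result, results)
for chunk in ("Hello", ", ", "world"):
    callback(chunk, None)
results.put(END_OF_STREAM)

streamed_text = "".join(drain(results))
```

Binding the queue with `functools.partial` keeps the callback stateless, so the same function can serve several concurrent streams, each with its own queue.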