nv_ingest_api.util.nim package

Module contents

nv_ingest_api.util.nim.create_inference_client(
    endpoints: Tuple[str, str],
    model_interface: ModelInterface,
    auth_token: str | None = None,
    infer_protocol: str | None = None,
    timeout: float = 120.0,
    max_retries: int = 10,
    **kwargs,
) → NimClientManager

Create a NimClientManager for interfacing with a model inference server.

Parameters:

endpoints (tuple) – A two-element tuple containing the gRPC endpoint and the HTTP endpoint, in that order.
model_interface (ModelInterface) – The model interface implementation to use.
auth_token (str, optional) – Authorization token for HTTP requests (default: None).
infer_protocol (str, optional) – The protocol to use (“grpc” or “http”). If not specified, it is inferred from the endpoints.
timeout (float, optional) – The timeout for the request in seconds (default: 120.0).
max_retries (int, optional) – The maximum number of retries for the request (default: 10).
**kwargs (dict, optional) – Additional keyword arguments to pass to the NimClientManager.
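The timeout and max_retries parameters bound how long, and how many times, a failing request is attempted. The sketch below shows a generic retry loop with those two knobs; call_with_retries, base_delay, retry_on, and the exponential backoff are hypothetical and illustrate the pattern only, not NimClientManager's actual retry logic.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[float], T],
    timeout: float = 120.0,
    max_retries: int = 10,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Hypothetical helper: call request(timeout), retrying on transient errors.

    The first call is not a retry, so at most max_retries + 1 attempts
    are made before the last error is re-raised.
    """
    attempt = 0
    while True:
        try:
            # Each attempt receives the per-request timeout in seconds.
            return request(timeout)
        except retry_on:
            if attempt >= max_retries:
                raise  # Retries exhausted: surface the last error.
            # Assumed exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1
```

A request that fails twice with ConnectionError and then succeeds returns normally after three attempts; one that keeps failing raises after max_retries + 1 attempts.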

Returns:

The initialized NimClientManager.

Return type:

NimClientManager

Raises:

ValueError – If an invalid infer_protocol is specified.
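The protocol-selection rule described above can be sketched as follows. This assumes that endpoints is ordered (gRPC endpoint, HTTP endpoint) and that gRPC is preferred whenever a non-empty gRPC endpoint is supplied; resolve_infer_protocol is a hypothetical name, and the endpoint strings in the usage note are illustrative.

```python
from typing import Optional, Tuple


def resolve_infer_protocol(
    endpoints: Tuple[str, str], infer_protocol: Optional[str] = None
) -> str:
    """Hypothetical helper mirroring the documented infer_protocol contract."""
    grpc_endpoint, http_endpoint = endpoints  # Assumed (gRPC, HTTP) order.
    if infer_protocol is None:
        # Infer from whichever endpoint is actually set, preferring gRPC.
        if grpc_endpoint and grpc_endpoint.strip():
            infer_protocol = "grpc"
        elif http_endpoint and http_endpoint.strip():
            infer_protocol = "http"
    if infer_protocol not in ("grpc", "http"):
        raise ValueError(
            f"Invalid infer_protocol {infer_protocol!r}; expected 'grpc' or 'http'."
        )
    return infer_protocol
```

For example, ("localhost:8001", "http://localhost:8000/v1") resolves to "grpc", ("", "http://localhost:8000/v1") resolves to "http", and an explicit value such as "tcp" raises ValueError.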

nv_ingest_api.util.nim.infer_microservice(
    data,
    model_name: str | None = None,
    embedding_endpoint: str | None = None,
    nvidia_api_key: str | None = None,
    input_type: str = 'passage',
    truncate: str = 'END',
    batch_size: int = 8191,
    grpc: bool = False,
    input_names: list = ['text'],
    output_names: list = ['embeddings'],
    dtypes: list = ['BYTES'],
)

Generate a list of embeddings for the input data using the NVIDIA embedding microservice.

Parameters:

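data (list) – The input items (for example, text strings) to embed.
model_name (str, optional) – The name of the embedding model to use (default: None).
embedding_endpoint (str, optional) – The endpoint of the embedding microservice (default: None).
nvidia_api_key (str, optional) – The API key used to authenticate with the NVIDIA embedding microservice (default: None).
input_type (str, optional) – The type of input being embedded, such as 'passage' or 'query' (default: 'passage').
truncate (str, optional) – How inputs longer than the model's maximum input length are truncated (default: 'END').
batch_size (int, optional) – The maximum number of inputs sent to the microservice per request (default: 8191).
grpc (bool, optional) – Whether to use gRPC (True) or HTTP (False) (default: False).
input_names (list, optional) – The names of the model's input tensors (default: ['text']).
output_names (list, optional) – The names of the model's output tensors (default: ['embeddings']).
dtypes (list, optional) – The data types of the input tensors (default: ['BYTES']).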

Returns:

The list of embeddings.

Return type:

list
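The batching and request parameters of infer_microservice can be illustrated by splitting data into chunks of at most batch_size items and building one HTTP request body per chunk. This is a minimal sketch under stated assumptions: iter_batches and build_http_payloads are hypothetical helpers, and the field names ("input", "model", "input_type", "truncate") follow the OpenAI-style embeddings request used by NVIDIA's hosted embedding endpoints rather than any wire format documented here.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_batches(data: List[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of data, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(data), batch_size):
        yield data[start : start + batch_size]


def build_http_payloads(
    data: List[str],
    model_name: Optional[str] = None,
    input_type: str = "passage",
    truncate: str = "END",
    batch_size: int = 8191,
) -> List[Dict[str, Any]]:
    """Hypothetical: build one JSON-ready request body per batch of inputs."""
    payloads: List[Dict[str, Any]] = []
    for batch in iter_batches(data, batch_size):
        payload: Dict[str, Any] = {
            "input": batch,             # Assumed field name for the texts.
            "input_type": input_type,   # e.g. 'passage' or 'query'.
            "truncate": truncate,       # e.g. 'END'.
        }
        if model_name is not None:
            payload["model"] = model_name
        payloads.append(payload)
    return payloads
```

With five inputs and batch_size=2, this yields three payloads holding 2, 2, and 1 items. Concatenating the per-request results in batch order would keep the returned embeddings aligned with data.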