nv_ingest_api.util.nim package

Module contents

nv_ingest_api.util.nim.create_inference_client(
endpoints: Tuple[str, str],
model_interface: ModelInterface,
auth_token: str | None = None,
infer_protocol: str | None = None,
timeout: float = 120.0,
max_retries: int = 10,
**kwargs,
) → NimClientManager

Create a NimClientManager for interfacing with a model inference server.

Parameters:
  • endpoints (tuple) – A tuple containing the gRPC endpoint and the HTTP endpoint, in that order.

  • model_interface (ModelInterface) – The model interface implementation to use.

  • auth_token (str, optional) – Authorization token for HTTP requests (default: None).

  • infer_protocol (str, optional) – The protocol to use (“grpc” or “http”). If not specified, it is inferred from the endpoints.

  • timeout (float, optional) – The timeout for the request in seconds (default: 120.0).

  • max_retries (int, optional) – The maximum number of retries for the request (default: 10).

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the NimClientManager.

Returns:

The initialized NimClientManager.

Return type:

NimClientManager

Raises:

ValueError – If an invalid infer_protocol is specified.
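
The protocol rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `resolve_protocol` is hypothetical, and so is the choice to prefer gRPC when both endpoints are set. The source only says that the protocol is inferred from the endpoints and that an invalid value raises ValueError.

```python
from typing import Optional, Tuple

VALID_PROTOCOLS = ("grpc", "http")


def resolve_protocol(
    endpoints: Tuple[Optional[str], Optional[str]],
    infer_protocol: Optional[str] = None,
) -> str:
    """Hypothetical helper: pick "grpc" or "http" for an endpoints tuple."""
    grpc_endpoint, http_endpoint = endpoints

    if infer_protocol is None:
        # Assumption: a non-empty gRPC endpoint takes precedence over HTTP.
        if grpc_endpoint and grpc_endpoint.strip():
            infer_protocol = "grpc"
        elif http_endpoint and http_endpoint.strip():
            infer_protocol = "http"

    # Mirrors the documented behavior: an invalid protocol raises ValueError.
    if infer_protocol not in VALID_PROTOCOLS:
        raise ValueError(
            f"Invalid infer_protocol {infer_protocol!r}; expected 'grpc' or 'http'."
        )
    return infer_protocol
```

With this sketch, a tuple whose first element is empty falls back to HTTP, and an explicit `infer_protocol` argument always overrides the inference step.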

nv_ingest_api.util.nim.infer_microservice(
data,
model_name: str | None = None,
embedding_endpoint: str | None = None,
nvidia_api_key: str | None = None,
input_type: str = 'passage',
truncate: str = 'END',
batch_size: int = 8191,
grpc: bool = False,
input_names: list = ['text'],
output_names: list = ['embeddings'],
dtypes: list = ['BYTES'],
)

Embed the input data with the NVIDIA embedding microservice and return the resulting list of embeddings.

Parameters:
  • data (list) – The input data to be embedded.

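  • model_name (str, optional) – The name of the model to use (default: None).

  • embedding_endpoint (str, optional) – The endpoint of the embedding microservice (default: None).

  • nvidia_api_key (str, optional) – The API key for the NVIDIA embedding microservice (default: None).

  • input_type (str, optional) – The type of input to be embedded (default: 'passage').

  • truncate (str, optional) – The truncation strategy applied to the input data (default: 'END').

  • batch_size (int, optional) – The batch size used for the input data (default: 8191).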

  • grpc (bool, optional) – Whether to use gRPC instead of HTTP (default: False).

  • input_names (list, optional) – The names of the input data (default: ['text']).

  • output_names (list, optional) – The names of the output data (default: ['embeddings']).

  • dtypes (list, optional) – The data types of the input data (default: ['BYTES']).

Returns:

The list of embeddings.

Return type:

list
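
A call to infer_microservice might be wrapped as in the sketch below. It uses only the keyword arguments listed in the signature above. The helper names `embedding_call_kwargs` and `embed_texts` are hypothetical, as are the endpoint URL and model name in the usage comment. The import is deferred so that the sketch loads without the package installed.

```python
from typing import Any, Dict, List, Optional


def embedding_call_kwargs(
    model_name: str,
    embedding_endpoint: str,
    nvidia_api_key: Optional[str] = None,
    use_grpc: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for infer_microservice."""
    return {
        "model_name": model_name,
        "embedding_endpoint": embedding_endpoint,
        "nvidia_api_key": nvidia_api_key,
        "input_type": "passage",  # documented default
        "truncate": "END",        # documented default
        "grpc": use_grpc,
    }


def embed_texts(texts: List[str], **call_kwargs: Any) -> list:
    """Hypothetical wrapper: embed a list of strings via infer_microservice."""
    # Deferred import: requires the nv_ingest_api package at call time only.
    from nv_ingest_api.util.nim import infer_microservice

    return infer_microservice(texts, **call_kwargs)


# Usage (placeholder endpoint and model name; needs a running microservice):
# kwargs = embedding_call_kwargs("example-embedding-model", "http://localhost:8012/v1")
# embeddings = embed_texts(["first passage", "second passage"], **kwargs)
```

Keeping the argument construction separate from the network call makes the configuration easy to inspect or log before any request is sent.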