nv_ingest_api.util.nim package
Module contents
- nv_ingest_api.util.nim.create_inference_client(
    endpoints: Tuple[str, str],
    model_interface: ModelInterface,
    auth_token: str | None = None,
    infer_protocol: str | None = None,
    timeout: float = 120.0,
    max_retries: int = 10,
    **kwargs,
)
Create a NimClientManager for interfacing with a model inference server.
- Parameters:
endpoints (tuple) – A tuple containing the gRPC and HTTP endpoints.
model_interface (ModelInterface) – The model interface implementation to use.
auth_token (str, optional) – Authorization token for HTTP requests (default: None).
infer_protocol (str, optional) – The protocol to use (“grpc” or “http”). If not specified, it is inferred from the endpoints.
timeout (float, optional) – The timeout for the request in seconds (default: 120.0).
max_retries (int, optional) – The maximum number of retries for the request (default: 10).
**kwargs (dict, optional) – Additional keyword arguments to pass to the NimClientManager.
- Returns:
The initialized NimClientManager.
- Return type:
NimClientManager
- Raises:
ValueError – If an invalid infer_protocol is specified.
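The protocol selection and the error case described above can be illustrated with a small standalone sketch. `resolve_infer_protocol` is a hypothetical helper, not part of `nv_ingest_api`. It assumes that a non-empty gRPC endpoint takes precedence when `infer_protocol` is omitted, and that having no usable endpoint is also an error; the library's actual inference rules may differ.

```python
from typing import Optional, Tuple

# Protocols accepted by create_inference_client, per its docstring.
VALID_PROTOCOLS = ("grpc", "http")


def resolve_infer_protocol(
    endpoints: Tuple[str, str], infer_protocol: Optional[str] = None
) -> str:
    """Return "grpc" or "http" for a (grpc_endpoint, http_endpoint) pair.

    Hypothetical helper, not part of nv_ingest_api. Assumes a non-empty
    gRPC endpoint takes precedence when infer_protocol is not given.
    """
    grpc_endpoint, http_endpoint = endpoints
    if infer_protocol is None:
        # Infer the protocol from whichever endpoint is actually set.
        if grpc_endpoint and grpc_endpoint.strip():
            infer_protocol = "grpc"
        elif http_endpoint and http_endpoint.strip():
            infer_protocol = "http"
        else:
            # Assumption: no usable endpoint is treated as an error too.
            raise ValueError("Neither a gRPC nor an HTTP endpoint was provided.")
    if infer_protocol not in VALID_PROTOCOLS:
        # Mirrors the documented ValueError for an invalid infer_protocol.
        raise ValueError(f"Invalid infer_protocol: {infer_protocol!r}")
    return infer_protocol


# Usage:
print(resolve_infer_protocol(("localhost:8001", "http://localhost:8000/v1/infer")))  # grpc
print(resolve_infer_protocol(("", "http://localhost:8000/v1/infer")))  # http
```

Resolving the protocol before any client is built means a bad value fails fast with a ValueError instead of surfacing later as a connection error.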
- nv_ingest_api.util.nim.infer_microservice(
    data,
    model_name: str | None = None,
    embedding_endpoint: str | None = None,
    nvidia_api_key: str | None = None,
    input_type: str = 'passage',
    truncate: str = 'END',
    batch_size: int = 8191,
    grpc: bool = False,
    input_names: list = ['text'],
    output_names: list = ['embeddings'],
    dtypes: list = ['BYTES'],
)
Embed the input data using an NVIDIA embedding microservice and return the resulting list of embeddings.
- Parameters:
data (list) – The input data to be embedded.
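model_name (str, optional) – The name of the embedding model to use (default: None).
embedding_endpoint (str, optional) – The endpoint of the embedding microservice (default: None).
nvidia_api_key (str, optional) – The API key used to authenticate with the NVIDIA embedding microservice (default: None).
input_type (str) – The type of the input being embedded (default: 'passage').
truncate (str) – How inputs that exceed the model's maximum input length are truncated (default: 'END').
batch_size (int) – The maximum number of inputs sent to the microservice per request (default: 8191).
grpc (bool) – If True, use gRPC; otherwise use HTTP (default: False).
input_names (list) – The names of the model inputs (default: ['text']).
output_names (list) – The names of the model outputs (default: ['embeddings']).
dtypes (list) – The data types of the model inputs (default: ['BYTES']).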
- Returns:
The list of embeddings.
- Return type:
list
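The role of batch_size can be pictured with a minimal sketch that splits the input into chunks and concatenates the per-chunk results in order. It assumes batch_size caps the number of inputs per request; `embed_in_batches` and the `embed_batch` callable are hypothetical stand-ins for the request to the microservice, not the library's implementation.

```python
from typing import Callable, List, Sequence

Embedding = List[float]


def embed_in_batches(
    data: Sequence[str],
    embed_batch: Callable[[List[str]], List[Embedding]],
    batch_size: int = 8191,
) -> List[Embedding]:
    """Embed data in chunks of at most batch_size items (hypothetical sketch).

    embed_batch stands in for one request to the embedding microservice and
    must return one embedding per input, in the same order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    embeddings: List[Embedding] = []
    for start in range(0, len(data), batch_size):
        batch = list(data[start : start + batch_size])
        result = embed_batch(batch)
        # Guard against a response that would misalign inputs and outputs.
        if len(result) != len(batch):
            raise RuntimeError("embed_batch returned the wrong number of embeddings")
        embeddings.extend(result)
    return embeddings


# Usage with a fake embedder that records each batch size.
calls: List[int] = []


def fake_embed(batch: List[str]) -> List[Embedding]:
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
print(vectors)  # [[1.0], [2.0], [3.0]]
print(calls)  # [2, 1]
```

Keeping the per-batch results in input order means the returned list lines up index-for-index with data, which matches the documented return value of a single list of embeddings.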