Client APIs#
The following reference provides detailed documentation for the synchronous and asynchronous clients of the NeMo Microservices Python SDK.
Synchronous Client#
- class nemo_microservices.NeMoMicroservices#
- __init__(
- *,
- base_url: str | httpx.URL | None = None,
- inference_base_url: str | httpx.URL | None = None,
- timeout: float | Timeout | None | NotGiven = NOT_GIVEN,
- max_retries: int = 2,
Constructs a new synchronous NeMoMicroservices client instance. The following code snippet shows how to create a client instance.

    from nemo_microservices import NeMoMicroservices

    client = NeMoMicroservices(
        base_url="http://nemo.test",
        inference_base_url="http://nim.test"
    )
- Parameters:
base_url (Optional[str]) – Sets the base URL of the NeMo microservices API endpoints. The cluster administrator in your organization must configure this URL by following the instructions in the ingress setup guide. By default, the client reads the NEMO_MICROSERVICES_BASE_URL environment variable; if that variable is not set, the client sets the value to http://nemo.test/.
inference_base_url (Optional[str]) – Sets the base URL of a microservice for inference. You can specify one of the following API endpoints:
- The NeMo NIM Proxy microservice endpoint. This is the recommended endpoint because this microservice serves as a proxy for multiple NIM microservices.
- An individual NIM microservice endpoint that you deployed to your Kubernetes cluster. Use this option if you want to use only one specific NIM microservice.
- The endpoints from build.nvidia.com.
timeout (Optional[float | Timeout]) – Sets the HTTP request timeout for all API calls made by the client. The timeout is passed to the parent class (SyncAPIClient/AsyncAPIClient) during client construction. Individual API methods can also accept a timeout parameter to override the client-level timeout for specific requests. Accepted values:
- A float (seconds)
- A Timeout object (imported from httpx)
- None (no timeout)
- NotGiven (use default)
max_retries (Optional[int]) – Sets the maximum number of automatic retries for failed HTTP requests. When an HTTP request fails with certain status codes, the client automatically retries the request up to the specified number of times. The default is 2.
Usage Examples:

    # Custom retry count
    client = NeMoMicroservices(max_retries=5)  # 5 retries

    # No retries
    client = NeMoMicroservices(max_retries=0)

    # Override for specific requests
    client.with_options(max_retries=3).chat.completions.create(...)
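To make the meaning of max_retries concrete, here is a minimal sketch that does not use the SDK. It assumes that retries are counted in addition to the first attempt, so max_retries=2 allows up to three requests in total. The set of retryable status codes is an assumption made for illustration (the description above only says "certain status codes"), and the helper name call_with_retries is hypothetical. A real client would typically also wait between attempts, which this sketch omits.

```python
from typing import Callable, Tuple

# Assumption for illustration only: the text above does not list the codes.
RETRYABLE_STATUS = {408, 409, 429, 500, 502, 503, 504}


def call_with_retries(send: Callable[[], int], max_retries: int = 2) -> Tuple[int, int]:
    """Call `send` until it returns a non-retryable status or retries run out.

    `send` is a hypothetical callable that performs one request and returns
    its HTTP status code. Returns (final_status, attempts_made).
    """
    attempts = 0
    while True:
        attempts += 1
        status = send()
        # Stop on a non-retryable status, or once the first attempt plus
        # `max_retries` retries have all been used.
        if status not in RETRYABLE_STATUS or attempts > max_retries:
            return status, attempts


def always_unavailable() -> int:
    """Stand-in request that always fails with 503."""
    return 503


final_status, attempts_made = call_with_retries(always_unavailable, max_retries=2)
print(final_status, attempts_made)
```

With max_retries=0 the same helper makes exactly one attempt, which matches the "No retries" usage example above.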
- property chat: ChatResource#
- property completions: CompletionsResource#
- property models: ModelsResource#
- property customization: CustomizationResource#
- property evaluation: EvaluationResource#
- property datasets: DatasetsResource#
- property embeddings: EmbeddingsResource#
- property namespaces: NamespacesResource#
- property projects: ProjectsResource#
- property deployment: DeploymentResource#
- property guardrail: GuardrailResource#
- property inference: InferenceResource#
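The base_url lookup described under Parameters can be sketched without the SDK as follows. This is an illustration of the documented order, not the client's actual implementation: the function name resolve_base_url is hypothetical, and the rule that an explicitly passed value takes precedence over the environment variable is an assumption.

```python
import os
from typing import Optional

# Fallback value named in the base_url parameter description.
DEFAULT_BASE_URL = "http://nemo.test/"


def resolve_base_url(base_url: Optional[str] = None) -> str:
    """Return the base URL a client would use under the documented rules."""
    # Assumption: an explicit argument wins over the environment variable.
    if base_url is not None:
        return base_url
    # Otherwise read NEMO_MICROSERVICES_BASE_URL, falling back to the default.
    return os.environ.get("NEMO_MICROSERVICES_BASE_URL", DEFAULT_BASE_URL)


print(resolve_base_url())
```

Setting NEMO_MICROSERVICES_BASE_URL in the shell before starting Python changes the printed value, which is a quick way to check which endpoint a script will target.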
Asynchronous Client#
- class nemo_microservices.AsyncNeMoMicroservices#
- __init__(
- *,
- base_url: str | httpx.URL | None = None,
- inference_base_url: str | httpx.URL | None = None,
- timeout: float | Timeout | None | NotGiven = NOT_GIVEN,
- max_retries: int = 2,
Constructs a new asynchronous NeMoMicroservices client instance. The following code snippet shows how to create a client instance.

    import asyncio

    from nemo_microservices import AsyncNeMoMicroservices

    client = AsyncNeMoMicroservices(
        base_url="http://nemo.test",
        inference_base_url="http://nim.test"
    )

    # Sample API call
    async def main() -> None:
        page = await client.namespaces.list()
        print(page.data)

    asyncio.run(main())
- Parameters:
base_url (Optional[str]) – Sets the base URL of the NeMo microservices API endpoints. The cluster administrator in your organization must configure this URL by following the instructions in the ingress setup guide. By default, the client reads the NEMO_MICROSERVICES_BASE_URL environment variable; if that variable is not set, the client sets the value to http://nemo.test/.
inference_base_url (Optional[str]) – Sets the base URL of a microservice for inference. You can specify one of the following API endpoints:
- The NeMo NIM Proxy microservice endpoint. This is the recommended endpoint because this microservice serves as a proxy for multiple NIM microservices.
- An individual NIM microservice endpoint that you deployed to your Kubernetes cluster. Use this option if you want to use only one specific NIM microservice.
- The endpoints from build.nvidia.com.
timeout (Optional[float | Timeout]) – Sets the HTTP request timeout for all API calls made by the client. The timeout is passed to the parent class (SyncAPIClient/AsyncAPIClient) during client construction. Individual API methods can also accept a timeout parameter to override the client-level timeout for specific requests. Accepted values:
- A float (seconds)
- A Timeout object (imported from httpx)
- None (no timeout)
- NotGiven (use default)
max_retries (Optional[int]) – Sets the maximum number of automatic retries for failed HTTP requests. When an HTTP request fails with certain status codes, the client automatically retries the request up to the specified number of times. The default is 2.
Usage Examples:

    # Custom retry count
    client = AsyncNeMoMicroservices(max_retries=5)  # 5 retries

    # No retries
    client = AsyncNeMoMicroservices(max_retries=0)

    # Override for specific requests (await the call inside an async function)
    await client.with_options(max_retries=3).chat.completions.create(...)
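The accepted timeout values distinguish None (no timeout) from NotGiven (use the default), and a per-request timeout overrides the client-level one. The sketch below illustrates that precedence with stand-in names only. The NotGiven class here is a local stand-in rather than the SDK's own type, effective_timeout is a hypothetical helper, and the 60-second default is an assumed value chosen for illustration.

```python
from typing import Optional, Union


class NotGiven:
    """Stand-in sentinel type: the caller did not pass a value at all."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()
DEFAULT_TIMEOUT_SECONDS = 60.0  # hypothetical default, for illustration only


def effective_timeout(
    client_timeout: Union[float, None, NotGiven] = NOT_GIVEN,
    request_timeout: Union[float, None, NotGiven] = NOT_GIVEN,
) -> Optional[float]:
    """Pick the timeout for one request: per-request, then client-level, then default."""
    if not isinstance(request_timeout, NotGiven):
        return request_timeout  # may be None, which means "no timeout"
    if not isinstance(client_timeout, NotGiven):
        return client_timeout  # may also be None
    return DEFAULT_TIMEOUT_SECONDS


# A client-level timeout of 10 seconds, overridden to 2.5 seconds for one request.
print(effective_timeout(client_timeout=10.0, request_timeout=2.5))
```

The sentinel exists because None is itself a meaningful value here, so it cannot double as the marker for "argument omitted".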
- property chat: AsyncChatResource#
- property completions: AsyncCompletionsResource#
- property models: AsyncModelsResource#
- property customization: AsyncCustomizationResource#
- property evaluation: AsyncEvaluationResource#
- property datasets: AsyncDatasetsResource#
- property embeddings: AsyncEmbeddingsResource#
- property namespaces: AsyncNamespacesResource#
- property projects: AsyncProjectsResource#
- property deployment: AsyncDeploymentResource#
- property guardrail: AsyncGuardrailResource#
- property inference: AsyncInferenceResource#
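Because resource methods on the asynchronous client are awaited, several calls can run concurrently with asyncio.gather. The sketch below shows the pattern against a stub object so that it runs without a deployment. The namespaces.list() call and the page.data attribute come from the example above. The projects.list() call is an assumption about the SDK, and list_namespaces_and_projects and make_stub_client are hypothetical helpers.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Tuple


async def list_namespaces_and_projects(client: Any) -> Tuple[Any, Any]:
    """Await two list calls concurrently on an async client.

    `client` is expected to behave like AsyncNeMoMicroservices.
    `projects.list()` is assumed to exist; verify it against the SDK.
    """
    namespaces_page, projects_page = await asyncio.gather(
        client.namespaces.list(),
        client.projects.list(),
    )
    return namespaces_page.data, projects_page.data


def make_stub_client() -> SimpleNamespace:
    """Build a stand-in client whose list methods return canned pages."""

    async def list_namespaces() -> SimpleNamespace:
        return SimpleNamespace(data=["default"])

    async def list_projects() -> SimpleNamespace:
        return SimpleNamespace(data=["demo-project"])

    return SimpleNamespace(
        namespaces=SimpleNamespace(list=list_namespaces),
        projects=SimpleNamespace(list=list_projects),
    )


print(asyncio.run(list_namespaces_and_projects(make_stub_client())))
```

In real use, the stub would be replaced by an AsyncNeMoMicroservices instance constructed as shown in the asynchronous client example.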
Client Attributes#
The NeMo microservices clients provide access to various API resources through the following attributes.
Chat Resources#
chat: Access to chat completion functionality
completions: Access to text completion functionality
Model Management#
models: Manage models and model configurations
customization: Handle model customization and fine-tuning
evaluation: Evaluate model performance
Data Management#
datasets: Manage datasets and data sources
embeddings: Generate and manage embeddings
namespaces: Organize resources in namespaces
projects: Manage projects and project configurations
Deployment & Operations#
deployment: Manage model deployments
guardrail: Configure and manage guardrails
inference: Direct inference operations
Response Handling#
with_raw_response: Access raw HTTP response data
with_streaming_response: Handle streaming responses