Client APIs#
The following reference provides detailed documentation for the synchronous and asynchronous clients of the NeMo Microservices Python SDK.
Synchronous Client#
- class nemo_microservices.NeMoMicroservices#
- __init__(*, base_url: str | httpx.URL | None = None, inference_base_url: str | httpx.URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = 2)
- Constructs a new synchronous NeMoMicroservices client instance. The following code snippet shows how to create a client instance.

      from nemo_microservices import NeMoMicroservices

      client = NeMoMicroservices(
          base_url="http://nemo.test",
          inference_base_url="http://nim.test"
      )

- Parameters:
- base_url (Optional[str]) – Sets the base URL of the NeMo microservices API endpoints. The cluster administrator in your organization must configure this URL by following the instructions in the ingress setup guide. By default, the client reads the NEMO_MICROSERVICES_BASE_URL environment variable; if that variable is not set, the client uses http://nemo.test/.
- inference_base_url (Optional[str]) – Sets the base URL of a microservice for inference. You can specify one of the following API endpoints:
  - The NeMo NIM Proxy microservice endpoint. This is the recommended endpoint because this microservice serves as a proxy for multiple NIM microservices.
  - Individual NIM microservice endpoints you deployed to your Kubernetes cluster. If you want to use only one specific NIM microservice, use this option.
  - The endpoints from build.nvidia.com.
 
- timeout (Optional[float | Timeout]) – Sets the HTTP request timeout for all API calls made by the client. The timeout is passed to the parent class (SyncAPIClient for the synchronous client, AsyncAPIClient for the asynchronous client) during client construction. Individual API methods also accept a timeout parameter that overrides the client-level timeout for a specific request. Accepted values:
  - A float (seconds)
  - A Timeout object (imported from httpx)
  - None (no timeout)
  - NotGiven (use default)
 
- max_retries (Optional[int]) – Sets the maximum number of automatic retries for failed HTTP requests. When an HTTP request fails with certain status codes, the client automatically retries the request up to the specified number of times. The default is 2. Usage examples:

      # Custom retry count
      client = NeMoMicroservices(max_retries=5)  # 5 retries

      # No retries
      client = NeMoMicroservices(max_retries=0)

      # Override for specific requests
      client.with_options(max_retries=3).chat.completions.create(...)
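The max_retries semantics described above can be pictured with a small standard-library sketch. It is not the SDK's implementation: send_with_retries and make_flaky_sender are hypothetical helpers, the set of retryable status codes is an assumption (this reference does not list them), and any delay between attempts is omitted for brevity.

```python
from typing import Callable, Iterable, Tuple

# Assumption: placeholder set of retryable status codes, for illustration only.
ASSUMED_RETRYABLE_STATUS = frozenset({408, 429, 500, 502, 503, 504})


def send_with_retries(send: Callable[[], int], max_retries: int = 2) -> Tuple[int, int]:
    """Call send() once, then retry up to max_retries more times.

    send is a hypothetical callable that returns an HTTP status code.
    Returns (final_status, number_of_attempts).
    """
    attempts = 0
    while True:
        status = send()
        attempts += 1
        # Stop on a non-retryable status or once the retry budget is spent.
        if status not in ASSUMED_RETRYABLE_STATUS or attempts > max_retries:
            return status, attempts


def make_flaky_sender(statuses: Iterable[int]) -> Callable[[], int]:
    """Build a fake sender that returns the given status codes in order."""
    remaining = iter(statuses)
    return lambda: next(remaining)
```

With max_retries=2, a sender that fails twice and then succeeds is called three times in total; with max_retries=0, the first failure is returned immediately.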
 
 
 - property chat: ChatResource#
 - property completions: CompletionsResource#
 - property models: ModelsResource#
 - property customization: CustomizationResource#
 - property evaluation: EvaluationResource#
 - property datasets: DatasetsResource#
 - property embeddings: EmbeddingsResource#
 - property namespaces: NamespacesResource#
 - property projects: ProjectsResource#
 - property deployment: DeploymentResource#
 - property guardrail: GuardrailResource#
 - property inference: InferenceResource#
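The base_url fallback described in the parameters above (explicit argument, then the NEMO_MICROSERVICES_BASE_URL environment variable, then http://nemo.test/) can be sketched as follows. resolve_base_url is a hypothetical helper written for illustration; it shows the lookup order only and makes no claim about how the SDK implements it.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "NEMO_MICROSERVICES_BASE_URL"
DEFAULT_BASE_URL = "http://nemo.test/"


def resolve_base_url(
    base_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit argument, else the environment variable, else the default."""
    if base_url is not None:
        return base_url
    # env is injectable so the lookup can be exercised without touching os.environ.
    return env.get(ENV_VAR, DEFAULT_BASE_URL)
```

Passing the environment as a parameter keeps the sketch easy to test; for example, resolve_base_url(env={}) falls through to the documented default.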
 
Asynchronous Client#
- class nemo_microservices.AsyncNeMoMicroservices#
- __init__(*, base_url: str | httpx.URL | None = None, inference_base_url: str | httpx.URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = 2)
- Constructs a new asynchronous AsyncNeMoMicroservices client instance. The following code snippet shows how to create a client instance.

      import asyncio

      from nemo_microservices import AsyncNeMoMicroservices

      client = AsyncNeMoMicroservices(
          base_url="http://nemo.test",
          inference_base_url="http://nim.test"
      )

      # Sample API call
      async def main() -> None:
          page = await client.namespaces.list()
          print(page.data)

      asyncio.run(main())

- Parameters:
- base_url (Optional[str]) – Sets the base URL of the NeMo microservices API endpoints. The cluster administrator in your organization must configure this URL by following the instructions in the ingress setup guide. By default, the client reads the NEMO_MICROSERVICES_BASE_URL environment variable; if that variable is not set, the client uses http://nemo.test/.
- inference_base_url (Optional[str]) – Sets the base URL of a microservice for inference. You can specify one of the following API endpoints:
  - The NeMo NIM Proxy microservice endpoint. This is the recommended endpoint because this microservice serves as a proxy for multiple NIM microservices.
  - Individual NIM microservice endpoints you deployed to your Kubernetes cluster. If you want to use only one specific NIM microservice, use this option.
  - The endpoints from build.nvidia.com.
 
- timeout (Optional[float | Timeout]) – Sets the HTTP request timeout for all API calls made by the client. The timeout is passed to the parent class (SyncAPIClient for the synchronous client, AsyncAPIClient for the asynchronous client) during client construction. Individual API methods also accept a timeout parameter that overrides the client-level timeout for a specific request. Accepted values:
  - A float (seconds)
  - A Timeout object (imported from httpx)
  - None (no timeout)
  - NotGiven (use default)
 
- max_retries (Optional[int]) – Sets the maximum number of automatic retries for failed HTTP requests. When an HTTP request fails with certain status codes, the client automatically retries the request up to the specified number of times. The default is 2. Usage examples:

      # Custom retry count
      client = AsyncNeMoMicroservices(max_retries=5)  # 5 retries

      # No retries
      client = AsyncNeMoMicroservices(max_retries=0)

      # Override for specific requests (inside an async function)
      await client.with_options(max_retries=3).chat.completions.create(...)
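The accepted timeout values listed above distinguish "no timeout" (None) from "not specified" (NotGiven), and a per-request value overrides the client-level one. The sketch below illustrates that precedence with a stand-in sentinel. The NotGiven class here is a local stand-in rather than the SDK's type, ASSUMED_DEFAULT_TIMEOUT is a placeholder because this reference does not state the real default, and httpx Timeout objects are left out to keep the example standard-library only.

```python
from typing import Optional, Union


class NotGiven:
    """Stand-in sentinel type meaning 'the caller did not pass a value'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()

# Assumption: placeholder value; the SDK's actual default is not stated here.
ASSUMED_DEFAULT_TIMEOUT = 60.0

TimeoutArg = Union[float, None, NotGiven]


def client_timeout(timeout: TimeoutArg = NOT_GIVEN) -> Optional[float]:
    """Resolve the client-level timeout; NOT_GIVEN means 'use the default'."""
    if isinstance(timeout, NotGiven):
        return ASSUMED_DEFAULT_TIMEOUT
    return timeout  # a float in seconds, or None for no timeout


def request_timeout(
    client_level: Optional[float],
    per_request: TimeoutArg = NOT_GIVEN,
) -> Optional[float]:
    """A per-request value, when given, overrides the client-level timeout."""
    if isinstance(per_request, NotGiven):
        return client_level
    return per_request
```

A dedicated sentinel is needed because None already carries a meaning (disable the timeout), so it cannot double as "argument omitted".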
 
 
 - property chat: AsyncChatResource#
 - property completions: AsyncCompletionsResource#
 - property models: AsyncModelsResource#
 - property customization: AsyncCustomizationResource#
 - property evaluation: AsyncEvaluationResource#
 - property datasets: AsyncDatasetsResource#
 - property embeddings: AsyncEmbeddingsResource#
 - property namespaces: AsyncNamespacesResource#
 - property projects: AsyncProjectsResource#
 - property deployment: AsyncDeploymentResource#
 - property guardrail: AsyncGuardrailResource#
 - property inference: AsyncInferenceResource#
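One reason to choose the asynchronous client is that several awaited calls can run concurrently. The sketch below shows the pattern with asyncio.gather against a stand-in object, so it runs without a deployed cluster. FakeAsyncResource, list_all, run_demo, and the returned data are all hypothetical and are not part of the SDK; only the awaitable list() shape mirrors the namespaces.list() call shown in the constructor example.

```python
import asyncio
from typing import Dict, List


class FakeAsyncResource:
    """Hypothetical stand-in for an async resource such as client.namespaces."""

    def __init__(self, name: str, items: List[str]) -> None:
        self.name = name
        self._items = items

    async def list(self) -> List[str]:
        await asyncio.sleep(0.01)  # simulate network latency
        return list(self._items)


async def list_all(resources: List[FakeAsyncResource]) -> Dict[str, List[str]]:
    """Await every resource's list() concurrently and key the results by name."""
    results = await asyncio.gather(*(r.list() for r in resources))
    return {r.name: items for r, items in zip(resources, results)}


def run_demo() -> Dict[str, List[str]]:
    """Run the concurrent listing against made-up data."""
    resources = [
        FakeAsyncResource("namespaces", ["default"]),
        FakeAsyncResource("projects", ["demo-project"]),
    ]
    return asyncio.run(list_all(resources))
```

asyncio.gather returns results in the order the awaitables were passed, which is why zipping them back onto the resource list is safe.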
 
Client Attributes#
The NeMo microservices clients provide access to various API resources through the following attributes.
Chat Resources#
- chat: Access to chat completion functionality 
- completions: Access to text completion functionality 
Model Management#
- models: Manage models and model configurations 
- customization: Handle model customization and fine-tuning 
- evaluation: Evaluate model performance 
Data Management#
- datasets: Manage datasets and data sources 
- embeddings: Generate and manage embeddings 
- namespaces: Organize resources in namespaces 
- projects: Manage projects and project configurations 
Deployment & Operations#
- deployment: Manage model deployments 
- guardrail: Configure and manage guardrails 
- inference: Direct inference operations 
Response Handling#
- with_raw_response: Access raw HTTP response data 
- with_streaming_response: Handle streaming responses