Client APIs

This reference documents the synchronous and asynchronous clients of the NeMo Microservices Python SDK.

Synchronous Client

class nemo_microservices.NeMoMicroservices
__init__(
    *,
    base_url: str | httpx.URL | None = None,
    inference_base_url: str | httpx.URL | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN,
    max_retries: int = 2,
)

Constructs a new synchronous NeMoMicroservices client instance. The following code snippet shows how to create a client instance.

from nemo_microservices import NeMoMicroservices
client = NeMoMicroservices(
    base_url="http://nemo.test",
    inference_base_url="http://nim.test"
)

Parameters:
  • base_url (Optional[str | httpx.URL]) – Sets the base URL of the NeMo microservices API endpoints. Your organization's cluster administrator must configure this URL by following the instructions in the ingress setup guide. If you do not pass base_url, the client reads the NEMO_MICROSERVICES_BASE_URL environment variable; if that variable is not set, the client uses http://nemo.test/.

  • inference_base_url (Optional[str | httpx.URL]) –

    Sets the base URL of a microservice for inference. You can specify one of the following API endpoints:

    • The NeMo NIM Proxy microservice endpoint. This is the recommended endpoint because this microservice serves as a proxy for multiple NIM microservices.

    • Individual NIM microservice endpoints you deployed to your Kubernetes cluster. If you want to use only one specific NIM microservice, use this option.

    • The endpoints from build.nvidia.com.

  • timeout (Optional[float | Timeout]) –

    Sets the HTTP request timeout for all API calls made by the client. The value is passed to the parent class (SyncAPIClient) during client construction. Individual API methods also accept a timeout parameter that overrides the client-level timeout for a single request.

    Accepted values:

    • A float (seconds)

    • An httpx.Timeout object

    • None (no timeout)

    • NOT_GIVEN (the default; the client applies its default timeout)

  • max_retries (int) –

    Sets the maximum number of automatic retries for failed HTTP requests. The default is 2. When a request fails with a retryable status code, the client retries it up to this many times.

    Usage Examples:

    # Custom retry count
    client = NeMoMicroservices(max_retries=5)  # 5 retries
    
    # No retries
    client = NeMoMicroservices(max_retries=0)
    
    # Override for specific requests
    client.with_options(max_retries=3).chat.completions.create(...)
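To make the retry budget concrete, here is a minimal sketch of the behavior described above. It is not the SDK's implementation. The helper `call_with_retries`, the `send` callable, and the `RETRYABLE_STATUS` set are hypothetical, because this page does not state which status codes the client retries. The sketch shows that `max_retries=2` allows up to three attempts in total: the initial request plus two retries.

```python
from typing import Callable, Tuple

# Hypothetical set of retryable status codes, chosen only for illustration.
RETRYABLE_STATUS = {429, 500, 502, 503}


def call_with_retries(send: Callable[[], int], max_retries: int = 2) -> Tuple[int, int]:
    """Call send() until it returns a non-retryable status or retries run out.

    Returns (final_status, attempts_made).
    """
    attempts = 0
    while True:
        status = send()
        attempts += 1
        # Stop on a non-retryable status or once the retry budget is spent.
        if status not in RETRYABLE_STATUS or attempts > max_retries:
            return status, attempts


# Two failures followed by a success: the third attempt returns 200.
responses = iter([503, 503, 200])
print(call_with_retries(lambda: next(responses)))  # (200, 3)
```

A real client would normally also wait between attempts; the sketch omits that delay to keep the counting logic visible.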
    

property chat: ChatResource
property completions: CompletionsResource
property models: ModelsResource
property customization: CustomizationResource
property evaluation: EvaluationResource
property datasets: DatasetsResource
property embeddings: EmbeddingsResource
property namespaces: NamespacesResource
property projects: ProjectsResource
property deployment: DeploymentResource
property guardrail: GuardrailResource
property inference: InferenceResource
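
Each property returns a resource object whose methods issue the API calls. As a minimal sketch, the helper below (a hypothetical name) mirrors the `namespaces.list()` call from the asynchronous example on this page, in synchronous form. It assumes the synchronous resource exposes the same `list()` method and that the returned page has a `data` attribute. It accepts any client object, so it can be exercised without a running deployment.

```python
from types import SimpleNamespace
from typing import Any, List


def list_namespaces(client: Any) -> List[Any]:
    """Return the items on the first page of namespaces.

    Assumes client.namespaces.list() returns a page object with a
    `data` attribute, as the asynchronous example on this page suggests.
    """
    page = client.namespaces.list()
    return list(page.data)


# Stand-in client for a dry run; with the SDK installed you would pass
# a NeMoMicroservices instance instead.
fake_client = SimpleNamespace(
    namespaces=SimpleNamespace(list=lambda: SimpleNamespace(data=["default"]))
)
print(list_namespaces(fake_client))  # ['default']
```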

Asynchronous Client

class nemo_microservices.AsyncNeMoMicroservices
__init__(
    *,
    base_url: str | httpx.URL | None = None,
    inference_base_url: str | httpx.URL | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN,
    max_retries: int = 2,
)

Constructs a new AsyncNeMoMicroservices client instance. The following code snippet creates a client instance and makes a sample API call.

import asyncio
from nemo_microservices import AsyncNeMoMicroservices

client = AsyncNeMoMicroservices(
    base_url="http://nemo.test",
    inference_base_url="http://nim.test"
)

# Sample API call
async def main() -> None:
    page = await client.namespaces.list()
    print(page.data)

asyncio.run(main())

Parameters:
  • base_url (Optional[str | httpx.URL]) – Sets the base URL of the NeMo microservices API endpoints. Your organization's cluster administrator must configure this URL by following the instructions in the ingress setup guide. If you do not pass base_url, the client reads the NEMO_MICROSERVICES_BASE_URL environment variable; if that variable is not set, the client uses http://nemo.test/.

  • inference_base_url (Optional[str | httpx.URL]) –

    Sets the base URL of a microservice for inference. You can specify one of the following API endpoints:

    • The NeMo NIM Proxy microservice endpoint. This is the recommended endpoint because this microservice serves as a proxy for multiple NIM microservices.

    • Individual NIM microservice endpoints you deployed to your Kubernetes cluster. If you want to use only one specific NIM microservice, use this option.

    • The endpoints from build.nvidia.com.

  • timeout (Optional[float | Timeout]) –

    Sets the HTTP request timeout for all API calls made by the client. The value is passed to the parent class (AsyncAPIClient) during client construction. Individual API methods also accept a timeout parameter that overrides the client-level timeout for a single request.

    Accepted values:

    • A float (seconds)

    • An httpx.Timeout object

    • None (no timeout)

    • NOT_GIVEN (the default; the client applies its default timeout)

  • max_retries (int) –

    Sets the maximum number of automatic retries for failed HTTP requests. The default is 2. When a request fails with a retryable status code, the client retries it up to this many times.

    Usage Examples:

    # Custom retry count
    client = AsyncNeMoMicroservices(max_retries=5)  # 5 retries
    
    # No retries
    client = AsyncNeMoMicroservices(max_retries=0)
    
    # Override for specific requests (inside an async function)
    await client.with_options(max_retries=3).chat.completions.create(...)
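The difference between omitting an argument and passing `None` can be illustrated with a small sketch. The code below is not the SDK's implementation: `resolve_base_url`, `resolve_timeout`, the local `NOT_GIVEN` sentinel, and the `DEFAULT_TIMEOUT` value of 60 seconds are hypothetical stand-ins. It follows the lookup order described for `base_url` and shows why `timeout` needs a sentinel distinct from `None`, since `None` already means "no timeout".

```python
import os
from typing import Mapping, Optional, Union


class _NotGiven:
    """Sentinel type meaning 'argument omitted' (stand-in for the SDK's NotGiven)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()
DEFAULT_BASE_URL = "http://nemo.test/"  # fallback named in the base_url description
DEFAULT_TIMEOUT = 60.0  # hypothetical value, for illustration only


def resolve_base_url(base_url: Optional[str] = None,
                     env: Mapping[str, str] = os.environ) -> str:
    """Explicit argument first, then the environment variable, then the fallback."""
    if base_url is not None:
        return base_url
    return env.get("NEMO_MICROSERVICES_BASE_URL", DEFAULT_BASE_URL)


def resolve_timeout(
    timeout: Union[float, None, _NotGiven] = NOT_GIVEN,
) -> Optional[float]:
    """NOT_GIVEN selects the default; None disables the timeout; a float is used as is."""
    if isinstance(timeout, _NotGiven):
        return DEFAULT_TIMEOUT
    return timeout
```

With a plain `None` default, the client could not tell "use the default timeout" apart from "disable the timeout", which is the reason the signature uses `NOT_GIVEN`.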
    

property chat: AsyncChatResource
property completions: AsyncCompletionsResource
property models: AsyncModelsResource
property customization: AsyncCustomizationResource
property evaluation: AsyncEvaluationResource
property datasets: AsyncDatasetsResource
property embeddings: AsyncEmbeddingsResource
property namespaces: AsyncNamespacesResource
property projects: AsyncProjectsResource
property deployment: AsyncDeploymentResource
property guardrail: AsyncGuardrailResource
property inference: AsyncInferenceResource
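
Resource methods on the asynchronous client are awaited, as in the `namespaces.list()` example, so independent requests can run concurrently. The sketch below uses `asyncio.gather` for this. The helper name `list_pages` is hypothetical, and it assumes each named resource exposes an awaitable `list()` method, which this page shows only for `namespaces`.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Dict, Sequence


async def list_pages(client: Any, resources: Sequence[str]) -> Dict[str, Any]:
    """Call <resource>.list() on each named resource concurrently."""
    calls = [getattr(client, name).list() for name in resources]
    pages = await asyncio.gather(*calls)  # results keep the order of `calls`
    return dict(zip(resources, pages))


# Stand-in async client for a dry run without a deployment.
def _fake_resource(items):
    async def _list():
        return SimpleNamespace(data=items)

    return SimpleNamespace(list=_list)


fake_client = SimpleNamespace(
    namespaces=_fake_resource(["default"]),
    projects=_fake_resource(["demo-project"]),
)
pages = asyncio.run(list_pages(fake_client, ["namespaces", "projects"]))
print({name: page.data for name, page in pages.items()})
```

With a real deployment you would pass an AsyncNeMoMicroservices instance in place of the stand-in client.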

Client Attributes

The NeMo microservices clients provide access to various API resources through the following attributes.

Chat Resources

  • chat: Access to chat completion functionality

  • completions: Access to text completion functionality

Model Management

  • models: Manage models and model configurations

  • customization: Handle model customization and fine-tuning

  • evaluation: Evaluate model performance

Data Management

  • datasets: Manage datasets and data sources

  • embeddings: Generate and manage embeddings

  • namespaces: Organize resources in namespaces

  • projects: Manage projects and project configurations

Deployment & Operations

  • deployment: Manage model deployments

  • guardrail: Configure and manage guardrails

  • inference: Direct inference operations

Response Handling

  • with_raw_response: Access raw HTTP response data

  • with_streaming_response: Handle streaming responses
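
As a hedged sketch of raw-response access: SDKs with this client shape typically return, from a `with_raw_response` call, an object that exposes the HTTP `headers` and a `parse()` method yielding the usual typed result. Whether this SDK follows that pattern exactly is an assumption here, as is the helper name `list_namespaces_raw`, so check the SDK source before relying on it.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def list_namespaces_raw(client: Any) -> Tuple[Any, Any]:
    """Return (headers, parsed_page) for a namespaces.list() call.

    Assumption: client.with_raw_response.namespaces.list() returns an object
    with a `headers` mapping and a `parse()` method.
    """
    response = client.with_raw_response.namespaces.list()
    return response.headers, response.parse()


# Stand-in objects that mimic the assumed shape, for a dry run only.
_fake_response = SimpleNamespace(
    headers={"content-type": "application/json"},
    parse=lambda: SimpleNamespace(data=["default"]),
)
fake_client = SimpleNamespace(
    with_raw_response=SimpleNamespace(
        namespaces=SimpleNamespace(list=lambda: _fake_response)
    )
)
headers, page = list_namespaces_raw(fake_client)
print(headers["content-type"], page.data)
```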