models.resources#
Extended ModelsResource with high-level helper methods.
Module Contents#
Classes#
- AsyncModelsResource: Extended AsyncModelsResource with high-level helper methods.
- ModelsResource: Extended ModelsResource with high-level helper methods.
API#
- class models.resources.AsyncModelsResource(client: nemo_platform._client.AsyncNeMoPlatform)#
Bases: nemo_platform.resources.models.AsyncModelsResource
Extended AsyncModelsResource with high-level helper methods.
All existing async methods (create, retrieve, list, etc.) work unchanged. Adds convenience methods for OpenAI integration and deployment management.
URL builder methods are synchronous (no I/O) and safe to call from async code. Methods that perform I/O are properly async.
Example
>>> sdk = AsyncNeMoPlatform(base_url="http://nmp-host", workspace="default")
>>> sdk.nemo_platform.models.get_openai_route_base_url()
>>> await sdk.nemo_platform.models.wait_for_status("my-deployment", "READY")
Initialization
- get_async_openai_client(*, workspace: str | None = None)#
Get an async OpenAI client configured for NMP’s inference gateway.
This method returns an AsyncOpenAI client with the base_url set to the OpenAI proxy route for the specified workspace.
- Parameters:
workspace – The workspace identifier
- Returns:
An AsyncOpenAI client configured for the inference gateway
Example
>>> client = sdk.nemo_platform.models.get_async_openai_client()
>>> response = await client.chat.completions.create(
...     model="default/meta_llama-3.2-1b-instruct",
...     messages=[{"role": "user", "content": "Hello!"}]
... )
- get_client_default_headers() → dict[str, str]#
Get string-only default headers for third-party client libraries.
Use this helper when constructing external clients (for example, the OpenAI SDK or LiteLLM) so that the auth and identity headers from the SDK are forwarded. This is required for successful inference requests when platform auth/authorization is enabled.
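Example
A minimal sketch: forwarding the SDK's headers to a hand-built AsyncOpenAI client (the base URL comes from get_openai_route_base_url, documented below):
>>> headers = sdk.nemo_platform.models.get_client_default_headers()
>>> client = AsyncOpenAI(
...     base_url=sdk.nemo_platform.models.get_openai_route_base_url(),
...     default_headers=headers,
... )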
- get_model_entity_route_openai_url(
- model_entity: nemo_platform.types.models.ModelEntity,
- ) → str#
Generate an OpenAI SDK-compatible URL for the model entity proxy route.
This is a synchronous method (no I/O) and safe to call from async code.
- Parameters:
model_entity – The ModelEntity object from the SDK
- Returns:
A URL string suitable for use as OpenAI client’s base_url
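Example
A minimal sketch mirroring the sync variant (the model name is illustrative):
>>> entity = await sdk.nemo_platform.models.retrieve("my-model", workspace="default")
>>> base_url = sdk.nemo_platform.models.get_model_entity_route_openai_url(entity)
>>> openai_client = AsyncOpenAI(base_url=base_url)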
- get_openai_route_base_url(*, workspace: str | None = None) → str#
Generate the base URL for the OpenAI proxy route.
This is a synchronous method (no I/O) and safe to call from async code.
- Parameters:
workspace – The workspace identifier
- Returns:
A URL string suitable for use as OpenAI client’s base_url
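Example
A minimal sketch; note that the call itself needs no await:
>>> base_url = sdk.nemo_platform.models.get_openai_route_base_url()
>>> openai_client = AsyncOpenAI(base_url=base_url)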
- get_provider_route_openai_url(
- provider: nemo_platform.types.inference.ModelProvider,
- ) → str#
Generate an OpenAI SDK-compatible URL for the provider proxy route.
This is a synchronous method (no I/O) and safe to call from async code.
- Parameters:
provider – The ModelProvider object from the SDK
- Returns:
A URL string suitable for use as OpenAI client’s base_url
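Example
A minimal sketch mirroring the sync variant (the provider name is illustrative):
>>> provider = await sdk.inference.providers.retrieve("my-provider", workspace="default")
>>> base_url = sdk.nemo_platform.models.get_provider_route_openai_url(provider)
>>> openai_client = AsyncOpenAI(base_url=base_url)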
- async get_provider_route_openai_url_for_deployment(
- deployment: nemo_platform.types.inference.ModelDeployment,
- ) → str#
Generate an OpenAI SDK-compatible URL for a deployment’s model provider.
This is an async method that fetches the ModelProvider associated with the deployment and returns the provider route URL.
- Parameters:
deployment – The ModelDeployment object from the SDK
- Returns:
A URL string suitable for use as OpenAI client’s base_url
- Raises:
ValueError – If the deployment has no associated model_provider_id
Example
>>> deployment = await sdk.inference.deployments.retrieve("my-deployment", workspace="default")
>>> base_url = await sdk.nemo_platform.models.get_provider_route_openai_url_for_deployment(deployment)
>>> openai_client = AsyncOpenAI(base_url=base_url)
- async wait_for_deployment_status(
- deployment_name: str,
- desired_status: str,
- *,
- workspace: str | None = None,
- timeout: int = 1200,
- ) → bool#
Wait for a ModelDeployment to reach the desired status (async version).
For “DELETED” status, this function waits for the resource to be fully garbage collected (404 NotFoundError), not just for the status to show as DELETED.
- Parameters:
deployment_name – Name of the deployment
desired_status – Target status (“READY”, “DELETED”, etc.)
workspace – Workspace of the deployment
timeout – Maximum time to wait in seconds
- Returns:
True if desired status reached, False if timeout
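Example
A minimal sketch (the deployment name is illustrative):
>>> ok = await sdk.nemo_platform.models.wait_for_deployment_status(
...     "my-deployment", "READY", workspace="default"
... )
>>> assert ok, "timed out waiting for READY"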
- async wait_for_gateway(
- provider_name: str,
- *,
- workspace: str | None = None,
- timeout: int = 60,
- ) → bool#
Wait for the inference gateway to be able to route to a provider.
Polls the gateway’s /ready endpoint until it returns success, indicating the gateway has refreshed its cache and is aware of the provider.
- Parameters:
provider_name – Name of the model provider
workspace – Workspace of the provider
timeout – Maximum time to wait in seconds
- Returns:
True if gateway is ready, False if timeout
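Example
A minimal sketch (the provider name is illustrative):
>>> ready = await sdk.nemo_platform.models.wait_for_gateway("my-provider", workspace="default")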
- async wait_for_provider(
- provider_name: str,
- desired_status: str = 'READY',
- *,
- workspace: str | None = None,
- timeout: int = 60,
- check_gateway: bool = True,
- ) → bool#
Wait for a provider to reach the desired status (async version).
This is useful for external providers (like NVIDIA Build or OpenAI) where you need to wait for the provider to be ready before making inference calls.
- Parameters:
provider_name – Name of the provider
desired_status – Target status (default: “READY”)
workspace – Workspace of the provider
timeout – Maximum time to wait in seconds
check_gateway – When True and desired_status is “READY”, also verify the gateway can route to the provider before returning (default: True).
- Returns:
True if desired status reached, False if timeout
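Example
A minimal sketch for an external provider (the name is illustrative):
>>> ok = await sdk.nemo_platform.models.wait_for_provider("my-provider", workspace="default")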
- async wait_for_status(
- deployment_name: str,
- desired_status: str,
- *,
- workspace: str | None = None,
- timeout: int = 1200,
- check_gateway: bool = True,
- ) → bool#
Wait for a ModelDeployment and ModelProvider to reach the desired status.
For “DELETED” status, this function waits for the resource to be fully garbage collected (404 NotFoundError), not just for the status to show as DELETED.
- Parameters:
deployment_name – Name of the deployment
desired_status – Target status (“READY”, “DELETED”, etc.)
workspace – Workspace of the deployment
timeout – Maximum time to wait in seconds
check_gateway – When True and desired_status is “READY”, verify the gateway can route to the provider before returning (default: True).
- Returns:
True if desired status reached, False if timeout
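Example
A sketch of waiting for full deletion (the name is illustrative):
>>> gone = await sdk.nemo_platform.models.wait_for_status(
...     "my-deployment", "DELETED", workspace="default"
... )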
- class models.resources.ModelsResource(client: nemo_platform._client.NeMoPlatform)#
Bases: nemo_platform.resources.models.ModelsResource
Extended ModelsResource with high-level helper methods.
All existing methods (create, retrieve, list, etc.) work unchanged. Adds convenience methods for OpenAI integration and deployment management.
Example
>>> sdk = NeMoPlatform(base_url="http://nmp-host", workspace="default")
>>> sdk.nemo_platform.models.get_openai_route_base_url()
>>> sdk.nemo_platform.models.wait_for_status("my-deployment", "READY")
Initialization
- get_client_default_headers() → dict[str, str]#
Get string-only default headers for third-party client libraries.
Use this helper when constructing external clients (for example, the OpenAI SDK or LiteLLM) so that the auth and identity headers from the SDK are forwarded. This is required for successful inference requests when platform auth/authorization is enabled.
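Example
A minimal sketch: forwarding the SDK's headers to a hand-built OpenAI client:
>>> headers = sdk.nemo_platform.models.get_client_default_headers()
>>> client = OpenAI(
...     base_url=sdk.nemo_platform.models.get_openai_route_base_url(),
...     default_headers=headers,
... )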
- get_model_entity_route_openai_url(
- model_entity: nemo_platform.types.models.ModelEntity,
- ) → str#
Generate an OpenAI SDK-compatible URL for the model entity proxy route.
Always appends the /v1 suffix, since the client doesn't interact directly with the provider's host_url.
- Parameters:
model_entity – The ModelEntity object from the SDK
- Returns:
A URL string suitable for use as OpenAI client’s base_url
Example
>>> entity = sdk.nemo_platform.models.retrieve("my-model", workspace="default")
>>> base_url = sdk.nemo_platform.models.get_model_entity_route_openai_url(entity)
>>> # Returns: {base_url}/apis/inference-gateway/v2/workspaces/default/model/my-model/-/v1
>>> openai_client = OpenAI(base_url=base_url)
- get_openai_client(*, workspace: str | None = None)#
Get a sync OpenAI client configured for NMP’s inference gateway.
This method returns an OpenAI client with the base_url set to the OpenAI proxy route for the specified workspace. The client can be used directly with the standard OpenAI SDK interface.
- Parameters:
workspace – The workspace identifier
- Returns:
An OpenAI client configured for the inference gateway
Example
>>> client = sdk.nemo_platform.models.get_openai_client()
>>> response = client.chat.completions.create(
...     model="default/meta_llama-3.2-1b-instruct",
...     messages=[{"role": "user", "content": "Hello!"}]
... )
- get_openai_route_base_url(*, workspace: str | None = None) → str#
Generate the base URL for the OpenAI proxy route.
This route uses the model field in the request body for routing, formatted as workspace/model_entity_name.
- Parameters:
workspace – The workspace identifier
- Returns:
A URL string suitable for use as OpenAI client’s base_url
Example
>>> base_url = sdk.nemo_platform.models.get_openai_route_base_url()
>>> # Returns: {base_url}/apis/inference-gateway/v2/workspaces/default/openai/-/v1
>>> openai_client = OpenAI(base_url=base_url)
- get_provider_route_openai_url(
- provider: nemo_platform.types.inference.ModelProvider,
- ) → str#
Generate an OpenAI SDK-compatible URL for the provider proxy route.
Handles the conditional /v1 suffix based on the provider's host_url:
- If host_url ends with /v1, no suffix is added
- Otherwise, /v1 is appended
- Parameters:
provider – The ModelProvider object from the SDK
- Returns:
A URL string suitable for use as OpenAI client’s base_url
Example
>>> provider = sdk.inference.providers.retrieve("my-provider", workspace="default")
>>> base_url = sdk.nemo_platform.models.get_provider_route_openai_url(provider)
>>> # Returns: {base_url}/apis/inference-gateway/v2/workspaces/default/provider/my-provider/-/v1
>>> openai_client = OpenAI(base_url=base_url)
- get_provider_route_openai_url_for_deployment(
- deployment: nemo_platform.types.inference.ModelDeployment,
- ) → str#
Generate an OpenAI SDK-compatible URL for a deployment’s model provider.
This is a convenience method that fetches the ModelProvider associated with the deployment and returns the provider route URL.
- Parameters:
deployment – The ModelDeployment object from the SDK
- Returns:
A URL string suitable for use as OpenAI client’s base_url
- Raises:
ValueError – If the deployment has no associated model_provider_id
Example
>>> deployment = sdk.inference.deployments.retrieve("my-deployment", workspace="default")
>>> base_url = sdk.nemo_platform.models.get_provider_route_openai_url_for_deployment(deployment)
>>> openai_client = OpenAI(base_url=base_url)
- wait_for_deployment_status(
- deployment_name: str,
- desired_status: str,
- *,
- workspace: str | None = None,
- timeout: int = 1200,
- ) → bool#
Wait for a ModelDeployment to reach the desired status.
For “DELETED” status, this function waits for the resource to be fully garbage collected (404 NotFoundError), not just for the status to show as DELETED.
- Parameters:
deployment_name – Name of the deployment
desired_status – Target status (“READY”, “DELETED”, etc.)
workspace – Workspace of the deployment
timeout – Maximum time to wait in seconds
- Returns:
True if desired status reached, False if timeout
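Example
A minimal sketch (the deployment name is illustrative):
>>> ok = sdk.nemo_platform.models.wait_for_deployment_status(
...     "my-deployment", "READY", workspace="default"
... )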
- wait_for_gateway(
- provider_name: str,
- *,
- workspace: str | None = None,
- timeout: int = 60,
- ) → bool#
Wait for the inference gateway to be able to route to a provider.
Polls the gateway’s /ready endpoint until it returns success, indicating the gateway has refreshed its cache and is aware of the provider.
- Parameters:
provider_name – Name of the model provider
workspace – Workspace of the provider
timeout – Maximum time to wait in seconds
- Returns:
True if gateway is ready, False if timeout
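Example
A minimal sketch (the provider name is illustrative):
>>> ready = sdk.nemo_platform.models.wait_for_gateway("my-provider", workspace="default")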
- wait_for_provider(
- provider_name: str,
- desired_status: str = 'READY',
- *,
- workspace: str | None = None,
- timeout: int = 60,
- check_gateway: bool = True,
- ) → bool#
Wait for a provider to reach the desired status.
This is useful for external providers (like NVIDIA Build or OpenAI) where you need to wait for the provider to be ready before making inference calls.
- Parameters:
provider_name – Name of the provider
desired_status – Target status (default: “READY”)
workspace – Workspace of the provider
timeout – Maximum time to wait in seconds
check_gateway – When True and desired_status is “READY”, also verify the gateway can route to the provider before returning (default: True).
- Returns:
True if desired status reached, False if timeout
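Example
A minimal sketch (the provider name and timeout are illustrative):
>>> ok = sdk.nemo_platform.models.wait_for_provider("my-provider", timeout=120)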
- wait_for_status(
- deployment_name: str,
- desired_status: str,
- *,
- workspace: str | None = None,
- timeout: int = 1200,
- check_gateway: bool = True,
- ) → bool#
Wait for a ModelDeployment to reach the desired status.
For “DELETED” status, this function waits for the resource to be fully garbage collected (404 NotFoundError), not just for the status to show as DELETED.
- Parameters:
deployment_name – Name of the deployment
desired_status – Target status (“READY”, “DELETED”, etc.)
workspace – Workspace of the deployment
timeout – Maximum time to wait in seconds
check_gateway – When True and desired_status is “READY”, verify the gateway can route to the provider before returning (default: True).
- Returns:
True if desired status reached, False if timeout
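Example
A sketch of waiting for full deletion (the name is illustrative):
>>> gone = sdk.nemo_platform.models.wait_for_status(
...     "my-deployment", "DELETED", workspace="default"
... )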