models.resources#

Extended ModelsResource with high-level helper methods.

Module Contents#

Classes#

AsyncModelsResource

Extended AsyncModelsResource with high-level helper methods.

ModelsResource

Extended ModelsResource with high-level helper methods.

API#

class models.resources.AsyncModelsResource(client: nemo_platform._client.AsyncNeMoPlatform)#

Bases: nemo_platform.resources.models.AsyncModelsResource

Extended AsyncModelsResource with high-level helper methods.

All existing async methods (create, retrieve, list, etc.) work unchanged. Adds convenience methods for OpenAI integration and deployment management.

URL builder methods are synchronous (no I/O) and safe to call from async code. Methods that perform I/O are properly async.

Example

>>> sdk = AsyncNeMoPlatform(base_url="http://nmp-host", workspace="default")
>>> sdk.nemo_platform.models.get_openai_route_base_url()
>>> await sdk.nemo_platform.models.wait_for_status("my-deployment", "READY")

Initialization

get_async_openai_client(*, workspace: str | None = None)#

Get an async OpenAI client configured for NMP’s inference gateway.

This method returns an AsyncOpenAI client with the base_url set to the OpenAI proxy route for the specified workspace.

Parameters:

workspace – The workspace identifier

Returns:

An AsyncOpenAI client configured for the inference gateway

Example

>>> client = sdk.nemo_platform.models.get_async_openai_client()
>>> response = await client.chat.completions.create(
...     model="default/meta_llama-3.2-1b-instruct",
...     messages=[{"role": "user", "content": "Hello!"}]
... )
get_client_default_headers() → dict[str, str]#

Get string-only default headers for third-party client libraries.

Use this helper when constructing external clients (for example, the OpenAI SDK or LiteLLM) so that auth and identity headers from the SDK are forwarded. This is required for successful inference requests when platform auth/authorization is enabled.

get_model_entity_route_openai_url(
model_entity: nemo_platform.types.models.ModelEntity,
) → str#

Generate an OpenAI SDK-compatible URL for the model entity proxy route.

This is a synchronous method (no I/O) and safe to call from async code.

Parameters:

model_entity – The ModelEntity object from the SDK

Returns:

A URL string suitable for use as OpenAI client’s base_url

get_openai_route_base_url(*, workspace: str | None = None) → str#

Generate the base URL for the OpenAI proxy route.

This is a synchronous method (no I/O) and safe to call from async code.

Parameters:

workspace – The workspace identifier

Returns:

A URL string suitable for use as OpenAI client’s base_url
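
For illustration, the route this method produces can be sketched as a pure function. The path shape below mirrors the documented proxy route; it is not the SDK's internal code:

```python
def openai_route_base_url(base_url: str, workspace: str) -> str:
    # Illustrative reconstruction of the documented OpenAI proxy route.
    return (
        f"{base_url.rstrip('/')}/apis/inference-gateway/v2"
        f"/workspaces/{workspace}/openai/-/v1"
    )

print(openai_route_base_url("http://nmp-host", "default"))
# -> http://nmp-host/apis/inference-gateway/v2/workspaces/default/openai/-/v1
```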

get_provider_route_openai_url(
provider: nemo_platform.types.inference.ModelProvider,
) → str#

Generate an OpenAI SDK-compatible URL for the provider proxy route.

This is a synchronous method (no I/O) and safe to call from async code.

Parameters:

provider – The ModelProvider object from the SDK

Returns:

A URL string suitable for use as OpenAI client’s base_url

async get_provider_route_openai_url_for_deployment(
deployment: nemo_platform.types.inference.ModelDeployment,
) → str#

Generate an OpenAI SDK-compatible URL for a deployment’s model provider.

This is an async method that fetches the ModelProvider associated with the deployment and returns the provider route URL.

Parameters:

deployment – The ModelDeployment object from the SDK

Returns:

A URL string suitable for use as OpenAI client’s base_url

Raises:

ValueError – If the deployment has no associated model_provider_id

Example

>>> deployment = await sdk.inference.deployments.retrieve("my-deployment", workspace="default")
>>> base_url = await sdk.nemo_platform.models.get_provider_route_openai_url_for_deployment(deployment)
>>> openai_client = AsyncOpenAI(base_url=base_url)
async wait_for_deployment_status(
deployment_name: str,
desired_status: str,
*,
workspace: str | None = None,
timeout: int = 1200,
) → bool#

Wait for a ModelDeployment to reach the desired status (async version).

For “DELETED” status, this function waits for the resource to be fully garbage collected (404 NotFoundError), not just for the status to show as DELETED.

Parameters:
  • deployment_name – Name of the deployment

  • desired_status – Target status (“READY”, “DELETED”, etc.)

  • workspace – Workspace of the deployment

  • timeout – Maximum time to wait in seconds

Returns:

True if desired status reached, False if timeout
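
The "DELETED" semantics described above can be sketched as follows. NotFoundError here is a stand-in for the SDK's 404 error; this illustrates the check, not the helper's actual code:

```python
class NotFoundError(Exception):
    """Stand-in for the SDK's 404 NotFoundError."""

def is_fully_deleted(retrieve) -> bool:
    # Deletion only counts once retrieval fails with a 404;
    # a deployment whose status merely reads "DELETED" is not enough.
    try:
        retrieve()
        return False
    except NotFoundError:
        return True

def gone():
    raise NotFoundError()

print(is_fully_deleted(gone))                           # True
print(is_fully_deleted(lambda: {"status": "DELETED"}))  # False
```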

async wait_for_gateway(
provider_name: str,
*,
workspace: str | None = None,
timeout: int = 60,
) → bool#

Wait for the inference gateway to be able to route to a provider.

Polls the gateway’s /ready endpoint until it returns success, indicating the gateway has refreshed its cache and is aware of the provider.

Parameters:
  • provider_name – Name of the model provider

  • workspace – Workspace of the provider

  • timeout – Maximum time to wait in seconds

Returns:

True if gateway is ready, False if timeout
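
The readiness polling described above has roughly this shape. The loop below is an illustrative sketch using the standard library, not the SDK's implementation; the /ready path comes from the description above:

```python
import time
import urllib.error
import urllib.request

def poll_ready(ready_url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    # Hit the gateway's /ready endpoint until it answers with a 2xx
    # status, or give up after `timeout` seconds.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(ready_url) as resp:
                if 200 <= resp.status < 300:
                    return True
        except urllib.error.URLError:
            pass  # not reachable yet; retry after a short pause
        time.sleep(interval)
    return False
```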

async wait_for_provider(
provider_name: str,
desired_status: str = 'READY',
*,
workspace: str | None = None,
timeout: int = 60,
check_gateway: bool = True,
) → bool#

Wait for a provider to reach the desired status (async version).

This is useful for external providers (like NVIDIA Build or OpenAI) where you need to wait for the provider to be ready before making inference calls.

Parameters:
  • provider_name – Name of the provider

  • desired_status – Target status (default: “READY”)

  • workspace – Workspace of the provider

  • timeout – Maximum time to wait in seconds

  • check_gateway – When True and desired_status is “READY”, also verify the gateway can route to the provider before returning (default: True).

Returns:

True if desired status reached, False if timeout

async wait_for_status(
deployment_name: str,
desired_status: str,
*,
workspace: str | None = None,
timeout: int = 1200,
check_gateway: bool = True,
) → bool#

Wait for a ModelDeployment and ModelProvider to reach the desired status.

For “DELETED” status, this function waits for the resource to be fully garbage collected (404 NotFoundError), not just for the status to show as DELETED.

Parameters:
  • deployment_name – Name of the deployment

  • desired_status – Target status (“READY”, “DELETED”, etc.)

  • workspace – Workspace of the deployment

  • timeout – Maximum time to wait in seconds

  • check_gateway – When True and desired_status is “READY”, verify the gateway can route to the provider before returning (default: True).

Returns:

True if desired status reached, False if timeout
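
The general polling pattern behind the async wait_for_* helpers can be sketched as follows (illustrative only; the real helpers poll SDK resources rather than a bare predicate):

```python
import asyncio
import time

async def wait_until(predicate, timeout: float = 60.0, interval: float = 1.0) -> bool:
    # Poll an async predicate until it returns True or the timeout elapses.
    # Mirrors the wait_for_* contract: True on success, False on timeout.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if await predicate():
            return True
        await asyncio.sleep(interval)
    return False
```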

class models.resources.ModelsResource(client: nemo_platform._client.NeMoPlatform)#

Bases: nemo_platform.resources.models.ModelsResource

Extended ModelsResource with high-level helper methods.

All existing methods (create, retrieve, list, etc.) work unchanged. Adds convenience methods for OpenAI integration and deployment management.

Example

>>> sdk = NeMoPlatform(base_url="http://nmp-host", workspace="default")
>>> sdk.nemo_platform.models.get_openai_route_base_url()
>>> sdk.nemo_platform.models.wait_for_status("my-deployment", "READY")

Initialization

get_client_default_headers() → dict[str, str]#

Get string-only default headers for third-party client libraries.

Use this helper when constructing external clients (for example, the OpenAI SDK or LiteLLM) so that auth and identity headers from the SDK are forwarded. This is required for successful inference requests when platform auth/authorization is enabled.
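
For example, the returned dict can be passed to the OpenAI SDK's default_headers parameter. The header names below are hypothetical placeholders for whatever the SDK actually returns:

```python
# Hypothetical shape of what get_client_default_headers() might return;
# the real header names come from the SDK's auth configuration.
sdk_headers = {
    "Authorization": "Bearer <token>",
    "X-Workspace": "default",
}

# Third-party client libraries typically require string-only headers,
# which is exactly what this helper guarantees.
assert all(
    isinstance(k, str) and isinstance(v, str) for k, v in sdk_headers.items()
)

# The dict is then passed straight through, e.g.:
#   client = OpenAI(base_url=base_url, default_headers=sdk_headers)
```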

get_model_entity_route_openai_url(
model_entity: nemo_platform.types.models.ModelEntity,
) → str#

Generate an OpenAI SDK-compatible URL for the model entity proxy route.

Always appends the /v1 suffix, since the client does not interact directly with the provider’s host_url.

Parameters:

model_entity – The ModelEntity object from the SDK

Returns:

A URL string suitable for use as OpenAI client’s base_url

Example

>>> entity = sdk.nemo_platform.models.retrieve("my-model", workspace="default")
>>> base_url = sdk.nemo_platform.models.get_model_entity_route_openai_url(entity)
>>> # Returns: {base_url}/apis/inference-gateway/v2/workspaces/default/model/my-model/-/v1
>>> openai_client = OpenAI(base_url=base_url)
get_openai_client(*, workspace: str | None = None)#

Get a sync OpenAI client configured for NMP’s inference gateway.

This method returns an OpenAI client with the base_url set to the OpenAI proxy route for the specified workspace. The client can be used directly with the standard OpenAI SDK interface.

Parameters:

workspace – The workspace identifier

Returns:

An OpenAI client configured for the inference gateway

Example

>>> client = sdk.nemo_platform.models.get_openai_client()
>>> response = client.chat.completions.create(
...     model="default/meta_llama-3.2-1b-instruct",
...     messages=[{"role": "user", "content": "Hello!"}]
... )
get_openai_route_base_url(*, workspace: str | None = None) → str#

Generate the base URL for the OpenAI proxy route.

This route uses the model field in the request body for routing, formatted as workspace/model_entity_name.

Parameters:

workspace – The workspace identifier

Returns:

A URL string suitable for use as OpenAI client’s base_url

Example

>>> base_url = sdk.nemo_platform.models.get_openai_route_base_url()
>>> # Returns: {base_url}/apis/inference-gateway/v2/workspaces/default/openai/-/v1
>>> openai_client = OpenAI(base_url=base_url)
get_provider_route_openai_url(
provider: nemo_platform.types.inference.ModelProvider,
) → str#

Generate an OpenAI SDK-compatible URL for the provider proxy route.

Handles the conditional /v1 suffix based on the provider’s host_url:
  • If host_url ends with /v1, no suffix is added

  • Otherwise, /v1 is appended

Parameters:

provider – The ModelProvider object from the SDK

Returns:

A URL string suitable for use as OpenAI client’s base_url

Example

>>> provider = sdk.inference.providers.retrieve("my-provider", workspace="default")
>>> base_url = sdk.nemo_platform.models.get_provider_route_openai_url(provider)
>>> # Returns: {base_url}/apis/inference-gateway/v2/workspaces/default/provider/my-provider/-/v1
>>> openai_client = OpenAI(base_url=base_url)
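
The conditional /v1 handling described above can be sketched as a pure function (illustrative only, not the SDK's code; the example URLs are made up):

```python
def with_openai_suffix(route_url: str, provider_host_url: str) -> str:
    # If the provider's host_url already ends in /v1, the proxy route
    # gets no extra suffix; otherwise /v1 is appended.
    if provider_host_url.rstrip("/").endswith("/v1"):
        return route_url
    return route_url.rstrip("/") + "/v1"

print(with_openai_suffix("http://gw/provider/p/-", "https://api.example.com/v1"))
# -> http://gw/provider/p/-
print(with_openai_suffix("http://gw/provider/p/-", "https://api.example.com"))
# -> http://gw/provider/p/-/v1
```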
get_provider_route_openai_url_for_deployment(
deployment: nemo_platform.types.inference.ModelDeployment,
) → str#

Generate an OpenAI SDK-compatible URL for a deployment’s model provider.

This is a convenience method that fetches the ModelProvider associated with the deployment and returns the provider route URL.

Parameters:

deployment – The ModelDeployment object from the SDK

Returns:

A URL string suitable for use as OpenAI client’s base_url

Raises:

ValueError – If the deployment has no associated model_provider_id

Example

>>> deployment = sdk.inference.deployments.retrieve("my-deployment", workspace="default")
>>> base_url = sdk.nemo_platform.models.get_provider_route_openai_url_for_deployment(deployment)
>>> openai_client = OpenAI(base_url=base_url)
wait_for_deployment_status(
deployment_name: str,
desired_status: str,
*,
workspace: str | None = None,
timeout: int = 1200,
) → bool#

Wait for a ModelDeployment to reach the desired status.

For “DELETED” status, this function waits for the resource to be fully garbage collected (404 NotFoundError), not just for the status to show as DELETED.

Parameters:
  • deployment_name – Name of the deployment

  • desired_status – Target status (“READY”, “DELETED”, etc.)

  • workspace – Workspace of the deployment

  • timeout – Maximum time to wait in seconds

Returns:

True if desired status reached, False if timeout

wait_for_gateway(
provider_name: str,
*,
workspace: str | None = None,
timeout: int = 60,
) → bool#

Wait for the inference gateway to be able to route to a provider.

Polls the gateway’s /ready endpoint until it returns success, indicating the gateway has refreshed its cache and is aware of the provider.

Parameters:
  • provider_name – Name of the model provider

  • workspace – Workspace of the provider

  • timeout – Maximum time to wait in seconds

Returns:

True if gateway is ready, False if timeout

wait_for_provider(
provider_name: str,
desired_status: str = 'READY',
*,
workspace: str | None = None,
timeout: int = 60,
check_gateway: bool = True,
) → bool#

Wait for a provider to reach the desired status.

This is useful for external providers (like NVIDIA Build or OpenAI) where you need to wait for the provider to be ready before making inference calls.

Parameters:
  • provider_name – Name of the provider

  • desired_status – Target status (default: “READY”)

  • workspace – Workspace of the provider

  • timeout – Maximum time to wait in seconds

  • check_gateway – When True and desired_status is “READY”, also verify the gateway can route to the provider before returning (default: True).

Returns:

True if desired status reached, False if timeout

wait_for_status(
deployment_name: str,
desired_status: str,
*,
workspace: str | None = None,
timeout: int = 1200,
check_gateway: bool = True,
) → bool#

Wait for a ModelDeployment to reach the desired status.

For “DELETED” status, this function waits for the resource to be fully garbage collected (404 NotFoundError), not just for the status to show as DELETED.

Parameters:
  • deployment_name – Name of the deployment

  • desired_status – Target status (“READY”, “DELETED”, etc.)

  • workspace – Workspace of the deployment

  • timeout – Maximum time to wait in seconds

  • check_gateway – When True and desired_status is “READY”, verify the gateway can route to the provider before returning (default: True).

Returns:

True if desired status reached, False if timeout