Inference Resource#

This resource corresponds to the model inference endpoints provided by the NIM Proxy. Use the models sub-resource to list the models available for inference, and use the completions or chat.completions resources to run inference.
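For example, listing the available models through this resource might look like the following minimal sketch. The top-level import path, the base_url constructor argument, the inference attribute name, and the list() method on the models sub-resource are assumptions based on the class signatures below and common SDK conventions; they are not documented on this page.

# Minimal sketch, under the assumptions stated above.
from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(base_url="http://nemo.example.com")  # hypothetical URL
models = client.inference.models.list()  # list() is an assumed method name
for model in models:
    print(model)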

Sync Inference Resource#

class nemo_microservices.lib.custom_resources.inference.InferenceResource(client: NeMoMicroservices)

Bases: SyncAPIResource

property models: ModelsResource
property with_raw_response: InferenceResourceWithRawResponse

This property can be used as a prefix for any HTTP method call to return the raw response object instead of the parsed content.

For more information, see https://docs.nvidia.com/nemo/microservices/latest/pysdk/index.html#accessing-raw-response-data-e-g-headers
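A sketch of the raw-response pattern, assuming the wrapper exposes the same sub-resources and methods as the plain resource, and that ModelsResource provides a list() method (both assumptions; see the linked docs for the confirmed interface):

# Using the property as a prefix returns the raw response object.
response = client.inference.with_raw_response.models.list()
print(response.headers)    # inspect raw HTTP headers before parsing
models = response.parse()  # recover the parsed return value when needed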

property with_streaming_response: InferenceResourceWithStreamingResponse

An alternative to .with_raw_response that doesn’t eagerly read the response body.

For more information, see https://docs.nvidia.com/nemo/microservices/latest/pysdk/index.html#with_streaming_response
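A sketch of the streaming variant under the same assumptions. Per the linked docs, the wrapper is used as a context manager so the response body is read only on demand rather than eagerly:

with client.inference.with_streaming_response.models.list() as response:
    print(response.headers)              # available without reading the body
    for chunk in response.iter_bytes():  # body is read incrementally here
        ...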

create_from_dict(data: dict[str, object]) → object
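The signature gives only the call shape: a dict[str, object] in, an object out. A hedged sketch; the payload key shown is purely illustrative, since this page does not document the expected dictionary contents:

obj = client.inference.create_from_dict({"key": "value"})  # hypothetical payload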

Async Inference Resource#

class nemo_microservices.lib.custom_resources.inference.AsyncInferenceResource(client: AsyncNeMoMicroservices)

Bases: AsyncAPIResource

property models: AsyncModelsResource
property with_raw_response: AsyncInferenceResourceWithRawResponse

This property can be used as a prefix for any HTTP method call to return the raw response object instead of the parsed content.

For more information, see https://docs.nvidia.com/nemo/microservices/latest/pysdk/index.html#accessing-raw-response-data-e-g-headers

property with_streaming_response: AsyncInferenceResourceWithStreamingResponse

An alternative to .with_raw_response that doesn’t eagerly read the response body.

For more information, see https://docs.nvidia.com/nemo/microservices/latest/pysdk/index.html#with_streaming_response
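The async resource mirrors the sync one, with both wrappers awaited or entered asynchronously. A sketch covering with_raw_response and with_streaming_response, under the same assumptions as the sync examples (import path, base_url argument, inference attribute, and list() method are all unconfirmed by this page):

import asyncio
from nemo_microservices import AsyncNeMoMicroservices

async def main() -> None:
    client = AsyncNeMoMicroservices(base_url="http://nemo.example.com")  # hypothetical URL
    # Raw response: awaited like the normal call.
    response = await client.inference.with_raw_response.models.list()
    print(response.headers)
    # Streaming response: an async context manager that defers reading the body.
    async with client.inference.with_streaming_response.models.list() as response:
        async for chunk in response.iter_bytes():
            ...

asyncio.run(main())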

create_from_dict(data: dict[str, object]) → object