Data Designer NMP SDK Resources

The data_designer.config module provides a consistent, context-agnostic experience for building Data Designer configs. Once you are ready to execute that config on the NMP Data Designer service, you use objects from the nemo_platform SDK. This page explains the NMP-specific objects used to interact with the Data Designer service.

DataDesignerResource

The DataDesignerResource is the SDK entry point for working with Data Designer on NMP. It is analogous to the library’s data_designer.interface.DataDesigner object.

A DataDesignerResource is accessed directly from a NeMoPlatform instance:

import os
from nemo_platform import NeMoPlatform


sdk = NeMoPlatform(
    base_url=os.environ.get("NMP_BASE_URL", "http://localhost:8080"),
    workspace="default",
)
data_designer = sdk.data_designer  # this object is a DataDesignerResource

The DataDesignerResource is primarily used to make preview requests (preview) and create jobs (create). It also exposes the following methods:

| Method | Description |
| --- | --- |
| get_default_model_providers() | Returns a list of model providers registered with the Models and Inference Gateway services that can be used in your Data Designer config. |
| get_job_resource(job_name: str) | Returns a DataDesignerJobResource for interacting with a job (see below). |
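As a rough sketch of how these two methods might be used together, the function below lists the available providers and then looks up an existing job. It relies only on the method names documented above. The helper name inspect_service is hypothetical, and because this page does not specify the shape of each provider entry, the sketch simply prints its repr.

```python
from typing import Any


def inspect_service(data_designer: Any, job_name: str) -> tuple:
    """Hypothetical helper: list model providers and fetch a job handle.

    `data_designer` is assumed to be a DataDesignerResource (sdk.data_designer).
    """
    # Providers registered with the Models and Inference Gateway services.
    providers = list(data_designer.get_default_model_providers())
    for provider in providers:
        # The element type is not documented on this page, so show it as-is.
        print(repr(provider))

    # Handle for an existing job; documented to be a DataDesignerJobResource.
    job = data_designer.get_job_resource(job_name)
    return providers, job
```

With the sdk object created earlier, this could be called as inspect_service(sdk.data_designer, "my-job"), where "my-job" stands in for the name of a real job in your workspace.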

DataDesignerJobResource

The DataDesignerJobResource provides helper methods for working with a job. It is returned by DataDesignerResource.create when you create a job. To get an instance for an existing job, call DataDesignerResource.get_job_resource with the job name.

Some of the most useful methods are described below.

| Method | Description |
| --- | --- |
| wait_until_done() | Polls the job service until the job reaches a terminal state. Prints job logs along the way. |
| get_logs() | Returns logs from the job as a list of dicts. Handles pagination automatically. |
| download_artifacts() | Downloads the job results as a tar archive. Returns a DataDesignerJobResults object (see below). |
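The job methods above can be chained into a simple wait-then-download flow. The following is a minimal sketch under stated assumptions: collect_job_output is a hypothetical helper name, and each method is called with no arguments, as listed above, although the real methods may accept options not described on this page.

```python
from typing import Any


def collect_job_output(data_designer: Any, job_name: str) -> tuple:
    """Hypothetical helper: wait for a job, then gather its logs and results."""
    # Look up the existing job; documented to be a DataDesignerJobResource.
    job = data_designer.get_job_resource(job_name)

    # Blocks until the job reaches a terminal state, printing logs as it polls.
    job.wait_until_done()

    # Full log history as a list of dicts; pagination is handled by the SDK.
    logs = job.get_logs()

    # Assumption: no arguments are required. Documented to return a
    # DataDesignerJobResults object.
    results = job.download_artifacts()
    return logs, results
```

A terminal state is not necessarily a successful one, so real code would likely check the job's outcome before downloading. How to read that status is not covered on this page.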

DataDesignerJobResults

The DataDesignerJobResults object simplifies loading downloaded job results into memory.

| Method | Description |
| --- | --- |
| load_analysis() | Returns a DatasetProfilerResults object (from the library) with an analysis of the dataset. |
| load_dataset() | Returns the output dataset as a Pandas DataFrame. |
| load_processor_dataset(processor_name: str) | Returns the named processor dataset as a Pandas DataFrame. |
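To illustrate how the loaders above might be combined, the sketch below builds a small summary from a results object. summarize_results and its processor_name parameter are hypothetical names introduced for this example. The only assumptions about return values are the ones stated above: load_dataset and load_processor_dataset return Pandas DataFrames, so len() and .columns are available.

```python
from typing import Any, Optional


def summarize_results(results: Any, processor_name: Optional[str] = None) -> dict:
    """Hypothetical helper: load job outputs and report a few basic facts.

    `results` is assumed to be a DataDesignerJobResults object.
    """
    dataset = results.load_dataset()  # Pandas DataFrame
    summary = {
        "rows": len(dataset),
        "columns": list(dataset.columns),
        # DatasetProfilerResults from the library, kept as-is.
        "analysis": results.load_analysis(),
    }
    if processor_name is not None:
        # Only load a processor dataset when the caller names one.
        processor_df = results.load_processor_dataset(processor_name)
        summary["processor_rows"] = len(processor_df)
    return summary
```

Because the helper touches only len() and .columns, it makes no assumption about which columns or processors a given config produces.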