nemo_microservices.data_designer.client.data_designer_client
Module Contents#
Classes#
NeMoDataDesignerClient | Client for interacting with the NeMo Data Designer service.
Data#
API#
- nemo_microservices.data_designer.client.data_designer_client.DEFAULT_NUM_RECORDS_FOR_PREVIEW#
10
- nemo_microservices.data_designer.client.data_designer_client.DEFAULT_PREVIEW_TIMEOUT#
120
- class nemo_microservices.data_designer.client.data_designer_client.NeMoDataDesignerClient(
- *,
- client: nemo_microservices.NeMoMicroservices | None = None,
- base_url: str | None = None,
- **kwargs,
- )
Client for interacting with the NeMo Data Designer service.
The NeMoDataDesignerClient provides a high-level interface for generating synthetic datasets using the NeMo Data Designer service. It supports creating batch data generation jobs, running data generation previews, and managing datasets through the datastore.
The client can be initialized with either an existing NeMoMicroservices client or a base URL to create a new connection.
Initialization
Initialize the NeMoDataDesignerClient.
Args:
- client: An existing NeMoMicroservices client instance. If provided, this will be used instead of creating a new client. Mutually exclusive with base_url.
- base_url: The base URL of the NeMo Microservices instance. Used to create a new NeMoMicroservices client if no client is provided. Mutually exclusive with client.
- **kwargs: Additional keyword arguments passed to the NeMoMicroservices constructor when creating a new client. Ignored if client is provided.
Raises: DataDesignerClientError: If neither client nor base_url is provided.
Note: Provide either client or base_url; at least one is required. The two are intended to be mutually exclusive, but if both are provided, the client parameter takes precedence.
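The argument rules above can be sketched in plain Python. This is only an illustration of the documented behavior, not the library's implementation: `resolve_client`, `make_client`, and `ClientResolutionError` are hypothetical names, with `make_client` standing in for the NeMoMicroservices constructor and `ClientResolutionError` standing in for DataDesignerClientError.

```python
from typing import Any, Callable, Optional


class ClientResolutionError(Exception):
    """Hypothetical stand-in for DataDesignerClientError."""


def resolve_client(
    make_client: Callable[..., Any],
    *,
    client: Optional[Any] = None,
    base_url: Optional[str] = None,
    **kwargs: Any,
) -> Any:
    """Pick the client to use, following the documented rules (sketch only)."""
    # An existing client takes precedence; base_url and kwargs are not used.
    if client is not None:
        return client
    # Otherwise build a new client from base_url, forwarding extra kwargs.
    if base_url is not None:
        return make_client(base_url=base_url, **kwargs)
    # Neither argument was given, which the documentation says is an error.
    raise ClientResolutionError("Provide either client or base_url.")
```

With the real class, the two documented call styles would presumably look like `NeMoDataDesignerClient(base_url=...)` or `NeMoDataDesignerClient(client=existing_client)`.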
- create(
- config_builder: nemo_microservices.data_designer.config.config_builder.DataDesignerConfigBuilder,
- *,
- num_records: int = 100,
- wait_until_done: bool = False,
- name: str = 'nemo-data-designer-job',
- project: str = 'nemo-data-designer',
Create a Data Designer generation job.
Args:
- config_builder: Data Designer configuration builder.
- num_records: The number of records to generate.
- wait_until_done: Whether to block the calling program until the job is done.
- name: Name label for the job within the NeMo Microservices project.
- project: Name of the NeMo Microservices project.
Returns: An object with methods for querying the job’s status and results.
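As a rough illustration of the keyword-only arguments above, the following sketch wraps `create` in a small helper. The helper name `submit_generation_job` and the job name `my-job` are hypothetical; `client` is assumed to be an already initialized NeMoDataDesignerClient and `config_builder` an existing DataDesignerConfigBuilder.

```python
from typing import Any


def submit_generation_job(client: Any, config_builder: Any, num_records: int = 100) -> Any:
    """Submit a batch generation job and block until it finishes (hypothetical helper)."""
    # Everything after config_builder is keyword-only in the documented signature.
    return client.create(
        config_builder,
        num_records=num_records,
        wait_until_done=True,  # block instead of returning immediately
        name="my-job",  # hypothetical job name
        project="nemo-data-designer",  # the documented default project
    )
```

The returned object is whatever `create` returns; per the description above, it exposes methods for querying the job's status and results.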
- get_datastore_settings() nemo_microservices.data_designer.config.datastore.DatastoreSettings | None #
Get the current datastore settings.
Returns: The current datastore settings if they have been set, None otherwise.
- get_job_results(
- job_id: str,
Retrieve results for an existing data generation job.
Args: job_id: The unique identifier of the job to retrieve results for.
Returns: An object containing methods for querying job status, retrieving the generated dataset, and accessing job metadata.
Raises: ValueError: If the job ID provided is empty.
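One way to handle the documented ValueError is sketched below. The helper name `fetch_results_or_none` is hypothetical, and returning None for an empty job ID is a choice made for this sketch rather than library behavior; `client` is assumed to be an initialized NeMoDataDesignerClient.

```python
from typing import Any, Optional


def fetch_results_or_none(client: Any, job_id: str) -> Optional[Any]:
    """Return the job results, or None if the job ID is rejected as empty."""
    try:
        return client.get_job_results(job_id=job_id)
    except ValueError:
        # Documented to be raised when the job ID provided is empty.
        return None
```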
- preview(
- config_builder: nemo_microservices.data_designer.config.config_builder.DataDesignerConfigBuilder,
- *,
- num_records: int | None = None,
- timeout: int | None = None,
Generate a set of preview records based on your current Data Designer configuration.
This method is meant for fast iteration on your Data Designer configuration.
Args:
- config_builder: Data Designer configuration builder.
- num_records: The number of records to generate. Must be equal to or less than the max number of preview records set at deploy time.
- timeout: The timeout for the preview in seconds. If not provided, one will be set based on the model configs.
Returns: An object containing the preview dataset and tools for inspecting the results.
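A minimal sketch of a preview call follows, assuming `client` is an initialized NeMoDataDesignerClient and `config_builder` an existing DataDesignerConfigBuilder. The helper name `run_preview` is hypothetical, and the positive-integer check is a local convenience added for this sketch; the service-side limit on preview records is set at deploy time and is not checked here.

```python
from typing import Any, Optional


def run_preview(
    client: Any,
    config_builder: Any,
    num_records: Optional[int] = None,
    timeout: Optional[int] = None,
) -> Any:
    """Request a preview, leaving unset options to the client (hypothetical helper)."""
    # Local sanity check only; this is an assumption of the sketch, not library behavior.
    if num_records is not None and num_records < 1:
        raise ValueError("num_records must be a positive integer")
    # Passing None for either option defers to the client's own handling.
    return client.preview(config_builder, num_records=num_records, timeout=timeout)
```

Keeping `num_records` small during iteration fits the stated purpose of `preview`; a full run would go through `create` instead.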
- upload_seed_dataset(
- dataset: str | pathlib.Path | pandas.DataFrame,
- repo_id: str,
- datastore_settings: nemo_microservices.data_designer.config.datastore.DatastoreSettings,
Upload a dataset to the datastore and return the reference for fetching the dataset.
This method accepts several dataset input types. For DataFrame inputs, a temporary Parquet file is created and automatically cleaned up after upload.
Args:
- dataset: Dataset to upload. Can be:
  - pandas.DataFrame: Will be saved as a temporary parquet file.
  - str: Path to an existing dataset file.
  - Path: Path object pointing to an existing dataset file.
- repo_id: Repository ID for the datastore where the dataset will be uploaded.
- datastore_settings: Configuration settings for the datastore connection.
Returns: Seed dataset reference returned from the datastore upload.
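The sketch below shows one possible way to combine `get_datastore_settings` with `upload_seed_dataset` for a file that already exists on disk. The helper name `upload_seed_file` is hypothetical, and reusing the client's current settings (and raising RuntimeError when there are none) is an assumption of this sketch; `client` is assumed to be an initialized NeMoDataDesignerClient.

```python
from pathlib import Path
from typing import Any, Union


def upload_seed_file(client: Any, dataset_path: Union[str, Path], repo_id: str) -> Any:
    """Upload an existing dataset file using the client's current datastore settings."""
    settings = client.get_datastore_settings()
    if settings is None:
        # Sketch-level choice: fail early rather than upload without settings.
        raise RuntimeError("No datastore settings are configured on this client.")
    # Returns the seed dataset reference produced by the datastore upload.
    return client.upload_seed_dataset(
        dataset=dataset_path,
        repo_id=repo_id,
        datastore_settings=settings,
    )
```

A pandas.DataFrame could be passed in place of the path, in which case the method is documented to write and clean up a temporary parquet file itself.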
- nemo_microservices.data_designer.client.data_designer_client.logger#
‘getLogger(…)’