Download Job Results#

Download the full results of a completed data generation job as files.

Prerequisites#

Before you can download results from a data generation job, make sure that you have:

  • Obtained the base URL of your NeMo Data Designer service

  • Set the DATA_DESIGNER_BASE_URL environment variable to your NeMo Data Designer service endpoint

  • Completed a data generation job

export DATA_DESIGNER_BASE_URL="https://your-data-designer-service-url"
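
A small guard in Python can catch a missing variable before any request is made. The following is a minimal sketch; the helper name `get_base_url` is hypothetical and is not part of the NeMo Microservices SDK.

```python
import os


def get_base_url(env=os.environ):
    """Return the Data Designer base URL, or raise a clear error if it is unset."""
    # `get_base_url` is an illustrative helper, not an SDK function.
    base_url = env.get("DATA_DESIGNER_BASE_URL", "").strip()
    if not base_url:
        raise RuntimeError(
            "DATA_DESIGNER_BASE_URL is not set; export it before running the examples."
        )
    # Drop a trailing slash so later path joins do not produce "//".
    return base_url.rstrip("/")
```

The returned value can be passed as `base_url` when initializing the client.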

To Download Results from a Data Generation Job#

Use either the Python SDK or cURL to download the complete results of a data generation job.

import os
from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['DATA_DESIGNER_BASE_URL']
)

# First, get the list of available results
job_id = "job-abc123def456"
results = client.beta.data_designer.jobs.results.list(job_id)

# Download each result
for result in results:
    print(f"Downloading result: {result.id}")
    
    # Download the result data
    downloaded_data = client.beta.data_designer.jobs.results.download(
        result.id,
        job_id=job_id
    )
    
    # Save to file
    filename = f"{result.id}.{result.format}"
    with open(filename, 'wb') as f:
        f.write(downloaded_data)
    
    print(f"Saved as: {filename}")

JOB_ID="job-abc123def456"
RESULT_ID="result-123-csv"

# Download a specific result; --fail makes curl exit non-zero on HTTP errors
curl --fail -X GET \
  "${DATA_DESIGNER_BASE_URL}/v1beta1/data-designer/jobs/${JOB_ID}/results/${RESULT_ID}/download" \
  -H 'Accept: application/octet-stream' \
  -o "synthetic_data.csv" \
  && echo "Downloaded result to synthetic_data.csv"
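
If the SDK is not available, the same endpoint can be called from Python with only the standard library. This is a rough sketch, not an official client: `build_download_url` and `download_result` are hypothetical helper names, and the code assumes, like the cURL example, that the service needs no authentication header.

```python
import shutil
import urllib.request
from urllib.parse import quote


def build_download_url(base_url, job_id, result_id):
    """Build the download URL used in the cURL example."""
    return (
        f"{base_url.rstrip('/')}/v1beta1/data-designer/jobs/"
        f"{quote(job_id, safe='')}/results/{quote(result_id, safe='')}/download"
    )


def download_result(base_url, job_id, result_id, dest_path):
    """Stream one result to dest_path without loading it all into memory."""
    request = urllib.request.Request(
        build_download_url(base_url, job_id, result_id),
        headers={"Accept": "application/octet-stream"},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses before
    # the destination file is opened, so an error body is never saved as data.
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for large result files.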

Download Workflow#

  1. List Results: First get the available results for a job

  2. Select Result: Choose the result you want to download based on format and canonical status

  3. Download: Use the result ID and job ID to download the specific result
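
The selection step can be sketched as a small function that prefers the requested format and breaks ties by canonical status. The `format` attribute appears in the download example; the `canonical` attribute name and the `select_result` helper are assumptions made for illustration, so check the result objects returned by your SDK version.

```python
from types import SimpleNamespace


def select_result(results, preferred_format="parquet"):
    """Pick one result: preferred format first, then canonical status."""
    results = list(results)
    if not results:
        raise ValueError("The job has no results to download.")

    def rank(result):
        # `canonical` is an assumed attribute name. Missing attributes
        # count as non-matching. False sorts before True, so the best
        # match gets the lowest rank.
        return (
            getattr(result, "format", None) != preferred_format,
            not getattr(result, "canonical", False),
        )

    return min(results, key=rank)


# Stand-in objects for illustration only; real results come from
# client.beta.data_designer.jobs.results.list(job_id).
sample = [
    SimpleNamespace(id="result-123-csv", format="csv", canonical=False),
    SimpleNamespace(id="result-123-parquet", format="parquet", canonical=True),
    SimpleNamespace(id="result-123-json", format="json", canonical=False),
]
chosen = select_result(sample)
print(chosen.id)  # result-123-parquet
```

The chosen result's `id` is then passed to the download call together with the job ID.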

Working with Downloaded Data#

CSV Files#

import pandas as pd

# Load CSV data
df = pd.read_csv("result-123-csv.csv")
print(f"Loaded {len(df)} rows")
print(df.head())

# Basic analysis
print(f"Columns: {list(df.columns)}")
print(f"Data types:\n{df.dtypes}")

JSON Lines Files#

import json
import pandas as pd

# Load JSON Lines data
data = []
with open("result-123-json.json", "r", encoding="utf-8") as f:
    for line in f:
        if line.strip():  # skip blank lines, which json.loads rejects
            data.append(json.loads(line))

df = pd.DataFrame(data)
print(f"Loaded {len(df)} rows from JSON")

Parquet Files#

import pandas as pd

# Load Parquet data (most efficient for large datasets)
df = pd.read_parquet("result-123-parquet.parquet")
print(f"Loaded {len(df)} rows from Parquet")

File Format Considerations#

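  • CSV: Human-readable plain text that opens in most tools; uncompressed, and column types are not preserved

  • JSON: Structured plain text with one JSON object per line; uncompressed, with field names repeated in every record

  • Parquet: Compressed columnar binary format that preserves column types; typically the smallest file size for large datasets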

For large datasets, Parquet format is recommended for optimal storage and loading performance.
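
Because each format needs a different loader, a script that handles several downloaded files may want to dispatch on the file extension. The sketch below shows one way to do that; `load_rows` is a hypothetical helper, and it assumes that `.json` result files contain JSON Lines, as in the example earlier on this page.

```python
import csv
import json
from pathlib import Path


def load_rows(path):
    """Load a downloaded result file into a list of dicts, chosen by extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # newline="" lets the csv module handle embedded line breaks itself.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    if suffix in {".json", ".jsonl"}:
        # One JSON object per line; blank lines are skipped.
        with path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".parquet":
        # Imported lazily so CSV and JSON loading work without pandas installed.
        import pandas as pd

        return pd.read_parquet(path).to_dict(orient="records")
    raise ValueError(f"Unsupported result file extension: {suffix!r}")
```

Note that the CSV branch returns every value as a string, while the JSON and Parquet branches keep the stored types.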