Download Job Results#
Download the complete results from a completed data generation job as files.
Prerequisites#
Before you can download results from a data generation job, make sure that you have:
- Obtained the base URL of your NeMo Data Designer service
- Set the DATA_DESIGNER_BASE_URL environment variable to your NeMo Data Designer service endpoint
- A completed data generation job
export DATA_DESIGNER_BASE_URL="https://your-data-designer-service-url"
To Download Results from a Data Generation Job#
Choose one of the following options to download complete results from a data generation job: use the Python SDK, or call the REST API directly with curl.
import os

from nemo_microservices import NeMoMicroservices

# Initialize the client
client = NeMoMicroservices(
    base_url=os.environ['DATA_DESIGNER_BASE_URL']
)

# First, get the list of available results
job_id = "job-abc123def456"
results = client.beta.data_designer.jobs.results.list(job_id)

# Download each result
for result in results:
    print(f"Downloading result: {result.id}")

    # Download the result data
    downloaded_data = client.beta.data_designer.jobs.results.download(
        result.id,
        job_id=job_id
    )

    # Save to file
    filename = f"{result.id}.{result.format}"
    with open(filename, 'wb') as f:
        f.write(downloaded_data)

    print(f"Saved as: {filename}")
JOB_ID="job-abc123def456"
RESULT_ID="result-123-csv"

# Download specific result
curl -X GET \
  "${DATA_DESIGNER_BASE_URL}/v1beta1/data-designer/jobs/${JOB_ID}/results/${RESULT_ID}/download" \
  -H 'Accept: application/octet-stream' \
  -o "synthetic_data.csv"

echo "Downloaded result to synthetic_data.csv"
Download Workflow#
1. List Results: First, get the available results for the job.
2. Select Result: Choose the result you want to download based on its format and canonical status.
3. Download: Use the result ID and job ID to download the specific result (see the sketch after this list).
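The following snippet sketches that workflow end to end, selecting the first Parquet result. The "parquet" format value and the idea of filtering on a canonical flag are assumptions; check your SDK's result model for the exact attribute names and values.

import os

from nemo_microservices import NeMoMicroservices

client = NeMoMicroservices(base_url=os.environ['DATA_DESIGNER_BASE_URL'])
job_id = "job-abc123def456"

# 1. List the available results for the job
results = list(client.beta.data_designer.jobs.results.list(job_id))

# 2. Select a result, here the first Parquet result
#    (filtering on a canonical flag works the same way if your SDK's
#     result model exposes one; the attribute name is an assumption)
selected = next(
    (r for r in results if r.format == "parquet"),
    results[0] if results else None,
)

# 3. Download the selected result and save it to disk
if selected is not None:
    data = client.beta.data_designer.jobs.results.download(selected.id, job_id=job_id)
    with open(f"{selected.id}.{selected.format}", "wb") as f:
        f.write(data)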
Working with Downloaded Data#
CSV Files#
import pandas as pd
# Load CSV data
df = pd.read_csv("result-123-csv.csv")
print(f"Loaded {len(df)} rows")
print(df.head())
# Basic analysis
print(f"Columns: {list(df.columns)}")
print(f"Data types:\n{df.dtypes}")
JSON Lines Files#
import json
import pandas as pd
# Load JSON Lines data
data = []
with open("result-123-json.json", "r") as f:
for line in f:
data.append(json.loads(line))
df = pd.DataFrame(data)
print(f"Loaded {len(df)} rows from JSON")
Parquet Files#
import pandas as pd
# Load Parquet data (most efficient for large datasets)
df = pd.read_parquet("result-123-parquet.parquet")
print(f"Loaded {len(df)} rows from Parquet")
File Format Considerations#
- CSV: Human-readable, largest file size
- JSON: Structured format, medium file size
- Parquet: Compressed columnar format, smallest file size for large datasets
For large datasets, the Parquet format is recommended for optimal storage and loading performance.
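If a job only produced a CSV result, you can apply this recommendation locally by converting the downloaded file to Parquet and comparing sizes. This is a local post-processing step, not a Data Designer API call, and writing Parquet requires a Parquet engine such as pyarrow.

import os

import pandas as pd

# Convert a downloaded CSV result to Parquet for more compact storage
df = pd.read_csv("result-123-csv.csv")
df.to_parquet("result-123-csv.parquet", index=False)

# Compare on-disk sizes of the two formats
csv_size = os.path.getsize("result-123-csv.csv")
parquet_size = os.path.getsize("result-123-csv.parquet")
print(f"CSV: {csv_size} bytes, Parquet: {parquet_size} bytes")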