Person sampling in Data Designer lets you generate synthetic person data for your datasets. Data Designer provides two distinct approaches, each with different capabilities and use cases:
Faker-based sampling uses the Faker library to generate random personal information. The data is basic and not demographically accurate, but it is useful for quick testing and prototyping, or whenever realistic demographic distributions don't matter for your use case.
For more details, see the documentation for SamplerColumnConfig and PersonFromFakerSamplerParams.
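The practical difference between random generation and sampling from a curated dataset can be illustrated with the standard library alone. In this sketch, all names and values are hypothetical toy data, and it is an analogy rather than how Data Designer implements either sampler: the first function draws each field independently, so nothing ties age to occupation, while the second draws a whole record, so the output follows whatever distribution the table encodes.

```python
import random

# Hypothetical toy values for illustration only; these are NOT real demographic data.
CITIES = ["Springfield", "Riverton", "Lakeside"]
OCCUPATIONS = ["teacher", "nurse", "retired", "student"]
TOY_PERSONA_TABLE = [
    {"city": "Springfield", "age": 34, "occupation": "teacher"},
    {"city": "Springfield", "age": 67, "occupation": "retired"},
    {"city": "Riverton", "age": 29, "occupation": "nurse"},
]


def random_person(rng: random.Random) -> dict:
    """Draw each field independently and uniformly, like quick random test data."""
    return {
        "city": rng.choice(CITIES),
        "age": rng.randint(18, 90),
        "occupation": rng.choice(OCCUPATIONS),
    }


def person_from_table(rng: random.Random, table: list) -> dict:
    """Draw one whole record, so field combinations and frequencies come from the table."""
    return dict(rng.choice(table))


if __name__ == "__main__":
    rng = random.Random(42)
    # Fields can combine implausibly here, e.g. a "student" aged 85.
    print(random_person(rng))
    # Every field combination here exists in the source table.
    print(person_from_table(rng, TOY_PERSONA_TABLE))
```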
Nemotron-Personas sampling uses curated Nemotron-Personas datasets from NVIDIA GPU Cloud (NGC) to generate demographically accurate person data with rich personality profiles and behavioral characteristics.
The NGC datasets are extended versions of the open-source Nemotron-Personas datasets on Hugging Face, with additional fields and enhanced data quality.
Supported locales:
en_US: United States
en_IN: India (English)
en_SG: Singapore (English)
fr_FR: France (French)
hi_Deva_IN: India (Devanagari script)
hi_Latn_IN: India (Latin script)
ja_JP: Japan
ko_KR: South Korea (Korean)
pt_BR: Brazil (Portuguese)
To use the extended Nemotron-Personas datasets with Data Designer, you need to download them from NGC and move them to the Data Designer managed assets directory.
See below for step-by-step instructions.
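Because the download step takes locale codes, checking them against the supported list can catch typos before you start a download. The helper below is a hypothetical sketch, not part of Data Designer; it only encodes the locale table from this section.

```python
# Supported Nemotron-Personas locales, as listed in this section.
SUPPORTED_LOCALES = {
    "en_US": "United States",
    "en_IN": "India (English)",
    "en_SG": "Singapore (English)",
    "fr_FR": "France (French)",
    "hi_Deva_IN": "India (Devanagari script)",
    "hi_Latn_IN": "India (Latin script)",
    "ja_JP": "Japan",
    "ko_KR": "South Korea (Korean)",
    "pt_BR": "Brazil (Portuguese)",
}


def validate_locales(requested):
    """Hypothetical helper: reject unknown locale codes, drop duplicates, keep order."""
    unknown = sorted(set(requested) - set(SUPPORTED_LOCALES))
    if unknown:
        raise ValueError(f"Unsupported locale(s): {', '.join(unknown)}")
    # dict.fromkeys preserves first-seen order while removing duplicates.
    return list(dict.fromkeys(requested))


if __name__ == "__main__":
    print(validate_locales(["ja_JP", "en_US", "ja_JP"]))
```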
To download the Nemotron-Personas datasets from NGC, you will need to obtain an NGC API key and install the NGC CLI.
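A quick prerequisite check can be done from Python, sketched below under two assumptions: that the NGC CLI executable is named `ngc`, and that the API key is supplied through the `NGC_CLI_API_KEY` environment variable. If you stored the key through the NGC CLI's own configuration instead, the environment check reports False even though the CLI may be fully set up. The function name is hypothetical.

```python
import os
import shutil


def check_ngc_prerequisites(env=None):
    """Hypothetical helper: report whether the NGC CLI and an API key env var are present."""
    env = os.environ if env is None else env
    return {
        # Assumption: the NGC CLI is installed as an executable named "ngc" on PATH.
        "ngc_cli_on_path": shutil.which("ngc") is not None,
        # Assumption: the key is provided via NGC_CLI_API_KEY rather than a config file.
        "api_key_in_env": bool(env.get("NGC_CLI_API_KEY")),
    }


if __name__ == "__main__":
    print(check_ngc_prerequisites())
```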
Once you have the NGC CLI and your NGC API key set up, you can download the datasets via the Data Designer CLI.
You can pass the locales you want to download as arguments to the CLI command:
Or you can use the interactive mode to select the locales you want to download:
Alternatively, you can download the datasets manually with the NGC CLI:
Then move the downloaded dataset to the Data Designer managed assets directory:
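If you would rather script the move, the sketch below relocates a downloaded file or directory into a target directory using only the standard library. The managed assets directory path is not stated in this section, so it is passed in as an argument; the helper name and any dataset directory name you pass are hypothetical.

```python
import shutil
from pathlib import Path


def move_to_assets_dir(downloaded, assets_dir):
    """Hypothetical helper: move a downloaded file or directory into assets_dir."""
    src = Path(downloaded)
    if not src.exists():
        raise FileNotFoundError(src)
    dest_dir = Path(assets_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # Refuse to overwrite an existing copy rather than merging silently.
    if dest.exists():
        raise FileExistsError(dest)
    return Path(shutil.move(str(src), str(dest)))
```

Refusing to overwrite is a deliberate choice in this sketch: re-downloading a locale should not silently mix old and new files.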
For more details, see the documentation for SamplerColumnConfig and PersonSamplerParams.
Core Fields (all locales):
France-Specific Fields (fr_FR):
household_type - Household composition (e.g., single person, couple with/without children)
monthly_income_eur - Estimated monthly income in euros
first_name_heritage - Cultural origin of the first name
name_heritage - Cultural, linguistic, or geographic origin of the surname
is_first_gen_immigrant - Whether the individual is a first-generation immigrant to France
Japan-Specific Fields (ja_JP):
area
Korea-Specific Fields (ko_KR):
economic_activity_status - Employment / economic activity status
family_type - Household / family composition type
housing_type - Dwelling type (apartment, detached home, etc.)
housing_tenure - Owned vs rented, etc.
income_bracket - Income range
military_status - Military service status
drinking_status - Drinking frequency / status
smoking_status - Smoking frequency / status
blood_pressure_status - Blood pressure health indicator
blood_sugar_status - Blood sugar health indicator
bmi_status - BMI health indicator
waist_status - Waist-circumference health indicator
Brazil-Specific Fields (pt_BR):
race - Census-reported race
Singapore-Specific Fields (en_SG):
industry - Industry of employment
preferred_english_name - Preferred English-form name
English Locales Shared Fields (en_US, en_SG):
ethnic_background - Self-identified ethnic background
Religion Fields (en_IN, hi_Deva_IN, hi_Latn_IN, en_SG, pt_BR):
religion - Census-reported religion
India Locales Fields (en_IN, hi_Deva_IN, hi_Latn_IN):
education_degree - Census-reported education degree
first_language - Native language
second_language - Second language (if applicable)
third_language - Third language (if applicable)
zone - Urban vs rural
With Synthetic Personas Enabled:
Japan-specific persona fields (ja_JP):
aspects
digital_skill
Korea-specific persona fields (ko_KR):
family_persona
Religious persona fields (en_IN, hi_Deva_IN, hi_Latn_IN, en_SG, pt_BR):
religious_persona
religious_background
India-locales persona fields (en_IN, hi_Deva_IN, hi_Latn_IN):
linguistic_persona
linguistic_background
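Because several field groups apply to more than one locale, it can help to compute which extra fields a given locale produces. The lookup below is a hypothetical sketch built only from the field lists in this section; it excludes the core fields shared by all locales, and the `with_synthetic_personas` flag simply mirrors the "With Synthetic Personas Enabled" grouping rather than any confirmed library parameter.

```python
# Locale groups shared by several field lists in this section.
RELIGION_LOCALES = {"en_IN", "hi_Deva_IN", "hi_Latn_IN", "en_SG", "pt_BR"}
INDIA_LOCALES = {"en_IN", "hi_Deva_IN", "hi_Latn_IN"}

# (locales, fields) pairs for locale-specific fields; core fields are not included.
LOCALE_FIELD_GROUPS = [
    ({"fr_FR"}, ["household_type", "monthly_income_eur", "first_name_heritage",
                 "name_heritage", "is_first_gen_immigrant"]),
    ({"ja_JP"}, ["area"]),
    ({"ko_KR"}, ["economic_activity_status", "family_type", "housing_type",
                 "housing_tenure", "income_bracket", "military_status",
                 "drinking_status", "smoking_status", "blood_pressure_status",
                 "blood_sugar_status", "bmi_status", "waist_status"]),
    ({"pt_BR"}, ["race"]),
    ({"en_SG"}, ["industry", "preferred_english_name"]),
    ({"en_US", "en_SG"}, ["ethnic_background"]),
    (RELIGION_LOCALES, ["religion"]),
    (INDIA_LOCALES, ["education_degree", "first_language", "second_language",
                     "third_language", "zone"]),
]

# Locale-specific persona fields, only present when synthetic personas are enabled.
PERSONA_FIELD_GROUPS = [
    ({"ja_JP"}, ["aspects", "digital_skill"]),
    ({"ko_KR"}, ["family_persona"]),
    (RELIGION_LOCALES, ["religious_persona", "religious_background"]),
    (INDIA_LOCALES, ["linguistic_persona", "linguistic_background"]),
]


def locale_specific_fields(locale, with_synthetic_personas=False):
    """Hypothetical helper: list the extra fields a locale adds beyond the core fields."""
    groups = LOCALE_FIELD_GROUPS + (PERSONA_FIELD_GROUPS if with_synthetic_personas else [])
    fields = []
    for locales, names in groups:
        if locale in locales:
            fields.extend(names)
    return fields


if __name__ == "__main__":
    print(locale_specific_fields("en_SG"))
    print(locale_specific_fields("pt_BR", with_synthetic_personas=True))
```

Keeping the shared groups as named sets means a field list that applies to several locales is written once, which mirrors how this section groups them.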