Generate Realistic Persons#
Data Designer’s sampler column type can be used to sample realistic person data and synthetic personas. Generated using Data Designer itself, as well as a Probabilistic Graphical Model trained on census data, the sampled datasets are grounded in real-world demographic, geographic and personality trait distributions to capture the diversity and richness of the population.
Person Objects in Data Designer#
Person samplers generate person entities with configurable attributes. Each sampler creates a different person object that you can reference throughout your data design. There are two types of person samplers: Person and PersonFromFaker.
The Person sampler generates the best quality person data by sampling from the Nemotron-Personas collection. Grounded in real-world demographic data, this sampler type is supported for the following locales: “en_US”, “ja_JP”, “hi_IN”, and “en_IN”. Person samplers can optionally include synthetic persona data by setting with_synthetic_personas=True. Persona generation adapts to cultural context based on the specified locale and demographic information.
For other locales not supported by the Person sampler, the PersonFromFaker sampler uses the Faker library to generate person data (synthetic personas are not supported). While Faker provides basic attributes like names and addresses, it doesn’t maintain the same demographic accuracy or attribute relationships as the Nemotron-Personas datasets.
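The choice between the two samplers can be sketched as a small lookup. This is a minimal illustration, not part of Data Designer: `choose_sampler_type` and `NEMOTRON_PERSONAS_LOCALES` are hypothetical names, and the locale set simply copies the four locales listed above.

```python
# Locales the text above lists as supported by the Person sampler.
# The constant and the helper are hypothetical, for illustration only.
NEMOTRON_PERSONAS_LOCALES = frozenset({"en_US", "ja_JP", "hi_IN", "en_IN"})


def choose_sampler_type(locale: str) -> str:
    """Return the sampler_type string to use for a given locale."""
    if locale in NEMOTRON_PERSONAS_LOCALES:
        return "person"
    # Any other locale falls back to the Faker-based sampler.
    return "person_from_faker"


for locale in ("en_US", "fr_FR"):
    print(locale, "->", choose_sampler_type(locale))
```

If the supported locales change, only the set needs updating.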
Configuration Options#
Person and PersonFromFaker samplers accept these configuration parameters:
sex: Specify “Male” or “Female” (optional)
locale: Language and region code (optional, e.g., “en_US”, “ja_JP”, “hi_IN”, “en_IN”, “fr_FR”, “de_DE”). Person samplers only accept “en_US”, “ja_JP”, “hi_IN”, and “en_IN”
city: Filter on cities within the specified locale (optional)
age_range: Age range for filtering (default: ages above 18 only)
Person samplers additionally support:
with_synthetic_personas (default: False): When set to True, samples detailed personality profiles, cultural backgrounds, skills, interests, and context-specific personas for comprehensive character modeling. The personas are sampled from NVIDIA’s Nemotron-Personas Collection, which currently includes Nemotron-Personas-USA, Nemotron-Personas-India and Nemotron-Personas-Japan.
# Person sampler (en_US) with synthetic personas enabled
builder.add_column(
    name="customer",
    column_type="sampler",
    sampler_type="person",
    params={
        "locale": "en_US",
        "sex": "Male",
        "with_synthetic_personas": True,
    },
)

# Person sampler (ja_JP) with demographic fields only
builder.add_column(
    name="employee",
    column_type="sampler",
    sampler_type="person",
    params={
        "locale": "ja_JP",
        "sex": "Female",
        "with_synthetic_personas": False,
    },
)

# Faker-based sampler for a locale the Person sampler does not support
builder.add_column(
    name="random_person",
    column_type="sampler",
    sampler_type="person_from_faker",
    params={
        "locale": "fr_FR",
    },
)
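Before passing a `params` dictionary to `add_column`, the constraints described above can be checked locally. The following is a sketch under stated assumptions: `check_person_params` is a hypothetical helper, not a Data Designer function, and treating `with_synthetic_personas=True` on the Faker sampler as an error is this sketch's own choice (the documentation only says synthetic personas are not supported there).

```python
# Values copied from the configuration options listed above.
PERSON_LOCALES = frozenset({"en_US", "ja_JP", "hi_IN", "en_IN"})
ALLOWED_SEX = frozenset({"Male", "Female"})


def check_person_params(sampler_type: str, params: dict) -> list[str]:
    """Return human-readable problems found in a params dict (hypothetical helper)."""
    problems = []

    sex = params.get("sex")
    if sex is not None and sex not in ALLOWED_SEX:
        problems.append(f"sex must be 'Male' or 'Female', got {sex!r}")

    locale = params.get("locale")
    if sampler_type == "person" and locale is not None and locale not in PERSON_LOCALES:
        problems.append(f"locale {locale!r} is not supported by the Person sampler")

    # Assumption of this sketch: flag the option instead of silently ignoring it.
    if sampler_type == "person_from_faker" and params.get("with_synthetic_personas"):
        problems.append("synthetic personas are not supported by PersonFromFaker")

    return problems


print(check_person_params("person", {"locale": "fr_FR", "sex": "Male"}))
```

An empty list means none of the checked constraints were violated; it does not guarantee that the library accepts the configuration.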
Person Data Structure#
Core Demographic Fields (Always Available)#
| Field Name | Type | Description |
|---|---|---|
| uuid | str | Unique identifier |
| first_name | str | Person’s first name |
| last_name | str | Person’s last name |
| sex | categorical | Person’s sex (Male or Female) |
| age | int | Person’s age |
| country | str | Country name |
| marital_status | categorical | None |
| education_level | categorical | None |
| bachelors_field | categorical | None |
| occupation | str | None |
| birth_date | date | Calculated birth date based on age |
| email_address | str | Generated email address (None for age < 18) |
| locale | str | Locale |
US-Specific Fields#
| Field Name | Type | Description |
|---|---|---|
| unit | str | Unit/apartment number |
| street_number | int | Street number (numeric) |
| street_name | str | Name of the street |
| city | str | City name |
| zipcode | str | Zipcode/Postal Code |
| state | str | State |
| county | str | County |
| bachelors_field | categorical | Field of bachelor’s degree |
| phone_number | str | Generated phone number based on zipcode (None for age < 18) |
| ssn | str | Social Security Number |
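As an illustration of how these address fields fit together, the sketch below assembles a one-line mailing address from a sampled record represented as a plain dictionary. The helper names and all sample values are made up for this example, and it assumes `unit` may be empty or missing for some records.

```python
def format_us_address(person: dict) -> str:
    """Join the US address fields into a single line (hypothetical helper)."""
    street = f"{person['street_number']} {person['street_name']}"
    unit = person.get("unit")
    if unit:  # assumption: unit can be empty or absent
        street = f"{street}, {unit}"
    return f"{street}, {person['city']}, {person['state']} {person['zipcode']}"


def contact_phone(person: dict, placeholder: str = "(no phone on record)") -> str:
    """phone_number is None for age < 18, so fall back to a placeholder."""
    return person.get("phone_number") or placeholder


# Made-up record for illustration; real records come from the sampler.
sample = {
    "unit": "Apt 4B",
    "street_number": 123,
    "street_name": "Main St",
    "city": "Springfield",
    "state": "IL",
    "zipcode": "62701",
    "phone_number": None,
}
print(format_us_address(sample))
print(contact_phone(sample))
```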
Japan-Specific Fields#
| Field Name | Type | Description |
|---|---|---|
| area | str | Region of Japan |
India-Specific Fields#
| Field Name | Type | Description |
|---|---|---|
| zone | str | Level of urban development at address (Rural or Urban) |
| education_degree | str | Education level and post-secondary degree, if applicable |
| first_language | str | Person’s native language |
| second_language | str | Person’s second language, if applicable |
| third_language | str | Person’s third language, if applicable |
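Because the second and third languages are only present "if applicable", code that consumes these fields should tolerate missing values. A minimal sketch, assuming a record is a plain dictionary and that an inapplicable language is represented as `None` or an empty string (the exact representation is an assumption here):

```python
LANGUAGE_FIELDS = ("first_language", "second_language", "third_language")


def spoken_languages(person: dict) -> list[str]:
    """Collect the languages that are present, in field order (hypothetical helper)."""
    return [person[key] for key in LANGUAGE_FIELDS if person.get(key)]


# Made-up record for illustration.
sample = {"first_language": "Hindi", "second_language": "English", "third_language": None}
print(spoken_languages(sample))
```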
Personality Traits (Available when with_synthetic_personas=True)#
Big Five personality model with t-scores and interpretive labels:
| Field Name | Type | Description |
|---|---|---|
| openness | dict | Openness to experience (t_score, label, description) |
| conscientiousness | dict | Conscientiousness (t_score, label, description) |
| extraversion | dict | Extraversion (t_score, label, description) |
| agreeableness | dict | Agreeableness (t_score, label, description) |
| neuroticism | dict | Neuroticism (t_score, label, description) |
Each personality trait contains:
t_score: Standardized score (typically 0-100)
label: Interpretive label (“low”, “average”, “high”, “very high”)
description: Detailed behavioral description
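To make the trait structure concrete, the sketch below filters a record for traits carrying a given interpretive label. The helper name and the sample scores are invented for illustration; only the three keys (`t_score`, `label`, `description`) and the label strings come from the description above.

```python
BIG_FIVE = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")


def traits_with_label(person: dict, labels=("high", "very high")) -> list[str]:
    """Return the Big Five trait names whose label is in `labels` (hypothetical helper)."""
    return [
        name
        for name in BIG_FIVE
        if (person.get(name) or {}).get("label") in labels
    ]


# Made-up trait values for illustration only.
sample = {
    "openness": {"t_score": 62, "label": "high", "description": "Curious and imaginative."},
    "conscientiousness": {"t_score": 48, "label": "average", "description": "Reasonably organized."},
    "neuroticism": {"t_score": 71, "label": "very high", "description": "Prone to worry."},
}
print(traits_with_label(sample))
```

Filtering on `label` rather than on `t_score` avoids guessing the score thresholds behind each label, which are not stated here.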
Synthetic Persona Fields (Available when with_synthetic_personas=True)#
Background and Development#
| Field Name | Type | Description |
|---|---|---|
| cultural_background | str | Detailed narrative about cultural influences and upbringing |
| skills_and_expertise | str | Comprehensive description of professional and personal capabilities |
| skills_and_expertise_list | str | List format of key skills and competencies |
| hobbies_and_interests | str | Detailed description of personal interests and activities |
| hobbies_and_interests_list | str | List format of hobbies and interests |
| career_goals_and_ambitions | str | Professional aspirations and long-term objectives |
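The two `*_list` fields are typed as `str`, so they need parsing before they can be used as Python lists. The exact serialization is not specified here; the sketch below assumes the string is either a Python-style list literal (for example `"['a', 'b']"`) or a comma-separated string, and `parse_list_field` is a hypothetical helper.

```python
import ast


def parse_list_field(raw: str) -> list[str]:
    """Turn a list-formatted string into a list of strings (hypothetical helper)."""
    text = raw.strip()
    try:
        # Safe evaluation of literals only; no arbitrary code is executed.
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        value = None
    if isinstance(value, (list, tuple)):
        return [str(item).strip() for item in value]
    # Fallback assumption: a plain comma-separated string.
    return [part.strip() for part in text.split(",") if part.strip()]


print(parse_list_field("['budgeting', 'public speaking']"))
print(parse_list_field("hiking, chess"))
```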
Persona Profile Fields#
| Field Name | Type | Description |
|---|---|---|
| persona | str | Brief summary personality profile |
| detailed_persona | str | Comprehensive personality and behavioral description |
| professional_persona | str | Work environment personality and career approach |
| finance_persona | str | Financial decision-making style and money management approach |
| healthcare_persona | str | Health and wellness attitudes and behaviors |
| sports_persona | str | Sports interests and physical activity preferences |
| arts_persona | str | Artistic tastes, cultural interests, and creative preferences |
| travel_persona | str | Travel style, preferences, and exploration approach |
| culinary_persona | str | Food interests, cooking style, and dining preferences |
Japan-Specific Persona Fields#
| Field Name | Type | Description |
|---|---|---|
| aspects | str | Cultural, generational, social and communication considerations |
| digital_skills | str | Digital skill levels informed by population surveys |
digital_skills |
str |
Digital skill levels informed by population surveys |
India-Specific Persona Fields#
| Field Name | Type | Description |
|---|---|---|
| linguistic_background | str | Description of written and spoken language proficiency |
| religious_background | str | Description of religious background and beliefs |
| linguistic_persona | str | Linguistic background and language proficiency |
| religious_persona | str | Religious background, beliefs, and practices |
Best Practices#
Choosing Configuration Options#
Use locales that are backed by a Nemotron-Personas dataset for maximum demographic accuracy and realism
Enable with_synthetic_personas=True when you need rich character development, personalized content generation, or comprehensive behavioral modeling
Disable synthetic personas for basic demographic testing or when computational efficiency is prioritized
Effective Persona Usage#
Match persona depth to use case: Use basic personas for simple applications, detailed personas for comprehensive character modeling
Leverage context-specific personas: Use professional_persona for workplace scenarios, culinary_persona for food-related applications
Combine multiple persona fields in prompts for richer, more nuanced content generation
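Combining several persona fields in one prompt can be sketched with plain Python string formatting. This example does not use Data Designer's own prompt templating, which is not shown here; `build_review_prompt` and the sample record are hypothetical.

```python
def build_review_prompt(person: dict) -> str:
    """Compose a prompt from demographic and persona fields (hypothetical helper)."""
    return (
        f"Write a restaurant review in the voice of {person['first_name']}, "
        f"age {person['age']}.\n"
        f"Professional persona: {person['professional_persona']}\n"
        f"Culinary persona: {person['culinary_persona']}"
    )


# Made-up record for illustration; real values come from a Person sampler
# configured with with_synthetic_personas=True.
sample = {
    "first_name": "Alex",
    "age": 34,
    "professional_persona": "A methodical project manager who values clear plans.",
    "culinary_persona": "An adventurous home cook who seeks out regional dishes.",
}
print(build_review_prompt(sample))
```

Picking only the context-specific personas relevant to the task keeps the prompt short while still grounding the output in the sampled person.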
Performance Considerations#
Synthetic personas add processing time: Only enable when the additional data provides value
Cache person objects when using the same personas across multiple columns
Consider batch generation for large datasets requiring consistent persona quality
Quality Assurance#
Validate persona consistency: Ensure generated content aligns with personality traits and demographic information
Test across different locales to understand quality variations
Review persona coherence when using multiple context-specific personas for the same individual