Generate Realistic Persons#
Data Designer generates realistic person data, optionally enriched with synthetic personas. Each generated individual has a complete demographic profile and, when synthetic personas are enabled, personality traits and context-specific personas. The output is suitable for testing, research, and application development.
Person Objects in Data Designer#
Creating Person Samplers#
Person samplers generate realistic person entities with configurable attributes and optional synthetic persona data. Create them using the with_person_samplers method:
builder.with_person_samplers(
    {
        "customer": {"sex": "Female", "locale": "en_US", "with_synthetic_personas": True},
        "employee": {"sex": "Female", "locale": "en_GB", "with_synthetic_personas": False},
        "random_person": {},  # Default settings
    },
    keep_person_columns=True,  # False by default
)
Each sampler creates a different person object that you can reference throughout your data design.
Configuration Options#
Person samplers accept these configuration parameters:
Basic Configuration:
sex: Specify “Male” or “Female” (optional)
locale: Language and region code (optional, e.g., “en_US”, “fr_FR”, “de_DE”)
city: City within the specified locale (optional)
age_range: Age range for filtering (default: ages above 18 only)
state: US state code, only valid when locale is set to “en_US” (optional)
keep_person_columns (default: False): When set to False, all person columns will be dropped from the final dataset. As in the example above, this argument is passed to with_person_samplers itself rather than to an individual sampler.
Synthetic Personas Configuration:
with_synthetic_personas (default: False): When set to True, generates detailed personality profiles, cultural backgrounds, skills, interests, and context-specific personas for comprehensive character modeling.
Filtering Notes:
When using US locale (“en_US”), you can filter on age range, sex, city, and state
For non-US locales, filtering is limited to age range, sex, and city only
You can choose either city or state when filtering, not both
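The filtering rules above can be checked before a configuration is handed to the builder. The following is a minimal sketch only: validate_person_sampler is a hypothetical helper, not part of Data Designer, and it treats a missing locale as "not en_US" because the default locale is not stated here.

```python
def validate_person_sampler(config: dict) -> list[str]:
    """Return a list of problems found in one person sampler config.

    Hypothetical pre-flight check that mirrors the filtering notes;
    Data Designer may apply its own validation with different rules.
    """
    problems = []

    sex = config.get("sex")
    if sex is not None and sex not in ("Male", "Female"):
        problems.append(f"sex must be 'Male' or 'Female', got {sex!r}")

    # Assumption: state filtering needs an explicit en_US locale.
    if "state" in config and config.get("locale") != "en_US":
        problems.append("state is only valid when locale is 'en_US'")

    if "city" in config and "state" in config:
        problems.append("choose either city or state, not both")

    return problems


samplers = {
    "customer": {"sex": "Female", "locale": "en_US", "state": "CA"},
    "visitor": {"locale": "fr_FR", "state": "CA"},
    "neighbor": {"locale": "en_US", "city": "Austin", "state": "TX"},
}

for name, config in samplers.items():
    print(name, validate_person_sampler(config))
```

Running the check on each entry before calling with_person_samplers surfaces conflicting filters early instead of at generation time.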
Locale Support and Data Quality#
US Locale (en_US): Data Designer uses Gretel’s proprietary Probabilistic Generative Model (PGM) trained on US census demographic data. This provides high-quality, realistic, and demographically accurate person data with preserved attribute relationships, resulting in coherent person profiles.
Other Locales: Data Designer uses the Faker library as a fallback. While Faker provides basic attributes like names and addresses, it doesn’t maintain the same demographic accuracy or attribute relationships as the PGM. Data quality is notably lower than for US-based personas.
Synthetic Personas: Available for all locales when with_synthetic_personas=True. Persona generation adapts to cultural context based on the specified locale and demographic information.
Person Data Structure#
Core Demographic Fields (Always Available)#
Field Name | Type | Description |
---|---|---|
first_name | str | Person’s first name |
middle_name | str or None | |
last_name | str | Person’s last name |
sex | Sex | Person’s sex (enum type) |
age | int | Person’s age |
zipcode | str | Zipcode/Postal Code |
street_number | int or str | |
street_name | str | Name of the street |
unit | str | Unit/apartment number (US locale only) |
city | str | City name |
state | str or None | |
county | str or None | |
country | str | Country name |
ethnic_background | str or None | |
marital_status | str or None | |
education_level | str or None | |
bachelors_field | str or None | |
occupation | str or None | |
uuid | str or None | |
locale | str | Locale setting |
phone_number | str or None | |
email_address | str or None | |
birth_date | date | Calculated birth date based on age |
ssn | str or None | |
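The table describes birth_date as calculated from age but does not say how. One plausible approach, shown here purely as an illustration and not as Data Designer's actual method, is to pick a random date that makes the person exactly that many years old today. The helper names (years_before, age_on, birth_date_for_age) are hypothetical.

```python
import random
from datetime import date, timedelta


def years_before(day: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def age_on(birth_date: date, today: date) -> int:
    """Completed years between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def birth_date_for_age(age: int, today: date, rng: random.Random) -> date:
    """Pick a uniformly random birth date that yields exactly `age` on `today`."""
    latest = years_before(today, age)  # birthday is today
    earliest = years_before(today, age + 1) + timedelta(days=1)  # birthday is tomorrow
    offset = rng.randint(0, (latest - earliest).days)
    return earliest + timedelta(days=offset)


today = date(2025, 6, 15)
birth_date = birth_date_for_age(34, today, random.Random(0))
print(birth_date, age_on(birth_date, today))
```

Whatever the real derivation is, the same age_on check is a quick way to confirm that age and birth_date agree in a generated record.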
Personality Traits (Available when with_synthetic_personas=True)#
Big Five personality model with t-scores and interpretive labels:
Field Name | Type | Description |
---|---|---|
openness | dict | Openness to experience (t_score, label, description) |
conscientiousness | dict | Conscientiousness (t_score, label, description) |
extraversion | dict | Extraversion (t_score, label, description) |
agreeableness | dict | Agreeableness (t_score, label, description) |
neuroticism | dict | Neuroticism (t_score, label, description) |
Each personality trait contains:
t_score: Standardized score (typically 0-100)
label: Interpretive label (“low”, “average”, “high”, “very high”)
description: Detailed behavioral description
Synthetic Persona Fields (Available when with_synthetic_personas=True)#
Background and Development#
Field Name | Type | Description |
---|---|---|
cultural_background | str | Detailed narrative about cultural influences and upbringing |
skills_and_expertise | str | Comprehensive description of professional and personal capabilities |
skills_and_expertise_list | str | List format of key skills and competencies |
hobbies_and_interests | str | Detailed description of personal interests and activities |
hobbies_and_interests_list | str | List format of hobbies and interests |
career_goals_and_ambitions | str | Professional aspirations and long-term objectives |
Persona Profiles#
Field Name | Type | Description |
---|---|---|
persona | str | Brief summary personality profile |
detailed_persona | str | Comprehensive personality and behavioral description |
professional_persona | str | Work environment personality and career approach |
finance_persona | str | Financial decision-making style and money management approach |
healthcare_persona | str | Health and wellness attitudes and behaviors |
sports_persona | str | Sports interests and physical activity preferences |
arts_persona | str | Artistic tastes, cultural interests, and creative preferences |
travel_persona | str | Travel style, preferences, and exploration approach |
culinary_persona | str | Food interests, cooking style, and dining preferences |
Usage Examples#
Basic Person Generation#
# Generate a person with demographic data only
# (personality traits require with_synthetic_personas=True)
builder.with_person_samplers({
    "basic_customer": {"locale": "en_US", "sex": "Female"}
})
Enhanced Person Generation with Synthetic Personas#
# Generate a comprehensive person with full persona profiles
builder.with_person_samplers({
    "detailed_customer": {
        "locale": "en_US",
        "sex": "Female",
        "with_synthetic_personas": True
    }
})
Using Person Data in Columns#
Extracting Basic Attributes#
builder.add_column(
    name="customer_name",
    type="expression",
    expr="{{customer.first_name}} {{customer.last_name}}"
)
builder.add_column(
    name="customer_contact",
    type="expression",
    expr="{{customer.email_address}}"
)
Using Personality Traits#
builder.add_column(
    name="personality_summary",
    type="expression",
    expr="Extraversion: {{customer.extraversion.label}}, Openness: {{customer.openness.label}}"
)
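To make the dotted references concrete, the sketch below shows how a path such as customer.extraversion.label walks into the nested trait dict. The render function is a stand-in written for this illustration, not Data Designer's template engine, and the sample record values are invented.

```python
import re

# Matches {{ dotted.path }} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def resolve(path: str, context: dict):
    """Follow a dotted path through nested dicts."""
    value = context
    for part in path.split("."):
        value = value[part]
    return value


def render(template: str, context: dict) -> str:
    """Replace each placeholder with the value found at its path."""
    return PLACEHOLDER.sub(lambda m: str(resolve(m.group(1), context)), template)


# Invented sample record using the trait structure described earlier.
customer = {
    "first_name": "Ada",
    "extraversion": {"t_score": 62, "label": "high", "description": "sample text"},
    "openness": {"t_score": 48, "label": "average", "description": "sample text"},
}

summary = render(
    "Extraversion: {{customer.extraversion.label}}, Openness: {{customer.openness.label}}",
    {"customer": customer},
)
print(summary)  # Extraversion: high, Openness: average
```

Because each trait is a dict, referencing customer.extraversion alone would yield the whole dict; adding .label, .t_score, or .description selects a single value.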
Leveraging Synthetic Personas in Prompts#
builder.add_column(
    name="personalized_marketing_message",
    prompt="""
Create a personalized marketing message for this customer:
Name: {{customer.first_name}} {{customer.last_name}}
Professional Background: {{customer.professional_persona}}
Interests: {{customer.hobbies_and_interests_list}}
Financial Style: {{customer.finance_persona}}
Tailor the message to their personality and interests while promoting our financial planning services.
"""
)
builder.add_column(
    name="product_recommendations",
    prompt="""
Based on this customer profile, recommend 3 relevant products:
Customer: {{customer.first_name}} {{customer.last_name}}
Age: {{customer.age}}
Occupation: {{customer.occupation}}
Personality: {{customer.detailed_persona}}
Interests: {{customer.hobbies_and_interests}}
Focus on products that align with their personality traits and lifestyle.
"""
)
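Prompts like these reference persona fields that exist only when the sampler sets with_synthetic_personas=True. A small lint step can catch a mismatch before generation. This is a sketch under assumptions: missing_persona_fields is a hypothetical helper, and the field set is copied from the tables in this document rather than read from the library.

```python
import re

# Fields listed in this document as requiring with_synthetic_personas=True.
PERSONA_FIELDS = {
    "openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism",
    "cultural_background", "skills_and_expertise", "skills_and_expertise_list",
    "hobbies_and_interests", "hobbies_and_interests_list", "career_goals_and_ambitions",
    "persona", "detailed_persona", "professional_persona", "finance_persona",
    "healthcare_persona", "sports_persona", "arts_persona", "travel_persona",
    "culinary_persona",
}

# Captures the sampler name and the first attribute of each placeholder.
REFERENCE = re.compile(r"\{\{\s*(\w+)\.(\w+)")


def missing_persona_fields(prompt: str, sampler_name: str, sampler_config: dict) -> list[str]:
    """List persona fields the prompt uses that the sampler will not produce."""
    if sampler_config.get("with_synthetic_personas", False):
        return []
    used = {
        field
        for name, field in REFERENCE.findall(prompt)
        if name == sampler_name and field in PERSONA_FIELDS
    }
    return sorted(used)


prompt = """
Name: {{customer.first_name}} {{customer.last_name}}
Professional Background: {{customer.professional_persona}}
Financial Style: {{customer.finance_persona}}
"""

print(missing_persona_fields(prompt, "customer", {"locale": "en_US"}))
# ['finance_persona', 'professional_persona']
```

An empty result means every persona field the prompt references will be available for that sampler.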
Best Practices#
Choosing Configuration Options#
Use en_US locale for maximum demographic accuracy and realism
Enable with_synthetic_personas=True when you need rich character development, personalized content generation, or comprehensive behavioral modeling
Disable synthetic personas for basic demographic testing or when computational efficiency is prioritized
Effective Persona Usage#
Match persona depth to use case: Use basic personas for simple applications, detailed personas for comprehensive character modeling
Leverage context-specific personas: Use professional_persona for workplace scenarios, culinary_persona for food-related applications
Combine multiple persona fields in prompts for richer, more nuanced content generation
Performance Considerations#
Synthetic personas add processing time: Only enable when the additional data provides value
Cache person objects when using the same personas across multiple columns
Consider batch generation for large datasets requiring consistent persona quality
Quality Assurance#
Validate persona consistency: Ensure generated content aligns with personality traits and demographic information
Test across different locales to understand quality variations
Review persona coherence when using multiple context-specific personas for the same individual
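One simple way to compare locales is to measure how often each field is populated in a sample of generated records. The sketch below assumes records are available as plain dicts. fill_rates is a hypothetical helper, and the sample records are invented for illustration and do not reflect actual Data Designer output.

```python
from collections import defaultdict


def fill_rates(records: list[dict]) -> dict[str, dict[str, float]]:
    """Fraction of non-empty values per field, grouped by record locale."""
    filled = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for record in records:
        locale = record.get("locale", "unknown")
        totals[locale] += 1
        for field, value in record.items():
            if field == "locale":
                continue
            # Count None and empty strings as missing.
            filled[locale][field] += 0 if value in (None, "") else 1
    return {
        locale: {field: count / totals[locale] for field, count in fields.items()}
        for locale, fields in filled.items()
    }


# Invented sample data, not real generator output.
records = [
    {"locale": "en_US", "first_name": "Ada", "county": "Travis"},
    {"locale": "en_US", "first_name": "Grace", "county": "Kings"},
    {"locale": "fr_FR", "first_name": "Luc", "county": None},
    {"locale": "fr_FR", "first_name": "Zoe", "county": ""},
]

for locale, rates in fill_rates(records).items():
    print(locale, rates)
```

Fill rate only measures completeness; judging realism and coherence across fields still calls for manual review of sampled records.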