
Generate Realistic Persons#

Data Designer’s sampler column type can sample realistic person data and synthetic personas. The underlying datasets were generated with Data Designer itself and with a Probabilistic Graphical Model trained on census data, so samples are grounded in real-world demographic, geographic, and personality trait distributions and capture the diversity and richness of the population.

Person Objects in Data Designer#

Person samplers generate person entities with configurable attributes. Each sampler creates a different person object that you can reference throughout your data design. There are two types of person samplers: Person and PersonFromFaker.

The Person sampler generates the highest-quality person data by sampling from the Nemotron-Personas collection, which is grounded in real-world demographic data. It supports the following locales: “en_US”, “ja_JP”, “hi_IN”, and “en_IN”. Setting with_synthetic_personas=True adds synthetic persona data to each sampled person. Persona generation adapts to cultural context based on the specified locale and demographic information.

For locales that the Person sampler does not support, the PersonFromFaker sampler uses the Faker library to generate person data; synthetic personas are not supported. While Faker provides basic attributes like names and addresses, it doesn’t maintain the same demographic accuracy or attribute relationships as the Nemotron-Personas datasets.
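The locale rule described above can be captured in a small helper. This is a minimal sketch, not part of Data Designer: `choose_sampler_type` and `NEMOTRON_LOCALES` are hypothetical names, and the returned strings are the `sampler_type` values used in this page's own examples.

```python
# Locales this page lists as supported by the Person sampler.
NEMOTRON_LOCALES = frozenset({"en_US", "ja_JP", "hi_IN", "en_IN"})


def choose_sampler_type(locale: str) -> str:
    """Hypothetical helper: pick a sampler_type string for a locale."""
    # Prefer the Nemotron-Personas-backed sampler whenever the locale allows it.
    return "person" if locale in NEMOTRON_LOCALES else "person_from_faker"


print(choose_sampler_type("ja_JP"))
print(choose_sampler_type("fr_FR"))
```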

Configuration Options#

Person and PersonFromFaker samplers accept these configuration parameters:

sex: Specify “Male” or “Female” (optional)
locale: Language and region code (optional, e.g., “en_US”, “ja_JP”, “hi_IN”, “en_IN”, “fr_FR”, “de_DE”)
- Person samplers only accept “en_US”, “ja_JP”, “hi_IN”, and “en_IN”
city: Filter on cities within the specified locale (optional)
age_range: Age range for filtering (default: ages above 18 only)

Person samplers additionally support:

with_synthetic_personas (default: False): When set to True, samples detailed personality profiles, cultural backgrounds, skills, interests, and context-specific personas for comprehensive character modeling. The personas are sampled from NVIDIA’s Nemotron-Personas Collection, which currently includes Nemotron-Personas-USA, Nemotron-Personas-India and Nemotron-Personas-Japan.

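# Person sampler (en_US) with synthetic personas enabled.
# `builder` is assumed to be an existing Data Designer config builder.
builder.add_column(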
    name="customer",
    column_type="sampler",
    sampler_type="person",
    params={
        "locale": "en_US",
        "sex": "Male",
        "with_synthetic_personas": True
    },
)

# Person sampler (ja_JP) returning demographic fields only.
builder.add_column(
    name="employee",
    column_type="sampler",
    sampler_type="person",
    params={
        "locale": "ja_JP",
        "sex": "Female",
        "with_synthetic_personas": False
    },
)

# Faker-backed sampler for a locale the Person sampler does not support.
builder.add_column(
    name="random_person",
    column_type="sampler",
    sampler_type="person_from_faker",
    params={
        "locale": "fr_FR",
    },
)
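Before passing a configuration to the builder, it can help to check it against the rules listed under Configuration Options. The following is a minimal sketch in plain Python; `check_person_params` is a hypothetical helper, not a Data Designer function, and it only encodes constraints stated on this page (it does not check `city` or `age_range`, whose exact formats are not described here).

```python
from typing import Any

# Locales this page lists as accepted by the Person sampler.
PERSON_LOCALES = {"en_US", "ja_JP", "hi_IN", "en_IN"}


def check_person_params(sampler_type: str, params: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check; returns a list of readable problems."""
    problems = []

    # sex is optional, but when given it must be one of the two documented values.
    sex = params.get("sex")
    if sex is not None and sex not in ("Male", "Female"):
        problems.append(f"sex must be 'Male' or 'Female', got {sex!r}")

    # The Person sampler only accepts Nemotron-Personas-backed locales.
    locale = params.get("locale")
    if sampler_type == "person" and locale is not None and locale not in PERSON_LOCALES:
        problems.append(f"locale {locale!r} is not supported by the person sampler")

    # Synthetic personas are not available through the Faker-backed sampler.
    if sampler_type == "person_from_faker" and params.get("with_synthetic_personas"):
        problems.append("with_synthetic_personas is only supported by the person sampler")

    return problems


print(check_person_params("person", {"locale": "fr_FR", "sex": "Male"}))
```

An empty list means the configuration passed these checks; it does not guarantee that the library itself will accept it.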

Person Data Structure#

Core Demographic Fields (Always Available)#

Field Name	Type	Description
uuid	str	Unique identifier
first_name	str	Person’s first name
last_name	str	Person’s last name
sex	categorical	Person’s sex (Male or Female)
age	int	Person’s age
country	str	Country name
marital_status	categorical	Person’s marital status
education_level	categorical	Person’s education level
bachelors_field	categorical	Field of bachelor’s degree
occupation	str	Person’s occupation
birth_date	date	Calculated birth date based on age
email_address	str	Generated email address (None for age < 18)
locale	str	Locale
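When person records are handled as plain dictionaries downstream, the table above can be mirrored as a type. This is a sketch under stated assumptions: `CorePerson` and `missing_core_fields` are hypothetical names, categorical fields are modelled as `str`, and only `email_address` is marked optional because it is the only field this table documents as possibly None.

```python
from datetime import date
from typing import Optional, TypedDict


class CorePerson(TypedDict):
    """Hypothetical type mirroring the core demographic fields table."""

    uuid: str
    first_name: str
    last_name: str
    sex: str  # categorical: "Male" or "Female"
    age: int
    country: str
    marital_status: str  # categorical, modelled as str (assumption)
    education_level: str  # categorical, modelled as str (assumption)
    bachelors_field: str  # categorical, modelled as str (assumption)
    occupation: str
    birth_date: date
    email_address: Optional[str]  # None for age < 18
    locale: str


def missing_core_fields(record: dict) -> list[str]:
    """Return the core field names that are absent from a record, sorted."""
    return sorted(set(CorePerson.__annotations__) - set(record))


print(missing_core_fields({"uuid": "abc", "age": 30}))
```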

US-Specific Fields#

Field Name	Type	Description
unit	str	Unit/apartment number
street_number	int	Street number (numeric)
street_name	str	Name of the street
city	str	City name
zipcode	str	Zipcode/Postal Code
state	str	State
county	str	County
bachelors_field	categorical	Field of bachelor’s degree
phone_number	str	Generated phone number based on zipcode (None for age < 18)
ssn	str	Social Security Number
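As an illustration of how the US address fields fit together, the sketch below joins them into a two-line mailing address. `format_us_address` is a hypothetical helper, and the sample values (including the format of `unit` and the abbreviated `state`) are invented for the example rather than taken from a real sampled record.

```python
def format_us_address(person: dict) -> str:
    """Hypothetical helper: build a two-line address from US-specific fields."""
    street = f"{person['street_number']} {person['street_name']}"
    # unit is treated as optional here; skip it when it is missing or empty.
    if person.get("unit"):
        street += f", {person['unit']}"
    return f"{street}\n{person['city']}, {person['state']} {person['zipcode']}"


# Invented sample values, not real sampler output.
sample = {
    "unit": "Apt 4B",
    "street_number": 123,
    "street_name": "Main St",
    "city": "Springfield",
    "state": "IL",
    "zipcode": "62701",
}
print(format_us_address(sample))
```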

Japan-Specific Fields#

Field Name	Type	Description
area	str	Region of Japan

India-Specific Fields#

Field Name	Type	Description
zone	str	Level of urban development at address (Rural or Urban)
education_degree	str	Education level and post-secondary degree, if applicable
first_language	str	Person’s native language
second_language	str	Person’s second language, if applicable
third_language	str	Person’s third language, if applicable
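Because the second and third languages apply only to some people, code that consumes these fields should tolerate missing values. A minimal sketch, assuming that an inapplicable language is represented as None or an empty string (the page does not specify the representation); `spoken_languages` is a hypothetical helper.

```python
def spoken_languages(person: dict) -> list[str]:
    """Hypothetical helper: collect the languages present, in order."""
    keys = ("first_language", "second_language", "third_language")
    # Skip keys that are absent, None, or empty.
    return [person[key] for key in keys if person.get(key)]


# Invented sample values, not real sampler output.
sample = {"first_language": "Marathi", "second_language": "Hindi", "third_language": None}
print(spoken_languages(sample))
```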

Personality Traits (Available when `with_synthetic_personas=True`)#

Big Five personality model with t-scores and interpretive labels:

Field Name	Type	Description
openness	dict	Openness to experience (t_score, label, description)
conscientiousness	dict	Conscientiousness (t_score, label, description)
extraversion	dict	Extraversion (t_score, label, description)
agreeableness	dict	Agreeableness (t_score, label, description)
neuroticism	dict	Neuroticism (t_score, label, description)

Each personality trait contains:

t_score: Standardized score (typically 0-100)
label: Interpretive label (“low”, “average”, “high”, “very high”)
description: Detailed behavioral description
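The nested trait structure can be read like any dictionary. The sketch below summarizes the five traits and picks the one with the highest t_score; `trait_summary` and `strongest_trait` are hypothetical helpers, and the sample scores and labels are invented (the thresholds that map a score to a label are not documented here).

```python
TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")


def trait_summary(person: dict) -> str:
    """Hypothetical helper: one-line summary of the Big Five traits."""
    parts = [f"{t}: {person[t]['label']} (t={person[t]['t_score']})" for t in TRAITS]
    return "; ".join(parts)


def strongest_trait(person: dict) -> str:
    """Return the name of the trait with the highest t_score."""
    return max(TRAITS, key=lambda t: person[t]["t_score"])


# Invented sample values, not real sampler output.
sample = {
    "openness": {"t_score": 62, "label": "high", "description": "..."},
    "conscientiousness": {"t_score": 48, "label": "average", "description": "..."},
    "extraversion": {"t_score": 35, "label": "low", "description": "..."},
    "agreeableness": {"t_score": 55, "label": "average", "description": "..."},
    "neuroticism": {"t_score": 41, "label": "average", "description": "..."},
}
print(trait_summary(sample))
print(strongest_trait(sample))
```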

Synthetic Persona Fields (Available when `with_synthetic_personas=True`)#

Background and Development#

Field Name	Type	Description
cultural_background	str	Detailed narrative about cultural influences and upbringing
skills_and_expertise	str	Comprehensive description of professional and personal capabilities
skills_and_expertise_list	str	List format of key skills and competencies
hobbies_and_interests	str	Detailed description of personal interests and activities
hobbies_and_interests_list	str	List format of hobbies and interests
career_goals_and_ambitions	str	Professional aspirations and long-term objectives

Persona Profile Fields#

Field Name	Type	Description
persona	str	Brief summary of the personality profile
detailed_persona	str	Comprehensive personality and behavioral description
professional_persona	str	Work environment personality and career approach
finance_persona	str	Financial decision-making style and money management approach
healthcare_persona	str	Health and wellness attitudes and behaviors
sports_persona	str	Sports interests and physical activity preferences
arts_persona	str	Artistic tastes, cultural interests, and creative preferences
travel_persona	str	Travel style, preferences, and exploration approach
culinary_persona	str	Food interests, cooking style, and dining preferences

Japan-Specific Persona Fields#

Field Name	Type	Description
aspects	str	Cultural, generational, social and communication considerations
digital_skills	str	Digital skill levels informed by population surveys

India-Specific Persona Fields#

Field Name	Type	Description
linguistic_background	str	Description of written and spoken language proficiency
religious_background	str	Description of religious background and beliefs
linguistic_persona	str	Linguistic background and language proficiency
religious_persona	str	Religious background, beliefs, and practices

Best Practices#

Choosing Configuration Options#

Use locales that are backed by a Nemotron-Personas dataset for maximum demographic accuracy and realism
Enable with_synthetic_personas=True when you need rich character development, personalized content generation, or comprehensive behavioral modeling
Disable synthetic personas for basic demographic testing or when computational efficiency is prioritized

Effective Persona Usage#

Match persona depth to use case: Use basic personas for simple applications, detailed personas for comprehensive character modeling
Leverage context-specific personas: Use professional_persona for workplace scenarios, culinary_persona for food-related applications
Combine multiple persona fields in prompts for richer, more nuanced content generation
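The practices above can be sketched as a small prompt builder that combines the general persona with one context-specific persona. This is plain Python string assembly, not Data Designer's own prompt mechanism; `build_prompt`, `DOMAIN_TO_FIELD`, and the domain names are hypothetical, and the sample text is invented.

```python
# Hypothetical mapping from an application domain to a persona field.
DOMAIN_TO_FIELD = {
    "workplace": "professional_persona",
    "food": "culinary_persona",
    "finance": "finance_persona",
    "health": "healthcare_persona",
    "travel": "travel_persona",
}


def build_prompt(person: dict, domain: str, task: str) -> str:
    """Combine the general persona with a domain-specific one, if any."""
    field = DOMAIN_TO_FIELD.get(domain, "persona")
    lines = [
        f"You are writing as {person['first_name']} {person['last_name']}.",
        f"Summary: {person['persona']}",
    ]
    # Add the context-specific persona only when the domain maps to one.
    if field != "persona":
        lines.append(f"Context ({field}): {person[field]}")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


# Invented sample values, not real sampler output.
sample = {
    "first_name": "Ana",
    "last_name": "Silva",
    "persona": "A methodical planner.",
    "professional_persona": "Prefers structured teamwork.",
    "culinary_persona": "Enjoys slow weekend cooking.",
}
print(build_prompt(sample, "food", "Review a new restaurant."))
```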

Performance Considerations#

Synthetic personas add processing time: Only enable when the additional data provides value
Cache person objects when using the same personas across multiple columns
Consider batch generation for large datasets requiring consistent persona quality

Quality Assurance#

Validate persona consistency: Ensure generated content aligns with personality traits and demographic information
Test across different locales to understand quality variations
Review persona coherence when using multiple context-specific personas for the same individual
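Some of these checks can be automated. The sketch below flags records that contradict rules stated on this page; `consistency_issues` is a hypothetical helper, and its label set and 0-100 score range simply mirror the values listed under Personality Traits, so treat a flagged record as something to review rather than as a confirmed error.

```python
TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")
ALLOWED_LABELS = {"low", "average", "high", "very high"}


def consistency_issues(person: dict) -> list[str]:
    """Hypothetical check; returns readable issues found in one record."""
    issues = []

    if person.get("sex") not in ("Male", "Female"):
        issues.append(f"unexpected sex value: {person.get('sex')!r}")

    # email_address is documented as None for people under 18.
    age = person.get("age")
    if isinstance(age, int) and age < 18 and person.get("email_address") is not None:
        issues.append("email_address should be None for age < 18")

    for trait in TRAITS:
        value = person.get(trait)
        if value is None:
            continue  # traits are only present when synthetic personas are enabled
        if value.get("label") not in ALLOWED_LABELS:
            issues.append(f"{trait}: unexpected label {value.get('label')!r}")
        score = value.get("t_score")
        if not isinstance(score, (int, float)) or not 0 <= score <= 100:
            issues.append(f"{trait}: t_score {score!r} outside 0-100")

    return issues


# Invented sample values, not real sampler output.
sample = {
    "sex": "Female",
    "age": 34,
    "email_address": "someone@example.com",
    "openness": {"t_score": 58, "label": "average"},
}
print(consistency_issues(sample))
```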