Generate Realistic Persons#

Data Designer’s sampler column type can sample realistic person data and synthetic personas. The underlying datasets were generated using Data Designer itself together with a Probabilistic Graphical Model trained on census data, so sampled records are grounded in real-world demographic, geographic, and personality trait distributions and capture the diversity and richness of the population.

Person Objects in Data Designer#

Person samplers generate person entities with configurable attributes. Each sampler creates a different person object that you can reference throughout your data design. There are two types of person samplers: Person and PersonFromFaker.

The Person sampler produces the highest-quality person data by sampling from the Nemotron-Personas collection. Grounded in real-world demographic data, this sampler type supports the following locales: “en_US”, “ja_JP”, “hi_IN”, and “en_IN”. Person samplers can optionally include synthetic persona data by setting with_synthetic_personas=True. Persona generation adapts to cultural context based on the specified locale and demographic information.

For locales that the Person sampler does not support, the PersonFromFaker sampler uses the Faker library to generate person data (synthetic personas are not supported). Faker provides basic attributes such as names and addresses, but it does not maintain the demographic accuracy or attribute relationships of the Nemotron-Personas datasets.

Configuration Options#

Person and PersonFromFaker samplers accept these configuration parameters:

  • sex: Specify “Male” or “Female” (optional)

  • locale: Language and region code (optional, e.g., “en_US”, “ja_JP”, “hi_IN”, “en_IN”, “fr_FR”, “de_DE”)

    • Person samplers only accept “en_US”, “ja_JP”, “hi_IN”, and “en_IN”

  • city: Filter on cities within the specified locale (optional)

  • age_range: Age range for filtering (default: ages above 18 only)

Person samplers additionally support:

  • with_synthetic_personas: Set to True to include synthetic persona data (optional)

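```python
# `builder` is assumed to be an existing Data Designer config builder.

# Person sampler (en_US) with synthetic persona fields included
builder.add_column(
    name="customer",
    column_type="sampler",
    sampler_type="person",
    params={
        "locale": "en_US",
        "sex": "Male",
        "with_synthetic_personas": True
    },
)

# Person sampler (ja_JP) with demographic fields only
builder.add_column(
    name="employee",
    column_type="sampler",
    sampler_type="person",
    params={
        "locale": "ja_JP",
        "sex": "Female",
        "with_synthetic_personas": False
    },
)

# Faker-backed sampler for a locale the Person sampler does not support
builder.add_column(
    name="random_person",
    column_type="sampler",
    sampler_type="person_from_faker",
    params={
        "locale": "fr_FR",
    },
)
```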
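The locale rules described above can be captured in a small helper that picks the sampler type before a column is added. This is a minimal sketch, not part of Data Designer: `person_column_kwargs` and `NEMOTRON_LOCALES` are hypothetical names, and the returned dictionary simply mirrors the keyword arguments used in the examples above.

```python
from typing import Any, Optional

# Locales the text lists as supported by the Person sampler.
NEMOTRON_LOCALES = {"en_US", "ja_JP", "hi_IN", "en_IN"}


def person_column_kwargs(
    name: str,
    locale: str,
    sex: Optional[str] = None,
    with_synthetic_personas: bool = False,
) -> dict[str, Any]:
    """Build keyword arguments for builder.add_column (hypothetical helper)."""
    if sex is not None and sex not in {"Male", "Female"}:
        raise ValueError(f"sex must be 'Male' or 'Female', got {sex!r}")

    params: dict[str, Any] = {"locale": locale}
    if sex is not None:
        params["sex"] = sex

    if locale in NEMOTRON_LOCALES:
        sampler_type = "person"
        params["with_synthetic_personas"] = with_synthetic_personas
    else:
        # Faker-backed persons do not support synthetic personas.
        if with_synthetic_personas:
            raise ValueError(f"synthetic personas are not available for {locale!r}")
        sampler_type = "person_from_faker"

    return {
        "name": name,
        "column_type": "sampler",
        "sampler_type": sampler_type,
        "params": params,
    }


customer = person_column_kwargs("customer", "en_US", sex="Male", with_synthetic_personas=True)
visitor = person_column_kwargs("visitor", "fr_FR")
print(customer["sampler_type"], visitor["sampler_type"])
```

Assuming a `builder` like the one in the examples above, the result could be passed on as `builder.add_column(**customer)`. Failing early on an unsupported combination keeps the locale restriction in one place rather than scattered across column definitions.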

Person Data Structure#

Core Demographic Fields (Always Available)#

| Field Name | Type | Description |
| --- | --- | --- |
| uuid | str | Unique identifier |
| first_name | str | Person’s first name |
| last_name | str | Person’s last name |
| sex | categorical | Person’s sex (Male or Female) |
| age | int | Person’s age |
| country | str | Country name |
| marital_status | categorical | Person’s marital status |
| education_level | categorical | Person’s education level |
| bachelors_field | categorical | Field of bachelor’s degree |
| occupation | str | Person’s occupation |
| birth_date | date | Calculated birth date based on age |
| email_address | str | Generated email address (None for age < 18) |
| locale | str | Locale |

US-Specific Fields#

| Field Name | Type | Description |
| --- | --- | --- |
| unit | str | Unit/apartment number |
| street_number | int | Street number (numeric) |
| street_name | str | Name of the street |
| city | str | City name |
| zipcode | str | Zipcode/Postal Code |
| state | str | State |
| county | str | County |
| bachelors_field | categorical | Field of bachelor’s degree |
| phone_number | str | Generated phone number based on zipcode (None for age < 18) |
| ssn | str | Social Security Number |
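As an illustration of how the US address fields fit together, the sketch below assembles a single mailing-address line from one sampled record. The record is a plain dictionary with made-up values, `format_us_address` is a hypothetical helper, and treating `unit` as possibly empty is an assumption rather than documented behavior.

```python
from typing import Any, Mapping


def format_us_address(person: Mapping[str, Any]) -> str:
    """Join the US-specific address fields into one line (hypothetical helper)."""
    street = f"{person['street_number']} {person['street_name']}"
    # Assumption: unit may be empty or None for addresses without one.
    if person.get("unit"):
        street = f"{street}, {person['unit']}"
    return f"{street}, {person['city']}, {person['state']} {person['zipcode']}"


# Made-up record using the US-specific field names listed above.
sample = {
    "unit": "Apt 4B",
    "street_number": 123,
    "street_name": "Maple Street",
    "city": "Springfield",
    "state": "IL",
    "zipcode": "62704",
}
print(format_us_address(sample))
```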

Japan-Specific Fields#

| Field Name | Type | Description |
| --- | --- | --- |
| area | str | Region of Japan |

India-Specific Fields#

| Field Name | Type | Description |
| --- | --- | --- |
| zone | str | Level of urban development at address (Rural or Urban) |
| education_degree | str | Education level and post-secondary degree, if applicable |
| first_language | str | Person’s native language |
| second_language | str | Person’s second language, if applicable |
| third_language | str | Person’s third language, if applicable |

Personality Traits (Available when with_synthetic_personas=True)#

Big Five personality model with t-scores and interpretive labels:

| Field Name | Type | Description |
| --- | --- | --- |
| openness | dict | Openness to experience (t_score, label, description) |
| conscientiousness | dict | Conscientiousness (t_score, label, description) |
| extraversion | dict | Extraversion (t_score, label, description) |
| agreeableness | dict | Agreeableness (t_score, label, description) |
| neuroticism | dict | Neuroticism (t_score, label, description) |

Each personality trait contains:

  • t_score: Standardized score (typically 0-100)

  • label: Interpretive label (“low”, “average”, “high”, “very high”)

  • description: Detailed behavioral description
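The trait structure described above can be handled as nested dictionaries. The following sketch uses made-up values to show one way to pick out the traits that carry a given label; `Trait` and `traits_with_label` are hypothetical names, and typing `t_score` as a float is an assumption.

```python
from typing import Mapping, TypedDict


class Trait(TypedDict):
    t_score: float  # assumption: numeric standardized score
    label: str
    description: str


BIG_FIVE = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")


def traits_with_label(person: Mapping[str, Trait], labels: set[str]) -> list[str]:
    """Return the Big Five trait names whose label is in `labels`."""
    return [name for name in BIG_FIVE if person[name]["label"] in labels]


# Made-up values; real descriptions are longer behavioral summaries.
person = {
    "openness": Trait(t_score=63.0, label="high", description="Curious and imaginative."),
    "conscientiousness": Trait(t_score=48.0, label="average", description="Reasonably organized."),
    "extraversion": Trait(t_score=35.0, label="low", description="Reserved in groups."),
    "agreeableness": Trait(t_score=52.0, label="average", description="Generally cooperative."),
    "neuroticism": Trait(t_score=71.0, label="very high", description="Prone to worry."),
}

elevated = traits_with_label(person, {"high", "very high"})
print(elevated)
```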

Synthetic Persona Fields (Available when with_synthetic_personas=True)#

Background and Development#

| Field Name | Type | Description |
| --- | --- | --- |
| cultural_background | str | Detailed narrative about cultural influences and upbringing |
| skills_and_expertise | str | Comprehensive description of professional and personal capabilities |
| skills_and_expertise_list | str | List format of key skills and competencies |
| hobbies_and_interests | str | Detailed description of personal interests and activities |
| hobbies_and_interests_list | str | List format of hobbies and interests |
| career_goals_and_ambitions | str | Professional aspirations and long-term objectives |

Persona Profile Fields#

| Field Name | Type | Description |
| --- | --- | --- |
| persona | str | Brief summary personality profile |
| detailed_persona | str | Comprehensive personality and behavioral description |
| professional_persona | str | Work environment personality and career approach |
| finance_persona | str | Financial decision-making style and money management approach |
| healthcare_persona | str | Health and wellness attitudes and behaviors |
| sports_persona | str | Sports interests and physical activity preferences |
| arts_persona | str | Artistic tastes, cultural interests, and creative preferences |
| travel_persona | str | Travel style, preferences, and exploration approach |
| culinary_persona | str | Food interests, cooking style, and dining preferences |

Japan-Specific Persona Fields#

| Field Name | Type | Description |
| --- | --- | --- |
| aspects | str | Cultural, generational, social and communication considerations |
| digital_skills | str | Digital skill levels informed by population surveys |

India-Specific Persona Fields#

| Field Name | Type | Description |
| --- | --- | --- |
| linguistic_background | str | Description of written and spoken language proficiency |
| religious_background | str | Description of religious background and beliefs |
| linguistic_persona | str | Linguistic background and language proficiency |
| religious_persona | str | Religious background, beliefs, and practices |


Best Practices#

Choosing Configuration Options#

  • Use locales that are backed by a Nemotron-Personas dataset for maximum demographic accuracy and realism

  • Enable with_synthetic_personas=True when you need rich character development, personalized content generation, or comprehensive behavioral modeling

  • Disable synthetic personas for basic demographic testing or when computational efficiency is prioritized

Effective Persona Usage#

  • Match persona depth to use case: Use basic personas for simple applications, detailed personas for comprehensive character modeling

  • Leverage context-specific personas: Use professional_persona for workplace scenarios, culinary_persona for food-related applications

  • Combine multiple persona fields in prompts for richer, more nuanced content generation
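One way to act on these suggestions is to combine the summary persona with a single context-specific persona when assembling a prompt. The sketch below uses plain Python string formatting on a made-up record; it does not show Data Designer's own prompt templating, and `DOMAIN_PERSONA_FIELD` and `build_prompt` are hypothetical names.

```python
from typing import Any, Mapping

# Hypothetical mapping from an application domain to a persona field.
DOMAIN_PERSONA_FIELD = {
    "workplace": "professional_persona",
    "food": "culinary_persona",
    "travel": "travel_persona",
}


def build_prompt(person: Mapping[str, Any], domain: str, task: str) -> str:
    """Combine the summary persona with one context-specific persona."""
    field = DOMAIN_PERSONA_FIELD[domain]
    lines = [
        f"You are writing as {person['first_name']} {person['last_name']}, age {person['age']}.",
        f"Persona: {person['persona']}",
        f"{field.replace('_', ' ').capitalize()}: {person[field]}",
        f"Task: {task}",
    ]
    return "\n".join(lines)


# Made-up record; real persona fields hold longer generated text.
person = {
    "first_name": "Aiko",
    "last_name": "Tanaka",
    "age": 34,
    "persona": "A methodical planner who enjoys quiet weekends.",
    "culinary_persona": "Cooks seasonal dishes at home and seeks out small neighborhood restaurants.",
}

prompt = build_prompt(person, "food", "Write a short review of a new ramen shop.")
print(prompt)
```

Selecting one context-specific persona per domain keeps the prompt short while still grounding the output in the traits most relevant to the task.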

Performance Considerations#

  • Synthetic personas add processing time: Only enable when the additional data provides value

  • Cache person objects when using the same personas across multiple columns

  • Consider batch generation for large datasets requiring consistent persona quality

Quality Assurance#

  • Validate persona consistency: Ensure generated content aligns with personality traits and demographic information

  • Test across different locales to understand quality variations

  • Review persona coherence when using multiple context-specific personas for the same individual
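Some of these checks can be automated for the documented demographic rules. The sketch below validates one sampled record, represented as a plain dictionary, against a subset of the core fields and the rule that email_address is None for ages under 18. `CORE_FIELDS` and `check_person` are hypothetical names, and the checks cover only structural consistency, not the quality of generated persona text.

```python
from typing import Any, Mapping

# Subset of the core demographic fields described in this page.
CORE_FIELDS = ("uuid", "first_name", "last_name", "sex", "age", "country", "locale")


def check_person(person: Mapping[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in one person record."""
    problems = [f"missing field: {name}" for name in CORE_FIELDS if name not in person]

    if "sex" in person and person["sex"] not in {"Male", "Female"}:
        problems.append(f"unexpected sex value: {person['sex']!r}")

    age = person.get("age")
    if isinstance(age, int):
        # Documented rule: no email address is generated for minors.
        if age < 18 and person.get("email_address") is not None:
            problems.append("email_address should be None for age < 18")
    elif "age" in person:
        problems.append("age should be an int")

    return problems


# Made-up records for illustration.
adult = {
    "uuid": "0001", "first_name": "Ravi", "last_name": "Mehta", "sex": "Male",
    "age": 41, "country": "India", "locale": "en_IN", "email_address": "ravi@example.com",
}
minor = dict(adult, age=16)

print(check_person(adult))
print(check_person(minor))
```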