Regular Expression Detection

Regular expressions find data that follows specific patterns, like phone numbers (555-123-4567) or Social Security numbers (123-45-6789). They provide fast and accurate detection for structured data with consistent formatting.

When to Use Regex Detection

Regex detection works best when:

  • You are working with structured data, such as CSV files or database tables

  • Your data follows consistent formatting patterns

  • You need to process large datasets quickly

  • You want to find specific patterns such as phone numbers, credit card numbers, or Social Security numbers

Supported Patterns

Regex detection automatically finds these types of structured data:

Personal Identifiers

  • Social Security Numbers: 123-45-6789, 123456789

  • ZIP Codes: 12345, 12345-6789
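
The two formats overlap: a ZIP+4 code and an SSN are both nine digits. The sketch below shows one way a pattern can tell them apart. The patterns and the function name `find_personal_identifiers` were written for this illustration and are not the detector's built-in rules. An SSN is accepted only when it is fully dashed or nine bare digits, so 12345-6789 never matches as an SSN.

```python
import re

# Hypothetical patterns for illustration; not the detector's built-in rules.
# SSN: fully dashed (123-45-6789) or nine bare digits (123456789). Requiring
# both dashes or none keeps a ZIP+4 code such as 12345-6789 out of the match.
SSN_PATTERN = re.compile(r"\b(?:\d{3}-\d{2}-\d{4}|\d{9})\b")

# ZIP: five digits with an optional four-digit extension. The \b anchors stop
# it from matching the first five digits of a longer number.
ZIP_PATTERN = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def find_personal_identifiers(text: str) -> dict[str, list[str]]:
    """Return every SSN-shaped and ZIP-shaped substring of text."""
    return {
        "SSN": SSN_PATTERN.findall(text),
        "ZIP_CODE": ZIP_PATTERN.findall(text),
    }


print(find_personal_identifiers("SSN 123-45-6789 or 123456789, ZIP 12345 or 12345-6789"))
# {'SSN': ['123-45-6789', '123456789'], 'ZIP_CODE': ['12345', '12345-6789']}
```

Any standalone five-digit number also matches the ZIP pattern. Pattern matching alone therefore produces false positives that later validation or context filtering must remove.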

Contact Information

  • Email Addresses: user@example.com

  • US Phone Numbers: (555) 123-4567, 555-123-4567, 555.123.4567
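
The sketch below covers the three phone layouts listed above and a simplified email shape. The patterns and the function name `find_contact_info` are hypothetical, written for this page rather than taken from the detector. The email pattern in particular is much looser than the full RFC 5322 address grammar.

```python
import re

# Hypothetical patterns for illustration; not the detector's built-in rules.
# EMAIL: a simplified local-part@domain.tld shape, far looser than RFC 5322.
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# PHONE_NUMBER: "(555) 123-4567", "555-123-4567" or "555.123.4567".
# The area code is either parenthesized or followed by "-" or ".".
PHONE_PATTERN = re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]\d{4}\b")


def find_contact_info(text: str) -> dict[str, list[str]]:
    """Return every email-shaped and US-phone-shaped substring of text."""
    return {
        "EMAIL": EMAIL_PATTERN.findall(text),
        "PHONE_NUMBER": PHONE_PATTERN.findall(text),
    }


print(find_contact_info("Email user@example.com; call (555) 123-4567, 555-123-4567 or 555.123.4567."))
# {'EMAIL': ['user@example.com'], 'PHONE_NUMBER': ['(555) 123-4567', '555-123-4567', '555.123.4567']}
```

This phone pattern also accepts mixed separators such as 555-123.4567. Enforcing one consistent separator would require a capturing group and a backreference.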

Financial Data

  • Credit Card Numbers: 4111-1111-1111-1111, 4111111111111111
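
Matching the digit layout is only half the job, because many random 16-digit strings look like card numbers. The Luhn checksum is a common validation for card numbers. This page does not say which check, if any, the detector applies. Treat the sketch below, including `CARD_PATTERN`, `luhn_valid`, and `find_card_numbers`, as a hypothetical illustration.

```python
import re

# Hypothetical pattern for illustration: 16 digits, optionally split into
# groups of four by "-" or " ". Not every card number has 16 digits
# (American Express uses 15), so this covers only the layout shown above.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits of number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second one;
    # a doubled value above 9 contributes its digit sum (value - 9).
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return card-shaped substrings that also pass the Luhn check."""
    return [m for m in CARD_PATTERN.findall(text) if luhn_valid(m)]


print(find_card_numbers("Cards: 4111-1111-1111-1111, 4111111111111111, 4111-1111-1111-1112"))
# ['4111-1111-1111-1111', '4111111111111111']
```

The last number has the right shape but fails the checksum, so the validation step discards it.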

Technical Identifiers

  • IP Addresses: 192.168.1.1, 2001:db8::1

  • URLs: https://example.com, http://website.org
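
For IP addresses, a strict regex gets complicated quickly, especially for IPv6. The sketch below takes a different approach: it pairs deliberately loose patterns with a stricter check from Python's standard `ipaddress` module, which rejects shapes such as 999.1.1.1. The patterns and the names `is_ip_address` and `find_technical_ids` are hypothetical and are not the detector's own.

```python
import ipaddress
import re

# Hypothetical patterns for illustration; not the detector's built-in rules.
# Loose candidates: IPv4 as four dot-separated groups of 1-3 digits, and IPv6
# as three to eight colon-separated groups of 0-4 hex digits ("::" allowed).
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
IPV6_CANDIDATE = re.compile(r"(?<![\w:])[0-9A-Fa-f]{0,4}(?::[0-9A-Fa-f]{0,4}){2,7}(?![\w:])")
# URL: an http or https scheme followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")


def is_ip_address(candidate: str) -> bool:
    """Let the standard library decide whether a candidate is a real address."""
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        return False
    return True


def find_technical_ids(text: str) -> dict[str, list[str]]:
    """Return validated IP addresses and URLs found in text."""
    candidates = IPV4_CANDIDATE.findall(text) + IPV6_CANDIDATE.findall(text)
    return {
        "IP_ADDRESS": [c for c in candidates if is_ip_address(c)],
        # Drop sentence punctuation that \S+ swallows at the end of a URL.
        "URL": [u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(text)],
    }


sample = "Hosts 192.168.1.1 and 2001:db8::1, not 999.1.1.1; see https://example.com or http://website.org."
print(find_technical_ids(sample))
# {'IP_ADDRESS': ['192.168.1.1', '2001:db8::1'], 'URL': ['https://example.com', 'http://website.org']}
```

Delegating the final decision to `ipaddress` keeps the regexes short. The trade-off is that every candidate costs an extra parse.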

How Regex Detection Works

  1. Pattern Matching: Scans text using predefined regular expression patterns

  2. Format Validation: Verifies that each match meets the format requirements for its type

  3. Context Filtering: Removes false positives based on surrounding context

  4. Entity Classification: Assigns an entity type to each remaining match
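
The four steps can be sketched as a single function. Everything in it is hypothetical: the two patterns, the negative context words, the 20-character context window, and the names `Detection` and `detect` are illustrative choices, not the detector's actual configuration. The SSN rule reflects the fact that Social Security numbers are never issued with area 000, 666, or 900-999, group 00, or serial 0000. Classification here is simply the name of the pattern that matched.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Detection:
    entity_type: str
    text: str
    start: int
    end: int


# Step 1: hypothetical patterns, keyed by the entity type they represent.
PATTERNS: dict[str, re.Pattern] = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE_NUMBER": re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]\d{4}\b"),
}


def valid_ssn(candidate: str) -> bool:
    # Step 2 rule: SSNs are never issued with area 000, 666 or 900-999,
    # group 00, or serial 0000.
    area, group, serial = candidate.split("-")
    return (
        area not in ("000", "666")
        and not area.startswith("9")
        and group != "00"
        and serial != "0000"
    )


VALIDATORS: dict[str, Callable[[str], bool]] = {"SSN": valid_ssn}

# Step 3: hypothetical words that, shortly before a match, suggest it is not PII.
NEGATIVE_CONTEXT: dict[str, tuple[str, ...]] = {"SSN": ("order", "invoice")}
CONTEXT_WINDOW = 20  # characters inspected before each match


def detect(text: str) -> list[Detection]:
    detections = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):  # Step 1: pattern matching
            candidate = match.group()
            validator = VALIDATORS.get(entity_type)
            if validator is not None and not validator(candidate):
                continue  # Step 2: fails format validation
            before = text[max(0, match.start() - CONTEXT_WINDOW):match.start()].lower()
            if any(word in before for word in NEGATIVE_CONTEXT.get(entity_type, ())):
                continue  # Step 3: filtered out by surrounding context
            # Step 4: the matching pattern's key becomes the entity type.
            detections.append(Detection(entity_type, candidate, match.start(), match.end()))
    return sorted(detections, key=lambda d: d.start)


sample = "SSN 123-45-6789; bad 000-12-3456; call 555-123-4567; order 234-56-7890"
for d in detect(sample):
    print(d.entity_type, d.text)
# SSN 123-45-6789
# PHONE_NUMBER 555-123-4567
```

In the sample, 000-12-3456 fails validation and 234-56-7890 is dropped because "order" appears just before it. This sketch does not resolve overlapping matches between entity types.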

Examples

# Input CSV data
data = "John,john@email.com,555-123-4567,123-45-6789"

# Regex will detect:
# - "john@email.com" as EMAIL
# - "555-123-4567" as PHONE_NUMBER
# - "123-45-6789" as SSN

# Input text with multiple patterns
text = "Contact us at support@company.com or call (800) 555-0123. Our IP is 192.168.1.100."

# Regex will detect:
# - "support@company.com" as EMAIL
# - "(800) 555-0123" as PHONE_NUMBER
# - "192.168.1.100" as IP_ADDRESS
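
The detections listed above can be reproduced with a handful of standard-library patterns. The sketch below is a hypothetical stand-in, not the detector's API. The `PATTERNS` table, the `detect` function, and the patterns themselves were written for this example. It also skips the validation and context-filtering steps.

```python
import re

# Hypothetical patterns for illustration; not the detector's built-in rules.
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "PHONE_NUMBER": re.compile(r"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect(text: str) -> list[tuple[str, str]]:
    """Return (matched text, entity type) pairs in order of appearance."""
    found = [
        (match.start(), match.group(), entity_type)
        for entity_type, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return [(matched, entity_type) for _, matched, entity_type in sorted(found)]


data = "John,john@email.com,555-123-4567,123-45-6789"
text = "Contact us at support@company.com or call (800) 555-0123. Our IP is 192.168.1.100."

for sample in (data, text):
    for matched, entity_type in detect(sample):
        print(f'"{matched}" as {entity_type}')
```

The name "John" is not detected, because it follows no fixed pattern.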