utils.llm_pii_utils#

Module Contents#

Classes#

EntitySpan

Functions#

find_entity_spans

Find the start and end indexes for each entity in the given text.

fix_overlaps

Handle overlaps in entity spans

get_system_prompt

redact

Redact given entities from the original text

validate_entity

Validate entity

validate_keys

Validate that keys in entity dict match schema

Data#

JSON_SCHEMA

PII_LABELS

API#

class utils.llm_pii_utils.EntitySpan#
end_position: int#

None

entity_type: str#

None

start_position: int#

None

utils.llm_pii_utils.JSON_SCHEMA#

None

utils.llm_pii_utils.PII_LABELS#

['medical_record_number', 'location', 'address', 'ssn', 'date_of_birth', 'date_time', 'name', 'email…

utils.llm_pii_utils.find_entity_spans(
text: str,
entities: list[dict[str, str]],
) -> list[utils.llm_pii_utils.EntitySpan]#

Find the start and end indexes for each entity in the given text.

Args:
    text (str): The input text string.
    entities (list): A list of entities, where each entity is a dictionary containing the entity text and its type.

Returns: list: A list of EntitySpan objects, where each contains the entity type, start position, and end position.
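
The span lookup described above can be sketched as a plain substring search. This is an illustrative stand-in, not the module's implementation: the name `find_entity_spans_sketch`, the dictionary keys `entity_text` and `entity_type`, the `TypedDict` base for `EntitySpan`, and the exclusive end index are all assumptions.

```python
from typing import TypedDict


class EntitySpan(TypedDict):
    # Stand-in for the documented class; the TypedDict base is an assumption.
    entity_type: str
    start_position: int
    end_position: int


def find_entity_spans_sketch(
    text: str, entities: list[dict[str, str]]
) -> list[EntitySpan]:
    """Record every exact, case-sensitive occurrence of each entity's text."""
    spans: list[EntitySpan] = []
    for entity in entities:
        needle = entity["entity_text"]  # assumed key name
        if not needle:
            continue  # an empty string would match at every offset
        start = text.find(needle)
        while start != -1:
            spans.append(
                EntitySpan(
                    entity_type=entity["entity_type"],  # assumed key name
                    start_position=start,
                    end_position=start + len(needle),  # exclusive end (assumption)
                )
            )
            start = text.find(needle, start + len(needle))
    return spans


spans = find_entity_spans_sketch(
    "Email Ann at ann@example.com",
    [{"entity_text": "ann@example.com", "entity_type": "email"}],
)
```

With the exclusive-end convention, `text[span["start_position"]:span["end_position"]]` returns the matched entity text.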

utils.llm_pii_utils.fix_overlaps(
spans: list[utils.llm_pii_utils.EntitySpan],
) -> list[utils.llm_pii_utils.EntitySpan]#

Handle overlaps in entity spans
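
The page does not say which span survives when two overlap. One plausible policy, shown here purely as an assumption, keeps the longest span and drops any span that intersects one already kept. The function name `fix_overlaps_sketch` is hypothetical, and spans are represented as plain dictionaries carrying the three documented `EntitySpan` fields.

```python
def fix_overlaps_sketch(spans: list[dict]) -> list[dict]:
    """Keep the longest spans first and drop any span that overlaps a kept one.

    The keep-the-longest rule is an assumption, not the documented behavior.
    """
    # Longest spans first; ties are broken by the earlier start position.
    ordered = sorted(
        spans,
        key=lambda s: (
            -(s["end_position"] - s["start_position"]),
            s["start_position"],
        ),
    )
    kept: list[dict] = []
    for span in ordered:
        overlaps = any(
            span["start_position"] < other["end_position"]
            and other["start_position"] < span["end_position"]
            for other in kept
        )
        if not overlaps:
            kept.append(span)
    # Return the survivors in reading order.
    return sorted(kept, key=lambda s: s["start_position"])


example = [
    {"entity_type": "name", "start_position": 0, "end_position": 8},
    {"entity_type": "location", "start_position": 5, "end_position": 12},
    {"entity_type": "address", "start_position": 20, "end_position": 30},
]
resolved = fix_overlaps_sketch(example)
```

Here the `location` span intersects the longer `name` span and is dropped, while the disjoint `address` span is kept.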

utils.llm_pii_utils.get_system_prompt(pii_labels: list[str] = PII_LABELS) -> str#
utils.llm_pii_utils.redact(full_text: str, pii_entities: list[dict[str, str]]) -> str#

Redact given entities from the original text
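
A minimal sketch of one way such a redaction could work: locate each entity's text, then replace matches from right to left so earlier offsets stay valid. The name `redact_sketch`, the keys `entity_text` and `entity_type`, and the `{{label}}` placeholder format are assumptions for illustration; the page does not state what the redacted output looks like. Overlapping matches are not handled here.

```python
def redact_sketch(full_text: str, pii_entities: list[dict[str, str]]) -> str:
    """Replace each exact occurrence of an entity's text with a placeholder."""
    # Collect (start, end, label) for every occurrence of every entity.
    spans: list[tuple[int, int, str]] = []
    for entity in pii_entities:
        needle = entity["entity_text"]  # assumed key name
        start = full_text.find(needle) if needle else -1
        while start != -1:
            spans.append((start, start + len(needle), entity["entity_type"]))
            start = full_text.find(needle, start + len(needle))

    # Substitute from the end of the text backwards so that replacing one
    # span never shifts the offsets of spans still to be processed.
    redacted = full_text
    for start, end, label in sorted(spans, reverse=True):
        # The "{{label}}" placeholder format is an assumption.
        redacted = redacted[:start] + "{{" + label + "}}" + redacted[end:]
    return redacted


result = redact_sketch(
    "Ann's SSN is 123-45-6789",
    [
        {"entity_text": "Ann", "entity_type": "name"},
        {"entity_text": "123-45-6789", "entity_type": "ssn"},
    ],
)
```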

utils.llm_pii_utils.validate_entity(
entity: dict[str, str],
text: str,
min_length: int = 2,
) -> bool#

Validate entity
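
The page gives no detail on what validation involves. Judging only from the signature (`entity`, `text`, `min_length`), a reasonable guess is that an entity passes when its text is at least `min_length` characters long and actually occurs in `text`. The sketch below encodes that guess; the name `validate_entity_sketch` and the `entity_text` key are assumptions.

```python
def validate_entity_sketch(
    entity: dict[str, str], text: str, min_length: int = 2
) -> bool:
    """Accept an entity only if its text is long enough and appears in `text`."""
    entity_text = entity.get("entity_text")  # assumed key name
    if not isinstance(entity_text, str):
        return False  # missing or malformed entity text
    if len(entity_text) < min_length:
        return False  # too short to be a meaningful match
    return entity_text in text  # entity text must literally appear in the source


ok = validate_entity_sketch({"entity_text": "Ann", "entity_type": "name"}, "Ann called")
too_short = validate_entity_sketch({"entity_text": "A", "entity_type": "name"}, "Ann called")
absent = validate_entity_sketch({"entity_text": "Bob", "entity_type": "name"}, "Ann called")
```

A check of this kind would filter out entities a model reports that never appear in the input text, before spans are computed.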

utils.llm_pii_utils.validate_keys(entity_dict: dict) -> bool#

Validate that keys in entity dict match schema
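
As a rough illustration, a key check could compare the dictionary's keys against a set of required names. The contents of `JSON_SCHEMA` are not shown on this page, so the required keys below (`entity_text`, `entity_type`) are assumed, as is the choice to tolerate extra keys; `validate_keys_sketch` is a hypothetical name.

```python
# Assumed required keys; the actual JSON_SCHEMA is not shown on this page.
REQUIRED_KEYS = {"entity_text", "entity_type"}


def validate_keys_sketch(entity_dict: dict) -> bool:
    """Return True when every assumed required key is present.

    Extra keys are tolerated here; whether the real check rejects them
    is not stated in the documentation.
    """
    return REQUIRED_KEYS.issubset(entity_dict.keys())


complete = validate_keys_sketch({"entity_text": "Ann", "entity_type": "name"})
incomplete = validate_keys_sketch({"entity_text": "Ann"})
```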