pii.algorithm#

Module Contents#

Classes#

PiiDeidentifier

Cleans PII from unstructured text

API#

class pii.algorithm.PiiDeidentifier(
language: str = DEFAULT_LANGUAGE,
supported_entities: list[str] | None = None,
anonymize_action: str = 'replace',
**kwargs,
)#

Cleans PII from unstructured text

Initialization

Parameters:
  • language (str) – Two-letter language code

  • supported_entities (List[str]) – List of entities to consider during deidentification

  • anonymize_action (str) – String that determines how detected entities are anonymized. Options are: redact, hash, replace, mask and custom

kwargs are additional anonymization-related arguments:

  • If anonymize_action is ‘replace’, ‘new_value’ can be provided as a substitution string.

  • If anonymize_action is ‘hash’, ‘hash_type’ can be provided (sha256, sha512 or md5).

  • If anonymize_action is ‘mask’, ‘chars_to_mask’ and ‘masking_char’ can be provided.

  • If anonymize_action is ‘custom’, a ‘lambda’ function can be provided.
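The following standard-library sketch illustrates how the documented actions and their kwargs might transform a single detected value. It is a stand-in for illustration only, not the library's implementation: the function name apply_action, the fallback placeholder for ‘replace’, the default values, and the choice to mask from the start of the value are all assumptions.

```python
import hashlib


def apply_action(value: str, entity_type: str, action: str = "replace", **kwargs) -> str:
    """Hypothetical stand-in for the documented anonymize actions."""
    if action == "replace":
        # 'new_value' is the documented kwarg; the <ENTITY> fallback is an assumption.
        return kwargs.get("new_value", f"<{entity_type}>")
    if action == "redact":
        # Redaction removes the detected value entirely.
        return ""
    if action == "hash":
        hash_type = kwargs.get("hash_type", "sha256")  # default is an assumption
        if hash_type not in {"sha256", "sha512", "md5"}:
            raise ValueError(f"unsupported hash_type: {hash_type}")
        return hashlib.new(hash_type, value.encode("utf-8")).hexdigest()
    if action == "mask":
        masking_char = kwargs.get("masking_char", "*")
        count = min(kwargs.get("chars_to_mask", len(value)), len(value))
        # Masking from the start of the value is an assumption of this sketch.
        return masking_char * count + value[count:]
    if action == "custom":
        # 'lambda' is a reserved word, so it has to arrive via dict unpacking.
        return kwargs["lambda"](value)
    raise ValueError(f"unknown anonymize_action: {action}")


masked = apply_action("555-1234", "PHONE_NUMBER", "mask", chars_to_mask=4, masking_char="#")
upper = apply_action("jane", "PERSON", "custom", **{"lambda": str.upper})
```

Because lambda is a Python keyword, a call cannot spell it as lambda=...; passing the function through a dictionary, as in the last line, is the usual workaround.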

add_custom_operator(
entity: str,
operator: presidio_anonymizer.entities.OperatorConfig,
) None#

Use a custom cleaning operation for a specific entity type
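As a rough illustration of per-entity operators, the sketch below keeps one default operator and lets individual entity types override it. OperatorSpec and OperatorTable are hypothetical stand-ins (the real method takes a presidio_anonymizer.entities.OperatorConfig), and the use of a "DEFAULT" key as the fallback is an assumption about how the class stores its operators.

```python
from dataclasses import dataclass, field


@dataclass
class OperatorSpec:
    """Hypothetical stand-in for presidio_anonymizer.entities.OperatorConfig."""
    name: str
    params: dict = field(default_factory=dict)


class OperatorTable:
    """Maps entity types to operators, falling back to a default operator."""

    def __init__(self, default: OperatorSpec) -> None:
        self._operators: dict[str, OperatorSpec] = {"DEFAULT": default}

    def add_custom_operator(self, entity: str, operator: OperatorSpec) -> None:
        # Override the cleaning operation for one entity type only.
        self._operators[entity] = operator

    def operator_for(self, entity: str) -> OperatorSpec:
        # Entities without an override use the default operator.
        return self._operators.get(entity, self._operators["DEFAULT"])

    def list_operators(self) -> dict[str, OperatorSpec]:
        # Return a copy so callers cannot mutate the table by accident.
        return dict(self._operators)


table = OperatorTable(OperatorSpec("replace"))
table.add_custom_operator("PHONE_NUMBER", OperatorSpec("mask", {"masking_char": "*"}))
```

With this table, PHONE_NUMBER values would be masked while every other entity type keeps the default ‘replace’ behaviour.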

add_custom_recognizer(
recognizer: presidio_analyzer.EntityRecognizer,
) None#

Add a custom recognizer to detect entities based on user-defined logic

analyze_text(
text: str,
entities: list[str] | None = None,
language: str = 'en',
) list[list[presidio_analyzer.RecognizerResult]]#
analyze_text_batch(
texts: list[str],
entities: list[str] | None = None,
language: str = 'en',
batch_size: int = 32,
) list[list[presidio_analyzer.RecognizerResult]]#

For processing batches, use batch analyzer

Parameters:
  • texts (List[str]) – List of texts to analyze for PII entities

  • batch_size (int) – The number of texts to handle in a batch. This parameter is useful when using spaCy models.

Returns: list[list[presidio_analyzer.RecognizerResult]]: one list of recognizer results per input text

deidentify_text(text: str) str#

Cleans PII data from text

Parameters: text (str): Text that may contain personally-identifiable information

Returns: str: Anonymized text
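To make the deidentification step concrete, here is a minimal sketch of replacing detected spans with placeholders. It is not the library's code: Span and replace_spans are hypothetical names, the spans are assumed to be non-overlapping character offsets, and the <ENTITY> placeholder format is an assumption.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical detection result: character offsets plus an entity label."""
    start: int
    end: int
    entity_type: str


def replace_spans(text: str, spans: list[Span]) -> str:
    """Replace each non-overlapping span with a placeholder for its entity type."""
    result = text
    # Work from the end of the text so earlier offsets stay valid after each edit.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        result = result[:span.start] + f"<{span.entity_type}>" + result[span.end:]
    return result


text = "Call Jane at 555-1234."
name_start = text.index("Jane")
phone_start = text.index("555-1234")
spans = [
    Span(name_start, name_start + len("Jane"), "PERSON"),
    Span(phone_start, phone_start + len("555-1234"), "PHONE_NUMBER"),
]
cleaned = replace_spans(text, spans)
```

Processing spans right to left avoids recomputing offsets when a placeholder is longer or shorter than the text it replaces.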

deidentify_text_batch(
texts: list[str],
batch_size: int = 32,
) list[str]#

For processing batches, use batch analyzer

Parameters:
  • texts (List[str]) – List of texts to perform deidentification on

  • batch_size (int) – The number of texts to handle in a batch. This parameter is useful when using spaCy models.

Returns: List[str]: List of deidentified texts
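The effect of batch_size can be sketched with plain Python: the input list is split into consecutive chunks and the per-chunk results are concatenated in order. The helper names iter_batches and process_in_batches are hypothetical, and the library's batch analyzer may schedule work differently.

```python
from collections.abc import Callable, Iterator


def iter_batches(texts: list[str], batch_size: int = 32) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def process_in_batches(
    texts: list[str],
    process_batch: Callable[[list[str]], list[str]],
    batch_size: int = 32,
) -> list[str]:
    """Apply process_batch to each chunk and keep results in input order."""
    results: list[str] = []
    for batch in iter_batches(texts, batch_size):
        results.extend(process_batch(batch))
    return results


texts = [f"document {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(texts, batch_size=32)]
upper = process_in_batches(texts, lambda batch: [t.upper() for t in batch])
```

Here 70 texts with the default batch size produce two full batches and one partial batch, and the output list lines up index by index with the input.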

static from_config(
config: collections.abc.Mapping[str, Any],
) pii.algorithm.PiiDeidentifier#
static from_default_config() pii.algorithm.PiiDeidentifier#
static from_yaml_file(
path: pathlib.Path | str,
) pii.algorithm.PiiDeidentifier#
list_operators() dict[str, presidio_anonymizer.entities.OperatorConfig]#

List all operators used to clean PII entities

list_supported_entities() list[str]#

List all entities that are detected while cleaning a text