pii.algorithm
Module Contents
Classes
PiiDeidentifier | Cleans PII from an unstructured text
API
- class pii.algorithm.PiiDeidentifier(
- language: str = DEFAULT_LANGUAGE,
- supported_entities: list[str] | None = None,
- anonymize_action: str = 'replace',
- **kwargs,
Cleans PII from an unstructured text
Initialization
Parameters:
- language (str) – Two-letter language code (for example, 'en')
- supported_entities (List[str]) – List of entities to consider during deidentification
- anonymize_action (str) – String that determines how detected PII is anonymized. Options are: redact, hash, replace, mask and custom
- kwargs – Additional anonymization-related arguments, depending on anonymize_action:
  - 'replace': 'new_value' can be provided as a substitution string
  - 'hash': 'hash_type' can be provided (sha256, sha512 or md5)
  - 'mask': 'chars_to_mask' and 'masking_char' can be provided
  - 'custom': a 'lambda' function can be provided
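To make the effect of each anonymize_action concrete, here is a minimal standard-library sketch of what the options could do to a single detected PII span. It is an illustration only, not the class's implementation (the signatures above suggest the real work is delegated to Presidio operators). The function name `apply_action`, the `<PII>` default and the left-to-right masking direction are assumptions made for this example.

```python
import hashlib


def apply_action(value: str, action: str = "replace", **kwargs) -> str:
    """Hypothetical helper: apply one anonymization action to a detected PII span."""
    if action == "replace":
        # 'new_value' is the substitution string; the "<PII>" default is an assumption.
        return kwargs.get("new_value", "<PII>")
    if action == "redact":
        # Redaction drops the span entirely.
        return ""
    if action == "hash":
        hash_type = kwargs.get("hash_type", "sha256")
        if hash_type not in {"sha256", "sha512", "md5"}:
            raise ValueError(f"unsupported hash_type: {hash_type}")
        return hashlib.new(hash_type, value.encode("utf-8")).hexdigest()
    if action == "mask":
        # Assumption: masking starts at the beginning of the span.
        count = min(kwargs.get("chars_to_mask", len(value)), len(value))
        return kwargs.get("masking_char", "*") * count + value[count:]
    if action == "custom":
        # 'lambda' is a Python keyword, so callers must pass it via **{"lambda": fn}.
        return kwargs["lambda"](value)
    raise ValueError(f"unknown anonymize_action: {action}")
```

With the class itself, the same options would be passed at construction time, for example `PiiDeidentifier(language='en', anonymize_action='mask', chars_to_mask=4, masking_char='#')`.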
- add_custom_operator(
- entity: str,
- operator: presidio_anonymizer.entities.OperatorConfig,
Use a custom cleaning operation for a specific entity type
- add_custom_recognizer(
- recognizer: presidio_analyzer.EntityRecognizer,
Add a custom recognizer to detect entities based on user-defined logic
- analyze_text(
- text: str,
- entities: list[str] | None = None,
- language: str = 'en',
- analyze_text_batch(
- texts: list[str],
- entities: list[str] | None = None,
- language: str = 'en',
- batch_size: int = 32,
For processing batches, use the batch analyzer
Parameters:
- texts (List[str]): List of texts to perform deidentification on
- batch_size (int): The number of texts to handle in a batch. This parameter is useful when using spaCy models.
Returns: List[str]: list of deidentified text
- deidentify_text(text: str) -> str
Cleans PII data from text
Parameters: text (str): Text that may contain personally-identifiable information
Returns: str: The anonymized text
- deidentify_text_batch(
- texts: list[str],
- batch_size: int = 32,
For processing batches, use the batch analyzer
Parameters:
- texts (List[str]): List of texts to perform deidentification on
- batch_size (int): The number of texts to handle in a batch. This parameter is useful when using spaCy models.
Returns: List[str]: list of deidentified text
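When the input is too large to hold in one list, the batch method can be fed in chunks. The sketch below relies only on the `deidentify_text_batch(texts, batch_size=...)` signature documented above. The helper name `deidentify_in_chunks`, its `chunk_size` parameter and the `EmailOnlyStub` class are hypothetical; the stub stands in for a real PiiDeidentifier so the example runs without the library installed.

```python
import re
from itertools import islice
from typing import Iterable, Iterator


def deidentify_in_chunks(
    deidentifier,
    texts: Iterable[str],
    chunk_size: int = 1000,
    batch_size: int = 32,
) -> Iterator[str]:
    """Stream texts through deidentify_text_batch, chunk_size texts at a time."""
    iterator = iter(texts)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        # Each chunk is passed to the documented batch method; output order is preserved.
        yield from deidentifier.deidentify_text_batch(chunk, batch_size=batch_size)


class EmailOnlyStub:
    """Hypothetical stand-in for PiiDeidentifier that only replaces email addresses."""

    def deidentify_text_batch(self, texts, batch_size=32):
        return [re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", t) for t in texts]
```

Swapping `EmailOnlyStub()` for a configured PiiDeidentifier instance should leave the helper unchanged, since it only calls the one documented method.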
- static from_config(
- config: collections.abc.Mapping[str, Any],
- static from_default_config() -> pii.algorithm.PiiDeidentifier
- static from_yaml_file(
- path: pathlib.Path | str,
- list_operators() -> dict[str, presidio_anonymizer.entities.OperatorConfig]
List all operators used to clean PII entities
- list_supported_entities() -> list[str]
List all entities that are detected while cleaning a text
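One possible use of `list_supported_entities()` is to check a requested entity list before running a long job. The sketch below is an assumption about usage, not part of the documented API: the helper name `unsupported_entities` and the `FixedEntitiesStub` class are hypothetical, and the entity names are examples only.

```python
from typing import Iterable, List


def unsupported_entities(deidentifier, requested: Iterable[str]) -> List[str]:
    """Return the requested entity names that the deidentifier does not report as supported."""
    supported = set(deidentifier.list_supported_entities())
    return [name for name in requested if name not in supported]


class FixedEntitiesStub:
    """Hypothetical stand-in for PiiDeidentifier with a fixed entity list."""

    def list_supported_entities(self):
        return ["PERSON", "EMAIL_ADDRESS"]
```

An empty result means every requested entity is reported as supported by the instance.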