filters.models.qe_models

Module Contents

Classes

COMETQEModel

Wrapper class for any COMET quality estimation model (https://github.com/Unbabel/COMET).

PyMarianQEModel

Wrapper class for Cometoid/PyMarian quality estimation models.

QEModel

Abstract model for all quality estimation models for bitext.

Data

API

class filters.models.qe_models.COMETQEModel(
name: str,
model: collections.abc.Callable,
gpu: bool = False,
)

Bases: filters.models.qe_models.QEModel

Wrapper class for any COMET quality estimation model (https://github.com/Unbabel/COMET).

Initialization

Args:
    name (str): The name of the model. It need not be a key of MODEL_NAME_TO_HF_PATH (defined in some subclasses), but using one is suggested.
    model: A loaded model object; its type depends on which model was loaded.
    gpu (bool, optional): Whether to run inference on GPU. Defaults to False.

MODEL_NAME_TO_HF_PATH: Final[dict[str, str]]

None

classmethod load_model(
model_name: str,
gpu: bool = False,
) -> filters.models.qe_models.COMETQEModel

See parent class docstring for details on functionality and arguments.

predict(input_list: list) -> list[float]

Implements quality estimation score prediction for COMET model.

Args: input_list (List): A list of bitext pairs wrapped as dictionaries.

Returns: List[float]: List of quality scores.

static wrap_qe_input(
src: str,
tgt: str,
reverse: bool = False,
) -> dict[str, str]

See parent class docstring for details on functionality and arguments.
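For illustration, the sketch below shows the kind of dictionary such a wrapper might produce. The `src`/`mt` keys follow the input convention of COMET's reference-free models; the function name `wrap_comet_input` and the way `reverse` is handled are assumptions, not the module's actual implementation.

```python
def wrap_comet_input(src: str, tgt: str, reverse: bool = False) -> dict[str, str]:
    """Hypothetical stand-in for COMETQEModel.wrap_qe_input."""
    # Assumption: reverse=True scores the pair in the opposite direction
    # by swapping which side is treated as the source.
    if reverse:
        src, tgt = tgt, src
    # COMET reference-free models read the source text from "src" and the
    # translation (hypothesis) from "mt".
    return {"src": src, "mt": tgt}


pair = wrap_comet_input("Guten Morgen", "Good morning")
reversed_pair = wrap_comet_input("Guten Morgen", "Good morning", reverse=True)
```

A list of such dictionaries would then match the `input_list` argument described for `predict`.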

filters.models.qe_models.COMET_IMPORT_MSG

‘To run QE filtering with COMET, you need to install from PyPI with: pip install unbabel-comet. Mor…’

filters.models.qe_models.PYMARIAN_IMPORT_MSG

‘To run QE filtering with Cometoid/PyMarian, you need to install PyMarian. More information at https:…’

class filters.models.qe_models.PyMarianQEModel(
name: str,
model: collections.abc.Callable,
gpu: bool = False,
)

Bases: filters.models.qe_models.QEModel

Wrapper class for Cometoid/PyMarian quality estimation models.

Initialization

Args:
    name (str): The name of the model. It need not be a key of MODEL_NAME_TO_HF_PATH (defined in some subclasses), but using one is suggested.
    model: A loaded model object; its type depends on which model was loaded.
    gpu (bool, optional): Whether to run inference on GPU. Defaults to False.

MARIAN_CPU_ARGS

‘ --cpu-threads 1 -w 2000’

MARIAN_GPU_ARGS

‘ -w 8000 --mini-batch 32 -d 0’
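Both constants begin with a space, which suggests they are appended to a base argument string. The sketch below shows one way the `gpu` flag could select between them; `build_marian_args` and the `"<base>"` placeholder are hypothetical and not taken from the module.

```python
MARIAN_CPU_ARGS = " --cpu-threads 1 -w 2000"
MARIAN_GPU_ARGS = " -w 8000 --mini-batch 32 -d 0"


def build_marian_args(base_args: str, gpu: bool = False) -> str:
    """Hypothetical helper: append the device-specific flags to base_args."""
    return base_args + (MARIAN_GPU_ARGS if gpu else MARIAN_CPU_ARGS)


# "<base>" stands in for whatever model-specific arguments come first.
cpu_args = build_marian_args("<base>")
gpu_args = build_marian_args("<base>", gpu=True)
```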

MODEL_NAME_TO_HF_PATH: Final[dict[str, str]]

None

SHARD_SIZE

5000

classmethod load_model(
model_name: str,
gpu: bool = False,
) -> filters.models.qe_models.PyMarianQEModel

See parent class docstring for details on functionality and arguments.

predict(input_list: list) -> list[float]

Implements quality estimation score prediction for Cometoid/PyMarian model.

Args: input_list (List): A list of bitext pairs, each wrapped by wrap_qe_input.

Returns: List[float]: List of quality scores.
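The `SHARD_SIZE` attribute suggests that large inputs are scored in fixed-size chunks. The sketch below illustrates that pattern under the assumption, which this page does not confirm, that `predict` splits `input_list` into shards of at most `SHARD_SIZE` items and concatenates the per-shard scores. `iter_shards`, `predict_in_shards`, and `score_shard` are hypothetical names.

```python
from collections.abc import Callable, Iterator

SHARD_SIZE = 5000


def iter_shards(items: list, shard_size: int = SHARD_SIZE) -> Iterator[list]:
    """Yield consecutive slices of at most shard_size items."""
    for start in range(0, len(items), shard_size):
        yield items[start:start + shard_size]


def predict_in_shards(
    input_list: list,
    score_shard: Callable[[list], list[float]],
    shard_size: int = SHARD_SIZE,
) -> list[float]:
    """Score each shard separately and concatenate results in input order."""
    scores: list[float] = []
    for shard in iter_shards(input_list, shard_size):
        scores.extend(score_shard(shard))
    return scores


# Toy scorer standing in for the real model: one constant score per pair.
pairs = [["src", "tgt"]] * 12001
shard_lengths = [len(s) for s in iter_shards(pairs)]
scores = predict_in_shards(pairs, lambda shard: [0.5] * len(shard))
```

Sharding of this kind bounds the memory used per model call while keeping the output aligned with the input order.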

static wrap_qe_input(src: str, tgt: str, reverse: bool = False) -> list[str]

See parent class docstring for details on functionality and arguments.

class filters.models.qe_models.QEModel(name: str, model: collections.abc.Callable, gpu: bool = False)

Bases: abc.ABC

Abstract model for all quality estimation models for bitext.

Initialization

Args:
    name (str): The name of the model. It need not be a key of MODEL_NAME_TO_HF_PATH (defined in some subclasses), but using one is suggested.
    model: A loaded model object; its type depends on which model was loaded.
    gpu (bool, optional): Whether to run inference on GPU. Defaults to False.

abstractmethod classmethod load_model(model_name: str) -> filters.models.qe_models.QEModel

An abstract method that loads the model according to a model name.

Args: model_name (str): The name of the model to load. Depending on the implementation, this may be a Hugging Face model name, a path, or another identifier.

abstractmethod predict(**kwargs) -> list[float]

An abstract method that calls the underlying model to produce estimated quality scores.

Returns: List[float]: List of quality scores.

abstractmethod static wrap_qe_input(src: str, tgt: str, reverse: bool = False) -> list[str]

An abstract method that wraps the source and target strings of a bitext pair in the input format accepted by the underlying model.

Args:
    src (str): Source-side string of the bitext.
    tgt (str): Target-side string of the bitext.
    reverse (bool, optional): Whether to swap the source and target sides of the bitext. Defaults to False.
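To make the abstract interface concrete, here is a minimal sketch of a subclass that implements all three abstract methods. It defines a local stand-in `QEModel` that mirrors the signatures documented above so the example runs without the real package; `LengthRatioQEModel` and its length-ratio scoring are purely illustrative and not part of the module.

```python
from abc import ABC, abstractmethod
from collections.abc import Callable


class QEModel(ABC):
    """Local stand-in mirroring the documented interface (not the real class)."""

    def __init__(self, name: str, model: Callable, gpu: bool = False) -> None:
        self.name = name
        self.model = model
        self.gpu = gpu

    @classmethod
    @abstractmethod
    def load_model(cls, model_name: str) -> "QEModel": ...

    @abstractmethod
    def predict(self, **kwargs) -> list[float]: ...

    @staticmethod
    @abstractmethod
    def wrap_qe_input(src: str, tgt: str, reverse: bool = False) -> list[str]: ...


class LengthRatioQEModel(QEModel):
    """Toy subclass: scores a pair by the ratio of shorter to longer side."""

    @classmethod
    def load_model(cls, model_name: str) -> "LengthRatioQEModel":
        # No weights to load; the "model" is a plain scoring function.
        def ratio(pair: list[str]) -> float:
            a, b = len(pair[0]), len(pair[1])
            return min(a, b) / max(a, b) if max(a, b) else 0.0

        return cls(name=model_name, model=ratio)

    def predict(self, input_list: list) -> list[float]:
        return [self.model(pair) for pair in input_list]

    @staticmethod
    def wrap_qe_input(src: str, tgt: str, reverse: bool = False) -> list[str]:
        return [tgt, src] if reverse else [src, tgt]


qe = LengthRatioQEModel.load_model("length-ratio")
batch = [qe.wrap_qe_input("abcd", "ab"), qe.wrap_qe_input("xyz", "xyz", reverse=True)]
scores = qe.predict(batch)
```

The flow is the same one the documented classes imply: load a model by name, wrap each source/target pair, then pass the wrapped list to `predict`.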

filters.models.qe_models.comet

‘safe_import(…)’

filters.models.qe_models.pymarian

‘safe_import(…)’