core.datasets.retro.config.bert_embedders
Container dataclass for holding both in-memory and on-disk Bert embedders.
Module Contents
Classes
Embedder | Base class for all Bert embedders.
RetroBertEmbedders | Container dataclass for in-memory and on-disk Bert embedders.
API
- class core.datasets.retro.config.bert_embedders.Embedder
Bases: abc.ABC
Base class for all Bert embedders.
All embedders should be able to embed either an entire text dataset (to a 2D numpy array), or a single text string (to a 1D numpy array).
- abstractmethod embed_text_dataset(text_dataset: torch.utils.data.Dataset) numpy.ndarray
Embed a text dataset.
- Parameters:
text_dataset (torch.utils.data.Dataset) – Text dataset to embed. Each sample of the text dataset should output a dict with a key ‘text’ and a string value.
- Returns:
A 2D ndarray with shape (len(text_dataset), dimension(embedder)).
- abstractmethod embed_text(text: str) numpy.ndarray
Embed a single string of text.
- Parameters:
text (str) – A single text sample.
- Returns:
A 1D ndarray with shape (dimension(embedder),).
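The two-method contract above can be illustrated with a toy subclass. This is a sketch only: the `Embedder` class below is a local stand-in that mirrors the documented abstract methods rather than an import from Megatron, `ToyHashEmbedder` is hypothetical (a bag-of-characters count, not a Bert model), and a plain Python list of dicts stands in for a map-style `torch.utils.data.Dataset`, since only `len()` and indexing are needed.

```python
import abc

import numpy as np


class Embedder(abc.ABC):
    """Local stand-in mirroring the documented interface (not imported from Megatron)."""

    @abc.abstractmethod
    def embed_text_dataset(self, text_dataset) -> np.ndarray:
        """Embed every sample's 'text' value; shape (len(text_dataset), dimension)."""

    @abc.abstractmethod
    def embed_text(self, text: str) -> np.ndarray:
        """Embed a single string; shape (dimension,)."""


class ToyHashEmbedder(Embedder):
    """Hypothetical embedder: character counts folded into `dim` buckets."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed_text(self, text: str) -> np.ndarray:
        vec = np.zeros(self.dim, dtype=np.float32)
        for ch in text:
            # ord() keeps the result deterministic, unlike Python's salted hash().
            vec[ord(ch) % self.dim] += 1.0
        return vec

    def embed_text_dataset(self, text_dataset) -> np.ndarray:
        # Each sample is expected to be a dict with a string under the 'text' key.
        rows = [self.embed_text(text_dataset[i]["text"]) for i in range(len(text_dataset))]
        # One row per sample; an empty dataset still yields a 2D array.
        return np.stack(rows) if rows else np.zeros((0, self.dim), dtype=np.float32)


embedder = ToyHashEmbedder(dim=8)
dataset = [{"text": "hello"}, {"text": "retro"}]
print(embedder.embed_text_dataset(dataset).shape)  # (2, 8)
print(embedder.embed_text("hello").shape)  # (8,)
```

Because both methods are abstract, instantiating the base class directly raises `TypeError`; a subclass becomes usable only once it implements both.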
- class core.datasets.retro.config.bert_embedders.RetroBertEmbedders
Container dataclass for in-memory and on-disk Bert embedders.
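A minimal sketch of what such a container might look like, under stated assumptions: this page does not list the dataclass fields, so the names `disk` and `mem` below are assumptions chosen to match the "on-disk" and "in-memory" wording, `ConstantEmbedder` is a hypothetical placeholder, and `Embedder` is again a local stand-in for the documented base class.

```python
import abc
from dataclasses import dataclass

import numpy as np


class Embedder(abc.ABC):
    """Minimal local stand-in for the documented abstract base class."""

    @abc.abstractmethod
    def embed_text_dataset(self, text_dataset) -> np.ndarray: ...

    @abc.abstractmethod
    def embed_text(self, text: str) -> np.ndarray: ...


class ConstantEmbedder(Embedder):
    """Hypothetical placeholder that maps every text to the same vector."""

    def __init__(self, dim: int, value: float):
        self.dim, self.value = dim, value

    def embed_text(self, text: str) -> np.ndarray:
        return np.full(self.dim, self.value, dtype=np.float32)

    def embed_text_dataset(self, text_dataset) -> np.ndarray:
        return np.full((len(text_dataset), self.dim), self.value, dtype=np.float32)


@dataclass
class RetroBertEmbedders:
    """Sketch of the container; the field names are assumptions, not confirmed API."""

    disk: Embedder  # assumed: the on-disk embedder
    mem: Embedder  # assumed: the in-memory embedder


embedders = RetroBertEmbedders(disk=ConstantEmbedder(4, 0.0), mem=ConstantEmbedder(4, 1.0))
print(embedders.mem.embed_text("query").shape)  # (4,)
```

Grouping both embedders in one dataclass lets configuration code pass a single object around; which embedder a caller should use for which workload is not specified on this page.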