core.datasets.retro.index.indexes.faiss_base
This class implements a simple, unoptimized wrapper around a Faiss index that implements the Index interface (see ..index.py). While this class is instantiable, it is meant to be extended with optimizations in subclasses (see FaissParallelAddIndex for an example).
Module Contents
Classes
FaissBaseIndex | Base class for Faiss-based indexes.
API
- class core.datasets.retro.index.indexes.faiss_base.FaissBaseIndex
Bases: megatron.core.datasets.retro.index.index.Index
Base class for Faiss-based indexes.
This class wraps a Faiss index and adds functionality for training the index and adding codes to it. This base class adds codes naively and sequentially, while the optimized FaissParallelAddIndex class parallelizes the index.add() step.
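To make the "naive sequential" adding concrete, here is a minimal sketch that embeds and adds one block at a time on a single process. It is not the class's actual code: `ToyIndex`, `embed`, `iter_blocks`, and `sequential_add` are hypothetical names, and `ToyIndex` is a pure-Python stand-in that only mimics the `add()` method and `ntotal` attribute of a Faiss index.

```python
from typing import Iterator, List, Sequence

Vector = List[float]


class ToyIndex:
    """Hypothetical stand-in for a Faiss index: exposes only add() and ntotal."""

    def __init__(self) -> None:
        self.vectors: List[Vector] = []

    @property
    def ntotal(self) -> int:
        # Number of vectors stored so far.
        return len(self.vectors)

    def add(self, block: Sequence[Vector]) -> None:
        # Append a block of vectors in the order given.
        self.vectors.extend(block)


def iter_blocks(n_samples: int, block_size: int) -> Iterator[range]:
    """Yield consecutive index ranges that cover [0, n_samples)."""
    for start in range(0, n_samples, block_size):
        yield range(start, min(start + block_size, n_samples))


def embed(text: str) -> Vector:
    """Hypothetical embedder: maps a string to a tiny 2-d vector."""
    return [float(len(text)), float(sum(map(ord, text)) % 7)]


def sequential_add(index: ToyIndex, texts: Sequence[str], block_size: int) -> int:
    """Embed and add one block at a time, in order, on a single process."""
    for block in iter_blocks(len(texts), block_size):
        index.add([embed(texts[i]) for i in block])
    return index.ntotal
```

Each block is embedded and added before the next one starts, so the loop is the bottleneck. A parallel variant would process blocks concurrently instead.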
- _train(config: megatron.core.datasets.retro.config.RetroPreprocessingConfig)
Train the index. This is the method that rank 0 executes.
- Parameters:
config (RetroPreprocessingConfig) – Retro preprocessing config.
- train(config: megatron.core.datasets.retro.config.RetroPreprocessingConfig)
Train index.
- Parameters:
config (RetroPreprocessingConfig) – Retro preprocessing config.
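One plausible reading of the `_train` / `train` split is that the public method dispatches to the private one on rank 0 only and then synchronizes all ranks. The sketch below illustrates that pattern under this assumption. `RankZeroTrainer`, its `rank` and `barrier` arguments, and the dict-based config are hypothetical; in a real distributed job the rank and barrier would typically come from a runtime such as `torch.distributed`.

```python
from typing import Callable, List


class RankZeroTrainer:
    """Hypothetical sketch of a train()/_train() split across ranks."""

    def __init__(self, rank: int, barrier: Callable[[], None]) -> None:
        self.rank = rank          # this process's rank (assumed supplied by the caller)
        self.barrier = barrier    # callable that blocks until all ranks reach it
        self.log: List[str] = []  # records the work this rank actually performed

    def _train(self, config: dict) -> None:
        # The actual training work; only rank 0 is expected to call this.
        self.log.append(f"trained:{config['name']}")

    def train(self, config: dict) -> None:
        # Rank 0 trains; every rank then waits at the barrier.
        if self.rank == 0:
            self._train(config)
        self.barrier()
```

With this layout, callers on every rank invoke `train()` uniformly, and only the rank-0 process does the expensive work.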
- _add(config: megatron.core.datasets.retro.config.RetroPreprocessingConfig, text_dataset: megatron.core.datasets.retro.utils.GPTToTextDataset)
Add to the index. This is the method that rank 0 executes.
- Parameters:
config (RetroPreprocessingConfig) – Retro preprocessing config.
text_dataset (GPTToTextDataset) – Text dataset that will be embedded and added to the index.
- add(config: megatron.core.datasets.retro.config.RetroPreprocessingConfig, text_dataset: megatron.core.datasets.retro.utils.GPTToTextDataset)
Add to index.
- Parameters:
config (RetroPreprocessingConfig) – Retro preprocessing config.
text_dataset (GPTToTextDataset) – Text dataset that will be embedded and added to the index.
- Returns:
File path to the populated index.