core.datasets.retro.index.indexes.faiss_base

This class implements a simple, unoptimized wrapper around a Faiss index that implements the Index interface (see ..index.py). Although this class is instantiable, it is meant to be extended with optimizations in subclasses (see FaissParallelAddIndex for an example).

Module Contents

Classes

FaissBaseIndex

Base class for Faiss-based indexes.

API

class core.datasets.retro.index.indexes.faiss_base.FaissBaseIndex

Bases: megatron.core.datasets.retro.index.index.Index

Base class for Faiss-based indexes.

This class wraps a Faiss index and adds functionality for training the index and adding codes to it. This base class adds codes naively and sequentially, while the optimized FaissParallelAddIndex class performs a parallel index.add().
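To make "naive sequential code adding" concrete, here is a minimal pure-Python sketch. It is a toy stand-in, not the Megatron or Faiss implementation: `ToyIndex`, `iter_blocks`, `add_sequential`, and `block_size` are hypothetical names introduced only for illustration. The sketch assumes that sequential adding means feeding blocks of vectors to the index one after another on a single process.

```python
from typing import Iterator, List, Sequence

Vector = Sequence[float]


class ToyIndex:
    """Hypothetical stand-in for a wrapped index; stores vectors in a list."""

    def __init__(self) -> None:
        self.vectors: List[Vector] = []
        self.add_calls = 0  # number of add() calls, kept for illustration

    def add(self, block: Sequence[Vector]) -> None:
        # Append one block of vectors, much as an index's add() would.
        self.vectors.extend(block)
        self.add_calls += 1

    @property
    def ntotal(self) -> int:
        # Total number of vectors stored so far.
        return len(self.vectors)


def iter_blocks(data: Sequence[Vector], block_size: int) -> Iterator[Sequence[Vector]]:
    """Yield consecutive slices of at most block_size vectors."""
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size]


def add_sequential(index: ToyIndex, data: Sequence[Vector], block_size: int) -> None:
    """Add blocks one at a time, in order, on a single process."""
    for block in iter_blocks(data, block_size):
        index.add(block)


data = [[float(i), float(i + 1)] for i in range(10)]
index = ToyIndex()
add_sequential(index, data, block_size=4)
```

A parallel variant would presumably process blocks concurrently before merging them into the index; that part is outside the scope of this sketch.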

_train(
config: megatron.core.datasets.retro.config.RetroPreprocessingConfig,
) → None

Train index (rank 0’s method).

Parameters:

config (RetroPreprocessingConfig) – Retro preprocessing config.

train(
config: megatron.core.datasets.retro.config.RetroPreprocessingConfig,
) → None

Train index.

Parameters:

config (RetroPreprocessingConfig) – Retro preprocessing config.
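The "(rank 0’s method)" note on `_train` suggests a common distributed pattern: the public method is called on every rank and delegates the actual work to the private method on rank 0 only. The sketch below illustrates that pattern in plain Python under this assumption. `ToyDistributedIndex` and the explicit `rank` argument are hypothetical and stand in for a real distributed runtime, so this should not be read as the library's code.

```python
from typing import List


class ToyDistributedIndex:
    """Hypothetical illustration of a public/private rank-0 split."""

    def __init__(self) -> None:
        self.trained_by: List[int] = []  # ranks that actually ran training

    def _train(self, rank: int) -> None:
        # The expensive work; intended to run on a single rank.
        self.trained_by.append(rank)

    def train(self, rank: int) -> None:
        # Called on every rank; only rank 0 does the work.
        if rank == 0:
            self._train(rank)
        # A real distributed setup would synchronize the ranks here
        # (for example with a barrier); omitted in this sketch.


toy = ToyDistributedIndex()
for rank in range(4):  # simulate four ranks calling train()
    toy.train(rank)
```

In this toy run only rank 0 records any work, while the other ranks return immediately.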

_add(
config: megatron.core.datasets.retro.config.RetroPreprocessingConfig,
text_dataset: megatron.core.datasets.retro.utils.GPTToTextDataset,
) → None

Add to index (rank 0’s method).

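Parameters:

config (RetroPreprocessingConfig) – Retro preprocessing config.

text_dataset (GPTToTextDataset) – Text dataset to add to the index.
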
add(
config: megatron.core.datasets.retro.config.RetroPreprocessingConfig,
text_dataset: megatron.core.datasets.retro.utils.GPTToTextDataset,
) → str

Add to index.

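Parameters:

config (RetroPreprocessingConfig) – Retro preprocessing config.

text_dataset (GPTToTextDataset) – Text dataset to add to the index.
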
Returns:

File path to the populated index.