core.models.retro.config#

Configuration dataclass for a RetroModel.

Module Contents#

Classes#

RetroConfig

Configuration object for Retro models.

API#

class core.models.retro.config.RetroConfig#

Bases: megatron.core.transformer.TransformerConfig

Configuration object for Retro models.

retro_project_dir: str#

None

Retro project directory, which contains the preprocessed data for pretraining. This directory is built during preprocessing (see tools/retro/README.md) and contains subdirectories for the chunk database and pretraining neighbors.

retro_block_size: int#

None

Number of records to load per data file, as saved during preprocessing. Block processing is used for efficient data preprocessing.
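As a rough illustration of block processing, the sketch below splits a record range into half-open blocks of at most `block_size` records. The helper name `block_ranges` and the record counts are hypothetical and are not part of Megatron; the actual preprocessing code may partition data differently.

```python
from typing import List, Tuple


def block_ranges(num_records: int, block_size: int) -> List[Tuple[int, int]]:
    """Split [0, num_records) into half-open (start, end) ranges.

    Hypothetical helper for illustration only. Each range covers at most
    block_size records; the last range may be shorter.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        (start, min(start + block_size, num_records))
        for start in range(0, num_records, block_size)
    ]


# Illustrative numbers, not documented defaults.
ranges = block_ranges(num_records=250_000, block_size=100_000)
```

With these illustrative values, the final block holds the remaining 50,000 records.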

retro_chunk_length: int#

None

Chunk length used for performing chunked cross-attention (CCA).
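To make the role of the chunk length concrete, the following sketch splits a token sequence into fixed-size chunks. The function `split_into_chunks` is hypothetical, and the requirement that the sequence length be divisible by the chunk length is an assumption made for this example, not something this page states.

```python
from typing import List, Sequence


def split_into_chunks(token_ids: Sequence[int], chunk_length: int) -> List[List[int]]:
    """Split token_ids into consecutive chunks of exactly chunk_length tokens.

    Hypothetical helper; assumes len(token_ids) is a multiple of chunk_length.
    """
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    if len(token_ids) % chunk_length != 0:
        raise ValueError("sequence length must be divisible by chunk_length")
    return [
        list(token_ids[i : i + chunk_length])
        for i in range(0, len(token_ids), chunk_length)
    ]


# Toy values chosen for readability.
chunks = split_into_chunks(list(range(12)), chunk_length=4)
```

Here a 12-token sequence yields three chunks of four tokens each.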

retro_encoder_num_layers: int#

2

Number of layers to use for the retrieval encoder.

retro_encoder_hidden_dropout: float#

0.1

Hidden dropout for retrieval encoder.

retro_encoder_attention_dropout: float#

0.1

Attention dropout for retrieval encoder.

retro_neighbor_dirs: dict#

None

Directory names of saved neighbor id files for train, valid, and test datasets.

retro_num_neighbors: int#

2

Number of neighbors to retrieve during pretraining.

retro_num_retrieved_chunks: int#

2

Number of chunks to retrieve from the retrieval database.

retro_retrieved_length: int#

None

Cached value of retro_num_retrieved_chunks * retro_chunk_length (i.e., the total number of retrieved tokens; neighbor + continuation).
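A small worked example of this product, using toy values: with the default of two retrieved chunks (a neighbor chunk followed by its continuation chunk), the retrieved length is twice the chunk length. The token ids and the chunk length of 4 below are made up for illustration.

```python
# Toy value for readability; retro_chunk_length has no documented default.
retro_chunk_length = 4
# Documented default: one neighbor chunk plus one continuation chunk.
retro_num_retrieved_chunks = 2

neighbor_chunk = [101, 102, 103, 104]      # made-up token ids
continuation_chunk = [105, 106, 107, 108]  # made-up token ids
retrieved_tokens = neighbor_chunk + continuation_chunk

# The cached value described above.
retro_retrieved_length = retro_num_retrieved_chunks * retro_chunk_length
```

The concatenated neighbor and continuation tokens have exactly `retro_retrieved_length` entries.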

retro_split_preprocessing: str#

None

Data split used during data preprocessing.

retro_verify_neighbor_count: bool#

True

Verify that len(GPT dataset) == len(saved neighbors).
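The check described above might look roughly like the sketch below. The function name, its arguments, and the choice of exception are assumptions for illustration; the actual check in Megatron may be implemented differently.

```python
def verify_neighbor_count(
    num_dataset_samples: int,
    num_saved_neighbors: int,
    verify: bool = True,
) -> None:
    """Raise if the dataset and the saved neighbors disagree in length.

    Hypothetical helper. When verify is False the check is skipped,
    mirroring retro_verify_neighbor_count=False.
    """
    if verify and num_dataset_samples != num_saved_neighbors:
        raise ValueError(
            f"dataset has {num_dataset_samples} samples but "
            f"{num_saved_neighbors} neighbor entries were saved"
        )


# Matching lengths pass silently.
verify_neighbor_count(num_dataset_samples=1000, num_saved_neighbors=1000)
```

Disabling the flag skips the comparison entirely, so a mismatch would go unreported.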

__post_init__() → None#

Validate Retro config.
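This page does not list which checks are performed. The sketch below shows one plausible shape for such a validation step on a standalone dataclass: it rejects missing required fields and caches the retrieved length. The class `RetroConfigSketch`, the specific checks, and the example split string are all assumptions; it does not subclass or reproduce the real `RetroConfig`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetroConfigSketch:
    """Hypothetical stand-in for a few RetroConfig fields."""

    retro_chunk_length: Optional[int] = None
    retro_num_retrieved_chunks: int = 2
    retro_split_preprocessing: Optional[str] = None
    retro_retrieved_length: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed checks: fields that default to None must be set explicitly.
        if self.retro_chunk_length is None:
            raise ValueError("retro_chunk_length must be set")
        if self.retro_split_preprocessing is None:
            raise ValueError("retro_split_preprocessing must be set")
        # Cache the derived value so callers need not recompute it.
        self.retro_retrieved_length = (
            self.retro_num_retrieved_chunks * self.retro_chunk_length
        )


# Illustrative values only; "98,2,0" is an example split string.
cfg = RetroConfigSketch(retro_chunk_length=64, retro_split_preprocessing="98,2,0")
```

Running validation in `__post_init__` means an invalid configuration fails at construction time rather than later during training.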