core.models.retro.config#
Configuration dataclass for a RetroModel.
Module Contents#
Classes#
RetroConfig | Configuration object for Retro models.
API#
- class core.models.retro.config.RetroConfig#
Bases: megatron.core.transformer.TransformerConfig
Configuration object for Retro models.
- retro_project_dir: str#
None
Retro project directory, which contains the preprocessed data for pretraining. This directory is built during preprocessing (see tools/retro/README.md) and contains subdirectories for the chunk database and pretraining neighbors.
- retro_block_size: int#
None
Number of records to load per data file, as saved during preprocessing. Block processing is used for efficient data preprocessing.
- retro_chunk_length: int#
None
Chunk length used for performing chunked cross-attention (CCA).
- retro_encoder_num_layers: int#
2
Number of layers to use for the retrieval encoder.
- retro_encoder_hidden_dropout: float#
0.1
Hidden dropout for retrieval encoder.
- retro_encoder_attention_dropout: float#
0.1
Attention dropout for retrieval encoder.
- retro_neighbor_dirs: dict#
None
Directory names of saved neighbor id files for train, valid, and test datasets.
- retro_num_neighbors: int#
2
Number of neighbors to retrieve during pretraining.
- retro_num_retrieved_chunks: int#
2
Number of chunks to retrieve from the retrieval database.
- retro_retrieved_length: int#
None
Cached value of retro_num_retrieved_chunks * retro_chunk_length (i.e., the total number of retrieved tokens; neighbor + continuation).
- retro_split_preprocessing: str#
None
Data split used during data preprocessing.
- retro_verify_neighbor_count: bool#
True
Verify that len(GPT dataset) == len(saved neighbors).
- __post_init__() -> None#
Validate Retro config.
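
The relationship between retro_chunk_length, retro_num_retrieved_chunks, and retro_retrieved_length can be illustrated with a small stand-in dataclass. This is a minimal sketch, not the Megatron implementation: RetroConfigSketch is a hypothetical class that copies only three of the documented fields, and the checks in its __post_init__ are assumptions, since this page does not say what the real validation covers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetroConfigSketch:
    """Hypothetical stand-in mirroring three documented RetroConfig fields."""

    retro_chunk_length: Optional[int] = None
    retro_num_retrieved_chunks: int = 2
    retro_retrieved_length: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed checks for illustration only; the real validation
        # performed by RetroConfig is not described on this page.
        if self.retro_chunk_length is None or self.retro_chunk_length <= 0:
            raise ValueError("retro_chunk_length must be a positive integer")
        if self.retro_num_retrieved_chunks <= 0:
            raise ValueError("retro_num_retrieved_chunks must be positive")

        # Documented relationship: the cached retrieved length is
        # retro_num_retrieved_chunks * retro_chunk_length.
        self.retro_retrieved_length = (
            self.retro_num_retrieved_chunks * self.retro_chunk_length
        )


# Illustrative values: a chunk length of 64 with the default of 2 retrieved chunks.
cfg = RetroConfigSketch(retro_chunk_length=64)
print(cfg.retro_retrieved_length)  # 2 * 64 = 128
```

Deriving the cached value in __post_init__ keeps it consistent with the two fields it depends on, so callers do not have to set retro_retrieved_length by hand.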