core.models.retro.decoder_spec
Specs for Retro decoder.
Module Contents
Functions
get_retro_decoder_layer_te_spec | Retro decoder TE spec (uses Transformer Engine components).
get_retro_decoder_layer_local_spec | Retro decoder local spec (uses Megatron-Core components).
get_retro_decoder_block_spec | Retro decoder block spec.
API
- core.models.retro.decoder_spec.get_retro_decoder_layer_te_spec(
- encoder_block_spec: Union[megatron.core.transformer.ModuleSpec, megatron.core.transformer.transformer_block.TransformerBlockSubmodules, None] = None,
- )
Retro decoder TE spec (uses Transformer Engine components).
A Retro decoder layer uses custom attention and bias-dropout-add operators to perform chunked cross-attention. Additionally, the first Retro decoder layer instantiates an entire encoder transformer block. For this reason, the decoder cross-attention module takes an optional encoder block spec, which is provided only for the first Retro decoder layer.
- Parameters:
encoder_block_spec (ModuleSpec) – Retro encoder block spec, to be provided for the first Retro decoder layer.
- Returns:
A module spec with Transformer Engine modules.
- core.models.retro.decoder_spec.get_retro_decoder_layer_local_spec(
- encoder_block_spec: Optional[megatron.core.transformer.ModuleSpec] = None,
- )
Retro decoder local spec (uses Megatron-Core components).
A Retro decoder layer uses custom attention and bias-dropout-add operators to perform chunked cross-attention. Additionally, the first Retro decoder layer instantiates an entire encoder transformer block. For this reason, the decoder cross-attention module takes an optional encoder block spec, which is provided only for the first Retro decoder layer.
- Parameters:
encoder_block_spec (ModuleSpec) – Retro encoder block spec, to be provided for the first Retro decoder layer.
- Returns:
A module spec with local modules.
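The optional-argument pattern described above can be illustrated with a toy sketch. Everything here is hypothetical: `ToySpec`, `toy_decoder_layer_spec`, and the module names are stand-ins invented for illustration, not Megatron-Core classes or functions. The sketch only shows how an encoder block spec might be threaded into the cross-attention parameters of the first decoder layer and left as `None` for later ones.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToySpec:
    """Hypothetical stand-in for a module spec: a module name plus params."""

    module: str
    params: Dict[str, Any] = field(default_factory=dict)


def toy_decoder_layer_spec(encoder_block_spec: Optional[ToySpec] = None) -> ToySpec:
    """Build a toy decoder layer spec.

    The cross-attention sub-spec always carries an `encoder_block_spec`
    parameter; it is None unless the caller supplies one.
    """
    cross_attention = ToySpec(
        module="ToyChunkedCrossAttention",
        params={"encoder_block_spec": encoder_block_spec},
    )
    return ToySpec(module="ToyDecoderLayer", params={"cross_attention": cross_attention})


# First Retro decoder layer: pass the encoder block spec.
first_layer = toy_decoder_layer_spec(ToySpec(module="ToyEncoderBlock"))
# Later Retro decoder layers: omit it.
later_layer = toy_decoder_layer_spec()
```

Keeping the argument optional lets one builder serve every Retro decoder layer, with the caller deciding which layer owns the encoder.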
- core.models.retro.decoder_spec.get_retro_decoder_block_spec(
- config: megatron.core.models.retro.config.RetroConfig,
- use_transformer_engine: bool,
- vp_stage: Optional[int] = None,
- pp_rank: Optional[int] = None,
- )
Retro decoder block spec.
Retro decoder block implementation details:
The retro decoder block consists of interleaved GPT layers and customized Retro decoder layers.
The Retro decoder layers are spaced three layers apart, and start on layer 6 or 9 (depending on the total number of layers).
The first Retro decoder layer instantiates an encoder block, so an encoder_block_spec is passed only to that layer.
- Parameters:
config (RetroConfig) – Retro config.
use_transformer_engine (bool) – If True, use Transformer Engine (instead of local modules).
vp_stage (Optional[int]) – Virtual pipeline stage number.
pp_rank (Optional[int]) – Pipeline parallel rank.
- Returns:
Transformer block submodules for the given spec.
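The layer layout described for the decoder block can be sketched in plain Python. This is a minimal illustration, not the Megatron-Core implementation: the helper name `retro_layer_layout` is hypothetical, and the cutoff of 15 layers used to choose between a start of 6 and 9 is an assumption, since the description only states that the start depends on the total number of layers.

```python
from typing import List


def retro_layer_layout(num_layers: int, small_model_cutoff: int = 15) -> List[str]:
    """Label each 1-indexed layer as 'gpt', 'retro+encoder', or 'retro'.

    ASSUMPTION: models with at most `small_model_cutoff` layers start their
    Retro layers at layer 6, larger models at layer 9. The cutoff value is
    illustrative and not taken from the documentation above.
    """
    start = 6 if num_layers <= small_model_cutoff else 9
    # Retro decoder layers are spaced three layers apart.
    retro_layers = list(range(start, num_layers + 1, 3))

    layout = []
    for layer_number in range(1, num_layers + 1):
        if retro_layers and layer_number == retro_layers[0]:
            # Only the first Retro decoder layer receives an encoder block spec.
            layout.append("retro+encoder")
        elif layer_number in retro_layers:
            layout.append("retro")
        else:
            layout.append("gpt")
    return layout


if __name__ == "__main__":
    for number, kind in enumerate(retro_layer_layout(12), start=1):
        print(number, kind)
```

Under these assumptions, a 12-layer model places Retro decoder layers at layers 6, 9, and 12, with the encoder attached at layer 6; all remaining layers are ordinary GPT layers.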