bridge.models.llama.llama_provider#

Module Contents#

Classes#

LlamaModelProvider

Configuration class for Llama models.

Llama2ModelProvider7B

Configuration for a 7B parameter Llama 2 model.

Llama2ModelProvider13B

Configuration for a 13B parameter Llama 2 model.

Llama2ModelProvider70B

Configuration for a 70B parameter Llama 2 model.

Llama3ModelProvider

Configuration for Llama 3 models.

Llama31ModelProvider

Configuration for Llama 3.1 models.

Llama3ModelProvider8B

Configuration for an 8B parameter Llama 3 model.

Llama3ModelProvider70B

Configuration for a 70B parameter Llama 3 model.

Llama31ModelProvider8B

Configuration for an 8B parameter Llama 3.1 model.

Llama31ModelProvider70B

Configuration for a 70B parameter Llama 3.1 model.

Llama31ModelProvider405B

Configuration for a 405B parameter Llama 3.1 model.

Llama32ModelProvider1B

Configuration for a 1B parameter Llama 3.2 model.

Llama32ModelProvider3B

Configuration for a 3B parameter Llama 3.2 model.

CodeLlamaModelProvider7B

Configuration for a 7B parameter CodeLlama model.

CodeLlamaModelProvider13B

Configuration for a 13B parameter CodeLlama model.

CodeLlamaModelProvider34B

Configuration for a 34B parameter CodeLlama model.

CodeLlamaModelProvider70B

Configuration for a 70B parameter CodeLlama model.

Llama4ModelProvider

Configuration for the Llama 4 language model.

Llama4Experts16ModelProvider

Configuration for the Llama 4 model with 16 experts.

Llama4Experts128ModelProvider

Configuration for the Llama 4 model with 128 experts.

Functions#

apply_rope_scaling

Apply RoPE scaling for extending context length in Llama models.

Data#

logger

API#

bridge.models.llama.llama_provider.logger#

‘getLogger(…)’

class bridge.models.llama.llama_provider.LlamaModelProvider#

Bases: megatron.bridge.models.gpt_provider.GPTModelProvider

Configuration class for Llama models.

Extends GPTModelProvider with settings optimized for Llama architectures, including the normalization layer, activation function, and various architecture-specific options.

normalization: str#

‘RMSNorm’

activation_func: Callable#

None

gated_linear_unit: bool#

True

position_embedding_type: str#

‘rope’

add_bias_linear: bool#

False

seq_length: int#

4096

attention_dropout: float#

0.0

hidden_dropout: float#

0.0

share_embeddings_and_output_weights: bool#

False

bias_activation_fusion: bool#

True

masked_softmax_fusion: bool#

‘field(…)’

bias_dropout_fusion: bool#

‘field(…)’

apply_rope_fusion: bool#

‘field(…)’

use_transformer_engine_op_fuser: Optional[bool]#

None
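As a minimal usage sketch, a custom configuration might be built directly from this class. The import path (mirroring the `megatron.bridge.models.gpt_provider` base-class path above) and the set of required constructor arguments are assumptions, not verified against the installed package:

```python
# Hypothetical sketch; import path and required fields are assumptions.
from megatron.bridge.models.llama.llama_provider import LlamaModelProvider

provider = LlamaModelProvider(
    num_layers=16,            # assumed-required architecture fields
    hidden_size=2048,
    num_attention_heads=16,
    seq_length=4096,          # matches the default documented above
)

# The Llama-specific defaults documented above are already in place.
assert provider.normalization == "RMSNorm"
assert provider.position_embedding_type == "rope"
assert provider.add_bias_linear is False
```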

class bridge.models.llama.llama_provider.Llama2ModelProvider7B#

Bases: bridge.models.llama.llama_provider.LlamaModelProvider

Configuration for a 7B parameter Llama 2 model.

Specific configuration for the 7B Llama 2 model with 32 layers, 4096 hidden size, and 32 attention heads.

num_layers: int#

32

hidden_size: int#

4096

num_attention_heads: int#

32

num_query_groups: int#

32

ffn_hidden_size: int#

11008

class bridge.models.llama.llama_provider.Llama2ModelProvider13B#

Bases: bridge.models.llama.llama_provider.LlamaModelProvider

Configuration for a 13B parameter Llama 2 model.

Specific configuration for the 13B Llama 2 model with 40 layers, 5120 hidden size, and 40 attention heads.

num_layers: int#

40

hidden_size: int#

5120

num_attention_heads: int#

40

num_query_groups: int#

40

ffn_hidden_size: int#

13824

class bridge.models.llama.llama_provider.Llama2ModelProvider70B#

Bases: bridge.models.llama.llama_provider.LlamaModelProvider

Configuration for a 70B parameter Llama 2 model.

Specific configuration for the 70B Llama 2 model with 80 layers, 8192 hidden size, and 64 attention heads with 8 query groups.

num_layers: int#

80

hidden_size: int#

8192

num_attention_heads: int#

64

num_query_groups: int#

8

ffn_hidden_size: int#

28672
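The size-specific subclasses above pin the architecture fields, so in practice one usually picks a preset and overrides only run-specific settings. A hedged sketch, under the same import-path assumption as above:

```python
from megatron.bridge.models.llama.llama_provider import Llama2ModelProvider7B

# Start from the 7B preset and shorten the sequence length for a
# memory-constrained run; the architecture fields keep their documented defaults.
provider = Llama2ModelProvider7B(seq_length=2048)

assert provider.num_layers == 32
assert provider.ffn_hidden_size == 11008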

class bridge.models.llama.llama_provider.Llama3ModelProvider#

Bases: bridge.models.llama.llama_provider.LlamaModelProvider

Configuration for Llama 3 models.

Base configuration for the Llama 3 architecture with settings shared across model sizes, including grouped-query attention (GQA) and other architecture-specific options.

num_query_groups: int#

8

hidden_dropout: float#

0.0

attention_dropout: float#

0.0

normalization: str#

‘RMSNorm’

init_method_std: float#

0.01

layernorm_epsilon: float#

1e-05

add_bias_linear: bool#

False

activation_func: Callable#

None

gated_linear_unit: bool#

True

bias_activation_fusion: bool#

True

masked_softmax_fusion: bool#

‘field(…)’

bias_dropout_fusion: bool#

‘field(…)’

apply_rope_fusion: bool#

‘field(…)’

share_embeddings_and_output_weights: bool#

False

position_embedding_type: str#

‘rope’

rotary_percent: float#

1.0
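Note that `num_query_groups = 8`, combined with the head counts of the size-specific subclasses below, determines the grouped-query attention layout. For illustration only:

```python
# Grouped-query attention layout implied by the documented defaults.
num_attention_heads = 32   # e.g. Llama3ModelProvider8B below
num_query_groups = 8       # inherited from Llama3ModelProvider

query_heads_per_kv_head = num_attention_heads // num_query_groups
print(query_heads_per_kv_head)  # 4 query heads share each KV head (8 for the 70B preset)
```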

class bridge.models.llama.llama_provider.Llama31ModelProvider#

Bases: bridge.models.llama.llama_provider.Llama3ModelProvider

Configuration for Llama 3.1 models.

Extends Llama3ModelProvider with specific settings for Llama 3.1 models, including RoPE scaling parameters.

scale_factor: float#

8.0

low_freq_factor: float#

1.0

high_freq_factor: float#

4.0

old_context_len: int#

8192

init_method_std: float#

0.02

provide(
pre_process=None,
post_process=None,
vp_stage=None,
tokenizer=None,
) → megatron.core.models.gpt.GPTModel#

Configure and instantiate a Megatron Core Llama 3.1 model.

Extends the base configuration with Llama 3.1 specific RoPE scaling.

Parameters:
  • pre_process – Whether to include pre-processing in the model

  • post_process – Whether to include post-processing in the model

  • vp_stage – Virtual pipeline stage

  • tokenizer – Tokenizer used with the model

Returns:

Configured Megatron Core GPT model instance

Return type:

MCoreGPTModel
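A hedged usage sketch for `provide`; it assumes Megatron Core's model-parallel state has already been initialized and that any vocabulary or tokenizer settings the provider needs have been configured elsewhere, neither of which is covered by this reference:

```python
from megatron.bridge.models.llama.llama_provider import Llama31ModelProvider8B

# Assumes megatron.core parallel state is initialized and vocabulary/tokenizer
# settings have already been applied to the provider.
provider = Llama31ModelProvider8B()

# pre_process/post_process control whether the embedding and output head are
# built on this pipeline stage; True/True suits a single-stage setup.
model = provider.provide(pre_process=True, post_process=True)
```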

class bridge.models.llama.llama_provider.Llama3ModelProvider8B#

Bases: bridge.models.llama.llama_provider.Llama3ModelProvider

Configuration for an 8B parameter Llama 3 model.

Specific configuration for the 8B Llama 3 model with 32 layers, 4096 hidden size, and 32 attention heads.

rotary_base: int#

500000

seq_length: int#

8192

num_layers: int#

32

hidden_size: int#

4096

ffn_hidden_size: int#

14336

num_attention_heads: int#

32

class bridge.models.llama.llama_provider.Llama3ModelProvider70B#

Bases: bridge.models.llama.llama_provider.Llama3ModelProvider

Configuration for a 70B parameter Llama 3 model.

Specific configuration for the 70B Llama 3 model with 80 layers, 8192 hidden size, and 64 attention heads.

rotary_base: int#

500000

seq_length: int#

8192

num_layers: int#

80

hidden_size: int#

8192

ffn_hidden_size: int#

28672

num_attention_heads: int#

64

init_method_std: float#

0.008944

make_vocab_size_divisible_by: int#

128

class bridge.models.llama.llama_provider.Llama31ModelProvider8B#

Bases: bridge.models.llama.llama_provider.Llama31ModelProvider

Configuration for an 8B parameter Llama 3.1 model.

Specific configuration for the 8B Llama 3.1 model with 32 layers, 4096 hidden size, and 32 attention heads, supporting an extended context length of 131,072 (128K) tokens.

rotary_base: int#

500000

seq_length: int#

131072

num_layers: int#

32

hidden_size: int#

4096

ffn_hidden_size: int#

14336

num_attention_heads: int#

32

class bridge.models.llama.llama_provider.Llama31ModelProvider70B#

Bases: bridge.models.llama.llama_provider.Llama31ModelProvider

Configuration for a 70B parameter Llama 3.1 model.

Specific configuration for the 70B Llama 3.1 model with 80 layers, 8192 hidden size, and 64 attention heads, supporting an extended context length of 131,072 (128K) tokens.

rotary_base: int#

500000

seq_length: int#

131072

num_layers: int#

80

hidden_size: int#

8192

ffn_hidden_size: int#

28672

num_attention_heads: int#

64

make_vocab_size_divisible_by: int#

128

class bridge.models.llama.llama_provider.Llama31ModelProvider405B#

Bases: bridge.models.llama.llama_provider.Llama31ModelProvider

Configuration for a 405B parameter Llama 3.1 model.

Specific configuration for the 405B Llama 3.1 model with 126 layers, 16384 hidden size, and 128 attention heads, supporting an extended context length of 131,072 (128K) tokens.

rotary_base: int#

500000

seq_length: int#

131072

num_layers: int#

126

hidden_size: int#

16384

ffn_hidden_size: int#

53248

num_attention_heads: int#

128

make_vocab_size_divisible_by: int#

128

class bridge.models.llama.llama_provider.Llama32ModelProvider1B#

Bases: bridge.models.llama.llama_provider.Llama31ModelProvider

Configuration for a 1B parameter Llama 3.2 model.

Specific configuration for the 1B Llama 3.2 model with 16 layers, 2048 hidden size, and 32 attention heads (8 query groups).

scale_factor: float#

32.0

share_embeddings_and_output_weights: bool#

True

rotary_base: int#

500000

num_layers: int#

16

hidden_size: int#

2048

ffn_hidden_size: int#

8192

num_attention_heads: int#

32

num_query_groups: int#

8

make_vocab_size_divisible_by: int#

128

class bridge.models.llama.llama_provider.Llama32ModelProvider3B#

Bases: bridge.models.llama.llama_provider.Llama31ModelProvider

Configuration for a 3B parameter Llama 3.2 model.

Specific configuration for the 3B Llama 3.2 model with 28 layers, 3072 hidden size, and 24 attention heads (8 query groups).

scale_factor: int#

32

share_embeddings_and_output_weights: bool#

True

rotary_base: int#

500000

num_layers: int#

28

hidden_size: int#

3072

ffn_hidden_size: int#

8192

num_attention_heads: int#

24

num_query_groups: int#

8

make_vocab_size_divisible_by: int#

128
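The Llama 3.2 presets differ from the larger 3.1 models mainly in their tied embeddings and larger RoPE scale factor, which can be checked directly (same import-path assumption as the sketches above):

```python
from megatron.bridge.models.llama.llama_provider import Llama32ModelProvider1B

provider = Llama32ModelProvider1B()

assert provider.share_embeddings_and_output_weights is True  # tied input/output embeddings
assert provider.scale_factor == 32.0                         # vs. 8.0 in Llama31ModelProvider
```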

class bridge.models.llama.llama_provider.CodeLlamaModelProvider7B#

Bases: bridge.models.llama.llama_provider.Llama2ModelProvider7B

Configuration for a 7B parameter CodeLlama model.

Extends Llama2ModelProvider7B with settings tailored to code generation, including a longer context length and a larger rotary base.

rotary_base: int#

1000000

seq_length: int#

16384

class bridge.models.llama.llama_provider.CodeLlamaModelProvider13B#

Bases: bridge.models.llama.llama_provider.Llama2ModelProvider13B

Configuration for a 13B parameter CodeLlama model.

Extends Llama2ModelProvider13B with settings tailored to code generation, including a longer context length and a larger rotary base.

rotary_base: int#

1000000

seq_length: int#

16384

class bridge.models.llama.llama_provider.CodeLlamaModelProvider34B#

Bases: bridge.models.llama.llama_provider.LlamaModelProvider

Configuration for a 34B parameter CodeLlama model.

Specific configuration for the 34B CodeLlama model with 48 layers, 8192 hidden size, and 64 attention heads (8 query groups).

num_layers: int#

48

hidden_size: int#

8192

num_attention_heads: int#

64

num_query_groups: int#

8

ffn_hidden_size: int#

22016

rotary_base: int#

1000000

seq_length: int#

16384
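The CodeLlama variants keep the corresponding Llama 2 architecture but trade the 4096-token context for a 16384-token one with a larger rotary base; a quick comparison, under the same assumptions as the sketches above:

```python
from megatron.bridge.models.llama.llama_provider import (
    CodeLlamaModelProvider7B,
    Llama2ModelProvider7B,
)

code = CodeLlamaModelProvider7B()
base = Llama2ModelProvider7B()

assert code.num_layers == base.num_layers                     # same 7B architecture
assert code.seq_length == 16384 and base.seq_length == 4096   # extended context
assert code.rotary_base == 1000000  # Llama 2 presets inherit the base default (not listed here)
```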

class bridge.models.llama.llama_provider.CodeLlamaModelProvider70B#

Bases: bridge.models.llama.llama_provider.Llama2ModelProvider70B

Configuration for a 70B parameter CodeLlama model.

Extends Llama2ModelProvider70B with settings specifically for code generation.

class bridge.models.llama.llama_provider.Llama4ModelProvider#

Bases: bridge.models.llama.llama_provider.Llama3ModelProvider

Configuration for the Llama 4 language model.

rotary_base: int#

500000

seq_length: int#

8192

num_layers: int#

48

hidden_size: int#

5120

ffn_hidden_size: int#

16384

num_attention_heads: int#

40

vocab_size: int#

None

add_bias_linear: bool#

False

gated_linear_unit: bool#

True

rotary_interleaved: bool#

True

apply_rope_fusion: bool#

False

nope_layer_interval: int#

4

transformer_layer_spec: Union[megatron.core.transformer.ModuleSpec, Callable[[bridge.models.llama.llama_provider.LlamaModelProvider], megatron.core.transformer.ModuleSpec]]#

‘field(…)’

moe_grouped_gemm: bool#

True

moe_shared_expert_intermediate_size: int#

8192

moe_ffn_hidden_size: int#

8192

moe_router_topk: int#

1

moe_router_pre_softmax: bool#

False

moe_router_score_function: str#

‘sigmoid’

moe_token_dispatcher_type: str#

‘alltoall’

moe_router_dtype: Optional[str]#

None

moe_apply_probs_on_input: bool#

True

moe_shared_expert_overlap: bool#

True

moe_permute_fusion: bool#

False

qk_l2_norm: bool#

True

rope_scaling: bool#

True

rope_scaling_factor: float#

8.0

attention_chunk_size: int#

8192

class bridge.models.llama.llama_provider.Llama4Experts16ModelProvider#

Bases: bridge.models.llama.llama_provider.Llama4ModelProvider

Configuration for the Llama 4 model with 16 experts.

num_moe_experts: int#

16

rope_scaling: bool#

True

rope_scaling_factor: float#

8.0

qk_l2_norm: bool#

True
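The Llama 4 providers are the first in this module to enable mixture-of-experts routing. A hedged sketch inspecting the MoE-related fields of the 16-expert preset, assuming the preset can be instantiated with its documented defaults and the same import path as above:

```python
from megatron.bridge.models.llama.llama_provider import Llama4Experts16ModelProvider

provider = Llama4Experts16ModelProvider()

assert provider.num_moe_experts == 16
assert provider.moe_router_topk == 1                  # one routed expert per token
assert provider.moe_router_score_function == "sigmoid"
assert provider.moe_shared_expert_intermediate_size == 8192
assert provider.rope_scaling and provider.rope_scaling_factor == 8.0
```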

class bridge.models.llama.llama_provider.Llama4Experts128ModelProvider#

Bases: bridge.models.llama.llama_provider.Llama4ModelProvider

Configuration for the Llama 4 model with 128 experts.

num_moe_experts: int#

128

rope_scaling: bool#

False

moe_layer_freq: Union[int, List[int]]#

‘field(…)’

qk_l2_norm: bool#

False

bridge.models.llama.llama_provider.apply_rope_scaling(
inv_freq,
factor: float = 8.0,
low_freq_factor: float = 1.0,
high_freq_factor: float = 4.0,
old_context_len: int = 8192,
)#

Apply RoPE scaling for extending context length in Llama models.

This implements the NTK-aware RoPE scaling method used in Llama 3.1 models to extend context length beyond the original training length.

Parameters:
  • inv_freq – Original inverse frequency tensor

  • factor – Scaling factor for context length extension

  • low_freq_factor – Factor for low frequency components

  • high_freq_factor – Factor for high frequency components

  • old_context_len – Original context length

Returns:

Modified inverse frequency tensor for extended context

Return type:

torch.Tensor
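A hedged sketch of how the function might be driven, assuming the import path above and building the inverse-frequency tensor the standard RoPE way (rotary base 500000 to match the Llama 3.1 presets; the head dimension of 128 is an illustrative assumption):

```python
import torch

from megatron.bridge.models.llama.llama_provider import apply_rope_scaling

# Standard RoPE inverse frequencies for an assumed head dimension of 128.
head_dim = 128
rotary_base = 500000
inv_freq = 1.0 / (rotary_base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))

# Stretch the low-frequency components so the model can attend beyond the
# original 8192-token training context (defaults match Llama 3.1).
scaled_inv_freq = apply_rope_scaling(
    inv_freq,
    factor=8.0,
    low_freq_factor=1.0,
    high_freq_factor=4.0,
    old_context_len=8192,
)

assert scaled_inv_freq.shape == inv_freq.shape
```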