bridge.models.conversion.transformers_compat

Compatibility utilities for HuggingFace transformers 5.0+ configs.

Module Contents

Functions

rope_theta_from_hf

Extract rope_theta from a HuggingFace config.

rope_local_base_freq_from_hf

Extract rope_local_base_freq from a HuggingFace config.

rope_scaling_factor_from_hf

Extract rope scaling factor from a HuggingFace config.

full_attention_interval_from_hf

Extract the full-attention interval from a Qwen3-Next-style HuggingFace config.

API

bridge.models.conversion.transformers_compat.rope_theta_from_hf(config) → float

Extract rope_theta from a HuggingFace config.

This function extracts rope_theta (the base frequency of the rotary position embedding) from a HuggingFace config. It supports both the legacy format, where rope_theta is a direct attribute of the config, and the transformers 5.0+ format, where it is stored in the rope_parameters dictionary.

Parameters:

config – HuggingFace configuration object.

Returns:

The rope_theta value for rotary embeddings.

Return type:

float

Raises:

ValueError – If rope_theta is not found in either format.
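To make the two layouts concrete, here is a minimal sketch of such a lookup. It is not the library's implementation: the function name rope_theta_sketch is hypothetical, types.SimpleNamespace stands in for real config objects, and both the assumption that the 5.0+ layout keeps the value under a "rope_theta" key of rope_parameters and the choice to check that layout first are illustrative.

```python
from types import SimpleNamespace
from typing import Any


def rope_theta_sketch(config: Any) -> float:
    """Hypothetical two-format lookup; not the library's actual implementation."""
    # Assumed transformers 5.0+ layout: rope_theta is a key of config.rope_parameters.
    rope_parameters = getattr(config, "rope_parameters", None)
    if isinstance(rope_parameters, dict) and "rope_theta" in rope_parameters:
        return float(rope_parameters["rope_theta"])
    # Legacy layout: rope_theta is a plain attribute on the config object.
    theta = getattr(config, "rope_theta", None)
    if theta is not None:
        return float(theta)
    raise ValueError("rope_theta not found in either config format")


legacy = SimpleNamespace(rope_theta=10000.0)
modern = SimpleNamespace(rope_parameters={"rope_type": "default", "rope_theta": 1000000.0})
print(rope_theta_sketch(legacy))  # 10000.0
print(rope_theta_sketch(modern))  # 1000000.0
```

Raising instead of returning a default mirrors the documented behavior: without a base frequency the rotary embedding cannot be built, so silently guessing a value would be unsafe.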

bridge.models.conversion.transformers_compat.rope_local_base_freq_from_hf(config) → float

Extract rope_local_base_freq from a HuggingFace config.

Similar to rope_theta_from_hf but for the local base frequency parameter used by some models (e.g., Gemma3).

Parameters:

config – HuggingFace configuration object.

Returns:

The rope_local_base_freq value.

Return type:

float

Raises:

ValueError – If rope_local_base_freq is not found in either format.
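Because this lookup follows the same two-format pattern as rope_theta_from_hf, one way to sketch it is a generic helper parameterized by the field name. This is an illustration under loud assumptions: rope_field_sketch is a hypothetical name, SimpleNamespace stands in for a real config, and the 5.0+ branch assumes a flat rope_parameters key named "rope_local_base_freq". Check the real Gemma3 config for the actual 5.0+ key layout before relying on this.

```python
from types import SimpleNamespace
from typing import Any


def rope_field_sketch(config: Any, name: str) -> float:
    """Hypothetical generic lookup of a RoPE field by name across both layouts."""
    # Assumed 5.0+ layout: the field is a flat key of config.rope_parameters.
    rope_parameters = getattr(config, "rope_parameters", None)
    if isinstance(rope_parameters, dict) and name in rope_parameters:
        return float(rope_parameters[name])
    # Legacy layout: the field is a top-level attribute of the config.
    value = getattr(config, name, None)
    if value is not None:
        return float(value)
    raise ValueError(f"{name} not found in either config format")


# Legacy-style config carrying both a global and a local base frequency (illustrative values).
gemma_like = SimpleNamespace(rope_theta=1000000.0, rope_local_base_freq=10000.0)
print(rope_field_sketch(gemma_like, "rope_local_base_freq"))  # 10000.0
```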

bridge.models.conversion.transformers_compat.rope_scaling_factor_from_hf(config, default: float = 1.0) → float

Extract rope scaling factor from a HuggingFace config.

This function extracts the rope scaling factor from a HuggingFace config. It supports both the legacy format, where the factor is stored in the rope_scaling dictionary, and the transformers 5.0+ format, where it is stored in the rope_parameters dictionary.

Parameters:
  • config – HuggingFace configuration object.

  • default – Default value to return if no scaling factor is found.

Returns:

The rope scaling factor value, or default if not found.

Return type:

float
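A minimal sketch of the lookup, assuming both dictionaries store the value under a "factor" key, as scaled RoPE variants such as "linear" or "yarn" conventionally do. The name rope_scaling_factor_sketch is hypothetical, SimpleNamespace stands in for real configs, and the order in which the two layouts are checked is an illustrative choice, not taken from the library.

```python
from types import SimpleNamespace
from typing import Any


def rope_scaling_factor_sketch(config: Any, default: float = 1.0) -> float:
    """Hypothetical lookup of the RoPE scaling factor across both config layouts."""
    # Assumed 5.0+ layout: "factor" sits inside the rope_parameters dict.
    rope_parameters = getattr(config, "rope_parameters", None)
    if isinstance(rope_parameters, dict) and rope_parameters.get("factor") is not None:
        return float(rope_parameters["factor"])
    # Legacy layout: rope_scaling is an optional dict such as {"rope_type": "linear", "factor": 8.0}.
    rope_scaling = getattr(config, "rope_scaling", None)
    if isinstance(rope_scaling, dict) and rope_scaling.get("factor") is not None:
        return float(rope_scaling["factor"])
    # Unlike the rope_theta lookup, a missing factor is not an error: fall back to the default.
    return default


print(rope_scaling_factor_sketch(SimpleNamespace(rope_scaling={"rope_type": "linear", "factor": 8.0})))  # 8.0
print(rope_scaling_factor_sketch(SimpleNamespace(rope_parameters={"rope_type": "yarn", "factor": 4.0})))  # 4.0
print(rope_scaling_factor_sketch(SimpleNamespace(rope_scaling=None)))  # 1.0
```

Returning the default instead of raising fits the semantics: an unscaled model simply has no factor, which is equivalent to a factor of 1.0.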

bridge.models.conversion.transformers_compat.full_attention_interval_from_hf(config, default: int = 4) → int

Extract the full-attention interval from a Qwen3-Next-style HuggingFace config.

In transformers <5.5, the interval was stored directly as config.full_attention_interval. In transformers >=5.5, the field was removed: the kwarg is consumed in __post_init__ and converted into config.layer_types, a list whose i-th entry is "full_attention" when (i + 1) % interval == 0 and "linear_attention" otherwise. This helper handles both layouts.
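The (i + 1) % interval rule can be illustrated with a short sketch of the forward conversion from an interval to a layer_types list. The function name layer_types_from_interval is hypothetical and only reproduces the rule stated in the paragraph; it is not the transformers code.

```python
def layer_types_from_interval(num_layers: int, interval: int) -> list[str]:
    """Hypothetical forward mapping from an interval to a layer_types list."""
    # Layer i uses full attention when (i + 1) is a multiple of the interval.
    return [
        "full_attention" if (i + 1) % interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]


layer_types = layer_types_from_interval(num_layers=8, interval=4)
print([i for i, t in enumerate(layer_types) if t == "full_attention"])  # [3, 7]
```

Under this rule the first full-attention layer always sits at index interval - 1, which is what makes the interval recoverable from layer_types.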

Parameters:
  • config – HuggingFace configuration object (e.g. Qwen3NextConfig).

  • default – Value to return if neither layout is present.

Returns:

The interval at which full-attention layers appear (every interval-th layer uses full attention).

Return type:

int
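A minimal sketch of how the two layouts could be reconciled, assuming the layer_types pattern is uniform so that the first "full_attention" entry, at index interval - 1, determines the interval. The name full_attention_interval_sketch is hypothetical, SimpleNamespace stands in for a real Qwen3NextConfig, and this is not the library's implementation.

```python
from types import SimpleNamespace
from typing import Any


def full_attention_interval_sketch(config: Any, default: int = 4) -> int:
    """Hypothetical reconstruction of the interval from either config layout."""
    # transformers < 5.5: the interval is stored directly on the config.
    interval = getattr(config, "full_attention_interval", None)
    if interval is not None:
        return int(interval)
    # transformers >= 5.5: layer i is "full_attention" iff (i + 1) % interval == 0,
    # so the first full-attention layer sits at index interval - 1.
    layer_types = getattr(config, "layer_types", None)
    if layer_types and "full_attention" in layer_types:
        return layer_types.index("full_attention") + 1
    # Neither layout present (or no full-attention layer at all): use the default.
    return default


old_style = SimpleNamespace(full_attention_interval=4)
new_style = SimpleNamespace(layer_types=["linear_attention", "linear_attention", "full_attention"] * 2)
print(full_attention_interval_sketch(old_style))  # 4
print(full_attention_interval_sketch(new_style))  # 3
print(full_attention_interval_sketch(SimpleNamespace()))  # 4
```

Reading only the first full-attention index keeps the sketch simple; a stricter version could verify that every later entry also matches the (i + 1) % interval pattern.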