nemo_automodel.components.models.step3p7.configuration_step3p7
Module Contents
Classes
StepRoboticsVisionEncoderConfig | Configuration for the Step robotics vision encoder.
Step3p7TextConfig | Configuration for the Step3.7 language backbone.
Step3p7Config | Top-level configuration for Step3.7 vision-language checkpoints.
Step3p5VConfig | Compatibility config for original Step VLM checkpoints using step3p5v.
Functions
_json_safe_value | Convert config values that are valid in-memory but not JSON serializable.
API
- nemo_automodel.components.models.step3p7.configuration_step3p7._json_safe_value(value: Any) → Any
Convert config values that are valid in-memory but not JSON serializable.
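To illustrate the kind of conversion this description refers to, here is a minimal sketch. It is not the module's implementation: the name `json_safe` and every rule in it (tuples and sets become lists, paths become strings, unknown objects fall back to `str`) are assumptions chosen for the example.

```python
import json
from pathlib import PurePosixPath
from typing import Any


def json_safe(value: Any) -> Any:
    """Hypothetical stand-in; the real helper's rules may differ."""
    if isinstance(value, dict):
        # JSON object keys must be strings.
        return {str(k): json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [json_safe(v) for v in value]
    if isinstance(value, (set, frozenset)):
        # Sort by repr so the output order is deterministic.
        return sorted((json_safe(v) for v in value), key=repr)
    if isinstance(value, PurePosixPath):
        return str(value)
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    # Last resort for objects json.dumps cannot encode.
    return str(value)


config = {
    "moe_layers_enum": (3, 4, 5),
    "cache_dir": PurePosixPath("ckpt") / "step",
}
encoded = json.dumps(json_safe(config))
```

Without the conversion, `json.dumps(config)` would raise `TypeError` on the path object.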
- class nemo_automodel.components.models.step3p7.configuration_step3p7.StepRoboticsVisionEncoderConfig(
- width=1536,
- layers=47,
- heads=16,
- num_channels=3,
- image_size=728,
- mlp_ratio=8960 / 1536,
- patch_size=14,
- hidden_act='quick_gelu',
- layer_norm_eps=1e-05,
- ues_cls_token=False,
- use_cls_token: Optional[bool] = None,
- use_ln_pre=True,
- use_ln_post=False,
- use_abs_posemb=True,
- use_rope2d=True,
- ls_init_value=0.1,
- **kwargs,
)
Bases: transformers.configuration_utils.PretrainedConfig
Configuration for the Step robotics vision encoder.
Initialization
- model_type
'perception_encoder'
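The default values in the signature above imply a few derived sizes. The sketch below works them out with plain arithmetic. It assumes a standard ViT layout (square, non-overlapping patches, and an MLP hidden size of `mlp_ratio * width`), which this page does not confirm; the variable names on the left of the derived values are illustrative only.

```python
# Defaults copied from the StepRoboticsVisionEncoderConfig signature.
image_size = 728
patch_size = 14
width = 1536
heads = 16
mlp_ratio = 8960 / 1536

# Assumption: square, non-overlapping patches as in a standard ViT.
patches_per_side = image_size // patch_size   # 52
num_patches = patches_per_side ** 2           # 2704

# Assumption: the model width is split evenly across attention heads.
head_width = width // heads                   # 96

# Assumption: MLP hidden size is mlp_ratio * width (rounded to absorb float error).
mlp_hidden = round(mlp_ratio * width)         # 8960
```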
- class nemo_automodel.components.models.step3p7.configuration_step3p7.Step3p7TextConfig(
- hidden_size: int = 4096,
- intermediate_size: int = 11264,
- num_attention_heads: int = 64,
- num_attention_groups: int = 8,
- num_hidden_layers: int = 45,
- num_nextn_predict_layers: int = 0,
- mtp_base_layer_idx: Optional[int] = None,
- max_seq_len: int = 128000,
- vocab_size: int = 128815,
- rms_norm_eps: float = 1e-05,
- moe_intermediate_size: int = 1280,
- moe_num_experts: int = 288,
- moe_top_k: int = 8,
- rope_theta: float = 10000,
- rope_scaling: Optional[dict[str, Any]] = None,
- max_position_embeddings: int = 128000,
- share_expert_dims: int = 1280,
- share_expert_dim: Optional[int] = None,
- head_dim: int = 128,
- norm_expert_weight: bool = True,
- layer_types: list[str] = None,
- sliding_window: Optional[int] = None,
- pad_token_id: int = 1,
- attention_dropout: float = 0.0,
- use_head_wise_attn_gate: bool = False,
- use_moe_router_bias: bool = False,
- moe_router_activation: str = 'softmax',
- moe_router_scaling_factor: float = 1.0,
- need_fp32_gate: bool = False,
- attention_other_setting: Optional[dict[str, Any]] = None,
- swiglu_limits: Optional[list[Optional[float]]] = None,
- swiglu_limits_shared: Optional[list[Optional[float]]] = None,
- use_rope_layers: Optional[list[bool]] = None,
- yarn_only_types: Optional[list[str]] = None,
- moe_layers_enum: tuple[int] = (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44),
- **kwargs,
)
Bases: transformers.configuration_utils.PretrainedConfig
Configuration for the Step3.7 language backbone.
Initialization
- model_type
'step3p5'
- architectures
['Step3p5ForCausalLM']
- to_dict()
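The `Step3p7TextConfig` defaults can be read together to get a rough picture of the backbone. The sketch below is plain arithmetic on those defaults. It assumes that `num_attention_groups` counts key/value groups in a grouped-query attention layout and that `moe_layers_enum` lists the indices of layers that use mixture-of-experts; neither interpretation is stated on this page.

```python
# Defaults copied from the Step3p7TextConfig signature.
num_hidden_layers = 45
num_attention_heads = 64
num_attention_groups = 8
head_dim = 128
moe_num_experts = 288
moe_top_k = 8
moe_layers_enum = tuple(range(3, 45))  # same values as the documented default

# Assumption: layers not listed in moe_layers_enum use a dense MLP.
dense_layers = [i for i in range(num_hidden_layers) if i not in moe_layers_enum]

# Assumption: grouped-query attention with num_attention_groups KV groups.
query_heads_per_group = num_attention_heads // num_attention_groups
q_proj_width = num_attention_heads * head_dim
kv_proj_width = num_attention_groups * head_dim

# Fraction of routed experts selected per token under top-k routing.
active_expert_fraction = moe_top_k / moe_num_experts
```

Under these assumptions the first three layers (0, 1, 2) are dense, each KV group serves 8 query heads, and 8 of 288 routed experts are selected per token.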
- nemo_automodel.components.models.step3p7.configuration_step3p7._normalize_per_layer_values(
- values: Optional[Sequence[Any]],
- num_hidden_layers: int,
)
- nemo_automodel.components.models.step3p7.configuration_step3p7._slice_mtp_per_layer_values(
- values: Optional[Sequence[Any]],
- num_hidden_layers: int,
- num_nextn_predict_layers: int,
- default: Any,
)
- class nemo_automodel.components.models.step3p7.configuration_step3p7.Step3p7Config(
- vision_config: Optional[Union[dict, nemo_automodel.components.models.step3p7.configuration_step3p7.StepRoboticsVisionEncoderConfig]] = None,
- text_config: Optional[Union[dict, nemo_automodel.components.models.step3p7.configuration_step3p7.Step3p7TextConfig]] = None,
- understand_projector_stride: int = 2,
- projector_bias: bool = False,
- image_token_id: int = 151679,
- **kwargs,
)
Bases: transformers.configuration_utils.PretrainedConfig
Top-level configuration for Step3.7 vision-language checkpoints.
Initialization
- model_type
'step3p7'
- to_dict()
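Because `vision_config` and `text_config` each accept either a `dict` or a config object, a nested dictionary is one natural way to describe a `Step3p7Config`. The sketch below builds such a dictionary from documented defaults and round-trips it through JSON. It does not import the library, and the exact key layout of a real serialized config is an assumption here.

```python
import json

# Keys mirror the documented constructor parameters. The layout of a real
# config.json for this model is an assumption, not taken from the source.
step3p7_kwargs = {
    "model_type": "step3p7",
    "vision_config": {"image_size": 728, "patch_size": 14, "width": 1536},
    "text_config": {"hidden_size": 4096, "num_hidden_layers": 45},
    "understand_projector_stride": 2,
    "projector_bias": False,
    "image_token_id": 151679,
}

# Every value above is already JSON serializable, so the round trip is lossless.
round_tripped = json.loads(json.dumps(step3p7_kwargs))
```

Sub-config keys that are omitted would presumably fall back to the defaults listed in the corresponding signatures.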
- class nemo_automodel.components.models.step3p7.configuration_step3p7.Step3p5VConfig(
- vision_config: Optional[Union[dict, nemo_automodel.components.models.step3p7.configuration_step3p7.StepRoboticsVisionEncoderConfig]] = None,
- text_config: Optional[Union[dict, nemo_automodel.components.models.step3p7.configuration_step3p7.Step3p7TextConfig]] = None,
- understand_projector_stride: int = 2,
- projector_bias: bool = False,
- image_token_id: int = 151679,
- **kwargs,
)
Bases: nemo_automodel.components.models.step3p7.configuration_step3p7.Step3p7Config
Compatibility config for original Step VLM checkpoints using step3p5v.
Initialization
- model_type
'step3p5v'
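The two top-level classes differ only in their `model_type` string, so a loader can choose between them from the `model_type` field of a checkpoint's config. The sketch below shows that dispatch with plain strings. The registry and the function `pick_config_class_name` are hypothetical and are not part of this module.

```python
# model_type values are the documented ones; the registry itself is hypothetical.
CONFIG_CLASS_BY_MODEL_TYPE = {
    "step3p7": "Step3p7Config",
    "step3p5v": "Step3p5VConfig",
}


def pick_config_class_name(config_dict: dict) -> str:
    """Return the name of the config class matching config_dict['model_type']."""
    model_type = config_dict.get("model_type")
    try:
        return CONFIG_CLASS_BY_MODEL_TYPE[model_type]
    except KeyError:
        raise ValueError(f"unsupported model_type: {model_type!r}") from None


legacy_name = pick_config_class_name({"model_type": "step3p5v"})
current_name = pick_config_class_name({"model_type": "step3p7"})
```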