nemo_rl.models.megatron.converters.deepseek#
Module Contents#
Functions#
`get_source_fn`: Modify source state_dict before conversion.
API#
- nemo_rl.models.megatron.converters.deepseek.get_export_mapping(source, source_config)#
- nemo_rl.models.megatron.converters.deepseek.get_export_transforms()#
- nemo_rl.models.megatron.converters.deepseek.get_source_fn(
- source_state_dict: dict[str, Any],
- source_config: dict[str, Any],
- )#
Modify source state_dict before conversion.

In deepseek, the HF weight `model.layers.*.post_attention_layernorm.weight` is mapped to one of two mcore weights:

a) `decoder.layers.*.mlp.linear_fc1.layer_norm_weight`, if the layer is dense

b) `decoder.layers.*.pre_mlp_layernorm.weight`, if the layer is MoE

We rename `decoder.layers.*.mlp.linear_fc1.layer_norm_weight` in the first (dense) case to unify the key names across both layer types.
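As a rough illustration of the renaming step described above, here is a minimal sketch of a helper that walks a state_dict and rewrites the dense-layer layernorm key. The function name `rename_dense_layernorm_keys` and the choice of `pre_mlp_layernorm.weight` as the unified target name are assumptions for illustration, not the module's actual implementation.

```python
import re
from typing import Any


def rename_dense_layernorm_keys(source_state_dict: dict[str, Any]) -> dict[str, Any]:
    """Sketch: unify mcore layernorm key names across dense and MoE layers.

    For dense layers the post-attention layernorm appears as
    `decoder.layers.*.mlp.linear_fc1.layer_norm_weight`; here we rename it
    (assumed target: `decoder.layers.*.pre_mlp_layernorm.weight`, the MoE
    key) so both layer types expose the same key after conversion.
    """
    pattern = re.compile(r"^decoder\.layers\.(\d+)\.mlp\.linear_fc1\.layer_norm_weight$")
    renamed: dict[str, Any] = {}
    for key, value in source_state_dict.items():
        match = pattern.match(key)
        if match:
            # Dense layer: rewrite to the unified (MoE-style) key name.
            renamed[f"decoder.layers.{match.group(1)}.pre_mlp_layernorm.weight"] = value
        else:
            # MoE layers and all other weights pass through unchanged.
            renamed[key] = value
    return renamed
```

With this pass applied before conversion, downstream mapping logic only needs to handle a single layernorm key pattern per layer.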