nemo_rl.models.megatron.converters.deepseek#

Module Contents#

Functions#

get_export_mapping

get_export_transforms

get_source_fn

Modify source state_dict before conversion.

API#

nemo_rl.models.megatron.converters.deepseek.get_export_mapping(source, source_config)#
nemo_rl.models.megatron.converters.deepseek.get_export_transforms()#
nemo_rl.models.megatron.converters.deepseek.get_source_fn(
source_state_dict: dict[str, Any],
source_config: dict[str, Any],
) → nemo.lightning.io.state._ModelState#

Modify source state_dict before conversion.

In DeepSeek, the HF weight `model.layers.*.post_attention_layernorm.weight` is mapped to one of two mcore weights:

a) `decoder.layers.*.mlp.linear_fc1.layer_norm_weight`, if the layer is dense

b) `decoder.layers.*.pre_mlp_layernorm.weight`, if the layer is MoE

We rename `decoder.layers.*.mlp.linear_fc1.layer_norm_weight` in the first (dense) case to unify the key names across both layer types.
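The renaming described above can be sketched as a plain dictionary transform. This is a hypothetical illustration, not the actual implementation: the real `get_source_fn` returns a `nemo.lightning.io.state._ModelState` wrapper, and the exact target key it unifies to is an assumption here (the MoE-style name `pre_mlp_layernorm.weight` is the plausible unification target given the mapping above).

```python
import re
from typing import Any

# Dense-layer layernorm key produced by the mcore mapping (case a above).
_DENSE_LN_PATTERN = re.compile(
    r"^decoder\.layers\.(\d+)\.mlp\.linear_fc1\.layer_norm_weight$"
)


def unify_layernorm_keys(source_state_dict: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: rename dense-layer layernorm keys to the
    MoE-style key name so downstream export logic sees one key format."""
    renamed: dict[str, Any] = {}
    for key, value in source_state_dict.items():
        match = _DENSE_LN_PATTERN.match(key)
        if match:
            # Dense layer: rewrite to the MoE-style key (assumed target).
            layer_idx = match.group(1)
            renamed[f"decoder.layers.{layer_idx}.pre_mlp_layernorm.weight"] = value
        else:
            renamed[key] = value
    return renamed
```

For example, a dense layer's `decoder.layers.0.mlp.linear_fc1.layer_norm_weight` would come out as `decoder.layers.0.pre_mlp_layernorm.weight`, while keys that already use the MoE naming pass through unchanged.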