nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter#

Module Contents#

Classes#

KimiK25VLStateDictAdapter

State dict adapter for KimiK25VL checkpoints.

Functions#

dequantize_int4

Dequantize INT4 packed weights to bfloat16.

quantize_to_int4

Quantize bfloat16/float16 weights to INT4 packed format.

Data#

API#

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.LOGGER#

‘getLogger(…)’

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.dequantize_int4(
weight_packed: torch.Tensor,
weight_scale: torch.Tensor,
weight_shape: torch.Tensor,
group_size: int = 32,
device: str = 'cuda',
) → torch.Tensor#

Dequantize INT4 packed weights to bfloat16.

Extracts local tensors from DTensors before unpacking (bitwise ops don’t work on DTensor). Both weight_packed and weight_scale should have matching sharding so .to_local() gives corresponding slices automatically.

Parameters:
  • weight_packed – INT4 packed weights [out_features, in_features // 8]; may be a DTensor

  • weight_scale – Per-group scales [out_features, num_groups]; should be a DTensor with the same sharding

  • weight_shape – Shape-[2] tensor storing the original global dimensions

  • group_size – Elements per scale group (default 32)

  • device – Target device for computation (default 'cuda')

Returns:

Dequantized weights as a bfloat16 tensor with the global shape stored in weight_shape.
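For intuition, here is a minimal sketch of the unpack-and-scale arithmetic on plain (non-DTensor) tensors. The nibble order (8 unsigned 4-bit values per int32, least significant first) and the zero point of 8 are illustrative assumptions, not necessarily the checkpoint's exact layout, and dequantize_int4_sketch is a hypothetical name:

```python
import torch

def dequantize_int4_sketch(
    weight_packed: torch.Tensor,  # int32, [out_features, in_features // 8]
    weight_scale: torch.Tensor,   # float16, [out_features, num_groups]
    weight_shape: torch.Tensor,   # int64, [2] = (out_features, in_features)
    group_size: int = 32,
) -> torch.Tensor:
    out_features, in_features = weight_shape.tolist()
    # Unpack 8 nibbles from each int32 (least-significant nibble first).
    shifts = torch.arange(0, 32, 4, device=weight_packed.device)
    nibbles = (weight_packed.unsqueeze(-1) >> shifts) & 0xF  # [out, in // 8, 8]
    ints = nibbles.reshape(out_features, in_features) - 8    # remove zero point
    # Broadcast each per-group scale over its group_size columns.
    scales = weight_scale.to(torch.bfloat16).repeat_interleave(group_size, dim=1)
    return ints.to(torch.bfloat16) * scales
```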

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.quantize_to_int4(
weight: torch.Tensor,
group_size: int = 32,
) → tuple[torch.Tensor, torch.Tensor, torch.Tensor]#

Quantize bfloat16/float16 weights to INT4 packed format.

Returns:

A tuple (weight_packed, weight_scale, weight_shape):

  • weight_packed – INT4 values packed into int32 (8 values per int32)

  • weight_scale – Per-group scale factors (float16)

  • weight_shape – Original tensor shape (int64)

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
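For symmetry, a sketch of the forward direction under the same assumed layout; the symmetric rounding scheme and nibble order here are illustrative assumptions, not the repository's exact implementation:

```python
import torch

def quantize_to_int4_sketch(
    weight: torch.Tensor, group_size: int = 32
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    out_f, in_f = weight.shape  # assumes in_f divisible by group_size and 8
    # Symmetric per-group scale: map each group's max magnitude to 7.
    groups = weight.float().reshape(out_f, in_f // group_size, group_size)
    scale = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int32) + 8
    q = q.reshape(out_f, in_f)
    # Pack 8 unsigned nibbles into each int32, least-significant first.
    packed = torch.zeros(out_f, in_f // 8, dtype=torch.int32, device=weight.device)
    for i in range(8):
        packed |= q[:, i::8] << (4 * i)
    weight_scale = scale.squeeze(-1).to(torch.float16)
    weight_shape = torch.tensor([out_f, in_f], dtype=torch.int64)
    return packed, weight_scale, weight_shape
```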

class nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter(
config,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig,
dtype: torch.dtype = torch.float32,
)#

Bases: nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter

State dict adapter for KimiK25VL checkpoints.

Initialization

to_hf(
state_dict: dict[str, Any],
exclude_key_regex: Optional[str] = None,
quantization: bool = False,
**kwargs,
) → dict[str, Any]#

Convert from native model state dict to HuggingFace format.

If quantization=True, expert weights are quantized to INT4.
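A typical call might look as follows; adapter and model are hypothetical objects, and the regex is only an example pattern:

```python
hf_state_dict = adapter.to_hf(
    model.state_dict(),
    exclude_key_regex=r".*rotary_emb.*",  # optional filter, example pattern
    quantization=True,                    # quantize expert weights to INT4
)
```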

convert_single_tensor_to_hf(
fqn: str,
tensor: Any,
**kwargs,
) → list[tuple[str, Any]]#

Convert a single tensor from native format to HuggingFace format.

Parameters:
  • fqn – Fully qualified name of the tensor in native format

  • tensor – The tensor to convert

  • **kwargs – Additional arguments for conversion

Returns:

List of (fqn, tensor) tuples in HuggingFace format
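A minimal usage sketch (adapter and native_state_dict are assumed to exist). Because one native tensor can map to several HF entries, for example fused experts split back into per-expert weights, each call returns a list:

```python
from typing import Any

hf_entries: list[tuple[str, Any]] = []
for fqn, tensor in native_state_dict.items():
    hf_entries.extend(adapter.convert_single_tensor_to_hf(fqn, tensor))
```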

_is_quantized_expert_key(key: str) → bool#

_expand_quantized_keys(state_dict: dict) → dict#

Expand expert ‘weight’ keys to INT4 triplets: *_packed/*_scale/*_shape.

MoE expert weights are known to be INT4 quantized in the HF checkpoint.
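For example, a hypothetical expert key ending in experts.0.down_proj.weight would expand to the triplet ending in weight_packed, weight_scale, and weight_shape; the exact FQNs follow the HF checkpoint layout.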

from_hf(state_dict: dict, **kwargs) → dict#

Convert HF checkpoint state dict to model format.

This handles INT4 dequantization: *_packed/*_scale/*_shape -> weight.
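A minimal round-trip sketch with hypothetical names (hf_state_dict holds the HF checkpoint tensors):

```python
# Collapse INT4 triplets (*_packed/*_scale/*_shape) back into dequantized
# `weight` tensors in the model's native layout, then load them.
native_state_dict = adapter.from_hf(hf_state_dict)
model.load_state_dict(native_state_dict)
```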