nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter#
Module Contents#
Classes#
KimiK25VLStateDictAdapter | State dict adapter for KimiK25VL checkpoints. |
Functions#
dequantize_int4 | Dequantize INT4 packed weights to bfloat16. |
quantize_to_int4 | Quantize bfloat16/float16 weights to INT4 packed format. |
Data#
API#
- nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.LOGGER#
'getLogger(...)'
- nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.dequantize_int4(
- weight_packed: torch.Tensor,
- weight_scale: torch.Tensor,
- weight_shape: torch.Tensor,
- group_size: int = 32,
- device: str = 'cuda',
)#
Dequantize INT4 packed weights to bfloat16.
Extracts local tensors from DTensors before unpacking (bitwise ops don’t work on DTensor). Both weight_packed and weight_scale should have matching sharding so .to_local() gives corresponding slices automatically.
- Parameters:
weight_packed – INT4 packed weights [out_features, in_features // 8], may be DTensor
weight_scale – Per-group scales [out_features, num_groups], should be DTensor with same sharding
weight_shape – Original shape [2], stores global dimensions
group_size – Elements per scale group (default 32)
device – Target device for computation
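For intuition, here is a minimal sketch of the unpack-and-rescale arithmetic. It assumes one plausible packing convention (8 signed 4-bit values per int32, least-significant nibble first) and skips the DTensor handling; the library's actual nibble order and sign convention may differ.

```python
import torch

def dequantize_int4_sketch(weight_packed: torch.Tensor,
                           weight_scale: torch.Tensor,
                           weight_shape: torch.Tensor,
                           group_size: int = 32) -> torch.Tensor:
    # Assumed layout: weight_packed is int32 [out, in // 8], weight_scale is
    # [out, in // group_size]; 8 signed 4-bit values per int32, LSB nibble first.
    out_features, in_features = int(weight_shape[0]), int(weight_shape[1])
    shifts = torch.arange(0, 32, 4, device=weight_packed.device)
    # Unpack nibbles: [out, in // 8] -> [out, in // 8, 8] -> [out, in]
    nibbles = (weight_packed.unsqueeze(-1) >> shifts) & 0xF
    nibbles = nibbles.reshape(out_features, -1)[:, :in_features]
    # Map unsigned nibbles [0, 15] back to the signed INT4 range [-8, 7].
    values = torch.where(nibbles >= 8, nibbles - 16, nibbles).to(torch.bfloat16)
    # Broadcast each per-group scale over its group_size input elements.
    scales = weight_scale.to(torch.bfloat16).repeat_interleave(group_size, dim=1)
    return values * scales[:, :in_features]
```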
- nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.quantize_to_int4(
- weight: torch.Tensor,
- group_size: int = 32,
)#
Quantize bfloat16/float16 weights to INT4 packed format.
- Returns:
weight_packed – INT4 values packed into int32 (8 values per int32)
weight_scale – Per-group scale factors (float16)
weight_shape – Original tensor shape (int64)
- Return type:
tuple of (weight_packed, weight_scale, weight_shape)
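A simplified counterpart to the sketch above, assuming symmetric per-group quantization to the signed range [-8, 7] and the same least-significant-nibble-first packing; the checkpoint's exact rounding and scale conventions may differ.

```python
import torch

def quantize_to_int4_sketch(weight: torch.Tensor, group_size: int = 32):
    # Assumes in_features is divisible by both group_size and 8.
    out_features, in_features = weight.shape
    groups = weight.float().reshape(out_features, -1, group_size)
    # One symmetric scale per group so that the max magnitude maps to 7.
    scale = groups.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int64)
    q = (q & 0xF).reshape(out_features, -1, 8)  # unsigned nibbles, 8 per word
    # Pack 8 nibbles into each 32-bit word, least-significant nibble first.
    packed = torch.zeros(out_features, in_features // 8, dtype=torch.int64)
    for i in range(8):
        packed |= q[..., i] << (4 * i)
    weight_packed = packed.to(torch.int32)
    weight_scale = scale.squeeze(-1).to(torch.float16)
    weight_shape = torch.tensor([out_features, in_features], dtype=torch.int64)
    return weight_packed, weight_scale, weight_shape
```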
- class nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter(
- config,
- moe_config: nemo_automodel.components.moe.config.MoEConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- dtype: torch.dtype = torch.float32,
)#
Bases:
nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter
State dict adapter for KimiK25VL checkpoints.
Initialization
- to_hf(
- state_dict: dict[str, Any],
- exclude_key_regex: Optional[str] = None,
- quantization: bool = False,
- **kwargs,
)#
Convert from native model state dict to HuggingFace format.
If quantization=True, expert weights are quantized to INT4.
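A hedged usage sketch of the export direction. The names model_config, moe_config, backend, and model are placeholders assumed to be produced by the surrounding model-build code; their construction is not shown here.

```python
import torch
from nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter import (
    KimiK25VLStateDictAdapter,
)

# model_config, moe_config, backend, and model are placeholders assumed to come
# from the surrounding model setup; their construction is not shown here.
adapter = KimiK25VLStateDictAdapter(
    config=model_config,
    moe_config=moe_config,
    backend=backend,
    dtype=torch.bfloat16,
)

# Export the native state dict in HF layout, re-quantizing expert weights to
# INT4 so the output matches the original checkpoint format.
hf_state_dict = adapter.to_hf(model.state_dict(), quantization=True)
```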
- convert_single_tensor_to_hf(
- fqn: str,
- tensor: Any,
- **kwargs,
)#
Convert a single tensor from native format to HuggingFace format.
- Parameters:
fqn – Fully qualified name of the tensor in native format
tensor – The tensor to convert
**kwargs – Additional arguments for conversion
- Returns:
List of (fqn, tensor) tuples in HuggingFace format
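For streaming conversion, the per-tensor entry point can be used instead of materializing the whole HF dict. In the sketch below, adapter, native_state_dict, and write_tensor are illustrative placeholders.

```python
# One native tensor may expand to several HF entries (e.g. an INT4 triplet),
# so convert_single_tensor_to_hf returns a list of (fqn, tensor) pairs.
for fqn, tensor in native_state_dict.items():
    for hf_fqn, hf_tensor in adapter.convert_single_tensor_to_hf(fqn, tensor):
        write_tensor(hf_fqn, hf_tensor)  # placeholder checkpoint writer
```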
- _is_quantized_expert_key(key: str) -> bool#
- _expand_quantized_keys(state_dict: dict) -> dict#
Expand expert 'weight' keys to INT4 triplets: *_packed/*_scale/*_shape.
MoE expert weights are known to be INT4 quantized in the HF checkpoint.
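As an illustration of the key expansion (the expert FQN below is made up; real checkpoint key names may differ):

```python
# A single quantized expert weight key is replaced by its INT4 triplet.
key = "language_model.layers.3.mlp.experts.0.gate_proj.weight"  # hypothetical FQN
triplet = [key + suffix for suffix in ("_packed", "_scale", "_shape")]
# -> [....weight_packed, ....weight_scale, ....weight_shape]
```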
- from_hf(state_dict: dict, **kwargs) -> dict#
Convert HF checkpoint state dict to model format.
This handles INT4 dequantization: *_packed/*_scale/*_shape -> weight.
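A sketch of the import direction, assuming an adapter built as in the earlier example and an hf_state_dict already loaded by whatever checkpoint I/O the surrounding code uses:

```python
# hf_state_dict is assumed to hold HF-format tensors, including the INT4
# *_packed/*_scale/*_shape triplets for expert weights.
native_state_dict = adapter.from_hf(hf_state_dict)  # triplets -> dequantized weights
model.load_state_dict(native_state_dict)
```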