nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter#
Module Contents#
Classes#
KimiK25VLStateDictAdapter | State dict adapter for KimiK25VL checkpoints. |
Functions#
dequantize_int4 | Dequantize INT4 packed weights to bfloat16. |
quantize_to_int4 | Quantize bfloat16/float16 weights to INT4 packed format. |
Data#
API#
- nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.LOGGER#
'getLogger(...)'
- nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.dequantize_int4(
- weight_packed: torch.Tensor,
- weight_scale: torch.Tensor,
- weight_shape: torch.Tensor,
- group_size: int = 32,
- device: str = 'cuda',
)#
Dequantize INT4 packed weights to bfloat16.
Extracts local tensors from DTensors before unpacking (bitwise ops don’t work on DTensor). Both weight_packed and weight_scale should have matching sharding so .to_local() gives corresponding slices automatically.
- Parameters:
weight_packed – INT4 packed weights [out_features, in_features // 8], may be DTensor
weight_scale – Per-group scales [out_features, num_groups], should be DTensor with same sharding
weight_shape – Original shape [2], stores global dimensions
group_size – Elements per scale group (default 32)
device – Target device for computation
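For intuition, here is a minimal sketch of the unpack-and-rescale arithmetic. It assumes one plausible packing convention (8 signed 4-bit values per int32, least-significant nibble first) and skips the DTensor handling; the library's actual nibble order and sign convention may differ.

```python
import torch

def dequantize_int4_sketch(weight_packed: torch.Tensor,
                           weight_scale: torch.Tensor,
                           weight_shape: torch.Tensor,
                           group_size: int = 32) -> torch.Tensor:
    # Assumed layout: weight_packed is int32 [out, in // 8], weight_scale is
    # [out, in // group_size]; 8 signed 4-bit values per int32, LSB nibble first.
    out_features, in_features = int(weight_shape[0]), int(weight_shape[1])
    shifts = torch.arange(0, 32, 4, device=weight_packed.device)
    # Unpack nibbles: [out, in // 8] -> [out, in // 8, 8] -> [out, in]
    nibbles = (weight_packed.unsqueeze(-1) >> shifts) & 0xF
    nibbles = nibbles.reshape(out_features, -1)[:, :in_features]
    # Map unsigned nibbles [0, 15] back to the signed INT4 range [-8, 7].
    values = torch.where(nibbles >= 8, nibbles - 16, nibbles).to(torch.bfloat16)
    # Broadcast each per-group scale over its group_size input elements.
    scales = weight_scale.to(torch.bfloat16).repeat_interleave(group_size, dim=1)
    return values * scales[:, :in_features]
```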
- nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.quantize_to_int4(
- weight: torch.Tensor,
- group_size: int = 32,
)#
Quantize bfloat16/float16 weights to INT4 packed format.
- Returns:
weight_packed – INT4 values packed into int32 (8 values per int32)
weight_scale – Per-group scale factors (float16)
weight_shape – Original tensor shape (int64)
- Return type:
tuple of (weight_packed, weight_scale, weight_shape)
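A simplified counterpart to the sketch above, assuming symmetric per-group quantization to the signed range [-8, 7] and the same least-significant-nibble-first packing; the checkpoint's exact rounding and scale conventions may differ.

```python
import torch

def quantize_to_int4_sketch(weight: torch.Tensor, group_size: int = 32):
    # Assumes in_features is divisible by both group_size and 8.
    out_features, in_features = weight.shape
    groups = weight.float().reshape(out_features, -1, group_size)
    # One symmetric scale per group so that the max magnitude maps to 7.
    scale = groups.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int64)
    q = (q & 0xF).reshape(out_features, -1, 8)  # unsigned nibbles, 8 per word
    # Pack 8 nibbles into each 32-bit word, least-significant nibble first.
    packed = torch.zeros(out_features, in_features // 8, dtype=torch.int64)
    for i in range(8):
        packed |= q[..., i] << (4 * i)
    weight_packed = packed.to(torch.int32)
    weight_scale = scale.squeeze(-1).to(torch.float16)
    weight_shape = torch.tensor([out_features, in_features], dtype=torch.int64)
    return weight_packed, weight_scale, weight_shape
```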
- class nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter(
- config,
- moe_config: nemo_automodel.components.moe.config.MoEConfig,
- backend: nemo_automodel.components.models.common.BackendConfig,
- dtype: torch.dtype = torch.float32,
)#
Bases:
nemo_automodel.components.moe.state_dict_mixin.MoESplitExpertsStateDictMixin, nemo_automodel.components.checkpoint.state_dict_adapter.StateDictAdapter
State dict adapter for KimiK25VL checkpoints.
Initialization
- to_hf(
- state_dict: dict[str, Any],
- exclude_key_regex: Optional[str] = None,
- quantization: bool = False,
- **kwargs,
)#
Convert from native model state dict to HuggingFace format.
If quantization=True, expert weights are quantized to INT4.
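A hedged usage sketch of the export direction. The names model_config, moe_config, backend, and model are placeholders assumed to be produced by the surrounding model-build code; their construction is not shown here.

```python
import torch
from nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter import (
    KimiK25VLStateDictAdapter,
)

# model_config, moe_config, backend, and model are placeholders assumed to come
# from the surrounding model setup; their construction is not shown here.
adapter = KimiK25VLStateDictAdapter(
    config=model_config,
    moe_config=moe_config,
    backend=backend,
    dtype=torch.bfloat16,
)

# Export the native state dict in HF layout, re-quantizing expert weights to
# INT4 so the output matches the original checkpoint format.
hf_state_dict = adapter.to_hf(model.state_dict(), quantization=True)
```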
- convert_single_tensor_to_hf(
- fqn: str,
- tensor: Any,
- **kwargs,
)#
Convert a single tensor from native format to HuggingFace format.
- Parameters:
fqn – Fully qualified name of the tensor in native format
tensor – The tensor to convert
**kwargs – Additional arguments for conversion
- Returns:
List of (fqn, tensor) tuples in HuggingFace format
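For streaming conversion, the per-tensor entry point can be used instead of materializing the whole HF dict. In the sketch below, adapter, native_state_dict, and write_tensor are illustrative placeholders.

```python
# One native tensor may expand to several HF entries (e.g. an INT4 triplet),
# so convert_single_tensor_to_hf returns a list of (fqn, tensor) pairs.
for fqn, tensor in native_state_dict.items():
    for hf_fqn, hf_tensor in adapter.convert_single_tensor_to_hf(fqn, tensor):
        write_tensor(hf_fqn, hf_tensor)  # placeholder checkpoint writer
```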
- _is_quantized_expert_key(key: str) -> bool#
- _expand_quantized_keys(state_dict: dict) -> dict#
Expand expert 'weight' keys to INT4 triplets: *_packed/*_scale/*_shape.
MoE expert weights are known to be INT4 quantized in the HF checkpoint.
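As an illustration of the key expansion (the expert FQN below is made up; real checkpoint key names may differ):

```python
# A single quantized expert weight key is replaced by its INT4 triplet.
key = "language_model.layers.3.mlp.experts.0.gate_proj.weight"  # hypothetical FQN
triplet = [key + suffix for suffix in ("_packed", "_scale", "_shape")]
# -> [....weight_packed, ....weight_scale, ....weight_shape]
```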
- from_hf(state_dict: dict, **kwargs) -> dict#
Convert HF checkpoint state dict to model format.
This handles INT4 dequantization: *_packed/*_scale/*_shape -> weight.
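A sketch of the import direction, assuming an adapter built as in the earlier example and an hf_state_dict already loaded by whatever checkpoint I/O the surrounding code uses:

```python
# hf_state_dict is assumed to hold HF-format tensors, including the INT4
# *_packed/*_scale/*_shape triplets for expert weights.
native_state_dict = adapter.from_hf(hf_state_dict)  # triplets -> dequantized weights
model.load_state_dict(native_state_dict)
```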