nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter

Module Contents

Classes

Name | Description
KimiK25VLStateDictAdapter | State dict adapter for KimiK25VL checkpoints.

Functions

Name | Description
dequantize_int4 | Dequantize INT4 packed weights to bfloat16.
quantize_to_int4 | Quantize bfloat16/float16 weights to INT4 packed format.

Data

LOGGER

API

class nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter(
config,
moe_config: nemo_automodel.components.moe.config.MoEConfig,
backend: nemo_automodel.components.models.common.BackendConfig,
dtype: torch.dtype = torch.float32
)

Bases: MoESplitExpertsStateDictMixin, StateDictAdapter

State dict adapter for KimiK25VL checkpoints.

_last_expected_hf_keys
set[str] | None = None
_quant_shapes_cache
dict[str, tuple] | None = None
llm_adapter

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter._expand_quantized_keys(
state_dict: dict
) -> dict

Expand expert ‘weight’ keys to INT4 triplets: *_packed/*_scale/*_shape.

MoE expert weights are known to be INT4-quantized in the HF checkpoint.
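
The expansion described above can be sketched in plain Python. This is a minimal illustration, not the adapter's actual implementation: the helper name `expand_quantized_keys`, the `is_quantized` predicate, and the example key names are all hypothetical, and the only detail taken from this page is that each expert `weight` key becomes a `*_packed`/`*_scale`/`*_shape` triplet.

```python
def expand_quantized_keys(keys, is_quantized):
    """Replace each quantized '<prefix>.weight' key with its INT4 triplet keys.

    `is_quantized` is a hypothetical stand-in for the adapter's private
    `_is_quantized_expert_key` check.
    """
    expanded = []
    for key in keys:
        if key.endswith(".weight") and is_quantized(key):
            # '<prefix>.weight' -> '<prefix>.weight_packed', '_scale', '_shape'
            expanded.extend(f"{key}{suffix}" for suffix in ("_packed", "_scale", "_shape"))
        else:
            # Non-expert keys pass through unchanged.
            expanded.append(key)
    return expanded


# Hypothetical key names, for illustration only.
example_keys = [
    "layers.1.mlp.experts.0.up_proj.weight",
    "layers.1.self_attn.q_proj.weight",
]
result = expand_quantized_keys(example_keys, lambda k: ".experts." in k)
```

Here only the expert key is expanded; the attention weight keeps its original name.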

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter._is_quantized_expert_key(
key: str
) -> bool

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter.convert_single_tensor_to_hf(
fqn: str,
tensor: typing.Any,
**kwargs
) -> list[tuple[str, typing.Any]]

Convert a single tensor from native format to HuggingFace format.

Parameters:

fqn
str

Fully qualified name of the tensor in native format

tensor
Any

The tensor to convert

**kwargs

Additional arguments for conversion

Returns: list[tuple[str, Any]]

List of (fqn, tensor) tuples in HuggingFace format

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter.from_hf(
state_dict: dict,
**kwargs
) -> dict

Convert HF checkpoint state dict to model format.

This handles INT4 dequantization: each *_packed/*_scale/*_shape triplet is converted back into a single weight tensor.
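
As a rough sketch of the triplet-to-weight step, the following plain-Python helper merges each triplet into one entry. The name `collapse_int4_triplets` and the injected `dequantize` callable are hypothetical; the real `from_hf` also performs key renaming and other conversions that are not shown here.

```python
def collapse_int4_triplets(state_dict, dequantize):
    """Merge each *_packed/*_scale/*_shape triplet into a single '<base>' entry.

    `dequantize(packed, scale, shape)` is a caller-supplied stand-in for a
    function such as `dequantize_int4`.
    """
    merged = {}
    for key, value in state_dict.items():
        if key.endswith("_packed"):
            base = key[: -len("_packed")]  # e.g. '...weight_packed' -> '...weight'
            merged[base] = dequantize(
                value, state_dict[base + "_scale"], state_dict[base + "_shape"]
            )
        elif key.endswith(("_scale", "_shape")) and (key[:-6] + "_packed") in state_dict:
            continue  # already consumed together with its *_packed sibling
        else:
            merged[key] = value  # unquantized entries pass through unchanged
    return merged
```

Passing the dequantization function as an argument keeps the sketch independent of torch, so the key handling can be checked on its own.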

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter.to_hf(
state_dict: dict[str, typing.Any],
exclude_key_regex: typing.Optional[str] = None,
quantization: bool = False,
**kwargs
) -> dict[str, typing.Any]

Convert from native model state dict to HuggingFace format.

If quantization=True, expert weights are quantized to INT4.

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.dequantize_int4(
weight_packed: torch.Tensor,
weight_scale: torch.Tensor,
weight_shape: torch.Tensor,
group_size: int = 32,
device: str = 'cuda'
) -> torch.Tensor

Dequantize INT4 packed weights to bfloat16.

Extracts local tensors from DTensors before unpacking (bitwise ops don’t work on DTensor). Both weight_packed and weight_scale should have matching sharding so .to_local() gives corresponding slices automatically.

Parameters:

weight_packed
torch.Tensor

INT4 packed weights [out_features, in_features // 8], may be DTensor

weight_scale
torch.Tensor

Per-group scales [out_features, num_groups]; if weight_packed is a DTensor, this should be a DTensor with the same sharding

weight_shape
torch.Tensor

Original shape [2], stores global dimensions

group_size
int, defaults to 32

Elements per scale group

device
str, defaults to 'cuda'

Target device for computation
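
To make the packed layout concrete, here is a pure-Python sketch of unpacking one row of int32 words and applying per-group scales. It assumes two things this page does not state: that nibbles are stored lowest-order first, and that values use an offset-by-8 encoding (stored 0..15 maps to -8..7). The helper name `unpack_int4_row` is hypothetical, and the real function operates on torch tensors rather than Python lists.

```python
def unpack_int4_row(packed_words, scales, group_size=32):
    """Unpack int32 words into 8 INT4 values each, then scale per group.

    Assumptions (not confirmed by the documentation): low nibble first,
    offset-by-8 encoding.
    """
    values = []
    for word in packed_words:
        word &= 0xFFFFFFFF  # view a possibly negative int32 as raw 32 bits
        for i in range(8):
            nibble = (word >> (4 * i)) & 0xF  # extract the i-th 4-bit field
            values.append(nibble - 8)  # assumed offset encoding: 0..15 -> -8..7
    # Element j belongs to scale group j // group_size.
    return [v * scales[j // group_size] for j, v in enumerate(values)]


# One word holding nibbles 0..7, a single group of 8 elements with scale 0.5.
row = unpack_int4_row([0x76543210], scales=[0.5], group_size=8)
```

With 8 values per int32, a row of `in_features` elements occupies `in_features // 8` words, which matches the `[out_features, in_features // 8]` shape given for `weight_packed`.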

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.quantize_to_int4(
weight: torch.Tensor,
group_size: int = 32
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Quantize bfloat16/float16 weights to INT4 packed format.

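Returns: tuple[torch.Tensor, torch.Tensor, torch.Tensor]

A tuple whose first element holds the INT4 values packed into int32 (8 values per int32). The remaining two elements are expected to be the per-group scales and the original shape, matching the weight_scale and weight_shape inputs of dequantize_int4.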
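
The following pure-Python sketch shows one way group-wise INT4 quantization and packing can work for a single row. It is an illustration under stated assumptions, not the library's implementation: it assumes symmetric quantization with scale `max_abs / 7`, offset-by-8 storage, and low-nibble-first packing, and the helper name `quantize_row_to_int4` is hypothetical.

```python
def quantize_row_to_int4(row, group_size=32):
    """Quantize one row of floats to packed INT4 words, scales, and shape.

    Assumptions (not confirmed by the documentation): symmetric per-group
    scaling, offset-by-8 storage, low nibble first.
    """
    if len(row) % group_size or group_size % 8:
        raise ValueError("row length must divide into groups that are multiples of 8")
    scales, quantized = [], []
    for start in range(0, len(row), group_size):
        group = row[start : start + group_size]
        max_abs = max(abs(x) for x in group)
        scale = max_abs / 7 if max_abs > 0 else 1.0  # avoid dividing by zero
        scales.append(scale)
        # Round to the nearest integer and clamp to the signed INT4 range.
        quantized.extend(max(-8, min(7, round(x / scale))) for x in group)
    packed = []
    for start in range(0, len(quantized), 8):
        word = 0
        for i, q in enumerate(quantized[start : start + 8]):
            word |= ((q + 8) & 0xF) << (4 * i)  # store q + 8 in the i-th nibble
        packed.append(word)
    return packed, scales, [1, len(row)]


packed, scales, shape = quantize_row_to_int4(
    [7.0, -7.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0], group_size=8
)
```

The three return values mirror the packed, scale, and shape inputs that `dequantize_int4` accepts, which is why the two functions can round-trip a weight up to quantization error.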

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.LOGGER = logging.getLogger(__name__)