nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter

Module Contents

Classes

Name	Description
`KimiK25VLStateDictAdapter`	State dict adapter for KimiK25VL checkpoints.

Functions

Name	Description
`dequantize_int4`	Dequantize INT4 packed weights to bfloat16.
`quantize_to_int4`	Quantize bfloat16/float16 weights to INT4 packed format.

Data

LOGGER

API

class nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter(
    config,
    moe_config: nemo_automodel.components.moe.config.MoEConfig,
    backend: nemo_automodel.components.models.common.BackendConfig,
    dtype: torch.dtype = torch.float32
)

Bases: MoESplitExpertsStateDictMixin, StateDictAdapter

State dict adapter for KimiK25VL checkpoints.

_last_expected_hf_keys

set[str] | None = None

_quant_shapes_cache

dict[str, tuple] | None = None

llm_adapter

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter._expand_quantized_keys(
    state_dict: dict
) -> dict

Expand expert ‘weight’ keys to INT4 triplets: *_packed/*_scale/*_shape.

MoE expert weights are known to be INT4 quantized in the HF checkpoint.
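The expansion described above can be sketched roughly as follows. This is an illustration only, not the adapter's implementation: the helper name `expand_quantized_keys`, the `is_quantized` predicate, the example key names, and the exact suffixes `weight_packed`/`weight_scale`/`weight_shape` (borrowed from the parameter names of `dequantize_int4`) are all assumptions. It also works on a list of keys, whereas the real method maps a dict to a dict.

```python
def expand_quantized_keys(keys, is_quantized):
    """Replace each quantized '<stem>.weight' key with three INT4 keys.

    Hypothetical helper: suffix names are assumed, not taken from the library.
    """
    expanded = []
    for key in keys:
        if key.endswith(".weight") and is_quantized(key):
            stem = key[: -len("weight")]  # keeps the trailing dot
            expanded.extend(
                stem + suffix
                for suffix in ("weight_packed", "weight_scale", "weight_shape")
            )
        else:
            # Non-expert keys pass through unchanged.
            expanded.append(key)
    return expanded


# Made-up key names; the predicate stands in for _is_quantized_expert_key.
keys = [
    "layers.0.mlp.experts.3.up_proj.weight",
    "layers.0.input_layernorm.weight",
]
result = expand_quantized_keys(keys, lambda k: ".experts." in k)
```

Here only the expert key is expanded into three entries; the layer-norm key is left as-is.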

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter._is_quantized_expert_key(
    key: str
) -> bool

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter.convert_single_tensor_to_hf(
    fqn: str,
    tensor: typing.Any,
    **kwargs
) -> list[tuple[str, typing.Any]]

Convert a single tensor from native format to HuggingFace format.

Parameters:

fqn

str

Fully qualified name of the tensor in native format

tensor

Any

The tensor to convert

**kwargs

Defaults to {}

Additional arguments for conversion

Returns: list[tuple[str, Any]]

List of (fqn, tensor) tuples in HuggingFace format

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter.from_hf(
    state_dict: dict,
    **kwargs
) -> dict

Convert HF checkpoint state dict to model format.

This handles INT4 dequantization: *_packed/*_scale/*_shape -> weight.
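As a rough sketch of the triplet-to-weight step, assuming the three entries share a stem and differ only in the `_packed`/`_scale`/`_shape` suffix: the helper name `collapse_int4_triplets` and the injected `dequantize` callable are hypothetical, and the real method also performs key renaming that is not shown here.

```python
def collapse_int4_triplets(state_dict, dequantize):
    """Merge '<stem>_packed', '<stem>_scale', '<stem>_shape' into '<stem>'.

    Hypothetical helper; `dequantize` stands in for a function such as
    dequantize_int4 and is called as dequantize(packed, scale, shape).
    """
    out = {}
    for key, value in state_dict.items():
        if key.endswith("_packed"):
            stem = key[: -len("_packed")]
            out[stem] = dequantize(
                value, state_dict[stem + "_scale"], state_dict[stem + "_shape"]
            )
        elif (
            key.endswith(("_scale", "_shape"))
            and key.rsplit("_", 1)[0] + "_packed" in state_dict
        ):
            # Already consumed together with the matching '_packed' entry.
            continue
        else:
            out[key] = value
    return out


# Strings stand in for tensors so the key handling is easy to follow.
sd = {
    "e.weight_packed": "P",
    "e.weight_scale": "S",
    "e.weight_shape": "H",
    "norm.weight": "N",
}
collapsed = collapse_int4_triplets(sd, lambda p, s, h: (p, s, h))
```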

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.KimiK25VLStateDictAdapter.to_hf(
    state_dict: dict[str, typing.Any],
    exclude_key_regex: typing.Optional[str] = None,
    quantization: bool = False,
    **kwargs
) -> dict[str, typing.Any]

Convert from native model state dict to HuggingFace format.

If quantization=True, expert weights are quantized to INT4.

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.dequantize_int4(
    weight_packed: torch.Tensor,
    weight_scale: torch.Tensor,
    weight_shape: torch.Tensor,
    group_size: int = 32,
    device: str = 'cuda'
) -> torch.Tensor

Dequantize INT4 packed weights to bfloat16.

Extracts local tensors from DTensors before unpacking (bitwise ops don’t work on DTensor). Both weight_packed and weight_scale should have matching sharding so .to_local() gives corresponding slices automatically.

Parameters:

weight_packed

torch.Tensor

INT4 packed weights [out_features, in_features // 8], may be DTensor

weight_scale

torch.Tensor

Per-group scales [out_features, num_groups], should be DTensor with same sharding

weight_shape

torch.Tensor

Original shape [2], stores global dimensions

group_size

int

Defaults to 32

Elements per scale group

device

str

Defaults to 'cuda'

Target device for computation
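The shape relationships in the parameter descriptions can be checked with a little arithmetic. The sketch below assumes that `num_groups` equals `in_features // group_size`, which the description of `group_size` suggests but does not state outright; the helper name `int4_shapes` is hypothetical.

```python
def int4_shapes(out_features, in_features, group_size=32):
    """Expected shapes of the INT4 triplet for one 2-D weight.

    Hypothetical helper; assumes num_groups == in_features // group_size.
    """
    # Eight 4-bit values fit in one 32-bit word, so in_features must divide by 8.
    assert in_features % 8 == 0
    assert in_features % group_size == 0
    return {
        "weight_packed": (out_features, in_features // 8),
        "weight_scale": (out_features, in_features // group_size),
        "weight_shape": (2,),  # holds [out_features, in_features]
    }


shapes = int4_shapes(out_features=4, in_features=64)
```

For a 4 x 64 weight with the default group size, this gives a packed tensor of shape (4, 8) and a scale tensor of shape (4, 2).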

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.quantize_to_int4(
    weight: torch.Tensor,
    group_size: int = 32
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Quantize bfloat16/float16 weights to INT4 packed format.

Returns: tuple[torch.Tensor, torch.Tensor, torch.Tensor]

The first element holds the INT4 values packed into int32 (8 values per int32).
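To make "8 values per int32" concrete, here is a minimal pure-Python sketch of one possible packing scheme. Everything about the layout is an assumption made for illustration: symmetric per-group scaling with `scale = max_abs / 7`, signed values in [-8, 7] stored with a +8 offset, and low nibble first. The library's actual rounding, offset, and nibble order may differ, and the function names are hypothetical.

```python
def quantize_group(values):
    """Symmetric quantization of one group to integers in [-8, 7] (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale


def pack_int4(q):
    """Pack signed 4-bit integers, eight per 32-bit word, low nibble first."""
    assert len(q) % 8 == 0
    words = []
    for i in range(0, len(q), 8):
        word = 0
        for j, v in enumerate(q[i : i + 8]):
            # Offset by 8 so each value is stored as an unsigned nibble 0..15.
            word |= ((v + 8) & 0xF) << (4 * j)
        # Plain Python int holding the unsigned 32-bit pattern.
        words.append(word)
    return words


def unpack_int4(words):
    """Inverse of pack_int4."""
    return [((w >> (4 * j)) & 0xF) - 8 for w in words for j in range(8)]


group = [0.0, 7.0, -7.0, 3.0, 1.0, -1.0, 2.0, -2.0]
q, scale = quantize_group(group)
restored = [v * scale for v in unpack_int4(pack_int4(q))]
```

With this particular input the scale is exactly 1.0, so the round trip reproduces the original values; in general, dequantized values are only approximations of the inputs.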

nemo_automodel.components.models.kimi_k25_vl.state_dict_adapter.LOGGER = logging.getLogger(__name__)