core.export.trtllm.trtllm_weights_converter.distributed_trtllm_model_weights_converter#

Module Contents#

Classes#

DistributedTRTLLMModelWeightsConverter

The TRTLLM Converter class used for GPU (on device) conversion

Functions#

str_dtype_to_torch

Get torch datatype from input datatype

API#

core.export.trtllm.trtllm_weights_converter.distributed_trtllm_model_weights_converter.str_dtype_to_torch(dtype: megatron.core.export.data_type.DataType)#

Get torch datatype from input datatype
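
Conceptually, this is a lookup from the export DataType to the corresponding torch dtype. A minimal sketch of the idea, using plain strings in place of the real megatron DataType enum and torch dtypes (the names and mapping entries here are illustrative assumptions, not the actual implementation):

```python
# Hedged sketch: map export data-type names to torch dtype names.
# The real str_dtype_to_torch takes megatron.core.export.data_type.DataType
# and returns an actual torch.dtype; this stand-in uses plain strings.
_DTYPE_MAP = {
    "float32": "torch.float32",
    "float16": "torch.float16",
    "bfloat16": "torch.bfloat16",
}

def str_dtype_to_torch_sketch(dtype_name: str) -> str:
    try:
        return _DTYPE_MAP[dtype_name]
    except KeyError:
        raise ValueError(f"Unsupported dtype: {dtype_name}")

print(str_dtype_to_torch_sketch("bfloat16"))  # → torch.bfloat16
```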

class core.export.trtllm.trtllm_weights_converter.distributed_trtllm_model_weights_converter.DistributedTRTLLMModelWeightsConverter(
transformer_config: megatron.core.transformer.transformer_config.TransformerConfig,
dtype: megatron.core.export.data_type.DataType,
multi_query_mode: bool = False,
activation: str = 'gelu',
scales: Optional[dict] = None,
)#

The TRTLLM Converter class used for GPU (on device) conversion

This class converts model weights on device, for models that are already sharded across GPUs. It assumes the model is sharded to match the desired export configuration: for example, to export with tp2pp2, load the model with tensor parallel size 2 and pipeline parallel size 2 and pass in the corresponding state dictionaries.

Initialization

Constructor for the DistributedTRTLLMModelWeightsConverter class

This class is responsible for converting the model weights to their TRTLLM equivalents.

Parameters:
  • transformer_config (TransformerConfig) – The transformer config

  • dtype (DataType) – The data type or model precision

  • multi_query_mode (bool, optional) – Whether the model uses multi-query attention. Defaults to False.

  • activation (str, optional) – The activation function used by the model. Defaults to “gelu”.

  • scales (dict, optional) – Dictionary with fp8 scaling factors.

_add_to_trtllm_model_weights(val: torch.Tensor, layer_name: str)#
_convert_transformer_layer(layer_name: str, val: torch.Tensor)#

Convert Transformer layers to TRTLLM weights

Transformer layers are layers within the transformer block; each has a layer number associated with it. Depending on the layer, we either save it directly to trtllm_model_weights or split it along some dimension and save the splits.

Parameters:
  • layer_name (str) – The TRTLLM layer name

  • val (torch.Tensor) – The weight tensor to convert
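
To illustrate the per-layer split mentioned above, here is a minimal sketch (an assumption about the mechanics, not the actual implementation) of splitting a weight along one dimension across tensor-parallel ranks, with each rank keeping only its own shard:

```python
# Hedged sketch: split a 2-D weight (as nested lists) along dim 0 across
# tensor-parallel ranks; each rank keeps only its shard of the rows.
def tp_split_sketch(weight, tp_size, tp_rank):
    rows = len(weight)
    assert rows % tp_size == 0, "dimension must divide evenly across ranks"
    shard = rows // tp_size
    return weight[tp_rank * shard:(tp_rank + 1) * shard]

# A 4-row weight split across 2 ranks: rank 0 keeps rows 0-1, rank 1 rows 2-3.
w = [[1, 1], [2, 2], [3, 3], [4, 4]]
print(tp_split_sketch(w, tp_size=2, tp_rank=1))  # → [[3, 3], [4, 4]]
```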

_convert_non_transformer_layer(
model_state_dict: dict,
layer_name: str,
)#

Convert Non Transformer layers to TRTLLM weights

Non-transformer layers are layers that occur only once in the model (e.g. the embedding and final output layers) and have no layer number associated with them. We remove the layer from the original state dict, cast it to the storage type, convert it to numpy, and add it to trtllm_model_weights.

Parameters:
  • model_state_dict (dict) – The input model state dictionary (All collected on CPU)

  • layer_name (str) – The TRTLLM layer name
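
The flow described above can be sketched as follows; plain dicts and lists stand in for the real state dict and tensors, and all names are illustrative assumptions:

```python
# Hedged sketch: pop a one-off layer from the state dict and record it in the
# converter's output mapping. The real code additionally casts the tensor to
# the storage dtype and converts it to numpy before storing it.
def convert_non_transformer_layer_sketch(model_state_dict, layer_name, out):
    if layer_name in model_state_dict:
        val = model_state_dict.pop(layer_name)
        out[layer_name] = val  # real code: val.to(storage_type).numpy()
    return out

state = {"embedding.word_embeddings.weight": [0.1, 0.2]}
out = convert_non_transformer_layer_sketch(
    state, "embedding.word_embeddings.weight", {}
)
print(out, state)  # the layer has been moved out of the state dict
```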

_get_remove_vocab_padding(
layer_name,
model_state_dict,
tokenizer_vocab_size,
)#
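
Embedding tables are commonly padded beyond the tokenizer's vocabulary size (e.g. to a multiple convenient for tensor parallelism); a method like this strips that padding back out. A minimal sketch of the idea, with list rows standing in for tensor rows (an assumption about the mechanics, not the actual implementation):

```python
# Hedged sketch: drop padding rows so the embedding matches the tokenizer's
# vocab size. The real method operates on sharded torch tensors.
def remove_vocab_padding_sketch(embedding_rows, tokenizer_vocab_size):
    return embedding_rows[:tokenizer_vocab_size]

padded = [[i] for i in range(8)]  # table padded to 8 rows
print(remove_vocab_padding_sketch(padded, 5))  # → [[0], [1], [2], [3], [4]]
```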
convert(
model_state_dict: dict,
trtllm_conversion_dict: dict,
tokenizer_vocab_size: int,
)#

Convert model weights to trtllm model weights

This method goes through each layer in the model state dict and converts it to the equivalent trtllm model weights. It also handles splitting across the tensor-parallel dimension, expert splitting, and so on.

Parameters:
  • model_state_dict (dict) – The full model state dict (all on CPU)

  • trtllm_conversion_dict (dict) – The conversion dictionary used to convert model layer names to trtllm layer names

  • tokenizer_vocab_size (int) – The vocab size of the tokenizer
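
Putting the pieces together, the conversion loop can be sketched as follows. This is a simplification under assumed names (the layer-number heuristic and the dict shapes are illustrative); the real method also performs TP and expert splitting and the casts described above:

```python
import re

# Hedged sketch of the convert() flow: rename each layer via the conversion
# dict, then dispatch on whether the trtllm name carries a layer number.
def convert_sketch(model_state_dict, trtllm_conversion_dict):
    transformer, non_transformer = {}, {}
    for name, val in model_state_dict.items():
        trtllm_name = trtllm_conversion_dict.get(name, name)
        if re.search(r"\.\d+\.", trtllm_name):
            # has a layer number -> handled as a transformer layer
            transformer[trtllm_name] = val
        else:
            # occurs once in the model -> handled as a non-transformer layer
            non_transformer[trtllm_name] = val
    return transformer, non_transformer

state = {
    "decoder.layers.0.attn.weight": "w0",
    "embedding.word_embeddings.weight": "emb",
}
t, nt = convert_sketch(state, {})
print(sorted(t), sorted(nt))
```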