core.export.trtllm.trtllm_weights_converter.single_device_trtllm_model_weights_converter#

Module Contents#

Classes#

SingleDeviceTRTLLMModelWeightsConverter

Class to convert Model weights to TRTLLM weights on CPU

Functions#

pad_vocab_size

Pad vocab size based on the inference tensor parallel size

str_dtype_to_torch

Get torch datatype from input datatype

API#

core.export.trtllm.trtllm_weights_converter.single_device_trtllm_model_weights_converter.pad_vocab_size(vocab_size: int, tp_size: int)#

Pad vocab size based on the inference tensor parallel size
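The padding rule can be sketched as rounding the vocab size up to the nearest multiple of the tensor-parallel size, so the padded vocab divides evenly across TP ranks (a hypothetical reimplementation, not the module's actual code):

```python
import math

def pad_vocab_size(vocab_size: int, tp_size: int) -> int:
    # Round up so the padded vocab splits evenly across tp_size ranks
    # (hypothetical sketch of the padding rule)
    return tp_size * math.ceil(vocab_size / tp_size)

padded = pad_vocab_size(50257, 8)  # a GPT-2-sized vocab padded for tp_size=8
```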

core.export.trtllm.trtllm_weights_converter.single_device_trtllm_model_weights_converter.str_dtype_to_torch(dtype: megatron.core.export.data_type.DataType)#

Get torch datatype from input datatype
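A minimal sketch of the lookup this performs, using a hypothetical string-keyed table (the real function dispatches on `megatron.core.export.data_type.DataType` values):

```python
import torch

# Hypothetical mapping from export data-type names to torch dtypes
_DTYPE_MAP = {
    "float32": torch.float32,
    "float16": torch.float16,
    "bfloat16": torch.bfloat16,
}

def str_dtype_to_torch(name: str) -> torch.dtype:
    # Translate an export data-type name into the matching torch dtype
    return _DTYPE_MAP[name]
```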

class core.export.trtllm.trtllm_weights_converter.single_device_trtllm_model_weights_converter.SingleDeviceTRTLLMModelWeightsConverter(
export_config: megatron.core.export.export_config.ExportConfig,
transformer_config: megatron.core.transformer.transformer_config.TransformerConfig,
dtype: megatron.core.export.data_type.DataType,
multi_query_mode: bool = False,
activation: str = 'gelu',
scales: Optional[dict] = None,
)#

Class to convert Model weights to TRTLLM weights on CPU

Initialization

Constructor for the SingleDeviceTRTLLMModelWeightsConverter class

This class is responsible for converting the model weights to their TRTLLM equivalents, splitting them for each GPU rank, and returning them as a list.

Parameters:
  • export_config (ExportConfig) – The export config with inference tp size, pp size etc.

  • transformer_config (TransformerConfig) – The transformer config

  • dtype (DataType) – The data type or model precision

  • multi_query_mode (bool, optional) – Defaults to False.

  • activation (str, optional) – Defaults to “gelu”.

  • scales (dict, optional) – Dictionary with fp8 scaling factors.

_convert_non_transformer_layer(
model_state_dict: dict,
layer_name: str,
)#

Convert Non Transformer layers to TRTLLM weights

Non-transformer layers are layers that occur only once in the model (e.g. the embedding or the final output layer) and have no layer number associated with them. The layer is removed from the original state dict, cast to the storage type, converted to numpy, and added to trtllm_model_weights.

Parameters:
  • model_state_dict (dict) – The input model state dictionary (All collected on CPU)

  • layer_name (str) – The TRTLLM Layer name that we want to convert
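The pop, cast, and convert flow described above can be sketched as follows (hypothetical helper and layer names; the actual method is internal to the converter):

```python
import torch

def convert_non_transformer_layer(model_state_dict: dict,
                                  layer_name: str,
                                  trtllm_model_weights: dict,
                                  storage_dtype=torch.float16):
    # Hypothetical sketch: the layer occurs only once, so it is popped
    # from the state dict, cast to the storage type, converted to numpy,
    # and stored under its TRTLLM name.
    if layer_name in model_state_dict:
        val = model_state_dict.pop(layer_name)
        trtllm_model_weights[layer_name] = val.to(storage_dtype).cpu().numpy()

state = {"transformer.vocab_embedding.weight": torch.randn(8, 4)}
out = {}
convert_non_transformer_layer(state, "transformer.vocab_embedding.weight", out)
```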

_cast_value(val: torch.Tensor, layer_name: str) torch.Tensor#

Casts weights to the expected datatype. When an appropriate scaling factor is found in self.scales, the weight is scaled before the cast.

Parameters:
  • val (torch.Tensor) – Model weight

  • layer_name (str) – Layer name, used for determining the scaling factor dictionary key

Returns:

The casted weight

Return type:

torch.Tensor
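The scale-then-cast behavior can be sketched like this (a hypothetical standalone version; the real method looks the scale up in self.scales by layer name):

```python
import torch

def cast_value(val: torch.Tensor, target_dtype, scale=None) -> torch.Tensor:
    # Hypothetical sketch: apply an fp8-style scaling factor when one is
    # available for this layer, then cast to the target dtype.
    if scale is not None:
        val = val * scale
    return val.to(target_dtype)

w = torch.ones(2, 2, dtype=torch.float32)
casted = cast_value(w, torch.float16, scale=0.5)
```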

_convert_transformer_layer(layer_name: str, val: torch.Tensor)#

Convert Transformer layers to TRTLLM weights

Transformer layers are layers within the transformer block; each has a layer number associated with it. Depending on the layer, the weight is either saved directly to trtllm_model_weights, or split along some dimension with the splits saved.

Parameters:
  • layer_name (str) – The TRTLLM layer name that we want to convert

  • val (torch.Tensor) – The model weight for that layer

convert(
model_state_dict: dict,
trtllm_conversion_dict,
state_dict_split_by_layer_numbers=True,
)#

Convert model weights to trtllm model weights

This method goes through each layer in the model state dict and converts it to the equivalent TRTLLM weight. It also handles splitting across the TP dimension, expert splits, etc.

Parameters:
  • model_state_dict (dict) – The full model state dict (all on CPU)

  • trtllm_conversion_dict (dict) – The conversion dictionary used to convert model layer names to trtllm layer names

  • state_dict_split_by_layer_numbers (bool, optional) – Whether the model layers are split by layer numbers in the state dict. For example, mlp.fc1.weight can be represented as a single tensor of shape [num_layers, hidden_dim, ffn_hidden_dim] (representation 1), or as mlp.fc1.layers.0.weight of shape [hidden_dim, ffn_hidden_dim], then mlp.fc1.layers.1.weight, and so on for all layers (representation 2). Set this to True if you use representation 2. Defaults to True
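The tensor-parallel split mentioned above amounts to chunking a weight along one dimension, one chunk per TP rank. A minimal sketch (hypothetical; the real method chooses the split dimension per layer name):

```python
import torch

def split_for_tp(weight: torch.Tensor, tp_size: int, dim: int = 0):
    # One shard per tensor-parallel rank along the chosen dimension
    return torch.chunk(weight, tp_size, dim=dim)

w = torch.randn(8, 4)        # e.g. a column-parallel weight
shards = split_for_tp(w, 2)  # two shards of shape [4, 4]
```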

get_padded_vocab_size() int#

Return the padded vocab size

We extract the lm head and vocab embedding and use them to determine padded_vocab_size

Returns:

Padded vocab size

Return type:

int

get_local_model_weights_per_gpu(mapping, trtllm_model_config: dict)#

Get the trtllm model weights split per GPU

Given the trtllm mapping information (tp rank, pp rank, etc.), the model weights are split into a list, with each element corresponding to the weights of one GPU rank

Parameters:
  • mapping – The trtllm mapping information

  • trtllm_model_config (dict) – The trtllm model config
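The per-rank selection can be sketched as picking each layer's shard for a given TP rank out of pre-split weights (hypothetical helper and data layout; the real method consults the trtllm mapping and model config):

```python
def local_weights_for_rank(trtllm_model_weights: dict, tp_rank: int) -> dict:
    # Hypothetical sketch: assume each entry is either a list of per-rank
    # shards or a single value replicated on all ranks; select this rank's.
    return {
        name: shards[tp_rank] if isinstance(shards, list) else shards
        for name, shards in trtllm_model_weights.items()
    }

weights = {"fc1.weight": ["shard0", "shard1"], "ln.bias": "replicated"}
rank1 = local_weights_for_rank(weights, 1)
```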