core.export.trtllm.trtllm_weights_converter.single_device_trtllm_model_weights_converter#
Module Contents#
Classes#
- SingleDeviceTRTLLMModelWeightsConverter – Class to convert Model weights to TRTLLM weights on CPU
Functions#
- pad_vocab_size – Pad vocab size based on inference size
- str_dtype_to_torch – Get torch datatype from input datatype
API#
- core.export.trtllm.trtllm_weights_converter.single_device_trtllm_model_weights_converter.pad_vocab_size(vocab_size: int, tp_size: int)#
Pad vocab size based on the inference tensor-parallel size
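A minimal sketch of the padding logic this helper implies (the exact formula is an assumption): round the vocab size up to the next multiple of the tensor-parallel size so the embedding table divides evenly across TP ranks.

```python
# Hypothetical re-implementation for illustration only; the real function
# lives in trtllm_weights_converter and may differ in details.
def pad_vocab_size(vocab_size: int, tp_size: int) -> int:
    # Round up to the nearest multiple of tp_size.
    return ((vocab_size + tp_size - 1) // tp_size) * tp_size

print(pad_vocab_size(32000, 8))
print(pad_vocab_size(32003, 8))
```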
- core.export.trtllm.trtllm_weights_converter.single_device_trtllm_model_weights_converter.str_dtype_to_torch(dtype: megatron.core.export.data_type.DataType)#
Get torch datatype from input datatype
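Conceptually this is a lookup from the export `DataType` to the corresponding torch dtype. The sketch below is an assumed illustration that uses plain strings on both sides to stay dependency-free; the real function takes a `DataType` enum and returns actual `torch.dtype` objects.

```python
# Assumed mapping for illustration; the supported set in the real
# converter may differ.
_DTYPE_MAP = {
    "bfloat16": "torch.bfloat16",
    "float16": "torch.float16",
    "float32": "torch.float32",
}

def str_dtype_to_torch(name: str) -> str:
    # Fail loudly on unsupported precisions rather than guessing.
    try:
        return _DTYPE_MAP[name]
    except KeyError:
        raise ValueError(f"Unsupported dtype: {name}")
```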
- class core.export.trtllm.trtllm_weights_converter.single_device_trtllm_model_weights_converter.SingleDeviceTRTLLMModelWeightsConverter(
- export_config: megatron.core.export.export_config.ExportConfig,
- transformer_config: megatron.core.transformer.transformer_config.TransformerConfig,
- dtype: megatron.core.export.data_type.DataType,
- multi_query_mode: bool = False,
- activation: str = 'gelu',
- scales: Optional[dict] = None,

)#
Class to convert Model weights to TRTLLM weights on CPU
Initialization
Constructor for the SingleDeviceTRTLLMModelWeightsConverter class
This class is responsible for converting the model weights to their TRTLLM equivalents, splitting them for each GPU rank, and returning them as a list.
- Parameters:
export_config (ExportConfig) – The export config with inference tp size, pp size etc.
transformer_config (TransformerConfig) – The transformer config
dtype (DataType) – The data type or model precision
multi_query_mode (bool, optional) – Defaults to False.
activation (str, optional) – Defaults to “gelu”.
scales (dict, optional) – Dictionary with fp8 scaling factors.
- _convert_non_transformer_layer(
- model_state_dict: dict,
- layer_name: str,

)#
Convert Non Transformer layers to TRTLLM weights
Non transformer layers refers to layers that occur only once in the model (e.g. the embedding and the final output layer). They don't have a layer number associated with them. We remove the layer from the original state dict, cast it to the storage type, convert it to numpy, and add it to trtllm_model_weights.
- Parameters:
model_state_dict (dict) – The input model state dictionary (All collected on CPU)
layer_name (str) – The TRTLLM Layer name that we want to convert
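The pop-and-convert step described above can be sketched as follows. This is an assumed simplification: real values are torch tensors cast to a storage type and converted to numpy, while here plain Python objects stand in for them.

```python
# Illustrative sketch; function and argument names mirror the docs but the
# body is hypothetical.
def convert_non_transformer_layer(model_state_dict: dict,
                                  layer_name: str,
                                  trtllm_model_weights: dict) -> None:
    if layer_name in model_state_dict:
        # Remove from the source dict so later passes only see
        # transformer layers.
        val = model_state_dict.pop(layer_name)
        # Real code casts to the storage dtype and converts to numpy here.
        trtllm_model_weights[layer_name] = val
```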
- _cast_value(val: torch.Tensor, layer_name: str) torch.Tensor#
Casts weights to the expected datatype. When an appropriate scaling factor is found in self.scales, the weight is scaled before the cast.
- Parameters:
val (torch.Tensor) – Model weight
layer_name (str) – Layer name, used for determining the scaling factor dictionary key
- Returns:
The cast weight
- Return type:
torch.Tensor
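The scale-then-cast order matters: scaling happens in the original precision, before narrowing to the storage type. A dependency-free sketch of that idea, with lists of floats standing in for tensors and the key scheme into `scales` assumed:

```python
# Hypothetical illustration of _cast_value's scale-then-cast behaviour.
def cast_value(val: list, layer_name: str, scales: dict,
               storage_type=float) -> list:
    scale = scales.get(layer_name)  # assumed key scheme
    if scale is not None:
        # Scale in full precision before the cast.
        val = [v * scale for v in val]
    return [storage_type(v) for v in val]
```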
- _convert_transformer_layer(layer_name: str, val: torch.Tensor)#
Convert Transformer layers to TRTLLM weights
Transformer layers refers to layers within the transformer block. They have a layer number associated with them. Depending on the layer, we either save it directly to trtllm_model_weights or split it along some dimension and save the splits.
- Parameters:
layer_name (str) – The TRTLLM layer name to convert
val (torch.Tensor) – The weight tensor for that layer
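The "split along some dimension" case can be sketched like this: a weight (here a list of rows standing in for a tensor) is chunked evenly, one chunk per tensor-parallel rank. The function name is hypothetical.

```python
# Illustrative even split across TP ranks along the first dimension.
def split_for_tp(weight: list, tp_size: int) -> list:
    n = len(weight)
    assert n % tp_size == 0, "dimension must divide evenly across TP ranks"
    chunk = n // tp_size
    return [weight[i * chunk:(i + 1) * chunk] for i in range(tp_size)]
```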
- convert(
- model_state_dict: dict,
- trtllm_conversion_dict,
- state_dict_split_by_layer_numbers=True,

)#
Convert model weights to trtllm model weights
This method goes through each layer in the model state dict and converts it to the equivalent TRTLLM model weights. It also handles splitting across the TP dimension, expert splits, etc.
- Parameters:
model_state_dict (dict) – The full model state dict (all on CPU)
trtllm_conversion_dict (dict) – The conversion dictionary used to convert model layer names to trtllm layer names
state_dict_split_by_layer_numbers (bool, optional) – Whether the model layers are split by layer number in the state dict. For example, mlp.fc1.weight can be represented as a single tensor of shape [num_layers, hidden_dim, ffn_hidden_dim], or as per-layer entries mlp.fc1.layers.0.weight, mlp.fc1.layers.1.weight, … each of shape [hidden_dim, ffn_hidden_dim]. Set this to True for the second representation. Defaults to True
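The two layouts differ in whether the key carries a layer number. A small assumed helper shows how the split-by-layer-numbers representation can be recognised (the `layers.N.` pattern is inferred from the example keys above):

```python
import re

# Hypothetical helper: extract the layer number from a
# split-by-layer-numbers key, or return None for the stacked layout.
def layer_number(key: str):
    m = re.search(r"layers\.(\d+)\.", key)
    return int(m.group(1)) if m else None
```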
- get_padded_vocab_size() int#
Return the padded vocab size
We extract the lm head and vocab embedding weights and use them to determine padded_vocab_size
- Returns:
Padded vocab size
- Return type:
int
- get_local_model_weights_per_gpu(mapping, trtllm_model_config: dict)#
Get the trtllm model weights split per gpu
Given the TRTLLM mapping information (tp rank, pp rank, etc.), we split the model weights into a list, with each element corresponding to the weights for one GPU rank
- Parameters:
mapping – The trtllm mapping information
trtllm_model_config (dict) – The trtllm model config
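The per-rank selection idea can be sketched as follows, assuming the weights have already been split into per-rank chunks keyed by layer name; the function name and data shapes are hypothetical, not the real API.

```python
# Illustrative sketch: pick out the weights dict belonging to one TP rank
# from pre-split chunks, mirroring the per-GPU-rank result described above.
def weights_for_rank(split_weights: dict, tp_rank: int) -> dict:
    return {name: chunks[tp_rank] for name, chunks in split_weights.items()}
```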