core.export.trtllm.trtllm_weights_converter.distributed_trtllm_model_weights_converter#
Module Contents#
Classes#
The TRTLLM Converter class used for GPU (on device) conversion |
Functions#
Get torch datatype from input datatype |
API#
- core.export.trtllm.trtllm_weights_converter.distributed_trtllm_model_weights_converter.str_dtype_to_torch(dtype: megatron.core.export.data_type.DataType)#
Get torch datatype from input datatype
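A minimal sketch of what such a mapping might look like. The `DataType` members and the mapping below are illustrative assumptions (the real function takes `megatron.core.export.data_type.DataType` and returns actual `torch` dtypes, not strings):

```python
from enum import Enum


# Hypothetical stand-in for megatron.core.export.data_type.DataType
class DataType(Enum):
    float32 = "float32"
    float16 = "float16"
    bfloat16 = "bfloat16"


# Assumed mapping; the real function returns torch dtypes
# (e.g. torch.bfloat16), represented here as strings.
_DTYPE_TO_TORCH = {
    DataType.float32: "torch.float32",
    DataType.float16: "torch.float16",
    DataType.bfloat16: "torch.bfloat16",
}


def str_dtype_to_torch(dtype: DataType) -> str:
    """Look up the torch dtype for an export DataType."""
    try:
        return _DTYPE_TO_TORCH[dtype]
    except KeyError:
        raise ValueError(f"Unsupported dtype: {dtype}")
```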
- class core.export.trtllm.trtllm_weights_converter.distributed_trtllm_model_weights_converter.DistributedTRTLLMModelWeightsConverter(
- transformer_config: megatron.core.transformer.transformer_config.TransformerConfig,
- dtype: megatron.core.export.data_type.DataType,
- multi_query_mode: bool = False,
- activation: str = 'gelu',
- scales: Optional[dict] = None,
The TRTLLM Converter class used for GPU (on device) conversion
This class converts models that are sharded and resident on GPUs. It assumes the model is already sharded to match the desired export configuration: for example, to export with tp2pp2, load the model in a tp2pp2 setting and pass in the corresponding state dictionaries.
Initialization
Constructor for the DistributedTRTLLMModelWeightsConverter class
This class is responsible for converting model weights to their TRTLLM equivalents.
- Parameters:
transformer_config (TransformerConfig) – The transformer config
dtype (DataType) – The data type or model precision
multi_query_mode (bool, optional) – Defaults to False.
activation (str, optional) – Defaults to “gelu”.
scales (dict, optional) – Dictionary with fp8 scaling factors.
- _add_to_trtllm_model_weights(val: torch.Tensor, layer_name: str)#
- _convert_transformer_layer(layer_name: str, val: torch.Tensor)#
Convert Transformer layers to TRTLLM weights
Transformer layers are layers within the transformer block. They have a layer number associated with them. Depending on the layer, we either save it directly to trtllm_model_weights, or split it across some dimension and save the splits.
- Parameters:
layer_name (str) – The TRTLLM layer name to store the weight under
val (torch.Tensor) – The weight tensor for that layer
- _convert_non_transformer_layer(
- model_state_dict: dict,
- layer_name: str,
Convert Non Transformer layers to TRTLLM weights
Non-transformer layers are layers that occur only once in the model (e.g. the embedding and the final output layer). They have no layer number associated with them. We remove such a layer from the original state dict, cast it to the storage type, convert it to numpy, and add it to trtllm_model_weights.
- Parameters:
model_state_dict (dict) – The input model state dictionary (All collected on CPU)
layer_name (str) – The name of the layer to convert
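The pop-and-store step described above can be sketched as follows. This is a simplified stand-in: the real method also casts the popped tensor to the configured storage dtype and converts it to numpy, which is elided here, and the layer names are illustrative:

```python
def convert_non_transformer_layer(model_state_dict, layer_name, trtllm_model_weights):
    """Remove layer_name from the source state dict and store its weight
    in the TRTLLM weights dict.

    The real converter additionally casts to the storage dtype and
    converts the torch tensor to numpy before storing it.
    """
    if layer_name in model_state_dict:
        # Non-transformer layers occur exactly once, so a plain pop suffices.
        val = model_state_dict.pop(layer_name)
        trtllm_model_weights[layer_name] = val


# Illustrative state dict with hypothetical layer names
state = {"embedding.weight": [0.1, 0.2], "final_norm.weight": [1.0]}
out = {}
convert_non_transformer_layer(state, "embedding.weight", out)
# "embedding.weight" has moved from state into out
```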
- _get_remove_vocab_padding(
- layer_name,
- model_state_dict,
- tokenizer_vocab_size,
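Embedding matrices are commonly padded so the vocab dimension divides evenly across TP ranks; this helper presumably trims that padding back to the tokenizer's true vocab size. A hedged pure-Python sketch of the slice the real method performs on a torch tensor (the helper name and row representation are illustrative):

```python
def remove_vocab_padding(embedding_rows, tokenizer_vocab_size):
    """Drop padded rows beyond the tokenizer's true vocab size.

    embedding_rows: list of row vectors, length = padded vocab size.
    """
    assert len(embedding_rows) >= tokenizer_vocab_size
    return embedding_rows[:tokenizer_vocab_size]


# Padded to 8 rows, while the tokenizer's true vocab size is 6
padded = [[i] for i in range(8)]
trimmed = remove_vocab_padding(padded, 6)
# trimmed keeps only the first 6 rows
```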
- convert(
- model_state_dict: dict,
- trtllm_conversion_dict: dict,
- tokenizer_vocab_size: int,
Convert model weights to trtllm model weights
This method goes through each layer in the model state dict and converts it to the equivalent TRTLLM model weights. It also handles splitting across the TP dimension, expert splits, etc.
- Parameters:
model_state_dict (dict) – The full model state dict (all on CPU)
trtllm_conversion_dict (dict) – The conversion dictionary used to convert model layer names to trtllm layer names
tokenizer_vocab_size (int) – The vocab size of the tokenizer
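At a high level, the name-mapping part of convert can be sketched as a rename pass driven by the conversion dict. The layer names and the shape of the dict below are assumptions for illustration; the real convert() additionally splits weights across TP ranks and handles experts and vocab padding:

```python
def rename_layers(model_state_dict, trtllm_conversion_dict):
    """Map source layer names to TRTLLM layer names.

    Keys absent from the conversion dict are kept unchanged;
    splitting and dtype handling are elided in this sketch.
    """
    return {
        trtllm_conversion_dict.get(name, name): weight
        for name, weight in model_state_dict.items()
    }


# Hypothetical source name mapped to a hypothetical TRTLLM name
state = {"decoder.final_layernorm.weight": [1.0]}
mapping = {"decoder.final_layernorm.weight": "transformer.ln_f.weight"}
renamed = rename_layers(state, mapping)
```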