
Model Export to TensorRT-LLM

To enable the export stage for a NeVa model, update the configuration files as follows:

  1. In the defaults section of conf/config.yaml, update the export field to point to the desired NeVa configuration file. For example, if you want to use the neva/export_neva configuration, change the export field to neva/export_neva.

    defaults:
      - export: neva/export_neva
      ...
    
  2. In the stages field of conf/config.yaml, make sure the export stage is included. For example:

    stages:
      - export
      ...
    
  3. In conf/export/neva/export_neva.yaml, set infer.max_input_len and infer.max_output_len to the maximum input and output lengths to use for the NVIDIA TensorRT-LLM model, as sketched after this list.
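For example, a minimal sketch of the relevant infer settings (the values shown are illustrative, and the placement of max_batch_size under infer is an assumption):

    infer:
      max_input_len: 2048   # example value: maximum input length, in tokens
      max_output_len: 1024  # example value: maximum generated output length, in tokens
      max_batch_size: 1     # only a batch size of 1 is supported (see remarks below)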

Remarks:

  1. To load a pretrained checkpoint for inference, set the restore_from_path field in the model section of conf/export/neva/export_neva.yaml to the path of the pretrained checkpoint in .nemo format, as sketched after this list.

  2. Only max_batch_size: 1 is supported for now.
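
As a sketch of remark 1 (the checkpoint path shown is a placeholder):

    model:
      restore_from_path: /path/to/pretrained_neva.nemo  # placeholder path to the pretrained checkpoint in .nemo format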