Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.

Model Export to TensorRT-LLM

To enable the export stage for a NeVa model, update the following configuration files:

  1. In the defaults section of conf/config.yaml, update the export field to point to the desired NeVa export configuration file. For example, to use the neva/export_neva configuration, set the export field to neva/export_neva.

    defaults:
      - export: neva/export_neva
      ...
    
  2. In the stages field of conf/config.yaml, make sure the export stage is included. For example:

    stages:
      - export
      ...
    
  3. In the conf/export/neva/export_neva.yaml file, set infer.max_input_len and infer.max_output_len to the maximum input and output lengths to use for the NVIDIA TensorRT-LLM model.
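
     For example (a sketch; the values below are illustrative, not recommended defaults):

    infer:
      max_input_len: 4096   # illustrative: maximum number of input tokens
      max_output_len: 256   # illustrative: maximum number of generated tokens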

Remarks:

  1. To load a pretrained checkpoint for inference, set the restore_from_path field in the model section of conf/export/neva/export_neva.yaml to the path of the pretrained checkpoint in .nemo format.
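
     For example (the checkpoint path below is a placeholder):

    model:
      restore_from_path: /path/to/pretrained_neva.nemo  # placeholder: path to your .nemo checkpoint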

  2. Only max_batch_size: 1 is supported for now.
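
     Assuming max_batch_size sits under the same infer section as the other infer.* keys, this looks like:

    infer:
      max_batch_size: 1  # batch sizes greater than 1 are not yet supported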