Important
You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.
Model Export to TensorRT-LLM
To enable the export stage with a NeVa model, update the configuration files as follows:
1. In the `defaults` section of `conf/config.yaml`, update the `export` field to point to the desired NeVa configuration file. For example, to use the `neva/export_neva` configuration, set the `export` field to `neva/export_neva`:

   ```yaml
   defaults:
     - export: neva/export_neva
     ...
   ```
2. In the `stages` field of `conf/config.yaml`, make sure the `export` stage is included:

   ```yaml
   stages:
     - export
     ...
   ```
3. Set `infer.max_input_len` and `infer.max_output_len` in `conf/export/neva/export_neva.yaml` to the maximum input and output lengths to use for the NVIDIA TensorRT-LLM model.
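Putting the inference settings together, the relevant part of `conf/export/neva/export_neva.yaml` might look like the following sketch. The numeric values here are illustrative assumptions, not documented defaults; tune them for your workload:

```yaml
# Sketch of the infer section in conf/export/neva/export_neva.yaml
# (values below are illustrative assumptions)
infer:
  max_input_len: 2048    # maximum prompt length for the TensorRT-LLM engine
  max_output_len: 1024   # maximum number of generated tokens
  max_batch_size: 1      # only a batch size of 1 is supported for now
```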
Remarks:

- To load a pretrained checkpoint for inference, set the `restore_from_path` field in the `model` section of `conf/export/neva/export_neva.yaml` to the path of the pretrained checkpoint in `.nemo` format.
- Only `max_batch_size: 1` is supported for now.
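As a sketch, the checkpoint path from the first remark would be set like this. The path shown is a placeholder, not a real checkpoint location:

```yaml
# Sketch of the model section in conf/export/neva/export_neva.yaml
model:
  # placeholder path; point this at your own pretrained .nemo checkpoint
  restore_from_path: /path/to/neva_checkpoint.nemo
```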