Important
NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.
Model Export to TensorRT-LLM
To enable the export stage with a NeVa model, update the following configuration files:
1. In the defaults section of conf/config.yaml, update the export field to point to the desired NeVa configuration file. For example, to use the neva/export_neva configuration, set the export field to neva/export_neva:

```yaml
defaults:
  - export: neva/export_neva
  ...
```
2. In the stages field of conf/config.yaml, make sure the export stage is included. For example:

```yaml
stages:
  - export
  ...
```
3. Configure infer.max_input_len and infer.max_output_len in conf/export/neva/export_neva.yaml to set the maximum input and output sequence lengths for the NVIDIA TensorRT-LLM model, as sketched below.
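As an illustration, here is a minimal sketch of the infer section in conf/export/neva/export_neva.yaml. The length values are placeholder assumptions to adapt to your workload, not shipped defaults:

```yaml
infer:
  max_input_len: 4096   # assumed example value: longest prompt (in tokens) the engine accepts
  max_output_len: 256   # assumed example value: longest generation (in tokens) the engine produces
  max_batch_size: 1     # only 1 is supported for now (see Remarks)
```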
Remarks:
1. To load a pretrained checkpoint for inference, set the restore_from_path field in the model section of conf/export/neva/export_neva.yaml to the path of the pretrained checkpoint in .nemo format (see the sketch after these remarks).
2. Only max_batch_size: 1 is supported for now.
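For reference, a hedged sketch of the corresponding model section; the checkpoint path below is a hypothetical placeholder:

```yaml
model:
  restore_from_path: /path/to/neva_model.nemo  # hypothetical path; point to your pretrained .nemo checkpoint
```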