Important

NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.
Model Export to TensorRT-LLM
To enable the export stage for a NeVa model, update the configuration files as follows:
1. In the `defaults` section of `conf/config.yaml`, update the `export` field to point to the desired NeVa configuration file. For example, if you want to use the `neva/export_neva` configuration, change the `export` field to `neva/export_neva`:

   ```yaml
   defaults:
     - export: neva/export_neva
   ...
   ```
2. In the `stages` field of `conf/config.yaml`, make sure the `export` stage is included. For example:

   ```yaml
   stages:
     - export
   ...
   ```
3. Configure `infer.max_input_len` and `infer.max_output_len` in the `conf/export/neva/export_neva.yaml` file to set the maximum input and output lengths for the NVIDIA TensorRT-LLM model, as sketched below.
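For orientation, here is a minimal sketch of the relevant `infer` fields in `conf/export/neva/export_neva.yaml`; the token-length values are illustrative placeholders, not documented defaults:

```yaml
# conf/export/neva/export_neva.yaml (sketch; values are illustrative)
infer:
  max_input_len: 4096   # maximum prompt length, in tokens, the engine accepts
  max_output_len: 256   # maximum number of tokens generated per request
```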
Remarks:

- To load a pretrained checkpoint for inference, set the `restore_from_path` field in the `model` section of `conf/export/neva/export_neva.yaml` to the path of the pretrained checkpoint in `.nemo` format.
- Only `max_batch_size: 1` is supported for now.
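Putting the remarks together, a hedged sketch of how the corresponding fields might look in `conf/export/neva/export_neva.yaml`; the checkpoint path is a placeholder, and the exact placement of `max_batch_size` may differ in your release:

```yaml
# conf/export/neva/export_neva.yaml (sketch; path is a placeholder)
model:
  restore_from_path: /path/to/pretrained_neva.nemo  # pretrained checkpoint in .nemo format

infer:
  max_batch_size: 1   # only a batch size of 1 is supported for now
```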