nemo_export.trt_llm.qnemo.qnemo_to_tensorrt_llm

Module Contents

Functions

qnemo_to_tensorrt_llm

Build a TensorRT-LLM engine by running the trtllm-build command in a subprocess.

API

nemo_export.trt_llm.qnemo.qnemo_to_tensorrt_llm.qnemo_to_tensorrt_llm(
    nemo_checkpoint_path: str,
    engine_dir: str,
    max_input_len: int,
    max_seq_len: Optional[int],
    max_batch_size: int,
    max_prompt_embedding_table_size: int,
    tensor_parallel_size: Optional[int] = None,
    pipeline_parallel_size: Optional[int] = None,
    use_parallel_embedding: bool = False,
    paged_kv_cache: bool = True,
    use_paged_context_fmha: bool = True,
    remove_input_padding: bool = True,
    use_lora_plugin: Optional[str] = None,
    lora_target_modules: Optional[List[str]] = None,
    max_lora_rank: int = 64,
    max_num_tokens: Optional[int] = None,
    opt_num_tokens: Optional[int] = None,
    max_beam_width: int = 1,
    multiple_profiles: bool = False,
    reduce_fusion: bool = True,
)

Build a TensorRT-LLM engine by running the trtllm-build command in a subprocess.
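
The following is a minimal sketch of how a call might be assembled from the signature above. The paths and sizes are hypothetical placeholders, and the helper names `example_build_kwargs` and `build_engine` are introduced here for illustration only; they are not part of the module. The import is deferred so the snippet can be loaded on a machine without NeMo export or TensorRT-LLM installed.

```python
from typing import Any, Dict


def example_build_kwargs() -> Dict[str, Any]:
    """Collect keyword arguments that match the documented signature.

    Every value here is an illustrative assumption, not a recommended setting.
    """
    return {
        # Required parameters (no defaults in the signature).
        "nemo_checkpoint_path": "/path/to/qnemo_checkpoint",  # hypothetical path
        "engine_dir": "/path/to/engine_dir",  # hypothetical path
        "max_input_len": 2048,
        "max_seq_len": 4096,
        "max_batch_size": 8,
        "max_prompt_embedding_table_size": 0,
        # Optional parameters; anything omitted keeps its documented default.
        "tensor_parallel_size": 1,
        "pipeline_parallel_size": 1,
        "max_beam_width": 1,
    }


def build_engine(kwargs: Dict[str, Any]) -> None:
    """Call qnemo_to_tensorrt_llm with the given keyword arguments."""
    # Deferred import: requires the nemo_export package and its dependencies.
    from nemo_export.trt_llm.qnemo.qnemo_to_tensorrt_llm import (
        qnemo_to_tensorrt_llm,
    )

    qnemo_to_tensorrt_llm(**kwargs)
```

With the package installed and real paths substituted, `build_engine(example_build_kwargs())` would start the build. Passing arguments by keyword keeps the call readable given the long parameter list.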