nemo_export.trt_llm.qnemo.qnemo_to_tensorrt_llm
Module Contents
Functions
qnemo_to_tensorrt_llm | Build a TensorRT-LLM engine with the trtllm-build command in a subprocess.
API
nemo_export.trt_llm.qnemo.qnemo_to_tensorrt_llm.qnemo_to_tensorrt_llm(
    nemo_checkpoint_path: str,
    engine_dir: str,
    max_input_len: int,
    max_seq_len: Optional[int],
    max_batch_size: int,
    max_prompt_embedding_table_size: int,
    tensor_parallel_size: Optional[int] = None,
    pipeline_parallel_size: Optional[int] = None,
    use_parallel_embedding: bool = False,
    paged_kv_cache: bool = True,
    use_paged_context_fmha: bool = True,
    remove_input_padding: bool = True,
    use_lora_plugin: Optional[str] = None,
    lora_target_modules: Optional[List[str]] = None,
    max_lora_rank: int = 64,
    max_num_tokens: Optional[int] = None,
    opt_num_tokens: Optional[int] = None,
    max_beam_width: int = 1,
    multiple_profiles: bool = False,
    reduce_fusion: bool = True,
)
Build a TensorRT-LLM engine with the trtllm-build command in a subprocess.
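The signature has six parameters without defaults, so a call must supply all of them (max_seq_len is typed Optional[int] but still has to be passed explicitly). The following is a minimal sketch of one way to assemble and check the arguments before calling the function. The paths and sizes are hypothetical placeholders, and the helpers missing_required and build_engine are illustrative names that are not part of nemo_export. The import is deferred so the argument handling can be read and run without NeMo installed.

```python
from typing import Any, Dict, List

# Parameters that have no default in the documented signature.
REQUIRED_PARAMS = (
    "nemo_checkpoint_path",
    "engine_dir",
    "max_input_len",
    "max_seq_len",
    "max_batch_size",
    "max_prompt_embedding_table_size",
)


def missing_required(args: Dict[str, Any]) -> List[str]:
    """Return the required parameter names that are absent from args."""
    return [name for name in REQUIRED_PARAMS if name not in args]


# Hypothetical values, chosen only for illustration.
build_args: Dict[str, Any] = {
    "nemo_checkpoint_path": "/path/to/model.qnemo",  # placeholder path
    "engine_dir": "/path/to/engine_dir",             # placeholder path
    "max_input_len": 2048,
    "max_seq_len": 4096,
    "max_batch_size": 8,
    "max_prompt_embedding_table_size": 0,
    # Optional parameters; anything omitted keeps its documented default.
    "tensor_parallel_size": 1,
    "pipeline_parallel_size": 1,
    "max_beam_width": 1,
}


def build_engine(args: Dict[str, Any]) -> None:
    """Check the arguments, then call qnemo_to_tensorrt_llm (illustrative wrapper)."""
    missing = missing_required(args)
    if missing:
        raise ValueError(f"Missing required arguments: {', '.join(missing)}")
    # Deferred import: requires an environment with nemo_export and TensorRT-LLM.
    from nemo_export.trt_llm.qnemo.qnemo_to_tensorrt_llm import qnemo_to_tensorrt_llm

    qnemo_to_tensorrt_llm(**args)


# build_engine(build_args)  # uncomment where nemo_export is installed
```

Passing the arguments by keyword keeps the call readable given the long parameter list and makes it harder to swap adjacent integer arguments such as max_input_len and max_seq_len.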