`nemo_export.vllm_hf_exporter`

Module Contents

The exporter class uses vLLM APIs to convert a Hugging Face (HF) model to vLLM and makes the class deployable with Triton server.

class nemo_export.vllm_hf_exporter.vLLMHFExporter

Bases: nemo_deploy.ITritonDeployable

The exporter class uses vLLM APIs to convert a Hugging Face (HF) model to vLLM and makes the class deployable with Triton server.

Example

    from nemo_export import vLLMHFExporter
    from nemo_deploy import DeployPyTriton

    exporter = vLLMHFExporter()
    exporter.export(model="/path/to/model/")

    server = DeployPyTriton(
        model=exporter,
        triton_model_name='model'
    )

    server.deploy()
    server.serve()
    server.stop()
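The example calls `deploy()`, `serve()`, and `stop()` in sequence. As a minimal sketch, the sequence can be wrapped so that `stop()` is still attempted if `serve()` is interrupted. The helper name `serve_and_stop` and the `TritonServerLike` type are hypothetical and not part of `nemo_deploy`. The sketch assumes only the three methods shown in the example, and that calling `stop()` after an interrupted `serve()` is safe.

```python
from typing import Protocol


class TritonServerLike(Protocol):
    """Hypothetical structural type: only the three methods used in the example."""

    def deploy(self) -> None: ...

    def serve(self) -> None: ...

    def stop(self) -> None: ...


def serve_and_stop(server: TritonServerLike) -> None:
    """Deploy, serve, and always attempt stop(), even if serve() raises."""
    server.deploy()
    try:
        server.serve()
    finally:
        # Assumption: stop() is safe to call after an interrupted serve().
        server.stop()
```

With the objects from the example, this would be called as `serve_and_stop(server)` in place of the three separate calls. If `deploy()` itself fails, `stop()` is not called.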

Initialization

export(model, enable_lora: bool = False)

Exports the HF checkpoint to vLLM and initializes the engine.

forward(input_texts: List[str], max_output_len: int = 64, top_k: int = 1, top_p: float = 0.1, temperature: float = 1.0, lora_model_name: str = None)
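The signature above can be exercised with a small wrapper. This is a sketch only: the `generate` helper and its input checks are hypothetical, while the keyword names and default values are copied from the documented `forward()` signature. The return type of `forward()` is not stated here, so the wrapper passes the result back unchanged.

```python
from typing import Any, List, Optional


def generate(
    exporter: Any,
    prompts: List[str],
    max_output_len: int = 64,
    top_k: int = 1,
    top_p: float = 0.1,
    temperature: float = 1.0,
    lora_model_name: Optional[str] = None,
) -> Any:
    """Hypothetical wrapper: check inputs, then call exporter.forward()."""
    # These checks belong to this sketch, not to the library.
    if not prompts:
        raise ValueError("prompts must contain at least one string")
    if max_output_len <= 0:
        raise ValueError("max_output_len must be positive")

    # Keyword names and defaults mirror the documented forward() signature.
    return exporter.forward(
        input_texts=prompts,
        max_output_len=max_output_len,
        top_k=top_k,
        top_p=top_p,
        temperature=temperature,
        lora_model_name=lora_model_name,
    )
```

After `exporter.export(model="/path/to/model/")`, a call might look like `generate(exporter, ["What is Triton?"], max_output_len=128)`.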