nemo_export.vllm_hf_exporter#

Module Contents#

Classes#

vLLMHFExporter

The exporter class uses vLLM APIs to convert a Hugging Face (HF) model to vLLM and makes the exporter deployable with Triton Inference Server.

API#

class nemo_export.vllm_hf_exporter.vLLMHFExporter[source]#

Bases: nemo_deploy.ITritonDeployable

The exporter class uses vLLM APIs to convert a Hugging Face (HF) model to vLLM and makes the exporter deployable with Triton Inference Server.

Example

    from nemo_export import vLLMHFExporter
    from nemo_deploy import DeployPyTriton

    exporter = vLLMHFExporter()
    exporter.export(model="/path/to/model/")

    server = DeployPyTriton(model=exporter, triton_model_name="model")

    server.deploy()
    server.serve()
    server.stop()
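The example calls deploy(), serve(), and stop() in sequence. The following is a minimal sketch of wrapping that lifecycle so that stop() still runs if serving is interrupted. `run_server` is a hypothetical helper, not part of nemo_deploy; it assumes only that the server object exposes the three methods used in the example.

```python
def run_server(server):
    """Deploy and serve, always attempting stop() afterwards.

    `server` is any object with deploy(), serve(), and stop() methods,
    such as the DeployPyTriton instance from the example.
    """
    server.deploy()
    try:
        server.serve()
    except KeyboardInterrupt:
        # Assumption: Ctrl+C is treated as a normal shutdown request.
        pass
    finally:
        # Runs whether serve() returned, was interrupted, or raised.
        server.stop()
```

Usage would be `run_server(server)` in place of the three separate calls. Errors other than KeyboardInterrupt still propagate after stop() has been called.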

Initialization

export(model, enable_lora: bool = False)[source]#

Exports the HF checkpoint to vLLM and initializes the engine.

Parameters:

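model (str) – Hugging Face model name or path to a local checkpoint.

enable_lora (bool) – whether to enable LoRA support in the engine. Defaults to False.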
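As a rough sketch of how export() might be combined with add_lora_models(), the helper below exports a model and then registers adapters by name. `export_with_adapters` and the name-to-location mapping are hypothetical. The sketch also assumes that LoRA has to be enabled at export time before adapters are added, and that `lora_model` accepts a path-like value; neither is confirmed by this page.

```python
def export_with_adapters(exporter, model, lora_adapters=None):
    """Export `model` and register optional LoRA adapters.

    `lora_adapters` maps an adapter name to its location. This mapping is
    a hypothetical convention; the accepted form of `lora_model` is not
    documented on this page.
    """
    lora_adapters = lora_adapters or {}
    # Assumption: enable_lora must be True at export time if adapters follow.
    exporter.export(model=model, enable_lora=bool(lora_adapters))
    for name, adapter in lora_adapters.items():
        exporter.add_lora_models(lora_model_name=name, lora_model=adapter)
    return exporter
```

With a real exporter this could be called as `export_with_adapters(vLLMHFExporter(), "/path/to/model/", {"my_adapter": "/path/to/adapter/"})`, where both paths are placeholders.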

add_lora_models(lora_model_name, lora_model)[source]#

property get_triton_input#

property get_triton_output#

triton_infer_fn(**inputs: numpy.ndarray)[source]#

forward(
input_texts: List[str],
max_output_len: int = 64,
top_k: int = 1,
top_p: float = 0.1,
temperature: float = 1.0,
lora_model_name: str = None,
)[source]#
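To illustrate the forward() signature, here is a minimal sketch that sends prompts in fixed-size batches and passes sampling options through by keyword. `forward_in_batches` and `batch_size` are hypothetical and not part of nemo_export. The return type of forward() is not documented on this page, so the sketch collects each batch's result without interpreting it.

```python
from typing import Any, List


def forward_in_batches(
    exporter, prompts: List[str], batch_size: int = 8, **sampling: Any
) -> list:
    """Call exporter.forward() once per batch of prompts.

    `sampling` is forwarded unchanged, so it may carry max_output_len,
    top_k, top_p, temperature, or lora_model_name from the signature.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        # One entry per batch; the shape of each entry depends on forward().
        results.append(exporter.forward(input_texts=batch, **sampling))
    return results
```

A call such as `forward_in_batches(exporter, prompts, batch_size=4, max_output_len=128, temperature=0.7)` would override two of the defaults and leave the rest as declared in the signature.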