ModelArtifact

class nemo_microservices.types.shared.ModelArtifact(*args: Any, **kwargs: Any)

Bases: BaseModel

files_url: str

The location where the artifact files are stored.

status: Literal['created', 'upload_failed', 'upload_completed']

Model artifact status.

## Values

  • “created” - Artifact has been created

  • “upload_failed” - Artifact upload has failed

  • “upload_completed” - Artifact upload has completed successfully
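Client code typically waits on the artifact reaching the `upload_completed` state before using `files_url`. The sketch below is illustrative only: it reproduces the `status` literal with the standard-library `typing.Literal` type rather than the real `ModelArtifact` class, and the helper name `is_ready` is hypothetical, not part of this SDK.

```python
from typing import Literal

# Mirrors the documented status values of ModelArtifact (illustrative stand-in).
ArtifactStatus = Literal["created", "upload_failed", "upload_completed"]

def is_ready(status: ArtifactStatus) -> bool:
    """Return True once the artifact upload has completed successfully."""
    return status == "upload_completed"

print(is_ready("created"))           # False
print(is_ready("upload_completed"))  # True
```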

backend_engine: Literal['nemo', 'trt_llm', 'vllm', 'faster_transformer', 'hugging_face'] | None = None

Type of backend engine.

## Values

  • “nemo” - NeMo framework engine

  • “trt_llm” - TensorRT-LLM engine

  • “vllm” - vLLM engine

  • “faster_transformer” - FasterTransformer engine

  • “hugging_face” - Hugging Face engine

gpu_arch: str | None = None

The GPU architecture the model is optimized for.

precision: Literal['int8', 'bf16', 'fp16', 'fp32', 'fp8-mixed', 'bf16-mixed'] | None = None

Type of model precision.

## Values

  • “int8” - 8-bit integer precision

  • “bf16” - 16-bit brain floating point (bfloat16) precision

  • “fp16” - 16-bit floating point precision

  • “fp32” - 32-bit floating point precision

  • “fp8-mixed” - Mixed 8-bit floating point precision available on Hopper and later architectures.

  • “bf16-mixed” - Mixed Brain floating point precision

tensor_parallelism: int | None = None

The number of GPU devices across which the model’s neural network layers are split for parallel processing.
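Taken together, the fields above describe a deployable model artifact. The following sketch shows one plausible combination of values using a plain `dataclass` stand-in rather than the real pydantic `ModelArtifact`; the class name `ModelArtifactSketch` and all field values (the URL, architecture string, etc.) are hypothetical examples, not values prescribed by the SDK.

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class ModelArtifactSketch:
    """Illustrative stand-in mirroring the documented ModelArtifact fields."""
    files_url: str
    status: Literal["created", "upload_failed", "upload_completed"]
    backend_engine: Optional[str] = None
    gpu_arch: Optional[str] = None
    precision: Optional[str] = None
    tensor_parallelism: Optional[int] = None

# A hypothetical TensorRT-LLM artifact split across two GPUs.
artifact = ModelArtifactSketch(
    files_url="https://example.com/artifacts/example-model",  # hypothetical
    status="upload_completed",
    backend_engine="trt_llm",
    gpu_arch="hopper",  # hypothetical architecture string
    precision="fp16",
    tensor_parallelism=2,
)
print(artifact.status)              # upload_completed
print(artifact.tensor_parallelism)  # 2
```

Only `files_url` and `status` are required; the remaining fields default to `None` when not supplied, matching the optional fields above.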