ModelArtifact

class nemo_microservices.types.shared_params.ModelArtifact

Bases: TypedDict

files_url: Required[str]

The location where the artifact files are stored.

status: Required[Literal['created', 'upload_failed', 'upload_completed']]

Model artifact status.

## Values

  • “created” - Artifact has been created

  • “upload_failed” - Artifact upload has failed

  • “upload_completed” - Artifact upload has completed successfully
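As a minimal sketch of how the two required keys might be used, the snippet below defines a local stand-in `TypedDict` that mirrors `files_url` and `status` instead of importing the package. The class name `RequiredArtifactFields`, the helper `is_ready`, and the example URL are hypothetical and are not part of the library.

```python
from typing import Literal, TypedDict

# The three status values documented above.
Status = Literal["created", "upload_failed", "upload_completed"]


class RequiredArtifactFields(TypedDict):
    # Local stand-in (an assumption) mirroring only the two Required keys.
    files_url: str
    status: Status


def is_ready(artifact: RequiredArtifactFields) -> bool:
    """Hypothetical helper: True only once the upload has completed."""
    return artifact["status"] == "upload_completed"


artifact: RequiredArtifactFields = {
    "files_url": "https://example.com/artifacts/my-model",  # placeholder location
    "status": "created",
}

# A freshly created artifact has not finished uploading yet.
print(is_ready(artifact))
```

A caller could use a check like this to avoid consuming an artifact whose status is still “created” or has moved to “upload_failed”.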

backend_engine: Literal['nemo', 'trt_llm', 'vllm', 'faster_transformer', 'hugging_face']

Type of backend engine.

## Values

  • “nemo” - NeMo framework engine

  • “trt_llm” - TensorRT-LLM engine

  • “vllm” - vLLM engine

  • “faster_transformer” - Faster Transformer engine

  • “hugging_face” - Hugging Face engine

gpu_arch: str

The GPU architecture the model is optimized for.

precision: Literal['int8', 'bf16', 'fp16', 'fp32', 'fp8-mixed', 'bf16-mixed']

Type of model precision.

## Values

  • “int8” - 8-bit integer precision

  • “bf16” - 16-bit brain floating point (bfloat16) precision

  • “fp16” - 16-bit floating point precision

  • “fp32” - 32-bit floating point precision

  • “fp8-mixed” - Mixed 8-bit floating point precision, available on Hopper and later architectures

  • “bf16-mixed” - Mixed brain floating point (bfloat16) precision
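Because `Literal` types are not enforced at runtime, a value read from user input or a config file may need an explicit check. The following sketch shows one possible approach using `typing.get_args`; the alias `Precision` and the function `check_precision` are illustrative names, not library API.

```python
from typing import Literal, get_args

# Mirrors the precision values documented above.
Precision = Literal["int8", "bf16", "fp16", "fp32", "fp8-mixed", "bf16-mixed"]


def check_precision(value: str) -> str:
    """Hypothetical validator: return value unchanged if it is a documented precision."""
    allowed = get_args(Precision)  # tuple of the literal strings
    if value not in allowed:
        raise ValueError(f"unsupported precision {value!r}; expected one of {allowed}")
    return value


print(check_precision("bf16-mixed"))
```

The same pattern would apply to the `status` and `backend_engine` literals.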

tensor_parallelism: int

The number of GPU devices across which the model’s neural network layers are split and processed.
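Putting the fields together, a complete artifact might look like the sketch below. It again uses a local stand-in (`ModelArtifactSketch`, a hypothetical name) so that it runs without the package installed. The URL is a placeholder, and the `gpu_arch` string is an assumed value, since the accepted strings are not listed in this reference.

```python
from typing import Literal, TypedDict


class _RequiredKeys(TypedDict):
    # Keys marked Required[...] in the reference above.
    files_url: str
    status: Literal["created", "upload_failed", "upload_completed"]


class ModelArtifactSketch(_RequiredKeys, total=False):
    # Optional keys; total=False lets callers omit any of them.
    backend_engine: Literal["nemo", "trt_llm", "vllm", "faster_transformer", "hugging_face"]
    gpu_arch: str
    precision: Literal["int8", "bf16", "fp16", "fp32", "fp8-mixed", "bf16-mixed"]
    tensor_parallelism: int


artifact: ModelArtifactSketch = {
    "files_url": "https://example.com/artifacts/my-model",  # placeholder location
    "status": "upload_completed",
    "backend_engine": "trt_llm",
    "gpu_arch": "Hopper",  # assumed value; accepted strings are not documented here
    "precision": "fp8-mixed",
    "tensor_parallelism": 2,  # split the layers across 2 GPUs
}

# TypedDict instances are plain dicts at runtime, so normal dict access works.
print(sorted(artifact))
```

With the package installed, the class path shown at the top of this page indicates the real type can be imported from `nemo_microservices.types.shared_params` and used as the annotation in place of the stand-in.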