ModelArtifact
- class nemo_microservices.types.shared.ModelArtifact(*args: Any, **kwargs: Any)
Bases: BaseModel
- files_url: str
The location where the artifact files are stored.
- status: Literal['created', 'upload_failed', 'upload_completed']
Model artifact status.
## Values
“created” - Artifact has been created
“upload_failed” - Artifact upload has failed
“upload_completed” - Artifact upload has completed successfully
- backend_engine: Literal['nemo', 'trt_llm', 'vllm', 'faster_transformer', 'hugging_face'] | None = None
Type of backend engine.
## Values
“nemo” - NeMo framework engine
“trt_llm” - TensorRT-LLM engine
“vllm” - vLLM engine
“faster_transformer” - Faster Transformer engine
“hugging_face” - Hugging Face engine
- gpu_arch: str | None = None
The GPU architecture the model is optimized for.
- precision: Literal['int8', 'bf16', 'fp16', 'fp32', 'fp8-mixed', 'bf16-mixed'] | None = None
Type of model precision.
## Values
“int8” - 8-bit integer precision
“bf16” - 16-bit brain floating point (bfloat16) precision
“fp16” - 16-bit floating point precision
“fp32” - 32-bit floating point precision
“fp8-mixed” - Mixed 8-bit floating point precision available on Hopper and later architectures.
“bf16-mixed” - Mixed 16-bit brain floating point (bfloat16) precision
- tensor_parallelism: int | None = None
The number of GPU devices across which the model’s neural network layers are split for parallel processing.
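The fields above can be illustrated with a minimal, self-contained sketch. This mirrors the ModelArtifact schema using only the standard library (the real class is a Pydantic BaseModel in `nemo_microservices.types.shared`, and its constructor performs validation this sketch omits); the example values, including the `files_url`, are hypothetical.

```python
# Sketch of the ModelArtifact schema using stdlib dataclasses; the real
# class is a Pydantic BaseModel with validation.
from dataclasses import dataclass
from typing import Literal, Optional

Status = Literal["created", "upload_failed", "upload_completed"]
Engine = Literal["nemo", "trt_llm", "vllm", "faster_transformer", "hugging_face"]
Precision = Literal["int8", "bf16", "fp16", "fp32", "fp8-mixed", "bf16-mixed"]


@dataclass
class ModelArtifactSketch:
    # Required fields.
    files_url: str            # location where the artifact files are stored
    status: Status            # artifact lifecycle status
    # Optional fields default to None, as in the reference above.
    backend_engine: Optional[Engine] = None
    gpu_arch: Optional[str] = None
    precision: Optional[Precision] = None
    tensor_parallelism: Optional[int] = None


# Hypothetical example: a completed upload targeting a vLLM backend,
# split across 2 GPUs at bf16 precision.
artifact = ModelArtifactSketch(
    files_url="https://example.com/models/my-model/files",
    status="upload_completed",
    backend_engine="vllm",
    precision="bf16",
    tensor_parallelism=2,
)
print(artifact.status)
```

Running the sketch prints `upload_completed`; fields not supplied (here `gpu_arch`) remain `None`, matching the `| None = None` annotations in the reference.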