nemo_microservices.types.shared.model_artifact

Module Contents

Classes

API

class nemo_microservices.types.shared.model_artifact.ModelArtifact(/, **data: Any)

Bases: nemo_microservices._models.BaseModel

backend_engine: nemo_microservices.types.shared.backend_engine_type.BackendEngineType | None

None

Type of backend engine.

Values

  • "nemo" - NeMo framework engine

  • "trt_llm" - TensorRT-LLM engine

  • "vllm" - vLLM engine

  • "faster_transformer" - FasterTransformer engine

  • "hugging_face" - Hugging Face engine
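
The documented engine names can be mirrored locally for a quick sanity check before a value is sent to the service. This is a minimal sketch using only the standard library; `BackendEngine` and `is_known_backend` are hypothetical names introduced here for illustration and are not part of `nemo_microservices`.

```python
from typing import Literal, Optional, get_args

# Hypothetical local alias that copies the documented values; not imported from the SDK.
BackendEngine = Literal["nemo", "trt_llm", "vllm", "faster_transformer", "hugging_face"]


def is_known_backend(value: Optional[str]) -> bool:
    """Return True for None (the field is optional) or a documented engine name."""
    return value is None or value in get_args(BackendEngine)
```
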

files_url: str

None

The location where the artifact files are stored.

gpu_arch: str | None

None

The GPU architecture the model is optimized for.

precision: nemo_microservices.types.shared.model_precision.ModelPrecision | None

None

Type of model precision.

Values

  • "int8" - 8-bit integer precision

  • "bf16" - 16-bit Brain floating point (bfloat16) precision

  • "fp16" - 16-bit floating point precision

  • "fp32" - 32-bit floating point precision

  • "fp8-mixed" - Mixed 8-bit floating point precision, available on Hopper and later architectures

  • "bf16-mixed" - Mixed 16-bit Brain floating point (bfloat16) precision
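
Some precision strings carry a `-mixed` suffix, so client code may want to separate the base format from the mixed flag. The following is a rough sketch under that assumption; `PRECISIONS` and `split_precision` are hypothetical helpers, not SDK functions, and the value list is copied from the documentation above.

```python
from typing import Tuple

# Documented precision values, copied from the list above.
PRECISIONS = ("int8", "bf16", "fp16", "fp32", "fp8-mixed", "bf16-mixed")


def split_precision(value: str) -> Tuple[str, bool]:
    """Split a documented precision string into (base format, is_mixed)."""
    if value not in PRECISIONS:
        raise ValueError(f"unknown precision: {value!r}")
    # partition() yields an empty suffix when no "-" is present.
    base, _, suffix = value.partition("-")
    return base, suffix == "mixed"
```
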

status: nemo_microservices.types.shared.artifact_status.ArtifactStatus

None

Model artifact status.

Values

  • "created" - Artifact has been created

  • "upload_failed" - Artifact upload has failed

  • "upload_completed" - Artifact upload has completed successfully
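
A caller will typically want to know whether an artifact finished uploading before using it. The sketch below treats only "upload_completed" as ready, which is an assumption drawn from the value descriptions above; `STATUSES` and `is_upload_complete` are hypothetical names, not part of the SDK.

```python
# Documented status values, copied from the list above.
STATUSES = ("created", "upload_failed", "upload_completed")


def is_upload_complete(status: str) -> bool:
    """Return True only when the artifact reports a completed upload."""
    if status not in STATUSES:
        raise ValueError(f"unknown artifact status: {status!r}")
    # Assumption: "created" and "upload_failed" both mean the files are not usable yet.
    return status == "upload_completed"
```
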

tensor_parallelism: int | None

None

The number of GPU devices across which the model’s neural network layers are split and processed.
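
To show how the fields fit together, here is a minimal stand-in built with a standard-library dataclass. It is not the SDK class: `ModelArtifactSketch`, `gpus_required`, and the placeholder `files_url` are hypothetical, and the single-GPU fallback for an unset `tensor_parallelism` is an assumption rather than documented behavior. Only `files_url` and `status` are required, matching the annotations above that lack `| None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelArtifactSketch:
    """Hypothetical stand-in mirroring the documented fields of ModelArtifact."""

    files_url: str  # required: where the artifact files are stored
    status: str  # required: "created", "upload_failed", or "upload_completed"
    backend_engine: Optional[str] = None
    gpu_arch: Optional[str] = None
    precision: Optional[str] = None
    tensor_parallelism: Optional[int] = None

    def gpus_required(self) -> int:
        """Assume one GPU when tensor_parallelism is unset (an assumption, not SDK behavior)."""
        return self.tensor_parallelism if self.tensor_parallelism is not None else 1


artifact = ModelArtifactSketch(
    files_url="file:///tmp/example-model",  # placeholder location for illustration
    status="upload_completed",
    precision="bf16",
    tensor_parallelism=2,
)
```

With the real SDK, the same fields would be read as attributes of a `ModelArtifact` instance returned by the service.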