nemo_microservices.types.shared_params.model_artifact
Module Contents
Classes
API
- class nemo_microservices.types.shared_params.model_artifact.ModelArtifact
Bases: typing_extensions.TypedDict
- backend_engine: nemo_microservices.types.shared.backend_engine_type.BackendEngineType
None
Type of backend engine.
Values
"nemo" - NeMo framework engine
"trt_llm" - TensorRT-LLM engine
"vllm" - vLLM engine
"faster_transformer" - Faster Transformer engine
"hugging_face" - Hugging Face engine
- files_url: typing_extensions.Required[str]
None
The location where the artifact files are stored.
- gpu_arch: str
None
The GPU architecture the model is optimized for.
- precision: nemo_microservices.types.shared.model_precision.ModelPrecision
None
Type of model precision.
Values
"int8" - 8-bit integer precision
"bf16" - Brain floating point precision
"fp16" - 16-bit floating point precision
"fp32" - 32-bit floating point precision
"fp8-mixed" - Mixed 8-bit floating point precision, available on Hopper and later architectures
"bf16-mixed" - Mixed Brain floating point precision
- status: typing_extensions.Required[nemo_microservices.types.shared.artifact_status.ArtifactStatus]
None
Model artifact status.
Values
"created" - Artifact has been created
"upload_failed" - Artifact upload has failed
"upload_completed" - Artifact upload has completed successfully
- tensor_parallelism: int
None
The number of GPU devices across which the model’s neural network layers are split and processed.
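Because ModelArtifact is a TypedDict, an instance is a plain dictionary, and the field types are checked only by static type checkers, not at runtime. The sketch below illustrates this with a local stand-in class that mirrors the fields documented above, so that it runs without the package installed. The names ModelArtifactSketch and find_problems, the files_url value, and the gpu_arch value are hypothetical placeholders and not part of the library; in real code you would presumably import ModelArtifact from nemo_microservices.types.shared_params.model_artifact instead.

```python
from typing import List, Literal, TypedDict, get_args

# Local stand-ins for the documented value sets (assumption: these mirror
# the "Values" lists above; they are not imported from the library).
BackendEngineType = Literal[
    "nemo", "trt_llm", "vllm", "faster_transformer", "hugging_face"
]
ModelPrecision = Literal[
    "int8", "bf16", "fp16", "fp32", "fp8-mixed", "bf16-mixed"
]
ArtifactStatus = Literal["created", "upload_failed", "upload_completed"]


class _RequiredFields(TypedDict):
    # Documented as typing_extensions.Required[...]
    files_url: str
    status: ArtifactStatus


class ModelArtifactSketch(_RequiredFields, total=False):
    # Hypothetical stand-in for ModelArtifact; these keys are optional.
    backend_engine: BackendEngineType
    gpu_arch: str
    precision: ModelPrecision
    tensor_parallelism: int


def find_problems(artifact: dict) -> List[str]:
    """Hypothetical helper: report missing required keys and unknown values.

    TypedDict performs no runtime validation, so this check is explicit.
    """
    problems = []
    for key in ("files_url", "status"):
        if key not in artifact:
            problems.append(f"missing required key: {key}")
    allowed = {
        "backend_engine": get_args(BackendEngineType),
        "precision": get_args(ModelPrecision),
        "status": get_args(ArtifactStatus),
    }
    for key, values in allowed.items():
        if key in artifact and artifact[key] not in values:
            problems.append(f"invalid value for {key}: {artifact[key]!r}")
    return problems


# Placeholder values chosen for illustration only.
artifact: ModelArtifactSketch = {
    "files_url": "hf://example-org/example-model",
    "status": "created",
    "backend_engine": "vllm",
    "precision": "bf16",
    "gpu_arch": "Ampere",
    "tensor_parallelism": 2,
}

print(find_problems(artifact))
```

Splitting the required and optional keys across two classes is one way to express Required fields using only the standard library; the documented class uses typing_extensions.Required for the same purpose.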