nemo_deploy.multimodal.megatron_multimodal_deployable_ray#
Module Contents#
Classes#
| ModelWorker | Ray actor that loads and runs inference on a shard of the multimodal model. |
| MegatronMultimodalRayDeployable | A Ray Serve deployment for distributed Megatron multimodal models. |
Data#
API#
- nemo_deploy.multimodal.megatron_multimodal_deployable_ray.LOGGER = 'getLogger(...)'#
- nemo_deploy.multimodal.megatron_multimodal_deployable_ray.app = 'FastAPI(...)'#
- class nemo_deploy.multimodal.megatron_multimodal_deployable_ray.ModelWorker(
- megatron_checkpoint_filepath: str,
- rank: int,
- world_size: int,
- tensor_model_parallel_size: int,
- pipeline_model_parallel_size: int,
- master_port: str,
- master_addr: Optional[str] = None,
- replica_id: int = 0,
- **model_config_kwargs,
- )#
Ray actor that loads and runs inference on a shard of the multimodal model.
Each ModelWorker is responsible for a specific rank in the model parallel setup.
Initialization
- infer(
- inputs: Dict[str, Any],
- )#
Run inference on the model shard.
- class nemo_deploy.multimodal.megatron_multimodal_deployable_ray.MegatronMultimodalRayDeployable(
- megatron_checkpoint_filepath: str,
- num_gpus: int = 1,
- tensor_model_parallel_size: int = 1,
- pipeline_model_parallel_size: int = 1,
- model_id: str = 'megatron-model',
- **model_config_kwargs,
- )#
A Ray Serve deployment for distributed Megatron multimodal models.
This class coordinates model parallelism across multiple GPUs and nodes, with each shard handled by a separate Ray actor.
Initialization
Initialize the distributed Megatron multimodal model deployment.
- Parameters:
megatron_checkpoint_filepath (str) – Path to the Megatron checkpoint directory.
num_gpus (int) – Number of GPUs to use for the deployment.
tensor_model_parallel_size (int) – Size of tensor model parallelism.
pipeline_model_parallel_size (int) – Size of pipeline model parallelism.
model_id (str) – Identifier for the model in API responses.
**model_config_kwargs – Additional model configuration arguments.
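A minimal sketch of how these parameters relate, assuming (this is not stated in the docs above) that each model replica spans `tensor_model_parallel_size * pipeline_model_parallel_size` GPUs, with one ModelWorker per rank, and that `num_gpus` must be an integer multiple of that product. `replica_layout` is a hypothetical helper, not part of the module:

```python
def replica_layout(num_gpus: int,
                   tensor_model_parallel_size: int,
                   pipeline_model_parallel_size: int) -> dict:
    """Compute the per-replica world size and replica count for a deployment.

    Hypothetical illustration of the standard Megatron layout: world size
    per replica is TP * PP, and the remaining GPUs become extra replicas.
    """
    world_size = tensor_model_parallel_size * pipeline_model_parallel_size
    if num_gpus % world_size != 0:
        raise ValueError(
            f"num_gpus={num_gpus} is not divisible by TP*PP={world_size}"
        )
    return {"world_size": world_size, "num_replicas": num_gpus // world_size}
```

For example, `replica_layout(8, 2, 2)` yields a world size of 4 per replica and 2 replicas.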
- async chat_completions(request: Dict[Any, Any])#
Handle multimodal chat completion requests.
Supports two image content formats (normalized internally to format 1):
- {"type": "image", "image": "url_or_base64"}
- {"type": "image_url", "image_url": {"url": "url_or_base64"}} (OpenAI-style, converted to format 1)
- async completions(request: Dict[Any, Any])#
Handle multimodal completion requests.
- async list_models()#
List available models.
- async health_check()#
Health check endpoint.
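A client-side request body for `chat_completions` might look like the sketch below, assuming an OpenAI-style `messages` structure with mixed text and image content parts (the exact accepted fields are not specified above); `build_chat_request` is a hypothetical helper, and the `model` value should match the `model_id` passed to `MegatronMultimodalRayDeployable`:

```python
import json

def build_chat_request(model_id: str, text: str, image_url: str) -> str:
    """Build a JSON chat request with one text part and one format-1 image part.

    Hypothetical illustration of a request payload; field names follow the
    OpenAI chat-completions convention, which is an assumption here.
    """
    payload = {
        "model": model_id,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image", "image": image_url},
                ],
            }
        ],
    }
    return json.dumps(payload)
```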