nemo_deploy.multimodal.megatron_multimodal_deployable_ray#

Module Contents#

Classes#

ModelWorker

Ray actor that loads and runs inference on a shard of the multimodal model.

MegatronMultimodalRayDeployable

A Ray Serve deployment for distributed Megatron multimodal models.

Data#

API#

nemo_deploy.multimodal.megatron_multimodal_deployable_ray.LOGGER = 'getLogger(...)'#
nemo_deploy.multimodal.megatron_multimodal_deployable_ray.app = 'FastAPI(...)'#
class nemo_deploy.multimodal.megatron_multimodal_deployable_ray.ModelWorker(
megatron_checkpoint_filepath: str,
rank: int,
world_size: int,
tensor_model_parallel_size: int,
pipeline_model_parallel_size: int,
master_port: str,
master_addr: Optional[str] = None,
replica_id: int = 0,
**model_config_kwargs,
)#

Ray actor that loads and runs inference on a shard of the multimodal model.

Each ModelWorker is responsible for a specific rank in the model parallel setup.

Initialization

infer(
inputs: Dict[str, Any],
) → Dict[str, Any]#

Run inference on the model shard.

class nemo_deploy.multimodal.megatron_multimodal_deployable_ray.MegatronMultimodalRayDeployable(
megatron_checkpoint_filepath: str,
num_gpus: int = 1,
tensor_model_parallel_size: int = 1,
pipeline_model_parallel_size: int = 1,
model_id: str = 'megatron-model',
**model_config_kwargs,
)#

A Ray Serve deployment for distributed Megatron multimodal models.

This class coordinates model parallelism across multiple GPUs and nodes, with each shard handled by a separate Ray actor.

Initialization

Initialize the distributed Megatron multimodal model deployment.

Parameters:
  • megatron_checkpoint_filepath (str) – Path to the Megatron checkpoint directory.

  • num_gpus (int) – Number of GPUs to use for the deployment.

  • tensor_model_parallel_size (int) – Size of tensor model parallelism.

  • pipeline_model_parallel_size (int) – Size of pipeline model parallelism.

  • model_id (str) – Identifier for the model in API responses.

  • **model_config_kwargs – Additional model configuration arguments.
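As a client-side sketch, a chat request for this deployment can be assembled as plain dictionaries. The field names follow the OpenAI chat format suggested by the `chat_completions` docstring below; whether the server accepts every field shown (e.g. `max_tokens`) is an assumption, not part of this module's documented API.

```python
import json


def build_chat_request(model_id, prompt, image_url, max_tokens=128):
    """Assemble an OpenAI-style chat completion payload with one image.

    Illustrative only: field names are assumptions based on the OpenAI
    chat format, not this deployment's verified schema.
    """
    return {
        "model": model_id,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    # OpenAI-style image entry; the server normalizes
                    # this to {"type": "image", "image": ...} internally.
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


payload = build_chat_request(
    "megatron-model", "Describe this picture.", "https://example.com/cat.png"
)
body = json.dumps(payload)  # ready to POST to the Serve endpoint
```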

async chat_completions(request: Dict[Any, Any])#

Handle multimodal chat completion requests.

Supports two image content formats (normalized internally to format 1):

  1. {"type": "image", "image": "url_or_base64"}

  2. {"type": "image_url", "image_url": {"url": "url_or_base64"}} (OpenAI-style, converted to format 1)
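The normalization between the two formats can be sketched as a small helper. This is illustrative only, assuming the behavior described above, not the module's actual implementation:

```python
from typing import Any, Dict


def normalize_image_content(item: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an OpenAI-style image_url entry (format 2) to format 1.

    Sketch of the normalization described in the docstring; entries
    already in format 1 (or text entries) pass through unchanged.
    """
    if item.get("type") == "image_url":
        return {"type": "image", "image": item["image_url"]["url"]}
    return item


# Both formats normalize to the same structure:
a = normalize_image_content({"type": "image", "image": "https://example.com/cat.png"})
b = normalize_image_content(
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}}
)
```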

async completions(request: Dict[Any, Any])#

Handle multimodal completion requests.
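By analogy with the OpenAI completions format, a request payload for this endpoint might look as follows. The exact fields accepted are an assumption, not this module's documented schema:

```python
# Hypothetical payload for the completions endpoint; field names follow
# the OpenAI completions format and are assumptions, not the module's spec.
completion_payload = {
    "model": "megatron-model",  # must match the deployment's model_id
    "prompt": "The capital of France is",
    "max_tokens": 16,
    "temperature": 0.0,
}
```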

async list_models()#

List available models.

async health_check()#

Health check endpoint.