nemo_automodel.components.launcher.skypilot.utils


Module Contents

Functions

| Name | Description |
|---|---|
| _get_cloud | Return a sky cloud object for the given cloud name string. |
| submit_skypilot_job | Launch a training job on a cloud VM via SkyPilot. |

Data

REMOTE_CONFIG_PATH

_CLOUD_CLASSES

_DEFAULT_SETUP

API

nemo_automodel.components.launcher.skypilot.utils._get_cloud(
cloud_name: str
)

Return a sky cloud object for the given cloud name string.
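The Data section below maps lowercase cloud names to class names through _CLOUD_CLASSES. The following is a minimal sketch of how such a lookup could turn a name into a cloud object. It is not the module's actual code: the helper name get_cloud_sketch is hypothetical, the mapping copies only the entries visible in the (truncated) listing, and the `sky` module is passed in as an argument so the sketch can run without SkyPilot installed.

```python
from typing import Any, Dict

# Partial copy of the documented _CLOUD_CLASSES mapping; the source listing
# is truncated after 'kubernetes', so that entry is deliberately omitted.
CLOUD_CLASSES: Dict[str, str] = {
    "aws": "AWS",
    "gcp": "GCP",
    "azure": "Azure",
    "lambda": "Lambda",
}


def get_cloud_sketch(cloud_name: str, sky_module: Any) -> Any:
    """Hypothetical stand-in for _get_cloud: map a name to sky_module.<ClassName>()."""
    if cloud_name not in CLOUD_CLASSES:
        raise ValueError(
            f"Unsupported cloud {cloud_name!r}; expected one of {sorted(CLOUD_CLASSES)}"
        )
    # Look up the class by name on the module, then instantiate it.
    cloud_cls = getattr(sky_module, CLOUD_CLASSES[cloud_name])
    return cloud_cls()
```

With SkyPilot installed, `import sky` followed by `get_cloud_sketch("aws", sky)` would return a `sky.AWS()` instance. Mapping strings to class names, rather than to the classes themselves, lets a module like this avoid importing `sky` until a cloud is actually requested.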

nemo_automodel.components.launcher.skypilot.utils.submit_skypilot_job(
config: nemo_automodel.components.launcher.skypilot.config.SkyPilotConfig,
job_dir: str
) -> int

Launch a training job on a cloud VM via SkyPilot.

The job config written locally to job_dir/job_config.yaml is uploaded to REMOTE_CONFIG_PATH on the remote VM, and the code in the current working directory is synced to ~/sky_workdir through SkyPilot's workdir mechanism.
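The upload and sync described above correspond to the `file_mounts` and `workdir` fields of a SkyPilot task. A minimal sketch, assuming the function builds something equivalent to a SkyPilot task YAML, is shown below as a plain dict. The helper name build_task_spec and its run_command, cloud, and accelerators parameters are hypothetical; only the two constants are taken from the Data section.

```python
import os
from typing import Any, Dict, Optional

# Values copied from the module's documented constants.
REMOTE_CONFIG_PATH = "/tmp/automodel_job_config.yaml"
DEFAULT_SETUP = "cd ~/sky_workdir && pip install -e . --quiet"


def build_task_spec(
    job_dir: str,
    run_command: str,
    cloud: str,
    accelerators: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a SkyPilot-style task spec as a dict."""
    local_config = os.path.join(job_dir, "job_config.yaml")
    resources: Dict[str, Any] = {"cloud": cloud}
    if accelerators:
        # e.g. "A100:8"; omitted entirely when no accelerator is requested.
        resources["accelerators"] = accelerators
    return {
        # SkyPilot syncs this local directory to ~/sky_workdir on the VM.
        "workdir": os.getcwd(),
        # file_mounts maps remote destination -> local source.
        "file_mounts": {REMOTE_CONFIG_PATH: local_config},
        "setup": DEFAULT_SETUP,
        "run": run_command,
        "resources": resources,
    }
```

A dict in this shape should be loadable with `sky.Task.from_yaml_config(spec)` and launched with `sky.launch(task, cluster_name=...)`; whether submit_skypilot_job uses that path or builds `sky.Task` directly is not stated here.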

Parameters:

config (SkyPilotConfig): Populated SkyPilotConfig, including the training command.

job_dir (str): Local directory holding the job artifacts.

Returns: int

0 on successful submission.
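submit_skypilot_job expects the job config to already exist at job_dir/job_config.yaml. A minimal sketch of staging that file is shown below. The helper stage_job_dir is hypothetical, and writing JSON instead of calling a YAML library is an assumption made only to keep the sketch dependency-free; JSON output is also valid YAML for simple nested configs like this.

```python
import json
import os
from typing import Any, Dict


def stage_job_dir(job_config: Dict[str, Any], job_dir: str) -> str:
    """Hypothetical helper: write job_dir/job_config.yaml and return its path."""
    os.makedirs(job_dir, exist_ok=True)
    path = os.path.join(job_dir, "job_config.yaml")
    # JSON text parses as YAML, so any YAML loader on the VM can read it.
    with open(path, "w") as f:
        json.dump(job_config, f, indent=2)
    return path
```

After staging, a caller could pass the same job_dir to submit_skypilot_job and use its integer return value (0 on successful submission) as a process exit code.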

nemo_automodel.components.launcher.skypilot.utils.REMOTE_CONFIG_PATH = '/tmp/automodel_job_config.yaml'
nemo_automodel.components.launcher.skypilot.utils._CLOUD_CLASSES = {'aws': 'AWS', 'gcp': 'GCP', 'azure': 'Azure', 'lambda': 'Lambda', 'kubernetes':...
nemo_automodel.components.launcher.skypilot.utils._DEFAULT_SETUP = 'cd ~/sky_workdir && pip install -e . --quiet'