nemo_automodel.components.launcher.skypilot.utils

Module Contents

Functions

_get_cloud

Return a sky cloud object for the given cloud name string.

submit_skypilot_job

Launch a training job on a cloud VM via SkyPilot.

Data

API

nemo_automodel.components.launcher.skypilot.utils.REMOTE_CONFIG_PATH

'/tmp/automodel_job_config.yaml'

nemo_automodel.components.launcher.skypilot.utils._DEFAULT_SETUP

'cd ~/sky_workdir && pip install -e . --quiet'

nemo_automodel.components.launcher.skypilot.utils._CLOUD_CLASSES

None

nemo_automodel.components.launcher.skypilot.utils._get_cloud(cloud_name: str)

Return a sky cloud object for the given cloud name string.

nemo_automodel.components.launcher.skypilot.utils.submit_skypilot_job(
config: nemo_automodel.components.launcher.skypilot.config.SkyPilotConfig,
job_dir: str,
) -> int

Launch a training job on a cloud VM via SkyPilot.

The job config written locally to job_dir/job_config.yaml is uploaded to REMOTE_CONFIG_PATH on the remote VM, and the contents of the current working directory are synced to ~/sky_workdir through SkyPilot's workdir mechanism.

Parameters:
  • config – Populated SkyPilotConfig (including the training command).

  • job_dir – Local directory holding the job artifacts.

Returns:

0 on successful submission.
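The upload and sync behavior described for submit_skypilot_job can be sketched as a plain dictionary that mirrors the workdir, file_mounts, setup and run keys of a SkyPilot task YAML, where file_mounts maps a remote destination to a local source. This is a hedged illustration, not the module's implementation: build_task_spec is a hypothetical helper, and the run command is supplied by the caller because the real training command comes from SkyPilotConfig.

```python
import os
from typing import Any, Dict

# Values as documented for this module.
REMOTE_CONFIG_PATH = "/tmp/automodel_job_config.yaml"
DEFAULT_SETUP = "cd ~/sky_workdir && pip install -e . --quiet"


def build_task_spec(job_dir: str, run_command: str) -> Dict[str, Any]:
    """Hypothetical helper: assemble a SkyPilot-style task spec as a dict."""
    local_config = os.path.join(job_dir, "job_config.yaml")
    return {
        # SkyPilot syncs the workdir to ~/sky_workdir on the remote VM.
        "workdir": os.getcwd(),
        # file_mounts maps remote destination -> local source.
        "file_mounts": {REMOTE_CONFIG_PATH: local_config},
        # Install the synced code on the VM before the job runs.
        "setup": DEFAULT_SETUP,
        # Caller-provided training command (assumed to read REMOTE_CONFIG_PATH).
        "run": run_command,
    }
```

For example, build_task_spec("/tmp/job_001", "echo start") maps /tmp/automodel_job_config.yaml to /tmp/job_001/job_config.yaml and leaves the run command unchanged.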