NeMo-Run documentation

NeMo-Run is a tool for configuring, executing, and managing machine learning experiments across different computing environments. It has three core responsibilities:

  1. Configuration

  2. Execution

  3. Management

See the guide for each responsibility to learn more. This is also the typical order in which NeMo-Run users set up and launch experiments.

Installation

To install the project, use the following command:

pip install git+https://github.com/NVIDIA/NeMo-Run.git

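SkyPilot support is available through optional extras. The package name must be given explicitly when requesting extras from a Git URL, and the argument should be quoted so the shell does not interpret the brackets:

pip install "nemo_run[skypilot] @ git+https://github.com/NVIDIA/NeMo-Run.git" will install SkyPilot with Kubernetes support

pip install "nemo_run[skypilot-all] @ git+https://github.com/NVIDIA/NeMo-Run.git" will install SkyPilot with support for all clouds

You can also install SkyPilot manually by following https://skypilot.readthedocs.io/en/latest/getting-started/installation.html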

Make sure you have pip installed and configured properly.
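
As a quick sanity check, the sketch below reports whether pip is usable from the current interpreter and whether the package can be found after installation. It uses only the Python standard library. The import name nemo_run is an assumption here, and the helper names pip_version and is_installed are illustrative, not part of NeMo-Run.

```python
import importlib.util
import subprocess
import sys
from typing import Optional


def pip_version() -> Optional[str]:
    """Return pip's version banner for this interpreter, or None if pip is unavailable."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def is_installed(module_name: str) -> bool:
    """Report whether a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("pip:", pip_version() or "not available")
    # Assumption: the package installed above is importable as `nemo_run`.
    print("nemo_run installed:", is_installed("nemo_run"))
```

Running the check with the same interpreter you used for pip install avoids false negatives caused by multiple Python environments.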

Tutorials

The hello_world tutorial series introduces NeMo-Run and demonstrates its capabilities through a simple example. The series covers:

  • Configuring Python functions using Partial and Config classes.

  • Executing configured functions locally and on remote clusters.

  • Visualizing configurations with graphviz.

  • Creating and managing experiments using run.Experiment.
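
The configure-then-execute idea behind the first two bullets can be illustrated with the standard library alone. The sketch below uses functools.partial as a stand-in: it is not NeMo-Run's Partial or Config API, and the train function, the OptimizerSettings class, and their parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from functools import partial


@dataclass
class OptimizerSettings:
    """Hypothetical settings object, standing in for a configured dependency."""

    lr: float = 1e-3


def train(model_name: str, epochs: int, optimizer: OptimizerSettings) -> str:
    """Hypothetical task; returns a summary string instead of training anything."""
    return f"{model_name}: {epochs} epochs, lr={optimizer.lr}"


# Configuration step: bind arguments now, without calling the function.
configured = partial(train, model_name="demo", optimizer=OptimizerSettings(lr=0.01))

# The bound arguments can be inspected before anything runs.
print(configured.keywords)

# Execution step: supply the remaining argument and run locally.
result = configured(epochs=3)
print(result)
```

Separating the two steps is what lets the same configured task be inspected, adjusted, and then handed to different execution environments. The tutorials show how NeMo-Run's own classes handle this.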

You can find the tutorial series below:

  1. Part 1

  2. Part 2

  3. Part 3