Networking with Legate Wheels#
MPI Support#
The Legate wheels are built with UCX-based networking support, but MPI itself is not provided as a PyPI wheel. System-installed MPI is required to bootstrap the UCX networking. Currently, Legate supports OpenMPI 4.x or above and MPICH 3.x or above. You can install OpenMPI or MPICH using your system’s package manager. For example, on Ubuntu:
$ sudo apt-get install libopenmpi-dev
# =========== OR =========== #
$ sudo apt-get install libmpich-dev
Legate attempts to discover which MPI implementation is available on your system and use one of its bundled MPI wrappers. There are two wrappers, one for OpenMPI and one for MPICH. Most of the time this process works automatically and does not require user intervention. However, in rare cases, the user may need to manually specify the MPI wrapper to use. For example, the following error indicates that wrapper may have to be chosen explicitly
LEGATE ERROR: #0 std::runtime_error: dlopen("libmpi.so") failed: libmpi.so: cannot open shared object file: No such file or directory, please make sure MPI is installed and libmpi.so is in your LD_LIBRARY_PATH.
The wheel comes bundled with two wrappers:
liblegate_mpi_wrapper_mpich.so
for MPICH, andliblegate_mpi_wrapper_ompi.so
for OpenMPI.
Selecting the MPI wrapper can be done by setting the LEGATE_MPI_WRAPPER
environment variable to the path of the desired MPI wrapper. For example, if
you have installed MPICH, you can set the LEGATE_MPI_WRAPPER
environment
variable to the MPICH wrapper:
$ export LEGATE_MPI_WRAPPER=liblegate_mpi_wrapper_mpich.so
Note that only the name of the wrapper is needed because Legate is compiled with appropriate rpaths that allow it to find the wrapper in the location where it was installed by the wheel.
The NERSC’s Perlmutter system is one example of where setting the wrapper may be
necessary. Perlmutter has 3 different MPICH modules, cray-mpich
,
cray-mpich-abi
, and mpich
(this module is created by NERSC). Legate
will work with cray-mpich-abi
and mpich
since it requires the stable ABI
version of MPICH provided by these modules. However, the cray-mpich-abi
module does not provide libmpi.so
, which is required by the Legate MPI
detection code. So, when using this module, one must set the wrapper
explicitly. The mpich
module, on the other hand, provides libmpi.so
, so
the Legate MPI detection code will work without any user intervention.
If neither of the pre-built wrappers work for your system, check Installation of the Legate MPI wrapper for instructions on building the MPI wrapper from source.
UCX Support#
The Legate wheels are built with UCX-based networking support against UCX provided by the libucx wheel. This UCX build should be functional in most cases. However, it does not support all networking protocols that UCX can potentially support. If you rely on support for specific networking hardware, you may prefer to use system-installed UCX that is built with that support. To prefer dynamically loading UCX system libraries set the following variable:
$ export RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=1
Potential Issues#
Note that the framework of automatic detection and automatic support for
networking depends on multiple dynamic loads of the MPI and the UCX libraries. Sometimes, errors may occur when an incorrect or conflicting libraries are
loaded. If that happens and a crash occurs, Legate should print out a
backtrace showing which versions of the different libraries were loaded. If one
of the libraries is loaded from a different location than expected, you can set
the LD_LIBRARY_PATH
environment variable to point to the correct location of
the libraries or use LD_PRELOAD
to force the loading of the correct library:
$ export LD_PRELOAD=/path/to/correct/libmpi.so
# =========== OR =========== #
$ export LD_LIBRARY_PATH=/path/to/correct/lib/:${LD_LIBRARY_PATH}
This approach may be used for a fine-grained control over which libraries are
loaded across the networking stack. However, it is not recommended to set
LD_PRELOAD
or LD_LIBRARY_PATH
globally, as this may cause conflicts with
other libraries and applications. Instead, it is recommended to set these
variables only for the specific Legate application you are running.
Tip
Legate wheels depend on three main components:
UCX,
MPI,
and the CUDA toolkit.
As discussed previously, Legate installs a basic configuration of UCX as a
dependency. Configurations that are known to work with Legate wheels include
Ubuntu 20.x and above with Ubuntu-installed OpenMPI, and with CUDA toolkit
12.2 or higher. The Legate wheel was also tested on Perlmutter with the
cray-mpich-abi
module and the mpich
modules.
However, because wheels are not as self contained as the Conda ecosystem, it is possible that other configurations may not work as well. If you encounter problems with the Legate wheels, visit the Contact page for more information on how to get help.`