
The sources for SHMEM and OMPI can be found at $HPCX_HOME/sources/.

For build details, please refer to $HPCX_HOME/sources/ and the HPC-X README file.

Profiling MPI API

To profile the MPI API:

$ export IPM_KEYFILE=$HPCX_IPM_DIR/etc/ipm_key_mpi
$ export IPM_LOG=FULL
$ export LD_PRELOAD=$HPCX_IPM_DIR/lib/libipm.so
$ mpirun -x LD_PRELOAD <...>
$ $HPCX_IPM_DIR/bin/ipm_parse -html outfile.xml

For further details on profiling MPI API, please refer to: http://ipm-hpc.org/

The NVIDIA®-supplied version of IPM contains an additional feature (Barrier before Collective), not found in the standard package, that allows end users to easily determine the extent of application imbalance in applications which use collectives. This feature instruments each collective so that it calls MPI_Barrier() before calling the collective operation itself. Time spent in this MPI_Barrier() is not counted as communication time, so by running an application with and without the Barrier before Collective feature, the extent to which application imbalance is a factor in performance can be assessed.

The instrumentation can be applied on a per-collective basis, and is controlled by the following environment variables:

$ export IPM_ADD_BARRIER_TO_REDUCE=1
$ export IPM_ADD_BARRIER_TO_ALLREDUCE=1
$ export IPM_ADD_BARRIER_TO_GATHER=1
$ export IPM_ADD_BARRIER_TO_ALL_GATHER=1
$ export IPM_ADD_BARRIER_TO_ALLTOALL=1
$ export IPM_ADD_BARRIER_TO_ALLTOALLV=1
$ export IPM_ADD_BARRIER_TO_BROADCAST=1
$ export IPM_ADD_BARRIER_TO_SCATTER=1
$ export IPM_ADD_BARRIER_TO_SCATTERV=1
$ export IPM_ADD_BARRIER_TO_GATHERV=1
$ export IPM_ADD_BARRIER_TO_ALLGATHERV=1
$ export IPM_ADD_BARRIER_TO_REDUCE_SCATTER=1

By default, all values are set to '0'.
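
For example, to assess imbalance around MPI_Allreduce, the same application can be run twice, once with and once without the barrier instrumentation, and the reported communication times compared. A minimal sketch, where ./my_app and the process count are placeholders:

$ export LD_PRELOAD=$HPCX_IPM_DIR/lib/libipm.so
$ export IPM_ADD_BARRIER_TO_ALLREDUCE=1
# Re-run with IPM_ADD_BARRIER_TO_ALLREDUCE=0 and compare the reported communication time
$ mpirun -x LD_PRELOAD -x IPM_ADD_BARRIER_TO_ALLREDUCE -np 64 ./my_app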

Rebuilding Open MPI

Rebuilding Open MPI Using a Helper Script 

The $HPCX_ROOT/utils/hpcx_rebuild.sh script can rebuild OMPI and UCX from HPC-X using the same sources and configuration. It also takes into account HPC-X's environments: vanilla, MT and CUDA.

For details, run:

$HPCX_ROOT/utils/hpcx_rebuild.sh --help

Rebuilding Open MPI from HPC-X Sources

The HPC-X package contains the Open MPI sources, which can be found in the $HPCX_HOME/sources/ folder. Further information is available in the HPC-X README file.

To build Open MPI from sources:

$ HPCX_HOME=/path/to/extracted/hpcx
$ ./configure --prefix=${HPCX_HOME}/hpcx-ompi \
           --with-hcoll=${HPCX_HOME}/hcoll \
           --with-ucx=${HPCX_HOME}/ucx \
           --with-platform=contrib/platform/mellanox/optimized \
           --with-slurm --with-pmix
$ make -j9 all && make -j9 install


Open MPI and OpenSHMEM are pre-compiled with UCX and HCOLL, and use them by default.

If HPC-X is intended to be used with the SLURM PMIx plugin, Open MPI should be built against external PMIx, Libevent, and HWLOC, and the same Libevent and PMIx libraries should be used for both SLURM and Open MPI.

Additional configuration options:

--with-pmix=<path-to-pmix>
--with-libevent=<path-to-libevent>
--with-hwloc=<path-to-hwloc>
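
For example, the configure command shown above can be extended to build against external PMIx, Libevent, and HWLOC installations. A minimal sketch, where the /opt/... paths are placeholders for the installations that your SLURM build uses:

$ ./configure --prefix=${HPCX_HOME}/hpcx-ompi \
           --with-hcoll=${HPCX_HOME}/hcoll \
           --with-ucx=${HPCX_HOME}/ucx \
           --with-platform=contrib/platform/mellanox/optimized \
           --with-slurm \
           --with-pmix=/opt/pmix \
           --with-libevent=/opt/libevent \
           --with-hwloc=/opt/hwloc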

Loading KNEM Module

UCX intra-node communication uses the KNEM module, which significantly improves performance. Make sure this module is loaded on your system:

$ modprobe knem

On RHEL systems, to load the KNEM module at boot, add the modprobe command to the /etc/rc.modules script, as in the example below.
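
A minimal sketch, assuming a root shell and an RHEL-style /etc/rc.modules:

$ echo "modprobe knem" >> /etc/rc.modules
$ chmod +x /etc/rc.modules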

Making /dev/knem publicly accessible poses no security threat, as only memory buffers that were explicitly made readable and/or writable can be accessed for reading and/or writing through the 64-bit cookie. Moreover, recent KNEM releases enforce by default that the attacker and the target process have the same UID, which prevents any security issues.

Running MPI with HCOLL

HCOLL is enabled by default in HPC-X.

  • Running with default HCOLL configuration parameters:

    $ mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 <...>

  • Running OSHMEM with HCOLL:

    % oshrun -mca scoll_mpi_enable 1 -mca scoll basic,mpi -mca coll_hcoll_enable 1 <...>
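To compare against Open MPI's built-in collectives (for example, when isolating a collective-related issue), HCOLL can also be disabled explicitly; a minimal sketch:

$ mpirun -mca coll_hcoll_enable 0 <...>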

IB-Router

As of v1.6, HPC-X supports ib-router to allow hosts that are located on different IB subnets to communicate with each other. This support is currently available when using the 'openib btl' in Open MPI.

To use ib-router, make sure MLNX_OFED v3.3-1.0.0.0 or above is installed, and then recompile your Open MPI with '--enable-openib-rdmacm-ibaddr' (for further information on how to compile Open MPI, refer to the Rebuilding Open MPI section above).

To enable routing over IB, please follow these steps:

  1. Configure Open MPI with --enable-openib-rdmacm-ibaddr.

  2. Use rdmacm with openib btl from the command line.
  3. Set the btl_openib_allow_different_subnets parameter to 1.
    It is 0 by default.
  4. Set the btl_openib_gid_index parameter to 1.

For example, to run the IMB benchmark on host1 and host2, which are on separate subnets (i.e., they have different subnet_prefix values), use the following command line:

shell$ mpirun -np 2 --display-map --map-by node -H host1,host2 -mca pml ob1 -mca btl self,openib --mca btl_openib_cpc_include rdmacm -mca btl_openib_if_include mlx5_0:1 -mca btl_openib_gid_index 1 -mca btl_openib_allow_different_subnets 1 ./IMB/src/IMB-MPI1 pingpong


More information on how to enable and use ib-router is available at: https://www.open-mpi.org/faq/?category=openfabrics#ib-router

When using the 'openib btl', RoCE and IB router are mutually exclusive. The Open MPI inside HPC-X is not compiled with ib-router support; therefore, it supports RoCE out of the box.

Direct Launch of Open MPI and OpenSHMEM using SLURM 'srun'

If Open MPI was built with SLURM support, and SLURM has PMI2 or PMIx support, Open MPI and OpenSHMEM applications can be launched directly using the "srun" command:

  • Open MPI:

    `env <MPI/OSHMEM-application-env> srun --mpi={pmi2|pmix} <srun-args> <mpi-app-args>`



All Open MPI/OpenSHMEM parameters that are supported by the mpirun/oshrun command line can be provided through environment variables using the following rule:

"-mca <param_name> <param-val>" => "export OMPI_MCA_<param_name>=<param-val>"

For example, an alternative to "-mca coll_hcoll_enable 1" with 'mpirun' is
"export OMPI_MCA_coll_hcoll_enable=1" with 'srun'.