Appendix: Common Tasks#

The convention for using MPS will vary between system environments. The Cray environment, for example, manages MPS in a way that is almost invisible to the user, whereas other Linux-based systems may require the user to manage the control daemon themselves. As a user, you will need to understand which set of conventions is appropriate for the system you are running on. Some common cases are described in this section.

Starting and Stopping MPS on Linux#

On a Multi-user System#

To cause all users of the system to run CUDA applications via MPS, you will need to set up the MPS control daemon to run when the system starts.

Starting MPS control daemon#

As root, run the commands:

export CUDA_VISIBLE_DEVICES=0           # Select GPU 0.

nvidia-smi -i 0 -c EXCLUSIVE_PROCESS    # Set GPU 0 to exclusive mode.

nvidia-cuda-mps-control -d              # Start the daemon.

This will start the MPS control daemon that will spawn a new MPS Server instance for any $UID starting an application and associate it with the GPU visible to the control daemon. Only one instance of the nvidia-cuda-mps-control daemon should be run per node. Note that CUDA_VISIBLE_DEVICES should not be set in the client process’s environment.
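
If you want MPS to be available whenever the machine is up, one option is to wrap the commands above in a boot-time service. The systemd unit below is only an illustrative sketch (the unit name, binary paths, and GPU selection are assumptions to adapt to your site); it is not shipped with MPS:

# Illustrative example: /etc/systemd/system/nvidia-cuda-mps.service (hypothetical name and paths)
cat > /etc/systemd/system/nvidia-cuda-mps.service <<'EOF'
[Unit]
Description=NVIDIA CUDA MPS control daemon

[Service]
Type=forking
Environment=CUDA_VISIBLE_DEVICES=0
ExecStartPre=/usr/bin/nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
ExecStart=/usr/bin/nvidia-cuda-mps-control -d
ExecStop=/bin/sh -c 'echo quit | /usr/bin/nvidia-cuda-mps-control'

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload && systemctl enable --now nvidia-cuda-mps.service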

Shutting Down MPS Control Daemon#

To shut down the daemon, as root, run:

echo quit | nvidia-cuda-mps-control
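
If you set the GPU to EXCLUSIVE_PROCESS mode when starting MPS, you may also want to restore the default compute mode once the daemon has exited:

nvidia-smi -i 0 -c DEFAULT              # Restore the default compute mode on GPU 0.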

Log Files#

You can check the status of the daemons by viewing the log files in

/var/log/nvidia-mps/control.log

/var/log/nvidia-mps/server.log

These are typically only visible to users with administrative privileges.

On a Single-User System#

When running as a single user, the control daemon must be launched with the same user ID as that of the client process.

Starting MPS Control Daemon#

As $UID, run the commands:

export CUDA_VISIBLE_DEVICES=0 # Select GPU 0.

export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps # Select a location that's accessible to the given $UID

export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log # Select a location that's accessible to the given $UID

nvidia-cuda-mps-control -d # Start the daemon.

This will start the MPS control daemon that will spawn a new MPS Server instance for that $UID starting an application and associate it with the GPU visible to the control daemon.

Starting MPS client application#

Set the following variables in the client process’s environment. Note that CUDA_VISIBLE_DEVICES should not be set in the client’s environment.

export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps # Set to the same location as the MPS control daemon

export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log # Set to the same location as the MPS control daemon
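
With these variables set, the client is launched like any other CUDA application and connects to the control daemon through the pipe directory. For example, with a hypothetical binary:

./my_cuda_app                           # Hypothetical client; it inherits the MPS settings exported above.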

Shutting Down MPS#

To shut down the daemon, as $UID, run:

echo quit | nvidia-cuda-mps-control

Log Files#

You can check the status of the daemons by viewing the log files in

$CUDA_MPS_LOG_DIRECTORY/control.log

$CUDA_MPS_LOG_DIRECTORY/server.log

Scripting a Batch Queuing System#

Basic Principles#

Chapters 3 and 4 describe the MPS components, software utilities, and the environment variables that control them. However, using MPS at this level puts a burden on the user since:

  1. At the application level, the user only cares whether MPS is engaged or not, and should not have to understand the details of the environment settings when these are unlikely to deviate from a fixed configuration.

  2. There may be consistency conditions that need to be enforced by the system itself, such as clearing CPU and GPU memory between application runs, or deleting zombie processes upon job completion.

  3. Root access (or equivalent) is required to change the mode of the GPU.

We recommend you manage these details by building an automatic provisioning abstraction on top of the basic MPS components. This section shows how to implement a batch-submission flag in the PBS/Torque queuing environment and discusses MPS integration into a batch queuing system in general.

Per-Job MPS Control: A Torque/PBS Example#

Note

Torque installations are highly customized. Conventions for specifying job resources vary from site to site and we expect that, analogously, the convention for enabling MPS could vary from site to site as well. Check with your system’s administrator to find out if they already have a means to provision MPS on your behalf.

Tinkering with nodes outside the queuing convention is generally discouraged since jobs are usually dispatched as nodes are released by completing jobs. It is possible to enable MPS on a per-job basis by using the Torque prologue and epilogue scripts to start and stop the nvidia-cuda-mps-control daemon. In this example, we re-use the account parameter to request MPS for a job, so that the following command:

qsub -A "MPS=true" ...

will result in the prologue script starting MPS as shown:

# Activate MPS if requested by user

USER=$2
ACCTSTR=$7
echo $ACCTSTR | grep -i "MPS=true"
if [ $? -eq 0 ]; then
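   # Compute mode 3 = EXCLUSIVE_PROCESS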
   nvidia-smi -c 3
   USERID=`id -u $USER`
   export CUDA_VISIBLE_DEVICES=0
   nvidia-cuda-mps-control -d && echo "MPS control daemon started"
   sleep 1
   echo "start_server -uid $USERID" | nvidia-cuda-mps-control && echo "MPS server started for $USER"
fi

and the epilogue script stopping MPS as shown:

# Reset compute mode to default
nvidia-smi -c 0

# Quit cuda MPS if it's running
ps aux | grep nvidia-cuda-mps-control | grep -v grep > /dev/null
if [ $? -eq 0 ]; then
   echo quit | nvidia-cuda-mps-control
fi

# Test for presence of MPS zombie
ps aux | grep nvidia-cuda-mps | grep -v grep > /dev/null
if [ $? -eq 0 ]; then
   logger "`hostname` epilogue: MPS refused to quit! Marking offline"
   pbsnodes -o -N "Epilogue check: MPS did not quit" `hostname`
fi

# Check GPU sanity, simple check
nvidia-smi > /dev/null
if [ $? -ne 0 ]; then
   logger "`hostname` epilogue: GPUs not sane! Marking `hostname` offline"
   pbsnodes -o -N "Epilogue check: nvidia-smi failed" `hostname`
fi
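
How these scripts are installed is site-specific. On a typical Torque installation they are placed as the prologue and epilogue files under the MOM's mom_priv directory and must be owned by root with restrictive permissions; the file names and paths below are illustrative only:

# Illustrative paths; check your Torque installation
cp prologue.sh /var/spool/torque/mom_priv/prologue
cp epilogue.sh /var/spool/torque/mom_priv/epilogue
chown root:root /var/spool/torque/mom_priv/prologue /var/spool/torque/mom_priv/epilogue
chmod 500 /var/spool/torque/mom_priv/prologue /var/spool/torque/mom_priv/epilogue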

Best Practice for SM Partitioning#

Creating a context is a costly operation in terms of time, memory, and hardware resources.

If a context with execution affinity is created at kernel launch time, the user will observe a sudden increase in latency and memory footprint as a result of the context creation. To avoid paying the latency of context creation and the abrupt increase in memory usage at kernel launch time, it is recommended that users create a pool of contexts with different SM partitions upfront and select the context with the suitable SM partition at kernel launch:

int device = 0;
cudaDeviceProp prop;
const int CONTEXT_POOL_SIZE = 4;
CUcontext contextPool[CONTEXT_POOL_SIZE];
int smCounts[CONTEXT_POOL_SIZE];
cudaSetDevice(device);
cudaGetDeviceProperties(&prop, device);

// Reserve 1 and 2 SMs for the two small partitions and split the
// remaining SMs roughly 1:2 between the two larger partitions.
smCounts[0] = 1; smCounts[1] = 2;
smCounts[2] = (prop.multiProcessorCount - 3) / 3;
smCounts[3] = (prop.multiProcessorCount - 3) / 3 * 2;

// Create the context pool upfront, one context per SM partition.
for (int i = 0; i < CONTEXT_POOL_SIZE; i++) {
   CUexecAffinityParam affinity;
   affinity.type = CU_EXEC_AFFINITY_TYPE_SM_COUNT;
   affinity.param.smCount.val = smCounts[i];
   cuCtxCreate_v3(&contextPool[i], &affinity, 1, 0, device);
}

// Launch work on each partition from its own thread.
std::vector<std::thread> threads;
for (int i = 0; i < CONTEXT_POOL_SIZE; i++) {
   threads.emplace_back([&, i]() {
      int numSms = 0;
      int numBlocksPerSm = 0;
      int numThreads = 128;
      CUexecAffinityParam affinity;
      // Select the context with the desired SM partition and query
      // how many SMs it was actually granted.
      cuCtxSetCurrent(contextPool[i]);
      cuCtxGetExecAffinity(&affinity, CU_EXEC_AFFINITY_TYPE_SM_COUNT);
      numSms = affinity.param.smCount.val;
      cudaOccupancyMaxActiveBlocksPerMultiprocessor(
         &numBlocksPerSm, my_kernel, numThreads, 0);
      void *kernelArgs[] = { /* add kernel args */ };

      dim3 dimBlock(numThreads, 1, 1);
      dim3 dimGrid(numSms * numBlocksPerSm, 1, 1);
      cudaLaunchCooperativeKernel((void*)my_kernel, dimGrid, dimBlock, kernelArgs);
   });
}
for (auto &t : threads) {
   t.join();
}

The hardware resources needed for client CUDA contexts are limited; Volta MPS supports up to 60 client CUDA contexts per device. The size of the context pool per device is limited by the number of CUDA client contexts supported per device. The memory footprint of each client CUDA context and the value of CUDA_DEVICE_MAX_CONNECTIONS may further reduce the number of available clients. Therefore, CUDA client contexts with different SM partitions should be created judiciously.
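
If context resources become the limiting factor, one knob worth experimenting with is CUDA_DEVICE_MAX_CONNECTIONS in each client's environment; as noted above, larger values consume more per-client resources, so a smaller value may allow more clients to coexist. The value below is purely illustrative:

export CUDA_DEVICE_MAX_CONNECTIONS=2    # Illustrative value; fewer hardware work queues per client.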

Using Static SM Partitioning#

Static SM partitioning mode allows users to create exclusive SM partitions for MPS clients on NVIDIA Ampere architecture and newer GPUs, providing deterministic resource allocation and improved isolation. The following example demonstrates the minimal workflow for configuring and using static partitions.

Basic Workflow#

# 1. Start the MPS control daemon with static partitioning enabled

nvidia-cuda-mps-control -d -S

# 2. Create an SM partition with 7 chunks. The first partitioning command
# will perform a lightweight CUDA initialization.

echo "sm_partition add GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65 7" | nvidia-cuda-mps-control
GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65/Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

# 3. View the current partitioning configuration

echo "lspart" | nvidia-cuda-mps-control
GPU           Partition                             free    used    free  used  clients
                                                    chunk   chunk    SM    SM
GPU-74d43ed3      -                                 3       7        24    56   -
GPU-74d43ed3  Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA  -       7        -     56   -

# 4. Assign the partition to a client application. The MPS server will start
# on client application connection.

export CUDA_MPS_SM_PARTITION=GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65/Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
./nbody

# 5. After the application completes, remove the partition

echo "sm_partition rm GPU-74d43ed3 Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" | nvidia-cuda-mps-control

Multiple Partitions Example#

The following example demonstrates creating multiple partitions for different workloads:

# Start MPS control with static partitioning

nvidia-cuda-mps-control -d -S

# View the current partitioning configuration

echo "lspart" | nvidia-cuda-mps-control
GPU           Partition                             free    used    free  used  clients
                                                    chunk   chunk    SM    SM
GPU-74d43ed3      -                                 10       0       92    92   -

# Create three partitions with different sizes

echo "sm_partition add GPU-74d43ed3 5" | nvidia-cuda-mps-control
GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65/Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

echo "sm_partition add GPU-74d43ed3 3" | nvidia-cuda-mps-control
GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65/Cx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

echo "sm_partition add GPU-74d43ed3 2" | nvidia-cuda-mps-control
GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65/Bx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

# Run different applications on different partitions

CUDA_MPS_SM_PARTITION=GPU-74d43ed3/Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ./large_workload &
CUDA_MPS_SM_PARTITION=GPU-74d43ed3/Cx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ./medium_workload &
CUDA_MPS_SM_PARTITION=GPU-74d43ed3/Bx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ./small_workload &

# Check partition usage

echo "lspart" | nvidia-cuda-mps-control
GPU           Partition                             free    used    free  used  clients
                                                    chunk   chunk    SM    SM
GPU-74d43ed3      -                                 0       10       0     80   -
GPU-74d43ed3  Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA  -       5        -     40   Yes
GPU-74d43ed3  Cx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA  -       3        -     24   Yes
GPU-74d43ed3  Bx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA  -       2        -     28   Yes
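
When the background workloads have finished, the partitions can be removed and MPS shut down using the same commands as in the basic workflow:

# Wait for the background workloads, then remove the partitions and stop MPS
wait
echo "sm_partition rm GPU-74d43ed3 Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" | nvidia-cuda-mps-control
echo "sm_partition rm GPU-74d43ed3 Cx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" | nvidia-cuda-mps-control
echo "sm_partition rm GPU-74d43ed3 Bx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" | nvidia-cuda-mps-control
echo quit | nvidia-cuda-mps-control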

Troubleshooting Guide#

MPS Server Not Accepting New Client Requests#

Description

This error indicates that the MPS (Multi-Process Service) server is currently unable to accept new client connections. This is typically a temporary condition caused by one of the following known issues.

Possible Causes and Recommended Actions

Server Recovery or Automatic Restart

Cause: The MPS server is recovering from a previous error or automatically restarting due to a kernel-level application failure.

Action: Wait for the server to complete its recovery. If these errors occur repeatedly, investigate the client application for underlying issues.

Server Initialization in Progress

Cause: The server is performing initialization operations equivalent to cuInit() --> cuCtxCreate(). On systems with multiple GPUs and no persistence daemon, this can take a significant amount of time. Any client connections attempted during this period will receive this error.

Action: Allow the initialization to complete. To speed up future startups, consider enabling the NVIDIA Persistence Daemon.
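
How the persistence daemon is enabled varies by distribution; two common options (the service name and flags may differ on your system) are:

systemctl enable --now nvidia-persistenced    # Enable the persistence daemon, if the driver package installed the unit.
nvidia-smi -pm 1                              # Or enable legacy persistence mode directly.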

Client Termination Cleanup

Cause: A terminate_client operation is in progress. During this time, the server restricts new connections to clean up resources from a specific client.

Action: Wait for the cleanup process to finish. This does not affect other connected clients.

User ID Mismatch Without Multi-User Mode

Cause: A client with a different user ID is attempting to connect while the server is not started with the -multi-user flag.

Action: Either start the MPS server with the -multi-user option or ensure that clients are using the same user ID. The default MPS behavior is to restart the server for each unique user ID.