Appendix: Common Tasks#
The convention for using MPS varies between system environments. The Cray environment, for example, manages MPS in a way that is almost invisible to the user, whereas other Linux-based systems may require the user to start and stop the control daemon themselves. As a user, you will need to understand which set of conventions applies to the system you are running on. Some common cases are described in this section.
Starting and Stopping MPS on Linux#
On a Multi-user System#
To have all users of the system run CUDA applications via MPS, you will need to set up the MPS control daemon to run when the system starts.
Starting MPS control daemon#
As root, run the commands:
$ export CUDA_VISIBLE_DEVICES=0 # Select GPU 0.
$ nvidia-smi -i 0 -c EXCLUSIVE_PROCESS # Set GPU 0 to exclusive mode.
$ nvidia-cuda-mps-control -d # Start the daemon.
This starts the MPS control daemon, which will spawn a new MPS server instance for any $UID starting an application and associate it with the GPU visible to the control daemon. Only one instance of the nvidia-cuda-mps-control daemon should run per node. Note that CUDA_VISIBLE_DEVICES should not be set in the client process's environment.
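On systemd-based distributions, one way to run the daemon at system start is a unit file. The following is a sketch only; the unit name, binary paths, and the choice of GPU 0 are assumptions, not part of MPS itself:

```ini
# /etc/systemd/system/nvidia-mps.service (hypothetical unit name)
[Unit]
Description=NVIDIA CUDA MPS control daemon
After=nvidia-persistenced.service

[Service]
Type=forking
Environment=CUDA_VISIBLE_DEVICES=0
ExecStartPre=/usr/bin/nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
ExecStart=/usr/bin/nvidia-cuda-mps-control -d
ExecStop=/bin/sh -c 'echo quit | /usr/bin/nvidia-cuda-mps-control'

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now nvidia-mps` (assuming the unit name above); verify the paths on your system first.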
Shutting Down MPS Control Daemon#
To shut down the daemon, as root, run:
$ echo quit | nvidia-cuda-mps-control
Log Files#
You can view the status of the daemons in the log files:
/var/log/nvidia-mps/control.log
/var/log/nvidia-mps/server.log
These are typically only visible to users with administrative privileges.
On a Single-User System#
When running as a single user, the control daemon must be launched with the same user ID as that of the client process.
Starting MPS Control Daemon#
As $UID, run the commands:
export CUDA_VISIBLE_DEVICES=0 # Select GPU 0.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps # Select a location that's accessible to the given $UID
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log # Select a location that's accessible to the given $UID
nvidia-cuda-mps-control -d # Start the daemon.
This starts the MPS control daemon, which will spawn a new MPS server instance for that $UID starting an application and associate it with the GPU visible to the control daemon.
Starting MPS client application#
Set the following variables in the client process’s environment. Note that
CUDA_VISIBLE_DEVICES should not be set in the client’s environment.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps # Set to the same location as the MPS control daemon
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log # Set to the same location as the MPS control daemon
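The client-side environment setup can be wrapped in a small launcher script so each application does not have to repeat it. This is a sketch; the script name and directory locations are examples, not mandated by MPS:

```shell
#!/bin/sh
# mps-run (hypothetical name): launch a command as an MPS client.
# Point the client at the same pipe/log directories as the control daemon.
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log
# CUDA_VISIBLE_DEVICES must not be set in the client's environment.
unset CUDA_VISIBLE_DEVICES
exec "$@"
```

Usage would be, for example, `./mps-run ./my_cuda_app args...`.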
Shutting Down MPS#
To shut down the daemon, as $UID, run:
$ echo quit | nvidia-cuda-mps-control
Log Files#
You can view the status of the daemons in the log files:
$CUDA_MPS_LOG_DIRECTORY/control.log
$CUDA_MPS_LOG_DIRECTORY/server.log
Scripting a Batch Queuing System#
Basic Principles#
We recommend you manage the various MPS-specific details by building an automatic provisioning abstraction on top of the basic MPS components. This section describes how to implement a batch-submission flag in the PBS/Torque queuing environment and discusses MPS integration into batch queuing systems more generally.
Per-Job MPS Control: A Torque/PBS Example#
Note
Torque installations are highly customized. Conventions for specifying job resources vary from site to site and we expect that, analogously, the convention for enabling MPS could vary from site to site as well. Check with your system’s administrator to find out if they already have a means to provision MPS on your behalf.
Tinkering with nodes outside the queuing convention is generally discouraged since
jobs are usually dispatched as nodes are released by completing jobs. It is possible to
enable MPS on a per-job basis by using the Torque prologue and epilogue scripts to start
and stop the nvidia-cuda-mps-control daemon. In this example, we re-use the account
parameter to request MPS for a job, so that the following command:
qsub -A "MPS=true" ...
will result in the prologue script starting MPS as shown:
# Activate MPS if requested by the user
USER=$2
ACCTSTR=$7
echo $ACCTSTR | grep -i "MPS=true" > /dev/null
if [ $? -eq 0 ]; then
    nvidia-smi -c 3    # Compute mode 3 = EXCLUSIVE_PROCESS
    USERID=`id -u $USER`
    export CUDA_VISIBLE_DEVICES=0
    nvidia-cuda-mps-control -d && echo "MPS control daemon started"
    sleep 1
    echo "start_server -uid $USERID" | nvidia-cuda-mps-control && echo "MPS server started for $USER"
fi
and the epilogue script stopping MPS as shown:
# Reset compute mode to default
nvidia-smi -c 0

# Quit CUDA MPS if it is running
ps aux | grep nvidia-cuda-mps-control | grep -v grep > /dev/null
if [ $? -eq 0 ]; then
    echo quit | nvidia-cuda-mps-control
fi

# Test for presence of an MPS zombie
ps aux | grep nvidia-cuda-mps | grep -v grep > /dev/null
if [ $? -eq 0 ]; then
    logger "`hostname` epilogue: MPS refused to quit! Marking offline"
    pbsnodes -o -N "Epilogue check: MPS did not quit" `hostname`
fi

# Check GPU sanity, simple check
nvidia-smi > /dev/null
if [ $? -ne 0 ]; then
    logger "`hostname` epilogue: GPUs not sane! Marking `hostname` offline"
    pbsnodes -o -N "Epilogue check: nvidia-smi failed" `hostname`
fi
Best Practices for SM Partitioning#
Creating a context is a costly operation in terms of time, memory, and hardware resources.
If a context with execution affinity is created at kernel launch time, the user will observe a sudden increase in latency and memory footprint as a result of the context creation. To avoid paying the latency of context creation and the abrupt increase in memory usage at kernel launch time, it is recommended that users create a pool of contexts with different SM partitions upfront and select the context with the suitable SM partition at kernel launch:
int device = 0;
cudaDeviceProp prop;
const int CONTEXT_POOL_SIZE = 4;
CUcontext contextPool[CONTEXT_POOL_SIZE];
int smCounts[CONTEXT_POOL_SIZE];
cudaSetDevice(device);
cudaGetDeviceProperties(&prop, device);
smCounts[0] = 1;
smCounts[1] = 2;
smCounts[2] = (prop.multiProcessorCount - 3) / 3;
smCounts[3] = (prop.multiProcessorCount - 3) / 3 * 2;
for (int i = 0; i < CONTEXT_POOL_SIZE; i++) {
    CUexecAffinityParam affinity;
    affinity.type = CU_EXEC_AFFINITY_TYPE_SM_COUNT;
    affinity.param.smCount.val = smCounts[i];
    cuCtxCreate_v3(&contextPool[i], &affinity, 1, 0, device);
}
for (int i = 0; i < CONTEXT_POOL_SIZE; i++) {
    // contextPool is captured by reference; it must outlive the threads.
    std::thread([&, i]() {
        int numSms = 0;
        int numBlocksPerSm = 0;
        int numThreads = 128;
        CUexecAffinityParam affinity;
        cuCtxSetCurrent(contextPool[i]);
        cuCtxGetExecAffinity(&affinity, CU_EXEC_AFFINITY_TYPE_SM_COUNT);
        numSms = affinity.param.smCount.val;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &numBlocksPerSm, my_kernel, numThreads, 0);
        void *kernelArgs[] = { /* add kernel args */ };
        dim3 dimBlock(numThreads, 1, 1);
        dim3 dimGrid(numSms * numBlocksPerSm, 1, 1);
        cudaLaunchCooperativeKernel((void*)my_kernel, dimGrid, dimBlock, kernelArgs);
    }).detach();
}
Using Static SM Partitioning#
Static SM partitioning mode allows users to create exclusive SM partitions for MPS clients on NVIDIA Ampere architecture and newer GPUs, providing deterministic resource allocation and improved isolation. The following example demonstrates the minimal workflow for configuring and using static partitions.
Basic Workflow#
# 1. Start the MPS control daemon with static partitioning enabled
$ nvidia-cuda-mps-control -d -S
# 2. Create an SM partition with 7 chunks. The first partitioning command
# will perform a lightweight CUDA initialization.
$ echo "sm_partition add GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65 7" | nvidia-cuda-mps-control
GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65/Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
# 3. View the current partitioning configuration
$ echo "lspart" | nvidia-cuda-mps-control
GPU            Partition                              free   used   free   used   clients
                                                      chunk  chunk  SM     SM
GPU-74d43ed3   -                                      3      7      24     56     -
GPU-74d43ed3   Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA  -      7      -      56     -
# 4. Assign the partition to a client application. The MPS server will start
# on client application connection.
$ export CUDA_MPS_SM_PARTITION=GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65/Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
$ ./nbody
# 5. After the application completes, remove the partition
$ echo "sm_partition rm GPU-74d43ed3 Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" | nvidia-cuda-mps-control
Multiple Partitions Example#
The following example demonstrates creating multiple partitions for different workloads:
# Start MPS control with static partitioning
$ nvidia-cuda-mps-control -d -S
# View the current partitioning configuration
$ echo "lspart" | nvidia-cuda-mps-control
GPU            Partition                              free   used   free   used   clients
                                                      chunk  chunk  SM     SM
GPU-74d43ed3   -                                      10     0      92     0      -
# Create three partitions with different sizes
$ echo "sm_partition add GPU-74d43ed3 5" | nvidia-cuda-mps-control
GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65/Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
$ echo "sm_partition add GPU-74d43ed3 3" | nvidia-cuda-mps-control
GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65/Cx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
$ echo "sm_partition add GPU-74d43ed3 2" | nvidia-cuda-mps-control
GPU-74d43ed3-cdf7-e667-3644-bf5b4f46ed65/Bx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
# Run different applications on different partitions
$ CUDA_MPS_SM_PARTITION=GPU-74d43ed3/Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ./large_workload &
$ CUDA_MPS_SM_PARTITION=GPU-74d43ed3/Cx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ./medium_workload &
$ CUDA_MPS_SM_PARTITION=GPU-74d43ed3/Bx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ./small_workload &
# Check partition usage
$ echo "lspart" | nvidia-cuda-mps-control
GPU            Partition                              free   used   free   used   clients
                                                      chunk  chunk  SM     SM
GPU-74d43ed3   -                                      0      10     0      92     -
GPU-74d43ed3   Dx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA  -      5      -      40     Yes
GPU-74d43ed3   Cx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA  -      3      -      24     Yes
GPU-74d43ed3   Bx4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA  -      2      -      28     Yes
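When scripting around static partitioning, the `lspart` output can be parsed with standard tools. The helper below is a sketch that assumes the whitespace-separated row format shown above; the function name and sample row are illustrative:

```shell
# lspart_free_chunks (hypothetical helper): print the free-chunk count from a
# GPU's summary row, i.e. the row whose Partition column is "-".
lspart_free_chunks() {
    awk -v gpu="$1" '$1 == gpu && $2 == "-" { print $3 }'
}

# Example with a sample summary row in the format shown above:
sample='GPU-74d43ed3   -   3   7   24   56   -'
printf '%s\n' "$sample" | lspart_free_chunks GPU-74d43ed3
```

In practice the input would come from the control daemon, for example `echo "lspart" | nvidia-cuda-mps-control | lspart_free_chunks GPU-74d43ed3`.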
Troubleshooting Guide#
MPS Server Not Accepting New Client Requests#
This error indicates that the MPS (Multi-Process Service) server is currently unable to accept new client connections. This is typically a temporary condition caused by one of the following reasons.
Possible Causes and Recommended Actions#
Server Recovery or Automatic Restart
Cause: The MPS server is recovering from a previous error or automatically restarting due to a kernel-level application failure.
Action: Wait for the server to complete its recovery. If these errors occur repeatedly, investigate the client application for underlying issues.
Server Initialization in Progress
Cause: The server is performing initialization operations equivalent to cuInit() followed by cuCtxCreate(). On systems with multiple GPUs and no persistence daemon, this can take a significant amount of time. Any client connections attempted during this period will receive this error.
Action: Allow the initialization to complete. To speed up future startups, consider enabling the NVIDIA Persistence Daemon.
Client Termination Cleanup
Cause: A terminate_client operation is in progress. During this time, the server restricts new connections to clean up resources from a specific client.
Action: Wait for the cleanup process to finish. This does not affect other connected clients.
User ID Mismatch Without Multi-User Mode
Cause: A client with a different user ID is attempting to connect while the server is not started with the -multi-user flag.
Action: Either start the MPS server with the -multi-user option or ensure that clients are using the same user ID. The default MPS behavior is to restart the server for each unique user ID.
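Since several of the conditions above are transient, client launch scripts can simply retry with a short backoff instead of failing on the first refused connection. A generic sketch, where the function name and retry limits are arbitrary choices:

```shell
# retry (hypothetical helper): run a command until it succeeds or the attempt
# limit is reached, sleeping a little longer between attempts.
retry() {
    cmd=$1
    max=${2:-5}
    n=0
    until eval "$cmd"; do
        n=$((n + 1))
        [ "$n" -ge "$max" ] && return 1
        sleep "$n"    # simple linear backoff
    done
    return 0
}

# Example: retry an MPS client launch up to 5 times.
# retry "./my_cuda_app" 5
```

If the retries are exhausted repeatedly, investigate the causes above rather than raising the attempt limit.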