Appendix: Tools and Interface Reference#

The following utility programs and environment variables are used to manage the MPS execution environment. They are described below, along with other relevant pieces of the standard CUDA programming environment.

Utilities and Daemons#

nvidia-cuda-mps-control#

Typically stored under /usr/bin on Linux and QNX systems and usually run with superuser privileges, this control daemon is used to manage the nvidia-cuda-mps-server described in the following section. The relevant use cases are:

man nvidia-cuda-mps-control          # Describes usage of this utility.

nvidia-cuda-mps-control -d           # Start daemon in background process.

ps -ef | grep mps                    # Check whether the MPS daemon is running (Linux).

pidin  | grep mps                    # Check whether the MPS daemon is running (QNX).

echo quit | nvidia-cuda-mps-control  # Shut the daemon down.

nvidia-cuda-mps-control -f           # Start daemon in foreground.

nvidia-cuda-mps-control -v           # Print version of control daemon executable (applicable on Tegra platforms only).

nvidia-cuda-mps-control --static-partitioning
nvidia-cuda-mps-control -S           # Start daemon with static partitioning mode enabled.

The control daemon creates a nvidia-cuda-mps-control.pid file that contains the PID of the control daemon process in the CUDA_MPS_PIPE_DIRECTORY. When there are multiple instances of the control daemon running in parallel, one can target a specific instance by looking up its PID in the corresponding CUDA_MPS_PIPE_DIRECTORY. If CUDA_MPS_PIPE_DIRECTORY is not set, the nvidia-cuda-mps-control.pid file will be created at the default pipe directory at /tmp/nvidia-mps.
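
For example, assuming a control daemon instance was started with its pipe directory set to a hypothetical path /tmp/mps_a, its PID can be looked up as follows:

$ export CUDA_MPS_PIPE_DIRECTORY=/tmp/mps_a                    # hypothetical pipe directory for this instance
$ cat $CUDA_MPS_PIPE_DIRECTORY/nvidia-cuda-mps-control.pid     # PID of the control daemon that owns this directory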

When used in interactive mode, the available commands are:

  • get_server_list – prints out a list of all PIDs of server instances.

  • get_server_status <PID> – prints the status of the server with the given <PID>.

  • start_server -uid <user id> [-mlopart] – manually starts a new instance of nvidia-cuda-mps-server with the given user ID. If -mlopart is specified, clients will create MLOPart devices if supported.

  • get_client_list <PID> – lists the PIDs of the client applications connected to the server instance with the given PID.

  • quit – terminates the nvidia-cuda-mps-control daemon.
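
As a sketch, these commands can be issued either from the interactive prompt or by piping them into the control utility, in the same way as the quit command shown earlier (the server PID 12345 is illustrative):

$ echo get_server_list | nvidia-cuda-mps-control               # list the PIDs of running servers
$ echo "get_client_list 12345" | nvidia-cuda-mps-control       # list the clients of server 12345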

Commands available to Volta MPS control:

  • get_device_client_list [<PID>] – lists the devices and the PIDs of client applications that enumerated each device. It optionally takes the server instance PID.

  • set_default_active_thread_percentage <percentage> – overrides the default active thread percentage for MPS servers. If there is already a server spawned, this command will only affect the next server. The set value is lost if a quit command is executed. The default is 100.

  • get_default_active_thread_percentage – queries the current default available thread percentage.

  • set_active_thread_percentage <PID> <percentage> – overrides the active thread percentage for the MPS server instance of the given PID. All clients created with that server afterwards will observe the new limit. Existing clients are not affected.

  • get_active_thread_percentage <PID> – queries the current available thread percentage of the MPS server instance of the given PID.

  • set_default_device_pinned_mem_limit <dev> <value> – sets the default device pinned memory limit for each MPS client. If there is already a server spawned, this command will only affect the next server. The set value is lost if a quit command is executed. The dev argument may be a device UUID string or an integer ordinal. The value must be an integer followed by a qualifier, either “G” or “M”, specifying the value in gigabytes or megabytes respectively. For example, to set a limit of 10 gigabytes for device 0, use the following command:

    set_default_device_pinned_mem_limit 0 10G

    By default, there is no memory limit set.

    Note that for this command, the dev argument is not validated against available devices in the MPS server. Therefore, it is possible to set two memory limits for the same device: one by device UUID and another by ordinal. When an MPS server is started, whichever limit was set last will take effect. A limit set with an invalid device UUID or ordinal will be ignored when starting the MPS server.

  • get_default_device_pinned_mem_limit <dev> – queries the current default pinned memory limit for the device. The dev argument may be a device UUID string or an integer ordinal.

    Note that this command does not translate between device UUIDs or ordinals and will return the limit that was set for each device identifier via the set_default_device_pinned_mem_limit command.

  • set_device_pinned_mem_limit <PID> <dev> <value> – overrides the device pinned memory limit for a running MPS server. This sets the device pinned memory limit for each client of the MPS server instance with the given PID for the device dev. All clients created with that server afterwards will observe the new limit. Existing clients are not affected. The dev argument may be a device UUID string or an integer ordinal. For example, to set a limit of 900MB for the server with PID 1024 for device 0, use the following command:

    set_device_pinned_mem_limit 1024 0 900M

  • get_device_pinned_mem_limit <PID> <dev> – queries the current device pinned memory limit of the MPS server instance of the given PID for the device dev. The dev argument may be a device UUID string or an integer ordinal.

  • terminate_client <server PID> <client PID> – terminates all the outstanding GPU work of the MPS client process <client PID> running on the MPS server denoted by <server PID>. For example, to terminate the outstanding GPU work for an MPS client process with PID 1024 running on an MPS server with PID 123, use the following command:

    terminate_client 123 1024

  • ps [-p PID] – reports a snapshot of the current client processes. It optionally takes the server instance PID. It displays the PID, the unique identifier assigned by the server, the partial UUID of the associated device, the PID of the connected server, the namespace PID, and the command line of the client.

  • set_default_client_priority [priority] – sets the default client priority that will be used for new clients. The value is not applied to existing clients. Priority values should be considered as hints to the CUDA Driver, not guarantees. Allowed values are 0 [NORMAL] and 1 [BELOW NORMAL]. The set value is lost if a quit command is executed. The default is 0 [NORMAL].

  • get_default_client_priority – queries the current priority value that will be used for new clients.

  • device_query [<server PID>] [--csv] – queries the devices that are available to MPS clients. If a server PID is specified, the command outputs the device information for that server and ignores other servers. If --csv is specified, the command outputs the device information in comma-separated format.

  • sm_partition add <device UUID> <number of chunks> – creates an SM partition with the specified number of chunks on the given device. Upon successful creation, the full partition ID is displayed. This command accepts unique partial UUIDs of devices.

  • sm_partition rm <device UUID> <partition> – removes the specified SM partition from the given device.

  • lspart – displays the current SM partitioning configuration. The output includes the device UUID, partition IDs, free and used chunks, free and used SMs, and whether the partition is in use. The display uses unique partial UUIDs of devices.
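
As an illustration, the following sketch caps the default active thread percentage before any server is spawned and then queries the settings back (the server PID 12345 is illustrative):

$ nvidia-cuda-mps-control -d                                                # start the control daemon
$ echo "set_default_active_thread_percentage 50" | nvidia-cuda-mps-control  # applies to servers spawned from now on
$ echo get_default_active_thread_percentage | nvidia-cuda-mps-control       # reports the default, 50
$ echo "get_active_thread_percentage 12345" | nvidia-cuda-mps-control       # queries a specific running server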

nvidia-cuda-mps-server#

Typically stored under /usr/bin on Linux and QNX systems, this daemon is run under the same $UID as the client application running on the node. The nvidia-cuda-mps-server instances are created on-demand when client applications connect to the control daemon. The server binary should not be invoked directly, and instead the control daemon should be used to manage the startup and shutdown of servers.

The nvidia-cuda-mps-server process owns the CUDA context on the GPU and uses it to execute GPU operations for its client application processes. Due to this, when querying active processes via nvidia-smi (or any NVML-based application) nvidia-cuda-mps-server will appear as the active CUDA process rather than any of the client processes.
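
For example, one way to observe this (assuming a reasonably recent nvidia-smi) is to query the compute processes; the server process is listed rather than the individual clients:

$ nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv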

The version of the nvidia-cuda-mps-server executable can be printed with:

nvidia-cuda-mps-server -v

nvidia-smi#

Typically stored under /usr/bin on Linux systems, this is used to configure GPUs on a node. The following use cases are relevant to managing MPS:

man nvidia-smi                        # Describes usage of this utility.

nvidia-smi -L                         # List the GPUs on the node.

nvidia-smi -q                         # List GPU state and configuration information.

nvidia-smi -q -d compute              # Show the compute mode of each GPU.

nvidia-smi -i 0 -c EXCLUSIVE_PROCESS  # Set GPU 0 to exclusive mode, run as root.

nvidia-smi -i 0 -c DEFAULT            # Set GPU 0 to default mode, run as root. (SHARED_PROCESS)

nvidia-smi -i 0 -r                    # Reset GPU 0 to apply the new setting.

Environment Variables#

CUDA_VISIBLE_DEVICES#

CUDA_VISIBLE_DEVICES is used to specify which GPUs should be visible to a CUDA application. Only the devices whose index or UUID is present in the sequence are visible to CUDA applications, and they are enumerated in the order of the sequence.

When CUDA_VISIBLE_DEVICES is set before launching the control daemon, the devices will be remapped by the MPS server. This means that if your system has devices 0, 1, and 2, and CUDA_VISIBLE_DEVICES is set to 0,2, then when a client connects to the server it will see the remapped devices: device 0 and device 1. Therefore, keeping CUDA_VISIBLE_DEVICES set to 0,2 when launching the client would lead to an error.
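
A sketch of this scenario, where my_cuda_app stands in for any CUDA client binary:

# The daemon sees physical devices 0 and 2; clients enumerate them as devices 0 and 1.
$ CUDA_VISIBLE_DEVICES=0,2 nvidia-cuda-mps-control -d

# Correct: let the client see both remapped devices.
$ ./my_cuda_app

# Incorrect: after remapping there is no device 2, so this fails.
$ CUDA_VISIBLE_DEVICES=0,2 ./my_cuda_app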

The MPS control daemon will further filter out any pre-Volta devices if any visible device is Volta+.

To avoid this ambiguity, we recommend using UUIDs instead of indices. These can be viewed by launching nvidia-smi -q. When launching the server or the application, you can set CUDA_VISIBLE_DEVICES to UUID_1,UUID_2, where UUID_1 and UUID_2 are the GPU UUIDs. It also works if you specify the first few characters of the UUID (including GPU-) rather than the full UUID.
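
For example (the UUID below is illustrative; substitute the values reported for your GPUs):

$ nvidia-smi -L                               # shows each GPU together with its UUID
$ export CUDA_VISIBLE_DEVICES=GPU-7d1bf14c    # a unique partial UUID, including the GPU- prefix, also works
$ nvidia-cuda-mps-control -d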

The MPS server will fail to start if incompatible devices are visible after the application of CUDA_VISIBLE_DEVICES.

CUDA_MPS_PIPE_DIRECTORY#

The MPS control daemon, the MPS server, and the associated MPS clients communicate with each other via named pipes and UNIX domain sockets. The default directory for these pipes and sockets is /tmp/nvidia-mps. The environment variable, CUDA_MPS_PIPE_DIRECTORY, can be used to override the location of these pipes and sockets. The value of this environment variable should be consistent across all MPS clients sharing the same MPS server, and the MPS control daemon.

The recommended location for the directory containing these named pipes and domain sockets is a local folder such as /tmp. If the specified location exists in a shared, multi-node filesystem, the path must be unique for each node to prevent multiple MPS servers or MPS control daemons from using the same pipes and sockets. When provisioning MPS on a per-user basis, the directory should be set to a location such that different users will not end up using the same directory.
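
For example, a per-user setup might look like the following sketch (the directory path is illustrative):

$ export CUDA_MPS_PIPE_DIRECTORY=/tmp/mps-pipe-$USER
$ mkdir -p $CUDA_MPS_PIPE_DIRECTORY
$ nvidia-cuda-mps-control -d
# Clients of this daemon must have CUDA_MPS_PIPE_DIRECTORY set to the same value in their environment.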

On Tegra platforms, there is no default directory setting for pipes and sockets. Users must set this environment variable such that only intended users have access to this location.

CUDA_MPS_LOG_DIRECTORY#

The MPS control daemon maintains a control.log file which contains the status of its MPS servers, user commands issued and their result, and startup and shutdown notices for the daemon. The MPS server maintains a server.log file containing its startup and shutdown information and the status of its clients.

By default these log files are stored in the directory /var/log/nvidia-mps. The CUDA_MPS_LOG_DIRECTORY environment variable can be used to override the default value. This environment variable should be set in the MPS control daemon’s environment and is automatically inherited by any MPS servers launched by that control daemon.
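
For example (the log directory path is illustrative):

$ export CUDA_MPS_LOG_DIRECTORY=/tmp/mps-log-$USER
$ mkdir -p $CUDA_MPS_LOG_DIRECTORY
$ nvidia-cuda-mps-control -d
$ tail -f $CUDA_MPS_LOG_DIRECTORY/control.log   # follow the control daemon log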

On Tegra platforms, there is no default directory setting for storing the log files. MPS will remain operational without the user setting this environment variable; however, in such instances, MPS logs will not be available. If logs are required to be captured, then the user must set this environment variable such that only intended users have access to this location.

CUDA_DEVICE_MAX_CONNECTIONS#

When encountered in the MPS client’s environment, CUDA_DEVICE_MAX_CONNECTIONS sets the preferred number of compute and copy engine concurrent connections (work queues) from the host to the device for that client. The number actually allocated by the driver may differ from what is requested based on hardware resource limitations or other considerations. Under MPS, each server’s clients share one pool of connections, whereas without MPS each CUDA context would be allocated its own separate connection pool. Volta MPS clients exclusively own the connections set aside for them in the shared pool, so setting this environment variable under Volta MPS may reduce the number of available clients. The default value is 2 for Volta MPS clients.
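
As a minimal sketch, where my_cuda_app is a placeholder for any MPS client binary, a client can request a different number of work queues at launch:

$ CUDA_DEVICE_MAX_CONNECTIONS=1 ./my_cuda_app   # request a single work queue for this client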

CUDA_MPS_ACTIVE_THREAD_PERCENTAGE#

On Volta GPUs, this environment variable sets the portion of the available threads that can be used by the client contexts. The limit can be configured at different levels.

MPS Control Daemon Level#

Setting this environment variable in the MPS control daemon’s environment will configure the default active thread percentage when the MPS control daemon starts.

All the MPS servers spawned by the MPS control daemon will observe this limit. Once the MPS control daemon has started, changing this environment variable cannot affect the MPS servers.
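
For example, a sketch of constraining every future MPS server to half of the available threads:

$ export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50   # must be set before starting the control daemon
$ nvidia-cuda-mps-control -d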

Client Process Level#

Setting this environment variable in an MPS client’s environment will configure the active thread percentage when the client process starts. The new limit will only further constrain the limit set by the control daemon (via set_default_active_thread_percentage or set_active_thread_percentage control daemon commands or this environment variable at the MPS control daemon level). If the control daemon has a lower setting, the control daemon setting will be obeyed by the client process instead.

All the client CUDA contexts created within the client process will observe the new limit. Once the client process has started, changing the value of this environment variable cannot affect the client CUDA contexts.
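
For example, assuming the control daemon was started with a default of 50%, a client can constrain itself further but cannot relax the limit (my_cuda_app is a placeholder):

$ CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 ./my_cuda_app   # effective limit: 25%
$ CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=75 ./my_cuda_app   # effective limit remains 50%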

Client CUDA Context Level#

By default, configuring the active thread percentage at the client CUDA context level is disabled. Users must explicitly opt in via the environment variable CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING. Refer to CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING for more details.

Setting this environment variable within a client process will configure the active thread percentage when creating a new client CUDA context. The new limit will only further constrain the limits set at the control daemon level and the client process level. If the control daemon or the client process has a lower setting, the lower setting will be obeyed by the client CUDA context instead. All the client CUDA contexts created afterwards will observe the new limit. Existing client CUDA contexts are not affected.

CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING#

By default, users can only partition the available threads uniformly. Non-uniform partitioning requires an explicit opt-in via this environment variable, which must be set before the client process starts.

When non-uniform partitioning capability is enabled in an MPS client’s environment, client CUDA contexts can have different active thread percentages within the same client process via setting CUDA_MPS_ACTIVE_THREAD_PERCENTAGE before context creations. The device attribute cudaDevAttrMultiProcessorCount will reflect the active thread percentage and return the portion of available SMs that can be used by the client CUDA context current to the calling thread.
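
A sketch of the opt-in, where my_cuda_app is a placeholder; once started, the process may change CUDA_MPS_ACTIVE_THREAD_PERCENTAGE (for example via setenv) before creating each additional context:

$ export CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING=1
$ CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=20 ./my_cuda_app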

CUDA_MPS_PINNED_DEVICE_MEM_LIMIT#

The pinned memory limit control limits the amount of GPU memory that a client process can allocate through CUDA APIs. On Volta GPUs, this environment variable sets a limit on the pinned device memory that can be allocated by the client contexts. Setting this environment variable in an MPS client’s environment will set the device’s pinned memory limit when the client process starts. The new limit will only further constrain the limit set by the control daemon (via the set_default_device_pinned_mem_limit or set_device_pinned_mem_limit control daemon commands, or this environment variable in the MPS control daemon’s environment). If the control daemon has a lower value, the control daemon setting will be obeyed by the client process instead. This environment variable has the same semantics as CUDA_VISIBLE_DEVICES: the value string can contain comma-separated device ordinals and/or device UUIDs, each followed by an equals sign and the memory limit for that device. Example usage:

$ export CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=1G,1=512MB"

The following example highlights the hierarchy and usage of the MPS memory limiting functionality.

# Set the default device pinned mem limit to 3G for device 0. The default limit constrains the memory allocation limit of all the MPS clients of future MPS servers to 3G on device 0.

$ nvidia-cuda-mps-control set_default_device_pinned_mem_limit 0 3G

# Start daemon in background process

$ nvidia-cuda-mps-control -d

# Set device pinned mem limit to 2G for device 0 for the server instance of the
# given PID. All the MPS clients on this server will observe this new limit of 2G
# instead of the default limit of 3G when allocating pinned device memory on device 0.
# Note -- users are allowed to specify a server limit (via set_device_pinned_mem_limit)
# greater than the default limit previously set by set_default_device_pinned_mem_limit.

$ nvidia-cuda-mps-control set_device_pinned_mem_limit <pid> 0 2G

# Further constrain the device pinned mem limit for a particular MPS client to 1G for
# device 0. This ensures the maximum amount of memory allocated by this client is capped
# at 1G.
# Note -- setting this environment variable to a value greater than the limit observed by the
# server for its clients (through set_default_device_pinned_mem_limit/set_device_pinned_mem_limit)
# will not raise the limit; it is ineffective, and the eventual limit observed by the client
# will be the one observed by the server.

$ export CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=1G"

CUDA_MPS_CLIENT_PRIORITY#

When set in the MPS control daemon’s environment at launch, this variable controls the initial default client priority for the servers it spawns; when set in a client’s environment at launch, it sets the priority level for that client. The following examples demonstrate both usages.

# Set the default client priority level for new servers and clients to Below Normal

$ export CUDA_MPS_CLIENT_PRIORITY=1

$ nvidia-cuda-mps-control -d

# Set the client priority level for a single program to Normal without changing the priority level for future clients

$ CUDA_MPS_CLIENT_PRIORITY=0 <program>

Note

CUDA priority levels are not guarantees of execution order – they are only a performance hint to the CUDA driver.

MPS Logging Format#

Control Log#

Some example messages logged by the control daemon:

  • Startup and shutdown of MPS servers identified by their process IDs and the user ID with which they are being launched.

    [2013-08-05 12:50:23.347 Control 13894] Starting new server 13929 for user 500

    [2013-08-05 12:50:24.870 Control 13894] NEW SERVER 13929: Ready

    [2013-08-05 13:02:26.226 Control 13894] Server 13929 exited with status 0

  • New MPS client connections identified by the client process ID and the user ID of the user that launched the client process.

    [2013-08-05 13:02:10.866 Control 13894] NEW CLIENT 19276 from user 500: Server already exists

    [2013-08-05 13:02:10.961 Control 13894] Accepting connection...

  • User commands issued to the control daemon and their result.

    [2013-08-05 12:50:23.347 Control 13894] Starting new server 13929 for user 500

    [2013-08-05 12:50:24.870 Control 13894] NEW SERVER 13929: Ready

  • Error information such as failing to establish a connection with a client.

    [2013-08-05 13:02:10.961 Control 13894] Accepting connection...

    [2013-08-05 13:02:10.961 Control 13894] Unable to read new connection type information

Server Log#

Some example messages logged by the MPS server:

  • New MPS client connections and disconnections identified by the client process ID.

    [2013-08-05 13:00:09.269 Server 13929] New client 14781 connected

    [2013-08-05 13:00:09.270 Server 13929] Client 14777 disconnected

  • Error information such as the MPS server failing to start due to system requirements not being met.

    [2013-08-06 10:51:31.706 Server 29489] MPS server failed to start

    [2013-08-06 10:51:31.706 Server 29489] MPS is only supported on 64-bit Linux platforms, with an SM 3.5 or higher GPU.

  • Information about fatal GPU error containment on Volta+ MPS.

    [2022-04-28 15:56:07.410 Other 11570] Volta MPS: status of client {11661, 1} is ACTIVE

    [2022-04-28 15:56:07.468 Other 11570] Volta MPS: status of client {11663, 1} is ACTIVE

    [2022-04-28 15:56:07.518 Other 11570] Volta MPS: status of client {11643, 2} is ACTIVE

    [2022-04-28 15:56:08.906 Other 11570] Volta MPS: Server is handling a fatal GPU error.

    [2022-04-28 15:56:08.906 Other 11570] Volta MPS: status of client {11641, 1} is INACTIVE

    [2022-04-28 15:56:08.906 Other 11570] Volta MPS: status of client {11643, 1} is INACTIVE

    [2022-04-28 15:56:08.906 Other 11570] Volta MPS: status of client {11643, 2} is INACTIVE

    [2022-04-28 15:56:08.906 Other 11570] Volta MPS: The following devices

    [2022-04-28 15:56:08.906 Other 11570] 0

    [2022-04-28 15:56:08.907 Other 11570] 1

    [2022-04-28 15:56:08.907 Other 11570] Volta MPS: The following clients have a sticky error set:

    [2022-04-28 15:56:08.907 Other 11570] 11641

    [2022-04-28 15:56:08.907 Other 11570] 11643

    [2022-04-28 15:56:09.200 Other 11570] Client {11641, 1} exit

    [2022-04-28 15:56:09.244 Other 11570] Client {11643, 1} exit

    [2022-04-28 15:56:09.244 Other 11570] Client {11643, 2} exit

    [2022-04-28 15:56:09.245 Other 11570] Volta MPS: Destroy server context on device 0

    [2022-04-28 15:56:09.269 Other 11570] Volta MPS: Destroy server context on device 1

    [2022-04-28 15:56:10.310 Other 11570] Volta MPS: Creating server context on device 0

    [2022-04-28 15:56:10.397 Other 11570] Volta MPS: Creating server context on device 1

MPS Known Issues#

  • Clients may fail to start, returning ERROR_OUT_OF_MEMORY when the first CUDA context is created, even though there are fewer client contexts than the hard limit of 16.

    Comments: When creating a context, the client tries to reserve virtual address space for the Unified Virtual Addressing memory range. On certain systems, this can clash with the system linker and the dynamic shared libraries loaded by it. Ensure that CUDA initialization (for example, cuInit() or any cuda*() Runtime API function) is one of the first functions called in your code. To provide a hint to the linker and to the Linux kernel that you want your dynamic shared libraries higher up in the VA space (where they won’t clash with CUDA’s UVA range), compile your code as PIC (Position Independent Code) and PIE (Position Independent Executable). Refer to your compiler manual for instructions on how to achieve this.

  • Memory allocation API calls (including context creation) may fail with the following message in the server log: MPS Server failed to create/open SHM segment.

    Comments: This is most likely due to exhausting the file descriptor limit on your system. Check the maximum number of open file descriptors allowed on your system and increase it if necessary. We recommend setting it to 16384 or higher. Typically this information can be checked via the command ulimit -n; refer to your operating system instructions on how to change the limit.
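
    For example, on Linux the limit for the current shell can be checked and raised as follows (making the change persistent typically requires editing your system's limits configuration; consult your distribution's documentation):

    ulimit -n          # show the current open file descriptor limit
    ulimit -n 16384    # raise the soft limit for this shell session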