Appendix: Tools and Interface Reference#
The following utility programs and environment variables are used to manage the MPS execution environment. They are described below, along with other relevant pieces of the standard CUDA programming environment.
Utilities and Daemons#
nvidia-cuda-mps-control#
Typically stored under /usr/bin on Linux and QNX systems and typically run with
superuser privileges, this control daemon is used to manage the nvidia-cuda-mps-server
described in the following section. These are the relevant use cases:
man nvidia-cuda-mps-control # Describes usage of this utility.
nvidia-cuda-mps-control -d # Start daemon in background process.
ps -ef | grep mps # Check if the MPS daemon is running, for Linux.
pidin | grep mps # See if the MPS daemon is running, for QNX.
echo quit | nvidia-cuda-mps-control # Shut the daemon down.
nvidia-cuda-mps-control -f # Start daemon in foreground.
nvidia-cuda-mps-control -v # Print version of control daemon executable (applicable on Tegra platforms only).
nvidia-cuda-mps-control --static-partitioning # Start daemon with static partitioning mode enabled.
nvidia-cuda-mps-control -S # Start daemon with static partitioning mode enabled (short form).
The control daemon creates an nvidia-cuda-mps-control.pid file in the
CUDA_MPS_PIPE_DIRECTORY containing the PID of the control daemon process. When there
are multiple instances of the control daemon running in parallel, one can target a specific
instance by looking up its PID in the corresponding CUDA_MPS_PIPE_DIRECTORY. If
CUDA_MPS_PIPE_DIRECTORY is not set, the nvidia-cuda-mps-control.pid file
will be created at the default pipe directory at /tmp/nvidia-mps.
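The lookup described above can be sketched as a small helper, assuming a POSIX shell and the default pipe directory; `mps_control_pid` is a hypothetical name, not part of the MPS tooling:

```shell
# Sketch: read the control daemon PID from its pid file.
# Honors CUDA_MPS_PIPE_DIRECTORY, falling back to the default /tmp/nvidia-mps.
mps_control_pid() {
  pipe_dir="${CUDA_MPS_PIPE_DIRECTORY:-/tmp/nvidia-mps}"
  pid_file="$pipe_dir/nvidia-cuda-mps-control.pid"
  if [ -f "$pid_file" ]; then
    cat "$pid_file"
  else
    echo "no control daemon pid file in $pipe_dir" >&2
    return 1
  fi
}
```

With multiple control daemon instances, pointing `CUDA_MPS_PIPE_DIRECTORY` at each instance's pipe directory before calling the helper selects that instance's PID.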
When used in interactive mode, the available commands are:
- get_server_list – prints out a list of all PIDs of server instances.
- get_server_status <PID> – prints out the status of the server with the given <PID>.
- start_server -uid <user id> [-mlopart] – manually starts a new instance of nvidia-cuda-mps-server with the given user ID. If mlopart is specified, clients will create MLOPart devices if supported.
- get_client_list <PID> – lists the PIDs of client applications connected to the server instance with the given PID.
- quit – terminates the nvidia-cuda-mps-control daemon.
Commands available to Volta MPS control:
- get_device_client_list [<PID>] – lists the devices and PIDs of client applications that enumerated this device. It optionally takes the server instance PID.
- set_default_active_thread_percentage <percentage> – overrides the default active thread percentage for MPS servers. If a server is already spawned, this command will only affect the next server. The set value is lost if a quit command is executed. The default is 100.
- get_default_active_thread_percentage – queries the current default available thread percentage.
- set_active_thread_percentage <PID> <percentage> – overrides the active thread percentage for the MPS server instance of the given PID. All clients created with that server afterwards will observe the new limit. Existing clients are not affected.
- get_active_thread_percentage <PID> – queries the current available thread percentage of the MPS server instance of the given PID.
- set_default_device_pinned_mem_limit <dev> <value> – sets the default device pinned memory limit for each MPS client. If a server is already spawned, this command will only affect the next server. The set value is lost if a quit command is executed. The dev argument may be a device UUID string or an integer ordinal. The value must be an integer followed by a qualifier, either "G" or "M", specifying the value in gigabytes or megabytes respectively. For example, to set a limit of 10 gigabytes for device 0, use the following command:
  set_default_device_pinned_mem_limit 0 10G
  By default, there is no memory limit set.
Note that for this command, the dev argument is not validated against available devices in the MPS server. Therefore, it is possible to set two memory limits for the same device: one by device UUID and another by ordinal. When an MPS server is started, whichever limit was set last will take effect. A limit set with an invalid device UUID or ordinal will be ignored when starting the MPS server.
- get_default_device_pinned_mem_limit <dev> – queries the current default pinned memory limit for the device. The dev argument may be a device UUID string or an integer ordinal. Note that this command does not translate between device UUIDs and ordinals; it returns the limit that was set for each device identifier via the set_default_device_pinned_mem_limit command.
- set_device_pinned_mem_limit <PID> <dev> <value> – overrides the device pinned memory limit for MPS servers. This sets the device pinned memory limit for each client of the MPS server instance of the given PID for the device dev. All clients created with that server afterwards will observe the new limit. Existing clients are not affected. The dev argument may be a device UUID string or an integer ordinal. For example, to set a limit of 900MB for the server with PID 1024 for device 0, use the following command:
  set_device_pinned_mem_limit 1024 0 900M
- get_device_pinned_mem_limit <PID> <dev> – queries the current device pinned memory limit of the MPS server instance of the given PID for the device dev. The dev argument may be a device UUID string or an integer ordinal.
- terminate_client <server PID> <client PID> – terminates all the outstanding GPU work of the MPS client process <client PID> running on the MPS server denoted by <server PID>. For example, to terminate the outstanding GPU work for an MPS client process with PID 1024 running on an MPS server with PID 123, use the following command:
  terminate_client 123 1024
- ps [-p PID] – reports a snapshot of the current client processes. It optionally takes the server instance PID. It displays the PID, the unique identifier assigned by the server, the partial UUID of the associated device, the PID of the connected server, the namespace PID, and the command line of the client.
- set_default_client_priority [priority] – sets the default client priority that will be used for new clients. The value is not applied to existing clients. Priority values should be considered hints to the CUDA driver, not guarantees. Allowed values are 0 [NORMAL] and 1 [BELOW NORMAL]. The set value is lost if a quit command is executed. The default is 0 [NORMAL].
- get_default_client_priority – queries the current priority value that will be used for new clients.
- device_query [<server PID>] [--csv] – queries the devices that are available to MPS clients. If a server PID is specified, the command will output the device information for that server and ignore other servers. If --csv is specified, the command will output the device information in comma-separated format.
- sm_partition add <device UUID> <number of chunks> – creates an SM partition with the specified number of chunks on the given device. Upon successful creation, the full partition ID is displayed. This command accepts unique partial UUIDs of devices.
- sm_partition rm <device UUID> <partition> – removes the specified SM partition from the given device.
- lspart – displays the current SM partitioning configuration. The output includes the device UUID, partition IDs, free and used chunks, free and used SMs, and whether the partition is in use. The display uses unique partial UUIDs of devices.
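Interactive commands can also be issued one at a time by piping them to the control binary, as shown for quit earlier. A tiny wrapper, sketched below (`mps_cmd` is a hypothetical name, just a convenience around the `echo <command> | nvidia-cuda-mps-control` pattern), makes this less error-prone in scripts:

```shell
# Sketch: send one interactive command to the control daemon and print its reply.
mps_cmd() {
  echo "$*" | nvidia-cuda-mps-control
}

# Example usage (requires a running control daemon):
# mps_cmd get_server_list
# mps_cmd set_active_thread_percentage 13929 50
```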
nvidia-cuda-mps-server#
Typically stored under /usr/bin on Linux and QNX systems, this daemon is run under the
same $UID as the client application running on the node. The nvidia-cuda-mps-server
instances are created on-demand when client applications connect to the
control daemon. The server binary should not be invoked directly; instead, the
control daemon should be used to manage the startup and shutdown of servers.
The nvidia-cuda-mps-server process owns the CUDA context on the GPU and
uses it to execute GPU operations for its client application processes. Due to this, when
querying active processes via nvidia-smi (or any NVML-based application) nvidia-cuda-mps-server
will appear as the active CUDA process rather than any of the client processes.
The version of the nvidia-cuda-mps-server executable can be printed with:
nvidia-cuda-mps-server -v
nvidia-smi#
Typically stored under /usr/bin on Linux systems, this is used to configure GPUs on a
node. The following use cases are relevant to managing MPS:
man nvidia-smi # Describes usage of this utility.
nvidia-smi -L # List the GPUs on the node.
nvidia-smi -q # List GPU state and configuration information.
nvidia-smi -q -d compute # Show the compute mode of each GPU.
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS # Set GPU 0 to exclusive mode, run as root.
nvidia-smi -i 0 -c DEFAULT # Set GPU 0 to default mode, run as root. (SHARED_PROCESS)
nvidia-smi -i 0 -r # Reboot GPU 0 with the new setting.
Environment Variables#
CUDA_VISIBLE_DEVICES#
CUDA_VISIBLE_DEVICES is used to specify which GPUs should be visible to a CUDA
application. Only the devices whose index or UUID is present in the sequence are visible
to CUDA applications, and they are enumerated in the order of the sequence.
When CUDA_VISIBLE_DEVICES is set before launching the control daemon, the
devices will be remapped by the MPS server. This means that if your system has devices
0, 1, and 2, and CUDA_VISIBLE_DEVICES is set to 0,2, then when a client connects
to the server it will see the remapped devices: device 0 and device 1. Therefore,
keeping CUDA_VISIBLE_DEVICES set to 0,2 when launching the client would lead
to an error.
The MPS control daemon will further filter out any pre-Volta devices if any visible device is Volta or newer.
To avoid this ambiguity, we recommend using UUIDs instead of indices. These can be
viewed by launching nvidia-smi -q. When launching the server, or the application,
you can set CUDA_VISIBLE_DEVICES to UUID_1,UUID_2, where UUID_1 and
UUID_2 are the GPU UUIDs. It will also work when you specify the first few characters
of the UUID (including GPU-) rather than the full UUID.
The MPS server will fail to start if incompatible devices are visible after the application
of CUDA_VISIBLE_DEVICES.
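Scripting the UUID-based approach above can be sketched as follows, assuming the common `nvidia-smi -L` output format of `GPU 0: <name> (UUID: GPU-...)`; `gpu_uuids` is a hypothetical helper that reads the listing on stdin and prints one UUID per line:

```shell
# Sketch: extract GPU UUIDs from `nvidia-smi -L`-style output.
# Assumes the usual "GPU <n>: <name> (UUID: GPU-...)" line format.
gpu_uuids() {
  sed -n 's/.*(UUID: \(GPU-[^)]*\)).*/\1/p'
}

# Example: make the first two GPUs (by UUID) visible to the control daemon.
# export CUDA_VISIBLE_DEVICES="$(nvidia-smi -L | gpu_uuids | head -n 2 | paste -sd, -)"
```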
CUDA_MPS_PIPE_DIRECTORY#
The MPS control daemon, the MPS server, and the associated MPS clients communicate
with each other via named pipes and UNIX domain sockets. The default directory for
these pipes and sockets is /tmp/nvidia-mps. The environment variable,
CUDA_MPS_PIPE_DIRECTORY, can be used to override the location of these pipes and
sockets. The value of this environment variable should be consistent across all MPS
clients sharing the same MPS server, and the MPS control daemon.
The recommended location for the directory containing these named pipes and domain
sockets is a local directory such as /tmp. If the specified location exists in a shared, multi-node
filesystem, the path must be unique for each node to prevent multiple MPS servers
or MPS control daemons from using the same pipes and sockets. When provisioning
MPS on a per-user basis, the directory should be set to a location such that
different users will not end up using the same directory.
On Tegra platforms, there is no default directory setting for pipes and sockets. Users must set this environment variable such that only intended users have access to this location.
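One way to provision a per-user pipe directory with owner-only access is sketched below; the `/tmp/nvidia-mps-<uid>` naming is an illustrative assumption, not a convention mandated by MPS:

```shell
# Sketch: per-user pipe directory with owner-only permissions, so
# different users cannot collide on the same pipes and sockets.
# The /tmp/nvidia-mps-<uid> path is illustrative only.
export CUDA_MPS_PIPE_DIRECTORY="/tmp/nvidia-mps-$(id -u)"
mkdir -p "$CUDA_MPS_PIPE_DIRECTORY"
chmod 700 "$CUDA_MPS_PIPE_DIRECTORY"
```

The same value must then be exported in the environment of the control daemon and of every client that should share that daemon's servers.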
CUDA_MPS_LOG_DIRECTORY#
The MPS control daemon maintains a control.log file which contains the status of
its MPS servers, user commands issued and their result, and startup and shutdown
notices for the daemon. The MPS server maintains a server.log file containing its
startup and shutdown information and the status of its clients.
By default these log files are stored in the directory /var/log/nvidia-mps. The
CUDA_MPS_LOG_DIRECTORY environment variable can be used to override the default
value. This environment variable should be set in the MPS control daemon’s
environment and is automatically inherited by any MPS servers launched by that control daemon.
On Tegra platforms, there is no default directory setting for storing the log files. MPS will remain operational without the user setting this environment variable; however, in such instances, MPS logs will not be available. If logs are required to be captured, then the user must set this environment variable such that only intended users have access to this location.
CUDA_DEVICE_MAX_CONNECTIONS#
When encountered in the MPS client’s environment,
CUDA_DEVICE_MAX_CONNECTIONS sets the preferred number of compute and
copy engine concurrent connections (work queues) from the host to the device for that
client. The number actually allocated by the driver may differ from what is requested
based on hardware resource limitations or other considerations. Under MPS, each
server’s clients share one pool of connections, whereas without MPS each CUDA context
would be allocated its own separate connection pool. Each Volta MPS client exclusively
owns the connections set aside for it in the shared pool, so setting this
environment variable under Volta MPS may reduce the number of available clients. The
default value is 2 for Volta MPS clients.
CUDA_MPS_ACTIVE_THREAD_PERCENTAGE#
On Volta GPUs, this environment variable sets the portion of the available threads that can be used by the client contexts. The limit can be configured at different levels.
MPS Control Daemon Level#
Setting this environment variable in the MPS control daemon's environment will configure the default active thread percentage when the MPS control daemon starts.
All the MPS servers spawned by the MPS control daemon will observe this limit. Once the MPS control daemon has started, changing this environment variable cannot affect the MPS servers.
Client Process Level#
Setting this environment variable in an MPS client’s environment will configure the
active thread percentage when the client process starts. The new limit will only further
constrain the limit set by the control daemon (via set_default_active_thread_percentage
or set_active_thread_percentage control daemon commands or this environment
variable at the MPS control daemon level). If the control daemon has a lower setting, the
control daemon setting will be obeyed by the client process instead.
All the client CUDA contexts created within the client process will observe the new limit. Once the client process has started, changing the value of this environment variable cannot affect the client CUDA contexts.
Client CUDA Context Level#
By default, configuring the active thread percentage at the client CUDA context level is
disabled. Users must explicitly opt in via the environment variable
CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING. Refer to
CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING for more details.
Setting this environment variable within a client process will configure the active thread percentage when creating a new client CUDA context. The new limit will only further constrain the limits set at the control daemon level and the client process level. If the control daemon or the client process has a lower setting, the lower setting will be obeyed by the client CUDA context instead. All client CUDA contexts created afterwards will observe the new limit. Existing client CUDA contexts are not affected.
CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING#
By default, users can only partition the available threads uniformly. An explicit opt-in via this environment variable is required to enable non-uniform partitioning; the variable must be set before the client process starts.
When non-uniform partitioning capability is enabled in an MPS client’s environment,
client CUDA contexts can have different active thread percentages within the same
client process via setting CUDA_MPS_ACTIVE_THREAD_PERCENTAGE before
context creations. The device attribute cudaDevAttrMultiProcessorCount will
reflect the active thread percentage and return the portion of available SMs that can be
used by the client CUDA context current to the calling thread.
CUDA_MPS_PINNED_DEVICE_MEM_LIMIT#
The pinned memory limit control caps the amount of GPU memory that a client process
can allocate through CUDA APIs. On Volta GPUs, this environment variable sets a
limit on pinned device memory that can be allocated by the client contexts. Setting this
environment variable in an MPS client’s environment will set the device’s pinned
memory limit when the client process starts. The new limit will only further constrain
the limit set by the control daemon (via set_default_device_pinned_mem_limit
or set_device_pinned_mem_limit control daemon commands or this environment
variable at the MPS control daemon level). If the control daemon has a lower value, the
control daemon setting will be obeyed by the client process instead. This environment
variable has the same semantics as CUDA_VISIBLE_DEVICES, i.e., the value string can
contain comma-separated device ordinals and/or device UUIDs, with each per-device
memory limit separated by an equals sign. Example usage:
$ export CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=1G,1=512MB"
The following example highlights the hierarchy and usage of the MPS memory limiting functionality.
# Set the default device pinned mem limit to 3G for device 0. The default limit constrains the memory allocation limit of all the MPS clients of future MPS servers to 3G on device 0.
$ nvidia-cuda-mps-control set_default_device_pinned_mem_limit 0 3G
# Start daemon in background process
$ nvidia-cuda-mps-control -d
# Set device pinned mem limit to 2G for device 0 for the server instance of the
# given PID. All the MPS clients on this server will observe this new limit of 2G
# instead of the default limit of 3G when allocating pinned device memory on device 0.
# Note -- users are allowed to specify a server limit (via set_device_pinned_mem_limit)
# greater than the default limit previously set by set_default_device_pinned_mem_limit.
$ nvidia-cuda-mps-control set_device_pinned_mem_limit <pid> 0 2G
# Further constrain the device pinned mem limit for a particular MPS client to 1G for
# device 0. This ensures the maximum amount of memory allocated by this client is capped
# at 1G.
# Note - setting this environment variable to a value greater than the value observed by
# the server for its clients (through set_default_device_pinned_mem_limit/
# set_device_pinned_mem_limit) will not raise the limit; it is ineffective, and the
# eventual limit observed by the client will be the one observed by the server.
$ export CUDA_MPS_PINNED_DEVICE_MEM_LIMIT="0=1G"
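To sanity-check a limit string before exporting it, a helper along the following lines can convert one `<dev>=<value>` entry to bytes. This is a sketch: `mem_limit_bytes` is a hypothetical name, and only the G/GB and M/MB qualifiers described in this section are handled:

```shell
# Sketch: convert one "<dev>=<value>" entry of a
# CUDA_MPS_PINNED_DEVICE_MEM_LIMIT-style string to a byte count.
mem_limit_bytes() {
  entry="$1"                # for example "0=1G" or "1=512MB"
  value="${entry#*=}"       # strip the device ordinal/UUID prefix
  num="${value%%[!0-9]*}"   # leading integer part
  unit="${value#"$num"}"    # trailing qualifier
  case "$unit" in
    G|GB) echo $((num * 1024 * 1024 * 1024)) ;;
    M|MB) echo $((num * 1024 * 1024)) ;;
    *) echo "unsupported qualifier: $unit" >&2; return 1 ;;
  esac
}
```

For example, `mem_limit_bytes 1=512MB` prints 536870912, which can be compared against the device's total memory before the limit is applied.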
CUDA_MPS_CLIENT_PRIORITY#
When set in the environment of the MPS control daemon at launch, this variable controls the initial default client priority for the servers it spawns; when set in a client's environment at launch, it controls the priority level for that client. The following examples demonstrate both usages.
# Set the default client priority level for new servers and clients to Below Normal
$ export CUDA_MPS_CLIENT_PRIORITY=1
$ nvidia-cuda-mps-control -d
# Set the client priority level for a single program to Normal without changing the priority level for future clients
$ CUDA_MPS_CLIENT_PRIORITY=0 <program>
Note
CUDA priority levels are not guarantees of execution order – they are only a performance hint to the CUDA driver.
MPS Logging Format#
Control Log#
Some example messages logged by the control daemon:
Startup and shutdown of MPS servers identified by their process IDs and the user ID with which they are being launched.
[2013-08-05 12:50:23.347 Control 13894] Starting new server 13929 for user 500
[2013-08-05 12:50:24.870 Control 13894] NEW SERVER 13929: Ready
[2013-08-05 13:02:26.226 Control 13894] Server 13929 exited with status 0
New MPS client connections identified by the client process ID and the user ID of the user that launched the client process.
[2013-08-05 13:02:10.866 Control 13894] NEW CLIENT 19276 from user 500: Server already exists
[2013-08-05 13:02:10.961 Control 13894] Accepting connection...
User commands issued to the control daemon and their result.
[2013-08-05 12:50:23.347 Control 13894] Starting new server 13929 for user 500
[2013-08-05 12:50:24.870 Control 13894] NEW SERVER 13929: Ready
Error information such as failing to establish a connection with a client.
[2013-08-05 13:02:10.961 Control 13894] Accepting connection...
[2013-08-05 13:02:10.961 Control 13894] Unable to read new connection type information
Server Log#
Some example messages logged by the MPS server:
New MPS client connections and disconnections identified by the client process ID.
[2013-08-05 13:00:09.269 Server 13929] New client 14781 connected
[2013-08-05 13:00:09.270 Server 13929] Client 14777 disconnected
Error information such as the MPS server failing to start due to system requirements not being met.
[2013-08-06 10:51:31.706 Server 29489] MPS server failed to start
[2013-08-06 10:51:31.706 Server 29489] MPS is only supported on 64-bit Linux platforms, with an SM 3.5 or higher GPU.
Information about fatal GPU error containment on Volta+ MPS.
[2022-04-28 15:56:07.410 Other 11570] Volta MPS: status of client {11661, 1} is ACTIVE
[2022-04-28 15:56:07.468 Other 11570] Volta MPS: status of client {11663, 1} is ACTIVE
[2022-04-28 15:56:07.518 Other 11570] Volta MPS: status of client {11643, 2} is ACTIVE
[2022-04-28 15:56:08.906 Other 11570] Volta MPS: Server is handling a fatal GPU error.
[2022-04-28 15:56:08.906 Other 11570] Volta MPS: status of client {11641, 1} is INACTIVE
[2022-04-28 15:56:08.906 Other 11570] Volta MPS: status of client {11643, 1} is INACTIVE
[2022-04-28 15:56:08.906 Other 11570] Volta MPS: status of client {11643, 2} is INACTIVE
[2022-04-28 15:56:08.906 Other 11570] Volta MPS: The following devices
[2022-04-28 15:56:08.906 Other 11570] 0
[2022-04-28 15:56:08.907 Other 11570] 1
[2022-04-28 15:56:08.907 Other 11570] Volta MPS: The following clients have a sticky error set:
[2022-04-28 15:56:08.907 Other 11570] 11641
[2022-04-28 15:56:08.907 Other 11570] 11643
[2022-04-28 15:56:09.200 Other 11570] Client {11641, 1} exit
[2022-04-28 15:56:09.244 Other 11570] Client {11643, 1} exit
[2022-04-28 15:56:09.244 Other 11570] Client {11643, 2} exit
[2022-04-28 15:56:09.245 Other 11570] Volta MPS: Destroy server context on device 0
[2022-04-28 15:56:09.269 Other 11570] Volta MPS: Destroy server context on device 1
[2022-04-28 15:56:10.310 Other 11570] Volta MPS: Creating server context on device 0
[2022-04-28 15:56:10.397 Other 11570] Volta MPS: Creating server context on device 1
MPS Known Issues#
Clients may fail to start, returning ERROR_OUT_OF_MEMORY when the first CUDA context is created, even though there are fewer client contexts than the hard limit of 16.
Comments: When creating a context, the client tries to reserve virtual address space for the Unified Virtual Addressing memory range. On certain systems, this can clash with the system linker and the dynamic shared libraries loaded by it. Ensure that CUDA initialization (for example, cuInit() or any cuda*() Runtime API function) is one of the first functions called in your code. To provide a hint to the linker and to the Linux kernel that you want your dynamic shared libraries higher up in the VA space (where they won't clash with CUDA's UVA range), compile your code as PIC (Position Independent Code) and PIE (Position Independent Executable). Refer to your compiler manual for instructions on how to achieve this.
Memory allocation API calls (including context creation) may fail with the following message in the server log: MPS Server failed to create/open SHM segment.
Comments: This is most likely due to exhausting the file descriptor limit on your system. Check the maximum number of open file descriptors allowed on your system and increase it if necessary. We recommend setting it to 16384 or higher. Typically this information can be checked via the command ulimit -n; refer to your operating system's instructions on how to change the limit.
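The file descriptor check above can be scripted as a quick preflight; this is a sketch, and `check_fd_limit` is a hypothetical helper name:

```shell
# Sketch: warn if the soft open-file-descriptor limit is below a threshold
# (default 16384, the recommendation above). "unlimited" always passes.
check_fd_limit() {
  want="${1:-16384}"
  have="$(ulimit -n)"
  if [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]; then
    echo "fd limit ok: $have"
  else
    echo "fd limit too low: $have < $want (raise it with ulimit -n or limits.conf)" >&2
    return 1
  fi
}
```

Running it before starting the control daemon surfaces the SHM-segment failure mode early instead of at allocation time.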