nvidia-cuda-mps-control

Typically installed under /usr/bin on Linux systems and usually run with superuser privileges, this control daemon manages the nvidia-cuda-mps-server described in the following section. The relevant use cases are:

man nvidia-cuda-mps-control # Describes usage of this utility.

nvidia-cuda-mps-control -d # Start daemon as a background process.

ps -ef | grep mps # See if the MPS daemon is running.

echo quit | nvidia-cuda-mps-control # Shut the daemon down.

nvidia-cuda-mps-control # Start in interactive mode.

When used in interactive mode, the available commands are:

get_server_list - prints a list of the PIDs of all server instances.
start_server -uid <user id> - manually starts a new instance of nvidia-cuda-mps-server with the given user ID.
get_client_list <PID> - lists the PIDs of client applications connected to the server instance with the given PID.
quit - terminates the nvidia-cuda-mps-control daemon.
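
The commands above can also be sent one at a time on standard input, in the same way as the echo quit | nvidia-cuda-mps-control use case. The following is a minimal sketch of that pattern rather than an official tool. The names MPS_CONTROL, mps_cmd and mps_list_clients are hypothetical, and the loop assumes that get_server_list and get_client_list print one PID per line.

```shell
#!/bin/sh
# Sketch only: MPS_CONTROL, mps_cmd and mps_list_clients are hypothetical
# names introduced for this example.
# Path to the control binary; override it to point at a different install.
MPS_CONTROL="${MPS_CONTROL:-nvidia-cuda-mps-control}"

# Send one control command on stdin and print whatever the daemon replies.
mps_cmd() {
    printf '%s\n' "$*" | "$MPS_CONTROL"
}

# Print "<server PID> <client PID>" for every client of every server.
# Assumption: both commands print one PID per line.
mps_list_clients() {
    for server in $(mps_cmd get_server_list); do
        for client in $(mps_cmd get_client_list "$server"); do
            printf '%s %s\n' "$server" "$client"
        done
    done
}
```

After these definitions are loaded into a shell, calling mps_list_clients would print one server and client PID pair per line, provided the daemon is running and the output format matches the assumption above.
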
Commands available to Volta MPS control:
get_device_client_list [<PID>] - this lists the devices and the PIDs of the client applications that enumerated each device. It optionally takes the server instance PID.
set_default_active_thread_percentage <percentage> - this sets the default active thread percentage for MPS servers. If there is already a server spawned, this command will only affect the next server. The set value is lost if a quit command is executed. The default is 100.
get_default_active_thread_percentage - queries the current default available thread percentage.
set_active_thread_percentage <PID> <percentage> - this sets the active thread percentage for the MPS server instance of the given PID. All clients created with that server afterwards will observe the new limit. Existing clients are not affected.
get_active_thread_percentage <PID> - queries the current available thread percentage of the MPS server instance of the given PID.
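
As an illustration of the per-server limit, the sketch below sets the active thread percentage for one server and then reads it back. It is a hedged example, not a documented procedure: the helper names valid_percentage and mps_limit_server are hypothetical, it assumes the commands can be piped on standard input as quit is, and restricting the value to whole numbers from 1 to 100 is a simplification of this sketch rather than a stated rule.

```shell
#!/bin/sh
# Sketch only: valid_percentage and mps_limit_server are hypothetical names.
MPS_CONTROL="${MPS_CONTROL:-nvidia-cuda-mps-control}"

# Succeeds only for whole numbers from 1 to 100. This range check is an
# assumption made for the sketch, not a rule taken from the documentation.
valid_percentage() {
    case "$1" in
        ''|*[!0-9]*|????*) return 1 ;;   # empty, non-digit, or 4+ digits
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 100 ]
}

# Usage: mps_limit_server <server PID> <percentage>
# Sends the set command, then the get command, printing the daemon's replies.
# Only clients created afterwards observe the new limit.
mps_limit_server() {
    valid_percentage "$2" || { echo "invalid percentage: $2" >&2; return 1; }
    echo "set_active_thread_percentage $1 $2" | "$MPS_CONTROL"
    echo "get_active_thread_percentage $1" | "$MPS_CONTROL"
}
```

A call such as mps_limit_server 1234 50 would target the server whose PID is 1234, where 1234 is a placeholder for a PID reported by get_server_list.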

Only one instance of the nvidia-cuda-mps-control daemon should be run per node.
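
One way to respect this limit in a startup script is to check the process list before starting the daemon, building on the ps -ef check shown earlier. The following is a minimal sketch under stated assumptions: the helper names mps_daemon_in_listing and mps_start_once are hypothetical, the daemon is assumed to appear in the listing as nvidia-cuda-mps-control -d, and the check followed by the start is not atomic, so it does not protect against two scripts racing each other.

```shell
#!/bin/sh
# Sketch only: mps_daemon_in_listing and mps_start_once are hypothetical names.
MPS_CONTROL="${MPS_CONTROL:-nvidia-cuda-mps-control}"

# Reads a "ps -ef" style listing on stdin and succeeds if any line shows
# the control daemon started with -d. The [n] bracket keeps this grep
# process from matching its own command line in the listing.
mps_daemon_in_listing() {
    grep -q '[n]vidia-cuda-mps-control -d'
}

# Start the daemon in the background only if none appears to be running.
mps_start_once() {
    if ps -ef | mps_daemon_in_listing; then
        echo "control daemon already running; not starting another" >&2
        return 0
    fi
    "$MPS_CONTROL" -d
}
```

Calling mps_start_once from a node setup script would then start the daemon at most once per invocation sequence, subject to the race noted above.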