IMEX Performance Statistics

The IMEX service is critical to NVLink Multi-Node CUDA jobs, so we recommend monitoring IMEX queue and message-exchange performance statistics. IMEX includes a lightweight, thread-safe framework that periodically logs these statistics. By default, it writes them to the /var/log/nvidia-imex-stats.log file. These logs provide visibility into IMEX performance and enable proactive management and optimization of CUDA jobs.

To write the statistics to a different file, set the STATS_FILE_NAME configuration option.
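
A monitoring script first has to locate the statistics file. The helper below is a minimal sketch of that lookup. It assumes the IMEX configuration file uses one KEY=VALUE pair per line with "#" comments. It also assumes the file lives at /etc/nvidia-imex/config.cfg, so check the path on your installation. The function names are hypothetical and are not part of IMEX.

```python
from pathlib import Path

# Documented default location of the IMEX statistics log.
DEFAULT_STATS_FILE = "/var/log/nvidia-imex-stats.log"

# Assumed location of the IMEX config file; adjust for your installation.
ASSUMED_CONFIG_PATH = "/etc/nvidia-imex/config.cfg"


def stats_file_from_config(config_text: str) -> str:
    """Return the STATS_FILE_NAME value from KEY=VALUE config text.

    Assumes one KEY=VALUE pair per line and '#' comment lines. The first
    non-empty STATS_FILE_NAME wins; otherwise the documented default is used.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and lines without '='
        key, _, value = line.partition("=")
        if key.strip() == "STATS_FILE_NAME" and value.strip():
            return value.strip()
    return DEFAULT_STATS_FILE


def resolve_stats_file(config_path: str = ASSUMED_CONFIG_PATH) -> str:
    """Read the config file if it exists and resolve the stats log path."""
    path = Path(config_path)
    if not path.is_file():
        return DEFAULT_STATS_FILE
    return stats_file_from_config(path.read_text())
```

For example, stats_file_from_config("STATS_FILE_NAME=/data/imex-stats.log") returns "/data/imex-stats.log". If the option is missing or empty, the documented default is returned.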

  • Queue statistics:

    • queueDepth: The number of elements in the queue.

    • oldestDurationMillis: The age of the oldest element in the queue, in milliseconds.

  • Message statistics, reported per direction (incoming and outgoing) and per request type:

    • avgQueueTimeMillis: The average time a message spends queued before processing starts, in milliseconds.

    • avgOperTimeMillis: The average time spent executing the task, in milliseconds.

    • avgTotalTimeMillis: The average total time from queueing to completion, in milliseconds.
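
The following sketch shows one way these metrics relate to per-message timestamps. It is not the IMEX implementation. MessageStats and queue_stats are hypothetical names, and the sketch assumes three timestamps per message: when it was queued, when processing started, and when it finished. Under that assumption, queue time is start minus enqueue, operation time is finish minus start, and total time is their sum. A lock keeps concurrent record() calls safe, and snapshot_and_reset() produces one interval's report, in the spirit of the periodic, thread-safe logging described above.

```python
import threading
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class _Totals:
    count: int = 0
    queue_ms: float = 0.0  # summed (started - enqueued)
    oper_ms: float = 0.0   # summed (finished - started)


class MessageStats:
    """Hypothetical per-direction, per-request-type timing accumulator."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._totals: Dict[Tuple[str, str], _Totals] = defaultdict(_Totals)

    def record(self, direction: str, msg_type: str,
               enqueued: float, started: float, finished: float) -> None:
        """Record one message; timestamps are in seconds (e.g. time.monotonic())."""
        with self._lock:
            t = self._totals[(direction, msg_type)]
            t.count += 1
            t.queue_ms += (started - enqueued) * 1000.0
            t.oper_ms += (finished - started) * 1000.0

    def snapshot_and_reset(self) -> dict:
        """Return averages for the elapsed interval and start a new one."""
        with self._lock:
            totals, self._totals = self._totals, defaultdict(_Totals)
        report: dict = {}
        for (direction, msg_type), t in totals.items():
            avg_queue = t.queue_ms / t.count
            avg_oper = t.oper_ms / t.count
            report.setdefault(direction, {})[msg_type] = {
                "msgCount": t.count,
                "avgQueueTimeMillis": round(avg_queue, 2),
                "avgOperTimeMillis": round(avg_oper, 2),
                # Queueing to completion = time queued + time executing.
                "avgTotalTimeMillis": round(avg_queue + avg_oper, 2),
            }
        return report


def queue_stats(enqueue_times: List[float], now: float) -> dict:
    """queueDepth and oldestDurationMillis for a queue of enqueue timestamps (s)."""
    oldest_ms = (now - min(enqueue_times)) * 1000.0 if enqueue_times else 0.0
    return {"queueDepth": len(enqueue_times),
            "oldestDurationMillis": round(oldest_ms, 2)}
```

Swapping in a fresh dictionary under the lock keeps the critical section short. The averages are then computed outside the lock, so recording threads are blocked only briefly.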

The following snippet from nvidia-imex-stats.log was captured after running a sample application:

Performance statistics for last 30 seconds  : {
 "QueueStats": {
  "ProcessorQueue": {
   "queueDepth": "0",
   "oldestDurationMillis": 0
  },
  "PendingFreeQueue": {
   "queueDepth": "0",
   "oldestDurationMillis": 0
  }
 },
 "IncomingMessages": {
  "UnimportResponse": {
   "msgCount": "7",
   "avgQueueTimeMillis": 0.01,
   "avgOperTimeMillis": 0.06,
   "avgTotalTimeMillis": 0.07
  },
  "UnicastImportRequest": {
   "msgCount": "7",
   "avgQueueTimeMillis": 1.14,
   "avgOperTimeMillis": 0.22,
   "avgTotalTimeMillis": 1.35
  },
  "Heartbeat": {
   "msgCount": "80",
   "avgQueueTimeMillis": 0.01,
   "avgOperTimeMillis": 0.01,
   "avgTotalTimeMillis": 0.01
  },
  "ImportResponse": {
   "msgCount": "7",
   "avgQueueTimeMillis": 0.01,
   "avgOperTimeMillis": 0.02,
   "avgTotalTimeMillis": 0.03
  },
  "UnimportRequest": {
   "msgCount": "7",
   "avgQueueTimeMillis": 0.04,
   "avgOperTimeMillis": 0.02,
   "avgTotalTimeMillis": 0.05
  }
 },
 "OutgoingMessages": {
  "UnicastImportRequest": {
   "msgCount": "7",
   "avgQueueTimeMillis": 1.21,
   "avgOperTimeMillis": 0,
   "avgTotalTimeMillis": 1.21
  },
  "UnimportRequest": {
   "msgCount": "7",
   "avgQueueTimeMillis": 0.14,
   "avgOperTimeMillis": 0,
   "avgTotalTimeMillis": 0.14
  },
  "ImportResponse": {
   "msgCount": "7",
   "avgQueueTimeMillis": 0.19,
   "avgOperTimeMillis": 0,
   "avgTotalTimeMillis": 0.19
  },
  "UnimportResponse": {
   "msgCount": "7",
   "avgQueueTimeMillis": 0.1,
   "avgOperTimeMillis": 0,
   "avgTotalTimeMillis": 0.1
  }
 }
}
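
To turn an entry like this into data a monitoring system can use, you can parse the JSON object that follows the "Performance statistics ... :" prefix. The sketch below assumes each entry is available as a single string and that its JSON object begins at the first "{". How entries are delimited within the file is not covered here. queueDepth and msgCount appear as strings in the sample, so the parser converts them to integers. parse_stats_entry, slow_messages, and the threshold value are hypothetical and should be tuned for your jobs.

```python
import json
from typing import List, Tuple


def parse_stats_entry(entry: str) -> dict:
    """Parse one 'Performance statistics ... : { ... }' entry into a dict.

    Assumes the JSON object starts at the first '{' of the entry. queueDepth
    and msgCount are logged as strings, so they are converted to int here.
    """
    stats = json.loads(entry[entry.index("{"):])
    for queue in stats.get("QueueStats", {}).values():
        queue["queueDepth"] = int(queue["queueDepth"])
    for direction in ("IncomingMessages", "OutgoingMessages"):
        for msg in stats.get(direction, {}).values():
            msg["msgCount"] = int(msg["msgCount"])
    return stats


def slow_messages(stats: dict, threshold_ms: float) -> List[Tuple[str, str, float]]:
    """List (direction, message type, avgTotalTimeMillis) above threshold_ms."""
    flagged = []
    for direction in ("IncomingMessages", "OutgoingMessages"):
        for msg_type, msg in stats.get(direction, {}).items():
            total = float(msg["avgTotalTimeMillis"])
            if total > threshold_ms:
                flagged.append((direction, msg_type, total))
    return flagged


# A trimmed copy of the sample entry above.
SAMPLE = """Performance statistics for last 30 seconds  : {
 "QueueStats": {"ProcessorQueue": {"queueDepth": "0", "oldestDurationMillis": 0}},
 "IncomingMessages": {"UnicastImportRequest": {"msgCount": "7",
   "avgQueueTimeMillis": 1.14, "avgOperTimeMillis": 0.22, "avgTotalTimeMillis": 1.35}},
 "OutgoingMessages": {"UnimportRequest": {"msgCount": "7",
   "avgQueueTimeMillis": 0.14, "avgOperTimeMillis": 0, "avgTotalTimeMillis": 0.14}}
}"""

if __name__ == "__main__":
    print(slow_messages(parse_stats_entry(SAMPLE), threshold_ms=1.0))
    # [('IncomingMessages', 'UnicastImportRequest', 1.35)]
```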

Note: For outgoing messages, use avgTotalTimeMillis, which reflects the time taken to deliver the message. In the sample above, avgOperTimeMillis is 0 for every outgoing message type.
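
Following that guidance, a check on outgoing traffic should read avgTotalTimeMillis and ignore avgOperTimeMillis. The sketch below does this and reports the slowest outgoing message type. It uses the outgoing values from the sample log. The helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def outgoing_delivery_times(stats: dict) -> Dict[str, float]:
    """Map each outgoing message type to avgTotalTimeMillis (delivery time).

    avgOperTimeMillis is deliberately ignored for outgoing messages.
    """
    return {
        msg_type: float(msg["avgTotalTimeMillis"])
        for msg_type, msg in stats.get("OutgoingMessages", {}).items()
    }


def slowest_delivery(stats: dict) -> Optional[Tuple[str, float]]:
    """Return the outgoing message type with the highest delivery time."""
    times = outgoing_delivery_times(stats)
    if not times:
        return None
    msg_type = max(times, key=times.get)
    return msg_type, times[msg_type]


# Outgoing values copied from the sample log (other fields omitted).
SAMPLE_OUTGOING = {"OutgoingMessages": {
    "UnicastImportRequest": {"avgOperTimeMillis": 0, "avgTotalTimeMillis": 1.21},
    "UnimportRequest": {"avgOperTimeMillis": 0, "avgTotalTimeMillis": 0.14},
    "ImportResponse": {"avgOperTimeMillis": 0, "avgTotalTimeMillis": 0.19},
    "UnimportResponse": {"avgOperTimeMillis": 0, "avgTotalTimeMillis": 0.1},
}}

if __name__ == "__main__":
    print(slowest_delivery(SAMPLE_OUTGOING))  # ('UnicastImportRequest', 1.21)
```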

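Note: NVLink P2P memory import and unimport requests expect a response. In the statistics, each request type therefore has a matching response type: UnicastImportRequest pairs with ImportResponse, and UnimportRequest pairs with UnimportResponse.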
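
Because these requests expect a response, you can compare request counts with response counts across directions. Outgoing requests should be matched by incoming responses, and incoming requests by outgoing responses. The sketch below infers the pairing from the message type names in the sample log, and request_response_gaps is a hypothetical helper. A non-zero gap within a single 30-second window does not necessarily indicate a problem, because a response may fall into the next interval. A gap that persists across intervals is more worth investigating.

```python
from typing import Dict, Tuple

# Request types mapped to the response types that answer them, as named in the sample log.
REQUEST_RESPONSE_PAIRS = {
    "UnicastImportRequest": "ImportResponse",
    "UnimportRequest": "UnimportResponse",
}


def request_response_gaps(stats: dict) -> Dict[Tuple[str, str], int]:
    """Requests minus matching responses, per request direction and type.

    Outgoing requests are answered by incoming responses and vice versa.
    Only non-zero gaps are returned.
    """
    gaps = {}
    for req_dir, resp_dir in (("OutgoingMessages", "IncomingMessages"),
                              ("IncomingMessages", "OutgoingMessages")):
        requests = stats.get(req_dir, {})
        responses = stats.get(resp_dir, {})
        for request, response in REQUEST_RESPONSE_PAIRS.items():
            sent = int(requests.get(request, {}).get("msgCount", 0))
            answered = int(responses.get(response, {}).get("msgCount", 0))
            if sent != answered:
                gaps[(req_dir, request)] = sent - answered
    return gaps
```

In the sample log, every request and response type has a msgCount of "7", so this check returns an empty dictionary for that interval.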