General Cluster Information

This is a single-node Kubernetes (k8s) cluster with an NVIDIA A100 GPU that has been split into three Multi-Instance GPU (MIG) instances.

Use the following command to see the current GPU information. The GPU should be an NVIDIA A100 80GB that has been partitioned into three Multi-Instance GPU (MIG) instances.


kubectl run nvidia-smi --rm -t -i --restart=Never --image=nvidia/cuda:11.4.0-base nvidia-smi

Output should look similar to the one below.


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...   On  | 00000000:CA:00.0 Off |                   On |
| N/A   42C    P0    81W / 300W |                  N/A |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    3   0   0  |      6MiB / 19968MiB | 28      0 |  2   0    1    0    0 |
|                  |      0MiB / 32767MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    4   0   1  |      6MiB / 19968MiB | 28      0 |  2   0    1    0    0 |
|                  |      0MiB / 32767MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    5   0   2  |      6MiB / 19968MiB | 28      0 |  2   0    1    0    0 |
|                  |      0MiB / 32767MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                       Usage  |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
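Optionally, you can confirm that the MIG instances are also exposed to Kubernetes as schedulable resources by inspecting the node's allocatable resources. A minimal sketch is shown below; the exact resource name (for example nvidia.com/mig-3g.20gb) depends on how the device plugin is configured, so treat the name in the output as an assumption.

kubectl describe node | grep -i nvidia.com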

You will see a running Spark client pod called sparkrunner-0. Bash into the pod, as shown in the example below.
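For example, assuming the pod runs in your current namespace and has bash available, you can open a shell in it with kubectl exec:

kubectl exec -it sparkrunner-0 -- bash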

Get the IP of the client pod; it will be used later.


kubectl describe pod sparkrunner-0 | grep IP

Output should look similar to the one below.


cni.projectcalico.org/podIP: 192.168.34.30/32
cni.projectcalico.org/podIPs: 192.168.34.30/32
IP:           192.168.34.30
IPs:
  IP:         192.168.34.30
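As an alternative to grepping the describe output, the pod IP can also be read directly with a JSONPath query, which returns only the IP value:

kubectl get pod sparkrunner-0 -o jsonpath='{.status.podIP}'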
