Overview

The Microbenchmarks Lab is derived from the spark-rapids-example repo. The microbenchmarks for the RAPIDS Accelerator for Apache Spark are meant to identify, test, and analyze the queries that benefit most from GPU acceleration.

The queries run against several tables in Parquet format derived from the TPC-DS benchmark, so that others can reproduce similar speedups. The microbenchmarks cover commonly used Spark SQL operations such as expand, hash aggregate, windowing, and cross join, and run the same queries in CPU mode and GPU mode. Some queries may run faster the second time, which can be caused by JVM JIT compilation, initialization overhead, or input data being cached in the OS page cache. The observed speedup depends on many factors, including the dataset’s scale factor and the GPU model.
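Because the first run of a query can include JIT and initialization overhead, it can help to time several consecutive runs and read the first one separately from the rest. The following is a minimal sketch, not part of the lab itself: `time_runs` is a hypothetical helper name, and it only measures whole seconds, which is usually enough for queries that take seconds or longer.

```shell
# time_runs N CMD [ARGS...]
# Hypothetical helper (not part of the lab): run CMD N times and print the
# elapsed wall-clock time of each run, so a slower first run (warm-up)
# can be told apart from later runs.
# The function body is a subshell, so its variables do not leak to the caller.
time_runs() (
  runs="$1"
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s)          # whole seconds since the epoch
    "$@" > /dev/null 2>&1      # output of the timed command is discarded
    end=$(date +%s)
    echo "run $i: $((end - start))s"
    i=$((i + 1))
  done
)
```

For example, `time_runs 3 ./run_query.sh` would print three timings, where `run_query.sh` is a placeholder for whatever command launches one query. Comparing run 1 with runs 2 and 3 gives a rough idea of how much of the first timing is warm-up.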

  1. Connect to the System Console using the left-hand navigation menu link.

  2. Find the current IP address of the Spark-RAPIDS pod.

    kubectl describe pod sparkrunner-0 | grep IP


  3. Connect to the sparkrunner pod.

    kubectl exec --stdin --tty sparkrunner-0 -- /bin/bash


  4. Update the /home/spark/lp-runjupyter-etl-gpu.sh and /home/spark/lp-runjupyter-etl-cpu.sh scripts, replacing the IP in the following line with the pod IP found in step 2.

    SPARK_DRIVER_HOST=192.168.<IP>.<IP>


  5. Copy and paste is available on the Desktop VNC connection. A sidebar appears on the left of the screen; open it to paste text into the clipboard. Anything pasted there is immediately available to paste within the VNC desktop.

[Figure: spark-rapids-overview-02.png]
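Steps 2 and 4 can also be scripted. The sketch below is one possible approach, not something the lab provides: `pod_ip` and `set_driver_host` are hypothetical helper names. It assumes `kubectl` is configured on the System Console, that the target scripts contain a line starting with `SPARK_DRIVER_HOST=` (optionally preceded by `export`), and that `sed` is available in the pod.

```shell
# Hypothetical helpers (not part of the lab).

# Print the pod IP as reported by the Kubernetes API.
# Run this where kubectl is configured, for example the System Console.
pod_ip() {
  kubectl get pod "${1:-sparkrunner-0}" -o jsonpath='{.status.podIP}'
}

# set_driver_host IP FILE [FILE...]
# Rewrite the SPARK_DRIVER_HOST=... line in each FILE so it uses IP.
# A .bak copy of every edited file is left next to it.
# The function body is a subshell, so its variables do not leak to the caller.
set_driver_host() (
  ip="$1"
  shift
  for script in "$@"; do
    sed -i.bak -E \
      "s|^([[:space:]]*(export[[:space:]]+)?SPARK_DRIVER_HOST=).*|\1${ip}|" \
      "$script"
  done
)
```

A possible workflow: run `pod_ip` on the System Console and note the address it prints, then, inside the sparkrunner pod, run `set_driver_host <that address> /home/spark/lp-runjupyter-etl-gpu.sh /home/spark/lp-runjupyter-etl-cpu.sh`. Checking the result with `grep SPARK_DRIVER_HOST` on both files before launching is a cheap way to confirm the edit.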

© Copyright 2022-2023, NVIDIA. Last updated on Jan 10, 2023.