Running Mortgage Benchmark

Accelerating Apache Spark with Zero Code Changes (Latest Version)
  1. Open the System Console using the link in the left-hand menu.

  2. Connect to the sparkrunner pod.


    kubectl exec --stdin --tty sparkrunner-0 -- /bin/bash
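If the sparkrunner pod is still starting, `kubectl exec` fails. The sketch below waits until the pod reports the Ready condition before you connect. It assumes the pod name `sparkrunner-0` from step 2. The helper name `wait_for_sparkrunner` and the 120-second default timeout are illustrative choices, not part of the lab scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the lab scripts): block until the
# sparkrunner pod reports the Ready condition, so the kubectl exec in step 2
# does not race a pod that is still starting.
# $1 optionally overrides the timeout (default 120s, an arbitrary choice).
wait_for_sparkrunner() {
  local timeout="${1:-120s}"
  kubectl wait --for=condition=Ready pod/sparkrunner-0 --timeout="$timeout"
}
```

Usage: `wait_for_sparkrunner && kubectl exec --stdin --tty sparkrunner-0 -- /bin/bash`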

  3. cd to /home/spark/spark-scripts and run either the /home/spark/spark-scripts/ or the /home/spark/spark-scripts/ script in the System Console.

  4. In the left-hand menu, open the Desktop link and click the VNC connect button.


  5. Open the web browser in the Linux desktop.


  6. Browse to


  7. You should see the Jupyter file list, which includes lp-mortgageETL.ipynb.

  8. Click the lp-mortgageETL.ipynb link to open the notebook in Jupyter.



    Please “trust” the notebook before you run it.


  9. Verify that the Mortgage Benchmark pods have been created.

    • Open another System Console.


    kubectl get pods | grep app-name

    • The output should look similar to this.


    app-name-79d837808b2d2ba5-exec-1   1/1   Running   0   31m
    app-name-79d837808b2d2ba5-exec-2   1/1   Running   0   31m
    app-name-79d837808b2d2ba5-exec-3   1/1   Running   0   31m
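Rather than reading the pod list by eye, you can count the executors that are in the Running state. This is a minimal sketch. It assumes the default `kubectl get pods` columns (NAME READY STATUS RESTARTS AGE) and executor names of the form `<prefix>-<id>-exec-<n>`, as in the sample output above. The helper name `count_running_executors` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count executor pods of one Spark application that are
# in the Running state. Reads `kubectl get pods` output on stdin.
# Assumes column 1 is NAME and column 3 is STATUS, and that executor pods are
# named <prefix>-<id>-exec-<n>. $1 is the name prefix (default: app-name).
count_running_executors() {
  local prefix="${1:-app-name}"
  awk -v p="$prefix" '
    # Name starts with "<prefix>-", ends in "-exec-<n>", and status is Running.
    index($1, p "-") == 1 && $1 ~ /-exec-[0-9]+$/ && $3 == "Running" { n++ }
    END { print n + 0 }
  '
}
```

Usage: `kubectl get pods | count_running_executors app-name`. For output like the sample above, this prints 3. The header line and pods in any other state are not counted.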

  10. In the System Console session from the previous step, change to the spark-rapids-claim volume path and create the input and output directories for the mortgage dataset.


    cd `mount | awk -F ':' '/spark-rapids-claim/ {print $2}' | grep var | awk '{print $1}'`
    mkdir -p mortgage/input
    mkdir -p mortgage/output
    chmod 777 mortgage/output
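If the mount lookup in step 10 returns nothing, the unquoted command substitution leaves a bare `cd`, which changes to `$HOME`, and the directories are then created in the wrong place. The sketch below refuses an empty or missing base path instead. The helper name `prepare_mortgage_dirs` is hypothetical. The directory layout and permissions are the ones used in step 10.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create mortgage/input and mortgage/output under an
# existing base directory (the spark-rapids-claim path from step 10) and make
# the output directory world-writable, as the step 10 commands do.
# Fails instead of silently falling back to another directory.
prepare_mortgage_dirs() {
  local base="$1"
  if [ -z "$base" ] || [ ! -d "$base" ]; then
    echo "usage: prepare_mortgage_dirs <existing-directory>" >&2
    return 1
  fi
  mkdir -p "$base/mortgage/input" "$base/mortgage/output" || return 1
  chmod 777 "$base/mortgage/output"
}
```

Usage, with the same mount lookup as step 10 but quoted so that an empty result is rejected:
`prepare_mortgage_dirs "$(mount | awk -F ':' '/spark-rapids-claim/ {print $2}' | grep var | awk '{print $1}')"`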

  11. From the LaunchPad Desktop, download the input dataset from the Fannie Mae website.

    • Go to the Single-Family Loan Performance Data page.

      • Log in, or register as a new user.

    • Select HP.

      • Click Download Data and choose Single-Family Loan Performance Data. You will see a table of Acquisition and Performance files sorted by year and quarter. Click a file to download it. For example:

      • Unzip the downloaded file to extract the CSV file, for example 2017Q1.csv.

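      • Copy the CSV files to the mortgage/input directory on the GPU node. In the command below, <gpu-node> and <spark-rapids-claim-path> are placeholders: replace them with the GPU node's address and the spark-rapids-claim path used in step 10.


      scp 2017Q1.csv nvidia@<gpu-node>:<spark-rapids-claim-path>/mortgage/input/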
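After copying, you can check on the GPU node that the quarterly files arrived in mortgage/input. This is a minimal sketch. It assumes the files follow the YYYYQn.csv naming of the 2017Q1.csv example, and the helper name `list_quarter_csvs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of quarterly CSV files in a directory.
# Assumes files are named YYYYQn.csv (e.g. 2017Q1.csv).
# Returns non-zero when no matching file is found.
list_quarter_csvs() {
  local dir="$1" f found=0
  for f in "$dir"/[12][0-9][0-9][0-9]Q[1-4].csv; do
    [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
    basename "$f"
    found=$((found + 1))
  done
  [ "$found" -gt 0 ]
}
```

Usage: `list_quarter_csvs <spark-rapids-claim-path>/mortgage/input`, where the path is the same placeholder as in the scp command. Note that scp also accepts several source files in one call, so all downloaded CSVs can be copied at once.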

  12. Run the notebook by clicking Cell -> Run All.


  13. Note the benchmark run time so you can compare it with the CPU-only run.


  14. Stop the notebook server by pressing Ctrl-C in the System Console window where you started it. Answer Y when asked “Shutdown this notebook server?”.

  15. Run the same Mortgage Benchmark using only CPUs.

    • Execute the script.

  16. Compare the run times and outputs of the GPU and CPU runs.
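One common way to summarize the comparison is the speedup ratio: CPU run time divided by GPU run time. This is a minimal sketch. It assumes both times are given in the same unit (for example, seconds), and the helper name `speedup` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: GPU speedup = CPU run time / GPU run time.
# Both arguments must use the same unit (for example, seconds).
# Prints the ratio with two decimals; fails if the GPU time is not positive.
speedup() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN {
    if (gpu <= 0) { print "GPU run time must be greater than 0" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", cpu / gpu
  }'
}
```

For example, with arbitrary illustrative inputs, `speedup 600 120` prints `5.00`. Substitute the run times you noted for the two runs.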


Before starting another session, close the notebook tab and then shut down the notebook server. Otherwise, you will not be able to start another Spark session.

© Copyright 2022-2023, NVIDIA. Last updated on Jun 23, 2023.