Running Mortgage Benchmark

Connect to System Console using the left-hand menu link.

Connect to the sparkrunner pod.

    kubectl exec --stdin --tty sparkrunner-0 -- /bin/bash

Change to /home/spark/spark-scripts and run either lp-runjupyter-etl-gpu.sh or lp-runjupyter-etl-cpu.sh in the System Console. Start with the GPU script; the CPU script is used for the comparison run later.
In the left-hand menu, open the Desktop link and click the VNC connect button.
Open the web browser in the Linux desktop.
Browse to 172.16.0.10:30002.
You should see the Jupyter file listing that includes lp-mortgageETL.ipynb.
Click the lp-mortgageETL.ipynb link to open the Jupyter notebook.

Note

Please “trust” the notebook before you run it.

Open another System Console and validate the creation of the Mortgage Benchmark pods with the following command.

    kubectl get pods | grep app-name

The output should look similar to this.

    app-name-79d837808b2d2ba5-exec-1   1/1     Running   0          31m
    app-name-79d837808b2d2ba5-exec-2   1/1     Running   0          31m
    app-name-79d837808b2d2ba5-exec-3   1/1     Running   0          31m
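To check the pod list without reading it by eye, the output can be piped through a small filter. The following is a minimal sketch, not part of the LaunchPad scripts: `count_running` is a hypothetical helper name, and it assumes the default `kubectl get pods` column order (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper: count pods whose name starts with a prefix and whose
# STATUS column (third field) is "Running". Reads pod listing text on stdin.
count_running() {
    prefix="$1"
    awk -v p="$prefix" 'index($1, p) == 1 && $3 == "Running" { n++ } END { print n + 0 }'
}

# Demonstration using the sample output shown above.
count_running app-name <<'EOF'
app-name-79d837808b2d2ba5-exec-1   1/1     Running   0          31m
app-name-79d837808b2d2ba5-exec-2   1/1     Running   0          31m
app-name-79d837808b2d2ba5-exec-3   1/1     Running   0          31m
EOF
# prints 3

# On the cluster this would be used as:
#   kubectl get pods | count_running app-name
```

A count of 0 would suggest that the executors have not started yet or that the prefix does not match your application name.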

Create two directories for the Mortgage Dataset in the console session from the previous step.

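    # Change to the source directory of the spark-rapids-claim volume, read from the mount table.
    cd "$(mount | awk -F ':' '/spark-rapids-claim/ {print $2}' | grep var | awk '{print $1}')"
    # Create the input and output directories; the output directory is made world-writable.
    mkdir -p mortgage/input
    mkdir -p mortgage/output
    chmod 777 mortgage/output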
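The `mount | awk | grep | awk` pipeline in the `cd` command is easier to follow when run against a single line of input. The sketch below wraps the same pipeline in a function and feeds it one mount entry. The function name `claim_path` and the sample mount line are assumptions made for illustration; the actual entry on your system will differ.

```shell
#!/bin/sh
# Same pipeline as in the cd command, reading mount-table text on stdin:
#   1. split each line on ':' and keep field 2 of lines mentioning the claim
#   2. keep only lines that contain "var"
#   3. print the first whitespace-separated word, i.e. the source path
claim_path() {
    awk -F ':' '/spark-rapids-claim/ { print $2 }' | grep var | awk '{ print $1 }'
}

# Hypothetical mount entry, for illustration only.
sample='172.16.0.10:/data/default-spark-rapids-claim-pvc-0000 on /var/lib/kubelet/pods/0000/volumes/pvc-0000 type nfs4 (rw,relatime)'
printf '%s\n' "$sample" | claim_path
# prints /data/default-spark-rapids-claim-pvc-0000
```

Running `mount | claim_path` on its own before the `cd` is a quick way to confirm that exactly one path is returned.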

From the LaunchPad Desktop, download the input dataset from the Fannie Mae website.
- Go to the Single-Family Loan Performance Data page.
  - Log in or register as a new user.
- Select HP.
  - Click Download Data and choose Single-Family Loan Performance Data. You will find a tabular list of Acquisition and Performance files sorted by year and quarter. Click a file to download it, for example 2017Q1.zip.
  - Unzip the downloaded file to extract the CSV file, for example 2017Q1.csv.
  - Copy the CSV files to the GPU node.

    scp 2017Q1.csv nvidia@172.16.0.10:/data/${your-default-spark-rapids-claim-path}/mortgage/input/
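If you download several quarters, one copy command is needed per CSV file. The sketch below only prints the commands so they can be reviewed before running; it copies nothing. `print_scp_commands` is a hypothetical helper, and its second argument stands in for the `${your-default-spark-rapids-claim-path}` placeholder, whose value is specific to your environment.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one scp command per CSV file.
#   $1 = directory holding the extracted CSV files
#   $2 = your spark-rapids-claim path (site-specific, supplied by you)
print_scp_commands() {
    src_dir="$1"
    claim="$2"
    for f in "$src_dir"/*.csv; do
        [ -e "$f" ] || continue   # nothing to do when no CSV files match
        printf 'scp %s nvidia@172.16.0.10:/data/%s/mortgage/input/\n' "$f" "$claim"
    done
}

# Example call (directory and claim path are placeholders):
#   print_scp_commands "$HOME/Downloads" "<your claim path>"
```

The printed lines are not quoted, so file names containing spaces would need manual quoting before the commands are run.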
Run the notebook by clicking Cell -> Run All.
Note the timing for the benchmark so you can compare to your CPU run time.
Stop the notebook server you started earlier by pressing Ctrl-C in the System Console window where you launched it. Answer Y when asked “Shutdown this notebook server?”.
Run the same Mortgage Benchmark using only CPUs.
- Execute the lp-runjupyter-etl-cpu.sh script.
Compare the differences between the two outputs.
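One simple way to summarize the comparison is the ratio of CPU run time to GPU run time. The sketch below is illustrative only: `speedup` is a hypothetical helper, and the timings in the example are made-up numbers, not benchmark results.

```shell
#!/bin/sh
# Hypothetical helper: print CPU time divided by GPU time, to two decimals.
#   $1 = CPU run time in seconds, $2 = GPU run time in seconds
# Returns a non-zero status if the GPU time is not positive.
speedup() {
    awk -v cpu="$1" -v gpu="$2" 'BEGIN { if (gpu <= 0) exit 1; printf "%.2f\n", cpu / gpu }'
}

# Made-up timings, for illustration only.
speedup 300 60
# prints 5.00
```

Substitute the timings you noted from your own GPU and CPU runs.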

Note

You must close the notebook tab and then shut down the notebook server before starting another session. Otherwise, you will not be able to start another Spark session.