Running Mortgage Benchmark
Connect to System Console using the left-hand menu link.
Connect to the sparkrunner pod.
kubectl exec --stdin --tty sparkrunner-0 -- /bin/bash
Change to /home/spark/spark-scripts and execute one of the following scripts in the System Console.
/home/spark/spark-scripts/lp-runjupyter-etl-gpu.sh
or
/home/spark/spark-scripts/lp-runjupyter-etl-cpu.sh
In the left menu, open the Desktop link and click the VNC connect button.
Open the web browser in the Linux desktop.
Browse to 172.16.0.10:30002.
You should see a file list that includes the notebook.
Click the lp-mortgageETL.ipynb link; this should start the Jupyter notebook.
Note: Please “trust” the notebook before you run it.
Open another System Console and validate the creation of the Mortgage Benchmark pods with the following command.
kubectl get pods | grep app-name
The output should look similar to this.
app-name-79d837808b2d2ba5-exec-1 1/1 Running 0 31m
app-name-79d837808b2d2ba5-exec-2 1/1 Running 0 31m
app-name-79d837808b2d2ba5-exec-3 1/1 Running 0 31m
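The check above can also be done by counting rather than by eye. The following is a minimal sketch, not part of the original lab: the helper name count_running_execs is hypothetical, and it assumes the default kubectl column layout (NAME READY STATUS RESTARTS AGE) and the app-name-...-exec-N naming shown in the sample output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count executor pods whose STATUS column is "Running".
# Reads `kubectl get pods` output from stdin, so it can be tried on saved output.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
count_running_execs() {
  awk '/app-name/ && /-exec-/ && $3 == "Running" { n++ } END { print n + 0 }'
}

# Typical use on the System Console (requires cluster access):
#   kubectl get pods | count_running_execs
```

If the number printed is lower than the number of executors you expect, some pods may still be starting; rerun the command after a short wait.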
Create two directories for the Mortgage Dataset in the console session from the previous step.
cd `mount | awk -F ':' '/spark-rapids-claim/ {print $2}'|grep var | awk '{print $1}'`
mkdir -p mortgage/input
mkdir -p mortgage/output
chmod 777 mortgage/output
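The path lookup inside the cd command is dense, so here is a hedged sketch of what each stage of that pipeline does. The function name claim_path is hypothetical, as is the sample mount line in the comment; your actual mount output will differ. The final head -n 1 is an addition that is not in the original command; it guards against more than one matching mount, which would otherwise hand cd several arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the spark-rapids-claim path from `mount` output on stdin.
# Stage 1: on lines mentioning spark-rapids-claim, keep the text after the first colon.
# Stage 2: keep only results that contain "var".
# Stage 3: the first whitespace-separated field is the path itself.
# Stage 4 (added here, an assumption): keep only the first match.
claim_path() {
  awk -F ':' '/spark-rapids-claim/ { print $2 }' \
    | grep var \
    | awk '{ print $1 }' \
    | head -n 1
}

# Example with a made-up mount line:
#   echo 'host:/var/nfs/default-spark-rapids-claim-pvc-1234 on /mnt/x type nfs4 (rw)' | claim_path
# Typical use on the node:
#   cd "$(mount | claim_path)"
```

Printing the path first and inspecting it before running cd and mkdir makes it easier to notice when no mount matched and the result is empty.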
From the LaunchPad Desktop, download the input dataset from the Fannie Mae website.
Go to Single-Family Loan Performance Data page.
Login or Register as a new user.
Select HP.
Click on Download Data and choose Single-Family Loan Performance Data. You will find a tabular list of Acquisition and Performance files sorted by year and quarter. Click on a file to download it, for example 2017Q1.zip.
Unzip the downloaded file to extract the CSV file, for example 2017Q1.csv.
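Copy the CSV files to the GPU node, replacing the ${your-default-spark-rapids-claim-path} placeholder with your own claim directory name.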
scp 2017Q1.csv nvidia@172.16.0.10:/data/${your-default-spark-rapids-claim-path}/mortgage/input/
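When several quarters are downloaded, it can help to review the copy commands before running them. The sketch below is an illustration only: print_scp_commands is a hypothetical helper that prints, but does not execute, one scp command per file, and its first argument stands in for the claim directory placeholder. Note that if the placeholder is pasted into a shell unchanged, ${your-default-spark-rapids-claim-path} is parsed as a default-value expansion of a variable named your, so substitute the real directory name first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) one scp command per CSV file.
# $1 is the claim directory name (stands in for the placeholder in the step above);
# the remaining arguments are the CSV files to copy.
print_scp_commands() {
  claim_dir="$1"
  shift
  for csv in "$@"; do
    printf 'scp %s nvidia@172.16.0.10:/data/%s/mortgage/input/\n' "$csv" "$claim_dir"
  done
}

# Example with made-up names:
#   print_scp_commands my-claim-dir 2017Q1.csv 2017Q2.csv
```

Once the printed destinations look right, the commands can be run one by one.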
Run the notebook by clicking Cell -> Run All.
Note the timing for the benchmark so you can compare to your CPU run time.
Stop the notebook you started in step 1 by pressing Ctrl-C in the System Console window in which you started the notebook. Answer Y when asked if you want to “Shutdown this notebook server?”.
Run the same Mortgage Benchmark using only CPUs.
Execute the lp-runjupyter-etl-cpu.sh script.
Compare the differences between the two outputs.
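One simple way to compare the two runs is to express the noted timings as a speedup ratio. This is a minimal sketch under the assumption that both timings were recorded in seconds; the function name speedup and the numbers in the example are hypothetical and are not measurements from this lab.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CPU time divided by GPU time, to two decimal places.
# $1 = CPU run time in seconds, $2 = GPU run time in seconds.
speedup() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN {
    if (gpu <= 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.2f\n", cpu / gpu
  }'
}

# Example with made-up timings (CPU 120 s, GPU 30 s):
#   speedup 120 30
```

A value above 1 means the GPU run finished faster than the CPU run.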
You must close the notebook tab and then shut down the notebook server before starting another session. Otherwise, you will not be able to start another Spark session.