In the previous step you should have updated the lp-runjupyter-etl-gpu.sh and lp-runjupyter-etl-cpu.sh
files with the correct IP address for your LaunchPad instance. If you have not done this, refer to Setting up the environment.
First, cd to /home/spark and execute
/home/spark/lp-runjupyter-etl-gpu.sh
or
/home/spark/lp-runjupyter-etl-cpu.sh
in the System Console.
In the left menu, open the Desktop link and click the VNC connect button.
Open the web browser in the Linux desktop.
Browse to 172.16.0.10:30002.
You should see the Jupyter file list, which includes lp-mortgageETL.ipynb.
Click the
lp-mortgageETL.ipynb
link and this should start the Jupyter notebook.
Open another System Console and validate the creation of the Mortgage Benchmark pods with the following command.
kubectl get pods | grep app-name
The output should look similar to this.
app-name-79d837808b2d2ba5-exec-1 1/1 Running 0 31m
app-name-79d837808b2d2ba5-exec-2 1/1 Running 0 31m
app-name-79d837808b2d2ba5-exec-3 1/1 Running 0 31m
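To check the pod status without reading the list by eye, a small filter along the following lines may help. It is only a sketch: the function name not_running_pods is hypothetical, and it assumes the default kubectl get pods column layout (NAME, READY, STATUS, RESTARTS, AGE) and the app-name prefix shown in the output above.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# the names of app-name pods whose STATUS column (field 3) is not Running.
not_running_pods() {
  awk '$1 ~ /^app-name/ && $3 != "Running" { print $1 }'
}

# Usage on the LaunchPad instance (prints nothing when all executors are up):
# kubectl get pods --no-headers | not_running_pods
```

An empty result suggests that every executor pod has reached the Running state; any name that is printed is worth inspecting before you run the notebook.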
Create two directories for the Mortgage Dataset in the console session from the previous step.
mkdir -p /data/mortgage/input
mkdir -p /data/mortgage/output
chmod 777 /data/mortgage/output
From the LaunchPad Desktop, download the input dataset from the Fannie Mae website.
Go to the Single-Family Loan Performance Data page.
Log in or register as a new user.
Select HP.
Click on Download Data and choose Single-Family Loan Performance Data. You will find a tabular list of Acquisition and Performance files sorted by year and quarter. Click a file to download it, for example 2017Q1.zip.
Unzip the downloaded file to extract the CSV file, for example 2017Q1.csv.
Copy the csv files to the GPU node.
scp 2017Q1.csv nvidia@172.16.0.10:/data/mortgage/input
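Before copying, it can be worth confirming that the extracted files are really non-empty CSV files, since an incomplete download would only show up later as a notebook failure. The following is a minimal sketch; the function name csv_files_ready is hypothetical and is not part of the LaunchPad scripts.

```shell
# Hypothetical helper: succeeds only if every argument names a non-empty .csv file.
csv_files_ready() {
  for f in "$@"; do
    case "$f" in
      *.csv) ;;
      *) echo "not a csv file: $f" >&2; return 1 ;;
    esac
    [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
  done
}

# Usage, with the host and path from the scp command above:
# csv_files_ready 2017Q1.csv && scp 2017Q1.csv nvidia@172.16.0.10:/data/mortgage/input
```

Chaining the check with && means scp runs only when the check passes.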
Run the notebook by clicking Cell -> Run All.
Note the timing for the benchmark so you can compare to your CPU run time.
Stop the notebook you started in step 1 by pressing Ctrl-C in the System Console window in which you started it. Answer Y when asked “Shutdown this notebook server?”.
Run the same Mortgage Benchmark using only CPUs.
Execute the
lp-runjupyter-etl-cpu.sh
script.
Compare the differences between the two outputs.
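If you noted the elapsed time of each run, one simple way to express the comparison is the ratio of CPU time to GPU time. The sketch below assumes both timings are in seconds; the function name speedup is hypothetical, and the numbers in the usage line are placeholders, not measured results.

```shell
# Hypothetical helper: prints CPU time divided by GPU time, to two decimals.
# Both arguments are elapsed times in seconds and must be greater than zero.
speedup() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN {
    if (cpu <= 0 || gpu <= 0) { print "times must be positive" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", cpu / gpu
  }'
}

# Usage with placeholder timings (substitute the values you recorded):
# speedup 600 120
```

A value above 1 means the GPU run finished faster than the CPU run.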