Running Mortgage Benchmark

In the previous step, you should have updated the lp-runjupyter-etl-gpu.sh and lp-runjupyter-etl-cpu.sh files with the correct IP address for your LaunchPad instance. If you have not done this, refer to Setting up the environment.

  1. In the System Console, cd to /home/spark and run either /home/spark/lp-runjupyter-etl-gpu.sh (for the GPU run) or /home/spark/lp-runjupyter-etl-cpu.sh (for the CPU run).

  2. In the left menu, open the Desktop link and click the VNC connect button.

    spark-rapids-lab1-01.png

  3. Open the web browser in the Linux desktop.

    spark-rapids-lab1-02.png

  4. Browse to 172.16.0.10:30002.

    spark-rapids-lab1-03.png

  5. You should see a file list like the one shown above.

  6. Click the lp-mortgageETL.ipynb link to open the Jupyter notebook.

    spark-rapids-lab2-01.png

  7. Validate the creation of the Mortgage Benchmark pods with the following command.


    kubectl get pods | grep app-name

    • The output should look similar to this.


    app-name-79d837808b2d2ba5-exec-1   1/1   Running   0   31m
    app-name-79d837808b2d2ba5-exec-2   1/1   Running   0   31m
    app-name-79d837808b2d2ba5-exec-3   1/1   Running   0   31m
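
    The check in this step can also be scripted. The following is a minimal sketch, not part of the lab scripts: count_running_pods is a hypothetical helper that reads `kubectl get pods` output on stdin and counts the pods whose name starts with a given prefix and whose STATUS column is Running. It assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE).

    ```shell
    # Hypothetical helper: count pods with the given name prefix in Running state.
    # Reads `kubectl get pods` output from stdin; assumes the default column
    # layout NAME READY STATUS RESTARTS AGE.
    count_running_pods() {
        awk -v prefix="$1" 'index($1, prefix) == 1 && $3 == "Running" { n++ } END { print n + 0 }'
    }

    # Example using the sample output from this step. On the LaunchPad instance,
    # replace the printf with `kubectl get pods`.
    sample_count=$(printf '%s\n' \
        'app-name-79d837808b2d2ba5-exec-1 1/1 Running 0 31m' \
        'app-name-79d837808b2d2ba5-exec-2 1/1 Running 0 31m' \
        'app-name-79d837808b2d2ba5-exec-3 1/1 Running 0 31m' \
        | count_running_pods app-name)
    echo "Running executor pods: $sample_count"
    ```

    A count of zero would suggest that the executor pods have not started yet or that the prefix does not match.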

  8. Create two directories for the Mortgage Dataset in the console session from the previous step.


    mkdir -p /data/mortgage/input
    mkdir -p /data/mortgage/output
    chmod 777 /data/mortgage/output

  9. From the LaunchPad Desktop, download the input dataset from the Fannie Mae website.

    • Go to the Single-Family Loan Performance Data page.

      • Log in or register as a new user.

    • Select HP.

      • Click Download Data and choose Single-Family Loan Performance Data. This shows a table of Acquisition and Performance files sorted by year and quarter. Click a file to download it, for example 2017Q1.zip.

      • Unzip the downloaded file to extract the CSV file, for example 2017Q1.csv.

      • Copy the CSV files to the GPU node.


      scp 2017Q1.csv nvidia@172.16.0.10:/data/mortgage/input
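
    If you download more than one quarter, the unzip and copy commands repeat for each file. As a rough sketch, the hypothetical helper below prints the commands for each quarter so you can review them before running anything. It assumes each archive is named <quarter>.zip and contains <quarter>.csv, as in the 2017Q1 example; 2017Q2 is used here only as an illustrative second file.

    ```shell
    # Hypothetical helper: print the unzip and scp commands for each quarter.
    # The host and target path are the ones used in this step. Nothing is
    # executed here; pipe the output to `sh` to actually run the commands.
    print_copy_commands() {
        for quarter in "$@"; do
            echo "unzip -o ${quarter}.zip"
            echo "scp ${quarter}.csv nvidia@172.16.0.10:/data/mortgage/input"
        done
    }

    print_copy_commands 2017Q1 2017Q2
    ```

    Printing the commands first keeps the sketch safe to try: it touches no files and opens no connections until its output is explicitly passed to a shell.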


  10. Run the notebook by clicking Cell -> Run All.

    spark-rapids-lab2-02.png

  11. Note the timing for the benchmark so you can compare it to your CPU run time.

    spark-rapids-lab2-03.png

  12. Stop the notebook server you started in step 1 by pressing Ctrl-C in the System Console window where you started it. Answer Y when asked “Shutdown this notebook server?”.

  13. Run the same Mortgage Benchmark using only CPUs.

    • Execute the lp-runjupyter-etl-cpu.sh script.

  14. Compare the outputs and timings of the GPU and CPU runs.
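
One simple way to summarize the comparison is the ratio of CPU run time to GPU run time. The sketch below is an illustration only: speedup is a hypothetical helper, and the two timings passed to it are placeholders, not measured results. Substitute the times you noted for your own GPU and CPU runs, in seconds.

```shell
# Hypothetical helper: print the CPU-to-GPU speedup for two run times in seconds.
# Usage: speedup <cpu_seconds> <gpu_seconds>
speedup() {
    awk -v cpu="$1" -v gpu="$2" 'BEGIN {
        if (gpu <= 0) { print "n/a"; exit 1 }   # avoid dividing by zero
        printf "%.2f\n", cpu / gpu
    }'
}

# Placeholder timings for illustration only.
speedup 120 30
```

A value above 1 means the GPU run finished faster than the CPU run.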

© Copyright 2022-2023, NVIDIA. Last updated on Jan 10, 2023.