Running Churn Benchmark

In the previous step you should have updated the lp-runjupyter-etl-gpu.sh and lp-runjupyter-etl-cpu.sh files with the correct IP address for your LaunchPad instance. If you have not done this, refer to Setting up the environment.

  1. In the System Console, cd to /home/spark and execute either /home/spark/lp-runjupyter-etl-gpu.sh or /home/spark/lp-runjupyter-etl-cpu.sh. Use the GPU script for this first run; the CPU script is used in step 15.

  2. In the left menu, open the Desktop link and click the VNC connect button.

    spark-rapids-lab1-01.png

  3. Open the web browser in the Linux desktop.

    spark-rapids-lab1-02.png

  4. Browse to 172.16.0.10:30002.

    spark-rapids-lab1-03.png

  5. You should see the list above.

  6. Create the dataset for use with the ETL job.

    mkdir -p /data/churn/input
    mkdir -p /data/churn/output
    chmod 777 /data/churn/*

    • Open a bash session into the running container.

    kubectl exec --stdin --tty sparkrunner-0 -- /bin/bash

    • Copy the seed file.

    cp /home/spark/WA_Fn-UseC_-Telco-Customer-Churn-.csv /data/churn/input
    exit
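
    As a quick sanity check on the directory layout created in step 6, a minimal sketch along these lines could be run in the System Console. The helper name check_churn_dirs is hypothetical, and the sketch assumes /data/churn is visible from the shell where you run it.

    ```shell
    # Hypothetical helper: report whether the churn input and output
    # directories exist under a base directory (default: /data/churn).
    check_churn_dirs() {
        base="${1:-/data/churn}"
        status=0
        for sub in input output; do
            if [ -d "$base/$sub" ]; then
                echo "ok: $base/$sub"
            else
                echo "missing: $base/$sub"
                status=1
            fi
        done
        # Non-zero return value means at least one directory is missing.
        return "$status"
    }

    # Usage (commented out so that sourcing this file has no side effects):
    # check_churn_dirs /data/churn
    ```

    The function returns a non-zero status when a directory is missing, so it can also be used in a conditional before starting the notebook.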

  7. Click the lp-churn-augment.ipynb link to start the Jupyter notebook.

    spark-rapids-lab3-01.png

  8. Validate the creation of the Churn Benchmark pods with the following command:

    kubectl get pods | grep app-name

    • The output should look similar to this.

    app-name-79d837808b2d2ba5-exec-1   1/1   Running   0   31m
    app-name-79d837808b2d2ba5-exec-2   1/1   Running   0   31m
    app-name-79d837808b2d2ba5-exec-3   1/1   Running   0   31m

    • If your pods are in a Pending status, the previous pods did not close properly. You can remove those pods with the following command:

    kubectl delete pod app-name-XXXX
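
    To see at a glance which pods are in a given state, the STATUS column of the kubectl get pods output can be filtered. The following is a minimal sketch, assuming the default column layout (NAME, READY, STATUS, RESTARTS, AGE). The helper pods_with_status is hypothetical and only lists pod names, so deciding which pods to delete stays a manual step.

    ```shell
    # Hypothetical helper: print the names of pods whose STATUS column
    # matches the given value. Reads `kubectl get pods` output on stdin.
    pods_with_status() {
        awk -v want="$1" '$3 == want { print $1 }'
    }

    # Usage (commented out because it needs access to the cluster):
    # kubectl get pods | grep app-name | pods_with_status Pending
    ```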

  9. Run the notebook by clicking Cell -> Run All.

    spark-rapids-lab3-03.png

  10. Confirm the creation of the Churn dataset.

    • Review output of notebook.

    spark-rapids-lab3-04.png

    • In the System Console, run the following command. You should see approximately 21G of data.

    du -hs /data/churn/output
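
    If you prefer a numeric check over reading the human-readable figure, du -sk reports the size in KiB, which can then be converted to GiB. The following is a minimal sketch. The helper kib_to_gib is hypothetical and uses integer division, so the result is rounded down.

    ```shell
    # Hypothetical helper: convert a size in KiB (as printed by `du -sk`)
    # to whole GiB using integer division.
    kib_to_gib() {
        echo $(( $1 / 1024 / 1024 ))
    }

    # Usage (commented out because it needs the generated dataset):
    # kib_to_gib "$(du -sk /data/churn/output | cut -f1)"
    ```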

  11. Click the lp-churn-etl.ipynb link to start the Jupyter notebook.

  12. Run the notebook by clicking Cell -> Run All.

    spark-rapids-lab3-05.png

  13. Note the timing for the benchmark so you can compare to your CPU run time.

    spark-rapids-lab3-06.png

  14. Stop the notebook server you started in step 1 by pressing Ctrl-C in the System Console window where you started it. Answer Y when asked “Shutdown this notebook server?”.

  15. Run the same Churn Benchmark using only CPUs.

    • Execute the lp-runjupyter-etl-cpu.sh script.

  16. Compare the differences between the two outputs.
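
    One simple way to summarize the comparison is the ratio of the CPU run time to the GPU run time. The following is a minimal sketch, assuming you noted both benchmark timings in seconds. The helper speedup is hypothetical and is not part of the lab scripts.

    ```shell
    # Hypothetical helper: print how many times faster the GPU run was,
    # given the CPU and GPU benchmark times in seconds.
    speedup() {
        awk -v cpu="$1" -v gpu="$2" 'BEGIN {
            if (gpu <= 0) { print "gpu time must be positive"; exit 1 }
            printf "%.2fx\n", cpu / gpu
        }'
    }

    # Usage: speedup <cpu_seconds> <gpu_seconds>
    ```

    A value above 1.00x means the GPU run finished faster than the CPU run.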

© Copyright 2022-2023, NVIDIA. Last updated on Jan 10, 2023.