The mortgage demo is derived from the spark-rapids-example repo. It is a Spark ETL job whose input dataset is derived from Fannie Mae’s Single-Family Loan Performance Data, and it generates two output datasets, one for training and one for testing.
We have provided a sample dataset in AWS S3. You can also download datasets at other scales from https://docs.rapids.ai/datasets/mortgage-data to /data.
Connect to the System Console using the left-hand menu link.
Find the current IP address of the Spark-RAPIDS pod.
kubectl describe pod sparkrunner-0 | grep IP
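If you want to capture the address in a variable rather than reading it off the screen, the following is a minimal sketch. The helper name extract_pod_ip is hypothetical (it is not part of the demo), and it assumes the describe output contains a line of the form "IP: <address>".

```shell
# Hypothetical helper (not part of the demo): read `kubectl describe pod`
# output on stdin and print the value of the first "IP:" field.
extract_pod_ip() {
  awk '$1 == "IP:" { print $2; exit }'
}

# Usage from the System Console (requires access to the cluster):
#   POD_IP=$(kubectl describe pod sparkrunner-0 | extract_pod_ip)
# An alternative that asks kubectl for the field directly:
#   POD_IP=$(kubectl get pod sparkrunner-0 -o jsonpath='{.status.podIP}')
```

Either form should yield the same address that the grep command above displays.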
Connect to the sparkrunner pod.
kubectl exec --stdin --tty sparkrunner-0 -- /bin/bash
Update the /home/spark/lp-runjupyter-etl-gpu.sh and /home/spark/lp-runjupyter-etl-cpu.sh scripts, replacing the following line with the correct IP:
SPARK_DRIVER_HOST=192.168.<IP>.<IP>
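Instead of editing both scripts by hand, the change can be scripted. The sketch below is one possible approach, not the demo's own tooling: the function name set_driver_host is hypothetical, and it assumes each script contains a single SPARK_DRIVER_HOST=... assignment on its own line. It keeps a .bak copy of each file so the edit can be reverted.

```shell
# Hypothetical helper (not part of the demo): set SPARK_DRIVER_HOST to the
# given IP in every script passed as an argument.
set_driver_host() {
  ip="$1"
  shift
  for script in "$@"; do
    # Replace everything after "SPARK_DRIVER_HOST=" on that line with the
    # new IP; the original file is preserved as "<script>.bak".
    sed -i.bak "s|\(SPARK_DRIVER_HOST=\).*|\1${ip}|" "$script"
  done
}

# Usage inside the sparkrunner pod (192.168.12.34 is an example value only):
#   set_driver_host 192.168.12.34 \
#     /home/spark/lp-runjupyter-etl-gpu.sh \
#     /home/spark/lp-runjupyter-etl-cpu.sh
```

Afterwards, running grep SPARK_DRIVER_HOST on each script is a quick way to confirm the new value.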
Copy and paste is available on the Desktop VNC connection. Open the sidebar on the left of the screen and paste your text into the clipboard there. Once pasted, the text is immediately available to paste within the VNC desktop.
