Overview

Accelerating Apache Spark with Zero Code Changes (Latest Version)

The Customer Churn lab is derived from the data-science-blueprints repo, where customer churn is modeled from federating data and performing exploratory query-based analytics to feature engineering, model training, and model operationalization. The data engineering portions of the blueprint are accelerated with the RAPIDS Accelerator for Apache Spark and the machine learning portions are accelerated with the RAPIDS Python libraries.This lab shows a realistic ETL workflow based on synthetic normalized data. It consists of two pieces:

  • An augmentation script, which synthesizes normalized (long-form) data from a wide-form input file, optionally augmenting it by duplicating records.

  • An ETL script, which performs joins and aggregations in order to generate wide-form data from the synthetic long-form data.

  1. Copy and paste is available on the Desktop VNC connection. You will see a sidebar on the left of the screen and once that is opened you can paste into the clipboard. Once you have pasted something it is immediately available to paste within the VNC desktop

spark-rapids-003.png


© Copyright 2022-2023, NVIDIA. Last updated on Jun 23, 2023.