Date: 2024-10-23

Existing ML Libraries

The RAPIDS Accelerator for Apache Spark can be used to accelerate the ETL portions (for example, loading training data from Parquet files) of applications that use ML libraries with Spark DataFrame APIs. Examples of such libraries include the original Apache Spark MLlib, XGBoost, RAPIDS Accelerator ML, and the DL inference UDF introduced in Spark 3.4. The latter three can also use GPUs to accelerate the core ML algorithms (in the case of the DL inference UDF, indirectly via the underlying DL framework). Used together with the RAPIDS Accelerator for Apache Spark for ETL, they can further improve the cost-benefit of GPU-accelerated Spark clusters.

For Spark API compatible ML libraries that implement their core ML computations inside pandas UDFs, such as XGBoost’s PySpark API, the RAPIDS Accelerator ML PySpark API, and the DL inference UDF, it is recommended to enable the RAPIDS Accelerator for Apache Spark’s support for GPU-accelerated pandas UDFs.
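
As a rough illustration of that recommendation, the sketch below collects the Spark settings that are typically involved and renders them as `spark-submit` arguments. The configuration keys (`spark.plugins`, `spark.rapids.sql.python.gpu.enabled`, `spark.python.daemon.module`) and their values are assumptions based on the RAPIDS Accelerator configuration documentation and should be checked against the release in use. The helper names `rapids_pandas_udf_conf` and `to_submit_args` are hypothetical and exist only for this example.

```python
from typing import Dict, List


def rapids_pandas_udf_conf(databricks: bool = False) -> Dict[str, str]:
    """Return Spark settings assumed to enable GPU-accelerated pandas UDFs.

    The keys and values are assumptions taken from the RAPIDS Accelerator
    documentation; verify them for the plugin version you deploy.
    """
    # Assumption: Databricks uses a different Python daemon module name.
    daemon = "rapids.daemon_databricks" if databricks else "rapids.daemon"
    return {
        # Load the RAPIDS Accelerator plugin (accelerates the ETL stages).
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Let Python workers running pandas UDFs use the GPU.
        "spark.rapids.sql.python.gpu.enabled": "true",
        # Replace the default PySpark daemon with the RAPIDS-aware one.
        "spark.python.daemon.module": daemon,
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render a conf mapping as a flat list of spark-submit arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

Calling `" ".join(to_submit_args(rapids_pandas_udf_conf()))` yields a string that can be appended to a `spark-submit` command line. The same mapping could instead be applied through `SparkSession.builder.config(key, value)`. The plugin jar generally also has to be made available to the Python workers (for example via `--py-files`), which this sketch does not cover.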