Compiling the tools jar
How to compile the tools jar
See instructions here: NVIDIA/spark-rapids-tools
If any input is an S3 file path or directory path, two extra steps are needed to access S3 in Spark:
- Download the jars that match your Hadoop version:
  - hadoop-aws-<version>.jar
  - aws-java-sdk-<version>.jar
  Taking Hadoop 2.7.4 as an example, download hadoop-aws-2.7.4.jar and aws-java-sdk-1.7.4.jar, then pass both in the '--jars' option to spark-shell or spark-submit.
- In $SPARK_HOME/conf, create hdfs-site.xml with the AWS S3 keys below inside:
<?xml version="1.0"?>
<configuration>
<property>
  <name>fs.s3a.access.key</name>
  <value>xxx</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>xxx</value>
</property>
</configuration>
Please refer to this doc for more options on integrating the hadoop-aws module with S3.
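The two steps above can also be scripted. The following is a minimal sketch, not part of the tools themselves: it writes an hdfs-site.xml containing the two fs.s3a.* properties and builds the comma-separated '--jars' argument. The helper names (write_hdfs_site, jars_option) are hypothetical, and reading the keys from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables is an assumption made for illustration. The demo writes to a temporary directory so it does not overwrite an existing file in $SPARK_HOME/conf.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_hdfs_site(conf_dir, access_key, secret_key):
    """Write hdfs-site.xml with the two fs.s3a.* keys into conf_dir.

    Hypothetical helper for illustration; returns the path of the file.
    """
    root = ET.Element("configuration")
    for name, value in (
        ("fs.s3a.access.key", access_key),
        ("fs.s3a.secret.key", secret_key),
    ):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    tree = ET.ElementTree(root)
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(tree, space="  ")

    target = Path(conf_dir) / "hdfs-site.xml"
    tree.write(str(target), encoding="utf-8", xml_declaration=True)
    return target


def jars_option(jars):
    """Return the '--jars' option with the jar paths joined by commas."""
    return ["--jars", ",".join(jars)]


if __name__ == "__main__":
    # Assumption: keys come from the environment; "xxx" is a placeholder.
    access_key = os.environ.get("AWS_ACCESS_KEY_ID", "xxx")
    secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY", "xxx")

    # Write to a scratch directory; copy the result to $SPARK_HOME/conf yourself.
    conf_dir = tempfile.mkdtemp()
    print("wrote", write_hdfs_site(conf_dir, access_key, secret_key))

    # Jar names taken from the Hadoop 2.7.4 example above.
    cmd = ["spark-shell"] + jars_option(
        ["hadoop-aws-2.7.4.jar", "aws-java-sdk-1.7.4.jar"]
    )
    print(" ".join(cmd))
```

Generating the file from environment variables keeps the secret out of shell history and version control, but the resulting hdfs-site.xml still holds the keys in plain text, so restrict its file permissions accordingly.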