Tools Reports API
The qualification and profiling tools write their results as a directory of CSV, JSON, and text files. To make those results easier to consume from Python — for example, in notebooks, custom dashboards, or downstream pipelines — the tools ship a small Python API called spark_rapids_tools.api_v1.
The API reads the same files that the tools produce; it does not re-run any analysis. Use it when you want to:
Load a qualification or profiling summary into a pandas DataFrame without parsing filenames yourself.
Fetch per-application data by application ID instead of scanning subdirectories.
Read outputs directly from cloud storage (S3, GCS, ABFS) or HDFS paths, alongside local paths.
Combine qualification and profiling outputs for the same workload in one script.
Installing
The API ships with the spark-rapids-user-tools pip package, so if you have already installed the CLI there is nothing extra to do. Otherwise:
pip install spark-rapids-user-tools
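To confirm that the package is present before importing the API in a notebook or pipeline, a check along the following lines may help. It is a minimal sketch that relies only on the standard library; `installed_version` is a hypothetical helper name, not part of the tools.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The distribution name is the one used in the pip command above.
version = installed_version("spark-rapids-user-tools")
if version is None:
    print("spark-rapids-user-tools is not installed")
else:
    print(f"spark-rapids-user-tools {version} is installed")
```

The check looks up package metadata only, so it does not import the tools or trigger any analysis.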
Quickstart
Point a handler at a qualification or profiling output directory and load a table:
from spark_rapids_tools.api_v1 import QualCore, CSVReport
handler = QualCore("/path/to/qual_20240814145334_d2CaFA34/qual_core_output")
summary = CSVReport(handler).table("qualCoreCSVSummary").load()
# summary.data is a pandas DataFrame
print(f"Analysed {len(summary.data)} applications")
The same pattern works for profiling output:
from spark_rapids_tools.api_v1 import ProfCore, CSVReport
handler = ProfCore("s3://my-bucket/profiling/prof_20240105163618_9e2B995F")
app_info = CSVReport(handler).table("coreRawApplicationInformationCSV").load()
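Once a qualification summary and a profiling table are both loaded, their DataFrames can be joined with ordinary pandas operations. The sketch below uses small stand-in DataFrames in place of `summary.data` and `app_info.data`; the column names (`app_id`, `estimated_speedup`, `spark_version`) are assumptions made for illustration and will not match the real report columns.

```python
import pandas as pd

# Stand-ins for summary.data and app_info.data; the real tables have
# different (and many more) columns.
qual_df = pd.DataFrame({
    "app_id": ["application_001", "application_002"],
    "estimated_speedup": [2.5, 1.2],
})
prof_df = pd.DataFrame({
    "app_id": ["application_001", "application_003"],
    "spark_version": ["3.4.1", "3.5.0"],
})

# An outer join keeps applications that appear in only one of the outputs;
# indicator=True adds a _merge column showing which side each row came from.
combined = qual_df.merge(prof_df, on="app_id", how="outer", indicator=True)
print(combined)
```

The `_merge` column makes it easy to spot applications that were qualified but never profiled, or the reverse.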
Per-application tables take one or more application IDs:
stages = (CSVReport(handler)
          .table("coreRawStageLevelAggregatedTaskMetricsCSV")
          .apps(["application_001", "application_002"])
          .load())
for app_id, result in stages.items():
    print(app_id, len(result.data))
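Because a per-application load yields one result per application ID, it is often convenient to stack the tables into a single DataFrame. The following is a minimal sketch: `combine_app_results` is a hypothetical helper, the stand-in objects only mimic the `.data` attribute shown above, and the column names are invented for the example.

```python
from types import SimpleNamespace

import pandas as pd


def combine_app_results(results):
    """Stack per-application tables into one DataFrame with an app_id column."""
    frames = [
        result.data.assign(app_id=app_id)
        for app_id, result in results.items()
    ]
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame(columns=["app_id"])
    return pd.concat(frames, ignore_index=True)


# Stand-in for the mapping returned by .apps([...]).load(); the columns
# are made up for illustration.
fake_stages = {
    "application_001": SimpleNamespace(
        data=pd.DataFrame({"stageId": [0, 1], "numTasks": [4, 8]})
    ),
    "application_002": SimpleNamespace(
        data=pd.DataFrame({"stageId": [0], "numTasks": [2]})
    ),
}
combined = combine_app_results(fake_stages)
print(combined.groupby("app_id")["numTasks"].sum())
```

Tagging each row with its application ID keeps the origin of every record visible after the tables are stacked, so group-by and filter operations can still work per application.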
Handlers and Report Types
Four handlers cover the common output shapes:
| Handler | Use it for |
|---|---|
| | The |
| | The top-level qualification run directory (wrapper metadata, summary CSVs). |
| | The |
| | The top-level profiling run directory. |
Report builders match the on-disk format: CSVReport for CSV tables, JSONReport for JSON files, JPropsReport for Java-properties files, and TXTReport for plain text and JSONL files (including tuning recommendations).
Reference
The full API reference — including advanced options (fallback callbacks, column renaming, custom pandas arguments), all supported table labels, and end-to-end examples combining qualification, profiling, and tuning outputs — lives with the tools source: Tools Reports API documentation.