Tools Reports API

The qualification and profiling tools write their results as a directory of CSV, JSON, and text files. To make those results easier to consume from Python (for example, in notebooks, custom dashboards, or downstream pipelines), the tools ship a small Python API called spark_rapids_tools.api_v1.

The API reads the same files that the tools produce; it does not re-run any analysis. Use it when you want to:

  • Load a qualification or profiling summary into a pandas DataFrame without parsing filenames yourself.

  • Fetch per-application data by application ID instead of scanning subdirectories.

  • Read outputs directly from cloud storage (S3, GCS, ABFS) or HDFS paths, alongside local paths.

  • Combine qualification and profiling outputs for the same workload in one script.

Installing

The API ships with the spark-rapids-user-tools pip package, so if you have already installed the CLI, there is nothing extra to do. Otherwise:

pip install spark-rapids-user-tools
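A notebook kernel can run a different interpreter than your shell. To confirm that the module is importable where you will actually use it, one option is a stdlib-only check like the sketch below. The helper name `api_available` is hypothetical and not part of the package.

```python
import importlib.util


def api_available() -> bool:
    """Return True if spark_rapids_tools.api_v1 can be located (hypothetical helper)."""
    # find_spec() on a dotted name imports the parent package first and raises
    # ModuleNotFoundError if that parent is missing, so check the parent on its own.
    if importlib.util.find_spec("spark_rapids_tools") is None:
        return False
    return importlib.util.find_spec("spark_rapids_tools.api_v1") is not None


if __name__ == "__main__":
    print("api_v1 available:", api_available())
```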

Quickstart

Point a handler at a qualification or profiling output directory and load a table:

from spark_rapids_tools.api_v1 import QualCore, CSVReport

handler = QualCore("/path/to/qual_20240814145334_d2CaFA34/qual_core_output")

summary = CSVReport(handler).table("qualCoreCSVSummary").load()

# summary.data is a pandas DataFrame
print(f"Analysed {len(summary.data)} applications")

The same pattern works for profiling output:

from spark_rapids_tools.api_v1 import ProfCore, CSVReport

handler = ProfCore("s3://my-bucket/profiling/prof_20240105163618_9e2B995F/rapids_4_spark_profile")

app_info = CSVReport(handler).table("coreRawApplicationInformationCSV").load()

Per-application tables take one or more application IDs:

stages = (CSVReport(handler)
    .table("coreRawStageLevelAggregatedTaskMetricsCSV")
    .apps(["application_001", "application_002"])
    .load())

for app_id, result in stages.items():
    print(app_id, len(result.data))
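Each per-application result exposes its rows as a pandas DataFrame in `.data`. That means you can stack the results into one frame and tag each row with its application ID. The sketch below uses `SimpleNamespace` stand-ins with made-up columns in place of the real `stages` result, so it runs without any tool output. The helper `combine_per_app` and the `app_id` column name are hypothetical.

```python
from types import SimpleNamespace

import pandas as pd


def combine_per_app(results, id_column="app_id"):
    """Concatenate per-app DataFrames into one, adding an application ID column.

    `results` is any mapping of app ID -> object with a `.data` DataFrame,
    mirroring how the per-application load above is iterated with `.items()`.
    """
    frames = [
        result.data.assign(**{id_column: app_id})
        for app_id, result in results.items()
    ]
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame(columns=[id_column])
    return pd.concat(frames, ignore_index=True)


# Stand-in data: the column name "stageId" is illustrative, not the real schema.
fake_stages = {
    "application_001": SimpleNamespace(data=pd.DataFrame({"stageId": [0, 1]})),
    "application_002": SimpleNamespace(data=pd.DataFrame({"stageId": [0]})),
}

combined = combine_per_app(fake_stages)
print(combined)
```

With real output, pass the `stages` mapping from the previous snippet in place of `fake_stages`.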

Handlers and Report Types

Four handlers cover the common output shapes:

Handler       Use it for
-----------   ---------------------------------------------------------------------------
QualCore      The qual_core_output/ directory produced by the qualification tool.
QualWrapper   The top-level qualification run directory (wrapper metadata, summary CSVs).
ProfCore      The rapids_4_spark_profile/ directory produced by the profiling tool.
ProfWrapper   The top-level profiling run directory.
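The table above maps directory layouts to handlers. The hypothetical helper below illustrates that mapping. It suggests a handler name from the last path component and assumes the directory naming seen in the Quickstart paths: `qual_<timestamp>_<id>`, `qual_core_output`, `prof_<timestamp>_<id>`, and `rapids_4_spark_profile`. It returns class names as strings and does not import or call the API.

```python
import posixpath


def suggest_handler(path: str) -> str:
    """Suggest a handler class name from an output directory path (hypothetical helper).

    Works on POSIX paths and URIs such as s3://bucket/... because it only
    inspects the final '/'-separated component.
    """
    name = posixpath.basename(path.rstrip("/"))
    # Check the exact core-directory names first: "qual_core_output" also starts with "qual_".
    if name == "qual_core_output":
        return "QualCore"
    if name == "rapids_4_spark_profile":
        return "ProfCore"
    if name.startswith("qual_"):
        return "QualWrapper"
    if name.startswith("prof_"):
        return "ProfWrapper"
    raise ValueError(f"Unrecognized output directory: {path!r}")
```

Windows-style backslash paths are not handled. Treat the result as a hint and check it against the table.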

Report builders match the on-disk format: CSVReport for CSV tables, JSONReport for JSON files, JPropsReport for Java-properties files, TXTReport for plain text and JSONL files (including tuning recommendations).
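The rule in the paragraph above is a lookup on file format. A minimal sketch of it, keyed on file extension, follows. Two details are assumptions: the `.properties` suffix for Java-properties files, and the helper name `builder_for`. The function returns builder class names as strings rather than calling the API.

```python
from pathlib import PurePosixPath

# Assumed extension -> report builder mapping, following the paragraph above.
_BUILDERS = {
    ".csv": "CSVReport",
    ".json": "JSONReport",
    ".properties": "JPropsReport",  # assumption: Java-properties files use this suffix
    ".txt": "TXTReport",
    ".jsonl": "TXTReport",  # JSONL files are read with TXTReport, per the paragraph above
}


def builder_for(filename: str) -> str:
    """Return the report builder name suited to a file (hypothetical helper)."""
    suffix = PurePosixPath(filename).suffix.lower()
    try:
        return _BUILDERS[suffix]
    except KeyError:
        raise ValueError(f"No report builder for {filename!r}") from None
```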

Reference

The full API reference lives with the tools source: Tools Reports API documentation. It covers advanced options (fallback callbacks, column renaming, custom pandas arguments), all supported table labels, and end-to-end examples that combine qualification, profiling, and tuning outputs.