QualX#

QualX is the speedup prediction engine that powers the qualification tool. It is an XGBoost model trained on paired CPU and GPU runs of representative Spark workloads and is used to predict how much faster an application would run on GPU.

You do not need to interact with QualX directly to use the qualification tool — running spark_rapids qualification invokes QualX internally. This page explains how predictions are produced, what outputs QualX generates, and when you might want to override the default model.

How Predictions Are Produced#

When the qualification tool processes a CPU event log, it extracts features describing the workload (SQL plan shape, operator mix, task metrics, read schemas, and similar signals) and feeds them to the QualX model. The model predicts two quantities:

  • An estimated GPU duration per SQL query and per application.

  • An estimated GPU speedup, computed as the ratio of CPU duration to predicted GPU duration.

These values flow into the standard qualification outputs (Estimated GPU Speedup, Estimated GPU Duration, Estimated GPU Speedup Category).
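The speedup arithmetic above is a simple ratio. The sketch below illustrates it; the variable names and the sum-of-durations roll-up from per-SQL to application level are assumptions for illustration, not QualX's actual implementation:

```python
# Illustrative sketch of the speedup arithmetic described above.
# Names and the aggregation strategy are assumptions, not QualX code.

def estimated_speedup(cpu_duration_ms: float, predicted_gpu_duration_ms: float) -> float:
    """Estimated GPU speedup = measured CPU duration / predicted GPU duration."""
    return cpu_duration_ms / predicted_gpu_duration_ms

# Per-SQL predictions can roll up to an application-level estimate by
# summing durations (one plausible aggregation).
cpu_durations = [12000.0, 8000.0, 5000.0]   # measured CPU duration per SQL (ms)
gpu_predictions = [3000.0, 4000.0, 2500.0]  # model-predicted GPU duration per SQL (ms)

app_speedup = sum(cpu_durations) / sum(gpu_predictions)
print(round(app_speedup, 2))  # → 2.63
```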

QualX ships a separate default model for each supported platform. The models are trained against benchmark datasets such as NDS on the platform-specific reference cluster configurations — see Benchmark Environments for the exact hardware.

QualX Output Files#

In addition to the standard qualification outputs, QualX writes the following files under xgboost_predictions/ in the per-run output directory:

  • per_app.csv — Raw per-application speedup predictions.

  • per_sql.csv — Per-SQL speedup predictions.

  • features.csv — Feature values fed to the model for each prediction — useful for understanding what drove a prediction.

  • feature_importance.csv — SHAP-based feature importance values across the prediction dataset. Indicates which features contributed most to the predicted speedups.

  • shap_values/ — Per-SQL SHAP values for in-depth inspection of individual predictions.

The top-level prediction.csv file in the output directory contains the human-readable per-application speedup summary.
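A quick way to triage results is to load per_app.csv and rank applications by predicted speedup. The snippet below is a minimal sketch using an inline sample; the column names (appId, speedup) are assumptions for illustration — check the header row of your own per_app.csv before adapting it:

```python
import csv
import io

# Illustrative: parse a per_app.csv-style file and rank applications by
# predicted speedup. Column names here are assumed, not guaranteed by
# QualX; inspect the actual header row in your output directory.
sample = io.StringIO(
    "appId,appDuration,speedup\n"
    "app-001,120000,2.4\n"
    "app-002,90000,1.1\n"
    "app-003,60000,3.7\n"
)

rows = list(csv.DictReader(sample))
rows.sort(key=lambda r: float(r["speedup"]), reverse=True)
for r in rows:
    print(r["appId"], r["speedup"])  # highest predicted speedup first
```

Replace the in-memory sample with `open("xgboost_predictions/per_app.csv")` against a real run to use it in practice.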

Using a Custom Model#

If you have trained your own QualX model — for example, on workloads more representative of your production environment than the NDS-based defaults — you can supply it via the --custom_model_file argument:

spark_rapids qualification \
  --platform <platform> \
  --eventlogs <path> \
  --custom_model_file /path/to/model.json

The file must be a QualX-trained model.json. When the flag is omitted, the qualification tool uses the default model bundled with the platform.

You can also override the QualX configuration file (for example, to change feature selection or logging) using --qualx_config /path/to/qualx-conf.yaml. The default configuration is a good starting point.

Training a Custom Model#

Training your own QualX model is an advanced workflow. It requires:

  • Paired CPU and GPU event logs from the same workloads, organised by platform, dataset, and application name.

  • Dataset metadata describing each run (runType, scaleFactor).

  • A Python environment with the training extras installed (pip install 'spark-rapids-user-tools[qualx]') and SPARK_HOME / QUALX_DATA_DIR set.

The training pipeline runs the profiling and qualification tools against the training data to extract features, fits an XGBoost regressor, and emits a model.json that can be consumed by --custom_model_file as described above.

Because the training workflow depends on reproducible paired eventlogs and on internals of the tools repository, the authoritative reference lives upstream: QualX training guide.