Copyright © 2026, NVIDIA Corporation.

HeavyML (BETA)

HeavyML Overview

Summary

HeavyML is a framework (currently in beta) that lets users apply HEAVY.AI’s lightning-fast SQL to machine learning workflows.

Why use HeavyML?

The line between data analytics, data science, and machine learning has become increasingly blurred. Predictive analytics, anomaly detection and explanation, classification, and AI-assisted data cleansing are all emerging as mainstream analytics use cases.

However, for many users trained in traditional analytics approaches such as SQL or visualization, it can be difficult to apply machine learning techniques to more advanced use cases. Furthermore, even with the fastest of data platforms like HEAVY.AI, extracting massive amounts of data from a database into a Python notebook for CPU-based machine learning can be prohibitively slow or, worse still, can run out of memory or hit other operational snags.

HeavyML takes a new approach to these issues: users can perform machine learning and predictive analytics operations directly in-database, using intuitive new native SQL and visualization capabilities in the HEAVY.AI platform. This provides several advantages to end users:

  1. Users can tap into their existing SQL knowledge to orchestrate formerly complex machine learning workflows.
  2. Pre-ML ELT (Extract-Load-Transform) and data cleansing are a breeze, as data can be filtered, grouped, and manipulated directly in the same SQL query that launches the ML training or inference operations.
  3. Keeping the relevant data in-database yields significant performance gains by avoiding the overhead of transferring and marshaling the data to other processes and formats for ML processing. In addition, HeavyML takes full advantage of the massive CPU and GPU parallelism the HEAVY.AI platform is known for, leading to orders-of-magnitude speedups for some operations.

Capabilities Overview

Note that HeavyML is a beta capability under rapid iteration, so features will continue to be added at a fast pace for the foreseeable future.

  • Clustering Support
    • Two clustering models are currently supported: K-Means and DBSCAN. Clustering is currently performed by calling the associated table functions, kmeans and dbscan.
  • Regression Support
    • Four regression models are currently supported: linear regression, random forest regression, gradient boosting tree (GBT) regression, and decision tree regression.
    • Both categorical text and continuous numeric input features (predictors) are supported. Categorical features are automatically one-hot encoded.
    • Models can be created via a new CREATE MODEL command, and inference can be performed row-wise with a new ML_PREDICT method.
    • Inference using ML_PREDICT will run on GPU if available, while model creation/training is currently executed multi-threaded on CPU. (GPU model training may be supported in the future.)
    • Convenience methods are defined to extract the linear regression coefficients for linear regression models, the variable importance scores for random forest models, and the R² score for all regression models.
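
Two of the concepts mentioned above, one-hot encoding of categorical features and the R² score, can be illustrated outside the database. The following is a minimal Python sketch of what those terms mean in general. It is not HeavyML's implementation and does not use any HEAVY.AI API. The function names and the toy data are hypothetical and chosen only for illustration.

```python
def one_hot_encode(values):
    """Turn categorical strings into 0/1 indicator rows, one slot per distinct category."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one slot is set per input value
        encoded.append(row)
    return categories, encoded


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical categorical predictor (not taken from any HEAVY.AI dataset).
land_use = ["residential", "commercial", "residential", "industrial"]
categories, encoded = one_hot_encode(land_use)
print(categories)  # ['commercial', 'industrial', 'residential']
print(encoded)     # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]

# Hypothetical observed values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]
score = r2_score(actual, predicted)
print(round(score, 3))  # 0.975
```

An R² of 1.0 means the predictions match the observed values exactly, while a value near 0 means the model does no better than always predicting the mean. In HeavyML the encoding step is automatic, so none of this needs to be written by hand. The sketch only shows what the encoded features and the score represent.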