Building NumPy and SciPy with NVPL#

Overview#

NumPy and SciPy can be built with NVPL BLAS and LAPACK backends. Starting with NVPL 25.11, pkg-config support is available in the NVPL tarball to simplify the build configuration process.

This guide provides step-by-step instructions for building custom wheels of NumPy and SciPy that use NVPL as their BLAS and LAPACK backend.

Why Build Custom Wheels with NVPL?#

NumPy and SciPy wheels available on PyPI are built with OpenBLAS and configured with a hard limit of 64 threads. On systems with more than 64 cores, this limit becomes a significant performance bottleneck. In addition, NVPL BLAS and LAPACK typically outperform OpenBLAS on supported systems.

Prerequisites#

Before building NumPy and SciPy with NVPL, ensure you have:

Note

uv is a fast Python package installer and resolver. To install uv, see uv installation instructions. If you prefer traditional tools, pip works equally well.

Virtual Environment Setup#

It is recommended to use a virtual environment for building NumPy and SciPy. You can use either venv with pip or uv.

Using pip and venv:

python -m venv nvpl-env
source nvpl-env/bin/activate

Using uv:

uv venv nvpl-env
source nvpl-env/bin/activate

Install Build Dependencies (Optional)#

The build process will automatically install necessary build dependencies in an isolated environment. However, if you want to pre-install them in your virtual environment, you can use your preferred package manager.

Using pip:

pip install build wheel

Using uv:

uv pip install build wheel

Environment Setup#

To enable pkg-config to find NVPL libraries, set the PKG_CONFIG_PATH environment variable to include the NVPL pkg-config directory.

Assume the NVPL distribution is installed at NVPL_ROOT. Add the following to your environment:

export NVPL_ROOT=/path/to/nvpl
export PKG_CONFIG_PATH=${NVPL_ROOT}/lib/pkgconfig:${PKG_CONFIG_PATH}

Configure pkg-config with --define-prefix#

NVPL pkg-config files require the --define-prefix flag to correctly resolve library paths. Create a pkg-config wrapper script in your working directory:

cat > pkg-config << 'EOF'
#!/bin/bash
exec $(which -a pkg-config | sed -n '2p') --define-prefix "$@"
EOF
chmod +x pkg-config

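Add the current directory to your PATH so the wrapper is used. The wrapper treats the second pkg-config found on PATH as the real one, so add this directory only once: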

export PATH=$(pwd):$PATH
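The wrapper depends on PATH ordering: `which -a pkg-config` lists every match in search order, and `sed -n '2p'` selects the second one. The following Python sketch mimics that lookup so the ordering can be inspected before building. The helper name `find_all_on_path` is made up for illustration and is not part of NVPL or pkg-config.

```python
import os
from pathlib import Path


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in search order.

    Roughly what `which -a name` prints (hypothetical helper).
    """
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # this sketch ignores empty PATH entries
        candidate = Path(directory) / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            matches.append(str(candidate))
    return matches


if __name__ == "__main__":
    # Entry 1 should be the wrapper, entry 2 the real pkg-config it execs.
    for position, location in enumerate(find_all_on_path("pkg-config"), start=1):
        print(position, location)
```

If the first entry is not the wrapper, or the second entry is the wrapper again, the PATH was probably modified more than once.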

For NumPy and SciPy builds, use the OpenMP-threaded variants:

  • nvpl-blas-lp64-omp - NVPL BLAS with LP64 ABI and OpenMP threading

  • nvpl-lapack-lp64-omp - NVPL LAPACK with LP64 ABI and OpenMP threading

Verify the pkg-config setup:

pkg-config --modversion nvpl-blas-lp64-omp
pkg-config --libs nvpl-blas-lp64-omp
# Should show: -L/path/to/nvpl/lib -lnvpl_blas_lp64_omp ...
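The same check can be scripted, for example as a pre-build step in CI. This is a minimal sketch that shells out to the `pkg-config` commands shown above for both modules; the function name `query_pkg_config` and the injectable `run` parameter are illustrative assumptions.

```python
import subprocess

MODULES = ["nvpl-blas-lp64-omp", "nvpl-lapack-lp64-omp"]


def query_pkg_config(module, run=subprocess.run):
    """Return (version, libs) for a pkg-config module, or None if lookup fails."""
    results = []
    for flag in ("--modversion", "--libs"):
        try:
            proc = run(["pkg-config", flag, module],
                       capture_output=True, text=True)
        except FileNotFoundError:
            return None  # pkg-config itself is not on PATH
        if proc.returncode != 0:
            return None  # module not found, e.g. PKG_CONFIG_PATH is not set
        results.append(proc.stdout.strip())
    return tuple(results)


if __name__ == "__main__":
    for module in MODULES:
        info = query_pkg_config(module)
        if info is None:
            print(f"{module}: NOT FOUND (check PKG_CONFIG_PATH and the wrapper)")
        else:
            version, libs = info
            print(f"{module}: version {version}, libs: {libs}")
```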

Building NumPy with NVPL#

NumPy uses Meson as its build system. For detailed information, refer to the NumPy building from source documentation.

NumPy Step 1: Obtain Source and Setup pkg-config#

Option A: Using Release Tarball (Recommended)

Download and extract a NumPy release tarball:

wget https://github.com/numpy/numpy/releases/download/v2.1.3/numpy-2.1.3.tar.gz
tar -xzf numpy-2.1.3.tar.gz
cd numpy-2.1.3

Option B: Using Git Repository

Clone the NumPy repository and initialize submodules:

git clone https://github.com/numpy/numpy.git
cd numpy
git checkout v2.1.3  # or desired version
git submodule update --init

Note

Release tarballs are often more reliable as they include pre-configured version information and vendored dependencies.

NumPy Step 2: Configure BLAS/LAPACK Backend#

Create a meson.build.d/nvpl.ini file to specify NVPL as the BLAS/LAPACK backend:

[properties]
blas = 'nvpl-blas-lp64-omp'
lapack = 'nvpl-lapack-lp64-omp'

Alternatively, set environment variables:

export NPY_BLAS_ORDER=nvpl-blas-lp64-omp
export NPY_LAPACK_ORDER=nvpl-lapack-lp64-omp

NumPy Step 3: Build and Install#

Build and install NumPy directly using your preferred package manager.

Using pip:

pip install . -Csetup-args=-Dblas=nvpl-blas-lp64-omp -Csetup-args=-Dlapack=nvpl-lapack-lp64-omp

Using uv:

uv pip install . -Csetup-args=-Dblas=nvpl-blas-lp64-omp -Csetup-args=-Dlapack=nvpl-lapack-lp64-omp

Or build a wheel for distribution.

Using build (install it first with pip install build):

python -m build --wheel -Csetup-args=-Dblas=nvpl-blas-lp64-omp -Csetup-args=-Dlapack=nvpl-lapack-lp64-omp

Using uv:

uv build --wheel -Csetup-args=-Dblas=nvpl-blas-lp64-omp -Csetup-args=-Dlapack=nvpl-lapack-lp64-omp

The wheel will be created in the dist/ directory and can be installed with pip install dist/<wheel-file> or uv pip install dist/<wheel-file>.
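The four commands above differ only in the front end; the two `-Csetup-args` options are identical. As a sketch of that structure, the following Python snippet assembles each command line from the same pieces. The names `FRONTENDS` and `build_command` are hypothetical, and the snippet only prints the commands rather than running them.

```python
import shlex

NVPL_BLAS = "nvpl-blas-lp64-omp"
NVPL_LAPACK = "nvpl-lapack-lp64-omp"

# (tool, build a wheel?) -> front-end command, as used in this guide
FRONTENDS = {
    ("pip", False): ["pip", "install", "."],
    ("uv", False): ["uv", "pip", "install", "."],
    ("pip", True): ["python", "-m", "build", "--wheel"],
    ("uv", True): ["uv", "build", "--wheel"],
}


def build_command(tool="pip", wheel=False, blas=NVPL_BLAS, lapack=NVPL_LAPACK):
    """Return the argv list for one of the build variants."""
    return FRONTENDS[(tool, wheel)] + [
        f"-Csetup-args=-Dblas={blas}",
        f"-Csetup-args=-Dlapack={lapack}",
    ]


if __name__ == "__main__":
    for tool in ("pip", "uv"):
        for wheel in (False, True):
            print(shlex.join(build_command(tool, wheel)))
```

The resulting list can be passed to `subprocess.run` from the source directory if the build should be driven from a script.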

Note

The build process will automatically handle build isolation and install necessary build dependencies. If you encounter issues, you can add the -v flag for verbose output to help diagnose problems.

NumPy Step 4: Verify the Build#

After installation, verify that NumPy is using NVPL. Important: Change to a different directory before importing NumPy to avoid import errors:

cd ~  # or any directory outside the NumPy source tree
python -c "import numpy as np; np.show_config()"

Or in an interactive Python session:

import numpy as np
np.show_config()

The output should show NVPL libraries in the BLAS and LAPACK configuration.
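For an automated check, recent NumPy releases can return the configuration as a dictionary via `np.show_config(mode="dicts")`. The sketch below assumes the layout seen in recent releases (a "Build Dependencies" entry with "blas" and "lapack" sub-entries that each carry a "name") and assumes the reported name contains "nvpl"; both should be confirmed against the actual output. The helper names are illustrative.

```python
def blas_lapack_names(config):
    """Extract the BLAS and LAPACK names from a show_config(mode="dicts") result."""
    deps = config.get("Build Dependencies", {})
    return {key: deps.get(key, {}).get("name", "unknown")
            for key in ("blas", "lapack")}


def uses_nvpl(config):
    """True if both the BLAS and LAPACK names mention NVPL (assumed naming)."""
    names = blas_lapack_names(config)
    return all("nvpl" in name.lower() for name in names.values())


def check_numpy():
    """Inspect the installed NumPy; run this outside the source tree."""
    import numpy as np  # imported lazily so the helpers work without NumPy
    config = np.show_config(mode="dicts")
    print(blas_lapack_names(config))
    return uses_nvpl(config)
```

Calling `check_numpy()` from a directory outside the source tree prints the detected names and returns a boolean that a script can act on.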

Building SciPy with NVPL#

SciPy also uses Meson for its build system. For detailed information, refer to the SciPy building from source documentation.

SciPy Step 1: Obtain Source and Setup pkg-config#

Option A: Using Release Tarball (Recommended)

Download and extract a SciPy release tarball:

wget https://github.com/scipy/scipy/releases/download/v1.14.1/scipy-1.14.1.tar.gz
tar -xzf scipy-1.14.1.tar.gz
cd scipy-1.14.1

Option B: Using Git Repository

Clone the SciPy repository and initialize submodules:

git clone https://github.com/scipy/scipy.git
cd scipy
git checkout v1.14.1  # or desired version
git submodule update --init

Note

Release tarballs are often more reliable as they include pre-configured version information and vendored dependencies.

SciPy Step 2: Configure BLAS/LAPACK Backend#

Similar to NumPy, create a meson.build.d/nvpl.ini file:

[properties]
blas = 'nvpl-blas-lp64-omp'
lapack = 'nvpl-lapack-lp64-omp'

Or set environment variables:

export SCIPY_BLAS=nvpl-blas-lp64-omp
export SCIPY_LAPACK=nvpl-lapack-lp64-omp

SciPy Step 3: Build and Install#

Build and install SciPy directly using your preferred package manager.

Using pip:

pip install . -Csetup-args=-Dblas=nvpl-blas-lp64-omp -Csetup-args=-Dlapack=nvpl-lapack-lp64-omp

Using uv:

uv pip install . -Csetup-args=-Dblas=nvpl-blas-lp64-omp -Csetup-args=-Dlapack=nvpl-lapack-lp64-omp

Or build a wheel for distribution.

Using build (install it first with pip install build):

python -m build --wheel -Csetup-args=-Dblas=nvpl-blas-lp64-omp -Csetup-args=-Dlapack=nvpl-lapack-lp64-omp

Using uv:

uv build --wheel -Csetup-args=-Dblas=nvpl-blas-lp64-omp -Csetup-args=-Dlapack=nvpl-lapack-lp64-omp

The wheel will be created in the dist/ directory and can be installed with pip install dist/<wheel-file> or uv pip install dist/<wheel-file>.

Note

The build process will automatically handle build isolation and install necessary build dependencies. If you encounter issues, you can add the -v flag for verbose output to help diagnose problems.

SciPy Step 4: Verify the Build#

After installation, verify that SciPy is using NVPL. Important: Change to a different directory before importing SciPy to avoid import errors:

cd ~  # or any directory outside the SciPy source tree
python -c "import scipy; scipy.show_config()"

Or in an interactive Python session:

import scipy
scipy.show_config()

The output should show NVPL libraries in the BLAS and LAPACK configuration.

Installing Custom Wheels in Other Environments#

Once you’ve built custom NumPy and SciPy wheels with NVPL, you can install them in any Python environment (virtual environments, conda environments, etc.) without rebuilding.

Collecting Built Wheels#

After building, the wheel files are located in the dist/ directory of each project:

# Example wheel filenames (actual names will vary by version and platform)
numpy-2.1.3/dist/numpy-2.1.3-cp313-cp313-linux_aarch64.whl
scipy-1.14.1/dist/scipy-1.14.1-cp313-cp313-linux_aarch64.whl

Create a directory to store your custom wheels:

mkdir -p ~/nvpl-wheels
cp numpy-2.1.3/dist/*.whl ~/nvpl-wheels/
cp scipy-1.14.1/dist/*.whl ~/nvpl-wheels/
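Wheel filenames encode the distribution name, version, Python tag, ABI tag, and platform tag, which is useful for confirming that a collected wheel matches the target interpreter and architecture. Here is a small parsing sketch based on the standard wheel naming scheme; the function name `parse_wheel_name` is hypothetical.

```python
from pathlib import Path


def parse_wheel_name(filename):
    """Split a wheel filename into its tagged components.

    Format: name-version[-build]-python-abi-platform.whl
    """
    base = Path(filename).name
    if not base.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = base[: -len(".whl")].split("-")
    if len(parts) == 6:
        del parts[2]  # drop the optional build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name: {filename}")
    keys = ("name", "version", "python", "abi", "platform")
    return dict(zip(keys, parts))


if __name__ == "__main__":
    wheel_dir = Path.home() / "nvpl-wheels"
    if wheel_dir.is_dir():
        for wheel in sorted(wheel_dir.glob("*.whl")):
            print(parse_wheel_name(wheel))
```

A `cp313` Python tag, for example, means the wheel only installs into a CPython 3.13 environment.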

Installing from Local Wheels#

To install these custom wheels in a new environment, use the -f (--find-links) flag to point pip or uv to your wheel directory.

Create a new virtual environment:

Using pip and venv:

python -m venv myproject-env
source myproject-env/bin/activate

Using uv:

uv venv myproject-env
source myproject-env/bin/activate

Install NumPy and SciPy from local wheels:

Using pip:

pip install --no-index --find-links ~/nvpl-wheels numpy scipy

Using uv:

uv pip install --no-index --find-links ~/nvpl-wheels numpy scipy

The --no-index flag prevents pip/uv from checking PyPI, ensuring your custom wheels are used. The --find-links flag tells pip/uv where to find the wheel files.

Setting Up LD_LIBRARY_PATH#

When using custom wheels built with NVPL, ensure the NVPL libraries are accessible at runtime:

export LD_LIBRARY_PATH=${NVPL_ROOT}/lib:${LD_LIBRARY_PATH}

You can add this to your virtual environment’s activation script:

echo "export LD_LIBRARY_PATH=${NVPL_ROOT}/lib:\${LD_LIBRARY_PATH}" >> myproject-env/bin/activate

Or create a wrapper script:

cat > activate-with-nvpl.sh << 'EOF'
#!/bin/bash
export NVPL_ROOT=/path/to/nvpl  # Update with your actual path
export LD_LIBRARY_PATH=${NVPL_ROOT}/lib:${LD_LIBRARY_PATH}
source myproject-env/bin/activate
EOF
chmod +x activate-with-nvpl.sh

Then activate with:

source activate-with-nvpl.sh
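One detail of the exports above: when LD_LIBRARY_PATH is initially empty, the result ends with a trailing colon, and the dynamic loader treats an empty entry as the current working directory. The following Python sketch builds the value without empty or duplicate entries, for example when launching a child process from a script. The helper name `prepend_path` and the `/path/to/nvpl` fallback are illustrative assumptions.

```python
import os


def prepend_path(new_entry, existing):
    """Put new_entry first, dropping empty entries and earlier duplicates."""
    kept = [entry for entry in (existing or "").split(os.pathsep)
            if entry and entry != new_entry]
    return os.pathsep.join([new_entry] + kept)


if __name__ == "__main__":
    # NVPL_ROOT is read from the environment; the fallback is a placeholder.
    nvpl_lib = os.path.join(os.environ.get("NVPL_ROOT", "/path/to/nvpl"), "lib")
    value = prepend_path(nvpl_lib, os.environ.get("LD_LIBRARY_PATH"))
    print(f"export LD_LIBRARY_PATH={value}")
```

Setting the variable inside an already running Python process does not affect libraries loaded by that process, so the value is only useful for the shell or for child processes.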

Sharing Wheels with a Team#

To distribute your custom wheels to a team or across multiple systems:

  1. Create a shared directory (e.g., on NFS or a network share):

    mkdir -p /shared/nvpl-wheels
    cp ~/nvpl-wheels/*.whl /shared/nvpl-wheels/
    
  2. Team members can install from the shared location:

    pip install --no-index --find-links /shared/nvpl-wheels numpy scipy
    
  3. Or set up a simple HTTP server to serve wheels:

    cd ~/nvpl-wheels
    python -m http.server 8000
    

    Then install from the server:

    pip install --no-index --find-links http://yourserver:8000 numpy scipy
    

Verifying the Installation#

After installing, verify that NumPy and SciPy are using NVPL:

python -c "import numpy as np; np.show_config()"
python -c "import scipy; scipy.show_config()"

The output should show NVPL BLAS and LAPACK libraries.

Runtime Configuration#

NVPL libraries use OpenMP for threading. You can control the number of threads in two ways:

Using Environment Variables (Set Before Program Starts)#

Set OMP_NUM_THREADS before starting your Python program:

export OMP_NUM_THREADS=16  # Set to desired number of threads, defaults to using all available cores
python my_script.py

Note

The OMP_NUM_THREADS environment variable is read once, when the OpenMP runtime initializes. Changing it afterwards has no effect on the running program.
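Because the variable must be in place before the process starts, a driver script that wants different thread counts can launch each run as a separate child process with its own environment. A minimal sketch, where `run_with_threads` is a hypothetical helper and the child command simply echoes the variable back:

```python
import os
import subprocess
import sys


def run_with_threads(argv, num_threads):
    """Run argv in a child process with OMP_NUM_THREADS preset."""
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    return subprocess.run(argv, env=env, capture_output=True,
                          text=True, check=True)


if __name__ == "__main__":
    # Stand-in for `python my_script.py`: the child prints what it received.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"]
    for threads in (1, 16):
        print(run_with_threads(child, threads).stdout.strip())
```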

Using threadpoolctl (Dynamic Control at Runtime)#

For dynamic control during program execution, use the threadpoolctl library:

from threadpoolctl import threadpool_limits
import numpy as np

# Example inputs (sizes are arbitrary)
rng = np.random.default_rng(0)
A = rng.random((1024, 1024))
B = rng.random((1024, 1024))
C = rng.random((512, 512))

# Run with specific thread count
# Use 'openmp' for NVPL since it uses OpenMP threading
with threadpool_limits(limits=32, user_api='openmp'):
    result = np.dot(A, B)  # Uses up to 32 threads

# Run with different thread count
with threadpool_limits(limits=64, user_api='openmp'):
    result = np.linalg.eig(C)  # Uses up to 64 threads

This is the recommended approach for applications that need to adjust threading dynamically or run benchmarks across different thread counts.

Note

Use user_api='openmp' for NVPL. Other BLAS implementations may require user_api='blas' instead.
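To confirm which `user_api` applies on a given system, `threadpoolctl.threadpool_info()` lists the thread pools it detects in the current process, each as a dictionary with keys such as `user_api`, `prefix`, `filepath`, and `num_threads`. The sketch below filters that list for OpenMP runtimes; the helper names are illustrative, and which runtime shows up depends on how the libraries were built.

```python
def openmp_runtimes(info):
    """Keep only the entries that threadpoolctl classifies as OpenMP."""
    return [entry for entry in info if entry.get("user_api") == "openmp"]


def describe(entry):
    """One-line summary of a threadpool_info() entry."""
    return (f"{entry.get('prefix')} ({entry.get('filepath')}): "
            f"{entry.get('num_threads')} threads")


def inspect_current_process():
    """Load NumPy, then report the OpenMP runtimes threadpoolctl can see."""
    import numpy  # noqa: F401  (importing loads the BLAS/LAPACK libraries)
    from threadpoolctl import threadpool_info
    for entry in openmp_runtimes(threadpool_info()):
        print(describe(entry))
```

If `inspect_current_process()` prints nothing, `user_api='openmp'` has nothing to act on in that process, and the full `threadpool_info()` output is worth checking.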

Performance Testing#

Test the performance of your NVPL-enabled NumPy installation across different thread counts.

First, install threadpoolctl to enable dynamic thread control:

pip install threadpoolctl

Or with uv:

uv pip install threadpoolctl

Then run the NumPy benchmark:

import numpy as np
import time
import multiprocessing
from threadpoolctl import threadpool_limits

nproc = multiprocessing.cpu_count()

# Matrix multiplication benchmark
n = 4096
A = np.random.rand(n, n)
B = np.random.rand(n, n)
num_runs = 3  # Number of runs to average (after warmup)

print(f"NumPy Matrix Multiplication Benchmark ({n}x{n})")
print(f"System has {nproc} processors")
print(f"Averaging {num_runs} runs after 1 warmup run")
print("-" * 50)

thread_counts = sorted({1, *range(8, nproc, 8), nproc})  # set avoids a duplicate entry when nproc == 1

first_time = None
for num_threads in thread_counts:
    times = []
    for run in range(num_runs + 1):
        # Run benchmark with specified thread count
        # Note: Use 'openmp' for NVPL since it uses OpenMP threading
        with threadpool_limits(limits=num_threads, user_api='openmp'):
            start = time.perf_counter()  # monotonic, high-resolution timer
            C = np.dot(A, B)
            end = time.perf_counter()

        # Skip first run (warmup)
        if run > 0:
            times.append(end - start)

    elapsed = sum(times) / len(times)
    if first_time is None:
        first_time = elapsed
        speedup = 1.0
    else:
        speedup = first_time / elapsed

    print(f"Threads: {num_threads:3d}  Time: {elapsed:.4f} seconds  Speedup: {speedup:.2f}x")

Test SciPy linear algebra operations across different thread counts:

import scipy.linalg as la
import numpy as np
import time
import multiprocessing
from threadpoolctl import threadpool_limits

nproc = multiprocessing.cpu_count()

# Eigenvalue decomposition benchmark
n = 2048
A = np.random.rand(n, n)
A = (A + A.T) / 2  # Make symmetric
num_runs = 3  # Number of runs to average (after warmup)

print(f"SciPy Eigenvalue Decomposition Benchmark ({n}x{n})")
print(f"System has {nproc} processors")
print(f"Averaging {num_runs} runs after 1 warmup run")
print("-" * 50)

thread_counts = sorted({1, *range(8, nproc, 8), nproc})  # set avoids a duplicate entry when nproc == 1

first_time = None
for num_threads in thread_counts:
    times = []
    for run in range(num_runs + 1):
        # Run benchmark with specified thread count
        # Note: Use 'openmp' for NVPL since it uses OpenMP threading
        with threadpool_limits(limits=num_threads, user_api='openmp'):
            start = time.perf_counter()  # monotonic, high-resolution timer
            eigenvalues, eigenvectors = la.eigh(A)
            end = time.perf_counter()

        # Skip first run (warmup)
        if run > 0:
            times.append(end - start)

    elapsed = sum(times) / len(times)
    if first_time is None:
        first_time = elapsed
        speedup = 1.0
    else:
        speedup = first_time / elapsed

    print(f"Threads: {num_threads:3d}  Time: {elapsed:.4f} seconds  Speedup: {speedup:.2f}x")

Note

The threadpoolctl library is used to dynamically control the number of threads for each test, as OMP_NUM_THREADS is only read at library initialization.

Important: Use user_api='openmp' with threadpoolctl for NVPL, since NVPL uses OpenMP threading. For other BLAS implementations (like MKL or OpenBLAS), you may need to use user_api='blas' instead.

On high-core-count systems (>64 cores), you should see continued scaling with NVPL, whereas OpenBLAS from PyPI wheels would plateau at 64 threads.
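Raw times and speedups can be easier to compare across machines when converted to throughput and parallel efficiency. The sketch below uses the usual estimate of about 2n³ floating-point operations for an n×n matrix multiplication, and defines efficiency as speedup divided by thread count. The function names are hypothetical, and the numbers in the example are placeholders rather than measurements.

```python
def gemm_gflops(n, seconds):
    """Approximate GFLOP/s for an n x n matrix multiply (about 2*n**3 flops)."""
    return 2.0 * n**3 / seconds / 1e9


def parallel_efficiency(single_thread_time, multi_thread_time, threads):
    """Speedup relative to one thread, divided by the thread count."""
    speedup = single_thread_time / multi_thread_time
    return speedup / threads


if __name__ == "__main__":
    # Placeholder timings for illustration only, not benchmark results.
    n, t1, t16 = 4096, 8.0, 1.0
    print(f"1 thread:   {gemm_gflops(n, t1):.1f} GFLOP/s")
    print(f"16 threads: {gemm_gflops(n, t16):.1f} GFLOP/s, "
          f"efficiency {parallel_efficiency(t1, t16, 16):.0%}")
```

Efficiency close to 100% indicates near-linear scaling; a drop at higher thread counts shows where additional threads stop paying off.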

Troubleshooting#

Error: “Could not find the specified meson: vendored-meson/meson/meson.py”#

This error indicates that git submodules were not initialized. Run the following command in the NumPy or SciPy source directory:

git submodule update --init

Then retry the build.
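A quick pre-build check can catch this before Meson does. This sketch simply tests for the file named in the error message, relative to the source directory; the function name `submodules_initialized` is made up for illustration.

```python
from pathlib import Path

# Path taken from the error message above, relative to the source tree
VENDORED_MESON = Path("vendored-meson") / "meson" / "meson.py"


def submodules_initialized(source_dir="."):
    """Return True if the vendored Meson entry point exists in source_dir."""
    return (Path(source_dir) / VENDORED_MESON).is_file()


if __name__ == "__main__":
    if submodules_initialized():
        print("vendored Meson found")
    else:
        print("vendored Meson missing: run `git submodule update --init`")
```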

Error: “Command gitversion.py failed with status 1”#

This error can occur when building from a git clone due to git repository state issues. Solutions:

  1. Use a release tarball instead (recommended):

    wget https://github.com/numpy/numpy/releases/download/v2.1.3/numpy-2.1.3.tar.gz
    tar -xzf numpy-2.1.3.tar.gz
    cd numpy-2.1.3
    
  2. Or fix the git repository:

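    # Refresh the clone from a regular branch first
    git checkout main
    git pull
    # Then check out the release tag (HEAD is detached after this, as expected for a tag)
    git checkout v2.1.3
    
    # Clean any stale build artifacts (WARNING: removes all untracked files)
    git clean -xdf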
    

Build fails to find NVPL libraries#

This is often caused by missing the --define-prefix flag for pkg-config. Ensure you have:

  1. Set PKG_CONFIG_PATH correctly:

    export PKG_CONFIG_PATH=${NVPL_ROOT}/lib/pkgconfig:${PKG_CONFIG_PATH}
    
  2. Created the local pkg-config wrapper and added it to PATH:

    cd /path/to/numpy-or-scipy-source
    cat > pkg-config << 'EOF'
    #!/bin/bash
    exec $(which -a pkg-config | sed -n '2p') --define-prefix "$@"
    EOF
    chmod +x pkg-config
    export PATH=$PWD:$PATH
    
  3. Verify the configuration returns correct library paths:

    pkg-config --libs nvpl-blas-lp64-omp
    # Should show: -L/path/to/nvpl/lib -lnvpl_blas_lp64_omp ...
    

ImportError: “you should not try to import numpy from its source directory”#

This error occurs when you try to import NumPy or SciPy from within the source directory. The solution is simple:

cd ~  # Change to any directory outside the source tree
python -c "import numpy as np; np.show_config()"

Import errors at runtime: Library not found#

If you see errors about missing NVPL libraries when importing NumPy or SciPy, make sure NVPL libraries are in the library search path:

export LD_LIBRARY_PATH=${NVPL_ROOT}/lib:${LD_LIBRARY_PATH}

Alternatively, install NVPL in a standard system location or use the NVPL Python wheel (see NVPL Python Usage).

Meson configuration errors#

If Meson cannot find the BLAS/LAPACK libraries, try cleaning the build directory:

git clean -xdf  # WARNING: removes all untracked files
# Or manually remove build directories:
rm -rf build* _build dist *.egg-info

Additional Resources#