Using NVPL with Conda

Overview

NVPL (NVIDIA Performance Libraries) is available as a BLAS/LAPACK backend in the conda-forge ecosystem through the blas-feedstock. This lets users install NumPy, SciPy, and other scientific computing packages that use NVPL for optimized linear algebra on Arm-based systems.

Installing NVPL-backed packages with conda is simpler than building from source: conda-forge provides pre-built packages that are ready to use.

Why Use NVPL with Conda?

Performance Benefits

Like custom-built wheels, conda packages with NVPL overcome the limitations of generic OpenBLAS packages:

  • No artificial thread limit - NVPL scales to all available cores

  • Optimized for Armv9 - Specifically tuned for Arm Neoverse cores in NVIDIA Grace CPUs

Prerequisites

Before setting up NVPL with conda, ensure you have:

  • Conda or Mamba installed - See Miniforge installation for Arm64 systems

  • Arm64 Linux system (aarch64 architecture)

  • conda-forge channel configured (included by default in Miniforge)

Creating an NVPL-Enabled Environment

Basic Environment Setup

Create a new conda environment with NumPy and SciPy using NVPL as the BLAS backend:

conda create -n nvpl-env python=3.13 numpy scipy "blas=*=nvpl"

This command:

  • Creates a new environment named nvpl-env

  • Installs Python 3.13

  • Installs NumPy and SciPy

  • Specifies NVPL as the BLAS implementation using "blas=*=nvpl"

Activate the environment:

conda activate nvpl-env

Note

You can use mamba instead of conda for faster dependency resolution. All conda commands in this document can be replaced with mamba.

For reproducible environments, pin package versions as needed while keeping the "blas=*=nvpl" constraint. Other packages in the environment that do their linear algebra through NumPy or SciPy, such as pandas or scikit-learn, also use the NVPL backend automatically.

Using Environment Files with NVPL

To specify NVPL in an environment.yml file for reproducible setups:

name: nvpl-env
channels:
  - conda-forge
dependencies:
  - python=3.13
  - numpy
  - scipy
  - blas=*=nvpl

The key line is blas=*=nvpl which ensures NVPL is used as the BLAS backend. Create the environment with:

conda env create -f environment.yml

Verifying NVPL Installation

Check BLAS Configuration

After activating your environment, verify that NumPy is using NVPL:

python -c "import numpy as np; np.show_config()"

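Look for NVPL in the BLAS/LAPACK section. Note that np.show_config() reports how NumPy was built; because conda-forge selects the BLAS implementation at install time through the blas metapackage, the output may show a generic blas/lapack entry instead. The conda list blas check below confirms which variant is installed.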
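For a runtime view of which BLAS library is actually loaded, the following Linux-only sketch reads /proc/self/maps after importing NumPy and lists mapped shared objects whose file names match a pattern. It assumes a Linux /proc filesystem and that the NVPL shared objects contain "nvpl" in their file names; loaded_libraries is a hypothetical helper, not part of NumPy or NVPL.

```python
import re
from pathlib import Path

import numpy as np  # importing NumPy maps its BLAS/LAPACK shared libraries


def loaded_libraries(pattern: str = "nvpl") -> list[str]:
    """Return sorted paths of mapped shared objects whose file name matches pattern.

    Reads /proc/self/maps, so it only works on Linux; elsewhere it returns [].
    """
    maps = Path("/proc/self/maps")
    if not maps.exists():
        return []
    regex = re.compile(pattern, re.IGNORECASE)
    found = set()
    for line in maps.read_text().splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping without a path
        path = fields[5].strip()
        name = Path(path).name
        if ".so" in name and regex.search(name):
            found.add(path)
    return sorted(found)


_ = np.ones((64, 64)) @ np.ones((64, 64))  # make one BLAS call so the library is in use
print(loaded_libraries("nvpl") or "no NVPL shared objects mapped in this process")
```

Because /proc/self/maps shows resolved file paths, a libblas.so.3 symlink that points at an NVPL library should appear under its NVPL file name.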

Check SciPy Configuration

Verify SciPy is also using NVPL:

python -c "import scipy; scipy.show_config()"

List Installed BLAS Variant

Check which BLAS implementation is installed in your environment:

conda list blas

You should see a line indicating the NVPL variant:

blas                      2.120                     nvpl    conda-forge
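To run the same check from a script (for example, in CI), the sketch below calls conda list --json and looks up the build string of the blas package. It assumes conda is on PATH and that each JSON record carries "name" and "build_string" fields, as current conda versions emit; the function names are hypothetical.

```python
import json
import subprocess
from typing import Optional


def blas_build_string(conda_list_json: str) -> Optional[str]:
    """Return the build string of the 'blas' package from `conda list --json` output."""
    for pkg in json.loads(conda_list_json):
        if pkg.get("name") == "blas":
            return pkg.get("build_string")
    return None  # no blas metapackage in this environment


def current_blas_variant(env_name: Optional[str] = None) -> Optional[str]:
    """Run `conda list --json` (optionally for a named environment) and return the blas build."""
    cmd = ["conda", "list", "--json"]
    if env_name:
        cmd += ["-n", env_name]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return blas_build_string(result.stdout)
```

For example, current_blas_variant("nvpl-env") should return a build string containing nvpl when the environment was created as shown above.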

Runtime Configuration

NVPL uses OpenMP for threading. Control thread count with the OMP_NUM_THREADS environment variable:

export OMP_NUM_THREADS=32
python my_script.py
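The same setting can be applied from inside a Python script, provided it happens before NumPy is first imported: the OpenMP runtime reads OMP_NUM_THREADS when it starts, so later changes have no effect. A minimal sketch, assuming the conda-forge NVPL build honours OMP_NUM_THREADS as described above:

```python
import os

# Set before the first NumPy import, which loads BLAS and its OpenMP runtime.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("OMP_NUM_THREADS", "32")

import numpy as np  # noqa: E402  (imported after configuring the environment)

a = np.ones((1000, 1000))
c = a @ a  # should run with the thread count configured above
print(os.environ["OMP_NUM_THREADS"], c[0, 0])
```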

For dynamic thread control during execution, use threadpoolctl:

conda install threadpoolctl
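A short threadpoolctl sketch follows. Because NVPL threads through OpenMP, threadpoolctl may report its pool under the openmp API rather than as a BLAS library, so this example caps every detected pool instead of passing user_api="blas"; inspect threadpool_info() to see what is detected in your environment.

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# List the thread pools threadpoolctl found in this process (BLAS and/or OpenMP).
for pool in threadpool_info():
    print(pool["user_api"], pool["internal_api"], pool["num_threads"], pool["filepath"])

a = np.random.default_rng(0).random((2000, 2000))

# Temporarily cap all detected pools to 4 threads inside this block.
with threadpool_limits(limits=4):
    b = a @ a

# On leaving the block, the previous limits are restored.
c = a @ a
```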

See Building NumPy and SciPy with NVPL for detailed threading configuration and performance testing examples.

Quick Performance Verification

Verify NVPL is providing good performance with a simple test:

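import time

import numpy as np

n = 4096
rng = np.random.default_rng(0)
A = rng.random((n, n))
B = rng.random((n, n))

np.dot(A, B)  # warm-up: load BLAS and start its threads before timing

start = time.perf_counter()  # monotonic, high-resolution timer
C = np.dot(A, B)
end = time.perf_counter()

print(f"Matrix multiplication ({n}x{n}): {end - start:.4f} seconds")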
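Single timings are noisy. A slightly more robust sketch takes the best of several runs and converts the time to GFLOP/s, using the standard count of about 2n³ floating-point operations for a dense n×n matrix product; the function name and defaults are illustrative.

```python
import time

import numpy as np


def matmul_gflops(n: int = 4096, repeats: int = 5, seed: int = 0) -> float:
    """Return the best observed GFLOP/s for an n x n float64 matrix product."""
    rng = np.random.default_rng(seed)
    a = rng.random((n, n))
    b = rng.random((n, n))
    _ = a @ b  # warm-up outside the timed region
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        _ = a @ b
        best = min(best, time.perf_counter() - start)
    # A dense n x n by n x n product performs about 2 * n**3 operations.
    return 2 * n**3 / best / 1e9
```

For example, print(f"{matmul_gflops():.1f} GFLOP/s") reports the best of five 4096×4096 products, which is easier to compare across machines and environments than raw seconds.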

For comprehensive benchmarking and thread scaling tests, see Building NumPy and SciPy with NVPL.

Comparing BLAS Implementations

Creating Multiple Environments

You can create separate environments with different BLAS implementations to compare performance:

# Environment with NVPL
conda create -n test-nvpl python=3.13 numpy scipy "blas=*=nvpl"

# Environment with OpenBLAS (the conda-forge default)
conda create -n test-openblas python=3.13 numpy scipy "blas=*=openblas"

# Environment with MKL (x86-64 only; MKL packages are not available for Arm64)
conda create -n test-mkl python=3.13 numpy scipy "blas=*=mkl"

Running the Same Benchmark in Different Environments

# Test with NVPL
conda activate test-nvpl
python benchmark.py

# Test with OpenBLAS
conda activate test-openblas
python benchmark.py
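The guide leaves benchmark.py unspecified. One hypothetical version times a few NumPy operations that dispatch to BLAS or LAPACK and tags each line with the active environment name, read from CONDA_DEFAULT_ENV (which conda activate sets), so results from different environments can be compared side by side:

```python
"""benchmark.py: hypothetical script for comparing BLAS backends across conda environments."""
import os
import time

import numpy as np


def best_time(func, repeats: int = 3) -> float:
    """Return the fastest wall-clock time in seconds over several calls to func."""
    func()  # warm-up call, not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def run_benchmarks(n: int = 2048, seed: int = 0) -> dict[str, float]:
    """Time a few NumPy operations that dispatch to BLAS or LAPACK."""
    rng = np.random.default_rng(seed)
    a = rng.random((n, n))
    b = rng.random((n, n))
    spd = a @ a.T + n * np.eye(n)  # symmetric positive definite input for Cholesky
    return {
        "matmul": best_time(lambda: a @ b),
        "solve": best_time(lambda: np.linalg.solve(a, b)),
        "cholesky": best_time(lambda: np.linalg.cholesky(spd)),
    }


if __name__ == "__main__":
    env = os.environ.get("CONDA_DEFAULT_ENV", "unknown")
    for name, seconds in run_benchmarks().items():
        print(f"[{env}] {name:<10} {seconds:.4f} s")
```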

Switching BLAS Backends

To test different BLAS implementations, switch between environments:

conda activate test-nvpl      # Use NVPL backend
conda activate test-openblas  # Use OpenBLAS backend
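Activating environments one at a time works interactively. For scripted comparisons, conda run executes a command inside a named environment without activating it; the sketch below loops over the two environments created above. The helper names are hypothetical, and it assumes conda is on PATH.

```python
import subprocess

# Environment names from the conda create commands above.
ENVIRONMENTS = ("test-nvpl", "test-openblas")


def conda_run_command(env: str, script: str = "benchmark.py") -> list[str]:
    """Build a `conda run` command that runs script inside env without activating it."""
    return ["conda", "run", "-n", env, "python", script]


def run_all(envs=ENVIRONMENTS, script: str = "benchmark.py") -> None:
    """Run the same script in each environment, one after another."""
    for env in envs:
        print(f"=== {env} ===", flush=True)
        # check=True raises CalledProcessError if conda or the script fails
        subprocess.run(conda_run_command(env, script), check=True)
```

Calling run_all() from the directory that contains benchmark.py runs it in each environment in turn.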

Troubleshooting

NVPL Package Not Found

If conda cannot find the NVPL variant, make sure you are using the conda-forge channel:

# Add conda-forge channel (if not already added)
conda config --add channels conda-forge
conda config --set channel_priority strict

# Try installation again
conda install numpy "blas=*=nvpl"

Verify you’re on an Arm64 system:

uname -m  # Should output: aarch64
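The same check from Python can stop a setup script early on unsupported hardware; is_arm64_linux is a hypothetical helper name.

```python
import platform
import sys


def is_arm64_linux() -> bool:
    """Return True on 64-bit Arm Linux, where `uname -m` reports aarch64."""
    return sys.platform.startswith("linux") and platform.machine() == "aarch64"


if not is_arm64_linux():
    print(f"Expected aarch64 Linux, found {platform.machine()!r} on {sys.platform}")
```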

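Conflicting BLAS Libraries

If the solver reports conflicts between BLAS variants, or NumPy and SciPy keep resolving to a different BLAS, remove them and reinstall with an explicit BLAS specification: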

# Remove existing numpy/scipy
conda remove numpy scipy

# Reinstall with explicit BLAS specification
conda install numpy scipy "blas=*=nvpl"

Performance Not as Expected

  1. Check thread settings:

    echo $OMP_NUM_THREADS  # Should be unset or match desired thread count
    
  2. Verify NVPL is actually being used:

    import numpy as np
    np.show_config()  # Look for NVPL; on conda-forge, also confirm with: conda list blas
    
  3. Ensure you’re running compute-intensive operations: small matrices may not show a benefit, because threading overhead dominates short computations.
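The thread-settings check in step 1 can be extended into a small diagnostic that compares OMP_NUM_THREADS with the cores actually available to the process. On Linux, os.sched_getaffinity respects CPU pinning (for example from taskset or a batch scheduler), which os.cpu_count does not; the function names are hypothetical.

```python
import os
from typing import Optional


def available_cores() -> int:
    """Number of CPUs this process may run on (honours CPU affinity where supported)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def check_omp_threads(value: Optional[str] = None) -> str:
    """Compare OMP_NUM_THREADS (or an explicit value) with the available cores."""
    if value is None:
        value = os.environ.get("OMP_NUM_THREADS", "")
    cores = available_cores()
    if not value.strip():
        return f"OMP_NUM_THREADS is unset; OpenMP uses its default ({cores} cores available)"
    try:
        # The variable may hold a comma-separated list (one entry per nesting level);
        # the first entry applies to the outermost parallel region.
        threads = int(value.split(",")[0])
    except ValueError:
        return f"OMP_NUM_THREADS={value!r} is not a valid integer"
    if threads < 1:
        return f"OMP_NUM_THREADS={value!r} must be a positive integer"
    if threads > cores:
        return f"OMP_NUM_THREADS={threads} exceeds the {cores} available cores (oversubscription)"
    if threads < cores:
        return f"OMP_NUM_THREADS={threads} leaves some of the {cores} available cores idle"
    return f"OMP_NUM_THREADS={threads} matches the {cores} available cores"


print(check_omp_threads())
```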

Updating Packages While Keeping NVPL

When updating NumPy or SciPy, use conda install with the NVPL constraint rather than conda update, which accepts only package names and rejects specifications such as "blas=*=nvpl":

conda activate nvpl-env
conda install numpy scipy "blas=*=nvpl"

This ensures conda doesn’t switch to a different BLAS implementation during the update.

Additional Resources

NVPL-Specific Resources

Conda and BLAS Resources