Using NVPL with Conda#

Overview#

NVPL is available as a BLAS/LAPACK backend in the conda-forge ecosystem through the blas-feedstock. This allows users to easily install NumPy, SciPy, and other scientific computing packages that use NVPL for optimized linear algebra operations on ARM-based systems.

Installing NVPL through conda is simpler than building from source because the packages are pre-built and ready to use.

Why Use NVPL with Conda?#

Performance Benefits#

Like custom-built wheels, conda packages with NVPL overcome the limitations of generic OpenBLAS packages:

  • No artificial thread limit - NVPL scales to all available cores

  • Optimized for Armv9 - Specifically tuned for ARM Neoverse cores in NVIDIA Grace CPUs

Prerequisites#

Before setting up NVPL with conda, ensure you have:

  • Conda or Mamba installed - See Miniforge installation for ARM64 systems

  • ARM64 Linux system (aarch64 architecture)

  • conda-forge channel configured (included by default in Miniforge)
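
The prerequisites above can be checked from Python before creating an environment. The following is a minimal sketch using only the standard library; the function name check_prerequisites is illustrative and is not part of conda or NVPL. It only looks for a conda or mamba executable on PATH, so a shell-function-only setup would be reported as missing.

```python
import platform
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True or False."""
    return {
        # NVPL packages target 64-bit ARM Linux (aarch64).
        "aarch64": platform.machine() == "aarch64",
        "linux": sys.platform.startswith("linux"),
        # Either conda or mamba should be reachable on PATH.
        "conda_or_mamba": any(
            shutil.which(tool) is not None for tool in ("conda", "mamba")
        ),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```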

Creating an NVPL-Enabled Environment#

Basic Environment Setup#

Create a new conda environment with NumPy and SciPy using NVPL as the BLAS backend:

conda create -n nvpl-env python=3.13 numpy scipy "blas=*=nvpl"

This command:

  • Creates a new environment named nvpl-env

  • Installs Python 3.13

  • Installs NumPy and SciPy

  • Specifies NVPL as the BLAS implementation using "blas=*=nvpl"

Activate the environment:

conda activate nvpl-env

Note

You can use mamba instead of conda for faster dependency resolution. All conda commands in this document can be replaced with mamba.

Specifying NVPL with Package Versions#

For reproducible environments, you can pin specific package versions while still using NVPL:

conda create -n nvpl-env python=3.13 \
    numpy=2.1.3 \
    scipy=1.14.1 \
    "blas=*=nvpl"

Additional packages that depend on NumPy/SciPy (such as pandas and scikit-learn) automatically use the NVPL backend when they are installed into the same environment.

Using Environment Files with NVPL#

To specify NVPL in an environment.yml file for reproducible setups:

name: nvpl-env
channels:
  - conda-forge
dependencies:
  - python=3.13
  - numpy
  - scipy
  - blas=*=nvpl

The key line is blas=*=nvpl, which ensures that NVPL is used as the BLAS backend. Create the environment with:

conda env create -f environment.yml

Verifying NVPL Installation#

Check BLAS Configuration#

After activating your environment, verify that NumPy is using NVPL:

python -c "import numpy as np; np.show_config()"

You should see NVPL libraries listed in the BLAS/LAPACK configuration.
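
As a complementary runtime check, the shared libraries actually mapped into the Python process can be inspected on Linux. The sketch below reads /proc/self/maps and assumes that NVPL shared objects carry "nvpl" in their file names; the helper name loaded_libraries is hypothetical. An empty result does not prove NVPL is absent, since the naming assumption may not hold for every package build.

```python
def loaded_libraries(substring):
    """Return paths of files mapped into this process whose file name
    contains `substring` (case-insensitive). Linux only."""
    paths = set()
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                fields = line.split(maxsplit=5)
                # The sixth field, when present, is the mapped file path.
                if len(fields) == 6:
                    path = fields[5].strip()
                    name = path.rsplit("/", 1)[-1]
                    if substring.lower() in name.lower():
                        paths.add(path)
    except OSError:
        # /proc is not available on non-Linux systems.
        return []
    return sorted(paths)


if __name__ == "__main__":
    try:
        import numpy as np

        # Run one matrix product so the BLAS backend is loaded.
        np.ones((8, 8)) @ np.ones((8, 8))
    except ImportError:
        print("NumPy is not installed in this environment")
    for path in loaded_libraries("nvpl"):
        print(path)
```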

Check SciPy Configuration#

Verify SciPy is also using NVPL:

python -c "import scipy; scipy.show_config()"

List Installed BLAS Variant#

Check which BLAS implementation is installed in your environment:

conda list blas

You should see a line indicating the NVPL variant:

blas                      2.120                     nvpl    conda-forge
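
The same check can be scripted by parsing the JSON form of the package listing. This is a sketch under the assumption that conda list --json emits a list of objects with name and build_string keys; the sample data below is hypothetical and only mirrors the line shown above.

```python
import json


def blas_build_string(conda_list_json):
    """Return the build string of the `blas` package, or None if absent.

    `conda_list_json` is the text printed by `conda list --json`.
    """
    for package in json.loads(conda_list_json):
        if package.get("name") == "blas":
            return package.get("build_string")
    return None


# Hypothetical sample shaped like `conda list blas --json` output.
sample = json.dumps(
    [
        {
            "name": "blas",
            "version": "2.120",
            "build_string": "nvpl",
            "channel": "conda-forge",
        }
    ]
)

build = blas_build_string(sample)
if build and "nvpl" in build:
    print("NVPL variant installed")
else:
    print("NVPL variant not found")
```

In a real environment, the input text could come from subprocess.run(["conda", "list", "blas", "--json"], capture_output=True, text=True).stdout.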

Runtime Configuration#

NVPL uses OpenMP for threading. Control thread count with the OMP_NUM_THREADS environment variable:

export OMP_NUM_THREADS=32
python my_script.py
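
The same setting can be applied from a Python launcher by passing the variable in the child process environment, which ensures it is in place before any BLAS library is loaded. This is a minimal sketch; run_with_threads is an illustrative helper name, and the demonstration child only echoes the value it received.

```python
import os
import subprocess
import sys


def run_with_threads(args, num_threads):
    """Run `python <args>` in a child process with OMP_NUM_THREADS set."""
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


# Demonstration: the child prints the value it sees.
result = run_with_threads(
    ["-c", "import os; print(os.environ['OMP_NUM_THREADS'])"], 32
)
print(result.stdout.strip())  # prints 32
```

To launch the script from the example above, the call would be run_with_threads(["my_script.py"], 32).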

For dynamic thread control during execution, use threadpoolctl:

conda install threadpoolctl
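
Once installed, threadpoolctl can cap the thread count for a single block of code. The sketch below assumes that threadpoolctl can reach NVPL's threads through the OpenMP runtime it detects; whether it does so in a given environment can be confirmed by inspecting threadpoolctl.threadpool_info(). The helper name matmul_with_limit is illustrative.

```python
import numpy as np
from threadpoolctl import threadpool_limits


def matmul_with_limit(a, b, num_threads):
    """Multiply two matrices with native thread pools capped."""
    # No user_api filter: cap every pool threadpoolctl detects
    # (BLAS libraries it recognizes and OpenMP runtimes).
    with threadpool_limits(limits=num_threads):
        return a @ b


rng = np.random.default_rng(0)
a = rng.random((256, 256))
b = rng.random((256, 256))
c = matmul_with_limit(a, b, num_threads=4)
print(c.shape)  # (256, 256)
```

The limit applies only inside the with block; the previous thread settings are restored on exit.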

See Building NumPy and SciPy with NVPL for detailed threading configuration and performance testing examples.

Quick Performance Verification#

Verify NVPL is providing good performance with a simple test:

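import time

import numpy as np

n = 4096
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# perf_counter() is a monotonic, high-resolution clock intended for timing.
start = time.perf_counter()
C = A @ B  # dense matrix product, dispatched to the BLAS backend
elapsed = time.perf_counter() - start

print(f"Matrix multiplication ({n}x{n}): {elapsed:.4f} seconds")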
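
A single timing includes one-time costs such as library loading and thread start-up. A slightly more robust variant, sketched below, adds a warm-up call, keeps the best of several repeats, and converts the time to GFLOP/s using the usual 2n³ operation count for a dense product. The function name matmul_gflops and the chosen sizes are illustrative.

```python
import time

import numpy as np


def matmul_gflops(n, repeats=3):
    """Return the best observed GFLOP/s for an n x n float64 matmul."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    a @ b  # warm-up: loads the library and starts worker threads
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    # A dense n x n product costs about 2 * n**3 floating-point operations.
    return 2 * n**3 / best / 1e9


if __name__ == "__main__":
    for size in (256, 1024, 2048):
        print(f"n={size}: {matmul_gflops(size):.1f} GFLOP/s")
```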

For comprehensive benchmarking and thread scaling tests, see Building NumPy and SciPy with NVPL.

Comparing BLAS Implementations#

Creating Multiple Environments#

You can create separate environments with different BLAS implementations to compare performance:

# Environment with NVPL
conda create -n test-nvpl python=3.13 numpy scipy "blas=*=nvpl"

# Environment with OpenBLAS (default)
conda create -n test-openblas python=3.13 numpy scipy "blas=*=openblas"

# Environment with MKL (x86-64 systems only; MKL is not built for aarch64)
conda create -n test-mkl python=3.13 numpy scipy "blas=*=mkl"

Running the Same Benchmark in Different Environments#

# Test with NVPL
conda activate test-nvpl
python benchmark.py

# Test with OpenBLAS
conda activate test-openblas
python benchmark.py
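
The activate-and-run steps above can also be driven from one script with conda run, which executes a command inside a named environment without activating it in the current shell. This is a sketch only: the helper names are hypothetical, and benchmark.py and the environment names are taken from the examples above.

```python
import shutil
import subprocess


def conda_run_command(env_name, script="benchmark.py"):
    """Build the command that runs `script` inside a named environment."""
    return ["conda", "run", "-n", env_name, "python", script]


def run_in_environments(env_names, script="benchmark.py"):
    """Run `script` in each environment; return {env_name: stdout or None}."""
    results = {}
    for env_name in env_names:
        proc = subprocess.run(
            conda_run_command(env_name, script),
            capture_output=True,
            text=True,
        )
        # None marks a failed run (missing environment, script error, ...).
        results[env_name] = proc.stdout if proc.returncode == 0 else None
    return results


if __name__ == "__main__":
    if shutil.which("conda") is None:
        print("conda not found on PATH")
    else:
        outputs = run_in_environments(["test-nvpl", "test-openblas"])
        for env_name, output in outputs.items():
            print(f"--- {env_name} ---")
            print(output if output is not None else "run failed")
```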

Switching BLAS Backends#

To test different BLAS implementations, switch between environments:

conda activate test-nvpl      # Use NVPL backend
conda activate test-openblas  # Use OpenBLAS backend

Troubleshooting#

NVPL Package Not Found#

If conda cannot find the NVPL variant, ensure you’re using the conda-forge channel:

# Add conda-forge channel (if not already added)
conda config --add channels conda-forge
conda config --set channel_priority strict

# Try installation again
conda install numpy "blas=*=nvpl"

Verify you’re on an ARM64 system:

uname -m  # Should output: aarch64

Conflicting BLAS Libraries#

If you have issues with conflicting BLAS libraries:

# Remove existing numpy/scipy
conda remove numpy scipy

# Reinstall with explicit BLAS specification
conda install numpy scipy "blas=*=nvpl"

Performance Not as Expected#

  1. Check thread settings:

    echo $OMP_NUM_THREADS  # Should be unset or match desired thread count
    
  2. Verify NVPL is actually being used:

    import numpy as np
    np.show_config()  # Check for NVPL in output
    
  3. Ensure you’re running compute-intensive operations (small matrices may not show benefits)
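
The thread-related part of this checklist can be gathered in one place. The sketch below uses only the standard library and reports the OMP_NUM_THREADS value together with the CPU counts visible to the process; the function name thread_diagnostics is illustrative. A usable CPU count lower than the machine total usually points to an affinity or container restriction rather than to the BLAS backend.

```python
import os


def thread_diagnostics():
    """Collect settings that commonly explain unexpected thread counts."""
    try:
        # CPUs this process is allowed to run on (respects CPU affinity).
        usable = len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is not available on every platform.
        usable = os.cpu_count()
    return {
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
        "cpu_count": os.cpu_count(),
        "usable_cpus": usable,
    }


if __name__ == "__main__":
    for key, value in thread_diagnostics().items():
        print(f"{key}: {value}")
```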

Updating Packages While Keeping NVPL#

When updating NumPy or SciPy, explicitly specify NVPL to ensure the backend is maintained:

conda activate nvpl-env
conda update numpy scipy "blas=*=nvpl"

This ensures conda doesn’t switch to a different BLAS implementation during the update.
