Using NVPL with Conda#
Overview#
NVPL is available as a BLAS/LAPACK backend in the conda-forge ecosystem through the blas-feedstock. This allows users to easily install NumPy, SciPy, and other scientific computing packages that use NVPL for optimized linear algebra operations on ARM-based systems.
Using conda with NVPL is simpler than building from source and provides pre-built packages that are ready to use.
Why Use NVPL with Conda?#
Performance Benefits#
Like custom-built wheels, conda packages with NVPL overcome the limitations of generic OpenBLAS packages:
No artificial thread limit - NVPL scales to all available cores
Optimized for Armv9 - Specifically tuned for the ARM Neoverse cores in NVIDIA Grace CPUs
Prerequisites#
Before setting up NVPL with conda, ensure you have:
Conda or Mamba installed - See Miniforge installation for ARM64 systems
ARM64 Linux system (aarch64 architecture)
conda-forge channel configured (included by default in Miniforge)
Creating an NVPL-Enabled Environment#
Basic Environment Setup#
Create a new conda environment with NumPy and SciPy using NVPL as the BLAS backend:
conda create -n nvpl-env python=3.13 numpy scipy "blas=*=nvpl"
This command:
Creates a new environment named nvpl-env
Installs Python 3.13
Installs NumPy and SciPy
Specifies NVPL as the BLAS implementation using "blas=*=nvpl"
Activate the environment:
conda activate nvpl-env
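The "blas=*=nvpl" spec used above follows conda's name=version=build form: the package name, a version (here * for any version), and a build string that selects the variant. As a rough illustration of how the three fields line up, the sketch below uses a hypothetical helper, split_spec, which is not part of conda and handles only this simple form.

```python
from typing import NamedTuple, Optional


class SimpleSpec(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


def split_spec(spec: str) -> SimpleSpec:
    """Split a 'name[=version[=build]]' spec into its fields.

    Hypothetical helper for illustration only; conda's real spec grammar
    (operators such as >=, channels, brackets) is much richer.
    """
    parts = spec.strip().split("=")
    # Reject empty fields (e.g. 'numpy==1.26') and anything beyond three fields.
    if len(parts) > 3 or "" in parts:
        raise ValueError(f"unsupported spec: {spec!r}")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return SimpleSpec(name, version, build)


if __name__ == "__main__":
    for spec in ("python=3.13", "numpy", "blas=*=nvpl"):
        print(spec, "->", split_spec(spec))
```

Read this way, "blas=*=nvpl" leaves the version open and constrains only the build string, which is what selects the NVPL variant.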
Note
You can use mamba instead of conda for faster dependency resolution. All conda commands in this document can be replaced with mamba.
Specifying NVPL with Package Versions#
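For reproducible environments, you can pin specific package versions while still using NVPL. The pinned versions must be compatible with the chosen Python version; NumPy 1.26.0 and SciPy 1.11.3 were released before Python 3.13, so this example pairs them with Python 3.12:
conda create -n nvpl-env python=3.12 \
numpy=1.26.0 \
scipy=1.11.3 \
"blas=*=nvpl"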
Additional packages that depend on NumPy/SciPy (like pandas, scikit-learn) will automatically use the NVPL backend.
Using Environment Files with NVPL#
To specify NVPL in an environment.yml file for reproducible setups:
name: nvpl-env
channels:
- conda-forge
dependencies:
- python=3.13
- numpy
- scipy
- blas=*=nvpl
The key line is blas=*=nvpl which ensures NVPL is used as the BLAS backend. Create the environment with:
conda env create -f environment.yml
Verifying NVPL Installation#
Check BLAS Configuration#
After activating your environment, verify that NumPy is using NVPL:
python -c "import numpy as np; np.show_config()"
You should see NVPL libraries listed in the BLAS/LAPACK configuration.
Check SciPy Configuration#
Verify SciPy is also using NVPL:
python -c "import scipy; scipy.show_config()"
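As a complementary check, you can look at which shared libraries the Python process has actually mapped after NumPy runs a BLAS call. The sketch below is Linux-specific (it reads /proc/self/maps) and assumes that NVPL shared libraries carry "nvpl" in their file names; loaded_libraries is a hypothetical helper written for this example.

```python
import os


def loaded_libraries(maps_text: str, needle: str) -> list[str]:
    """Return sorted, de-duplicated mapped file paths whose basename contains needle."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        # A mapping backed by a file has the path as the sixth field.
        if len(fields) == 6 and needle in os.path.basename(fields[5]).lower():
            paths.add(fields[5])
    return sorted(paths)


if __name__ == "__main__":
    try:
        import numpy as np

        # Run one small matrix product so the BLAS library is definitely loaded.
        np.ones((8, 8)) @ np.ones((8, 8))
    except ImportError:
        print("NumPy is not installed in this environment")

    try:
        with open("/proc/self/maps") as fh:
            maps_text = fh.read()
    except OSError:
        maps_text = ""  # /proc is Linux-specific

    for path in loaded_libraries(maps_text, "nvpl"):
        print(path)
```

If nothing is printed in an NVPL environment, the naming assumption may not hold on your system; searching for "blas" instead shows which BLAS library was loaded.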
List Installed BLAS Variant#
Check which BLAS implementation is installed in your environment:
conda list blas
You should see a line indicating the NVPL variant:
blas 2.120 nvpl conda-forge
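The same check can be scripted, for example in a CI job. This is a minimal sketch that parses the output of conda list --json for the active environment; it assumes each record carries "name" and "build_string" keys, and blas_build is a hypothetical helper name.

```python
import json
import shutil
import subprocess
from typing import Optional


def blas_build(records: list[dict]) -> Optional[str]:
    """Return the build string of the 'blas' metapackage, or None if absent."""
    for record in records:
        if record.get("name") == "blas":
            return record.get("build_string")
    return None


if __name__ == "__main__":
    if shutil.which("conda") is None:
        print("conda not found on PATH")
    else:
        # Lists matching packages in the currently active environment.
        result = subprocess.run(
            ["conda", "list", "--json", "blas"],
            capture_output=True, text=True, check=True,
        )
        build = blas_build(json.loads(result.stdout))
        print(f"blas build string: {build}")
        print("NVPL selected" if build and "nvpl" in build else "NVPL not selected")
```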
Runtime Configuration#
NVPL uses OpenMP for threading. Control thread count with the OMP_NUM_THREADS environment variable:
export OMP_NUM_THREADS=32
python my_script.py
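OpenMP runtimes typically read OMP_NUM_THREADS when they start up, so changing the variable inside a Python process that has already loaded NumPy generally has no effect. One way to try several thread counts is to launch the script in a fresh child process for each setting. The sketch below does this with a hypothetical helper, run_with_threads; the child command is a stand-in that only echoes the variable.

```python
import os
import subprocess
import sys


def run_with_threads(command: list[str], threads: int) -> subprocess.CompletedProcess:
    """Run command in a child process with OMP_NUM_THREADS set to threads."""
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    return subprocess.run(command, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child that only echoes the variable; replace it with
    # [sys.executable, "my_script.py"] to run a real workload.
    child = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    for threads in (1, 8, 32):
        result = run_with_threads(child, threads)
        print(f"requested {threads}, child saw {result.stdout.strip()}")
```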
For dynamic thread control during execution, use threadpoolctl:
conda install threadpoolctl
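A minimal sketch of threadpoolctl usage follows. It caps the OpenMP thread pools for the duration of a with-block, on the assumption (from the statement above that NVPL threads through OpenMP) that limiting the "openmp" API also limits NVPL. Whether threadpoolctl additionally reports NVPL as a BLAS library depends on the threadpoolctl version, so inspect the threadpool_info() output on your system. limited_matmul is a hypothetical helper name.

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits


def limited_matmul(n: int, threads: int) -> np.ndarray:
    """Multiply two random n x n matrices with OpenMP pools capped at `threads`."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    # The limit applies only inside the with-block and is restored afterwards.
    with threadpool_limits(limits=threads, user_api="openmp"):
        return a @ b


if __name__ == "__main__":
    # Show which thread pools threadpoolctl detected in this process.
    for pool in threadpool_info():
        print(pool["user_api"], pool.get("internal_api"), pool["num_threads"])
    print(limited_matmul(512, 4).shape)
```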
See Building NumPy and SciPy with NVPL for detailed threading configuration and performance testing examples.
Quick Performance Verification#
Verify NVPL is providing good performance with a simple test:
import numpy as np
import time

n = 4096
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# perf_counter() is a monotonic, high-resolution clock intended for timing
start = time.perf_counter()
C = np.dot(A, B)
elapsed = time.perf_counter() - start
print(f"Matrix multiplication ({n}x{n}): {elapsed:.4f} seconds")
For comprehensive benchmarking and thread scaling tests, see Building NumPy and SciPy with NVPL.
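A single timed run includes one-time costs such as loading the library and starting worker threads. The sketch below is a slightly more careful variant of the quick test: it does a warm-up run, keeps the best of several repeats, and converts the time to GFLOP/s using the usual estimate of about 2n³ floating-point operations for a dense n×n product. The function name and the matrix sizes are illustrative choices.

```python
import time

import numpy as np


def benchmark_matmul(n: int, repeats: int = 3) -> tuple[float, float]:
    """Return (best_seconds, gflops) for an n x n double-precision matmul."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    a @ b  # warm-up run: loads the library and spins up worker threads
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    # A dense n x n product costs about 2 * n**3 floating-point operations.
    return best, 2 * n**3 / best / 1e9


if __name__ == "__main__":
    for n in (512, 1024, 2048):
        seconds, gflops = benchmark_matmul(n)
        print(f"n={n}: {seconds:.4f} s, {gflops:.1f} GFLOP/s")
```

Reporting GFLOP/s makes results for different matrix sizes and different BLAS backends easier to compare than raw seconds.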
Comparing BLAS Implementations#
Creating Multiple Environments#
You can create separate environments with different BLAS implementations to compare performance:
# Environment with NVPL
conda create -n test-nvpl python=3.13 numpy scipy "blas=*=nvpl"
# Environment with OpenBLAS (default)
conda create -n test-openblas python=3.13 numpy scipy "blas=*=openblas"
# Environment with MKL (x86-64 systems only; MKL is not available for aarch64)
conda create -n test-mkl python=3.13 numpy scipy "blas=*=mkl"
Running the Same Benchmark in Different Environments#
# Test with NVPL
conda activate test-nvpl
python benchmark.py
# Test with OpenBLAS
conda activate test-openblas
python benchmark.py
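The activate-and-run steps can also be driven from one script with conda run, which executes a command inside a named environment without activating it in the current shell. This is a minimal sketch assuming the test-nvpl and test-openblas environments and a benchmark.py file exist; conda_run_command is a hypothetical helper name.

```python
import shutil
import subprocess


def conda_run_command(env_name: str, script: str) -> list[str]:
    """Build a `conda run` command that executes script inside env_name."""
    return ["conda", "run", "-n", env_name, "python", script]


if __name__ == "__main__":
    if shutil.which("conda") is None:
        print("conda not found on PATH")
    else:
        for env_name in ("test-nvpl", "test-openblas"):
            cmd = conda_run_command(env_name, "benchmark.py")
            result = subprocess.run(cmd, capture_output=True, text=True)
            print(f"[{env_name}] exit code {result.returncode}")
            print(result.stdout or result.stderr)
```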
Switching BLAS Backends#
To test different BLAS implementations, switch between environments:
conda activate test-nvpl # Use NVPL backend
conda activate test-openblas # Use OpenBLAS backend
Troubleshooting#
NVPL Package Not Found#
If conda cannot find the NVPL variant, ensure you’re using the conda-forge channel:
# Add conda-forge channel (if not already added)
conda config --add channels conda-forge
conda config --set channel_priority strict
# Try installation again
conda install numpy "blas=*=nvpl"
Verify you’re on an ARM64 system:
uname -m # Should output: aarch64
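The same platform check can be done from Python, which is convenient at the top of a setup or benchmark script. A small sketch using the standard platform module; is_arm64_linux is a hypothetical helper name.

```python
import platform


def is_arm64_linux(system: str, machine: str) -> bool:
    """True for Linux on a 64-bit Arm CPU (normally reported as aarch64)."""
    return system == "Linux" and machine.lower() in {"aarch64", "arm64"}


if __name__ == "__main__":
    system, machine = platform.system(), platform.machine()
    print(f"{system} / {machine}")
    print("ARM64 Linux" if is_arm64_linux(system, machine) else "not ARM64 Linux")
```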
Conflicting BLAS Libraries#
If you have issues with conflicting BLAS libraries:
# Remove existing numpy/scipy
conda remove numpy scipy
# Reinstall with explicit BLAS specification
conda install numpy scipy "blas=*=nvpl"
Performance Not as Expected#
Check thread settings:
echo $OMP_NUM_THREADS # Should be unset or match desired thread count
Verify NVPL is actually being used:
import numpy as np
np.show_config() # Check for NVPL in output
Ensure you’re running compute-intensive operations (small matrices may not show benefits)
Updating Packages While Keeping NVPL#
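When updating NumPy or SciPy, restate the NVPL spec so that the backend is maintained. Because conda update expects bare package names rather than version or build specs, pass the specs to conda install:
conda activate nvpl-env
conda install numpy scipy "blas=*=nvpl"
Restating the spec ensures conda doesn’t switch to a different BLAS implementation during the update.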
Additional Resources#
NVPL-Specific Resources#
Building NumPy and SciPy with NVPL - Building NumPy/SciPy from source with NVPL
NVPL Python Usage - NVPL Python wheel packages
Conda and BLAS Resources#
BLAS Feedstock - conda-forge BLAS variants including NVPL
Conda Documentation - General conda usage
Miniforge - Recommended conda distribution for ARM64