Settings

cuPyNumeric has a number of runtime settings that can be configured through environment variables.
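As a minimal sketch of how such variables reach a program, the snippet below launches a child process with extra environment variables. The helper `run_with_settings` is hypothetical and not part of cuPyNumeric, and the child command here only echoes the variable back. A real run would launch a script that imports cuPyNumeric instead.

```python
import os
import subprocess
import sys


def run_with_settings(argv, settings):
    """Run argv in a child process with the given environment variables added.

    `settings` maps full environment variable names (for example
    "CUPYNUMERIC_WARN") to values. This helper is illustrative only.
    """
    env = dict(os.environ)
    for name, value in settings.items():
        env[name] = str(value)
    return subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )


# Stand-in child process: it only prints the variable it received.
child = [sys.executable, "-c", "import os; print(os.environ['CUPYNUMERIC_WARN'])"]
result = run_with_settings(child, {"CUPYNUMERIC_WARN": 1})
print(result.stdout.strip())  # prints 1
```

Passing the variables through the child's environment keeps the parent process unchanged, which is convenient when comparing runs with different settings.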

`preload_cudalibs`

Type: bool (“0” or “1”)
Env var: CUPYNUMERIC_PRELOAD_CUDALIBS
Default: False

Preload and initialize handles of all CUDA libraries (cuBLAS, cuSOLVER, etc.) used in cuPyNumeric.
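The boolean settings on this page take the strings “0” or “1”. The sketch below shows one way such a value could be interpreted. The function `read_bool_setting` and its `environ` parameter are assumptions made for illustration, and the parser cuPyNumeric actually uses may accept other spellings or handle errors differently.

```python
import os


def read_bool_setting(env_var, default=False, environ=os.environ):
    """Interpret "1" as True and "0" as False; use `default` if unset.

    Illustrative only: the accepted values follow the "0"/"1" type
    shown in this document, not the library's real parsing code.
    """
    raw = environ.get(env_var)
    if raw is None:
        return default
    if raw not in ("0", "1"):
        raise ValueError(f"{env_var} must be '0' or '1', got {raw!r}")
    return raw == "1"


# Example with an explicit mapping instead of the real process environment.
fake_env = {"CUPYNUMERIC_PRELOAD_CUDALIBS": "1"}
print(read_bool_setting("CUPYNUMERIC_PRELOAD_CUDALIBS", environ=fake_env))  # True
print(read_bool_setting("CUPYNUMERIC_WARN", environ=fake_env))  # False
```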

`warn`

Type: bool (“0” or “1”)
Env var: CUPYNUMERIC_WARN
Default: False

Turn on warnings.

`report_coverage`

Type: bool (“0” or “1”)
Env var: CUPYNUMERIC_REPORT_COVERAGE
Default: False

Print an overall percentage of cuPyNumeric coverage.

`report_dump_callstack`

Type: bool (“0” or “1”)
Env var: CUPYNUMERIC_REPORT_DUMP_CALLSTACK
Default: False

Print an overall percentage of cuPyNumeric coverage, along with a call stack.

`report_dump_csv`

Type: str
Env var: CUPYNUMERIC_REPORT_DUMP_CSV
Default: None

Save a coverage report to the specified CSV file.

`numpy_compat`

Type: bool (“0” or “1”)
Env var: CUPYNUMERIC_NUMPY_COMPATIBILITY
Default: False

cuPyNumeric will issue additional tasks to match NumPy’s results and behavior. This is currently used in the following APIs: nanmin, nanmax, nanargmin, nanargmax.

`fast_math`

Type: bool (“0” or “1”)
Env var: CUPYNUMERIC_FAST_MATH
Default: False

Enable certain optimized execution modes for floating-point math operations that may violate strict IEEE specifications. Currently, this flag enables the acceleration of single-precision cuBLAS routines using TF32 tensor cores.

This is a read-only environment variable setting used by the runtime.

`min_gpu_chunk`

Type: int
Env var: CUPYNUMERIC_MIN_GPU_CHUNK
Default: 65536 (test-mode default: 2)

Legate will fall back to vanilla NumPy when handling arrays smaller than this, rather than attempting to accelerate them using GPUs, because the offloading overhead would likely not be offset by the accelerated operation code.

This is a read-only environment variable setting used by the runtime.
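Several integer settings on this page list both a regular default and a test-mode default. The sketch below illustrates how the two defaults might be combined with an environment variable. The function `read_int_setting`, its `test_mode` flag, and the precedence order (an explicit environment value wins over either default) are assumptions for illustration rather than a description of the actual runtime.

```python
import os


def read_int_setting(env_var, default, test_default, test_mode=False,
                     environ=os.environ):
    """Return the integer value for a setting.

    Assumed precedence (hypothetical): an explicit environment value
    first, then the test-mode default when test_mode is True, then the
    regular default.
    """
    raw = environ.get(env_var)
    if raw is not None:
        return int(raw)
    return test_default if test_mode else default


# Defaults taken from the min_gpu_chunk entry in this document.
empty_env = {}
normal = read_int_setting("CUPYNUMERIC_MIN_GPU_CHUNK", 65536, 2,
                          environ=empty_env)
testing = read_int_setting("CUPYNUMERIC_MIN_GPU_CHUNK", 65536, 2,
                           test_mode=True, environ=empty_env)
override = read_int_setting("CUPYNUMERIC_MIN_GPU_CHUNK", 65536, 2,
                            environ={"CUPYNUMERIC_MIN_GPU_CHUNK": "4096"})
print(normal, testing, override)  # 65536 2 4096
```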

`min_cpu_chunk`

Type: int
Env var: CUPYNUMERIC_MIN_CPU_CHUNK
Default: 1024 (test-mode default: 2)

Legate will fall back to vanilla NumPy when handling arrays smaller than this, rather than attempting to accelerate them using native CPU code, because the offloading overhead would likely not be offset by the accelerated operation code.

This is a read-only environment variable setting used by the runtime.

`min_omp_chunk`

Type: int
Env var: CUPYNUMERIC_MIN_OMP_CHUNK
Default: 8192 (test-mode default: 2)

Legate will fall back to vanilla NumPy when handling arrays smaller than this, rather than attempting to accelerate them using OpenMP, because the offloading overhead would likely not be offset by the accelerated operation code.

This is a read-only environment variable setting used by the runtime.

`force_thunk`

Type: str
Env var: CUPYNUMERIC_FORCE_THUNK
Default: None (test-mode default: ‘deferred’)

Force cuPyNumeric to always use a specific strategy for backing ndarrays. The options are “deferred”, where the array is managed by the Legate runtime, which enables distribution and accelerated operations but has some up-front offloading overhead, and “eager”, which falls back to a vanilla NumPy array. By default, cuPyNumeric decides this on a per-array basis, based on the size of the array and the accelerator in use.

This is a read-only environment variable setting used by the runtime.
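The per-array decision described above can be pictured with a small sketch. The function `choose_thunk` is hypothetical and is not the runtime's real logic. It assumes that the array size is compared against the chunk threshold for the accelerator in use (using the defaults listed on this page), that "smaller than" means a strict comparison, and that a forced value bypasses the comparison entirely.

```python
# Default thresholds as listed in this document; the unit of "size" is
# whatever unit the thresholds use, which this page does not state.
MIN_CHUNK = {"gpu": 65536, "omp": 8192, "cpu": 1024}


def choose_thunk(size, accelerator, force_thunk=None, min_chunk=MIN_CHUNK):
    """Pick "eager" or "deferred" for an array (illustrative sketch)."""
    if force_thunk is not None:
        if force_thunk not in ("deferred", "eager"):
            raise ValueError(f"unknown thunk strategy: {force_thunk!r}")
        # A forced strategy overrides the size-based choice.
        return force_thunk
    # Arrays below the threshold stay eager (vanilla NumPy) in this sketch.
    return "eager" if size < min_chunk[accelerator] else "deferred"


print(choose_thunk(1000, "gpu"))                          # eager
print(choose_thunk(100000, "gpu"))                        # deferred
print(choose_thunk(1000, "gpu", force_thunk="deferred"))  # deferred
```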

`matmul_cache_size`

Type: int
Env var: CUPYNUMERIC_MATMUL_CACHE_SIZE
Default: 134217728 (test-mode default: 4096)

Force cuPyNumeric to keep temporary task slices during matmul computations smaller than this threshold. Whenever the temporary space needed during computation would exceed this value, the task is batched over ‘k’ to satisfy the requirement.

This is a read-only environment variable setting used by the runtime.
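To make the idea of batching over ‘k’ concrete, the sketch below splits the ‘k’ dimension so that each batch's temporary footprint stays below a threshold. The function `plan_k_batches` and the cost model (a fixed amount of temporary space per unit of ‘k’) are assumptions for illustration. The formula cuPyNumeric actually uses for temporary space is not given on this page.

```python
import math


def plan_k_batches(k, temp_per_k, cache_size=134217728):
    """Return (k_batch, num_batches) for a matmul batched over k.

    Hypothetical model: each unit of k needs `temp_per_k` units of
    temporary space, and each batch must stay strictly below
    `cache_size` (in the same units).
    """
    if k <= 0 or temp_per_k <= 0:
        raise ValueError("k and temp_per_k must be positive")
    # Largest batch whose footprint is strictly below the threshold,
    # but never less than one unit of k and never more than k itself.
    k_batch = min(k, max(1, (cache_size - 1) // temp_per_k))
    num_batches = math.ceil(k / k_batch)
    return k_batch, num_batches


print(plan_k_batches(5000, 100000))  # (1342, 4)
print(plan_k_batches(10, 1))         # (10, 1)
```

With the default threshold, a small problem fits in a single batch, while a larger per-unit footprint forces the computation to be split into several passes over ‘k’.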