Settings
cuPyNumeric has a number of runtime settings that can be configured through environment variables.
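The type conventions used below can be mirrored when your own tooling reads the same variables, for example a launcher script that logs which settings are active. The conventions are: bool as "0"/"1", int, str, and an unset variable meaning the default. The helpers env_bool, env_int and env_str are hypothetical and are not part of the cuPyNumeric API. cuPyNumeric's own parsing may also be more lenient than this deliberately strict sketch.

```python
import os
from typing import Mapping, Optional


def env_bool(name: str, default: bool = False,
             environ: Mapping[str, str] = os.environ) -> bool:
    """Read a "0"/"1" flag; an unset variable yields `default`."""
    value = environ.get(name)
    if value is None:
        return default
    # Strict check (a choice made for this sketch): only "0" or "1" is accepted.
    if value not in ("0", "1"):
        raise ValueError(f"{name} must be '0' or '1', got {value!r}")
    return value == "1"


def env_int(name: str, default: int,
            environ: Mapping[str, str] = os.environ) -> int:
    """Read an integer setting; an unset variable yields `default`."""
    value = environ.get(name)
    return default if value is None else int(value)


def env_str(name: str, default: Optional[str] = None,
            environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Read a string setting; an unset variable yields `default`."""
    return environ.get(name, default)


if __name__ == "__main__":
    # A plain dict stands in for the process environment here.
    example_env = {"CUPYNUMERIC_WARN": "1", "CUPYNUMERIC_MIN_GPU_CHUNK": "4096"}
    print(env_bool("CUPYNUMERIC_WARN", environ=example_env))
    print(env_int("CUPYNUMERIC_MIN_GPU_CHUNK", 65536, environ=example_env))
    print(env_str("CUPYNUMERIC_REPORT_DUMP_CSV", environ=example_env))
```

Passing an explicit mapping keeps the helpers testable without touching the real process environment.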
preload_cudalibs
- Type: bool (“0” or “1”)
- Env var: CUPYNUMERIC_PRELOAD_CUDALIBS
- Default: False
Preload and initialize handles of all CUDA libraries (cuBLAS, cuSOLVER, etc.) used in cuPyNumeric.
warn
- Type: bool (“0” or “1”)
- Env var: CUPYNUMERIC_WARN
- Default: False
Turn on warnings.
report_coverage
- Type: bool (“0” or “1”)
- Env var: CUPYNUMERIC_REPORT_COVERAGE
- Default: False
Print the overall percentage of cuPyNumeric coverage.
report_dump_callstack
- Type: bool (“0” or “1”)
- Env var: CUPYNUMERIC_REPORT_DUMP_CALLSTACK
- Default: False
Print the overall percentage of cuPyNumeric coverage, together with call-stack information.
report_dump_csv
- Type: str
- Env var: CUPYNUMERIC_REPORT_DUMP_CSV
- Default: None
Save a coverage report to a specified CSV file.
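Because these settings are environment variables, one way to collect a coverage report is to set them for a child process that runs your program. The sketch below only builds that environment. The script name my_program.py and the output path coverage.csv are placeholders. Launching with sys.executable assumes your program can be started with plain Python rather than through a separate launcher. The helper coverage_env is hypothetical.

```python
import os
import sys


def coverage_env(csv_path: str, with_callstack: bool = False) -> dict:
    """Return a copy of the current environment with the coverage settings switched on."""
    env = dict(os.environ)  # copy, so the calling process is unaffected
    env["CUPYNUMERIC_REPORT_COVERAGE"] = "1"
    if with_callstack:
        env["CUPYNUMERIC_REPORT_DUMP_CALLSTACK"] = "1"
    env["CUPYNUMERIC_REPORT_DUMP_CSV"] = csv_path
    return env


if __name__ == "__main__":
    env = coverage_env("coverage.csv")
    # my_program.py is a hypothetical script that imports cupynumeric.
    cmd = [sys.executable, "my_program.py"]
    print(cmd, env["CUPYNUMERIC_REPORT_DUMP_CSV"])
    # To actually launch it:
    # import subprocess; subprocess.run(cmd, env=env, check=True)
```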
numpy_compat
- Type: bool (“0” or “1”)
- Env var: CUPYNUMERIC_NUMPY_COMPATIBILITY
- Default: False
cuPyNumeric will issue additional tasks to match NumPy’s results and behavior. This currently affects the following APIs: nanmin, nanmax, nanargmin, and nanargmax.
fast_math
- Type: bool (“0” or “1”)
- Env var: CUPYNUMERIC_FAST_MATH
- Default: False
Enable certain optimized execution modes for floating-point math operations that may violate strict IEEE specifications. Currently, this flag enables the acceleration of single-precision cuBLAS routines using TF32 tensor cores.
This is a read-only environment variable setting used by the runtime.
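To show what the TF32 trade-off means numerically, the sketch below emulates TF32 input rounding on the CPU with NumPy. TF32 keeps float32's 8-bit exponent but only 10 explicit mantissa bits. The helper round_to_tf32 is hypothetical and uses round-to-nearest-even; the rounding actually applied by the hardware and cuBLAS may differ. The sketch models only the rounding of inputs and does not call cuBLAS or cuPyNumeric.

```python
import numpy as np


def round_to_tf32(x: np.ndarray) -> np.ndarray:
    """Round a float32 array to TF32 precision (10 explicit mantissa bits).

    Uses round-to-nearest-even on the 13 low mantissa bits that TF32 drops.
    NaN and Inf are not special-cased, so pass finite inputs only.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    lowest_kept = (bits >> np.uint32(13)) & np.uint32(1)
    rounded = (bits + np.uint32(0x0FFF) + lowest_kept) & np.uint32(0xFFFFE000)
    return rounded.view(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((256, 256), dtype=np.float32)
    b = rng.standard_normal((256, 256), dtype=np.float32)
    exact = a.astype(np.float64) @ b.astype(np.float64)
    # Both products below accumulate in float32; only the input precision differs.
    fp32_err = np.abs(a @ b - exact).max()
    tf32_err = np.abs(round_to_tf32(a) @ round_to_tf32(b) - exact).max()
    print(f"max abs error  FP32: {fp32_err:.2e}  TF32-rounded inputs: {tf32_err:.2e}")
```

The TF32-rounded product typically shows a noticeably larger error. That loss of precision is why the flag is off by default.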
min_gpu_chunk
- Type: int
- Env var: CUPYNUMERIC_MIN_GPU_CHUNK
- Default: 65536 (test-mode default: 2)
Legate will fall back to vanilla NumPy when handling arrays smaller than this, rather than attempt to accelerate them using GPUs, because the offloading overhead would likely outweigh any speedup from the accelerated code.
This is a read-only environment variable setting used by the runtime.
min_cpu_chunk
- Type: int
- Env var: CUPYNUMERIC_MIN_CPU_CHUNK
- Default: 1024 (test-mode default: 2)
Legate will fall back to vanilla NumPy when handling arrays smaller than this, rather than attempt to accelerate them using native CPU code, because the offloading overhead would likely outweigh any speedup from the accelerated code.
This is a read-only environment variable setting used by the runtime.
min_omp_chunk
- Type: int
- Env var: CUPYNUMERIC_MIN_OMP_CHUNK
- Default: 8192 (test-mode default: 2)
Legate will fall back to vanilla NumPy when handling arrays smaller than this, rather than attempt to accelerate them using OpenMP, because the offloading overhead would likely outweigh any speedup from the accelerated code.
This is a read-only environment variable setting used by the runtime.
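The three minimum-chunk thresholds share one rule. The sketch below is a rough mental model only. It assumes size is counted in array elements and that the relevant threshold is the one for the processor kind in use. The real runtime may measure size differently or weigh other factors. The names min_chunk and falls_back_to_numpy are hypothetical.

```python
import os

# Documented (non-test-mode) defaults for the three fallback thresholds.
MIN_CHUNK_SETTINGS = {
    "gpu": ("CUPYNUMERIC_MIN_GPU_CHUNK", 65536),
    "cpu": ("CUPYNUMERIC_MIN_CPU_CHUNK", 1024),
    "omp": ("CUPYNUMERIC_MIN_OMP_CHUNK", 8192),
}


def min_chunk(processor: str) -> int:
    """Threshold for `processor` ("gpu", "cpu" or "omp"), honoring its env var."""
    env_var, default = MIN_CHUNK_SETTINGS[processor]
    value = os.environ.get(env_var)
    return default if value is None else int(value)


def falls_back_to_numpy(num_elements: int, processor: str) -> bool:
    """Hypothetical model: arrays smaller than the threshold stay in vanilla NumPy."""
    return num_elements < min_chunk(processor)


if __name__ == "__main__":
    for proc in ("gpu", "omp", "cpu"):
        print(proc, falls_back_to_numpy(4096, proc))
```

Under this model, a 4096-element array stays in NumPy on GPUs and with OpenMP but not with native CPU code.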
force_thunk
- Type: str
- Env var: CUPYNUMERIC_FORCE_THUNK
- Default: None (test-mode default: ‘deferred’)
Force cuPyNumeric to always use a specific strategy for backing ndarrays. The two options are:

- “deferred”: arrays are managed by the Legate runtime, which enables distribution and accelerated operations but incurs some up-front offloading overhead.
- “eager”: arrays fall back to vanilla NumPy arrays.

By default, cuPyNumeric decides this on a per-array basis, based on the size of the array and the accelerator in use.
This is a read-only environment variable setting used by the runtime.
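The precedence described above can be sketched as follows: an explicit force_thunk value overrides the per-array decision. The helpers forced_thunk and choose_thunk are hypothetical. The size test stands in for whatever per-array heuristic cuPyNumeric actually applies. It is also an assumption that the real runtime rejects unknown values the way this sketch does.

```python
import os
from typing import Optional

VALID_THUNKS = ("deferred", "eager")


def forced_thunk() -> Optional[str]:
    """Return the value of CUPYNUMERIC_FORCE_THUNK, or None if it is unset."""
    value = os.environ.get("CUPYNUMERIC_FORCE_THUNK")
    if value is None:
        return None
    if value not in VALID_THUNKS:
        raise ValueError(
            f"CUPYNUMERIC_FORCE_THUNK must be one of {VALID_THUNKS}, got {value!r}"
        )
    return value


def choose_thunk(num_elements: int, min_chunk: int) -> str:
    """A forced strategy wins; otherwise small arrays stay eager (hypothetical model)."""
    forced = forced_thunk()
    if forced is not None:
        return forced
    return "eager" if num_elements < min_chunk else "deferred"


if __name__ == "__main__":
    print(choose_thunk(10, 1024), choose_thunk(1_000_000, 1024))
```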
matmul_cache_size
- Type: int
- Env var: CUPYNUMERIC_MATMUL_CACHE_SIZE
- Default: 134217728 (test-mode default: 4096)
Force cuPyNumeric to keep the temporary task slices used during matmul computations smaller than this threshold. Whenever the temporary space needed during a computation would exceed this value, the task is batched over ‘k’ (the contraction dimension) to stay within the limit.
This is a read-only environment variable setting used by the runtime.
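Batching over ‘k’ is valid because a matrix product splits into a sum of partial products over slices of the contraction dimension: A @ B equals the sum of A[:, s] @ B[s, :] over consecutive slices s. The sketch below illustrates the idea with NumPy. It assumes the threshold is measured in bytes and that a batch's temporary footprint is the A and B slices it touches. Both this cost model and the helper names k_batch_size and batched_matmul are hypothetical, not cuPyNumeric's actual accounting.

```python
import math

import numpy as np


def k_batch_size(m: int, n: int, k: int, itemsize: int,
                 cache_size: int = 134217728) -> int:
    """Largest k-slice whose assumed temporary footprint fits in cache_size bytes."""
    # Assumed footprint per unit of k: one column of A plus one row of B.
    bytes_per_k = (m + n) * itemsize
    return max(1, min(k, cache_size // bytes_per_k))


def batched_matmul(a: np.ndarray, b: np.ndarray,
                   cache_size: int = 134217728) -> np.ndarray:
    """Compute a @ b by summing partial products over slices of the k dimension."""
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions do not match")
    kc = k_batch_size(m, n, k, a.itemsize, cache_size)
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for start in range(0, k, kc):
        stop = min(start + kc, k)
        out += a[:, start:stop] @ b[start:stop, :]
    return out


if __name__ == "__main__":
    # Documented default, float32, (1024 x 65536) @ (65536 x 1024) under this model.
    kc = k_batch_size(1024, 1024, 65536, 4)
    print(kc, math.ceil(65536 / kc))
    rng = np.random.default_rng(0)
    a = rng.random((64, 300))
    b = rng.random((300, 32))
    # 4096 mirrors the documented test-mode default, forcing many small batches.
    print(np.allclose(batched_matmul(a, b, cache_size=4096), a @ b))
```

A small cache size trades more, smaller tasks for a bounded temporary footprint. The final result is unchanged up to floating-point reassociation.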