Settings

cuNumeric has a number of runtime settings that can be configured through environment variables.
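
As a minimal sketch of how such a variable reaches a program, the snippet below copies the current environment, sets `CUNUMERIC_WARN`, and passes it to a child process. The child here is only a stand-in that echoes the variable back; it does not import cuNumeric, and a real run would launch your own script instead.

```python
import os
import subprocess
import sys

# Copy the current environment and enable one cuNumeric setting for the child.
env = dict(os.environ)
env["CUNUMERIC_WARN"] = "1"

# Stand-in child process: it only echoes the variable back. A real run would
# execute a script that imports cunumeric instead of this one-liner.
child_code = "import os; print(os.environ['CUNUMERIC_WARN'])"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

seen_by_child = result.stdout.strip()
print(seen_by_child)
```

Setting the variable in the environment of the process that starts the program, rather than partway through the program, avoids any question of whether the setting was read before it was changed.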

preload_cudalibs

Type:

bool (“0” or “1”)

Env var:

CUNUMERIC_PRELOAD_CUDALIBS

Default:

False

Preload and initialize handles of all CUDA libraries (cuBLAS, cuSOLVER, etc.) used in cuNumeric.
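
The bool type used by this and several other settings accepts the strings "0" and "1". The sketch below shows one way such a value could be interpreted. `read_bool_setting` is a hypothetical helper written for illustration, not cuNumeric's own parser, and rejecting every other string is an assumption.

```python
import os


def read_bool_setting(env_var, default=False, environ=None):
    """Interpret an environment variable as a "0"/"1" boolean.

    Hypothetical helper for illustration; not part of cuNumeric.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(env_var)
    if raw is None:
        # Variable not set: use the documented default.
        return default
    if raw == "1":
        return True
    if raw == "0":
        return False
    # Assumption: anything other than "0" or "1" is treated as an error.
    raise ValueError(f"{env_var} must be '0' or '1', got {raw!r}")


unset = read_bool_setting("CUNUMERIC_PRELOAD_CUDALIBS", environ={})
enabled = read_bool_setting(
    "CUNUMERIC_PRELOAD_CUDALIBS",
    environ={"CUNUMERIC_PRELOAD_CUDALIBS": "1"},
)
print(unset, enabled)
```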

warn

Type:

bool (“0” or “1”)

Env var:

CUNUMERIC_WARN

Default:

False

Turn on warnings.

report_coverage

Type:

bool (“0” or “1”)

Env var:

CUNUMERIC_REPORT_COVERAGE

Default:

False

Print the overall percentage of cuNumeric coverage, i.e. the share of API calls that were handled by cuNumeric.

report_dump_callstack

Type:

bool (“0” or “1”)

Env var:

CUNUMERIC_REPORT_DUMP_CALLSTACK

Default:

False

Print the overall percentage of cuNumeric coverage, including call stack information.

report_dump_csv

Type:

str

Env var:

CUNUMERIC_REPORT_DUMP_CSV

Default:

None

Save a coverage report to a specified CSV file.
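
The report settings can be combined on a single command line. The sketch below builds a POSIX shell command that enables the coverage summary and names a CSV file. `env_prefixed_command`, `my_script.py`, and the file name are hypothetical, chosen only for illustration, and the `python` launcher is an assumption about how the script is started.

```python
import shlex


def env_prefixed_command(settings, command):
    """Return a POSIX shell line of the form `VAR=value ... command args`."""
    # Quote each value so paths containing spaces survive the shell.
    assignments = [f"{name}={shlex.quote(value)}" for name, value in settings.items()]
    return " ".join(assignments + [shlex.join(command)])


report_settings = {
    "CUNUMERIC_REPORT_COVERAGE": "1",
    # Hypothetical output path; the space shows why quoting matters.
    "CUNUMERIC_REPORT_DUMP_CSV": "coverage report.csv",
}

line = env_prefixed_command(report_settings, ["python", "my_script.py"])
print(line)
```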

numpy_compat

Type:

bool (“0” or “1”)

Env var:

CUNUMERIC_NUMPY_COMPATIBILITY

Default:

False

cuNumeric will issue additional tasks to match NumPy’s results and behavior. This is currently used in the following APIs: nanmin, nanmax, nanargmin, and nanargmax.

fast_math

Type:

bool (“0” or “1”)

Env var:

CUNUMERIC_FAST_MATH

Default:

False

Enable certain optimized execution modes for floating-point math operations that may violate strict IEEE specifications. Currently, this flag enables the acceleration of single-precision cuBLAS routines using TF32 tensor cores.

This is a read-only environment variable setting used by the runtime.
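
To give a feel for the precision trade-off: TF32 keeps the 8 exponent bits of single precision but only 10 explicit mantissa bits instead of 23. The sketch below emulates that by clearing the low 13 mantissa bits of a float32 value. Truncation is a simplification chosen for illustration; the rounding performed by the hardware may differ, and `truncate_to_tf32` is a hypothetical helper, not a cuNumeric or cuBLAS function.

```python
import struct


def truncate_to_tf32(x):
    """Drop the 13 low mantissa bits of a float32 value (simplified TF32 model)."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign, exponent, and the top 10 mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# 2**-10 is the smallest increment to 1.0 that 10 mantissa bits can hold.
kept = truncate_to_tf32(1.0 + 2.0**-10)
# 2**-11 needs an 11th mantissa bit, so it is lost in this model.
lost = truncate_to_tf32(1.0 + 2.0**-11)
print(kept, lost)
```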

min_gpu_chunk

Type:

int

Env var:

CUNUMERIC_MIN_GPU_CHUNK

Default:

65536 (test-mode default: 2)

Legate will fall back to vanilla NumPy when handling arrays smaller than this, rather than attempt to accelerate using GPUs, as the offloading overhead would likely not be offset by the accelerated operation code.

This is a read-only environment variable setting used by the runtime.

min_cpu_chunk

Type:

int

Env var:

CUNUMERIC_MIN_CPU_CHUNK

Default:

1024 (test-mode default: 2)

Legate will fall back to vanilla NumPy when handling arrays smaller than this, rather than attempt to accelerate using native CPU code, as the offloading overhead would likely not be offset by the accelerated operation code.

This is a read-only environment variable setting used by the runtime.

min_omp_chunk

Type:

int

Env var:

CUNUMERIC_MIN_OMP_CHUNK

Default:

8192 (test-mode default: 2)

Legate will fall back to vanilla NumPy when handling arrays smaller than this, rather than attempt to accelerate using OpenMP, as the offloading overhead would likely not be offset by the accelerated operation code.

This is a read-only environment variable setting used by the runtime.
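
The three chunk thresholds follow the same pattern, which can be sketched as a lookup plus a comparison. This is an illustration under stated assumptions, not the runtime's actual code: the helper names are hypothetical, array size is assumed to mean element count, and the strict "smaller than" comparison is taken from the wording above.

```python
import os

# Documented non-test defaults for each kind of processor.
DEFAULT_MIN_CHUNK = {"gpu": 65536, "omp": 8192, "cpu": 1024}

# Environment variable that overrides each default.
ENV_VAR = {
    "gpu": "CUNUMERIC_MIN_GPU_CHUNK",
    "omp": "CUNUMERIC_MIN_OMP_CHUNK",
    "cpu": "CUNUMERIC_MIN_CPU_CHUNK",
}


def min_chunk(kind, environ=None):
    """Return the threshold for `kind`, honoring an environment override."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR[kind])
    return DEFAULT_MIN_CHUNK[kind] if raw is None else int(raw)


def falls_back_to_numpy(size, kind, environ=None):
    """Assumed rule: arrays smaller than the threshold stay in vanilla NumPy."""
    return size < min_chunk(kind, environ)


print(falls_back_to_numpy(10_000, "gpu", environ={}))
print(falls_back_to_numpy(10_000, "omp", environ={}))
```

With the defaults, a 10,000-element array would stay in NumPy on a GPU run but be offloaded on an OpenMP run, since 10,000 lies between the two thresholds.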

force_thunk

Type:

str

Env var:

CUNUMERIC_FORCE_THUNK

Default:

None (test-mode default: ‘deferred’)

Force cuNumeric to always use a specific strategy for backing ndarrays. The options are “deferred”, where the array is managed by the Legate runtime, which enables distribution and accelerated operations but has some up-front offloading overhead, and “eager”, which falls back to a vanilla NumPy array. By default, cuNumeric decides this on a per-array basis, based on the size of the array and the accelerator in use.

This is a read-only environment variable setting used by the runtime.
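
The interaction between a forced strategy and the default per-array decision can be sketched as follows. `choose_thunk` is a hypothetical function for illustration only; the real decision logic lives inside the runtime and may consider more than array size.

```python
def choose_thunk(size, min_chunk, force_thunk=None):
    """Pick "deferred" or "eager" for an array (illustrative model only)."""
    if force_thunk is not None:
        # A forced strategy overrides the size-based decision entirely.
        if force_thunk not in ("deferred", "eager"):
            raise ValueError(f"unknown strategy: {force_thunk!r}")
        return force_thunk
    # Assumed default rule: small arrays stay eager, larger ones are deferred.
    return "eager" if size < min_chunk else "deferred"


small_default = choose_thunk(100, min_chunk=1024)
small_forced = choose_thunk(100, min_chunk=1024, force_thunk="deferred")
large_forced = choose_thunk(1_000_000, min_chunk=1024, force_thunk="eager")
print(small_default, small_forced, large_forced)
```

Forcing “deferred” in test mode, as the test-mode default above does, means even tiny arrays exercise the Legate-managed path.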