Environment Variables Reference#

This comprehensive reference covers all environment variables used by NeMo Curator for runtime configuration, performance optimization, and system integration. Environment variables provide the highest precedence in the configuration hierarchy.

Tip

Applying Environment Variables: Export these variables in your shell, batch scripts, or container configuration before launching NeMo Curator; they are read at process startup and take precedence over file-based configuration.


Core NeMo Curator Variables#

Device and Processing Configuration#

Table 19 Device Configuration Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| DEVICE | "cpu" | Processing device: "cpu" or "gpu" |
| INTERFACE | "eth0" | Network interface for Dask communication |
| PROTOCOL | "tcp" | Network protocol: "tcp" or "ucx" |
| CPU_WORKER_MEMORY_LIMIT | "0" | Memory limit per CPU worker ("0" = no limit) |

Example Usage:

# GPU processing with UCX protocol
export DEVICE="gpu"
export PROTOCOL="ucx"
export INTERFACE="ib0"  # InfiniBand interface
export CPU_WORKER_MEMORY_LIMIT="8GB"

Logging and Profiling#

Table 20 Logging Configuration Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| LOGDIR | "./logs" | Directory for log files |
| PROFILESDIR | "./profiles" | Directory for performance profiles |
| SCHEDULER_FILE | auto-generated | Path to Dask scheduler connection file |
| SCHEDULER_LOG | auto-generated | Path to scheduler log file |
| DONE_MARKER | auto-generated | Path to job completion marker file |

Example Usage:

# Custom logging configuration
export LOGDIR="/shared/logs/nemo_curator"
export PROFILESDIR="/shared/profiles"
export SCHEDULER_LOG="/shared/logs/scheduler.log"
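
On multi-node deployments, the scheduler file is how workers and clients locate the scheduler. A minimal sketch, assuming a recent Dask CLI that provides the "dask scheduler" and "dask worker" subcommands:

# Scheduler writes its connection info; workers read it from shared storage
export SCHEDULER_FILE="/shared/logs/scheduler.json"
dask scheduler --scheduler-file "$SCHEDULER_FILE" &
dask worker --scheduler-file "$SCHEDULER_FILE"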

RAPIDS and GPU Configuration#

Memory Management (RMM)#

Table 21 RMM Memory Pool Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| RMM_WORKER_POOL_SIZE | "72GiB" | GPU memory pool size per worker |
| RMM_SCHEDULER_POOL_SIZE | "1GB" | GPU memory pool size for scheduler |
| RMM_ALLOCATOR | "pool" | Memory allocator: "pool", "arena", "binning" |
| RMM_POOL_INIT_SIZE | "256MB" | Initial pool size |
| RMM_MAXIMUM_POOL_SIZE | auto-detect | Maximum pool size |

Memory Sizing Guidelines:

# For 80GB GPU (A100/H100)
export RMM_WORKER_POOL_SIZE="72GiB"  # 90% of GPU memory

# For 40GB GPU (A100)
export RMM_WORKER_POOL_SIZE="36GiB"  # 90% of GPU memory

# For 16GB GPU (V100)
export RMM_WORKER_POOL_SIZE="14GiB"  # 87.5% of GPU memory

# Percentage-based allocation (alternative)
export RMM_WORKER_POOL_SIZE="0.9"  # 90% of available memory

RAPIDS Initialization#

Table 22 RAPIDS Initialization Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| RAPIDS_NO_INITIALIZE | "1" | Delay CUDA context creation: "0" or "1" |
| CUDF_SPILL | "1" | Enable automatic GPU memory spilling: "0" or "1" |
| CUDF_SPILL_DEVICE_LIMIT | "0.8" | Spill threshold (fraction of GPU memory) |
| LIBCUDF_CUFILE_POLICY | "OFF" | GPUDirect Storage policy: "OFF", "ON", "GDS" |

Configuration Examples:

# High-performance setup (sufficient GPU memory)
export RAPIDS_NO_INITIALIZE="0"  # Initialize immediately
export CUDF_SPILL="0"            # Disable spilling
export LIBCUDF_CUFILE_POLICY="ON"  # Enable direct storage access

# Memory-constrained setup
export RAPIDS_NO_INITIALIZE="1"  # Delay initialization
export CUDF_SPILL="1"            # Enable spilling
export CUDF_SPILL_DEVICE_LIMIT="0.7"  # Spill at 70% capacity

Dask Configuration#

Distributed Computing#

Table 23 Dask Distributed Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT | "10s" | Connection timeout |
| DASK_DISTRIBUTED__COMM__TIMEOUTS__TCP | "30s" | TCP timeout |
| DASK_DISTRIBUTED__WORKER__DAEMON | "True" | Run workers as daemon processes |
| DASK_DISTRIBUTED__WORKER__MEMORY__TARGET | "0.6" | Target memory usage fraction |
| DASK_DISTRIBUTED__WORKER__MEMORY__SPILL | "0.7" | Spill-to-disk threshold |
| DASK_DISTRIBUTED__WORKER__MEMORY__PAUSE | "0.8" | Pause-computation threshold |
| DASK_DISTRIBUTED__WORKER__MEMORY__TERMINATE | "0.95" | Worker-termination threshold |
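
The four memory thresholds escalate in order: above the target fraction workers free memory, above the spill fraction they write data to disk, at the pause fraction they stop accepting new tasks, and at the terminate fraction the nanny restarts them. Example Usage:

# Escalating memory response: free at 60%, spill at 70%, pause at 80%
export DASK_DISTRIBUTED__WORKER__MEMORY__TARGET="0.6"
export DASK_DISTRIBUTED__WORKER__MEMORY__SPILL="0.7"
export DASK_DISTRIBUTED__WORKER__MEMORY__PAUSE="0.8"
export DASK_DISTRIBUTED__WORKER__MEMORY__TERMINATE="0.95"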

Performance Profiling#

Table 24 Dask Profiling Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| DASK_DISTRIBUTED__WORKER__PROFILE__ENABLED | "False" | Enable worker profiling |
| DASK_DISTRIBUTED__WORKER__PROFILE__INTERVAL | "10ms" | Profiling sample interval |
| DASK_DISTRIBUTED__WORKER__PROFILE__CYCLE | "1000ms" | Profiling cycle duration |
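
Profiling adds sampling overhead, so enable it only for diagnostic runs. Example Usage:

# Sample worker state every 10ms during a debugging run
export DASK_DISTRIBUTED__WORKER__PROFILE__ENABLED="True"
export DASK_DISTRIBUTED__WORKER__PROFILE__INTERVAL="10ms"
export DASK_DISTRIBUTED__WORKER__PROFILE__CYCLE="1000ms"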

DataFrame Configuration#

Table 25 Dask DataFrame Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| DASK_DATAFRAME__CONVERT_STRING | "False" | Use PyArrow-backed strings for object data |
| DASK_DATAFRAME__QUERY_PLANNING | "False" | Enable query-planning optimization |
| DASK_DATAFRAME__PARQUET__MINIMUM_PARTITION_SIZE | "128MB" | Minimum partition size for Parquet |
| DASK_DATAFRAME__PARQUET__MAXIMUM_PARTITION_SIZE | "256MB" | Maximum partition size for Parquet |
| DASK_DATAFRAME__PARQUET__COMPRESSION | "snappy" | Compression algorithm for Parquet |

Optimized DataFrame Settings:

# High-performance I/O
export DASK_DATAFRAME__PARQUET__MINIMUM_PARTITION_SIZE="256MB"
export DASK_DATAFRAME__PARQUET__MAXIMUM_PARTITION_SIZE="1GB"
export DASK_DATAFRAME__PARQUET__COMPRESSION="lz4"
export DASK_DATAFRAME__CONVERT_STRING="False"

Network and Communication#

UCX Configuration#

Table 26 UCX Communication Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| UCX_TLS | auto-detect | Transport layers (e.g., "rc,cuda_copy,cuda_ipc") |
| UCX_NET_DEVICES | auto-detect | Network devices to use |
| UCX_MEMTYPE_CACHE | "y" | Enable memory-type cache |
| UCX_RNDV_SCHEME | "put_zcopy" | Rendezvous protocol scheme |
| UCX_IB_GPU_DIRECT_RDMA | "yes" | Enable GPUDirect RDMA |

InfiniBand Optimization:

# Optimized UCX for InfiniBand + GPU
export UCX_TLS="rc,cuda_copy,cuda_ipc"
export UCX_NET_DEVICES="mlx5_0:1"  # Specific InfiniBand device
export UCX_IB_GPU_DIRECT_RDMA="yes"
export UCX_MEMTYPE_CACHE="y"

Dask UCX Transport Flags#

Table 27 Dask UCX Transport Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| DASK_UCX__CUDA_COPY | "True" | Enable CUDA memory copy |
| DASK_UCX__TCP | "True" | Enable TCP transport |
| DASK_UCX__NVLINK | "True" | Enable NVLink transport |
| DASK_UCX__INFINIBAND | "True" | Enable InfiniBand transport |
| DASK_UCX__RDMACM | "True" | Enable RDMA connection manager |
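
These flags disable individual transports when the hardware is absent. For example, a single node with NVLink but no InfiniBand fabric might use:

# NVLink + TCP only; no InfiniBand or RDMA connection manager
export DASK_UCX__CUDA_COPY="True"
export DASK_UCX__NVLINK="True"
export DASK_UCX__INFINIBAND="False"
export DASK_UCX__RDMACM="False"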


Storage and I/O#

Cloud Storage Optimization#

AWS S3 Variables#

Table 28 AWS S3 Configuration Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| AWS_ACCESS_KEY_ID | none | AWS access key identifier |
| AWS_SECRET_ACCESS_KEY | none | AWS secret access key |
| AWS_DEFAULT_REGION | none | Default AWS region |
| AWS_PROFILE | "default" | AWS profile to use |
| AWS_MAX_ATTEMPTS | "5" | Maximum retry attempts |
| AWS_RETRY_MODE | "legacy" | Retry mode: "legacy", "standard", "adaptive" |
| AWS_S3_USE_ACCELERATE_ENDPOINT | "false" | Use S3 Transfer Acceleration |
| AWS_S3_ADDRESSING_STYLE | "auto" | S3 addressing style: "auto", "virtual", "path" |
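
Example Usage (region and profile names are illustrative):

# Resilient S3 access for large parallel reads
export AWS_DEFAULT_REGION="us-east-1"
export AWS_PROFILE="default"
export AWS_MAX_ATTEMPTS="10"
export AWS_RETRY_MODE="adaptive"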

Azure Storage Variables#

Table 29 Azure Storage Configuration Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| AZURE_STORAGE_CONNECTION_STRING | none | Azure storage connection string |
| AZURE_STORAGE_ACCOUNT_NAME | none | Azure storage account name |
| AZURE_STORAGE_ACCOUNT_KEY | none | Azure storage account key |
| AZURE_STORAGE_SAS_TOKEN | none | Azure SAS token |
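
Example Usage (values are placeholders):

# SAS-token access avoids embedding the account key
export AZURE_STORAGE_ACCOUNT_NAME="mystorageaccount"
export AZURE_STORAGE_SAS_TOKEN="<sas-token>"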

Local I/O Optimization#

Table 30 Local I/O Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| OMP_NUM_THREADS | CPU count | OpenMP thread count |
| MKL_NUM_THREADS | CPU count | Intel MKL thread count |
| NUMBA_NUM_THREADS | CPU count | Numba thread count |
| TMPDIR | "/tmp" | Temporary directory |
| PYTHONPATH | system default | Python module search path |

Thread Optimization:

# Prevent oversubscription in distributed environments
export OMP_NUM_THREADS="1"
export MKL_NUM_THREADS="1"
export NUMBA_NUM_THREADS="1"

# Use fast storage for temporary files
export TMPDIR="/fast/ssd/tmp"

API and Service Configuration#

Machine Learning Services#

Table 31 ML Service API Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| HUGGINGFACE_HUB_TOKEN | none | Hugging Face Hub API token |
| HF_HOME | "~/.cache/huggingface" | Hugging Face cache directory |
| OPENAI_API_KEY | none | OpenAI API key |
| OPENAI_ORG | none | OpenAI organization ID |
| NVIDIA_API_KEY | none | NVIDIA AI foundation models API key |
| NVIDIA_BASE_URL | "https://integrate.api.nvidia.com/v1" | NVIDIA API base URL |
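
Example Usage (tokens are placeholders):

# Credentials for gated model downloads and hosted-model pipelines
export HUGGINGFACE_HUB_TOKEN="<hf-token>"
export NVIDIA_API_KEY="<nvapi-key>"
export HF_HOME="/shared/cache/huggingface"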

Model Caching#

Table 32 Model Cache Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| TRANSFORMERS_CACHE | "~/.cache/huggingface/transformers" | Transformers model cache |
| HF_DATASETS_CACHE | "~/.cache/huggingface/datasets" | Hugging Face datasets cache |
| TORCH_HOME | "~/.cache/torch" | PyTorch model cache |
| SENTENCE_TRANSFORMERS_HOME | "~/.cache/torch/sentence_transformers" | Sentence Transformers cache |
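
On multi-node clusters, pointing every cache at shared storage avoids re-downloading models on each node. A sketch, assuming /shared/cache exists and is writable by all workers:

# Consolidate model caches on shared storage
export HF_HOME="/shared/cache/huggingface"
export TRANSFORMERS_CACHE="/shared/cache/huggingface/transformers"
export HF_DATASETS_CACHE="/shared/cache/huggingface/datasets"
export TORCH_HOME="/shared/cache/torch"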


CUDA and GPU Runtime#

CUDA Configuration#

Table 33 CUDA Runtime Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| CUDA_VISIBLE_DEVICES | all | Visible GPU devices (e.g., "0,1,2,3") |
| CUDA_LAUNCH_BLOCKING | "0" | Synchronous CUDA launches: "0" or "1" |
| CUDA_CACHE_PATH | auto | CUDA kernel cache path |
| CUDA_FORCE_PTX_JIT | "0" | Force PTX JIT compilation |
| CUDA_MODULE_LOADING | "LAZY" | Module loading strategy: "LAZY" or "EAGER" |
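
Example Usage:

# Debugging: restrict the job to GPUs 0-1 and make kernel launches
# synchronous so errors surface at the failing call
export CUDA_VISIBLE_DEVICES="0,1"
export CUDA_LAUNCH_BLOCKING="1"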

GPU Memory Management#

Table 34 GPU Memory Variables#

| Variable | Default | Description |
|----------|---------|-------------|
| PYTORCH_CUDA_ALLOC_CONF | none | PyTorch CUDA allocator configuration |
| CUDA_MPS_PIPE_DIRECTORY | none | Multi-Process Service (MPS) pipe directory |
| CUDA_MPS_LOG_DIRECTORY | none | Multi-Process Service (MPS) log directory |

Memory Optimization Examples:

# PyTorch memory optimization
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:512"

# Enable CUDA MPS for multi-process GPU sharing
export CUDA_MPS_PIPE_DIRECTORY="/tmp/nvidia-mps"
export CUDA_MPS_LOG_DIRECTORY="/tmp/nvidia-log"

Environment Variable Profiles#

Development Profile#

# Development environment variables
export DEVICE="cpu"
export PROTOCOL="tcp"
export INTERFACE="eth0"
export CPU_WORKER_MEMORY_LIMIT="4GB"
export LOGDIR="./dev_logs"
export PROFILESDIR="./dev_profiles"
export DASK_DISTRIBUTED__WORKER__PROFILE__ENABLED="True"
export OMP_NUM_THREADS="2"
export MKL_NUM_THREADS="2"

Production CPU Profile#

# Production CPU environment
export DEVICE="cpu"
export PROTOCOL="tcp"
export INTERFACE="eth0"
export CPU_WORKER_MEMORY_LIMIT="0"  # No limit
export LOGDIR="/shared/logs"
export PROFILESDIR="/shared/profiles"
export OMP_NUM_THREADS="1"
export MKL_NUM_THREADS="1"
export DASK_DATAFRAME__PARQUET__MINIMUM_PARTITION_SIZE="256MB"
export AWS_MAX_ATTEMPTS="10"
export AWS_RETRY_MODE="adaptive"

Production GPU Profile#

# Production GPU environment
export DEVICE="gpu"
export PROTOCOL="ucx"
export INTERFACE="ib0"
export RAPIDS_NO_INITIALIZE="0"
export CUDF_SPILL="0"
export RMM_WORKER_POOL_SIZE="72GiB"
export RMM_SCHEDULER_POOL_SIZE="1GB"
export LIBCUDF_CUFILE_POLICY="ON"
export UCX_TLS="rc,cuda_copy,cuda_ipc"
export UCX_IB_GPU_DIRECT_RDMA="yes"
export LOGDIR="/shared/logs"
export PROFILESDIR="/shared/profiles"

Memory-Constrained Profile#

# Memory-constrained environment
export DEVICE="gpu"
export PROTOCOL="tcp"
export RAPIDS_NO_INITIALIZE="1"
export CUDF_SPILL="1"
export CUDF_SPILL_DEVICE_LIMIT="0.7"
export RMM_WORKER_POOL_SIZE="12GB"  # Smaller pool
export CPU_WORKER_MEMORY_LIMIT="8GB"
export DASK_DISTRIBUTED__WORKER__MEMORY__TARGET="0.5"
export DASK_DISTRIBUTED__WORKER__MEMORY__SPILL="0.6"

Environment Variable Management#

Loading Environment Variables#

From File#

# Load from environment file
set -a  # Automatically export variables
source /path/to/nemo-curator.env
set +a  # Stop auto-export

# Or load explicitly (simple KEY=VALUE lines only; skips comments)
export $(grep -v '^#' /path/to/nemo-curator.env | xargs)
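
The file itself is plain KEY=VALUE lines; unquoted values keep it compatible with both loading methods above. A minimal example (contents are illustrative):

# /path/to/nemo-curator.env
DEVICE=gpu
PROTOCOL=ucx
LOGDIR=/shared/logs/nemo_curator
RMM_WORKER_POOL_SIZE=72GiB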

Systemd Service#

# /etc/systemd/system/nemo-curator.service
[Unit]
Description=NeMo Curator Service
After=network.target

[Service]
Type=exec
User=curator
Group=curator
EnvironmentFile=/etc/nemo-curator/environment
ExecStart=/usr/local/bin/nemo-curator-script
Restart=on-failure

[Install]
WantedBy=multi-user.target

Docker Environment#

# Dockerfile
FROM nvcr.io/nvidia/nemo:latest

# Set environment variables
ENV DEVICE=gpu
ENV PROTOCOL=ucx
ENV RMM_WORKER_POOL_SIZE=72GiB
ENV CUDF_SPILL=0

# Note: variables exported inside a RUN step do not persist to the
# running container; use ENV instructions as above, or pass the file
# at run time with docker run --env-file
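
To supply the same file at run time instead of baking values into the image (image tag as in the Dockerfile above):

# Pass the environment file at container start
docker run --gpus all --env-file nemo-curator.env nvcr.io/nvidia/nemo:latest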

Validation Script#

#!/usr/bin/env python3
"""Validate NeMo Curator environment variables."""

import os
import sys

def validate_environment():
    """Validate environment variable configuration."""
    
    required_vars = {
        "DEVICE": ["cpu", "gpu"],
        "PROTOCOL": ["tcp", "ucx"],
    }
    
    recommended_vars = {
        "LOGDIR": str,
        "RMM_WORKER_POOL_SIZE": str,
        "CUDF_SPILL": ["0", "1"],
    }
    
    issues = []
    
    # Check required variables
    for var, valid_values in required_vars.items():
        value = os.getenv(var)
        if not value:
            issues.append(f"Missing required variable: {var}")
        elif valid_values and value not in valid_values:
            issues.append(f"Invalid value for {var}: {value} (valid: {valid_values})")
    
    # Check recommended variables
    for var, expected in recommended_vars.items():
        value = os.getenv(var)
        if not value:
            print(f"⚠ Recommended variable not set: {var}")
        elif isinstance(expected, list) and value not in expected:
            issues.append(f"Invalid value for {var}: {value} (valid: {expected})")
        else:
            print(f"✓ {var} = {value}")
    
    # GPU-specific validation
    if os.getenv("DEVICE") == "gpu":
        gpu_vars = ["RMM_WORKER_POOL_SIZE", "CUDF_SPILL"]
        for var in gpu_vars:
            if not os.getenv(var):
                issues.append(f"GPU mode requires {var} to be set")
    
    # Report results
    if issues:
        print("❌ Environment validation failed:")
        for issue in issues:
            print(f"  - {issue}")
        return False
    else:
        print("✅ Environment validation passed")
        return True

if __name__ == "__main__":
    success = validate_environment()
    sys.exit(0 if success else 1)
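
Example Usage (filenames are illustrative):

# Gate job launch on a passing environment check;
# ./launch_curator.sh is a placeholder for your launch command
python validate_env.py && ./launch_curator.sh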