Environment Variables Reference#
This reference covers the environment variables NeMo Curator uses for runtime configuration, performance optimization, and system integration. Environment variables take the highest precedence in the configuration hierarchy.
> **Tip — Applying Environment Variables:** These variables are used throughout NeMo Curator deployments:
>
> - **Deployment Environment Configuration**: environment-specific variable patterns
> - **Kubernetes Deployment**: setting variables in Kubernetes ConfigMaps
> - **Slurm Deployment**: using variables in Slurm job scripts
Core NeMo Curator Variables#
Device and Processing Configuration#
| Variable | Default | Description |
|---|---|---|
| `DEVICE` | `"cpu"` | Processing device: `"cpu"` or `"gpu"` |
| `INTERFACE` | `"eth0"` | Network interface for Dask communication |
| `PROTOCOL` | `"tcp"` | Network protocol: `"tcp"` or `"ucx"` |
| `CPU_WORKER_MEMORY_LIMIT` | `"0"` | Memory limit per CPU worker (`"0"` = no limit) |

**Example Usage:**

```bash
# GPU processing with UCX protocol
export DEVICE="gpu"
export PROTOCOL="ucx"
export INTERFACE="ib0"   # InfiniBand interface
export CPU_WORKER_MEMORY_LIMIT="8GB"
```
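To make the mapping concrete, here is a minimal Python sketch of how startup code *might* translate these variables into Dask cluster options. The `cluster_options` helper is hypothetical, for illustration only; NeMo Curator's own startup scripts may construct the cluster differently.

```python
import os

def cluster_options() -> dict:
    """Translate NeMo Curator-style variables into Dask cluster options.

    Hypothetical helper for illustration; not part of NeMo Curator's API.
    """
    options = {
        "protocol": os.getenv("PROTOCOL", "tcp"),
        "interface": os.getenv("INTERFACE", "eth0"),
    }
    if os.getenv("DEVICE", "cpu") == "cpu":
        # "0" is documented as "no limit" for the per-worker memory cap
        limit = os.getenv("CPU_WORKER_MEMORY_LIMIT", "0")
        options["memory_limit"] = None if limit == "0" else limit
    return options

os.environ["DEVICE"] = "cpu"
os.environ["CPU_WORKER_MEMORY_LIMIT"] = "8GB"
assert cluster_options()["memory_limit"] == "8GB"
```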
Logging and Profiling#
| Variable | Default | Description |
|---|---|---|
| `LOGDIR` | `"./logs"` | Directory for log files |
| `PROFILESDIR` | `"./profiles"` | Directory for performance profiles |
| `SCHEDULER_FILE` | auto-generated | Path to Dask scheduler connection file |
| `SCHEDULER_LOG` | auto-generated | Path to scheduler log file |
| `DONE_MARKER` | auto-generated | Path to job completion marker file |

**Example Usage:**

```bash
# Custom logging configuration
export LOGDIR="/shared/logs/nemo_curator"
export PROFILESDIR="/shared/profiles"
export SCHEDULER_LOG="/shared/logs/scheduler.log"
```
RAPIDS and GPU Configuration#
Memory Management (RMM)#
| Variable | Default | Description |
|---|---|---|
| `RMM_WORKER_POOL_SIZE` | `"72GiB"` | GPU memory pool size per worker |
| `RMM_SCHEDULER_POOL_SIZE` | `"1GB"` | GPU memory pool size for scheduler |
|  | `"pool"` | Memory allocator: `"pool"`, `"arena"`, `"binning"` |
|  | `"256MB"` | Initial pool size |
|  | auto-detect | Maximum pool size |

**Memory Sizing Guidelines:**

```bash
# For 80GB GPU (A100/H100)
export RMM_WORKER_POOL_SIZE="72GiB"   # 90% of GPU memory

# For 40GB GPU (A100)
export RMM_WORKER_POOL_SIZE="36GiB"   # 90% of GPU memory

# For 16GB GPU (V100)
export RMM_WORKER_POOL_SIZE="14GiB"   # 87.5% of GPU memory

# Percentage-based allocation (alternative)
export RMM_WORKER_POOL_SIZE="0.9"     # 90% of available memory
```
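The sizing rule above (roughly 90% of device memory, rounded down to whole GiB) can be expressed as a small helper. `rmm_pool_size` is an illustrative name, not an existing API:

```python
def rmm_pool_size(total_gib: int, fraction: float = 0.9) -> str:
    """Format a fraction of GPU memory as an RMM_WORKER_POOL_SIZE value.

    Rounds down to whole GiB so the pool never exceeds the target fraction.
    """
    return f"{int(total_gib * fraction)}GiB"

assert rmm_pool_size(80) == "72GiB"          # A100/H100 80GB
assert rmm_pool_size(40) == "36GiB"          # A100 40GB
assert rmm_pool_size(16, 0.875) == "14GiB"   # V100 16GB
```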
RAPIDS Initialization#
| Variable | Default | Description |
|---|---|---|
| `RAPIDS_NO_INITIALIZE` | `"1"` | Delay CUDA context creation: `"0"` or `"1"` |
| `CUDF_SPILL` | `"1"` | Enable automatic GPU memory spilling: `"0"` or `"1"` |
| `CUDF_SPILL_DEVICE_LIMIT` | `"0.8"` | Spill threshold (fraction of GPU memory) |
| `LIBCUDF_CUFILE_POLICY` | `"OFF"` | GPUDirect Storage policy: `"OFF"`, `"ON"`, `"GDS"` |

**Configuration Examples:**

```bash
# High-performance setup (sufficient GPU memory)
export RAPIDS_NO_INITIALIZE="0"       # Initialize immediately
export CUDF_SPILL="0"                 # Disable spilling
export LIBCUDF_CUFILE_POLICY="ON"     # Enable direct storage access

# Memory-constrained setup
export RAPIDS_NO_INITIALIZE="1"       # Delay initialization
export CUDF_SPILL="1"                 # Enable spilling
export CUDF_SPILL_DEVICE_LIMIT="0.7"  # Spill at 70% capacity
```
Dask Configuration#
Distributed Computing#
| Variable | Default | Description |
|---|---|---|
| `DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT` | `"10s"` | Connection timeout |
| `DASK_DISTRIBUTED__COMM__TIMEOUTS__TCP` | `"30s"` | TCP timeout |
| `DASK_DISTRIBUTED__WORKER__DAEMON` | `"True"` | Run workers as daemons |
| `DASK_DISTRIBUTED__WORKER__MEMORY__TARGET` | `"0.6"` | Target memory usage fraction |
| `DASK_DISTRIBUTED__WORKER__MEMORY__SPILL` | `"0.7"` | Spill to disk threshold |
| `DASK_DISTRIBUTED__WORKER__MEMORY__PAUSE` | `"0.8"` | Pause computation threshold |
| `DASK_DISTRIBUTED__WORKER__MEMORY__TERMINATE` | `"0.95"` | Terminate worker threshold |
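The four worker-memory fractions only make sense as a strictly increasing sequence (target < spill < pause < terminate). A quick sanity check that reads the Dask-style variables; the helper itself is illustrative:

```python
import os

def memory_thresholds_ok() -> bool:
    """Check that Dask worker memory fractions increase in the right order."""
    names = ["TARGET", "SPILL", "PAUSE", "TERMINATE"]
    defaults = {"TARGET": 0.6, "SPILL": 0.7, "PAUSE": 0.8, "TERMINATE": 0.95}
    values = [
        float(os.getenv(f"DASK_DISTRIBUTED__WORKER__MEMORY__{name}", defaults[name]))
        for name in names
    ]
    # Strictly increasing: sorted order with no duplicates
    return values == sorted(values) and len(set(values)) == len(values)

assert memory_thresholds_ok()   # defaults 0.6 < 0.7 < 0.8 < 0.95
```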
Performance Profiling#
| Variable | Default | Description |
|---|---|---|
| `DASK_DISTRIBUTED__WORKER__PROFILE__ENABLED` | `"False"` | Enable worker profiling |
| `DASK_DISTRIBUTED__WORKER__PROFILE__INTERVAL` | `"10ms"` | Profiling sample interval |
| `DASK_DISTRIBUTED__WORKER__PROFILE__CYCLE` | `"1000ms"` | Profiling cycle duration |
DataFrame Configuration#
| Variable | Default | Description |
|---|---|---|
| `DASK_DATAFRAME__CONVERT_STRING` | `"False"` | Automatically convert string data to Arrow-backed strings |
| `DASK_DATAFRAME__QUERY_PLANNING` | `"False"` | Enable query-planning optimization |
| `DASK_DATAFRAME__PARQUET__MINIMUM_PARTITION_SIZE` | `"128MB"` | Minimum partition size for Parquet |
| `DASK_DATAFRAME__PARQUET__MAXIMUM_PARTITION_SIZE` | `"256MB"` | Maximum partition size for Parquet |
| `DASK_DATAFRAME__PARQUET__COMPRESSION` | `"snappy"` | Compression algorithm for Parquet |

**Optimized DataFrame Settings:**

```bash
# High-performance I/O
export DASK_DATAFRAME__PARQUET__MINIMUM_PARTITION_SIZE="256MB"
export DASK_DATAFRAME__PARQUET__MAXIMUM_PARTITION_SIZE="1GB"
export DASK_DATAFRAME__PARQUET__COMPRESSION="lz4"
export DASK_DATAFRAME__CONVERT_STRING="False"
```
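The size strings above follow Dask's convention, where `dask.utils.parse_bytes` treats `"MB"` as 10^6 bytes and `"MiB"` as 2^20. A simplified stand-in, not Dask's actual implementation:

```python
import re

_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9,
          "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}

def parse_size(text: str) -> int:
    """Convert a size string such as "256MB" or "1GiB" into bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMG]i?B|B)", text.strip())
    if match is None:
        raise ValueError(f"unparseable size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])

assert parse_size("256MB") == 256_000_000
assert parse_size("1GiB") == 2**30
```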
Network and Communication#
UCX Configuration#
| Variable | Default | Description |
|---|---|---|
| `UCX_TLS` | auto-detect | Transport layers, e.g. `"rc,cuda_copy,cuda_ipc"` |
| `UCX_NET_DEVICES` | auto-detect | Network devices to use |
| `UCX_MEMTYPE_CACHE` | `"y"` | Enable memory-type cache |
| `UCX_RNDV_SCHEME` | `"put_zcopy"` | Rendezvous protocol scheme |
| `UCX_IB_GPU_DIRECT_RDMA` | `"yes"` | Enable GPUDirect RDMA |

**InfiniBand Optimization:**

```bash
# Optimized UCX for InfiniBand + GPU
export UCX_TLS="rc,cuda_copy,cuda_ipc"
export UCX_NET_DEVICES="mlx5_0:1"   # Specific InfiniBand device
export UCX_IB_GPU_DIRECT_RDMA="yes"
export UCX_MEMTYPE_CACHE="y"
```
TCP Configuration#
| Variable | Default | Description |
|---|---|---|
| `DASK_DISTRIBUTED__COMM__UCX__CUDA_COPY` | `"True"` | Enable CUDA memory copy |
| `DASK_DISTRIBUTED__COMM__UCX__TCP` | `"True"` | Enable TCP transport |
| `DASK_DISTRIBUTED__COMM__UCX__NVLINK` | `"True"` | Enable NVLink transport |
| `DASK_DISTRIBUTED__COMM__UCX__INFINIBAND` | `"True"` | Enable InfiniBand transport |
| `DASK_DISTRIBUTED__COMM__UCX__RDMACM` | `"True"` | Enable RDMA connection manager |
Storage and I/O#
Cloud Storage Optimization#
AWS S3 Variables#
| Variable | Default | Description |
|---|---|---|
| `AWS_ACCESS_KEY_ID` | none | AWS access key identifier |
| `AWS_SECRET_ACCESS_KEY` | none | AWS secret access key |
| `AWS_DEFAULT_REGION` | none | Default AWS region |
| `AWS_PROFILE` | `"default"` | AWS profile to use |
| `AWS_MAX_ATTEMPTS` | `"5"` | Maximum retry attempts |
| `AWS_RETRY_MODE` | `"legacy"` | Retry mode: `"legacy"`, `"standard"`, `"adaptive"` |
|  | `"false"` | Use S3 Transfer Acceleration |
|  | `"auto"` | S3 addressing style: `"auto"`, `"virtual"`, `"path"` |
Azure Storage Variables#
| Variable | Default | Description |
|---|---|---|
| `AZURE_STORAGE_CONNECTION_STRING` | none | Azure storage connection string |
| `AZURE_STORAGE_ACCOUNT_NAME` | none | Azure storage account name |
| `AZURE_STORAGE_ACCOUNT_KEY` | none | Azure storage account key |
| `AZURE_STORAGE_SAS_TOKEN` | none | Azure SAS token |
Local I/O Optimization#
| Variable | Default | Description |
|---|---|---|
| `OMP_NUM_THREADS` | CPU count | OpenMP thread count |
| `MKL_NUM_THREADS` | CPU count | Intel MKL thread count |
| `NUMBA_NUM_THREADS` | CPU count | Numba thread count |
| `TMPDIR` | `"/tmp"` | Temporary directory |
| `PYTHONPATH` | system default | Python module search path |

**Thread Optimization:**

```bash
# Prevent oversubscription in distributed environments
export OMP_NUM_THREADS="1"
export MKL_NUM_THREADS="1"
export NUMBA_NUM_THREADS="1"

# Use fast storage for temporary files
export TMPDIR="/fast/ssd/tmp"
```
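The rule of thumb behind these settings: total threads (workers × threads per worker) should not exceed the node's physical cores. A small helper to pick a per-worker thread count; illustrative only:

```python
def threads_per_worker(total_cores: int, n_workers: int) -> int:
    """Split cores evenly across workers without oversubscription (minimum 1)."""
    return max(1, total_cores // n_workers)

assert threads_per_worker(64, 64) == 1   # one Dask worker per core
assert threads_per_worker(64, 8) == 8    # 8 workers on a 64-core node
assert threads_per_worker(4, 8) == 1     # never drop below one thread
```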
API and Service Configuration#
Machine Learning Services#
| Variable | Default | Description |
|---|---|---|
| `HF_TOKEN` | none | Hugging Face Hub API token |
| `HF_HOME` | `"~/.cache/huggingface"` | Hugging Face cache directory |
| `OPENAI_API_KEY` | none | OpenAI API key |
| `OPENAI_ORG_ID` | none | OpenAI organization ID |
| `NVIDIA_API_KEY` | none | NVIDIA AI Foundation Models API key |
|  | `"https://integrate.api.nvidia.com/v1"` | NVIDIA API base URL |
Model Caching#
| Variable | Default | Description |
|---|---|---|
| `TRANSFORMERS_CACHE` | `"~/.cache/huggingface/transformers"` | Transformers model cache |
| `HF_DATASETS_CACHE` | `"~/.cache/huggingface/datasets"` | Hugging Face datasets cache |
| `TORCH_HOME` | `"~/.cache/torch"` | PyTorch model cache |
| `SENTENCE_TRANSFORMERS_HOME` | `"~/.cache/torch/sentence_transformers"` | Sentence Transformers cache |
CUDA and GPU Runtime#
CUDA Configuration#
| Variable | Default | Description |
|---|---|---|
| `CUDA_VISIBLE_DEVICES` | all | Visible GPU devices (e.g., `"0,1,2,3"`) |
| `CUDA_LAUNCH_BLOCKING` | `"0"` | Synchronous CUDA launches: `"0"` or `"1"` |
| `CUDA_CACHE_PATH` | auto | CUDA kernel cache path |
| `CUDA_FORCE_PTX_JIT` | `"0"` | Force PTX JIT compilation |
| `CUDA_MODULE_LOADING` | `"LAZY"` | Module loading strategy: `"LAZY"` or `"EAGER"` |
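`CUDA_VISIBLE_DEVICES` is a comma-separated list of device ordinals. A simplified parser shows the format (real CUDA also accepts GPU UUIDs, which this sketch does not handle):

```python
def parse_visible_devices(raw: str) -> list[int]:
    """Parse a CUDA_VISIBLE_DEVICES-style ordinal list such as "0,1,2,3".

    Simplified sketch: non-numeric tokens raise ValueError here, whereas
    real CUDA silently stops at the first invalid entry.
    """
    return [int(token) for token in raw.split(",") if token.strip()]

assert parse_visible_devices("0,1,2,3") == [0, 1, 2, 3]
assert parse_visible_devices("2") == [2]
```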
GPU Memory Management#
| Variable | Default | Description |
|---|---|---|
| `PYTORCH_CUDA_ALLOC_CONF` | none | PyTorch CUDA allocator configuration |
| `CUDA_MPS_PIPE_DIRECTORY` | none | Multi-Process Service pipe directory |
| `CUDA_MPS_LOG_DIRECTORY` | none | Multi-Process Service log directory |

**Memory Optimization Examples:**

```bash
# PyTorch memory optimization
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:512"

# Enable CUDA MPS for multi-process GPU sharing
export CUDA_MPS_PIPE_DIRECTORY="/tmp/nvidia-mps"
export CUDA_MPS_LOG_DIRECTORY="/tmp/nvidia-log"
```
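`PYTORCH_CUDA_ALLOC_CONF` packs multiple options into one string of comma-separated `key:value` pairs. A small parser illustrates the structure (not PyTorch's internal implementation):

```python
def parse_alloc_conf(raw: str) -> dict[str, str]:
    """Split a PYTORCH_CUDA_ALLOC_CONF string into its key/value options."""
    options = {}
    for pair in filter(None, raw.split(",")):
        key, _, value = pair.partition(":")
        options[key.strip()] = value.strip()
    return options

assert parse_alloc_conf("max_split_size_mb:512") == {"max_split_size_mb": "512"}
assert parse_alloc_conf("max_split_size_mb:512,expandable_segments:True") == {
    "max_split_size_mb": "512",
    "expandable_segments": "True",
}
```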
Environment Variable Profiles#
Development Profile#
```bash
# Development environment variables
export DEVICE="cpu"
export PROTOCOL="tcp"
export INTERFACE="eth0"
export CPU_WORKER_MEMORY_LIMIT="4GB"
export LOGDIR="./dev_logs"
export PROFILESDIR="./dev_profiles"
export DASK_DISTRIBUTED__WORKER__PROFILE__ENABLED="True"
export OMP_NUM_THREADS="2"
export MKL_NUM_THREADS="2"
```
Production CPU Profile#
```bash
# Production CPU environment
export DEVICE="cpu"
export PROTOCOL="tcp"
export INTERFACE="eth0"
export CPU_WORKER_MEMORY_LIMIT="0"   # No limit
export LOGDIR="/shared/logs"
export PROFILESDIR="/shared/profiles"
export OMP_NUM_THREADS="1"
export MKL_NUM_THREADS="1"
export DASK_DATAFRAME__PARQUET__MINIMUM_PARTITION_SIZE="256MB"
export AWS_MAX_ATTEMPTS="10"
export AWS_RETRY_MODE="adaptive"
```
Production GPU Profile#
```bash
# Production GPU environment
export DEVICE="gpu"
export PROTOCOL="ucx"
export INTERFACE="ib0"
export RAPIDS_NO_INITIALIZE="0"
export CUDF_SPILL="0"
export RMM_WORKER_POOL_SIZE="72GiB"
export RMM_SCHEDULER_POOL_SIZE="1GB"
export LIBCUDF_CUFILE_POLICY="ON"
export UCX_TLS="rc,cuda_copy,cuda_ipc"
export UCX_IB_GPU_DIRECT_RDMA="yes"
export LOGDIR="/shared/logs"
export PROFILESDIR="/shared/profiles"
```
Memory-Constrained Profile#
```bash
# Memory-constrained environment
export DEVICE="gpu"
export PROTOCOL="tcp"
export RAPIDS_NO_INITIALIZE="1"
export CUDF_SPILL="1"
export CUDF_SPILL_DEVICE_LIMIT="0.7"
export RMM_WORKER_POOL_SIZE="12GB"   # Smaller pool
export CPU_WORKER_MEMORY_LIMIT="8GB"
export DASK_DISTRIBUTED__WORKER__MEMORY__TARGET="0.5"
export DASK_DISTRIBUTED__WORKER__MEMORY__SPILL="0.6"
```
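Each profile is just a set of exports, so profiles can also be applied programmatically. A sketch: the `PROFILES` table abbreviates the full listings above, and `apply_profile` is a hypothetical helper, not a NeMo Curator function.

```python
import os

# Abbreviated versions of the shell profiles above (illustrative subset).
PROFILES = {
    "development": {"DEVICE": "cpu", "PROTOCOL": "tcp", "OMP_NUM_THREADS": "2"},
    "production-gpu": {"DEVICE": "gpu", "PROTOCOL": "ucx", "CUDF_SPILL": "0"},
}

def apply_profile(name: str) -> None:
    """Export a profile's variables without overriding values already set.

    setdefault preserves the precedence rule: explicitly set environment
    variables win over profile defaults.
    """
    for key, value in PROFILES[name].items():
        os.environ.setdefault(key, value)

os.environ.pop("PROTOCOL", None)
apply_profile("development")
assert os.environ["PROTOCOL"] == "tcp"
```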
Environment Variable Management#
Loading Environment Variables#
From File#
```bash
# Load from an environment file
set -a   # Automatically export variables
source /path/to/nemo-curator.env
set +a   # Stop auto-export

# Or use explicit loading (simple KEY=VALUE lines only; no quoting)
export $(grep -v '^#' /path/to/nemo-curator.env | xargs)
```
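For tooling that cannot `source` a shell file, the same `KEY=VALUE` format can be read directly. A minimal parser, assuming simple one-line assignments; it strips one layer of surrounding quotes but does not implement full shell quoting:

```python
def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):   # tolerate "export KEY=VALUE"
            line = line[len("export "):]
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip().strip('"').strip("'")
    return result

sample = """
# NeMo Curator settings
export DEVICE="gpu"
PROTOCOL=ucx
"""
assert parse_env_file(sample) == {"DEVICE": "gpu", "PROTOCOL": "ucx"}
```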
Systemd Service#
```ini
# /etc/systemd/system/nemo-curator.service
[Unit]
Description=NeMo Curator Service
After=network.target

[Service]
Type=exec
User=curator
Group=curator
EnvironmentFile=/etc/nemo-curator/environment
ExecStart=/usr/local/bin/nemo-curator-script
Restart=on-failure

[Install]
WantedBy=multi-user.target
```
Docker Environment#
```dockerfile
# Dockerfile
FROM nvcr.io/nvidia/nemo:latest

# Set environment variables
ENV DEVICE=gpu
ENV PROTOCOL=ucx
ENV RMM_WORKER_POOL_SIZE=72GiB
ENV CUDF_SPILL=0

# Or copy an env file and source it at container start. Note that a
# `RUN source ...` only affects that single build step; it does not
# persist into the running container.
COPY nemo-curator.env /etc/nemo-curator.env
ENTRYPOINT ["/bin/bash", "-c", "set -a && source /etc/nemo-curator.env && set +a && exec \"$@\"", "--"]
```
Validation Script#
```python
#!/usr/bin/env python3
"""Validate NeMo Curator environment variables."""
import os
import sys


def validate_environment() -> bool:
    """Validate environment variable configuration."""
    required_vars = {
        "DEVICE": ["cpu", "gpu"],
        "PROTOCOL": ["tcp", "ucx"],
    }
    recommended_vars = {
        "LOGDIR": str,
        "RMM_WORKER_POOL_SIZE": str,
        "CUDF_SPILL": ["0", "1"],
    }

    issues = []

    # Check required variables
    for var, valid_values in required_vars.items():
        value = os.getenv(var)
        if not value:
            issues.append(f"Missing required variable: {var}")
        elif value not in valid_values:
            issues.append(f"Invalid value for {var}: {value} (valid: {valid_values})")

    # Check recommended variables
    for var, expected in recommended_vars.items():
        value = os.getenv(var)
        if not value:
            print(f"⚠ Recommended variable not set: {var}")
        elif isinstance(expected, list) and value not in expected:
            issues.append(f"Invalid value for {var}: {value} (valid: {expected})")
        else:
            print(f"✓ {var} = {value}")

    # GPU-specific validation
    if os.getenv("DEVICE") == "gpu":
        for var in ("RMM_WORKER_POOL_SIZE", "CUDF_SPILL"):
            if not os.getenv(var):
                issues.append(f"GPU mode requires {var} to be set")

    # Report results
    if issues:
        print("❌ Environment validation failed:")
        for issue in issues:
            print(f"  - {issue}")
        return False
    print("✅ Environment validation passed")
    return True


if __name__ == "__main__":
    sys.exit(0 if validate_environment() else 1)
```