nemo_curator.core.serve.constants


Module Contents

Data

DEFAULT_SERVE_HEALTH_TIMEOUT_S

DEFAULT_SERVE_PORT

NOSET_CUDA_RUNTIME_ENV

PLACEMENT_GROUP_READY_TIMEOUT_S

SIGKILL_WAIT_S

SIGTERM_WAIT_S

WORKER_NODE_LABEL

API

nemo_curator.core.serve.constants.DEFAULT_SERVE_HEALTH_TIMEOUT_S = 300
nemo_curator.core.serve.constants.DEFAULT_SERVE_PORT = 8000
nemo_curator.core.serve.constants.NOSET_CUDA_RUNTIME_ENV: dict[str, Any] = {'env_vars': {'RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES': '1'}}

Runtime-env fragment telling Ray not to overwrite the worker’s CUDA_VISIBLE_DEVICES.

CUDA_VISIBLE_DEVICES is already set explicitly in subprocess_env from ray.get_accelerator_ids(), so this flag is largely redundant for the subprocess. It is kept defensively because the canonical vLLM+Ray pattern (vLLM issues #7890/#30016/#35848) relies on it.
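As an illustration of how a runtime-env fragment like this might be combined with other settings, here is a minimal sketch. The constant is copied from the value documented above; `merge_runtime_env` and the `HF_HOME` entry are hypothetical and not part of this module.

```python
import copy
from typing import Any

# Copied from the documented value so the sketch runs without nemo_curator installed.
NOSET_CUDA_RUNTIME_ENV: dict[str, Any] = {
    "env_vars": {"RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES": "1"}
}


def merge_runtime_env(base: dict[str, Any], extra: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: combine two runtime-env dicts.

    env_vars are merged key by key (extra wins on conflict);
    every other top-level key in extra replaces the one in base.
    """
    merged = copy.deepcopy(base)  # never mutate the caller's dict
    for key, value in extra.items():
        if key == "env_vars":
            merged.setdefault("env_vars", {}).update(value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Assumed user-supplied runtime env; the HF_HOME path is only an example.
user_env = {"env_vars": {"HF_HOME": "/tmp/hf"}}
runtime_env = merge_runtime_env(user_env, NOSET_CUDA_RUNTIME_ENV)
```

Merging `env_vars` key by key, rather than overwriting the whole dict, keeps any variables the caller already set alongside the Ray flag.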

nemo_curator.core.serve.constants.PLACEMENT_GROUP_READY_TIMEOUT_S = 180

Default timeout for pg.ready() on a freshly-created placement group.

nemo_curator.core.serve.constants.SIGKILL_WAIT_S = 5

Seconds to wait after SIGKILL before giving up on a subprocess.

nemo_curator.core.serve.constants.SIGTERM_WAIT_S = 10

Seconds to wait for SIGTERM to reap a subprocess before escalating to SIGKILL.
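The two wait constants describe a graceful-then-forceful shutdown. The sketch below shows one way such an escalation could look using only the standard library. `stop_subprocess` is a hypothetical helper, not the function this package uses, and the constants are copied from the values documented here.

```python
import subprocess
import sys

# Copied from the documented values.
SIGTERM_WAIT_S = 10
SIGKILL_WAIT_S = 5


def stop_subprocess(proc: subprocess.Popen) -> bool:
    """Hypothetical helper: return True if the process exited, False if we gave up."""
    if proc.poll() is not None:
        return True  # already exited

    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=SIGTERM_WAIT_S)
        return True
    except subprocess.TimeoutExpired:
        pass  # did not exit in time, escalate

    proc.kill()  # sends SIGKILL on POSIX
    try:
        proc.wait(timeout=SIGKILL_WAIT_S)
        return True
    except subprocess.TimeoutExpired:
        return False  # give up; the process may be stuck in the kernel


# Example: a child that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exited = stop_subprocess(child)
```

A child that handles SIGTERM promptly never reaches the SIGKILL branch, so the worst-case shutdown time under these assumptions is SIGTERM_WAIT_S + SIGKILL_WAIT_S seconds.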

nemo_curator.core.serve.constants.WORKER_NODE_LABEL = {'ray.io/node-type': 'worker'}

Bundle label selector applied when CURATOR_IGNORE_RAY_HEAD_NODE=1.

Anyscale auto-labels head/worker nodes. OSS Ray users must start worker nodes with ray start --labels ray.io/node-type=worker for this to take effect.
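A minimal sketch of how the environment switch could gate the selector follows. This page does not show how the package actually consumes the label, so `bundle_label_selectors` and the one-selector-per-bundle shape are assumptions made for illustration.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Copied from the documented value.
WORKER_NODE_LABEL = {"ray.io/node-type": "worker"}


def bundle_label_selectors(
    num_bundles: int, env: Optional[Mapping[str, str]] = None
) -> Optional[list[dict[str, str]]]:
    """Hypothetical helper: one selector per bundle when the head node is ignored.

    Returns None when CURATOR_IGNORE_RAY_HEAD_NODE is not "1", meaning
    no label constraint is applied.
    """
    env = os.environ if env is None else env
    if env.get("CURATOR_IGNORE_RAY_HEAD_NODE") != "1":
        return None
    # Copy the dict per bundle so callers cannot mutate the shared constant.
    return [dict(WORKER_NODE_LABEL) for _ in range(num_bundles)]


selectors_on = bundle_label_selectors(2, env={"CURATOR_IGNORE_RAY_HEAD_NODE": "1"})
selectors_off = bundle_label_selectors(2, env={})
```

On OSS Ray, a selector like this only matches nodes that were started with the `ray.io/node-type=worker` label, as noted above.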