# Ray Deployment
Distribute evaluations across a Ray cluster. This is aimed at teams already running on NeMo Gym's Ray infrastructure.
## Setup
```bash
pip install -e ".[ray]"
```
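If you do not already have a cluster, you can bring one up locally with the standard Ray CLI (plain Ray, nothing NeMo Evaluator-specific):

```bash
# Start a head node on this machine; additional workers join with
# `ray start --address=<head-ip>:6379`.
ray start --head --port=6379
```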
## CLI
```bash
ray job submit --working-dir . -- python -m nemo_evaluator.engine.ray_launcher \
  --bench gsm8k --shards 8 --repeats 5 \
  --model-url https://integrate.api.nvidia.com/v1 \
  --model-id azure/openai/gpt-5.2 \
  --output-dir ./eval_results/ray
```
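`ray job submit` prints a submission ID that you can use to follow the run with the standard Ray job CLI (the ID below is a placeholder):

```bash
# Stream logs and check status for a submitted job.
ray job logs raysubmit_XXXXXXXXXXXX --follow
ray job status raysubmit_XXXXXXXXXXXX
```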
## Architecture
```mermaid
flowchart TB
    HEAD["Ray Head Node<br/>ray_launcher.py"] --> W0["Worker 0<br/>problems [0, 165)"]
    HEAD --> W1["Worker 1<br/>problems [165, 330)"]
    HEAD --> W7["Worker 7<br/>problems [1155, 1319)"]
    W0 --> FS["Shared Storage"]
    W1 --> FS
    W7 --> FS
    FS --> MERGE["In-process merge"]
    MERGE --> RESULT["Final bundle<br/>eval-*.json"]
```
Each shard runs as a `@ray.remote` task. The head node collects all shard results and merges them locally.
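The contiguous shard ranges in the diagram follow from ceiling division over GSM8K's 1319 test problems. A minimal sketch of that split (illustrative; not necessarily the launcher's exact code):

```python
def shard_bounds(n_problems: int, shard_idx: int, total_shards: int) -> range:
    """Contiguous half-open range of problem indices for one shard."""
    per_shard = -(-n_problems // total_shards)  # ceiling division
    start = shard_idx * per_shard
    return range(start, min(start + per_shard, n_problems))

# 1319 GSM8K problems over 8 shards: shard 0 -> [0, 165), shard 7 -> [1155, 1319)
assert shard_bounds(1319, 0, 8) == range(0, 165)
assert shard_bounds(1319, 7, 8) == range(1155, 1319)
```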
## Python API
```python
import ray

from nemo_evaluator.engine.ray_launcher import run_shard

ray.init()

shards = 8
futures = [
    run_shard.remote(
        benchmark="gsm8k",
        shard_idx=i,
        total_shards=shards,
        model_url="https://integrate.api.nvidia.com/v1",
        model_id="azure/openai/gpt-5.2",
        n_repeats=5,
    )
    for i in range(shards)
]
results = ray.get(futures)  # list of bundle dicts, one per shard
```
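To mirror the in-process merge from the architecture diagram, you can combine the shard bundles yourself. The snippet below assumes each bundle exposes a `results` list; the actual bundle schema may differ:

```python
import json
from pathlib import Path

# Hypothetical merge: concatenate per-shard results into one bundle.
merged = {"benchmark": "gsm8k", "results": []}
for bundle in results:
    merged["results"].extend(bundle.get("results", []))

out = Path("./eval_results/ray/eval-merged.json")  # illustrative filename
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(merged, indent=2))
```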
## Resource requirements
Evaluation is CPU- and network-bound (the evaluator itself needs no GPU):
```python
@ray.remote(num_cpus=2, memory=2 * 1024 * 1024 * 1024)  # 2 CPUs, 2 GiB per task
def run_shard(benchmark, shard_idx, total_shards, model_url, model_id, n_repeats):
    ...
```
Adjust these values based on dataset size and concurrency; the `ModelClient` default is 8 concurrent requests.
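To raise or lower the per-task resources without editing the decorator, Ray's standard `.options()` override works at call time (the values here are illustrative):

```python
# Give each shard 4 CPUs and 4 GiB instead of the decorator defaults.
futures = [
    run_shard.options(num_cpus=4, memory=4 * 1024**3).remote(
        benchmark="gsm8k",
        shard_idx=i,
        total_shards=shards,
        model_url="https://integrate.api.nvidia.com/v1",
        model_id="azure/openai/gpt-5.2",
        n_repeats=5,
    )
    for i in range(shards)
]
```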