DAPO#

Dual-Clip Asymmetric Policy Optimization (DAPO) extends GRPO by allowing asymmetric clipping with distinct minimum and maximum clip parameters. This provides more fine-grained control over policy updates.

DAPO is implemented through the same ClippedPGLossFn as GRPO, but with the ability to set different values for ratio_clip_min and ratio_clip_max. For standard GRPO/PPO, these parameters are set to the same value.

Key Differences from GRPO#

  • Asymmetric Clipping: DAPO allows ratio_clip_minratio_clip_max, providing asymmetric bounds on the probability ratio

  • Same Infrastructure: Uses the same training infrastructure and configurations as GRPO

DAPO Single Node#

To run DAPO on a single GPU, use the GRPO script with asymmetric clip parameters:

# Run DAPO with asymmetric clipping
uv run python examples/run_grpo_math.py \
  policy.model_name="Qwen/Qwen2.5-1.5B" \
  grpo.ratio_clip_min=0.15 \
  grpo.ratio_clip_max=0.25 \
  checkpointing.checkpoint_dir="results/dapo_math" \
  logger.wandb_enabled=True \
  logger.wandb.name="dapo-math"

For multi-GPU setups:

uv run python examples/run_grpo_math.py \
  cluster.gpus_per_node=8 \
  grpo.ratio_clip_min=0.15 \
  grpo.ratio_clip_max=0.25 \
  checkpointing.checkpoint_dir="results/dapo_8gpu" \
  logger.wandb_enabled=True \
  logger.wandb.name="dapo-8gpu"

DAPO Multi-node#

DAPO can be run on multiple nodes using the same approach as GRPO:

# Run from the root of NeMo RL repo
NUM_ACTOR_NODES=2

COMMAND="uv run ./examples/run_grpo_math.py \
  --config examples/configs/grpo_math_8B.yaml \
  cluster.num_nodes=2 \
  grpo.ratio_clip_min=0.15 \
  grpo.ratio_clip_max=0.25 \
  checkpointing.checkpoint_dir='results/dapo_2nodes' \
  logger.wandb_enabled=True \
  logger.wandb.name='dapo-multinode'" \
CONTAINER=YOUR_CONTAINER \
MOUNTS="$PWD:$PWD" \
sbatch \
    --nodes=${NUM_ACTOR_NODES} \
    --account=YOUR_ACCOUNT \
    --job-name=YOUR_JOBNAME \
    --partition=YOUR_PARTITION \
    --time=4:0:0 \
    --gres=gpu:8 \
    ray.sub

Configuration#

DAPO uses the same configuration structure as GRPO. The key parameters are:

grpo:
  ratio_clip_min: 0.15  # Minimum clip value (can be different from max)
  ratio_clip_max: 0.25  # Maximum clip value (can be different from min)
  # ... other GRPO parameters ...

For more details on other configuration options, refer to the GRPO documentation.

Additional Resources#