Success Criteria

Use these criteria to validate that your Gym integration is working correctly. A successful integration must pass all validation benchmarks.

These success criteria may evolve as new integration challenges are discovered. Check this page for updates when troubleshooting integration issues.

Validation Checklist

1. Component Form Factor

Verify that your integration implements all required components as specified in Gym Integration Footprint And Form Factor:

  • OpenAI-compatible HTTP server
  • On-policy token ID fixes
  • Gym spinup and integration
  • Rollout orchestration
  • Training loop integration

2. Environment Configuration

Verify that your integration can load and run arbitrary Gym training environments through configuration:

  • Environment configuration loads from YAML
  • Multiple environments can be selected at runtime
  • Environment parameters are configurable without code changes
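The three properties above can be sketched as follows. This is a minimal illustration, not the Gym configuration schema: the `environments` key, the environment names, the `max_turns` parameter, and the `select_environments` helper are all hypothetical. The config is shown as an already-parsed mapping, such as the result of calling PyYAML's `yaml.safe_load` on a YAML file.

```python
from typing import Any, Dict, List, Mapping, Optional

# Hypothetical parsed config; in practice this would be loaded from a YAML file.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "environments": {
        "math": {"max_turns": 1},
        "workplace_assistant": {"max_turns": 8},
    }
}


def select_environments(
    config: Mapping[str, Any],
    names: List[str],
    overrides: Optional[Mapping[str, Mapping[str, Any]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Return the requested environments with runtime overrides applied."""
    available = config.get("environments", {})
    missing = [name for name in names if name not in available]
    if missing:
        raise KeyError(f"Unknown environments: {missing}")
    overrides = overrides or {}
    # Build new dicts so the loaded config is never mutated by overrides.
    return {name: {**available[name], **overrides.get(name, {})} for name in names}
```

Under these assumptions, `select_environments(EXAMPLE_CONFIG, ["math"], {"math": {"max_turns": 2}})` picks one environment at runtime and changes a parameter without touching code or the loaded config.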

3. Math Reasoning Benchmark

Train on the DAPO17k math training environment and verify model improvement on AIME24.

Parameter               Value
Training environment    DAPO17k math environment
Base model              Qwen3-4B-Instruct-2507
Minimum training steps  1,000
Validation set          AIME24 (included with training environment)
Target accuracy         ≥85%
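One way to turn these parameters into an automated check is sketched below. It assumes your training loop can export a list of (step, accuracy) validation results. The `ValidationPoint` type and `passes_math_benchmark` function are hypothetical names, and reading the criterion as "target reached at or after the minimum step count" is an interpretation, not something this page specifies.

```python
from dataclasses import dataclass
from typing import List

MIN_TRAINING_STEPS = 1_000  # "Minimum training steps" parameter
TARGET_ACCURACY = 0.85      # "Target accuracy" parameter, as a fraction


@dataclass
class ValidationPoint:
    step: int        # training step at which validation ran
    accuracy: float  # AIME24 accuracy as a fraction in [0, 1]


def passes_math_benchmark(history: List[ValidationPoint]) -> bool:
    """True if a validation run at or after the minimum step meets the target."""
    return any(
        point.step >= MIN_TRAINING_STEPS and point.accuracy >= TARGET_ACCURACY
        for point in history
    )
```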

4. Workplace Assistant Benchmark

Train on the workplace assistant environment and verify validation set improvements.

Parameter               Value
Training environment    Workplace assistant environment
Base model              Qwen3-4B-Instruct-2507
Minimum training steps  100
Success criterion       Observable validation set improvement
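"Observable improvement" is not quantified here, so the sketch below shows only one possible reading: compare the mean of the most recent validation scores against the first (pre-training) score. The function name, the `window` smoothing, and the `min_gain` margin are assumptions for illustration.

```python
from statistics import mean
from typing import List


def shows_improvement(
    scores: List[float], window: int = 3, min_gain: float = 0.0
) -> bool:
    """Compare the mean of the last `window` scores with the first score."""
    if len(scores) < 2:
        return False  # need a baseline plus at least one later measurement
    baseline = scores[0]
    recent = scores[1:][-window:]  # never include the baseline itself
    return mean(recent) - baseline > min_gain
```

Averaging over a small window makes the check less sensitive to noise in a single validation run. Set `min_gain` to whatever margin you consider meaningful for your metric.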

Troubleshooting

If your integration fails to meet the success criteria:

  1. Training crashes: Check for off-policy issues. Refer to On-Policy Corrections.
  2. No improvement: Verify that rollout orchestration correctly tracks token IDs.
  3. Environment errors: Verify that the OpenAI-compatible HTTP server endpoints match the specification.
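For the token ID check in step 2, one possible sanity test is sketched below. It assumes multi-turn rollouts in which each turn's prompt should begin with the exact token IDs of the previous turn's prompt and generation. The `Turn` type and its field names are hypothetical, not part of the Gym API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    prompt_token_ids: List[int]      # token IDs the model was prompted with
    generation_token_ids: List[int]  # token IDs the model sampled


def first_inconsistent_turn(turns: List[Turn]) -> Optional[int]:
    """Return the index of the first turn whose prompt drops or alters
    previously seen token IDs, or None if every turn is consistent."""
    for i in range(1, len(turns)):
        prev = turns[i - 1]
        expected_prefix = prev.prompt_token_ids + prev.generation_token_ids
        if turns[i].prompt_token_ids[: len(expected_prefix)] != expected_prefix:
            return i
    return None
```

A non-`None` result suggests that text was re-tokenized between turns, so the token IDs used for training no longer match what the model actually generated.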