# Success Criteria

Use these criteria to validate that your Gym integration is working correctly. A successful integration must pass all validation benchmarks.

> **Tip:** These success criteria may evolve as new integration challenges are discovered. Check this page for updates when troubleshooting integration issues.
## Validation Checklist

### 1. Component Form Factor

Verify that your integration implements all required components as specified in Integration Footprint:

- OpenAI-compatible HTTP server
- On-policy token ID fixes
- Gym spinup and integration
- Rollout orchestration
- Training loop integration
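For the OpenAI-compatible HTTP server, a quick offline smoke test is to check that a response body from your chat completions endpoint has the basic shape of an OpenAI Chat Completions response. The sketch below checks only a subset of the public response fields (`id`, `object`, `created`, `model`, `choices[].message`, `choices[].finish_reason`). `validate_chat_completion` is a hypothetical helper, and the specification in Integration Footprint may require additional fields that it does not check.

```python
from __future__ import annotations

from typing import Any


def validate_chat_completion(resp: dict[str, Any]) -> list[str]:
    """Return the problems found in a chat completions response body (empty list if none).

    Hypothetical helper: checks only a subset of the OpenAI response fields.
    """
    problems: list[str] = []
    if resp.get("object") != "chat.completion":
        problems.append('object must be "chat.completion"')
    for key in ("id", "model"):
        if not isinstance(resp.get(key), str):
            problems.append(f"{key} must be a string")
    if not isinstance(resp.get("created"), int):
        problems.append("created must be an integer Unix timestamp")

    choices = resp.get("choices")
    if not isinstance(choices, list) or not choices:
        problems.append("choices must be a non-empty list")
        return problems

    for i, choice in enumerate(choices):
        message = choice.get("message") if isinstance(choice, dict) else None
        if not isinstance(message, dict):
            problems.append(f"choices[{i}].message must be an object")
            continue
        if message.get("role") != "assistant":
            problems.append(f'choices[{i}].message.role must be "assistant"')
        content = message.get("content")
        # content may be null, for example when the model returns tool calls
        if content is not None and not isinstance(content, str):
            problems.append(f"choices[{i}].message.content must be a string or null")
        if "finish_reason" not in choice:
            problems.append(f"choices[{i}] is missing finish_reason")
    return problems
```

Feed it the parsed JSON body that your server returns for a POST to `/v1/chat/completions`; an empty list only means the basic shape matches, not that the full specification is met.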
### 2. Environment Configuration

Verify that your integration can load and run arbitrary Gym training environments through configuration:

- Environment configuration loads from YAML
- Multiple environments can be selected at runtime
- Environment parameters are configurable without code changes
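One way to meet these three requirements is to keep every environment's parameters in the YAML file and let the caller pick environments by name at launch. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that it has a top-level `environments` mapping. That key, the environment names and parameters, and the `select_environments` helper are hypothetical, not Gym's actual configuration schema.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def select_environments(
    config: dict[str, Any],
    names: list[str],
    overrides: dict[str, dict[str, Any]] | None = None,
) -> dict[str, dict[str, Any]]:
    """Pick environments by name from a parsed config and apply per-environment overrides."""
    available = config["environments"]
    unknown = [name for name in names if name not in available]
    if unknown:
        raise KeyError(f"unknown environments: {unknown}")
    overrides = overrides or {}
    selected: dict[str, dict[str, Any]] = {}
    for name in names:
        params = deepcopy(available[name])  # copy so the loaded config is never mutated
        params.update(overrides.get(name, {}))  # runtime values win over file values
        selected[name] = params
    return selected


# Hypothetical parsed YAML: environment names and parameters are placeholders.
config = {
    "environments": {
        "math": {"max_turns": 1, "temperature": 1.0},
        "workplace_assistant": {"max_turns": 6, "temperature": 1.0},
    }
}
```

Copying each entry before applying overrides keeps the loaded configuration unchanged, so the same config object can launch different environment combinations without a code change.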
### 3. Math Reasoning Benchmark

Train on the DAPO17k math training environment and verify model improvement on AIME24.

| Parameter | Value |
|---|---|
| Training environment | |
| Base model | |
| Minimum training steps | 1,000 |
| Validation set | AIME24 (included with training environment) |
| Target accuracy | ≥85% |
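To check this benchmark, grade each validation answer, aggregate the grades into an accuracy, and compare the trained checkpoint against both the base model and the 85% target. The sketch below assumes you have one or more graded samples per AIME24 problem and reports the mean per-problem accuracy (with one sample per problem this is plain accuracy). The helper names, the problem IDs, and the input layout are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def benchmark_accuracy(graded: dict[str, list[bool]]) -> float:
    """Mean per-problem accuracy; each problem maps to one bool per graded sample."""
    if not graded or any(len(samples) == 0 for samples in graded.values()):
        raise ValueError("every problem needs at least one graded sample")
    return mean(sum(samples) / len(samples) for samples in graded.values())


def check_math_benchmark(
    baseline: dict[str, list[bool]],
    trained: dict[str, list[bool]],
    target: float = 0.85,
) -> dict[str, float | bool]:
    """Compare base and trained checkpoints on the same validation problems."""
    if baseline.keys() != trained.keys():
        raise ValueError("baseline and trained must cover the same problems")
    base_acc = benchmark_accuracy(baseline)
    trained_acc = benchmark_accuracy(trained)
    return {
        "baseline_accuracy": base_acc,
        "trained_accuracy": trained_acc,
        "improved": trained_acc > base_acc,  # model improvement over the base model
        "meets_target": trained_acc >= target,  # the >=85% target accuracy
    }
```

Reporting both flags separately makes it clear whether a failing run improved but fell short of the target, or did not improve at all.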
### 4. Workplace Assistant Benchmark

Train on the workplace assistant environment and verify validation set improvements.

| Parameter | Value |
|---|---|
| Training environment | |
| Base model | |
| Minimum training steps | 100 |
| Success criterion | Observable validation set improvement |
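The table does not define observable improvement numerically. One conservative reading, sketched below under that assumption, is that the score from the last validation run, taken at step 100 or later, exceeds the earliest logged score by a margin you choose. `shows_improvement`, the `(step, score)` history format, and `min_delta` are all hypothetical.

```python
from __future__ import annotations


def shows_improvement(
    history: list[tuple[int, float]],
    min_steps: int = 100,
    min_delta: float = 0.0,
) -> bool:
    """True if the last validation score (at step >= min_steps) beats the earliest one by more than min_delta.

    history holds (training_step, validation_score) pairs in any order.
    """
    if len(history) < 2:
        return False  # need at least two evaluations to observe a change
    ordered = sorted(history)  # sort by training step
    first_score = ordered[0][1]
    last_step, last_score = ordered[-1]
    if last_step < min_steps:
        return False  # run has not reached the minimum number of training steps
    return last_score - first_score > min_delta
```

Comparing the last score rather than the best one avoids counting a single lucky evaluation as improvement; raise `min_delta` if your validation metric is noisy.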
## Troubleshooting

If your integration fails to meet the success criteria:

- Training crashes: Check for off-policy issues. Refer to On-Policy Corrections.
- No improvement: Verify that rollout orchestration correctly tracks token IDs.
- Environment errors: Verify that your OpenAI-compatible HTTP server endpoints match the specification.
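When checking token ID tracking, one failure worth ruling out in multi-turn rollouts is that the prompt for turn k+1 is re-tokenized from text, so its token IDs no longer start with the exact prompt and generated token IDs of turn k; training on such sequences is off-policy. The sketch below assumes each turn is recorded as a `(prompt_ids, generated_ids)` pair and reports the first turn and token position where that prefix property breaks. The record format and `first_prefix_mismatch` are hypothetical; see On-Policy Corrections for how to address a mismatch.

```python
from __future__ import annotations


def first_prefix_mismatch(
    turns: list[tuple[list[int], list[int]]],
) -> tuple[int, int] | None:
    """Return (turn_index, token_position) of the first broken prefix, or None if all turns are consistent.

    Each turn is (prompt_ids, generated_ids). Turn k's prompt must start with
    turn k-1's prompt_ids followed by its generated_ids.
    """
    for k in range(1, len(turns)):
        prev_prompt, prev_generated = turns[k - 1]
        expected_prefix = prev_prompt + prev_generated
        prompt = turns[k][0]
        for pos, token in enumerate(expected_prefix):
            # A prompt that is too short, or that differs at pos, breaks the prefix.
            if pos >= len(prompt) or prompt[pos] != token:
                return k, pos
    return None
```

A non-None result points at the exact token where the rollout history diverged, which is usually enough to tell whether a chat template or a detokenize/retokenize round trip is responsible.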