Release Developer Guide#
Overview#
Our release cycle spans two months. During this window, we develop and land features through a series of Release Candidates (RCs), then enter a code-freeze period for stabilization, followed by the final release.
Release Candidate Cadence#
New RCs are cut every Saturday, when the weekly pipeline runs.
| RC | Approximate Timing | Key Activity |
|---|---|---|
| RC0 | Week 1 (7th–10th) | Major dependency bump: NGC PyTorch |
| RC1 | Week 2 | Dependency bump: TransformerEngine |
| RC2 | Week 3 | Feature development continues |
| RC3 | Week 4 | Code-freeze begins |
| | Week 5 | Bug fixes, small improvements |
| | Week 6 | Bug fixes, small improvements |
| | Week 7 | QA exit, release |
RC0 through RC2 make up the feature development phase, during which new features are actively landed. Stabilization begins at RC3 with code-freeze.
From RC3 onward, RCs are cut more frequently and as needed, rather than strictly on Saturdays.
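The weekly cadence for RC0 through RC3 can be sketched as a small date calculation. This is only an illustration: the helper names `first_saturday_on_or_after` and `weekly_rc_dates` are hypothetical, and the sketch assumes RC0 falls on the first Saturday on or after a given cycle start date, which the guide does not state explicitly. RCs cut "as needed" after RC3 are not modeled.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday() counts Monday as 0, so Saturday is 5


def first_saturday_on_or_after(day: date) -> date:
    """Return the first Saturday that falls on or after `day`."""
    return day + timedelta(days=(SATURDAY - day.weekday()) % 7)


def weekly_rc_dates(cycle_start: date, count: int = 4) -> dict[str, date]:
    """Map RC0..RC(count-1) to consecutive Saturdays.

    Hypothetical helper: assumes RC0 is cut on the first Saturday on or
    after `cycle_start`, and one RC follows each week after that.
    """
    first = first_saturday_on_or_after(cycle_start)
    return {f"RC{i}": first + timedelta(weeks=i) for i in range(count)}


if __name__ == "__main__":
    for name, cut_date in weekly_rc_dates(date(2025, 6, 2)).items():
        print(name, cut_date.isoformat())
```

The start date passed in is an arbitrary example, not a real release date.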
Golden Values#
Golden values are reference outputs used to validate model behavior in CI.
During the RC Phase (before code-freeze)#
Golden values are updated selectively, in either of two cases:
The new values represent an improvement.
The team collectively decides that a regression is acceptable.
This means golden values are not automatically updated with every run — a deliberate decision is required for any regression.
On the Release Branch (during code-freeze)#
When the release branch is created at code-freeze, all golden values are updated unconditionally. Whatever the current output is becomes the new reference baseline for the release.
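The two update policies can be summarized as a single decision function. This is a minimal sketch of the rules described above, not the project's actual tooling: the `Phase` enum, the function name, and its parameters are all hypothetical, and the "improvement" and "team accepted the regression" inputs stand in for judgments that people make.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical labels for the two situations described above."""
    RC = "rc"                            # RC phase, before code-freeze
    RELEASE_BRANCH_CUT = "release_cut"   # release branch created at code-freeze


def should_update_golden(phase: Phase,
                         is_improvement: bool,
                         regression_accepted: bool = False) -> bool:
    """Decide whether a golden value should be overwritten.

    At release-branch creation every value is refreshed unconditionally.
    During the RC phase an update needs either an improvement or an
    explicit team decision to accept the regression.
    """
    if phase is Phase.RELEASE_BRANCH_CUT:
        return True
    return is_improvement or regression_accepted


if __name__ == "__main__":
    # A regression nobody has signed off on is not accepted during the RC phase.
    print(should_update_golden(Phase.RC, is_improvement=False))
    # The same regression becomes the baseline when the release branch is cut.
    print(should_update_golden(Phase.RELEASE_BRANCH_CUT, is_improvement=False))
```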
Code-Freeze#
Code-freeze lasts two weeks and begins when RC3 is cut. This is the stabilization phase — no new features are landed.
First Half#
Release branches are created.
All golden values on the release branch are updated unconditionally (see above).
The last bulk CI run occurs one week into the code-freeze period.
RCs continue to be cut as needed.
Second Half#
Engineers are responsible for updating golden values on the release branch — reviewing any remaining discrepancies and ensuring the suite is in a clean state ahead of release.
RCs continue to be cut as needed.
Release Day#
The release goes out on the first Wednesday after the code-freeze window ends.
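As an illustration, the release date can be derived from the last day of the code-freeze window. The function name `release_day` is hypothetical, and the sketch reads "after" as strictly after: if the window happens to end on a Wednesday, the following Wednesday is returned. That reading is an assumption, not something the guide specifies.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday() counts Monday as 0, so Wednesday is 2


def release_day(code_freeze_end: date) -> date:
    """Return the first Wednesday strictly after `code_freeze_end`."""
    days_ahead = (WEDNESDAY - code_freeze_end.weekday()) % 7
    if days_ahead == 0:
        # The window ends on a Wednesday: assume the next one is meant.
        days_ahead = 7
    return code_freeze_end + timedelta(days=days_ahead)


if __name__ == "__main__":
    # Example input only: a code-freeze window ending on a Saturday.
    print(release_day(date(2025, 7, 12)).isoformat())
```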
CI and Known Failures#
Ticket-Annotated Tests#
Failing CI tests can be linked to a tracking ticket. When a test fails with the same error code as the one recorded on its linked ticket, CI reports it as “passing, with known error” rather than a hard failure.
This means a green CI result does not guarantee a fully healthy test suite — it means there are no unexpected failures.
Important: Keeping Annotations Up to Date#
Ticket annotations must be actively maintained in both directions:
Add a ticket annotation when a test starts failing with a known, accepted error.
Remove the ticket annotation when the test heals.
If a test recovers but its ticket annotation is not removed, CI will report it as failing — because the actual error code no longer matches the one on record. The test being healthy is not enough; the annotation must be cleaned up for CI to go green again.
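The reporting rules above, including the stale-annotation case, can be sketched as a small classifier. This is not the CI system's real implementation: `RunResult`, `ci_status`, and the use of integer error codes are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunResult:
    """Hypothetical record of one test run and its ticket annotation."""
    error_code: Optional[int]                # None means the test passed
    ticket_error_code: Optional[int] = None  # None means no ticket annotation


def ci_status(run: RunResult) -> str:
    """Classify a run following the rules described above."""
    if run.ticket_error_code is None:
        # No annotation: ordinary pass/fail reporting.
        return "pass" if run.error_code is None else "fail"
    if run.error_code == run.ticket_error_code:
        # The test fails exactly as recorded on the linked ticket.
        return "passing, with known error"
    # Either the test fails with a different error, or it has healed and
    # the annotation is stale. Both are reported as failures.
    return "fail"


if __name__ == "__main__":
    print(ci_status(RunResult(error_code=137, ticket_error_code=137)))
    print(ci_status(RunResult(error_code=None, ticket_error_code=137)))
```

The second call shows why a healed test still needs its annotation removed: the run itself succeeded, yet the mismatch with the recorded error code keeps CI red.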