Release Process for Monorepo-Native Services
This page covers how a service in the NVCF umbrella monorepo gets a new release: how versions are computed, what triggers a release, what gets published (OCI image, Helm chart), and how to add a new service to the release pipeline.
Pre-cutover services kept their own repo with manual chart-version / image-version bumps. Once a service moves to the umbrella, it joins this shared auto-versioning machinery.
No human git tag step. Steps 3-5 are automatic once the MR merges.
Each service has its own version line. The git tag format is:
<service-name>-v<X.Y.Z>
e.g. nvcf-unbound-v0.7.18, nvcf-ratelimiter-v1.13.0,
nvcf-grpc-proxy-v0.4.2.
The prefix is required because all services share one repo’s tag
namespace: a bare v0.7.18 from nvcf-unbound would collide with the
next v0.7.18 from ratelimiter. semantic-release-monorepo uses
this prefix to find each service’s previous release tag, and scopes
commits by subtree path when computing the next version.
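A minimal sketch of the tag convention, for illustration only. The helper names make_tag and parse_tag are hypothetical and not part of the umbrella tooling; the regular expression assumes plain X.Y.Z versions with no pre-release suffix.

```python
import re

# Assumed tag shape from this page: <service-name>-v<X.Y.Z>.
TAG_RE = re.compile(r"^(?P<service>.+)-v(?P<version>\d+\.\d+\.\d+)$")


def make_tag(service: str, version: str) -> str:
    """Build the per-service git tag for a version."""
    return f"{service}-v{version}"


def parse_tag(tag: str):
    """Split a tag into (service, version); return None for a bare tag."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    return match.group("service"), match.group("version")


print(make_tag("nvcf-unbound", "0.7.18"))
print(parse_tag("nvcf-grpc-proxy-v0.4.2"))
print(parse_tag("v0.7.18"))  # bare tag: no service prefix to attribute it to
```

A bare v0.7.18 parses to None here, which mirrors why the prefix is needed: nothing in the tag says which service it belongs to.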
The version itself is computed by semantic-release from
Conventional Commits since the previous tag. The release rules per
service (see tools/generate-subproject-ci/main.go for the
.releaserc.json template) currently bump:
- feat: -> minor
- fix: / chore: / refactor: / style: / docs: / ci: / perf: -> patch
This is more aggressive than vanilla semantic-release defaults
(feat / fix / perf only); the umbrella opts in to releasing on
chore + docs + ci so internal-only refactors still get a fresh image
and chart that match the latest commit. If you do not want a commit
to trigger a release, use a commit subject that does not match any
of these types (e.g. wip: or no type at all).
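The rules above can be sketched as a small classifier. This is an illustration of the mapping described on this page, not the commit-analyzer's implementation: the function name release_type is hypothetical, and breaking-change markers (which this page does not cover) are deliberately not modelled.

```python
import re

# Release rules as listed on this page.
MINOR_TYPES = {"feat"}
PATCH_TYPES = {"fix", "chore", "refactor", "style", "docs", "ci", "perf"}

# Conventional Commits subject: type, optional (scope), then ": ".
SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?: ")


def release_type(subject: str):
    """Return "minor", "patch", or None when the subject triggers no release."""
    match = SUBJECT_RE.match(subject)
    if match is None:
        return None  # no recognizable type at all
    commit_type = match.group("type")
    if commit_type in MINOR_TYPES:
        return "minor"
    if commit_type in PATCH_TYPES:
        return "patch"
    return None  # e.g. wip: is not in the rules


for subject in [
    "feat(grpc-proxy): add retry budget",
    "docs: fix typo in README",
    "wip: half-finished refactor",
    "update readme",
]:
    print(subject, "->", release_type(subject))
```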
The umbrella holds many services in one repo, so the commit analyzer
needs to know which commits are “for” each service. That filtering
comes from semantic-release-monorepo (loaded via
"extends": "semantic-release-monorepo" in the generated
.releaserc.json).
Two pieces of the wiring make path scoping work:
- The compute-next-release-version-<svc> script runs
cd "$SUBTREE" before writing package.json and the
.releaserc.json. That places the npm package root inside the
service subtree.
- semantic-release-monorepo infers the package directory from the
package.json location, then runs git log --follow -- <path>
under that subtree to enumerate the commits the analyzer sees.
Commits that don’t touch files under the subtree are ignored when
computing the next version. A fix(grpc-proxy): ... commit cannot
release nvcf-unbound, and vice versa. Cross-subtree changes (e.g. an
edit in tools/ci/ plus an edit in src/<svc>/) appear in every
service whose change_paths overlap with the diff, which is why
the umbrella generator config restricts most service change_paths
to the service’s own subtree.
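The effect of path scoping can be sketched as a filter over commits. This is a simplified model under stated assumptions: the function names and the file paths in the sample data are hypothetical, and the real filtering is done by semantic-release-monorepo, not by code like this.

```python
from pathlib import PurePosixPath


def touches_subtree(changed_files, subtree: str) -> bool:
    """True when at least one changed file lives under the subtree."""
    root = PurePosixPath(subtree)
    return any(root in PurePosixPath(path).parents for path in changed_files)


def commits_for_service(commits, subtree: str):
    """Keep only the commit ids whose diff touches the service subtree."""
    return [sha for sha, files in commits if touches_subtree(files, subtree)]


# Hypothetical history: (commit id, files changed).
history = [
    ("a1", ["src/nvcf-unbound/main.go"]),
    ("b2", ["src/grpc-proxy/server.go"]),
    ("c3", ["tools/ci/lint.yml", "src/nvcf-unbound/BUILD.bazel"]),
]

print(commits_for_service(history, "src/nvcf-unbound"))
print(commits_for_service(history, "src/grpc-proxy"))
```

Commit c3 is a cross-subtree change: it counts for nvcf-unbound because part of its diff is under that subtree, even though it also edits tools/ci/.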
The version stamps every release artifact in lockstep:
- OCI image: nvcr.io/<org>/<image>:<X.Y.Z> (also latest,
<short-sha>)
- Helm chart: Chart.yaml’s version and appVersion are both stamped
to <X.Y.Z> at package time (the in-tree Chart.yaml carries a
placeholder; the packaged .tgz gets the real semver)
- Git tag: <service-name>-v<X.Y.Z>
Every service that opts into the release pipeline emits this fixed
set of jobs in the generated YAML
(tools/ci/generated-release-jobs.yml):
compute-next-release-version-<svc> (stage: init)
Runs semantic-release --dry-run --no-ci to compute the next
version. Writes a NEXT_VERSION dotenv artifact that later jobs
consume. Runs on:
- MR pipelines
- default-branch pipelines
- tag pipelines ($CI_COMMIT_TAG =~ /^<service-name>-v/); the
script falls back to ${CI_COMMIT_TAG#<service-name>-v} when
there is no new release-worthy commit
<svc>-bazel (stage: Prerequisites)
bazel test //... plus bazel build :image_index (and chart
target if applicable). Runs on MR + default-branch + scheduled +
web pipelines.
<svc>-image-push (stage: Prerequisites)
On MR pipelines: runs bazel build :image_push_*. This invokes
Bazel’s analyze + assemble graph (which fetches the base image and
exercises the docker auth setup) but does NOT execute the
oci_push action. Nothing is published to NGC from MR pipelines.
On default-branch and tag pipelines: runs bazel run :image_push_*.
This executes the oci_push action and publishes to NGC.
The build-not-push split is what catches docker-auth-scope bugs and
base-image-availability bugs in MR review, instead of post-merge on
main (the original NVCF-10337 scenario). The docker_auth_path
scoping in tools/ci/subproject-validations.yaml keeps the push
token off the public base-image pull.
To be unambiguous: an MR opening today does NOT push an image anywhere. Only merges to main (or tag pushes) publish.
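The build-versus-run split can be summarized as a single decision. A minimal sketch, assuming hypothetical pipeline labels ("merge_request", "default_branch", "tag"); the generated job makes this choice through GitLab rules, not through a function like this.

```python
def image_push_command(pipeline_kind: str, target: str):
    """Return the Bazel invocation the image-push job would use."""
    if pipeline_kind == "merge_request":
        # Build only: exercises auth setup and base-image fetch, publishes nothing.
        return ["bazel", "build", target]
    if pipeline_kind in ("default_branch", "tag"):
        # Run: executes the oci_push action and publishes to NGC.
        return ["bazel", "run", target]
    raise ValueError(f"unknown pipeline kind: {pipeline_kind}")


print(image_push_command("merge_request", ":image_push_devops"))
print(image_push_command("default_branch", ":image_push_devops"))
```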
helm-package-<svc> (stage: publish)
Optional, only emitted when the service declares a release.helm
block. Wraps cds/cds-components/helm-package@0.16.6 with
chart-version: ${NEXT_VERSION}. The packaged .tgz carries the
semver from compute-next-release-version-<svc> regardless of what
Chart.yaml in git says.
helm-push-<svc>-<target> (stage: publish)
Optional, one job per entry in release.helm.push_targets. Wraps
cds/cds-components/helm-push-ngc@0.16.6 to push the packaged chart
to the named NGC org. Default ngc-duplicate: skip causes the
cds-component to swallow NGC’s “version already exists” rejection
on re-push, so the tag pipeline’s duplicate publish exits 0 without
actually replacing the chart. See “Known shortcomings and follow-ups”
below for the registry-layer truth.
semantic-release-<svc> (stage: publish)
Runs semantic-release --no-ci (real run, not dry). Creates the
git tag and GitLab Release if there is a release-worthy commit. The
resource_group: semantic-release-notes setting serializes
per-service runs across the project, avoiding the
refs/notes/semantic-release push race that hits when multiple
services release on one pipeline.
<svc>-slack-notify (stage: Snapshot)
Optional, only when the service declares a slack_channel. Posts
to the configured Slack channel via backstage-helper after the tag
pipeline completes.
<svc>-sonarqube-analysis (stage: Prerequisites)
Optional, only when the service declares a sonarqube_project_key.
Runs sonar-scanner against the subtree on MR + default-branch
pipelines.
A developer opens an MR with a commit
fix(nvcf-unbound): handle empty A record edge case.
MR pipeline runs:
- nvcf-unbound-bazel: builds and tests
- compute-next-release-version-nvcf-unbound: dry-run says the next
version would be 0.7.19 (patch bump from fix:)
- nvcf-unbound-image-push: bazel build :image_push_devops etc.
validates the auth setup and base image fetch but does not push
- nvcf-unbound-sonarqube-analysis: scans the subtree
The MR gets reviewed and merged.
Default-branch pipeline (commit on main) runs:
1. compute-next-release-version-nvcf-unbound recomputes
NEXT_VERSION=0.7.19
2. nvcf-unbound-bazel builds + tests
3. nvcf-unbound-image-push pushes
nvcr.io/nv-ngc-devops/nvcf-unbound:0.7.19 and
nvcr.io/0651155215864979/ncp-dev/nvcf-unbound:0.7.19
4. helm-package-nvcf-unbound packages nvcf-unbound-0.7.19.tgz
5. helm-push-nvcf-unbound-{ncp-dev,nv-ngc-devops,nvcf-internal}
push the chart to all three NGC orgs
6. semantic-release-nvcf-unbound creates the git tag
nvcf-unbound-v0.7.19 and a matching GitLab Release
7. nvcf-unbound-slack-notify posts to #nv-nvcf-cicd
Tag pipeline (triggered by step 6) runs jobs 1-5 again. The re-pushes are NOT clean idempotent operations:
- Helm chart: NGC rejects the duplicate chart version. The
helm-push-ngc cds-component swallows that
rejection because the umbrella sets ngc-duplicate: skip
(default). Result: the second helm-push exits 0 without
publishing. If an admin had set ngc-duplicate: overwrite, the
re-push would actually overwrite; fail would fail the job.
- OCI image: oci_push will attempt to move the tag pointer to
the new manifest. Most NGC orgs accept this overwrite; some
reject. We rely on the org being permissive. The first push (on
the default-branch pipeline) is the canonical one.
semantic-release-nvcf-unbound does not re-run on the tag pipeline
(its rule scopes to default-branch only); only the publish jobs
re-fire.
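The three ngc-duplicate settings described above can be modelled as a small decision table. This is a sketch of the behavior as this page describes it, not the cds-component's code; the function name and the outcome strings are hypothetical.

```python
def helm_repush_outcome(version_exists: bool, ngc_duplicate: str = "skip") -> str:
    """Model what a helm-push job does for one chart version."""
    if not version_exists:
        return "published"  # first push: nothing to collide with
    if ngc_duplicate == "skip":
        return "skipped"  # rejection swallowed, job exits 0, chart unchanged
    if ngc_duplicate == "overwrite":
        return "overwritten"  # re-push replaces the chart
    if ngc_duplicate == "fail":
        return "failed"  # job fails on the duplicate
    raise ValueError(f"unknown ngc-duplicate mode: {ngc_duplicate}")


# Default-branch pipeline pushes first, then the tag pipeline re-pushes.
print(helm_repush_outcome(version_exists=False))
print(helm_repush_outcome(version_exists=True))
```

With the default setting, the second call reports "skipped": the job looks green while the registry content is whatever the first push wrote.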
To register a new service for auto-versioned releases:
Add a release: block in tools/ci/subproject-validations.yaml
Image-only example (no Helm chart):
Image + Helm example (see nvcf-unbound for the working version):
Commit the regenerated tools/ci/generated-release-jobs.yml along
with the validations change. The check-release-pipeline-generated
CI job verifies the file is in sync.
Add a <svc>-bazel job in the umbrella .gitlab-ci.yml
Mirror an existing entry (e.g. nats-auth-callout-bazel) and set
SUBTREE and the changes: filter.
Anchor the previous release (version tag plus refs/notes/semantic-release)
If the service had previous releases from its upstream repo, anchor
the upstream’s last version into the umbrella’s history so
semantic-release picks up from a recognized starting point.
Two refs are required, not one: the version tag AND a note on
the same commit under refs/notes/semantic-release. semantic-release
locates the previous release by matching the tag prefix, but it
also reads refs/notes/semantic-release to know which commits have
been “released” already. With only the tag (no note), the
release-notes-generator plugin can misbehave: empty changelogs,
duplicate release entries, or in the worst case a re-publish of
commits already shipped under the old prefix. The nvcf-unbound
1.0.0 regression in May 2026 was caused in part by a missing
anchor; we then pushed only the tag and the next bump still went
sideways for a different reason.
The full anchor procedure for a service whose last upstream
release was <last-upstream-version>:
If the service is genuinely new (no prior releases), skip the
anchor entirely. The first release will be 1.0.0
(semantic-release default).
- The tag lets semantic-release-monorepo’s tag-prefix lookup find
the predecessor version.
- The note lets release-notes-generator know which commits have
already been published, so the next changelog only contains
commits since the anchor.
Without the note, the next changelog will include every commit in git history (because semantic-release sees no record of a prior release), and the release object will reference all of them.
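As a rough sketch of the two-ref anchor, the helper below only builds the git command lines; it does not run them. Everything here is an assumption to verify before use: the function name is hypothetical, the note payload {"channels": [null]} is what semantic-release is understood to write for the default channel, and the exact commit to anchor on is for you to choose.

```python
import json


def anchor_commands(service: str, last_version: str, commit: str, remote: str = "origin"):
    """Build (not run) the git commands that create and push both anchor refs."""
    tag = f"{service}-v{last_version}"
    # Assumed note payload for the default release channel.
    note = json.dumps({"channels": [None]})
    return [
        ["git", "tag", tag, commit],
        ["git", "notes", "--ref", "semantic-release", "add", "-f", "-m", note, commit],
        ["git", "push", remote, f"refs/tags/{tag}"],
        ["git", "push", remote, "refs/notes/semantic-release"],
    ]


# Hypothetical service name and commit id, for illustration only.
for argv in anchor_commands("example-svc", "0.5.8", "abc1234"):
    print(" ".join(argv))
```

Building the commands as lists keeps the sketch side-effect free; review the printed lines and run them by hand against the commit you intend to anchor.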
The cutover MR’s commit type decides whether the first umbrella-managed release bumps minor or patch:
- feat(<svc>): cut over to monorepo-native -> feat: -> minor
bump. Example: anchor <svc>-v0.5.8 plus a feat: cutover
commit produces <svc>-v0.6.0, not <svc>-v0.5.9. The patch
line resets to 0.
- chore(<svc>): cut over to monorepo-native -> chore: -> patch
bump. Same anchor produces <svc>-v0.5.9.
Both are defensible. The cutover changes how an artifact is built
and released (new pipeline, new image registry token scoping,
chart version derivation), so calling it feat: carries real
release-flow signal for operators reading the changelog. If you
care more about keeping the patch line continuous with the
upstream’s last release, use chore:.
The first six native cutovers in the umbrella all used feat:
(grpc-proxy, ratelimiter, nats-auth-callout, function-autoscaler,
http-invocation, nvcf-unbound), so each landed on the next minor
above the anchor. Future cutovers should follow the same convention
unless you have a specific reason to prefer patch continuity.
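The arithmetic behind the two cutover choices, as a minimal sketch. The function name bump is hypothetical and only minor and patch bumps are modelled, since those are the two cases this page discusses.

```python
def bump(version: str, release_type: str) -> str:
    """Apply a minor or patch bump to an X.Y.Z version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if release_type == "minor":
        return f"{major}.{minor + 1}.0"  # patch line resets to 0
    if release_type == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unsupported release type: {release_type}")


anchor = "0.5.8"
print("feat: cutover  ->", bump(anchor, "minor"))
print("chore: cutover ->", bump(anchor, "patch"))
```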
Some release job has a hard needs: on a job whose rules do not
include $CI_COMMIT_TAG. Either add the tag rule to the depended-on
job’s rules or mark the needs: as optional: true. Fixed for the
helm-package case in the NVCF-10337 follow-up.
The git describe fallback used to fail on shallow CI clones. The
template now prefers ${CI_COMMIT_TAG#<service-name>-v} first, with
git describe as a fallback. If you still see this, the tag pipeline
fired without a matching <service-name>-v* tag.
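The lookup order can be sketched as follows. This mirrors the order described on this page (dry-run result, then the tag with its prefix stripped, then git describe); the function and parameter names are hypothetical, and the git describe result is passed in as a plain string rather than computed.

```python
def resolve_version(dry_run_version, commit_tag, service, describe_version=None):
    """Pick the version to stamp, trying each source in order."""
    if dry_run_version:
        return dry_run_version  # a release-worthy commit produced a new version
    prefix = f"{service}-v"
    if commit_tag and commit_tag.startswith(prefix):
        # Same effect as the shell expansion ${CI_COMMIT_TAG#<service-name>-v}.
        return commit_tag[len(prefix):]
    return describe_version  # last resort; may be None on a shallow clone


print(resolve_version("0.7.19", None, "nvcf-unbound"))
print(resolve_version(None, "nvcf-unbound-v0.7.19", "nvcf-unbound"))
print(resolve_version(None, "nvcf-grpc-proxy-v0.4.2", "nvcf-unbound"))
```

The last call returns None: a tag for a different service does not match the prefix, which corresponds to the "tag pipeline fired without a matching tag" case.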
Two pipelines (default-branch + tag) both fired the slack-notify
rule. The template now scopes slack-notify to $CI_COMMIT_TAG only
so it fires once. If you see two, the rule must have an unintended
$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH entry.
Multiple semantic-release-<svc> jobs raced on the shared notes
ref. The template sets resource_group: semantic-release-notes so
GitLab serializes them. Retrying the failed job usually succeeds
since the ref has settled by then.
The push token’s scope leaked to the public base-image pull. Make
sure docker_auth_path on the affected target is set to the
push destination’s full path (e.g.
nvcr.io/nv-ngc-devops/<image>), not just nvcr.io. Without the
scope, rules_oci applies the scoped push token to every nvcr.io
URL it pulls during the build, including the public distroless
base, and 403s. Fixed in NVCF-10337 / !294.
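A toy model of why the scope matters, assuming credentials are matched to image repositories by path prefix. That matching rule is an assumption made for illustration, not a description of rules_oci internals, and the base-image path below is hypothetical.

```python
def credential_applies(auth_path: str, image_repo: str) -> bool:
    """True when a credential scoped to auth_path would be sent for image_repo."""
    scope = auth_path.rstrip("/")
    return image_repo == scope or image_repo.startswith(scope + "/")


PUSH_REPO = "nvcr.io/nv-ngc-devops/nvcf-unbound"
BASE_REPO = "nvcr.io/example-public/distroless-base"  # hypothetical public base

# Broad scope: the push token is also offered for the public base pull.
print(credential_applies("nvcr.io", BASE_REPO))
# Full-path scope: the token stays on the push destination only.
print(credential_applies(PUSH_REPO, BASE_REPO))
print(credential_applies(PUSH_REPO, PUSH_REPO))
```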
Items the current shape works around or punts; tracked here so the next contributor can either improve them or know why we live with them.
CI_JOB_TOKEN
compute-next-release-version-<svc> and semantic-release-<svc>
both fetch a token from kv/gitlab/semantic-release/gl-token via
the vault-reader template. That token has broader scope than ideal
and the secret has to live in vault.
CI_JOB_TOKEN is the more secure default (auto-scoped, ephemeral),
but it doesn’t work with @semantic-release/gitlab because the
plugin needs API permissions (create release objects, push tags,
write notes) that CI_JOB_TOKEN lacks on protected branches
without explicit project allow-list config.
The standard workaround is to drop @semantic-release/gitlab
entirely: use semantic-release only to compute the next version,
then create the GitLab release object via the official release-cli
(which accepts CI_JOB_TOKEN) in a follow-up job. This would
eliminate the vault dependency for releases. Not done yet; happy to
take a patch.
NGC’s helm registry rejects republishing the same chart version
(immutability). The umbrella’s helm-push-ngc jobs default to
ngc-duplicate: skip, which means the cds-component silently
swallows the rejection on re-push attempts. Net behavior: the
second push exits 0 and nothing changes, which looks idempotent
from the pipeline’s perspective but isn’t truly idempotent at the
registry layer.
This matters when:
- Re-publishing an existing chart version requires
ngc-duplicate: overwrite on the component
call AND manual coordination with NGC owners. Not supported by
the current generator output.
oci_push will attempt to move the tag pointer on a re-push.
Different NGC orgs configure registry immutability differently;
some accept overwrites, some reject. The umbrella relies on the
target orgs being permissive. The first push (default-branch
pipeline) is the canonical one; tag-pipeline re-push is best-effort
and we accept its outcome silently.
- tools/ci/subproject-validations.yaml: source of truth for which
services have a release block and how they are configured.
- tools/generate-subproject-ci/main.go: the generator that emits
the YAML in tools/ci/generated-release-jobs.yml.
- tools/ci/generated-release-jobs.yml: generated output the
umbrella .gitlab-ci.yml includes; do not hand-edit.
- BAZEL.md at the repo root: Bazel-build-related conventions.
- deploy/stacks/self-managed/.gitlab-ci.yml: the self-managed
stack’s release flow (helmfile-based bundle, not individual chart
push). Different shape from service releases; same semantic-release
driver.