Release Notes for NeMo Platform

Check out the latest release notes for the NeMo Platform.

Tip

If you’ve installed one of the previous releases of the NeMo Platform using Helm and want to upgrade, choose one of the following options:

Release 26.3.0

This release is a comprehensive revamp of the NeMo Platform. The architecture, deployment experience, and security model have all been redesigned to improve modularity, ease of use, and enterprise readiness.

New Quickstart Experience

Getting started with NeMo Platform is now faster and simpler. A new Python-based CLI and SDK (nemo-platform) replaces the previous setup flow:

  • Install with a single command: pip install nemo-platform

  • Launch the full platform locally with nmp quickstart up — no Kubernetes required

  • By default, quickstart uses remote NVIDIA inference endpoints, so no GPU is needed on your machine

  • Optionally configure local GPU inference with nmp quickstart configure for full on-device model serving

  • A built-in chat command (nmp chat <model-name>) lets you interact with models immediately after startup
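
The quickstart steps above can be sketched as a short dry-run script. This is only an illustration: the commands are the ones named in these notes, but running nmp quickstart configure before nmp quickstart up is an assumption, and the model name is a placeholder you would replace with a real one.

```python
import shlex


def quickstart_commands(model_name, local_gpu=False):
    """Return the quickstart steps as argv lists.

    Commands are taken from the release notes. Placing `configure`
    before `up` is an assumption, not something the notes state.
    """
    steps = [["pip", "install", "nemo-platform"]]
    if local_gpu:
        # Optional: set up local GPU inference instead of remote endpoints.
        steps.append(["nmp", "quickstart", "configure"])
    steps.append(["nmp", "quickstart", "up"])
    steps.append(["nmp", "chat", model_name])
    return steps


if __name__ == "__main__":
    # Dry run: print the commands rather than executing them.
    for argv in quickstart_commands("<model-name>"):
        print(shlex.join(argv))
```

Printing the commands instead of executing them keeps the sketch safe to run on a machine where nemo-platform is not installed yet.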

Redesigned Helm Chart

The Kubernetes deployment has been consolidated into a single all-in-one Helm chart (nemo-platform), available from the NVIDIA NGC Helm registry:

  • One chart installs the entire NeMo Platform, replacing the previous multi-chart setup

  • Supports on-premises and cloud Kubernetes clusters

  • Configurable add-ons include external databases, persistent volumes, ingress, multi-node networking, and OpenShift compatibility

  • Upgrade and rollback follow standard Helm workflows (helm upgrade, helm rollback)
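
As a rough sketch of the standard Helm workflow mentioned above, the helpers below assemble helm upgrade and helm rollback command lines. The release name, chart reference, namespace, and values file are hypothetical placeholders; these notes do not give the exact NGC chart reference, so substitute the one from your registry setup.

```python
def helm_upgrade_cmd(release, chart, namespace, values_file=None):
    """Build `helm upgrade RELEASE CHART --namespace NS [--values FILE]`."""
    cmd = ["helm", "upgrade", release, chart, "--namespace", namespace]
    if values_file:
        # Optional overrides, e.g. external database or ingress settings.
        cmd += ["--values", values_file]
    return cmd


def helm_rollback_cmd(release, revision, namespace):
    """Build `helm rollback RELEASE REVISION --namespace NS`."""
    return ["helm", "rollback", release, str(revision), "--namespace", namespace]


if __name__ == "__main__":
    # All names below are placeholders, not values from the release notes.
    print(" ".join(helm_upgrade_cmd("my-release", "<chart-reference>", "nemo", "values.yaml")))
    print(" ".join(helm_rollback_cmd("my-release", 1, "nemo")))
```

Building the argument list separately from running it makes it easy to review the exact command before passing it to subprocess.run or pasting it into a terminal.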

Authorization Overhaul

The authorization system has been rebuilt around an embedded policy engine:

  • OPA (Open Policy Agent) policies now run as WebAssembly inside each service process — no external auth sidecar is required

  • All API endpoints are workspace-scoped (/v2/workspaces/{workspace}/...), making multi-tenant access control straightforward

  • Role-based access control policies are compiled to policy.wasm at build time and evaluated at ~5,000 decisions per second

  • Auth can be toggled on or off via configuration, simplifying local development and testing
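
To make the workspace-scoped URL layout and the quoted throughput figure concrete, here is a minimal sketch. The helper names and the "models" resource segment are hypothetical and are not part of the NeMo Platform SDK. The latency helper assumes decisions are evaluated one at a time, which these notes do not confirm.

```python
from urllib.parse import quote

# Prefix taken from the release notes: /v2/workspaces/{workspace}/...
API_PREFIX = "/v2/workspaces"


def workspace_path(workspace, *segments):
    """Build a workspace-scoped API path, percent-encoding each segment."""
    if not workspace:
        raise ValueError("workspace must be non-empty")
    parts = [quote(str(workspace), safe="")]
    parts += [quote(str(segment), safe="") for segment in segments]
    return API_PREFIX + "/" + "/".join(parts)


def mean_decision_time_ms(decisions_per_second):
    """Average time per decision in milliseconds, assuming serial evaluation."""
    return 1000.0 / decisions_per_second


if __name__ == "__main__":
    # "team-a" and "models" are illustrative names only.
    print(workspace_path("team-a", "models"))
    print(mean_decision_time_ms(5000))
```

Under the serial-evaluation assumption, about 5,000 decisions per second works out to roughly 0.2 ms per decision. Encoding each segment separately keeps a workspace name that contains a slash from being read as an extra path level.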