Release 26.3.0#

This release is a comprehensive revamp of the NeMo Platform. The architecture, deployment experience, and security model have all been redesigned to improve modularity, ease of use, and enterprise readiness.

New Quickstart Experience#

Getting started with NeMo Platform is now faster and simpler. A new Python-based CLI and SDK, packaged together as nemo-platform, replace the previous setup flow:

  • Install with a single command: pip install nemo-platform

  • Launch the full platform locally with nmp quickstart up — no Kubernetes required

  • By default, quickstart uses remote NVIDIA inference endpoints, so no GPU is needed on your machine

  • Optionally configure local GPU inference with nmp quickstart configure for full on-device model serving

  • A built-in chat command (nmp chat <model-name>) lets you interact with models immediately after startup
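The steps above can be run end to end in a shell. Only the commands named in this section are shown; `<model-name>` is a placeholder for a model available in your deployment:

```shell
# Install the CLI and SDK from PyPI
pip install nemo-platform

# Launch the full platform locally (no Kubernetes required).
# By default this uses remote NVIDIA inference endpoints,
# so no local GPU is needed.
nmp quickstart up

# Optional: switch to local GPU inference for on-device model serving
nmp quickstart configure

# Chat with a model immediately after startup
nmp chat <model-name>
```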

Redesigned Helm Chart#

The Kubernetes deployment has been consolidated into a single all-in-one Helm chart (nemo-platform), available from the NVIDIA NGC Helm registry:

  • One chart installs the entire NeMo Platform, replacing the previous multi-chart setup

  • Supports on-premises and cloud Kubernetes clusters

  • Configurable add-ons include external databases, persistent volumes, ingress, multi-node networking, and OpenShift compatibility

  • Upgrade and rollback follow standard Helm workflows (helm upgrade, helm rollback)
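A minimal install-and-lifecycle flow might look like the following sketch. The repository alias, chart URL, namespace, and values file are assumptions for illustration; only the chart name (nemo-platform) and the standard Helm commands come from these notes:

```shell
# Add the NGC Helm repository (URL and alias are assumptions;
# consult the NGC registry for the exact repository location)
helm repo add ngc https://helm.ngc.nvidia.com/nvidia
helm repo update

# One chart installs the entire platform
helm install nemo-platform ngc/nemo-platform \
  --namespace nemo --create-namespace \
  --values my-values.yaml   # optional add-ons: external DBs, PVs, ingress

# Standard Helm workflows for upgrade and rollback
helm upgrade nemo-platform ngc/nemo-platform --namespace nemo
helm rollback nemo-platform 1 --namespace nemo
```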

Authorization Overhaul#

The authorization system has been rebuilt around an embedded policy engine:

  • OPA (Open Policy Agent) policies now run as WebAssembly inside each service process — no external auth sidecar is required

  • All API endpoints are workspace-scoped (/v2/workspaces/{workspace}/...), making multi-tenant access control straightforward

  • Role-based access control policies are compiled to policy.wasm at build time and evaluated at ~5,000 decisions per second

  • Auth can be toggled on or off via configuration, simplifying local development and testing
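To show what workspace-scoped access looks like in practice, here is an illustrative request. Only the `/v2/workspaces/{workspace}/...` path prefix comes from these notes; the host, port, resource suffix, and token variable are assumptions:

```shell
# Hypothetical call against a workspace-scoped endpoint.
# The embedded OPA policy (compiled to policy.wasm) evaluates the
# caller's access to the named workspace inside the service process.
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8000/v2/workspaces/my-workspace/models"
```

Because every endpoint carries the workspace in its path, a single RBAC policy can scope decisions per tenant without an external auth sidecar.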