NvGrid
Overview
NvGrid provides grid integration capabilities that enable datacenter power management in response to utility grid signals. It allows grid operators, demand response aggregators, and power management systems to schedule datacenter power constraints aligned with grid conditions, pricing signals, and curtailment events.
NvGrid operates at the power feed level, providing granular control over individual electrical circuits or datacenter-wide power consumption.
NvGrid Architecture
NvGrid consists of:
- Load Target Scheduler - Event-driven scheduler for time-based power constraints
- Feed Metadata - Power limits and default constraints per electrical feed
- System Status - Current power targets and calculated consumption
- Webhook Notifications - Real-time power event callbacks
Power Feeds
Datacenters are powered by one or more electrical feeds, each representing a distinct circuit. NvGrid provides independent control over each feed.
Feed Properties
Each feed has operational boundaries defined by metadata:
- Feed Tag: Unique identifier (e.g., “feed-a”)
- Power Limits: Minimum and maximum power caps
- Default Constraint: Power limit when no targets are active
Load Target Scheduling
Load targets define time-based power constraints for datacenter feeds:
- Interval - Start and end times for the constraint
- Load Constraint - Target power limit (watts, kilowatts, or megawatts)
- Strategy - Algorithm for achieving the target (best_effort)
- Feed Tags - Specific feeds to constrain (or all if omitted)
- Correlation ID - Unique identifier for tracking grid signals
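The fields above can be pictured as a small record. The following is a minimal sketch, not the NvGrid data model: the class name, field names, and the "kw"/"mw" unit strings are assumptions made for illustration (only "w" appears in the CLI example in this document).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple

# Multipliers for the three units the text mentions.
# The "kw" and "mw" spellings are assumed, not confirmed by the source.
UNIT_TO_WATTS = {"w": 1, "kw": 1_000, "mw": 1_000_000}


@dataclass(frozen=True)
class LoadTarget:
    """Hypothetical in-memory view of one load target."""
    start: datetime
    end: datetime
    constraint_value: float
    constraint_unit: str = "w"
    strategy: str = "best_effort"
    feed_tags: Optional[Tuple[str, ...]] = None  # None means "all feeds"
    correlation_id: Optional[str] = None

    def watts(self) -> float:
        # Normalize the constraint to watts so targets can be compared.
        return self.constraint_value * UNIT_TO_WATTS[self.constraint_unit.lower()]

    def applies_to(self, feed_tag: str) -> bool:
        # An omitted feed list constrains every feed.
        return self.feed_tags is None or feed_tag in self.feed_tags


target = LoadTarget(
    start=datetime(2025, 10, 24, 16, tzinfo=timezone.utc),
    end=datetime(2025, 10, 24, 19, tzinfo=timezone.utc),
    constraint_value=1.5,
    constraint_unit="mw",
    feed_tags=("feed-a",),
)
print(target.watts(), target.applies_to("feed-b"))
```

Normalizing every constraint to watts keeps later comparisons between targets and feed limits unit-free.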
Schedule Resolution
For overlapping time intervals, the most recently scheduled load target takes precedence. When constraints expire or are removed, power reverts to the default constraint.
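The precedence rule can be sketched as follows. This is an illustration under stated assumptions rather than the scheduler's implementation: the type and function names are hypothetical, intervals are treated as half-open (start inclusive, end exclusive), and a zero-watt target is treated as a cancellation, as described under Restoring Default Schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ScheduledTarget:
    start: datetime
    end: datetime
    watts: float
    scheduled_at: datetime  # submission time, used to decide precedence


def effective_constraint(targets, now, default_watts):
    """Return the constraint in force at `now` for a single feed."""
    # Keep only targets whose interval covers `now` (assumed half-open).
    active = [t for t in targets if t.start <= now < t.end]
    if not active:
        return default_watts  # nothing scheduled: default constraint applies
    newest = max(active, key=lambda t: t.scheduled_at)
    # A zero-watt target cancels the range and falls back to the default.
    return default_watts if newest.watts == 0 else newest.watts


utc = timezone.utc
older = ScheduledTarget(datetime(2025, 10, 24, 16, tzinfo=utc),
                        datetime(2025, 10, 24, 19, tzinfo=utc),
                        800_000, datetime(2025, 10, 24, 10, tzinfo=utc))
newer = ScheduledTarget(datetime(2025, 10, 24, 17, tzinfo=utc),
                        datetime(2025, 10, 24, 18, tzinfo=utc),
                        500_000, datetime(2025, 10, 24, 11, tzinfo=utc))
print(effective_constraint([older, newer],
                           datetime(2025, 10, 24, 17, 30, tzinfo=utc),
                           1_000_000))  # prints 500000
```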
Power Management Strategy
NvGrid uses a best_effort strategy:
- Reduces power from DPM-enabled workloads without terminating jobs
- Target may not be achieved if constraint requires job termination
- Automatically restores power when constraints are relaxed
- Integrates with resource group power management
DPM-Enabled Jobs
Dynamic Power Management (DPM) is a resource group feature that allows power limits to be adjusted without terminating workloads. NvGrid can only reduce power from jobs that have DPM enabled.
How DPM Works with NvGrid:
- Resource groups with dpm_enabled=true can have their power reduced
- DPS adjusts power limits incrementally
- Jobs continue running at reduced power levels
- Power reduction is non-disruptive to workload execution
When DPM is Not Enabled:
- NvGrid cannot reduce power from those workloads
- Load targets may not be fully achieved
- System reports compliant=false in webhook events
See Resource Groups for more information on DPM configuration.
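To show why a target can end up non-compliant, the sketch below models best_effort as shedding power only from DPM-enabled groups. It is not how DPS allocates power. The ResourceGroup type, the per-group floor_w value, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResourceGroup:
    name: str
    power_w: float     # current power allocation
    floor_w: float     # hypothetical lowest limit the group can run at
    dpm_enabled: bool


def plan_best_effort(groups, target_w):
    """Estimate the power level reachable without terminating any job."""
    total = sum(g.power_w for g in groups)
    needed = max(0, total - target_w)  # reduction required to meet the target
    # Only DPM-enabled groups can give up power, and only down to their floor.
    reducible = sum(g.power_w - g.floor_w for g in groups if g.dpm_enabled)
    shed = min(needed, reducible)
    return {"planned_w": total - shed, "compliant": shed >= needed}


groups = [
    ResourceGroup("training", power_w=400, floor_w=200, dpm_enabled=True),
    ResourceGroup("inference", power_w=600, floor_w=600, dpm_enabled=False),
]
print(plan_best_effort(groups, target_w=900))  # target reachable
print(plan_best_effort(groups, target_w=700))  # falls short: compliant is False
```

In the second call only 200 W can be shed from the DPM-enabled group, so the plan stops at 800 W and the result would be reported as non-compliant.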
Power Restoration
When power constraints expire or are relaxed, DPS automatically restores power to DPM-enabled workloads.
Automatic Recovery
Power restoration occurs when:
- A load target’s end time is reached
- A new target with higher power limit is scheduled
- Default constraint is higher than the expiring target
Restoration Behavior
Power Distribution:
- DPM-enabled resource groups receive increased power allocations
- Power may be distributed differently than the original reduction
- Restoration respects workload power policies and limits
- Recovery occurs incrementally to maintain system stability
Timeline:
- Restoration begins automatically when constraint expires
- START_RAMP_UP webhook event signals restoration start
- END_RAMP_UP webhook event confirms new power level reached
- in_flight status tracks active power transitions
Restoring Default Schedule
To explicitly return to default constraints before a scheduled target expires, schedule a zero watt target for the time period:
dpsctl nvgrid set-load-target \
  --constraint-value=0 \
  --constraint-unit=w \
  --start-time="2025-10-24T16:00:00Z" \
  --end-time="2025-10-24T19:00:00Z"

This cancels any active targets in that time range, allowing the system to revert to default constraints.
Webhook Events
NvGrid sends HTTP POST webhook notifications for power state changes:
- START_RAMP_UP - Power constraint is relaxing (increasing limit)
- END_RAMP_UP - Power ramp up complete, new higher limit reached
- START_RAMP_DOWN - Power constraint is tightening (decreasing limit)
- END_RAMP_DOWN - Power ramp down complete, new lower limit reached
Webhooks include power values, correlation IDs, and compliance status.
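A receiver can use the four event names to track whether a power transition is in progress. The sketch below is a client-side illustration only: the TransitionTracker class is hypothetical, and it takes the event name as a plain string because the webhook payload schema is not described here.

```python
# Map each event name to (direction, whether a transition is now in progress).
RAMP_EVENTS = {
    "START_RAMP_UP": ("up", True),
    "END_RAMP_UP": ("up", False),
    "START_RAMP_DOWN": ("down", True),
    "END_RAMP_DOWN": ("down", False),
}


class TransitionTracker:
    """Hypothetical receiver-side state derived from NvGrid event names."""

    def __init__(self):
        self.in_flight = False
        self.direction = None

    def handle(self, event_name: str) -> None:
        if event_name not in RAMP_EVENTS:
            raise ValueError(f"unknown NvGrid event: {event_name}")
        # START_* events open a transition; END_* events close it.
        self.direction, self.in_flight = RAMP_EVENTS[event_name]


tracker = TransitionTracker()
tracker.handle("START_RAMP_DOWN")
print(tracker.direction, tracker.in_flight)  # prints: down True
tracker.handle("END_RAMP_DOWN")
print(tracker.direction, tracker.in_flight)  # prints: down False
```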
Webhook Security
Webhook URLs must be configured in the NvGrid whitelist for security. Only whitelisted URLs can be registered for event notifications.
Whitelist Configuration:
Configure allowed webhook URLs in DPS Helm values:
dps:
  nvgrid:
    webhook:
      whitelist:
        - "https://grid-solution.example.com/nvgrid/webhook"
        - "https://monitoring.example.com/api/"

Validation:
- RegisterWebhook API validates URLs against the whitelist
- Non-whitelisted URLs are rejected with error: “webhook URL not whitelisted”
- Whitelist supports path prefix matching (trailing / matches all sub-paths)
See Prerequisites - Webhook Security Configuration for detailed whitelist rules.
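The prefix rule can be approximated as shown below. This sketch assumes that entries ending in / match any URL beginning with that entry and that all other entries require an exact match. The function name is hypothetical, and the actual rules (for example URL normalization) are the ones given in the Prerequisites page.

```python
from typing import List


def is_whitelisted(url: str, whitelist: List[str]) -> bool:
    """Approximate check of a webhook URL against whitelist entries."""
    for entry in whitelist:
        if entry.endswith("/"):
            # Trailing slash: the entry acts as a prefix covering sub-paths.
            if url.startswith(entry):
                return True
        elif url == entry:
            # No trailing slash: assumed to require an exact match.
            return True
    return False


WHITELIST = [
    "https://grid-solution.example.com/nvgrid/webhook",
    "https://monitoring.example.com/api/",
]
print(is_whitelisted("https://monitoring.example.com/api/events", WHITELIST))  # prints True
print(is_whitelisted("https://other.example.com/api/events", WHITELIST))       # prints False
```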
Use Cases
Utility Grid Integration:
- Respond to curtailment events from utilities
- Participate in demand response programs
- Adjust power based on dynamic pricing signals
- Report compliance to grid operators
Power Management:
- Schedule maintenance windows with reduced power
- Coordinate with renewable energy availability
- Manage peak demand periods
- Support sustainability initiatives
Further Reading
- Managing NvGrid - Step-by-step operations guide
- NvGrid CLI Commands - Command reference
- Resource Groups - Workload power management integration