Release Notes
Version 1.1
New Features:
Mute Alert: Added the ability to create and manage rules to mute alerts.
Notify Alert: Added the ability to create and manage notify alert rules. Supported channels are:
  - Email
  - Slack (via email)
  - Webhook
CVE checking: Added checks for nodes against the NVIDIA CVE database. The results are exposed as metrics so that they can be muted.
New Metrics: Added the following new metrics:
  - dcgm_fi_dev_clocks_event_reasons
  - dcgm_fi_dev_fabric_manager_status
  - dcgm_fi_dev_nvlink_count_symbol_ber_float
  - dcgm_fi_dev_nvlink_count_effective_ber_float
GPU Status chart: Added a GPU Status chart to the Dashboard Resource Stats, Utilization section.
SXID error suggested actions: Added suggested actions for SXID errors to events and error reports.
Summarize events: Added a summary view that groups identical events on a node.
Update Agent advice: Agent install advice now displays the latest version of the agent.
Display larger charts: Added a button to display larger charts in the detail and debugging pages.
Search by hostname: Added a search by hostname to the Inventory page.
Allow component telemetry: Component telemetry can now be selected in the debugging pages.
Agent Precheck script: Added a precheck script to the agent install.
Agent should not enroll without GPUs or with incorrect GPUs: The Agent now refuses to enroll nodes that have no GPUs or have incorrect GPUs.
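As an illustration of the kind of check behind the precheck script and the enrollment rule above, here is a minimal sketch in Python. It is not the Agent's actual implementation. It assumes the node's GPUs are listed in the text format printed by `nvidia-smi -L`, and the helper names `parse_gpu_models` and `should_enroll`, as well as the idea of matching against a list of supported model substrings, are hypothetical.

```python
import re
from typing import List

# Matches lines such as "GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-xxxx)",
# the per-GPU line format printed by `nvidia-smi -L`.
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (.+?)\)\s*$")


def parse_gpu_models(listing: str) -> List[str]:
    """Return the model name of every GPU line found in the listing text."""
    models = []
    for line in listing.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            models.append(match.group(2))
    return models


def should_enroll(listing: str, supported: List[str]) -> bool:
    """Hypothetical rule: enroll only if at least one GPU is present and
    every GPU model contains one of the supported substrings."""
    models = parse_gpu_models(listing)
    if not models:
        return False  # no GPUs detected: do not enroll
    return all(any(s in model for s in supported) for model in models)
```

For example, passing the captured output of `nvidia-smi -L` together with `["H100"]` would return `False` on a node with no GPUs or with a different GPU model.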
Fixed Bugs:
XID display bug: Fixed a bug where accelerator-nvidia-error-sxid had no display name.
GPU reported up wrongly bug: Fixed a bug where GPU State incorrectly remained “up” after XID 94 and XID 95 errors.
Compute Zone View stuck bug: Fixed the Compute Zone View getting stuck in a loading state on the Inventory page.
NVLink BER metrics bug: Fixed NVLink BER metrics showing all zeros on the debugging page.
Error report dialog hang: Fixed the error report modal dialog hanging after clicking Generate.
Metric X-axis truncation bug: Fixed truncation of the X-axis on metric charts.
Machine index: The GPU index on the machine details page now matches the GPU Chart tooltip index.
Agent Liveness Check Improvement: Improved the Agent liveness check.
Panel Coordination bug: Events and alerts selected in the node detail panel now carry over to the Debug screen.