Copyright © 2026, NVIDIA Corporation.

Runbook: Connecting to the Datastore

Overview

This runbook guides you through connecting to the NVSentinel MongoDB datastore to query health events and troubleshoot data-related issues.

Prerequisites:

  • kubectl access to the cluster
  • mongosh (MongoDB Shell) installed locally
  • Access to the nvsentinel namespace

Procedure

1. Verify MongoDB is Running

Check the MongoDB pods:

$ kubectl get pods -n nvsentinel -l app.kubernetes.io/name=mongodb

All pods should be in the Running state. The default deployment runs three replicas: mongodb-0, mongodb-1, and mongodb-2.
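A pod can report Running while its readiness probe is still failing, so a scripted check should look at container readiness as well as the pod phase. The sketch below is a hypothetical Node.js helper (the function checkMongoPods and the file name check-mongodb-pods.js are illustrative and not part of NVSentinel). It parses the JSON form of the command above and assumes the default three-replica deployment.

```javascript
// Hypothetical helper (check-mongodb-pods.js), not part of NVSentinel.
// Reads `kubectl get pods ... -o json` output and reports MongoDB pods that
// are not Running or whose containers are not ready.

function checkMongoPods(podList, expectedReplicas = 3) {
  const problems = [];
  const pods = podList.items ?? [];

  // Assumes the default deployment of 3 replicas; pass another count if yours differs.
  if (pods.length !== expectedReplicas) {
    problems.push(`expected ${expectedReplicas} pods, found ${pods.length}`);
  }

  for (const pod of pods) {
    const name = pod.metadata?.name ?? "<unnamed>";
    const phase = pod.status?.phase ?? "Unknown";
    if (phase !== "Running") {
      problems.push(`${name}: phase is ${phase}`);
      continue;
    }
    // A Running pod can still fail its readiness probe, so check each container.
    const statuses = pod.status?.containerStatuses ?? [];
    if (statuses.length === 0) {
      problems.push(`${name}: no container statuses reported`);
      continue;
    }
    const notReady = statuses.filter((c) => !c.ready).map((c) => c.name);
    if (notReady.length > 0) {
      problems.push(`${name}: containers not ready: ${notReady.join(", ")}`);
    }
  }
  return { ok: problems.length === 0, problems };
}

// Read the PodList JSON from stdin when invoked as: node check-mongodb-pods.js -
if (process.argv[2] === "-") {
  let input = "";
  process.stdin.setEncoding("utf8");
  process.stdin.on("data", (chunk) => { input += chunk; });
  process.stdin.on("end", () => {
    const { ok, problems } = checkMongoPods(JSON.parse(input));
    console.log(ok ? "All MongoDB pods are Running and ready." : problems.join("\n"));
    process.exitCode = ok ? 0 : 1;
  });
}
```

Pipe the JSON output into it. A non-zero exit code means at least one pod needs attention:

$ kubectl get pods -n nvsentinel -l app.kubernetes.io/name=mongodb -o json | node check-mongodb-pods.js -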

2. Connect Using the Helper Script

Run the mongodb-shell.sh helper script from the scripts directory:

$ cd scripts
$ ./mongodb-shell.sh

The script automatically:

  • Sets up port forwarding to MongoDB
  • Extracts client certificates from Kubernetes secrets
  • Connects with TLS authentication
  • Cleans up on exit
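If you need to connect without the helper script, those steps end in a mongosh session over the forwarded port. The sketch below builds that kind of connection string. It is not taken from mongodb-shell.sh, and buildMongoUri is a hypothetical name. It assumes MongoDB is forwarded to localhost on the default port 27017, that the extracted CA certificate and the client certificate plus private key have been written to local PEM files (the paths are placeholders), and that the client certificate is also used for X.509 authentication. Setting directConnection=true keeps the driver on the forwarded member instead of following replica-set discovery to in-cluster hostnames that are unreachable from your machine.

```javascript
// Hypothetical sketch, not the logic of mongodb-shell.sh: a connection string
// a helper could pass to mongosh after `kubectl port-forward` exposes MongoDB.
// File paths are placeholders for certificates extracted from a Secret.

function buildMongoUri({
  host = "localhost",
  port = 27017,                  // MongoDB default port; match your port-forward
  caFile,                        // CA certificate that signed the server certificate
  certKeyFile,                   // one PEM file holding client certificate + private key
  x509Auth = true,               // assumption: the client cert also authenticates the user
  allowInvalidHostnames = false, // may be needed if the server cert has no "localhost" SAN
} = {}) {
  if (!caFile || !certKeyFile) {
    throw new Error("caFile and certKeyFile are required for TLS");
  }
  const params = new URLSearchParams({
    directConnection: "true", // stay on the forwarded member, skip replica-set discovery
    tls: "true",
    tlsCAFile: caFile,
    tlsCertificateKeyFile: certKeyFile,
  });
  if (x509Auth) {
    params.set("authMechanism", "MONGODB-X509");
    params.set("authSource", "$external");
  }
  if (allowInvalidHostnames) {
    params.set("tlsAllowInvalidHostnames", "true");
  }
  return `mongodb://${host}:${port}/?${params.toString()}`;
}

// Print a URI using placeholder certificate paths.
console.log(buildMongoUri({ caFile: "/tmp/mongo/ca.crt", certKeyFile: "/tmp/mongo/client.pem" }));
```

Save it under any name (build-mongo-uri.js is used here only as an example) and pass the printed URI to mongosh in quotes so the shell does not split it:

$ mongosh "$(node build-mongo-uri.js)"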

3. Query Health Events

Once connected, you can query the datastore:

Count total health events:

db.HealthEvents.countDocuments()

Find unhealthy events:

db.HealthEvents.find({"healthevent.ishealthy": false}).limit(10).pretty()

Find events for a specific node:

db.HealthEvents.find({"healthevent.nodename": "<NODE_NAME>"}).pretty()

Find fatal events:

db.HealthEvents.find({"healthevent.isfatal": true}).limit(10).pretty()
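To see which nodes produce the most unhealthy events, an aggregation is often easier to read than paging through find() output. The sketch below assumes the same HealthEvents collection and healthevent.* field names used in the queries above. The function name unhealthyEventsByNode and its options are illustrative, not part of NVSentinel. Because mongosh is a JavaScript shell, you can paste the function directly into the session.

```javascript
// Sketch for mongosh: paste this function, then pass its result to aggregate().
// Field names follow the queries above; adjust them if your schema differs.

function unhealthyEventsByNode({ fatalOnly = false, limit = 20 } = {}) {
  // Keep only unhealthy events, optionally narrowed to fatal ones.
  const match = { "healthevent.ishealthy": false };
  if (fatalOnly) {
    match["healthevent.isfatal"] = true;
  }
  return [
    { $match: match },
    // Collapse matching events into one document per node name.
    { $group: { _id: "$healthevent.nodename", events: { $sum: 1 } } },
    // Noisiest nodes first; ties broken alphabetically by node name.
    { $sort: { events: -1, _id: 1 } },
    { $limit: limit },
  ];
}
```

For example, to list the ten nodes with the most fatal unhealthy events:

db.HealthEvents.aggregate(unhealthyEventsByNode({ fatalOnly: true, limit: 10 }))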