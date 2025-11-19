When using Unified Fabric Manager (UFM), sharp_am publishes statistical data, accessible through an HTTP endpoint in CSV, Prometheus, or JSON formats.

sharp_am regenerates this data at a fixed interval (recommended: every 60 seconds), whether or not anyone is requesting it. Polling more often than that interval returns the same data repeatedly, so retrieve it at roughly the same interval that sharp_am is configured to use for its data updates.
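A minimal polling sketch follows. It assumes a `fetch` callable that returns one snapshot as a dict of field names to values; how that dict is obtained is left open. The loop sleeps for the update interval between requests and yields a snapshot only when `metadata_timestamp` (the generation time) changes, so repeated copies of the same data are skipped. The names `poll_snapshots` and `POLL_INTERVAL_SECONDS` are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Iterator

# Hypothetical constant: set it to the update interval configured for sharp_am
# (60 seconds is the recommended value).
POLL_INTERVAL_SECONDS = 60


def poll_snapshots(
    fetch: Callable[[], Dict[str, Any]],
    interval: float = POLL_INTERVAL_SECONDS,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict[str, Any]]:
    """Yield each newly generated snapshot once, skipping unchanged data."""
    last_generated = None
    for i in range(max_polls):
        snapshot = fetch()
        # metadata_timestamp marks when sharp_am generated the data, so an
        # unchanged value means this poll returned the same snapshot again.
        generated = snapshot.get("metadata_timestamp")
        if generated != last_generated:
            last_generated = generated
            yield snapshot
        if i < max_polls - 1:
            sleep(interval)
```

Injecting `fetch` and `sleep` keeps the loop independent of the transport and easy to exercise without a live sharp_am instance.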

The published data fields include the following:

metadata_host: Hostname of the server running sharp_am.
metadata_timestamp: Unix timestamp (in seconds) indicating when the data was generated; independent of the request time.
timestamp: Unix timestamp (in milliseconds) indicating when the data was requested.
active_jobs: Total number of currently active SHARP jobs.
active_sat_jobs: Number of active SHARP jobs that request SAT rather than only LLT.
agg_nodes_in_invalid_state: Number of aggregation nodes (switches) in an invalid state, which are excluded from resource allocation.
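Because `metadata_timestamp` is in seconds while `timestamp` is in milliseconds, the two must be brought to the same unit before comparing them. The sketch below computes how old the data was at request time, assuming a snapshot is available as a dict keyed by field name. The helper names and the `slack_s` tolerance are hypothetical choices, not part of sharp_am.

```python
def snapshot_age_seconds(snapshot):
    """Seconds between data generation and the request that returned it."""
    # timestamp (request time) is in milliseconds; convert it to seconds.
    requested_s = float(snapshot["timestamp"]) / 1000.0
    # metadata_timestamp (generation time) is already in seconds.
    generated_s = float(snapshot["metadata_timestamp"])
    return requested_s - generated_s


def is_stale(snapshot, update_interval_s=60.0, slack_s=30.0):
    """Flag data older than one update interval plus a hypothetical tolerance."""
    return snapshot_age_seconds(snapshot) > update_interval_s + slack_s
```

An age that keeps growing past the update interval may suggest that sharp_am has stopped refreshing its statistics.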

The data also includes histogram fields. The active_jobs_num_hcas_histogram_bucket_X fields count active jobs by the number of HCAs each job serves. Each bucket covers a range of HCA counts, and the bucket with the suffix _infinity covers jobs with 1025 or more HCAs.
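One way to work with these bucket fields is to collect them into an ordered histogram. The sketch below assumes a snapshot dict keyed by field name and that each bucket suffix is either an integer or `infinity`. The numeric suffixes are used only as sort keys, because the exact bucket boundaries are not listed here. The function name `histogram` is hypothetical.

```python
import math
import re


def histogram(snapshot, prefix):
    """Collect <prefix>_bucket_<X> fields into a sorted list of (X, count)."""
    # Assumed suffix shape: an integer, or the literal "infinity".
    pattern = re.compile(re.escape(prefix) + r"_bucket_(\d+|infinity)$")
    buckets = []
    for name, value in snapshot.items():
        match = pattern.match(name)
        if match is None:
            continue
        label = match.group(1)
        key = math.inf if label == "infinity" else int(label)
        # Values may arrive as strings or floats depending on the format.
        buckets.append((key, int(float(value))))
    # math.inf sorts after every finite label, so _infinity comes last.
    return sorted(buckets)
```

For example, `histogram(snapshot, "active_jobs_num_hcas_histogram")` returns the HCA-count buckets in ascending order with the 1025-or-more bucket last.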

Similarly, the trees_level_histogram_bucket_X fields count active jobs by SHARP tree level. For example, a job whose HCAs are all connected to the same leaf switch needs only one tree level and is counted in trees_level_histogram_bucket_0.
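Following the single-leaf example, bucket index N can be read as a tree spanning N + 1 switch levels. The sketch below relies on that reading, which is an assumption extrapolated from the bucket_0 case. It maps tree depth to job count and reports the fraction of jobs whose tree fits under one leaf switch. The helper names are hypothetical.

```python
import re

LEVEL_FIELD = re.compile(r"trees_level_histogram_bucket_(\d+)$")


def jobs_per_tree_depth(snapshot):
    """Map number of switch levels -> active job count.

    Assumption: bucket index N corresponds to N + 1 switch levels, so
    bucket_0 holds jobs whose HCAs all connect to a single leaf switch.
    """
    depths = {}
    for name, value in snapshot.items():
        match = LEVEL_FIELD.match(name)
        if match:
            depths[int(match.group(1)) + 1] = int(float(value))
    return dict(sorted(depths.items()))


def single_leaf_fraction(snapshot):
    """Share of active jobs counted in the one-level (single leaf) bucket."""
    depths = jobs_per_tree_depth(snapshot)
    total = sum(depths.values())
    return depths.get(1, 0) / total if total else 0.0
```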

Historical Data Fields

In addition to current metrics, sharp_am also provides historical statistics:

history_starting_timestamp: Start time of historical data collection; resets when sharp_am restarts or fails over.
history_denied_reservations: Number of denied reservation requests, which may indicate configuration issues.
history_denied_jobs_by_reservations: Number of job requests denied because they did not match a reservation.
history_denied_jobs_by_resource_limit: Number of job requests denied due to insufficient resources, possibly caused by disconnected or invalid switches.
history_jobs_ended_due_to_client_failure: Number of jobs that ended due to a client-side failure.
history_jobs_ended_due_to_fatal_sharp_error: Number of jobs that ended due to a switch failure or link error.
history_jobs_ended_successfully: Number of jobs that completed without issues.
history_ended_jobs_duration_in_hours_histogram_bucket_X: Durations (in hours) of completed jobs, grouped into histogram buckets.
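The historical counters can be combined into a few summary figures. The sketch below assumes a snapshot dict keyed by field name, and it assumes that every ended job falls into exactly one of the three "ended" counters. Because history restarts on restart or failover, `counter_delta` returns `None` when `history_starting_timestamp` changes between two snapshots instead of reporting a misleading difference. All function names are hypothetical.

```python
def _count(snapshot, field):
    # Values may arrive as strings (CSV) or floats (Prometheus); normalise to int.
    return int(float(snapshot[field]))


def history_summary(snapshot):
    """Summarise the history_* counters of one snapshot."""
    ok = _count(snapshot, "history_jobs_ended_successfully")
    failed = (_count(snapshot, "history_jobs_ended_due_to_client_failure")
              + _count(snapshot, "history_jobs_ended_due_to_fatal_sharp_error"))
    ended = ok + failed  # assumes the three "ended" counters do not overlap
    return {
        "since": _count(snapshot, "history_starting_timestamp"),
        "ended_jobs": ended,
        "success_rate": ok / ended if ended else None,
        "denied_jobs": (_count(snapshot, "history_denied_jobs_by_reservations")
                        + _count(snapshot, "history_denied_jobs_by_resource_limit")),
        "denied_reservations": _count(snapshot, "history_denied_reservations"),
    }


def counter_delta(prev, curr, field):
    """Change in a history_* counter between snapshots, or None after a reset."""
    start = "history_starting_timestamp"
    if _count(prev, start) != _count(curr, start):
        return None  # history restarted (restart or failover)
    return _count(curr, field) - _count(prev, field)
```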

For example, a job that ran for less than one hour is counted in history_ended_jobs_duration_in_hours_histogram_bucket_1, while a job that ran for six days (144 hours) is counted in history_ended_jobs_duration_in_hours_histogram_bucket_168.
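The examples are consistent with each bucket label acting as an upper bound in hours. The sketch below assigns a duration to a bucket under that assumption, and it also assumes that a duration exactly equal to a label belongs to that bucket. The labels are passed in rather than hard-coded, because the full set of published buckets is not listed here. The values in the usage note are hypothetical.

```python
def duration_bucket(duration_hours, bucket_labels):
    """Return the bucket label for a completed job's duration, or None.

    bucket_labels: numeric suffixes read from the published
    history_ended_jobs_duration_in_hours_histogram_bucket_<X> fields.
    Assumption: each label is the bucket's inclusive upper bound in hours.
    """
    for upper in sorted(bucket_labels):
        if duration_hours <= upper:
            return upper
    return None  # longer than every known bucket bound
```

With hypothetical labels `[1, 24, 168]`, a 30-minute job maps to `1` and a six-day (144-hour) job maps to `168`, which matches the examples above.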

Fetching Data

To retrieve this data, send an HTTP request to port 9002 (the default port, unless configured otherwise) at one of the following endpoints:

/csv/fset/sharp_am: CSV format
/json/fset/sharp_am: JSON format
/fset/sharp_am: Prometheus format
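The sketch below maps each response format to its endpoint path and builds the request URL from a host name and port. It also includes a small parser for the Prometheus text exposition format. The host name, the helper names, and the choice to drop metric labels are assumptions made for illustration. Dropping labels is only safe if every metric name appears once.

```python
from urllib.parse import urlunsplit

# Endpoint paths listed above; 9002 is the default port.
ENDPOINTS = {
    "csv": "/csv/fset/sharp_am",
    "json": "/json/fset/sharp_am",
    "prometheus": "/fset/sharp_am",
}


def endpoint_url(host, fmt="json", port=9002):
    """Build the URL for one response format (host is a placeholder)."""
    return urlunsplit(("http", f"{host}:{port}", ENDPOINTS[fmt], "", ""))


def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into {metric_name: value}.

    Comment lines (# HELP / # TYPE) are skipped. Labels and optional
    sample timestamps are dropped.
    """
    metrics = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name = line[:line.index("{")]
            rest = line[line.rindex("}") + 1:].split()
        else:
            name, *rest = line.split()
        metrics[name] = float(rest[0])
    return metrics
```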

Example of a JSON data request:

curl --silent http:
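For scripted collection, a Python counterpart to the JSON request could look like the following sketch. It builds the URL from the default port 9002 and the /json/fset/sharp_am path. The host name is a placeholder, and `fetch_sharp_am_json` is a hypothetical helper. The exact layout of the JSON document is not described here, so the function returns whatever `json.loads` produces. The `opener` parameter defaults to `urllib.request.urlopen` and can be swapped out for testing.

```python
import json
import urllib.request


def fetch_sharp_am_json(host, port=9002, timeout=10.0, opener=urllib.request.urlopen):
    """Request the JSON statistics endpoint and return the decoded document."""
    url = f"http://{host}:{port}/json/fset/sharp_am"
    # urlopen's response works as a context manager, so the connection
    # is closed once the body has been read.
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```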

Enabling UFM Configuration

NVIDIA SHARP telemetry is disabled by default. To enable it: