nat.utils.telemetry.handler#

Telemetry handler with batching, dead-letter queue, and bounded retries.

Designed for short-lived CLI processes: the typical usage is to construct the handler as a context manager, enqueue one event, and let __exit__ trigger the final flush. A 2-second default per-request timeout caps the worst-case additional CLI latency from telemetry.
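
The usage pattern described above can be sketched as follows. This is a minimal illustration only: `SketchHandler` is a hypothetical stand-in written for this example, not the real NATTelemetryHandler, and its `flush()` moves events into a list instead of sending them over HTTP. The event dictionary shape is likewise an assumption.

```python
class SketchHandler:
    """Hypothetical stand-in that mimics the context-manager lifecycle."""

    def __init__(self) -> None:
        self.queued: list = []
        self.sent: list = []

    def enqueue(self, event: dict) -> None:
        # Events are only buffered here; nothing is sent yet.
        self.queued.append(event)

    def flush(self) -> None:
        # The real handler would send a batch over HTTP; the sketch just moves it.
        self.sent.extend(self.queued)
        self.queued.clear()

    def __enter__(self) -> "SketchHandler":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Final flush on exit, whether or not the body raised.
        self.flush()


# Typical short-lived CLI usage: construct, enqueue one event, exit.
with SketchHandler() as handler:
    handler.enqueue({"name": "cli_command", "command": "run"})
```

Because `__exit__` returns `None`, an exception raised inside the `with` body still propagates after the final flush.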

Attributes#

Classes#

NATTelemetryHandler

Batches, flushes, and retries NAT telemetry events.

Functions#

_resolve_client_version() → str

Best-effort lookup of the installed nvidia-nat-core version.

Module Contents#

logger#
DEFAULT_FLUSH_INTERVAL_SECONDS: float = 120.0#
DEFAULT_MAX_QUEUE_SIZE: int = 50#
DEFAULT_MAX_RETRIES: int = 3#
DEFAULT_REQUEST_TIMEOUT_SECONDS: float = 2.0#

Hard cap on per-request HTTP latency. Keeps short-lived CLI invocations from being delayed by an unresponsive telemetry endpoint.

_resolve_client_version() → str#

Best-effort lookup of the installed nvidia-nat-core version.
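
A best-effort version lookup of this kind is commonly written with the standard library's `importlib.metadata`. The sketch below is one plausible shape, not the module's actual implementation: the function name without the leading underscore, the `package` parameter, and the `"unknown"` fallback value are all assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_client_version(package: str = "nvidia-nat-core") -> str:
    """Return the installed version of `package`, or a placeholder if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # Assumed fallback: telemetry should never fail just because the
        # distribution metadata is missing (for example in a source checkout).
        return "unknown"
```

The point of the fallback is that a missing distribution never raises into the caller, so the result can be computed once at import time and used as a default argument.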

CLIENT_VERSION: str#
class NATTelemetryHandler(
flush_interval_seconds: float = DEFAULT_FLUSH_INTERVAL_SECONDS,
max_queue_size: int = DEFAULT_MAX_QUEUE_SIZE,
max_retries: int = DEFAULT_MAX_RETRIES,
request_timeout_seconds: float = DEFAULT_REQUEST_TIMEOUT_SECONDS,
source_client_version: str = CLIENT_VERSION,
session_id: str = 'undefined',
)#

Batches, flushes, and retries NAT telemetry events.

The handler is a no-op when the global TELEMETRY_ENABLED flag is false: enqueue() drops every event immediately and the timer loop has nothing to send. Lifecycle methods remain safe to call regardless.

Parameters#

flush_interval_seconds:

Periodic flush cadence used by the background timer loop.

max_queue_size:

When the in-memory queue reaches this size, an early flush is triggered.

max_retries:

Maximum re-send attempts per event before it is dropped.

request_timeout_seconds:

Per-request HTTP timeout. Bounds telemetry-induced latency.

source_client_version:

Reported as clientVer in the wire envelope. Defaults to the installed nvidia-nat-core version.

session_id:

Identifier used to group related events. NAT_SESSION_PREFIX is prepended if set.
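
How `flush_interval_seconds` and `max_queue_size` might interact can be sketched with an `asyncio.Event` that wakes the timer loop early. This is an assumed design inferred from the parameter descriptions and the `_flush_signal` attribute; `SketchBatcher` and its members are hypothetical, and batches are collected in a list instead of being sent.

```python
import asyncio


class SketchBatcher:
    """Hypothetical batcher: periodic flush plus early flush on a full queue."""

    def __init__(self, flush_interval_seconds: float, max_queue_size: int) -> None:
        self._interval = flush_interval_seconds
        self._max = max_queue_size
        self._events: list = []
        self._flush_signal = asyncio.Event()
        self.batches: list = []

    def enqueue(self, event: str) -> None:
        self._events.append(event)
        if len(self._events) >= self._max:
            # Queue is full: wake the timer loop without waiting for the interval.
            self._flush_signal.set()

    async def timer_loop(self) -> None:
        while True:
            try:
                # Wake on the signal, or after the interval elapses.
                await asyncio.wait_for(self._flush_signal.wait(), timeout=self._interval)
            except asyncio.TimeoutError:
                pass
            self._flush_signal.clear()
            if self._events:
                self.batches.append(self._events)
                self._events = []


async def demo() -> list:
    batcher = SketchBatcher(flush_interval_seconds=60.0, max_queue_size=3)
    task = asyncio.create_task(batcher.timer_loop())
    for name in ("a", "b", "c"):
        batcher.enqueue(name)  # the third enqueue triggers the early flush
    await asyncio.sleep(0.05)  # give the timer loop a chance to run
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return batcher.batches


batches = asyncio.run(demo())
```

With a 60-second interval, the single batch in this demo is produced by the early-flush signal alone, long before the periodic timer would fire.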

_flush_interval = 120.0#
_max_queue_size = 50#
_max_retries = 3#
_request_timeout = 2.0#
_source_client_version#
_session_id#
_events: list[nat.utils.telemetry.payload.QueuedEvent] = []#
_dlq: list[nat.utils.telemetry.payload.QueuedEvent] = []#
_flush_signal#
_timer_task: asyncio.Task | None = None#
_running = False#
enqueue(event: nat.utils.telemetry.events.TelemetryEvent) → None#

Queue an event for the next flush. Silently no-ops when disabled.

Reads config.TELEMETRY_ENABLED live (not via cached import) so the first-run consent prompt’s late update to the flag is honored.
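
The difference between a live read and a cached import can be shown with a small stand-in module. Everything here is illustrative: `config` is a hypothetical module object created in place, and the two `enqueue_*` functions are not part of the real handler.

```python
import types

# Hypothetical stand-in for the real config module.
config = types.ModuleType("config")
config.TELEMETRY_ENABLED = False

# Equivalent of `from config import TELEMETRY_ENABLED`: the value is copied once.
CACHED_FLAG = config.TELEMETRY_ENABLED


def enqueue_cached(queue: list, event: str) -> None:
    # Uses the snapshot taken at import time; later updates are invisible.
    if CACHED_FLAG:
        queue.append(event)


def enqueue_live(queue: list, event: str) -> None:
    # Looks the attribute up on the module at call time.
    if config.TELEMETRY_ENABLED:
        queue.append(event)


# A first-run consent prompt flips the flag after the import has happened.
config.TELEMETRY_ENABLED = True

cached_queue: list = []
live_queue: list = []
enqueue_cached(cached_queue, "event")
enqueue_live(live_queue, "event")
```

Only the live read sees the updated flag: `live_queue` receives the event, while `cached_queue` stays empty.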

async astart() → None#
async astop() → None#
async aflush() → None#
start() → None#
stop() → None#
flush() → None#
_run_sync(coro: Any) → Any#

Run an async coroutine from sync code, even if a loop is running.
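
One common way to run a coroutine from sync code regardless of loop state is to use `asyncio.run` directly when no loop is running, and otherwise run the coroutine on a fresh loop in a helper thread. The sketch below shows that general technique; it is not necessarily how `_run_sync` is implemented, and `run_sync`, `double`, and `caller` are names invented for the example.

```python
import asyncio
import threading
from typing import Any, Coroutine


def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run `coro` to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)

    # A loop is already running here; asyncio.run would raise, so run the
    # coroutine on its own loop in a separate thread and wait for it.
    result: dict = {}

    def worker() -> None:
        try:
            result["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread below
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join()
    if "error" in result:
        raise result["error"]
    return result["value"]


async def double(x: int) -> int:
    await asyncio.sleep(0)
    return 2 * x


async def caller() -> int:
    # Synchronous call made while an event loop is running in this thread.
    return run_sync(double(21))


outside = run_sync(double(4))
inside = asyncio.run(caller())
```

Note that the threaded branch blocks the calling event loop until the coroutine finishes, which is acceptable for a short, timeout-bounded flush but not for long-running work.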

async _timer_loop() → None#
async _flush_events() → None#
async _send_events(
events: list[nat.utils.telemetry.payload.QueuedEvent],
) → None#
async _send_with_client(
client: httpx.AsyncClient,
events: list[nat.utils.telemetry.payload.QueuedEvent],
payload: dict[str, Any],
) → None#
_add_to_dlq(
events: list[nat.utils.telemetry.payload.QueuedEvent],
) → None#
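
The bounded-retry behavior described for `max_retries` could be combined with a dead-letter queue roughly as follows. This is a sketch under stated assumptions: `SketchQueuedEvent`, its `retries_used` field, and the returned list of dropped events are hypothetical, and the real `QueuedEvent` and `_add_to_dlq` may track retries differently.

```python
from dataclasses import dataclass


@dataclass
class SketchQueuedEvent:
    """Hypothetical queued event carrying its own retry counter."""
    name: str
    retries_used: int = 0


def add_to_dlq(dlq: list, failed: list, max_retries: int) -> list:
    """Park failed events for re-sending; return the ones that are dropped."""
    dropped = []
    for event in failed:
        if event.retries_used >= max_retries:
            # Retry budget exhausted: drop instead of re-queueing.
            dropped.append(event)
        else:
            event.retries_used += 1
            dlq.append(event)
    return dropped


# Simulate an endpoint that rejects every request.
dlq: list = []
pending = [SketchQueuedEvent("a")]
attempts = 0
dropped: list = []
while pending:
    attempts += len(pending)  # every send attempt fails in this simulation
    dropped += add_to_dlq(dlq, pending, max_retries=3)
    pending = list(dlq)  # the next flush re-sends whatever is in the DLQ
    dlq.clear()
```

With `max_retries=3`, the event is sent once, re-sent three times, and then dropped, so the total number of requests per event is bounded at four in this sketch.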