nat.utils.telemetry.handler#

Telemetry handler with batching, dead-letter queue, and bounded retries.

Designed for short-lived CLI processes: the typical usage is to construct the handler as a context manager, enqueue one event, and let __exit__ trigger the final flush. A 2-second default per-request timeout caps the worst-case additional CLI latency from telemetry.

Attributes#

`logger`
`DEFAULT_FLUSH_INTERVAL_SECONDS`
`DEFAULT_MAX_QUEUE_SIZE`
`DEFAULT_MAX_RETRIES`
`DEFAULT_REQUEST_TIMEOUT_SECONDS`	Hard cap on per-request HTTP latency. Keeps short-lived CLI invocations
`CLIENT_VERSION`

Classes#

NATTelemetryHandler

Batches, flushes, and retries NAT telemetry events.

Functions#

_resolve_client_version(→ str)

Best-effort lookup of the installed nvidia-nat-core version.

Module Contents#

logger#

DEFAULT_FLUSH_INTERVAL_SECONDS: float = 120.0#

DEFAULT_MAX_QUEUE_SIZE: int = 50#

DEFAULT_MAX_RETRIES: int = 3#

DEFAULT_REQUEST_TIMEOUT_SECONDS: float = 2.0#: Hard cap on per-request HTTP latency. Keeps short-lived CLI invocations from being delayed by an unresponsive telemetry endpoint.

_resolve_client_version() → str#: Best-effort lookup of the installed nvidia-nat-core version.

CLIENT_VERSION: str#

class NATTelemetryHandler( flush_interval_seconds: float = DEFAULT_FLUSH_INTERVAL_SECONDS, max_queue_size: int = DEFAULT_MAX_QUEUE_SIZE, max_retries: int = DEFAULT_MAX_RETRIES, request_timeout_seconds: float = DEFAULT_REQUEST_TIMEOUT_SECONDS, source_client_version: str = CLIENT_VERSION, session_id: str = 'undefined', )#

Batches, flushes, and retries NAT telemetry events.

The handler is a no-op when the global TELEMETRY_ENABLED flag is false: enqueue() drops every event immediately and the timer loop has nothing to send. Lifecycle methods remain safe to call regardless.

Parameters#

flush_interval_seconds:: Periodic flush cadence used by the background timer loop.
max_queue_size:: When the in-memory queue reaches this size, an early flush is triggered.
max_retries:: Maximum re-send attempts per event before it is dropped.
request_timeout_seconds:: Per-request HTTP timeout. Bounds telemetry-induced latency.
source_client_version:: Reported as clientVer in the wire envelope. Defaults to the installed nvidia-nat-core version.
session_id:: Identifier used to group related events. NAT_SESSION_PREFIX is prepended if set.

_flush_interval = 120.0#

_max_queue_size = 50#

_max_retries = 3#

_request_timeout = 2.0#

_source_client_version#

_session_id#

_events: list[nat.utils.telemetry.payload.QueuedEvent] = []#

_dlq: list[nat.utils.telemetry.payload.QueuedEvent] = []#

_flush_signal#

_timer_task: asyncio.Task | None = None#

_running = False#

enqueue(event: nat.utils.telemetry.events.TelemetryEvent) → None#

Queue an event for the next flush. Silently no-ops when disabled.

Reads config.TELEMETRY_ENABLED live (not via cached import) so the first-run consent prompt’s late update to the flag is honored.

async astart() → None#

async astop() → None#

async aflush() → None#

start() → None#

stop() → None#

flush() → None#

_run_sync(coro: Any) → Any#: Run an async coroutine from sync code, even if a loop is running.

async _timer_loop() → None#

async _flush_events() → None#

async _send_events( events: list[nat.utils.telemetry.payload.QueuedEvent], ) → None#

async _send_with_client( client: httpx.AsyncClient, events: list[nat.utils.telemetry.payload.QueuedEvent], payload: dict[str, Any], ) → None#

_add_to_dlq( events: list[nat.utils.telemetry.payload.QueuedEvent], ) → None#