nat.utils.telemetry.handler#
Telemetry handler with batching, dead-letter queue, and bounded retries.
Designed for short-lived CLI processes: the typical usage is to construct the
handler as a context manager, enqueue one event, and let __exit__ trigger
the final flush. A 2-second default per-request timeout caps the worst-case
additional CLI latency from telemetry.
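The context-manager usage described above can be sketched with a stand-in class. `SketchHandler` and the event dictionary below are hypothetical and only illustrate the enqueue-then-flush-on-exit pattern; they are not the real `NATTelemetryHandler` API, whose event types and transport are not shown in this section.

```python
class SketchHandler:
    """Hypothetical stand-in for illustration; not the real NATTelemetryHandler."""

    def __init__(self) -> None:
        self.pending: list[dict] = []
        self.sent: list[dict] = []

    def enqueue(self, event: dict) -> None:
        # Events only accumulate in memory until a flush happens.
        self.pending.append(event)

    def flush(self) -> None:
        # A real handler would POST the batch here, bounded by the request timeout.
        self.sent.extend(self.pending)
        self.pending.clear()

    def __enter__(self) -> "SketchHandler":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Final flush on exit, whether or not the body raised.
        self.flush()


# Typical short-lived CLI shape: one event, flushed when the block exits.
with SketchHandler() as handler:
    handler.enqueue({"name": "cli_command", "command": "run"})
```

Because the flush happens in `__exit__`, the CLI pays the telemetry cost once, at the end of the command, rather than on every `enqueue()` call.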
Attributes#
DEFAULT_REQUEST_TIMEOUT_SECONDS | Hard cap on per-request HTTP latency. Keeps short-lived CLI invocations from being delayed by an unresponsive telemetry endpoint. |
Classes#
NATTelemetryHandler | Batches, flushes, and retries NAT telemetry events. |
Functions#
|
Best-effort lookup of the installed nvidia-nat-core version. |
Module Contents#
- logger#
- DEFAULT_REQUEST_TIMEOUT_SECONDS: float = 2.0#
Hard cap on per-request HTTP latency. Keeps short-lived CLI invocations from being delayed by an unresponsive telemetry endpoint.
- class NATTelemetryHandler(
- flush_interval_seconds: float = DEFAULT_FLUSH_INTERVAL_SECONDS,
- max_queue_size: int = DEFAULT_MAX_QUEUE_SIZE,
- max_retries: int = DEFAULT_MAX_RETRIES,
- request_timeout_seconds: float = DEFAULT_REQUEST_TIMEOUT_SECONDS,
- source_client_version: str = CLIENT_VERSION,
- session_id: str = 'undefined',
- )#
Batches, flushes, and retries NAT telemetry events.
The handler is a no-op when the global TELEMETRY_ENABLED flag is false: enqueue() drops every event immediately and the timer loop has nothing to send. Lifecycle methods remain safe to call regardless.
Parameters#
- flush_interval_seconds:
Periodic flush cadence used by the background timer loop.
- max_queue_size:
When the in-memory queue reaches this size, an early flush is triggered.
- max_retries:
Maximum re-send attempts per event before it is dropped.
- request_timeout_seconds:
Per-request HTTP timeout. Bounds telemetry-induced latency.
- source_client_version:
Reported as clientVer in the wire envelope. Defaults to the installed nvidia-nat-core version.
- session_id:
Identifier used to group related events. NAT_SESSION_PREFIX is prepended if set.
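The interaction between `max_queue_size` and `max_retries` can be illustrated with a toy batcher. This is a minimal sketch under stated assumptions: `RetryingBatcher`, `Queued`, the `send` callback, and the `dropped` list are hypothetical names, and the way failed events are parked in a dead-letter list and re-sent on the next flush is an assumption about the general technique, not a description of the real handler's internals.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Queued:
    payload: str
    sends: int = 0  # how many times this event has been sent so far


@dataclass
class RetryingBatcher:
    """Hypothetical toy batcher; not the real NATTelemetryHandler."""

    send: Callable[[list[str]], bool]  # returns True on success
    max_queue_size: int = 50
    max_retries: int = 3
    events: list[Queued] = field(default_factory=list)
    dlq: list[Queued] = field(default_factory=list)
    dropped: list[str] = field(default_factory=list)

    def enqueue(self, payload: str) -> None:
        self.events.append(Queued(payload))
        # Early flush once the in-memory queue reaches its size limit.
        if len(self.events) >= self.max_queue_size:
            self.flush()

    def flush(self) -> None:
        # Previously failed events are retried together with new ones.
        batch = self.dlq + self.events
        self.dlq, self.events = [], []
        if not batch:
            return
        for item in batch:
            item.sends += 1
        if self.send([item.payload for item in batch]):
            return
        for item in batch:
            # One initial send plus at most max_retries re-sends.
            if item.sends <= self.max_retries:
                self.dlq.append(item)
            else:
                self.dropped.append(item.payload)


calls: list[list[str]] = []


def failing_send(batch: list[str]) -> bool:
    calls.append(batch)
    return False  # simulate an unreachable endpoint


batcher = RetryingBatcher(send=failing_send, max_queue_size=2, max_retries=2)
batcher.enqueue("a")
batcher.enqueue("b")  # reaches max_queue_size, triggers the first send
batcher.flush()       # first re-send
batcher.flush()       # second re-send, after which both events are dropped
batcher.flush()       # nothing left to send
```

With an endpoint that always fails, each event is sent three times in total (one initial attempt plus `max_retries=2` re-sends) and is then dropped, so the retry cost stays bounded.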
- _flush_interval = 120.0#
- _max_queue_size = 50#
- _max_retries = 3#
- _request_timeout = 2.0#
- _source_client_version#
- _session_id#
- _events: list[nat.utils.telemetry.payload.QueuedEvent] = []#
- _dlq: list[nat.utils.telemetry.payload.QueuedEvent] = []#
- _flush_signal#
- _timer_task: asyncio.Task | None = None#
- _running = False#
- enqueue(event: nat.utils.telemetry.events.TelemetryEvent) → None#
Queue an event for the next flush. Silently no-ops when disabled.
Reads config.TELEMETRY_ENABLED live (not via a cached import) so that the first-run consent prompt's late update to the flag is honored.
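The difference between a live read and a cached import can be shown in isolation. In this sketch the `config` object is a hypothetical stand-in module built with `types.ModuleType`, and `enabled_cached` / `enabled_live` are illustrative helper names; only the flag name `TELEMETRY_ENABLED` comes from the text above.

```python
import types

# Hypothetical stand-in for the real config module.
config = types.ModuleType("config")
config.TELEMETRY_ENABLED = False

# Equivalent to `from config import TELEMETRY_ENABLED`:
# the name is bound once to the value the flag had at import time.
cached_flag = config.TELEMETRY_ENABLED


def enabled_cached() -> bool:
    # Never sees later updates to config.TELEMETRY_ENABLED.
    return cached_flag


def enabled_live() -> bool:
    # Attribute lookup happens on every call, so late updates are visible.
    return config.TELEMETRY_ENABLED


# Simulates a consent prompt flipping the flag after import.
config.TELEMETRY_ENABLED = True
```

After the flag is flipped, `enabled_cached()` still reports the stale value while `enabled_live()` reflects the update, which is why a per-call attribute read suits a flag that can change after startup.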
- _run_sync(coro: Any) → Any#
Run an async coroutine from sync code, even if a loop is running.
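One common way to run a coroutine from synchronous code regardless of loop state is sketched below. This is an assumption about the general technique, not the actual body of `_run_sync`, which is not shown in this section; `run_sync`, `_double`, and `_caller` are hypothetical names.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine


def run_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running here and asyncio.run would raise,
    # so run the coroutine on a fresh loop in a worker thread and wait for it.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def _double(x: int) -> int:
    await asyncio.sleep(0)
    return 2 * x


async def _caller() -> int:
    # Synchronous call made from inside a running event loop.
    return run_sync(_double(21))


plain_result = run_sync(_double(4))      # called with no loop running
nested_result = asyncio.run(_caller())   # called while a loop is running
```

The worker-thread path blocks the calling thread until the coroutine finishes, so it is best kept for short operations such as a final flush with a tight timeout.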
- async _send_events(
- events: list[nat.utils.telemetry.payload.QueuedEvent],
- async _send_with_client(
- client: httpx.AsyncClient,
- events: list[nat.utils.telemetry.payload.QueuedEvent],
- payload: dict[str, Any],
- _add_to_dlq(
- events: list[nat.utils.telemetry.payload.QueuedEvent],