Machine and DPU Logs
This document covers log collection from NICo-managed devices: machine/DPU serial console output and DPU system/application logs. For NICo service logs (nico-api, nico-dns, etc.), see /infra-controller/documentation/operations-day-2/observability/logging.
1. Overview
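NICo manages two categories of device logs: machine serial console logs, captured by nico-ssh-console during BMC console sessions (section 2), and DPU system and application logs, collected by an OpenTelemetry Collector running on each DPU (section 3).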
2. Machine console logs (nico-ssh-console)
2.1 How it works
When a user connects to a machine’s BMC console through nico-ssh-console, the proxy:
- Establishes an SSH session to the BMC
- Captures all serial console output (stdout from the BMC session)
- Strips ANSI escape sequences for cleaner logs
- Writes timestamped output to a local file
- Rotates files when they exceed the configured size
Console logging runs for the duration of each BMC session. When the session ends, the logger writes a closing timestamp and flushes the file.
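The capture steps above can be sketched in Python. This is an illustration, not nico-ssh-console's implementation: the function names and the session-marker wording are hypothetical, and the SSH session's stdout is modeled as any iterable of byte chunks.

```python
import codecs
import re
from datetime import datetime, timezone
from typing import Iterable, TextIO

# Matches CSI sequences (colors, cursor movement, erase) and two-byte escapes.
ANSI_ESCAPE = re.compile(r"\x1b(?:\[[0-?]*[ -/]*[@-~]|[@-Z\\-_])")


def strip_ansi(text: str) -> str:
    """Remove ANSI escape sequences so the log contains plain text."""
    return ANSI_ESCAPE.sub("", text)


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def log_session(chunks: Iterable[bytes], out: TextIO, bmc_ip: str) -> None:
    """Write one BMC session's console output to `out`, one timestamped line each.

    `chunks` stands in for the SSH session's stdout; the marker text is hypothetical.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    out.write(f"=== session start {utc_now()} bmc={bmc_ip} ===\n")
    pending = ""
    for chunk in chunks:
        # The incremental decoder copes with UTF-8 sequences split across chunks.
        pending += decoder.decode(chunk)
        # Emit complete lines only; keep any trailing partial line for later.
        *complete, pending = pending.split("\n")
        for line in complete:
            clean = strip_ansi(line.rstrip("\r"))
            out.write(f"{utc_now()} {clean}\n")
    pending += decoder.decode(b"", final=True)
    if pending:
        out.write(f"{utc_now()} {strip_ansi(pending)}\n")
    # Closing timestamp, then flush.
    out.write(f"=== session end {utc_now()} ===\n")
    out.flush()
```

Buffering partial lines matters because serial output arrives in arbitrary chunks, so a single console line can span several reads.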
2.2 Log file location and naming
Console logs are written to:
For example:
The filename encodes both the NICo machine ID and the BMC IP address, making it easy to identify which machine produced the logs.
2.3 Log content
Console logs contain raw serial output with session markers:
These logs are useful for:
- Debugging boot failures
- Capturing kernel panics and oops messages
- Reviewing BIOS/UEFI output
- Diagnosing hardware initialization issues
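When logs exist only as local files, a simple pattern scan can flag the kernel-level failures listed above. The patterns in this sketch are assumptions based on common Linux kernel messages ("Kernel panic - not syncing", "Oops", "BUG: unable to handle", "Call Trace:"). BIOS/UEFI and hardware initialization errors vary by vendor and would need their own patterns.

```python
import re
from typing import Iterable, Iterator, Tuple

# Illustrative patterns for common Linux kernel failure messages (assumptions).
FAILURE_PATTERNS = {
    "kernel_panic": re.compile(r"Kernel panic - not syncing"),
    "oops": re.compile(r"\bOops\b|BUG: unable to handle"),
    "call_trace": re.compile(r"Call Trace:"),
}


def find_failures(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, category, text) for each line matching a failure pattern."""
    for lineno, line in enumerate(lines, start=1):
        for category, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                yield lineno, category, line.rstrip("\n")
                break  # report each line once, under its first matching category
```

Iterating over an open console log file yields one tuple per suspicious line. Open the file with errors="replace", since serial output can contain stray non-UTF-8 bytes.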
2.4 Configuration
Console logging is controlled by nico-ssh-console configuration:
When rotation occurs, the current .log file becomes .log.0, the previous .log.0 becomes .log.1, and so on. The oldest file beyond the limit is deleted.
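The rotation scheme above can be modeled in a few lines of Python. This is a sketch, not nico-ssh-console's code: the function names and the max_bytes and max_files parameters are hypothetical stand-ins for whatever the real configuration keys are called.

```python
from pathlib import Path


def rotate(log_path: Path, max_files: int) -> None:
    """Shift name.log -> name.log.0 -> name.log.1 ..., keeping max_files rotated copies."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    # Delete the oldest rotated file so it is not carried past the limit.
    oldest = Path(f"{log_path}.{max_files - 1}")
    if oldest.exists():
        oldest.unlink()
    # Shift existing rotated files up by one, oldest first, so none is overwritten.
    for index in range(max_files - 2, -1, -1):
        src = Path(f"{log_path}.{index}")
        if src.exists():
            src.rename(Path(f"{log_path}.{index + 1}"))
    if log_path.exists():
        log_path.rename(Path(f"{log_path}.0"))


def append_with_rotation(log_path: Path, data: str, max_bytes: int, max_files: int) -> None:
    """Append data, rotating first if the write would push the file past max_bytes."""
    size = log_path.stat().st_size if log_path.exists() else 0
    if size > 0 and size + len(data.encode("utf-8")) > max_bytes:
        rotate(log_path, max_files)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(data)
```

Shifting from the highest index down is the key design choice: renaming .log.0 to .log.1 before .log.1 has moved would overwrite it.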
2.5 Centralizing console logs
The nico-ssh-console Helm chart includes an optional OpenTelemetry Collector sidecar for shipping console logs to a backend.
Enable the sidecar:
Default sidecar configuration:
The sidecar reads from /var/log/consoles/*.log and extracts machine metadata from filenames:
Alternative: stdout to DaemonSet collector:
To follow the standard Kubernetes pattern where the DaemonSet collector picks up all pod logs, configure the sidecar to write to stdout instead of directly to a backend:
The DaemonSet collector on each node reads /var/log/pods/ (including the sidecar’s stdout) and forwards all logs to your backend. This keeps the architecture simple: console logs flow through the same pipeline as all other pod logs.
2.6 Querying console logs
Once centralized, query console logs by machine ID:
Loki (LogQL):
VictoriaLogs (LogsQL):
To find boot failures or kernel panics:
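Loki also serves LogQL queries over HTTP through its /loki/api/v1/query_range endpoint, which is convenient for scripting. The hypothetical helper below only builds the request URL; the base URL and any label names in the query are placeholders, because the labels your site indexes depend on the collector configuration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _to_unix_ns(moment: datetime) -> int:
    # Integer timedelta arithmetic avoids float rounding at nanosecond scale.
    return (moment - _EPOCH) // timedelta(microseconds=1) * 1000


def loki_query_range_url(base_url: str, logql: str, since: timedelta,
                         limit: int = 1000, now: Optional[datetime] = None) -> str:
    """Build a GET URL for Loki's /loki/api/v1/query_range over the last `since`."""
    end = now or datetime.now(timezone.utc)
    params = {
        "query": logql,
        "start": str(_to_unix_ns(end - since)),
        "end": str(_to_unix_ns(end)),
        "limit": str(limit),
        "direction": "backward",  # newest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"
```

For example, a query such as {machine_id="<machine-id>"} |= "Kernel panic" over timedelta(hours=1) searches the last hour, where machine_id is a placeholder label name. Fetch the resulting URL with any HTTP client, such as urllib.request.urlopen.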
3. DPU logs
DPUs run an OpenTelemetry Collector (otelcol-contrib) deployed via the nico-otelcol Helm chart (bluefield/charts/nico-otelcol/). The chart deploys a DaemonSet that runs on DPU nodes managed by DPF (DOCA Platform Framework). For non-Kubernetes DPU deployments, a systemd service (otelcol-contrib.service) provides the same functionality.
The collector gathers logs from multiple sources and forwards them to the site controller over mTLS.
3.1 Log sources on the DPU
The following logs are collected from the DPU Arm OS. All log files are physically located on the DPU’s local filesystem.
DOCA/HBN log paths:
These paths are on the DPU Arm OS filesystem. The HBN container writes logs to host-mounted volumes, making them accessible to the otelcol-contrib collector running on the host.
nico-dpu-agent logs:
The agent can run in two modes depending on the deployment:
- Systemd service (forge-dpu-agent.service): Logs go to journald. Query with journalctl -u forge-dpu-agent.service.
- Containerized DaemonSet (via DPF): Logs go to stdout, captured at /var/log/pods/*/nico-dpu-agent/*.log. The collector parses the CRI log format.
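In the containerized mode, each line under /var/log/pods/ is wrapped in the CRI log format: an RFC 3339 timestamp, the stream name (stdout or stderr), a tag whose first field is P for a partial line or F for a full one, and then the message. The collector does this parsing itself. The Python sketch below only illustrates the format, using a hypothetical parse_cri helper that rejoins partial lines per stream.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class CriEntry(NamedTuple):
    timestamp: str
    stream: str
    message: str


def parse_cri(lines: Iterable[str]) -> Iterator[CriEntry]:
    """Parse CRI-format container log lines, rejoining partial (P) lines per stream."""
    pending: Dict[str, List[str]] = {}
    for raw in lines:
        parts = raw.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # not a CRI-formatted line
        timestamp, stream, tag = parts[0], parts[1], parts[2]
        message = parts[3] if len(parts) == 4 else ""
        pending.setdefault(stream, []).append(message)
        # The first ':'-separated field of the tag is P (partial) or F (full).
        if tag.split(":", 1)[0] == "F":
            yield CriEntry(timestamp, stream, "".join(pending.pop(stream)))
```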
In both cases, the agent emits logfmt output. The otelcol-contrib collector extracts log levels from the logfmt level= field.
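Level extraction from logfmt can be illustrated with a small parser. This is a sketch with simplifying assumptions, not the collector's actual parsing rules. It assumes keys are word characters, dots, or hyphens, that quoted values escape quotes with a backslash, and that a missing level defaults to INFO.

```python
import re
from typing import Dict

# key=value or key="quoted value with \" escapes"; bare words without '=' are ignored.
LOGFMT_PAIR = re.compile(r'(\w[\w.\-]*)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line: str) -> Dict[str, str]:
    """Return the key/value pairs of a logfmt line, unquoting quoted values."""
    fields: Dict[str, str] = {}
    for key, value in LOGFMT_PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1].replace('\\"', '"')
        fields[key] = value
    return fields


def severity(line: str, default: str = "INFO") -> str:
    """Map the logfmt level= field to upper-case severity text."""
    return parse_logfmt(line).get("level", default).upper()
```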
3.2 DPU collector configuration
The DPU runs otelcol-contrib with configuration from /etc/otelcol-contrib/config.yaml.
Key aspects:
Resource attributes added to all logs:
- host.name: DPU hostname (from the resourcedetection processor)
- machine.id: NICo machine ID (from the file at /run/otelcol-contrib/machine-id)
- host.machine.id: Host machine ID (from /run/otelcol-contrib/host-machine-id)
- component: Log source identifier (journald, hbn, dpu-auth-filelog)
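The attribute list above can be mirrored in code to show where each value comes from. The sketch below is a hypothetical Python model, not collector configuration. It reads the two ID files, uses socket.gethostname() in place of the resourcedetection processor, and takes the component name as an argument. The run_dir and hostname parameters exist only so it can be exercised off-DPU.

```python
import socket
from pathlib import Path
from typing import Dict, Optional


def read_id(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if the file is missing or empty."""
    try:
        value = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return value or None


def resource_attributes(component: str,
                        run_dir: Path = Path("/run/otelcol-contrib"),
                        hostname: Optional[str] = None) -> Dict[str, str]:
    """Assemble the resource attributes attached to every DPU log record."""
    attrs = {"host.name": hostname or socket.gethostname(), "component": component}
    machine_id = read_id(run_dir / "machine-id")
    if machine_id:
        attrs["machine.id"] = machine_id
    host_machine_id = read_id(run_dir / "host-machine-id")
    if host_machine_id:
        attrs["host.machine.id"] = host_machine_id
    return attrs
```

A missing ID file simply drops that attribute in this sketch. That makes an absent machine.id on incoming logs a hint that the file was never written on the DPU.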
Export to site controller:
DPU logs are sent over mTLS using machine certificates provisioned by NICo. The forge-dpu-otel-agent service handles certificate renewal.
3.3 Site controller receiver
The site controller’s otel-collector receives DPU logs via OTLP and routes them through processing pipelines:
Resource labels for Loki indexing:
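OpenTelemetry resource attribute names such as host.machine.id contain dots, which are not valid in Loki label names (these must match [a-zA-Z_][a-zA-Z0-9_]*). A common normalization replaces invalid characters with underscores. The sketch below assumes that convention, so confirm the actual label names against the site collector and Loki configuration before relying on them.

```python
import re

_INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def to_loki_label(attribute: str) -> str:
    """Convert an OTel attribute name such as host.machine.id to a valid Loki label name."""
    label = _INVALID_LABEL_CHARS.sub("_", attribute)
    # Label names may not start with a digit (or be empty).
    if not label or label[0].isdigit():
        label = "_" + label
    return label
```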
3.4 Querying DPU logs
By DPU hostname:
By machine ID:
By component:
Kernel errors on a specific DPU:
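Log queries sent to Loki's /loki/api/v1/query_range endpoint return JSON that groups lines by label set (data.result[].stream), with [timestamp_ns, line] entries in values. When several DPUs or components match a query, the hypothetical helper below flattens that response into one time-ordered list. It is a sketch and assumes the standard "streams" result type.

```python
import json
from typing import Dict, List, Tuple


def flatten_streams(body: str) -> List[Tuple[int, Dict[str, str], str]]:
    """Return (timestamp_ns, labels, line) tuples from a Loki log query, oldest first."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Loki query failed: {payload}")
    data = payload["data"]
    if data.get("resultType") != "streams":
        raise ValueError(f"expected a log query, got resultType={data.get('resultType')}")
    rows: List[Tuple[int, Dict[str, str], str]] = []
    for stream in data["result"]:
        labels = stream["stream"]
        for value in stream["values"]:
            # Index explicitly: newer Loki versions may append a metadata element.
            rows.append((int(value[0]), labels, value[1]))
    rows.sort(key=lambda row: row[0])
    return rows
```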
4. Troubleshooting
Console logs not appearing
Verify console logging is working:
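A quick first check is whether console log files are being written at all. The sketch below lists *.log files in /var/log/consoles (the directory the sidecar reads) that were modified recently. It assumes a Python interpreter is available wherever that directory is mounted, and its five-minute default threshold is arbitrary.

```python
import time
from pathlib import Path
from typing import List, Tuple


def recent_console_logs(log_dir: Path = Path("/var/log/consoles"),
                        max_age_s: float = 300.0) -> List[Tuple[str, float]]:
    """Return (filename, age_seconds) for *.log files modified within max_age_s."""
    now = time.time()
    results: List[Tuple[str, float]] = []
    for path in sorted(log_dir.glob("*.log")):
        age = now - path.stat().st_mtime
        if age <= max_age_s:
            results.append((path.name, age))
    return results
```

If a BMC session is active but no file is fresh, the problem is on the capture side. If files are fresh but nothing reaches the backend, look at the collector sidecar instead.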
DPU logs not appearing
Verify DPU collector is running:
Verify site collector is receiving:
Accessing DPU logs directly
When centralized logging isn’t available or you need to debug on the DPU itself:
If SSH to the DPU fails, use DPU BMC or rshim console access to check whether the DPU OS booted.