gRPC Interface
The NMX-T instance runs a gRPC server that allows clients to retrieve application information and subscribe to telemetry data. The full gRPC interface definition (the protocol buffer file nmx-telemetry.proto) can be found in the ./proto subdirectory of the package installation directory.
service TelemetryService {
    rpc Hello(ClientHello) returns (ServerHello);
    rpc SubscribeTelemetryData(TelemetrySubscription) returns (stream TelemetryData);
}
The gRPC interface can optionally be secured with TLS or mTLS. By default, the gRPC interface runs unsecured. The following security modes are available:
disabled - no communication security is enforced
tls - TLS encryption is enforced, and the client can verify the trust of the gRPC server
mtls - mutual TLS is enforced, and the gRPC server also verifies the trust of each connecting client
The gRPC interface can be enabled or disabled. By default, it is enabled. The nmx-telemetry-grpc-interface parameter in the user_config.json file controls the interface's on/off state.
The Hello remote procedure call synchronizes the client and server versions. If needed, it can be used to enforce version matching and to adjust the logic accordingly.
service TelemetryService {
    rpc Hello(ClientHello) returns (ServerHello);
}
Client parameters to the handshake
message ClientHello {
    string gatewayId = 1;
    ProtoMsgMajorVersion major_version = 2;
    ProtoMsgMinorVersion minor_version = 3;
}
In addition to other application-specific data, the telemetry service returns the application instance and environment identifiers:
domain_uuid - environment domain identifier; the unique identifier of the GB200 instance
app_uuid - unique identifier of the application instance
app_ver - application version string
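The version check that the Hello exchange makes possible could look roughly like the following on the client side. This is a minimal sketch and not the NMX-T implementation: the Version type, the check_compatibility function, and the policy itself (major versions must match, a minor mismatch is tolerated) are assumptions made for illustration. The actual ServerHello fields and the values of ProtoMsgMajorVersion and ProtoMsgMinorVersion are defined in nmx-telemetry.proto.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical stand-in for the major/minor pair exchanged in Hello."""
    major: int
    minor: int


def check_compatibility(client: Version, server: Version) -> str:
    """Classify a client/server version pair under an assumed policy."""
    if client.major != server.major:
        # Assumed: a different major version means an incompatible wire format.
        return "incompatible"
    if client.minor != server.minor:
        # Assumed: a minor difference is usable, but some fields may be missing.
        return "degraded"
    return "match"


if __name__ == "__main__":
    print(check_compatibility(Version(1, 2), Version(1, 3)))
```

A client might refuse to subscribe on "incompatible" and only log a warning on "degraded"; which reaction is appropriate depends on the deployment.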
The SubscribeTelemetryData remote procedure call enables clients to receive a stream of telemetry data collected by NMX-Telemetry.
service TelemetryService {
    rpc SubscribeTelemetryData(TelemetrySubscription) returns (stream TelemetryData);
}
The TelemetrySubscription message defines the subscription parameters.
message TelemetrySubscription {
    string data_type = 1; // * | ib_counters | sys_log | gpu_counters
    string source_id = 2;
    string source_tag = 3;
}
Set the parameter values to select the types or sources of data to receive, or leave the values blank to subscribe to all available data.
data_type - type of data to subscribe to:
    an empty string or an asterisk (*) subscribes to all data types
    a comma-separated list of data types gives a fine-grained subscription
source_id - identifier of the data source to get data from
source_tag - data source tag
Leave all the parameters empty to receive all telemetry data as it is collected, without any filtering or pre-selection.
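The data_type selection rules described above can be sketched as a small matching function. It only illustrates the documented semantics (an empty string or * selects everything, a comma-separated list selects specific types). The function name data_type_matches is hypothetical, the real filtering is performed by the server, and the handling of whitespace around list items and of * mixed with other names is an assumption.

```python
def data_type_matches(subscription: str, data_type: str) -> bool:
    """Return True if a message of data_type passes the subscription filter."""
    # Split the comma-separated list and drop empty items (assumed behaviour).
    wanted = [item.strip() for item in subscription.split(",") if item.strip()]
    # An empty filter or an asterisk selects every data type.
    if not wanted or "*" in wanted:
        return True
    return data_type in wanted


if __name__ == "__main__":
    for subscription in ("", "*", "ib_counters,sys_log", "gpu_counters"):
        print(repr(subscription), data_type_matches(subscription, "sys_log"))
```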
The telemetry data response includes metadata fields and the actual data payload. The format of the payload may vary depending on the type of data received.
message TelemetryData {
    string aggregator_id = 1;
    string source_id = 2;
    string source_tag = 3;
    string data_type = 4;
    int64 timestamp = 6;
    Encoding encoding_type = 7;
    bytes message = 8;
}
The metadata fields describe the payload:
aggregator_id - the unique identifier of the application domain (Oberon domain UUID)
data_type - the name of the type of data the payload contains, for example "counters"
source_id - the identifier of the data source: the device GUID for NVLink telemetry counters, the switch IP and port for gNMI aggregation, or the server IP for syslog message aggregation
timestamp - the time at which the message was formed, in microseconds
encoding_type - a hint for interpreting the payload; could be JSON or BYTES
message - the actual data payload, as described in the section below
For example, a message representing an event of type nvl_packet_types_counters may have the following values:
aggregator_id = b954ce10-be66-4d75-a538-405ac8517c38
data_type = nvl_packet_types_counters
source_id = 0x1070fd030058c216
source_tag = nvlink
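Because the timestamp metadata field is expressed in microseconds, a client will usually convert it before displaying or comparing it. The sketch below assumes the value counts microseconds since the Unix epoch in UTC, which this section does not state explicitly; to_datetime is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetime(timestamp_us: int) -> datetime:
    """Convert a microsecond timestamp to a timezone-aware UTC datetime."""
    # Integer timedelta arithmetic avoids the float rounding that
    # datetime.fromtimestamp(timestamp_us / 1e6) could introduce.
    return EPOCH + timedelta(microseconds=timestamp_us)


if __name__ == "__main__":
    print(to_datetime(1729872473718869).isoformat())
```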
Telemetry data, including counters and events, is presented as comma-separated values (CSV) embedded in a JSON document.
Each JSON object consists of:
timestamp: the time at which the data was collected
fields: a comma-separated list of the data fields contained in the payload
values: a list of strings, each holding the comma-separated values that correspond to the respective fields
A message payload of data type nvl_packet_types_counters may look like the following:
[
    {
        "timestamp": 100,
        "fields": "node_guid,port_guid,port_num,port_rcv_ibg1_nvl_pkts,port_rcv_ibg1_non_nvl_pkts,port_rcv_ibg2_pkts,port_xmit_ibg1_nvl_pkts,port_xmit_ibg1_non_nvl_pkts,port_xmit_ibg2_pkts",
        "values": [
            "0x1070fd0300580000,0x1070fd030058c216,9,0,0,0,0,0,0",
            "0x1070fd0300580002,0x1070fd030058c216,9,0,0,0,0,0,0"
        ]
    },
    {
        "timestamp": 200,
        "fields": "node_guid,port_guid,port_num,port_rcv_ibg1_nvl_pkts,port_rcv_ibg1_non_nvl_pkts,port_rcv_ibg2_pkts,port_xmit_ibg1_nvl_pkts,port_xmit_ibg1_non_nvl_pkts,port_xmit_ibg2_pkts",
        "values": [
            "0x1070fd0300580000,0x1070fd030058c216,9,0,0,0,0,0,0",
            "0x1070fd0300580002,0x1070fd030058c216,9,0,0,0,0,0,0"
        ]
    }
]
Another example, the data payload of the "counters" data type:
[
    {
        "timestamp": 1729872473718869,
        "fields": "node_guid,port_guid,port_num,node_description,roundtrip_time_port_counters_extended",
        "values": [
            "0xb83fd20300f9b7dc,0xb83fd20300f9b7dc,1,swx-proton03-bf3-2 HCA-1,,0"
        ]
    }
]
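A client typically needs to expand this CSV-in-JSON layout into one record per value row. The following is a minimal sketch of such a step using only the Python standard library. The function name expand_payload and the shortened sample payload are illustrative assumptions, and the sketch assumes each value row follows ordinary CSV rules.

```python
import csv
import json


def expand_payload(payload: str) -> list:
    """Turn a CSV-in-JSON payload into a list with one dict per value row."""
    records = []
    for entry in json.loads(payload):
        names = entry["fields"].split(",")
        # csv.reader accepts any iterable of strings and handles quoting,
        # should a value ever contain a comma.
        for row in csv.reader(entry["values"]):
            # zip() stops at the shorter sequence, so surplus values are
            # dropped silently; a stricter client could compare lengths.
            record = dict(zip(names, row))
            record["timestamp"] = entry["timestamp"]
            records.append(record)
    return records


# Shortened, illustrative payload with three fields only.
SAMPLE = json.dumps([
    {
        "timestamp": 100,
        "fields": "node_guid,port_guid,port_num",
        "values": ["0x1070fd0300580000,0x1070fd030058c216,9"],
    }
])

if __name__ == "__main__":
    for record in expand_payload(SAMPLE):
        print(record)
```

All values stay strings after expansion; converting counters to integers is left to the caller, because the field types depend on the data type being received.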
The TelemetryData response that is a result of the gNMI Aggregated Data consists of the following:
aggregator_id: The unique identifier for the application domain (Oberon domain UUID).
data_type: The name of the gNMI subscription.
source_id: The address and port of the gNMI target from which the data is being aggregated.
timestamp: The time, in microseconds, when the message was formed.
encoding_type: A hint for interpreting the payload, which could be either JSON or PROTO.
message: The gNMI update response received from the aggregation target, either in its original binary form (encoded in PROTO) or as a JSON representation of the gNMI update message.
For example, a JSON-marshalled gNMI response could look like the following:
{
    "update": {
        "prefix": {
            "elem": [
                {
                    "name": "interfaces"
                },
                {
                    "key": {
                        "name": "fnma1p1"
                    },
                    "name": "interface"
                }
            ],
            "target": "netq"
        },
        "timestamp": "1729513043599315230",
        "update": [
            {
                "path": {
                    "elem": [
                        {
                            "name": "state"
                        },
                        {
                            "name": "counters"
                        },
                        {
                            "name": "in-octets"
                        }
                    ]
                },
                "val": {
                    "uintVal": "353952"
                }
            }
        ]
    }
}
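To work with a JSON-marshalled gNMI response like the one above, a client can join the prefix and path elements into a single path string and pick out the typed value. The sketch below assumes the JSON layout shown in the example; the helper names elems_to_path and flatten_notification and the bracketed key notation are illustrative choices, not part of the NMX-T interface.

```python
import json


def elems_to_path(elems: list) -> str:
    """Render gNMI path elements as name[key=value]/name/..."""
    parts = []
    for elem in elems:
        keys = "".join(f"[{k}={v}]" for k, v in elem.get("key", {}).items())
        parts.append(elem["name"] + keys)
    return "/".join(parts)


def flatten_notification(message: str) -> list:
    """Return (full_path, value_type, value) tuples for every update."""
    notification = json.loads(message)["update"]
    prefix = elems_to_path(notification.get("prefix", {}).get("elem", []))
    result = []
    for update in notification.get("update", []):
        path = elems_to_path(update["path"]["elem"])
        full_path = "/".join(part for part in (prefix, path) if part)
        # The value is a one-entry object such as {"uintVal": "353952"}.
        value_type, value = next(iter(update["val"].items()))
        result.append((full_path, value_type, value))
    return result


# The example response from this section, reduced to the fields used here.
SAMPLE = json.dumps({
    "update": {
        "prefix": {
            "elem": [
                {"name": "interfaces"},
                {"key": {"name": "fnma1p1"}, "name": "interface"},
            ],
            "target": "netq",
        },
        "timestamp": "1729513043599315230",
        "update": [
            {
                "path": {"elem": [{"name": "state"}, {"name": "counters"}, {"name": "in-octets"}]},
                "val": {"uintVal": "353952"},
            }
        ],
    }
})

if __name__ == "__main__":
    print(flatten_notification(SAMPLE))
```

Note that the 64-bit value arrives as a JSON string, so a client that needs a number has to convert it explicitly, for example with int(value).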
The TelemetryData response that is a result of the syslog collection consists of the following:
aggregator_id: The unique identifier for the application domain (Oberon domain UUID).
data_type: The value "log_message".
source_id: The address and port of the log message's source.
source_tag: The name of the process that sent the log message.
timestamp: The time, in microseconds, when the message was generated.
encoding_type: The encoding format, either JSON or ASCII.
message: The syslog message, which may be in its original text form (encoded in BYTES) or a JSON-serialized OpenTelemetry message.
Example:
{
    "time_unix_nano": 1731603557000000000,
    "observed_time_unix_nano": 1731596357165630000,
    "severity_number": 10,
    "severity_text": "notice",
    "body": {
        "Value": {
            "StringValue": "Nov 14 16:59:17 swx-proton04: Hey!"
        }
    },
    "attributes": [
        {
            "key": "facility",
            "value": {
                "Value": {
                    "IntValue": 1
                }
            }
        },
        {
            "key": "hostname",
            "value": {
                "Value": {
                    "StringValue": "swx-proton04"
                }
            }
        },
        {
            "key": "message",
            "value": {
                "Value": {
                    "StringValue": "Hey!"
                }
            }
        },
        {
            "key": "priority",
            "value": {
                "Value": {
                    "IntValue": 13
                }
            }
        },
        {
            "key": "appname",
            "value": {
                "Value": {
                    "StringValue": "bash"
                }
            }
        }
    ]
}
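The nested attribute wrappers in the record above can be flattened into a plain dictionary for easier filtering. The sketch below assumes the JSON layout shown in the example; flatten_attributes and split_priority are hypothetical helper names. The priority split follows the standard syslog rule (priority = facility * 8 + severity), which matches the example: priority 13 gives facility 1 and severity 5, that is, "notice".

```python
import json


def flatten_attributes(record: dict) -> dict:
    """Map each attribute key to its unwrapped value."""
    flat = {}
    for attribute in record.get("attributes", []):
        # Each value is wrapped as {"Value": {"<Type>Value": ...}}.
        inner = attribute["value"]["Value"]
        flat[attribute["key"]] = next(iter(inner.values()))
    return flat


def split_priority(priority: int) -> tuple:
    """Split a syslog priority into (facility, severity)."""
    return divmod(priority, 8)


# A subset of the example record from this section.
SAMPLE = json.dumps({
    "severity_text": "notice",
    "attributes": [
        {"key": "facility", "value": {"Value": {"IntValue": 1}}},
        {"key": "hostname", "value": {"Value": {"StringValue": "swx-proton04"}}},
        {"key": "priority", "value": {"Value": {"IntValue": 13}}},
        {"key": "appname", "value": {"Value": {"StringValue": "bash"}}},
    ],
})

if __name__ == "__main__":
    attributes = flatten_attributes(json.loads(SAMPLE))
    print(attributes)
    print(split_priority(attributes["priority"]))
```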