NVIDIA MLNX_OFED Documentation Rev 4.9-5.1.0.0 LTS
Linux Kernel Upstream Release Notes v5.17

Time-Stamping

Time-stamping records when a packet was created. A time-stamping service supports proof that a datum existed before a particular time. Incoming packets are time-stamped before they are distributed over PCI, which in turn depends on congestion in the PCI buffers. Outgoing packets are time-stamped just before they are placed on the wire.

Enabling Time-Stamping

Time-stamping is off by default and should be enabled before use.

To enable time-stamping for a socket:

Call setsockopt() with SO_TIMESTAMPING and with the following flags:

SOF_TIMESTAMPING_TX_HARDWARE: try to obtain send time-stamp in hardware

SOF_TIMESTAMPING_TX_SOFTWARE: if SOF_TIMESTAMPING_TX_HARDWARE is off or fails, then do it in software

SOF_TIMESTAMPING_RX_HARDWARE: return the original, unmodified time-stamp as generated by the hardware

SOF_TIMESTAMPING_RX_SOFTWARE: if SOF_TIMESTAMPING_RX_HARDWARE is off or fails, then do it in software

SOF_TIMESTAMPING_RAW_HARDWARE: return original raw hardware time-stamp

SOF_TIMESTAMPING_SYS_HARDWARE: return hardware time-stamp transformed into the system time base

SOF_TIMESTAMPING_SOFTWARE: return system time-stamp generated in software

SOF_TIMESTAMPING_TX/RX: determine how time-stamps are generated

SOF_TIMESTAMPING_RAW/SYS: determine how they are reported
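As an illustration of the setsockopt() call described above, the following is a minimal sketch rather than a definitive implementation. The helper names timestamping_flags() and enable_socket_timestamping() are hypothetical, and the choice to request either an all-hardware or an all-software flag set is an assumption made for brevity; an application may combine the flags differently.

```c
#include <assert.h>
#include <linux/net_tstamp.h>   /* SOF_TIMESTAMPING_* flags */
#include <sys/socket.h>         /* setsockopt(), SOL_SOCKET, SO_TIMESTAMPING */

/* Hypothetical helper: build the flag mask passed to SO_TIMESTAMPING.
 * use_hw != 0 requests hardware generation and raw hardware reporting;
 * use_hw == 0 requests software generation and software reporting. */
int timestamping_flags(int use_hw)
{
    if (use_hw)
        return SOF_TIMESTAMPING_TX_HARDWARE |
               SOF_TIMESTAMPING_RX_HARDWARE |
               SOF_TIMESTAMPING_RAW_HARDWARE;

    return SOF_TIMESTAMPING_TX_SOFTWARE |
           SOF_TIMESTAMPING_RX_SOFTWARE |
           SOF_TIMESTAMPING_SOFTWARE;
}

/* Hypothetical helper: enable time-stamping on an already opened socket.
 * Returns 0 on success, -1 with errno set on failure. */
int enable_socket_timestamping(int sock, int use_hw)
{
    int flags = timestamping_flags(use_hw);

    assert(sock >= 0);
    return setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING,
                      &flags, sizeof(flags));
}
```

A caller would typically open a UDP socket, call enable_socket_timestamping(sock, 1), and fall back to enable_socket_timestamping(sock, 0) if hardware stamps turn out to be unavailable.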

To enable time-stamping for a net device:

A user with administrative privileges can enable or disable time-stamping by calling ioctl(sock, SIOCSHWTSTAMP, &ifreq) with the following values:

  • Send-side time-stamping is controlled by ifreq.hwtstamp_config.tx_type:

    /* possible values for hwtstamp_config->tx_type */
    enum hwtstamp_tx_types {
        /*
         * No outgoing packet will need hardware time stamping;
         * should a packet arrive which asks for it, no hardware
         * time stamping will be done.
         */
        HWTSTAMP_TX_OFF,

        /*
         * Enables hardware time stamping for outgoing packets;
         * the sender of the packet decides which are to be
         * time stamped by setting %SOF_TIMESTAMPING_TX_SOFTWARE
         * before sending the packet.
         */
        HWTSTAMP_TX_ON,

        /*
         * Enables time stamping for outgoing packets just as
         * HWTSTAMP_TX_ON does, but also enables time stamp insertion
         * directly into Sync packets. In this case, transmitted Sync
         * packets will not receive a time stamp via the socket error
         * queue.
         */
        HWTSTAMP_TX_ONESTEP_SYNC,
    };

    Note: for send-side time-stamping, currently only HWTSTAMP_TX_OFF and HWTSTAMP_TX_ON are supported.

  • Receive-side time-stamping is controlled by ifreq.hwtstamp_config.rx_filter:

    /* possible values for hwtstamp_config->rx_filter */
    enum hwtstamp_rx_filters {
        /* time stamp no incoming packet at all */
        HWTSTAMP_FILTER_NONE,

        /* time stamp any incoming packet */
        HWTSTAMP_FILTER_ALL,

        /* return value: time stamp all packets requested plus some others */
        HWTSTAMP_FILTER_SOME,

        /* PTP v1, UDP, any kind of event packet */
        HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
        /* PTP v1, UDP, Sync packet */
        HWTSTAMP_FILTER_PTP_V1_L4_SYNC,
        /* PTP v1, UDP, Delay_req packet */
        HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ,
        /* PTP v2, UDP, any kind of event packet */
        HWTSTAMP_FILTER_PTP_V2_L4_EVENT,
        /* PTP v2, UDP, Sync packet */
        HWTSTAMP_FILTER_PTP_V2_L4_SYNC,
        /* PTP v2, UDP, Delay_req packet */
        HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ,
        /* 802.AS1, Ethernet, any kind of event packet */
        HWTSTAMP_FILTER_PTP_V2_L2_EVENT,
        /* 802.AS1, Ethernet, Sync packet */
        HWTSTAMP_FILTER_PTP_V2_L2_SYNC,
        /* 802.AS1, Ethernet, Delay_req packet */
        HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ,

        /* PTP v2/802.AS1, any layer, any kind of event packet */
        HWTSTAMP_FILTER_PTP_V2_EVENT,
        /* PTP v2/802.AS1, any layer, Sync packet */
        HWTSTAMP_FILTER_PTP_V2_SYNC,
        /* PTP v2/802.AS1, any layer, Delay_req packet */
        HWTSTAMP_FILTER_PTP_V2_DELAY_REQ,
    };

    Note: for receive-side time-stamping, currently only HWTSTAMP_FILTER_NONE and HWTSTAMP_FILTER_ALL are supported.

Getting Time-Stamping

Once time-stamping is enabled, the time stamp is placed in the socket's ancillary data. recvmsg() can be used to get this control message for regular incoming packets. For send time stamps, the outgoing packet is looped back to the socket's error queue with the send time stamp(s) attached. It can be received with recvmsg(flags=MSG_ERRQUEUE). The call returns the original outgoing packet data, including all headers prepended down to and including the link layer, the scm_timestamping control message, and a sock_extended_err control message with ee_errno==ENOMSG and ee_origin==SO_EE_ORIGIN_TIMESTAMPING. A socket with such a pending bounced packet is ready for reading as far as select() is concerned. If the outgoing packet has to be fragmented, only the first fragment is time-stamped and returned to the sending socket.

Warning

When time-stamping is enabled, VLAN stripping is disabled. For more information, refer to Documentation/networking/timestamping.txt on kernel.org.

Time Stamping Capabilities via ethtool

To display Time Stamping capabilities via ethtool:

Show Time Stamping capabilities:

ethtool -T eth<x>

Example:

ethtool -T eth0
Time stamping parameters for p2p1:
Capabilities:
        hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
        software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
        hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
        software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
        software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
        hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 1
Hardware Transmit Timestamp Modes:
        off                   (HWTSTAMP_TX_OFF)
        on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
        none                  (HWTSTAMP_FILTER_NONE)
        all                   (HWTSTAMP_FILTER_ALL)

For more details on PTP Hardware Clock, please refer to: https://www.kernel.org/doc/Documentation/ptp/ptp.txt

Steering PTP Traffic to Single RX Ring

Because of Receive Side Scaling (RSS), PTP traffic arriving on UDP ports 319 and 320 may reach the user-space application out of order. To prevent this, steer PTP traffic to a single RX ring using ethtool.

Example:

# ethtool -u ens7
8 RX rings available
Total 0 rules

# ethtool -U ens7 flow-type udp4 dst-port 319 action 0 loc 1
# ethtool -U ens7 flow-type udp4 dst-port 320 action 0 loc 0
# ethtool -u ens7
8 RX rings available
Total 2 rules

Filter: 0
        Rule Type: UDP over IPv4
        Src IP addr: 0.0.0.0 mask: 255.255.255.255
        Dest IP addr: 0.0.0.0 mask: 255.255.255.255
        TOS: 0x0 mask: 0xff
        Src port: 0 mask: 0xffff
        Dest port: 320 mask: 0x0
        Action: Direct to queue 0

Filter: 1
        Rule Type: UDP over IPv4
        Src IP addr: 0.0.0.0 mask: 255.255.255.255
        Dest IP addr: 0.0.0.0 mask: 255.255.255.255
        TOS: 0x0 mask: 0xff
        Src port: 0 mask: 0xffff
        Dest port: 319 mask: 0x0
        Action: Direct to queue 0

RoCE Time-Stamping allows you to stamp packets when they are sent to the wire or received from the wire. The time stamp is given in raw hardware cycles but can easily be converted into nanoseconds referenced to the hardware clock. Additionally, it enables you to query the hardware for the hardware time, so you can stamp other application events and compare times.

Query Capabilities

Time-stamping is available only if the hardware reports that it supports it. To verify whether RoCE Time-Stamping is available, run ibv_exp_query_device. For example:

struct ibv_exp_device_attr attr;

ibv_exp_query_device(context, &attr);
if (attr.comp_mask & IBV_EXP_DEVICE_ATTR_WITH_TIMESTAMP_MASK) {
    if (attr.timestamp_mask) {
        /* Time stamping is supported with mask attr.timestamp_mask */
    }
}
if (attr.comp_mask & IBV_EXP_DEVICE_ATTR_WITH_HCA_CORE_CLOCK) {
    if (attr.hca_core_clock) {
        /* reporting the device's clock is supported. */
        /* attr.hca_core_clock is the frequency in MHZ */
    }
}

Creating a Time-Stamping Completion Queue

To get time stamps, a suitable extended Completion Queue (CQ) must be created via a special call to the ibv_exp_create_cq verb.

cq_init_attr.flags = IBV_EXP_CQ_TIMESTAMP;
cq_init_attr.comp_mask = IBV_EXP_CQ_INIT_ATTR_FLAGS;
cq = ibv_exp_create_cq(context, cqe, node, NULL, 0, &cq_init_attr);

Warning

In ConnectX-3 family devices, this CQ cannot report SL or SLID information. The values of the sl and sl_id fields in struct ibv_exp_wc are invalid. Only the fields indicated by the exp_wc_flags field in struct ibv_exp_wc contain valid and usable values.

Warning

In ConnectX-3 family devices, when Time Stamping is used, several fields of struct ibv_exp_wc are not available. As a result, RoCE UD traffic and RoCE traffic with VLANs fail.

Warning

In ConnectX-4 family devices, Time Stamping is not available when CQE zipping is used.

Polling a Completion Queue

Polling a CQ for time stamp is done via the ibv_exp_poll_cq verb.

ret = ibv_exp_poll_cq(cq, 1, &wc_ex, sizeof(wc_ex));
if (ret > 0) {
    /* CQ returned a wc */
    if (wc_ex.exp_wc_flags & IBV_EXP_WC_WITH_TIMESTAMP) {
        /* This wc contains a timestamp */
        timestamp = wc_ex.timestamp;
        /* Timestamp is given in raw hardware time */
    }
}

Warning

CQs that are opened with the ibv_exp_create_cq verb should always be polled with the ibv_exp_poll_cq verb.
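Because completion time stamps are reported in raw hardware cycles, an application usually converts them before comparing events. The following is a minimal sketch of that arithmetic, not part of the verbs API: cycles_elapsed() and cycles_to_ns() are hypothetical helpers, the mask is assumed to be a contiguous low-bit mask such as the reported timestamp_mask, and the clock frequency is assumed to be given in MHz, following the comment in the capability-query example. Verify the unit against your installed headers before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of cycles between two raw time stamps.
 * Masking the difference handles a counter that wrapped once, assuming
 * timestamp_mask has the form 2^n - 1. */
uint64_t cycles_elapsed(uint64_t start, uint64_t end, uint64_t timestamp_mask)
{
    return (end - start) & timestamp_mask;
}

/* Hypothetical helper: convert a cycle count to nanoseconds for a clock
 * running at clock_mhz MHz (assumed unit). The division is split into a
 * quotient and a remainder part so that the multiplication by 1000 does
 * not overflow for large cycle counts. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t clock_mhz)
{
    assert(clock_mhz != 0);
    return (cycles / clock_mhz) * 1000u +
           ((cycles % clock_mhz) * 1000u) / clock_mhz;
}
```

For example, with a 156 MHz clock, two completions whose raw time stamps differ by 156 cycles are 1000 ns apart.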

Querying the Hardware Time

Querying the hardware for time is done via the ibv_exp_query_values verb. For example:

ret = ibv_exp_query_values(context, IBV_EXP_VALUES_HW_CLOCK, &queried_values);
if (!ret && (queried_values.comp_mask & IBV_EXP_VALUES_HW_CLOCK))
    queried_time = queried_values.hwclock;

To query the time in nanosecond resolution, use the IBV_EXP_VALUES_HW_CLOCK_NS flag along with the hwclock_ns field.

ret = ibv_exp_query_values(context, IBV_EXP_VALUES_HW_CLOCK_NS, &queried_values);
if (!ret && (queried_values.comp_mask & IBV_EXP_VALUES_HW_CLOCK_NS))
    queried_time_ns = queried_values.hwclock_ns;

Warning

In ConnectX-3 family devices, querying the Hardware Time is available only on physical functions / native machines.

Warning

This feature is supported only on the ConnectX-4 adapter card family and above.

1PPS is a time synchronization feature that allows the adapter to send or receive 1 pulse per second on a dedicated pin on the adapter card using an SMA (SubMiniature version A) connector. Only one pin is supported, and it can be configured as either 1PPS in or 1PPS out.
For further information, refer to the "HowTo Test 1PPS on NVIDIA Adapters" Community post.

© Copyright 2023, NVIDIA. Last updated on May 23, 2023.