Packet Pacing

Warning

This feature is supported in firmware v12.17.1016 and above.

Packet pacing, also known as “rate limit,” defines the maximum bandwidth allowed for a TCP connection. The limit is enforced in hardware: each QP (transmit queue) has a rate limit value from which the hardware calculates the delay between consecutive packets.

To enable Packet Pacing in firmware

  1. Create a file with the following content.

    # vim /tmp/enable_packet_pacing.txt
    MLNX_RAW_TLV_FILE
    0x00000004 0x0000010c 0x00000000 0x00000001

  2. Update the firmware configuration to enable Packet Pacing:

    mlxconfig -d pci0:<x>:0:0 -f /tmp/enable_packet_pacing.txt set_raw

  3. Reset the firmware.

    mlxfwreset -d pci0:<x>:0:0 reset

Warning

Packet Pacing and Quality of Service (QoS) features do not co-exist.

Rates that are being used with packet pacing must be defined in advance.

New Rates Configuration

  • Newly configured rates must be within a certain range, determined by the firmware, and they can be read through sysctl.

    • For a minimum value, run:

      sysctl dev.mce.<N>.rate_limit.tx_limit_min

    • For a maximum value, run:

      sysctl dev.mce.<N>.rate_limit.tx_limit_max

  • The number of configured rates is also determined by the firmware. In order to check how many rates can be defined, run:

    sysctl dev.mce.<N>.rate_limit.tx_rates_max

  • To add a new rate:

    sysctl dev.mce.<N>.rate_limit.tx_limit_add=800000

    This will add the defined rate to the next available index. If all rates were already defined with an index, the new rate will not be added.

    Warning

    Rates are configured and stored in bits per second.
    Rates requested for a new socket are specified in bytes per second.

  • To remove a rate limit, run:

    sysctl dev.mce.<N>.rate_limit.tx_limit_clr=80000
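
Before adding a rate, it can help to check it against the limits read from sysctl and against the multiple-of-1000 rule stated in this section. The sketch below is a hypothetical helper, not part of the driver. It assumes the range is inclusive and takes the limits as plain arguments instead of reading them from the system.

```python
from __future__ import annotations


def check_rate(rate_bps: int, limit_min: int, limit_max: int) -> list[str]:
    """Return a list of problems with a candidate rate (empty if none)."""
    problems = []
    if rate_bps % 1000 != 0:
        problems.append("rate is not a multiple of 1000")
    if rate_bps < limit_min:  # assumption: tx_limit_min itself is allowed
        problems.append(f"rate is below tx_limit_min ({limit_min})")
    if rate_bps > limit_max:  # assumption: tx_limit_max itself is allowed
        problems.append(f"rate is above tx_limit_max ({limit_max})")
    return problems


def add_rate_command(unit: int, rate_bps: int) -> str:
    """Build the sysctl command line that adds rate_bps on device mce<unit>."""
    return f"sysctl dev.mce.{unit}.rate_limit.tx_limit_add={rate_bps}"


if __name__ == "__main__":
    # Example limits: 1 Kbps minimum, 100 Gbps maximum.
    print(check_rate(800000, 1000, 100_000_000_000))
    print(add_rate_command(0, 800000))
```

The values passed as `limit_min` and `limit_max` would come from `tx_limit_min` and `tx_limit_max` on the target device.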

Deviation: The user can specify a maximum deviation of the rate via sysctl. If the rate limit table cannot satisfy the requirement, rate limiting will be disabled.

  • For minimum value, run:

    sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation_min

  • For maximum value, run:

    sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation_max

  • For changing the deviation value, run:

    sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation=10000

  • For reading the current deviation value, run:

    sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation

Limitation: Rate values must be multiples of 1000.

Burst size is determined by the hardware, and can be configured via sysctl:

  • For a minimum value, run:

    sysctl dev.mce.<N>.rate_limit.tx_burst_size_min

  • For a maximum value, run:

    sysctl dev.mce.<N>.rate_limit.tx_burst_size_max

  • For changing burst level, run:

    sysctl dev.mce.<N>.rate_limit.tx_burst_size=150

  • To read which burst level was defined, run:

    sysctl dev.mce.<N>.rate_limit.tx_burst_size

  • For displaying the packet pacing configuration, run:

    sysctl dev.mce.<N>.rate_limit.tx_rate_show
    ENTRY   BURST   RATE [bit/s]
    ------------------------------------
    0       150     800000
    1       150     40000
    2       150     1000000
    3       150     25000000000

    ENTRY   BURST   RATE [bit/s]
    ------------------------------------
    0       3       800000
    1       3       40000
    2       3       1000000
    3       3       25000000000

    where:

    Entry: Rate limit table entry

    Burst: Burst size configured for rate limit traffic

    Rate: Rate configured for the relevant index
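
For scripted checks, the table printed by `tx_rate_show` can be turned into structured data. The following sketch assumes the output has one row per line with three whitespace-separated integer columns, as in the listing above. The function name `parse_rate_table` and the `SAMPLE` text are illustrative.

```python
def parse_rate_table(text):
    """Parse tx_rate_show output into a list of row dictionaries."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows are exactly three integers; header and ruler lines are skipped.
        if len(fields) == 3 and all(f.isdigit() for f in fields):
            entry, burst, rate_bps = (int(f) for f in fields)
            rows.append({"entry": entry, "burst": burst, "rate_bps": rate_bps})
    return rows


# Sample text copied from the first table in the listing above.
SAMPLE = """\
ENTRY   BURST   RATE [bit/s]
------------------------------------
0       150     800000
1       150     40000
2       150     1000000
3       150     25000000000
"""

if __name__ == "__main__":
    for row in parse_rate_table(SAMPLE):
        print(row)
```

Keeping the rate in bits per second matches the units shown by the driver; any conversion to bytes per second is left to the caller.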

Warning

All rates are shown in bits per second.

1. Create a rate-limited socket according to the desired rate using the setsockopt() interface based on the previous section:

setsockopt(s, SOL_SOCKET, SO_MAX_PACING_RATE, &pacing_rate, sizeof(pacing_rate))

SO_MAX_PACING_RATE: Marks the socket as a rate-limited socket.

pacing_rate: Defined rate in bytes/sec. The type is unsigned int.

Note

This is the same value as the one entered via sysctl, but expressed in bytes per second instead of bits per second.

  • A rate-limited ring corresponding to the requested rate will be created and associated to the relevant socket.

  • Rate-limited traffic will be transmitted when data is sent via the socket.

2. Modify the rate-limited value using the same socket.

3. Destroy the relevant ring upon TCP socket completion.
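
The socket steps above can be sketched in Python, whose `socket.setsockopt()` passes its arguments through to the C call. The option numbers in `_HEADER_VALUES` are assumptions copied from the platform headers and are used only if Python does not export the constant. The helper names are illustrative.

```python
import socket
import struct
import sys

# Assumption: option numbers copied from the platform headers
# (FreeBSD <sys/socket.h>: 0x1018, Linux: 47). They are used only when
# Python's socket module does not export SO_MAX_PACING_RATE itself.
_HEADER_VALUES = {"freebsd": 0x1018, "linux": 47}


def pacing_option():
    """Return the numeric SO_MAX_PACING_RATE value for this platform."""
    value = getattr(socket, "SO_MAX_PACING_RATE", None)
    if value is None:
        for prefix, number in _HEADER_VALUES.items():
            if sys.platform.startswith(prefix):
                return number
        raise OSError("SO_MAX_PACING_RATE is not known for this platform")
    return value


def bits_to_bytes_per_sec(rate_bps):
    """Convert a sysctl rate (bits/s) to the socket rate (bytes/s)."""
    return rate_bps // 8


def pack_rate(rate_bps):
    """Pack the byte rate as a native unsigned int for setsockopt()."""
    return struct.pack("I", bits_to_bytes_per_sec(rate_bps))


def set_pacing_rate(sock, rate_bps):
    """Mark sock as rate limited, or change its rate if already limited."""
    sock.setsockopt(socket.SOL_SOCKET, pacing_option(), pack_rate(rate_bps))


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_pacing_rate(s, 800000)   # step 1: 800000 bit/s = 100000 bytes/s
    set_pacing_rate(s, 1000000)  # step 2: modify the rate on the same socket
    s.close()                    # step 3: the socket is closed when done
```

The rates passed in are the bit-per-second values configured through sysctl; the division by 8 produces the byte-per-second value that the socket option expects.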

Error Detection

Failures can be detected by querying a specific socket through the getsockopt() interface.

  • MLNX_OFED for FreeBSD supports up to 100,000 rate limited TCP connections.

  • Each TCP connection is mapped to a specific SQ

  • Max rate limited rings is 100,000

  • Min rate: 1 Kbps

  • Max rate: 100 Gbps

    #> sysctl -a | grep rate_limit
    sysctl dev.mce.<N>.rate_limit.tx_limit_min: 1000
    sysctl dev.mce.<N>.rate_limit.tx_limit_max: 100000000000

The following settings are recommended for a large number of connections to reduce the amount of overhead related to connection processing, as well as to handle the increased use of network buffers.

  • Increase size of rate limit send queue:

    # sysctl dev.mce.<N>.rate_limit.tx_queue_size=1024

  • Reduce number of completion events per rate limit send queue:

    # sysctl dev.mce.<N>.rate_limit.tx_completion_fact=-1

  • Increase non-rate-limit send queue size:

    # sysctl dev.mce.<N>.conf.tx_queue_size=16384

  • Reduce number of completion events per send queue:

    # sysctl dev.mce.<N>.conf.tx_completion_fact=-1

  • Increase receive queue size and allow many packets to be accumulated.

    This gives better TX burst performance:

    # sysctl dev.mce.<N>.conf.rx_queue_size=16384
    # sysctl dev.mce.<N>.conf.rx_coalesce_usecs=250
    # sysctl dev.mce.<N>.conf.rx_coalesce_pkts=4096

  • For production use, allow a high number of connections to terminate simultaneously:

    # sysctl net.inet.icmp.icmplim=-1

  • Increase memory pool for network buffers:

    # sysctl kern.ipc.nmbufs=100000000
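
The recommended settings above can be collected in one place so that they are applied consistently on each device. The sketch below only renders the command lines, using the values listed in this section. The function name `tuning_commands` is illustrative, and the commands would still need to be run as root.

```python
def tuning_commands(unit):
    """Return the recommended sysctl command lines for device mce<unit>."""
    dev = f"dev.mce.{unit}"
    settings = [
        (f"{dev}.rate_limit.tx_queue_size", 1024),
        (f"{dev}.rate_limit.tx_completion_fact", -1),
        (f"{dev}.conf.tx_queue_size", 16384),
        (f"{dev}.conf.tx_completion_fact", -1),
        (f"{dev}.conf.rx_queue_size", 16384),
        (f"{dev}.conf.rx_coalesce_usecs", 250),
        (f"{dev}.conf.rx_coalesce_pkts", 4096),
        ("net.inet.icmp.icmplim", -1),
        ("kern.ipc.nmbufs", 100000000),
    ]
    return [f"sysctl {name}={value}" for name, value in settings]


if __name__ == "__main__":
    for command in tuning_commands(0):
        print(command)
```

Printing the commands first, rather than executing them directly, leaves room to review the list before changing a production system.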

© Copyright 2023, NVIDIA. Last updated on May 24, 2023.