This feature is supported in firmware v12.17.1016 and above.
Packet pacing, also known as “rate limit,” defines a maximum bandwidth allowed for a TCP connection. Rate limiting is performed in hardware: each QP (transmit queue) has a rate limit value, from which the hardware calculates the delay between consecutive packets.
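To make the relationship between a rate limit and the delay between packets concrete, the following is a minimal arithmetic sketch. It is an illustrative model only and does not describe the adapter's actual algorithm; the function name pacing_gap_ns is hypothetical.

```c
#include <stdint.h>

/*
 * Illustrative model only (not the adapter's documented algorithm):
 * the time between the starts of two consecutive packets so that the
 * average rate does not exceed rate_bps (bits per second).
 * Intended for ordinary packet sizes; the intermediate product would
 * overflow 64 bits only for packets larger than about 2 GB.
 */
static uint64_t
pacing_gap_ns(uint32_t packet_bytes, uint64_t rate_bps)
{
	if (rate_bps == 0)
		return 0;	/* this sketch treats 0 as "no limit" */

	/* bits * (ns per second) / (bits per second) = nanoseconds */
	return ((uint64_t)packet_bytes * 8u * 1000000000ull) / rate_bps;
}
```

For example, a 1500-byte packet at 800000 bit/s gives a gap of 15000000 ns (15 ms), while the same packet at 25000000000 bit/s gives 480 ns.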
To enable Packet Pacing in firmware
Create a file with the following content.
# vim /tmp/enable_packet_pacing.txt
MLNX_RAW_TLV_FILE
0x00000004 0x0000010c 0x00000000 0x00000001
Update the firmware configuration to enable Packet Pacing:
mlxconfig -d pci0:<x>:0:0 -f /tmp/enable_packet_pacing.txt set_raw
Reset the firmware.
mlxfwreset -d pci0:<x>:0:0 reset
Packet Pacing and Quality of Service (QoS) features do not co-exist.
Rates that are being used with packet pacing must be defined in advance.
New Rates Configuration
Newly configured rates must be within a certain range, determined by the firmware, and they can be read through sysctl.
For a minimum value, run:
sysctl dev.mce.<N>.rate_limit.tx_limit_min
For a maximum value, run:
sysctl dev.mce.<N>.rate_limit.tx_limit_max
The number of configured rates is also determined by the firmware. In order to check how many rates can be defined, run:
sysctl dev.mce.<N>.rate_limit.tx_rates_max
To add a new rate:
sysctl dev.mce.<N>.rate_limit.tx_limit_add=800000
This will add the defined rate to the next available index. If all rates were already defined with an index, the new rate will not be added.
Note: Rates are determined and then saved in bits per second. Rates requested for a new socket are specified in bytes per second.
To remove a rate limit, run:
sysctl dev.mce.<N>.rate_limit.tx_limit_clr=80000
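The add and remove behavior described above can be modeled with a small fixed-size table. This is a sketch for illustration, not the driver's implementation: the slot count RATE_TABLE_SLOTS, the struct, and both function names are hypothetical, and a value of 0 is assumed to mark a free slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot count; the real limit is reported by tx_rates_max. */
#define RATE_TABLE_SLOTS 4

struct rate_table {
	uint64_t rate_bps[RATE_TABLE_SLOTS];	/* 0 marks a free slot */
};

/* Store the rate in the next available index; -1 if the table is full. */
static int
rate_table_add(struct rate_table *t, uint64_t rate_bps)
{
	if (rate_bps == 0)
		return -1;
	for (size_t i = 0; i < RATE_TABLE_SLOTS; i++) {
		if (t->rate_bps[i] == 0) {
			t->rate_bps[i] = rate_bps;
			return (int)i;
		}
	}
	return -1;	/* all indexes already defined: rate is not added */
}

/* Free the slot holding the given rate; -1 if the rate is not present. */
static int
rate_table_clr(struct rate_table *t, uint64_t rate_bps)
{
	if (rate_bps == 0)
		return -1;
	for (size_t i = 0; i < RATE_TABLE_SLOTS; i++) {
		if (t->rate_bps[i] == rate_bps) {
			t->rate_bps[i] = 0;
			return (int)i;
		}
	}
	return -1;
}
```

In this model, clearing a rate frees its index, so a later add reuses the lowest free index.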
Deviation: The user can specify a maximum deviation of the rate via sysctl. If the rate limit table cannot satisfy the requirement, rate limiting will be disabled.
For minimum value, run:
sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation_min
For maximum value, run:
sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation_max
For changing the deviation value, run:
sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation=10000
For reading the current deviation value, run:
sysctl dev.mce.<N>.rate_limit.tx_allowed_deviation
Limitation: Rate values must be multiples of 1000.
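A rate can be checked against the documented constraints before it is passed to tx_limit_add. The helper below is a sketch; its name and parameters are hypothetical, and the minimum and maximum are expected to be the values read from tx_limit_min and tx_limit_max.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Check a candidate rate (bits per second) against the constraints
 * stated in this section: within [min_bps, max_bps] as reported by
 * tx_limit_min and tx_limit_max, and a multiple of 1000.
 */
static bool
rate_is_configurable(uint64_t rate_bps, uint64_t min_bps, uint64_t max_bps)
{
	if (rate_bps < min_bps || rate_bps > max_bps)
		return false;
	return (rate_bps % 1000) == 0;
}
```

With a minimum of 1000 and a maximum of 100000000000, the rate 800000 passes, while 800500 (not a multiple of 1000) and 500 (below the minimum) are rejected.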
Burst size is determined by the hardware, and can be configured via sysctl:
For a minimum value, run:
sysctl dev.mce.<N>.rate_limit.tx_burst_size_min
For a maximum value, run:
sysctl dev.mce.<N>.rate_limit.tx_burst_size_max
For changing burst level, run:
sysctl dev.mce.<N>.rate_limit.tx_burst_size=150
To read which burst level was defined, run:
sysctl dev.mce.<N>.rate_limit.tx_burst_size
For displaying the packet pacing configuration, run:
sysctl dev.mce.<N>.rate_limit.tx_rate_show
ENTRY BURST RATE [bit/s]
------------------------------------
0     150   800000
1     150   40000
2     150   1000000
3     150   25000000000

ENTRY BURST RATE [bit/s]
------------------------------------
0     3     800000
1     3     40000
2     3     1000000
3     3     25000000000
where:
Entry: Rate limit table entry
Burst: Burst size configured for rate limit traffic
Rate: Rate configured for the relevant index
All rates are shown in bits per second.
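A tool that consumes the tx_rate_show output could parse each data row into the three fields described above. The sketch below assumes the rows are whitespace-separated decimal integers in the order entry, burst, rate; the struct and function names are hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* One data row of the tx_rate_show output. */
struct rate_row {
	unsigned int entry;	/* rate limit table entry */
	unsigned int burst;	/* configured burst size */
	uint64_t rate_bps;	/* configured rate, bits per second */
};

/*
 * Parse one line. Returns 0 for a data row, -1 for anything else
 * (header line, separator line, or empty line).
 * Assumption: columns are whitespace-separated decimal integers.
 */
static int
parse_rate_row(const char *line, struct rate_row *row)
{
	unsigned int entry, burst;
	uint64_t rate;

	if (sscanf(line, "%u %u %" SCNu64, &entry, &burst, &rate) != 3)
		return -1;
	row->entry = entry;
	row->burst = burst;
	row->rate_bps = rate;
	return 0;
}
```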
1. Create a rate-limited socket according to the desired rate using the setsockopt() interface based on the previous section:
setsockopt(s, SOL_SOCKET, SO_MAX_PACING_RATE, &pacing_rate, sizeof(pacing_rate));
SO_MAX_PACING_RATE: Marks the socket as a rate-limited socket
pacing_rate: Defined rate in bytes/sec. The type is unsigned int. Note: This is the same value entered via sysctl, expressed in bytes instead of bits.
A rate-limited ring corresponding to the requested rate will be created and associated to the relevant socket.
Rate-limited traffic will be transmitted when data is sent via the socket.
2. Modify the rate-limited value using the same socket.
3. Destroy the relevant ring upon TCP socket completion.
Error Detection
Detecting failures can be done using the getsockopt() interface to query a specific socket.
MLNX_OFED for FreeBSD supports up to 100,000 rate limited TCP connections.
Each TCP connection is mapped to a specific SQ
Max rate limited rings is 100,000
Min rate: 1 Kbps
Max rate: 100 Gbps
#> sysctl -a | grep rate_limit
dev.mce.<N>.rate_limit.tx_limit_min: 1000
dev.mce.<N>.rate_limit.tx_limit_max: 100000000000
The following settings are recommended for a large number of connections to reduce the amount of overhead related to connection processing, as well as to handle the increased use of network buffers.
Increase size of rate limit send queue:
# sysctl dev.mce.<N>.rate_limit.tx_queue_size=1024
Reduce number of completion events per rate limit send queue:
# sysctl dev.mce.<N>.rate_limit.tx_completion_fact=-1
Increase non-rate-limit send queue size:
# sysctl dev.mce.<N>.conf.tx_queue_size=16384
Reduce number of completion events per send queue:
# sysctl dev.mce.<N>.conf.tx_completion_fact=-1
Increase receive queue size and allow many packets to be accumulated. This gives better TX burst performance:
# sysctl dev.mce.<N>.conf.rx_queue_size=16384
# sysctl dev.mce.<N>.conf.rx_coalesce_usecs=250
# sysctl dev.mce.<N>.conf.rx_coalesce_pkts=4096
Note for production: Allow a high number of connections to terminate simultaneously:
# sysctl net.inet.icmp.icmplim=-1
Increase memory pool for network buffers:
# sysctl kern.ipc.nmbufs=100000000