Performance Tuning
This section describes how to modify Windows registry parameters to improve performance; achieving the best performance for Windows may require modifying some of them.
Modifying the registry incorrectly can cause serious problems, including data loss and system hangs, and may require you to reinstall Windows. It is therefore recommended to back up the registry on your system before implementing the recommendations in this section, so that the original registry state can be restored if the modifications cause serious problems. For more details about backing up and restoring the registry, please visit www.microsoft.com.
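A minimal Python sketch of the backup step, assuming the built-in `reg.exe` tool is used to export each key modified in this section to its own `.reg` file. The file-name prefix and the `backup_commands`/`run_backup` helpers are hypothetical. The commands only run on Windows, and `reg export` fails (raising `CalledProcessError` here) if a key does not exist yet.

```python
import subprocess
import sys

# Registry keys touched by the tuning steps in this section (HKLM = HKEY_LOCAL_MACHINE).
TUNED_KEYS = [
    r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters",
    r"HKLM\SYSTEM\CurrentControlSet\Services\AFD\Parameters",
    r"HKLM\SYSTEM\CurrentControlSet\Services\Ndis\Parameters",
    r"HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters",
]


def backup_commands(keys, prefix="backup"):
    """Build one 'reg export' command per key; /y overwrites an existing file."""
    commands = []
    for index, key in enumerate(keys):
        # Hypothetical naming scheme: backup_0.reg, backup_1.reg, ...
        commands.append(["reg", "export", key, f"{prefix}_{index}.reg", "/y"])
    return commands


def run_backup(keys=TUNED_KEYS):
    """Run the exports on Windows only; elsewhere just return the planned commands."""
    commands = backup_commands(keys)
    if sys.platform == "win32":
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_backup():
        print(" ".join(cmd))
```

Each exported file can later be restored with `reg import <file>` from an elevated prompt.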
Registry Tuning
The following registry entries may be added or changed by this “General Tuning” procedure:
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
Disable the TCP selective acknowledgment (SACK) option for better CPU utilization:
Value Name | Type | Value
SackOpts | REG_DWORD | 0
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters:
Enable fast datagram sending for UDP traffic:
Value Name | Type | Value
FastSendDatagramThreshold | REG_DWORD | 64K
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ndis\Parameters:
Set RSS parameters:
Value Name | Type | Value
RssBaseCpu | REG_DWORD | 1
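The three values above can be collected into a single `.reg` file and imported in one step. This is a sketch, not a vendor-provided tool: the helper names and output file name are hypothetical, and it assumes "64K" means 65536 bytes.

```python
# Values from the "General Tuning" tables: (key path, value name, REG_DWORD value).
GENERAL_TUNING = [
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters", "SackOpts", 0),
    # Assumption: "64K" is interpreted as 64 * 1024 = 65536.
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters", "FastSendDatagramThreshold", 64 * 1024),
    (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ndis\Parameters", "RssBaseCpu", 1),
]


def to_reg_file(entries):
    """Render REG_DWORD entries in the text format read by regedit and 'reg import'."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, name, value in entries:
        lines.append(f"[{key}]")
        # REG_DWORD data is written as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")
    return "\r\n".join(lines)


def write_reg_file(path, entries):
    """Write the file; newline="" keeps the explicit CRLFs untranslated."""
    # regedit itself exports version 5.00 files as UTF-16 with a byte-order mark.
    with open(path, "w", encoding="utf-16", newline="") as handle:
        handle.write(to_reg_file(entries))


if __name__ == "__main__":
    print(to_reg_file(GENERAL_TUNING))
```

On Windows, the file could be written with `write_reg_file("general_tuning.reg", GENERAL_TUNING)` and imported from an elevated prompt with `reg import general_tuning.reg`.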
Enable RSS
To enable Receive Side Scaling (RSS), run the following command:
netsh int tcp set global rss=enabled
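For scripted deployments, the same command can be issued from Python. A minimal sketch: `rss_command` and `set_rss` are hypothetical helpers, and netsh must be run from an elevated prompt.

```python
import subprocess
import sys


def rss_command(enabled=True):
    """Build the netsh command that enables or disables RSS globally."""
    state = "enabled" if enabled else "disabled"
    return ["netsh", "int", "tcp", "set", "global", f"rss={state}"]


def set_rss(enabled=True):
    """Run the command on Windows (requires elevation); elsewhere only return it."""
    cmd = rss_command(enabled)
    if sys.platform == "win32":
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(set_rss(True)))
```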
Improving Live Migration
To improve live migration performance over SMB Direct, set the following registry value to 0 and reboot the machine:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters\RequireSecuritySignature
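A sketch of the same change using Python's standard `winreg` module, assuming administrator rights. The function name and its `dry_run` default are illustrative choices; it only writes on Windows and does not reboot the machine.

```python
import sys

KEY_PATH = r"System\CurrentControlSet\Services\LanmanServer\Parameters"
VALUE_NAME = "RequireSecuritySignature"


def set_require_signature(value=0, dry_run=True):
    """Set RequireSecuritySignature (REG_DWORD) under HKLM; a reboot is still needed."""
    planned = ("HKEY_LOCAL_MACHINE\\" + KEY_PATH, VALUE_NAME, "REG_DWORD", value)
    if dry_run or sys.platform != "win32":
        return planned  # report the change without touching the registry
    import winreg  # Windows-only standard-library module

    # CreateKeyEx opens the key if it already exists; writing HKLM needs admin rights.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, value)
    return planned


if __name__ == "__main__":
    print(set_require_signature(0, dry_run=True))
```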
Ethernet Performance Tuning
The Ethernet adapter can be configured by setting registry keys, some of which affect Ethernet performance.
To improve performance, activate the performance tuning tool as follows:
1. Start the "Device Manager" (open a command line window and enter: devmgmt.msc).
2. Open "Network Adapters".
3. Right-click the relevant Ethernet adapter and select Properties.
4. Select the "Advanced" tab.
5. Modify performance parameters (properties) as desired.
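The Advanced-tab properties can also be read or changed from PowerShell with the `Get-NetAdapterAdvancedProperty` and `Set-NetAdapterAdvancedProperty` cmdlets. The sketch below only builds those command lines from Python; the adapter name and the helper names are hypothetical, and valid display names and values depend on the installed driver.

```python
import subprocess
import sys


def ps_quote(text):
    """Quote a string for PowerShell with single quotes (embedded ' is doubled)."""
    return "'" + text.replace("'", "''") + "'"


def advanced_property_cmd(adapter, display_name=None, display_value=None):
    """List all Advanced-tab properties, or set one property by its display name."""
    if display_name is None:
        script = f"Get-NetAdapterAdvancedProperty -Name {ps_quote(adapter)}"
    elif display_value is None:
        raise ValueError("display_value is required when display_name is given")
    else:
        script = (
            f"Set-NetAdapterAdvancedProperty -Name {ps_quote(adapter)} "
            f"-DisplayName {ps_quote(display_name)} -DisplayValue {ps_quote(display_value)}"
        )
    return ["powershell", "-NoProfile", "-Command", script]


def run(cmd):
    """Execute only on Windows; setting properties requires an elevated session."""
    if sys.platform == "win32":
        subprocess.run(cmd, check=True)
```

For example, `advanced_property_cmd("Ethernet 2")` lists every property of a hypothetical adapter named "Ethernet 2", matching what the Advanced tab shows.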
Performance Known Issues
On systems that support Intel I/OAT, it is highly recommended to install and enable the latest I/OAT driver (download from www.intel.com).
When I/OAT is enabled, sending messages of 256 bytes or larger activates I/OAT. Due to the I/OAT algorithms, this significantly increases latency; on the other hand, it also significantly increases throughput.
Ethernet Bandwidth Improvements
To improve Ethernet Bandwidth:
Check that you are running on the closest NUMA node:
In PowerShell, run: Get-NetAdapterRss -Name "adapter name"
Validate that the IndirectionTable CPUs are located on the closest NUMA node.
As illustrated in the figure above, CPUs 0:0 - 0:7 (CPUs 0-7) are at a NUMA distance of 0 (0:0/0 - 0:7/0), unlike CPUs 14-27, which are at a distance of 32767. If the CPUs are not close to the NUMA node, change the "RSS Base Processor Number" and "RSS Max Processor Number" settings under the Advanced tab to point to the closest CPUs.
Warning: For high performance, it is recommended to work with at least 8 processors.
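The NUMA check can be partly automated by parsing the processor list printed by `Get-NetAdapterRss`. This sketch assumes entries of the form `group:number/distance` (for example `0:3/0`), as quoted above, a single processor group, and that the closest CPUs form a contiguous range; the helper names are hypothetical.

```python
def parse_processors(text):
    """Extract (group, number, distance) from tokens such as '0:3/0'."""
    result = []
    for token in text.split():
        cpu, _, distance = token.partition("/")
        group, _, number = cpu.partition(":")
        # Labels like '[Group:Number/NUMA' or 'Distance]' fail the digit checks.
        if group.isdigit() and number.isdigit() and distance.isdigit():
            result.append((int(group), int(number), int(distance)))
    return result


def suggest_rss_range(processors, min_cpus=8):
    """Return (base, max, enough_cpus) using only the CPUs at the smallest NUMA distance."""
    if not processors:
        raise ValueError("no processor entries found")
    closest = min(distance for _, _, distance in processors)
    numbers = sorted(number for _, number, distance in processors if distance == closest)
    # Assumes the closest CPUs are contiguous, so base..max covers only them.
    return numbers[0], numbers[-1], len(numbers) >= min_cpus


if __name__ == "__main__":
    sample = "0:0/0 0:1/0 0:2/0 0:3/0 0:4/0 0:5/0 0:6/0 0:7/0 0:14/32767"
    print(suggest_rss_range(parse_processors(sample)))
```

The returned base and max numbers are candidates for the "RSS Base Processor Number" and "RSS Max Processor Number" settings, and the flag reflects the 8-processor recommendation.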
Check the Ethernet bandwidth by running ntttcp.exe:
Server side: ntttcp -r -m 32,*,server_ip
Client side: ntttcp -s -m 32,*,server_ip
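A small helper that produces both command lines, so the server and client stay consistent. A sketch only: it assumes `-m` takes `threads,cpu,receiver_ip` with `*` leaving CPU selection to ntttcp, and the IP address in the example is a placeholder.

```python
def ntttcp_commands(server_ip, threads=32, cpu="*"):
    """Build receiver (server) and sender (client) ntttcp command lines.

    Both sides pass the server (receiver) IP address in the -m mapping.
    """
    mapping = f"{threads},{cpu},{server_ip}"
    server = ["ntttcp.exe", "-r", "-m", mapping]  # -r: receive side
    client = ["ntttcp.exe", "-s", "-m", mapping]  # -s: send side
    return server, client


if __name__ == "__main__":
    for cmd in ntttcp_commands("192.0.2.10"):  # placeholder address
        print(" ".join(cmd))
```

Start the server command first, then the client command on the other machine.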
IPoIB Performance Tuning
The IPoIB adapter can be configured by setting registry keys, some of which affect IPoIB performance.
To improve performance, activate the performance tuning tool as follows:
1. Start the "Device Manager" (open a command line window and enter: devmgmt.msc).
2. Open "Network Adapters".
3. Right-click the relevant IPoIB adapter and select Properties.
4. Select the "Advanced" tab.
5. Modify performance parameters (properties) as desired.
The following is a list of key parameters for performance tuning.
Parameter | Description | Additional Options
Jumbo Packet | The maximum available size of the transfer unit, also known as the Maximum Transmission Unit (MTU). The MTU of a network can have a substantial impact on performance. A 4K MTU size improves performance for short messages, since it allows the OS to coalesce many small messages into a large one. The valid MTU range for an Ethernet driver is 614 to 9614. Note: All devices on the same physical network, or on the same logical network, must have the same MTU. This also applies to the SoC MTU when using BlueField devices. | -
Receive Buffers | The number of receive buffers (default 512). | -
Send Buffers | The number of send buffers (default 2048). | -
Performance Options | Configures parameters that can improve adapter performance. | Interrupt Moderation: Moderates or delays the generation of interrupts, thereby optimizing network throughput and CPU utilization (default Enabled).
| | Receive Side Scaling (RSS Mode): Improves incoming packet processing performance. RSS enables the adapter port to utilize the multiple CPUs in a multi-core system for receiving incoming packets and steering them to the designated destination. RSS can significantly improve the number of transactions, the number of connections per second, and the network throughput. This parameter can be set to one of the following values: Note: I/OAT is not used while in RSS mode.
| | Receive Completion Method: Sets the completion method of received packets, which can affect network throughput and CPU utilization.
| | Rx Interrupt Moderation Type: Sets the rate at which the controller moderates or delays the generation of interrupts, making it possible to optimize network throughput and CPU utilization. The default setting (Adaptive) adjusts the interrupt rates dynamically depending on the traffic type and network usage. Choosing a different setting may improve network and system performance in certain configurations.
| | Send Completion Method: Sets the completion method of sent packets, which may affect network throughput and CPU utilization.
Offload Options | Allows you to specify which TCP/IP offload settings are handled by the adapter rather than the operating system. Enabling offloading services increases transmission performance, as the offload tasks are performed by the adapter hardware rather than the operating system, freeing CPU resources for other tasks. | IPv4 Checksums Offload: Enables the adapter to compute the IPv4 checksum on transmit and/or receive instead of the CPU (default Enabled).
| | TCP/UDP Checksum Offload for IPv4 packets: Enables the adapter to compute the TCP/UDP checksum over IPv4 packets on transmit and/or receive instead of the CPU (default Enabled).
| | TCP/UDP Checksum Offload for IPv6 packets: Enables the adapter to compute the TCP/UDP checksum over IPv6 packets on transmit and/or receive instead of the CPU (default Enabled).
| | Large Send Offload (LSO): Allows the TCP/UDP stack to build a TCP/UDP message up to 64KB long and send it in one call down the stack. The adapter then re-segments the message into multiple TCP/UDP packets for transmission on the wire, with each packet sized according to the MTU. This option offloads a large amount of kernel processing time from the host CPU to the adapter.
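To illustrate the LSO description, the sketch below estimates how many on-wire packets the adapter produces from one 64KB send. It assumes IPv4 and TCP headers of 20 bytes each with no options; real segment counts vary with header options and encapsulation, and `lso_segments` is a hypothetical helper.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options (e.g. no timestamps)


def lso_segments(message_bytes, mtu):
    """Number of MTU-sized TCP packets one LSO send is re-segmented into."""
    mss = mtu - IPV4_HEADER - TCP_HEADER  # payload carried by each packet
    if mss <= 0:
        raise ValueError("MTU too small for IPv4 + TCP headers")
    return math.ceil(message_bytes / mss)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, lso_segments(64 * 1024, mtu))
```

Under these assumptions, a 64KB send becomes 45 packets at a 1500-byte MTU and 8 packets at a 9000-byte MTU.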