General Performance Optimization and Tuning
To achieve the best performance on Windows, you may need to modify some Windows registry entries.
IND2QueuePairsPool
This interface is an extension to version 2 of the Network Direct SPI. It reduces the creation time of the IND2QueuePair and IND2CompletionQueue interfaces, and thereby shortens the client-server connection establishment time.
The interface exposes a pool of pre-allocated IND2QueuePair interfaces and the IND2CompletionQueue interfaces associated with them. A background thread pre-allocates new objects when a pre-configured threshold is reached.
The API for this interface is documented in the SDK header file ndspi_ext_mlx.h.
Using IND2QueuePairsPool:
- Create a pool by calling IND2Adapter::QueryInterface with IID_IND2QueuePairsPool.
- Set the pool configuration using the SetQueuePairParams and SetCompletionQueueParams methods.
- Set the background creation thresholds using the SetLimits method.
- Fill the pool using the Fill method.
- Create an IND2QueuePair and the IND2CompletionQueue associated with it using the CreateObjects method.
- Statistics on the utilization of the resource pool are available to help the programmer select optimal thresholds.
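The behavior described above can be modeled conceptually. The following Python sketch is not the Network Direct SPI and does not use its signatures (see ndspi_ext_mlx.h for those). It only mimics the described behavior: pre-fill, hand out objects, refill when a low threshold is reached, and keep utilization statistics. The class name, method names, and threshold parameters are hypothetical, and the refill runs inline rather than in a background thread so that the model stays deterministic.

```python
from collections import deque


class PoolModel:
    """Toy model of a pre-allocated object pool (hypothetical, not the SPI)."""

    def __init__(self, low_threshold, high_threshold):
        # Refill starts when the pool drops to low_threshold and
        # stops once it holds high_threshold objects.
        if not 0 <= low_threshold < high_threshold:
            raise ValueError("need 0 <= low_threshold < high_threshold")
        self.low = low_threshold
        self.high = high_threshold
        self.items = deque()
        self.next_id = 0
        self.hits = 0    # requests served from the pool
        self.misses = 0  # requests that had to allocate on demand

    def _allocate(self):
        # Stands in for the expensive queue pair + completion queue creation.
        self.next_id += 1
        return ("qp-%d" % self.next_id, "cq-%d" % self.next_id)

    def fill(self):
        # Top the pool up to the high threshold.
        while len(self.items) < self.high:
            self.items.append(self._allocate())

    def create_objects(self):
        # Serve from the pool when possible, otherwise allocate on demand.
        if self.items:
            self.hits += 1
            pair = self.items.popleft()
        else:
            self.misses += 1
            pair = self._allocate()
        # The real pool refills from a background thread; this model
        # refills inline once the low threshold is reached.
        if len(self.items) <= self.low:
            self.fill()
        return pair


pool = PoolModel(low_threshold=2, high_threshold=8)
pool.fill()
pairs = [pool.create_objects() for _ in range(20)]
print("hits:", pool.hits, "misses:", pool.misses, "pooled:", len(pool.items))
```

In this model, a high miss count would suggest that the thresholds are too low for the connection rate, which is the kind of decision the pool statistics are meant to support.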
Nd2AdapterControlSetCqInterruptModeration
The method is an extension to version 2 of the Network Direct SPI. It moderates the events generated for completions, which reduces the number of interrupts and thereby improves performance.
The method lets the user set the number of completions that triggers an event, and the amount of time that may elapse after a completion before an event is sent if that completion count has not been reached.
The API for this interface is documented in the ndspi_ext_mlx.h SDK header file.
Using Nd2AdapterControlSetCqInterruptModeration
The usage of the Nd2AdapterControlSetCqInterruptModeration is similar to the usage of the function NDK_FN_CONTROL_CQ_INTERRUPT_MODERATION in MSDN NDK SPI. For more information, see: https://msdn.microsoft.com/en-us/library/windows/hardware/jj552973(v=vs.85).aspx
The ModerationInterval value is always rounded down to its limit; therefore, ModerationCount never controls the interrupt moderation on the CQ by itself.
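To illustrate how a completion count and a time interval interact, the following Python sketch simulates the moderation logic on a list of completion timestamps. It is a conceptual model only, not the driver implementation: the function name is hypothetical, the interval timer is assumed to start at the first pending completion, and the rounding of ModerationInterval mentioned above is ignored.

```python
def moderated_events(completion_times_us, moderation_count, moderation_interval_us):
    """Return the times (in microseconds) at which an event would be raised."""
    if moderation_count < 1:
        raise ValueError("moderation_count must be at least 1")
    events = []
    pending = 0      # completions accumulated since the last event
    deadline = None  # latest time the pending batch may wait until
    for t in sorted(completion_times_us):
        # The interval expired before this completion arrived.
        if pending and t > deadline:
            events.append(deadline)
            pending = 0
        # The first completion of a new batch starts the interval timer.
        if pending == 0:
            deadline = t + moderation_interval_us
        pending += 1
        # The completion-count limit was reached first.
        if pending >= moderation_count:
            events.append(t)
            pending = 0
    if pending:
        events.append(deadline)  # flush the last partial batch
    return events


times = [0, 1, 2, 3, 50, 51, 200]
events = moderated_events(times, moderation_count=4, moderation_interval_us=20)
print(len(times), "completions ->", len(events), "events at", events)
```

In this run, the first four completions are reported by a single event because the count limit is reached, while the later, sparser completions are reported when the interval expires.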
The registry entries that may be added or changed by this “General Tuning” procedure are:
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
- Disable the TCP selective ACKs option for better CPU utilization: SackOpts, type REG_DWORD, value set to 0.
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters:
- Enable fast datagram sending for UDP traffic: FastSendDatagramThreshold, type REG_DWORD, value set to 64K.
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ndis\Parameters:
- Set RSS parameters: RssBaseCpu, type REG_DWORD, value set to 1.
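For reference, the entries listed above could also be applied from a command prompt with the standard Windows reg add command. The Python sketch below only builds the command lines and does not modify the registry. It assumes that "64K" means 65536, and the helper name is hypothetical.

```python
# (key, value name, REG_DWORD data) taken from the list above.
# Assumption: "64K" is written out as 65536.
TUNING_ENTRIES = [
    (r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters", "SackOpts", 0),
    (r"HKLM\SYSTEM\CurrentControlSet\Services\AFD\Parameters",
     "FastSendDatagramThreshold", 64 * 1024),
    (r"HKLM\SYSTEM\CurrentControlSet\Services\Ndis\Parameters", "RssBaseCpu", 1),
]


def reg_add_command(key, name, data):
    """Build a 'reg add' command line that sets one REG_DWORD value."""
    # /f overwrites an existing value without prompting.
    return 'reg add "{}" /v {} /t REG_DWORD /d {} /f'.format(key, name, data)


commands = [reg_add_command(*entry) for entry in TUNING_ENTRIES]
for command in commands:
    print(command)
```

The printed commands need to be run from an elevated command prompt, and it is advisable to record the previous values first so that they can be restored.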
To enable Receive Side Scaling (RSS), run the following command:
    netsh int tcp set global rss=enabled
IPoIB network adapter tuning can be performed either during installation, by modifying some Windows registry entries as explained in Registry Tuning above, or manually after installation.
To improve the network adapter performance, activate the performance tuning tool as follows:
- Start the "Device Manager" (open a command line window and enter: devmgmt.msc). 
- Open "Network Adapters". 
- Select the Mellanox IPoIB adapter, right-click it, and select Properties.
- Select the "Performance" tab.
- Choose one of the tuning scenarios: 
 • Single port traffic - Improves performance for running single port traffic each time.
 • Dual port traffic - Improves performance for running traffic on both ports simultaneously.
 • Forwarding traffic - Improves performance for running scenarios that involve both ports (for example, via IXIA).
 • Multicast traffic - Improves performance when the main traffic runs on multicast.
- Click the "Run Tuning" button.
 Clicking the "Run Tuning" button changes several registry entries (described below) and checks for system services that may decrease network performance. It also generates a log that includes the applied changes.
 Users can view this log to restore the previous values. The log path is: %HOMEDRIVE%\Windows\System32\LogFiles\PerformanceTunning.log
 This tuning needs to be performed only once after the installation is completed, and on one adapter only (as long as these entries are not changed directly in the registry, or by some other installation or script).
A reboot may be required for the changes to take effect.
General tuning of the Ethernet network adapter can be performed during installation by modifying some Windows registry entries, as explained in Registry Tuning above. Tuning for specific scenarios can be set manually after installation.
To improve the network adapter performance, activate the performance tuning tool as follows:
- Start the "Device Manager" (open a command line window and enter: devmgmt.msc). 
- Open "Network Adapters". 
- Select the Mellanox Ethernet adapter, right-click it, and select Properties.
- Select the "Performance" tab.
- Choose one of the tuning scenarios: 
 • Single port traffic - Improves performance for running single port traffic each time.
 • Single stream traffic - Optimizes tuning for applications with single connection.
 • Dual port traffic - Improves performance for running traffic on both ports simultaneously.
 • Forwarding traffic - Improves performance for running scenarios that involve both ports (for example, via IXIA).
 • Multicast traffic - Improves performance when the main traffic runs on multicast.
- Click the "Run Tuning" button.
 Clicking the "Run Tuning" button activates the general tuning as explained above and changes several driver registry entries for the current adapter and for its sibling device, if the sibling is an Ethernet device as well. It also generates a log that includes the applied changes.
 Users can view this log to restore the previous values. The log path is: %HOMEDRIVE%\Windows\System32\LogFiles\PerformanceTunning.log
 This tuning needs to be performed only once after the installation is completed, and on one adapter only (as long as these entries are not changed directly in the registry, or by some other installation or script).
 Warning: A reboot may be required for the changes to take effect.
You can also activate the performance tuning through a script called perf_tuning.exe. This script has 4 options, which include the 3 scenarios described above and an additional manual tuning option, through which you can set the RSS base processor and the number of RSS processors for each Ethernet adapter. The adapters you wish to tune are passed to the script by name, as they appear in "Network Connections".
Synopsis
    perf_tuning.exe -s -c1 <first connection name> [-c2 <second connection name>]
    perf_tuning.exe -d -c1 <first connection name> -c2 <second connection name>
    perf_tuning.exe -f -c1 <first connection name> -c2 <second connection name>
    perf_tuning.exe -m -c1 <first connection name> -b <base RSS processor number> -n <number of RSS processors>
    perf_tuning.exe -st -c1 <first connection name> [-c2 <second connection name>]
Performance Tuning Tool Application Options
| Flag | Description |
| -s | Single port traffic scenario. Additionally, this option chooses the best processors to assign to: |
| -d | Dual port traffic scenario. Additionally, this option chooses the best processors to assign to: |
| -f | Forwarding traffic scenario. Additionally, this option chooses the best processors to assign to: |
| -m | Manual configuration. Additionally, this option assigns the following with processors inside the range: |
| -r | Restore default settings. |
| -c1 | Specifies first connection name. See examples. |
| -c2 | Specifies second connection name. See examples. |
| -b | Specifies base RSS processor number. See examples. |
| -n | Specifies number of RSS processors. See examples. |
| -st | Single stream traffic scenario. Additionally, this option chooses the best processors to assign to: |
Examples
For example, if the adapter is represented by "Local Area Connection 6" and "Local Area Connection 7":
For single port stream tuning, type:
    perf_tuning.exe -s -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
or, to set one adapter only:
    perf_tuning.exe -s -c1 "Local Area Connection 6"
For single stream tuning, type:
    perf_tuning.exe -st -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
or, to set one adapter only:
    perf_tuning.exe -st -c1 "Local Area Connection 6"
For dual port streams tuning, type:
    perf_tuning.exe -d -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
For forwarding streams tuning, type:
    perf_tuning.exe -f -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
For manual tuning of the first adapter to use RSS on CPUs 0-3, type:
    perf_tuning.exe -m -c1 "Local Area Connection 6" -b 0 -n 4
To restore the defaults, type:
    perf_tuning.exe -r -c1 "Local Area Connection 6" -c2 "Local Area Connection 7"
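When the tool is driven from an automation script, the argument rules in the synopsis can be checked before the tool is launched. The following Python sketch is one possible way to do that. The helper name is hypothetical, and it covers only the -s, -st, -d, -f and -m forms shown in the synopsis (the -r form is left out because the synopsis does not list it).

```python
def perf_tuning_args(flag, c1, c2=None, base=None, count=None):
    """Build an argument list for perf_tuning.exe following the synopsis."""
    args = ["perf_tuning.exe", flag, "-c1", c1]
    if flag in ("-s", "-st"):
        # The second connection name is optional for these scenarios.
        if c2 is not None:
            args += ["-c2", c2]
    elif flag in ("-d", "-f"):
        # Dual port and forwarding scenarios need both connection names.
        if c2 is None:
            raise ValueError(flag + " requires a second connection name")
        args += ["-c2", c2]
    elif flag == "-m":
        # Manual tuning needs the base RSS processor and the processor count.
        if base is None or count is None:
            raise ValueError("-m requires base and count")
        args += ["-b", str(base), "-n", str(count)]
    else:
        raise ValueError("unsupported flag: " + flag)
    return args


manual = perf_tuning_args("-m", "Local Area Connection 6", base=0, count=4)
print(manual)
```

On a Windows host, the resulting list can be passed to subprocess.run(manual, check=True). Passing a list rather than a single string avoids quoting problems with connection names that contain spaces.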
To achieve the best performance on an SR-IOV VF, run one of the following PowerShell commands on the host:
    Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 4
or, for 40GbE:
    Set-VMNetworkAdapter -Name "Network Adapter" -VMName vm1 -IovQueuePairsRequested 8
To improve the performance of live migration over SMB Direct, set the following registry key to 0 and reboot the machine:
    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters\RequireSecuritySignature
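One way to apply this setting consistently across several hosts is a .reg file. The Python sketch below generates the file content and does not touch the registry. It assumes that RequireSecuritySignature is a REG_DWORD value (the type is not stated above), and the helper name is hypothetical.

```python
KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters"


def reg_file_text(key, name, dword):
    """Return the text of a .reg file that sets one DWORD value."""
    # .reg files store DWORD data as eight hexadecimal digits.
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        "[{}]\r\n"
        '"{}"=dword:{:08x}\r\n'
    ).format(key, name, dword)


REG_TEXT = reg_file_text(KEY, "RequireSecuritySignature", 0)
print(REG_TEXT)
```

The generated text can be saved to a file and loaded with reg import from an elevated command prompt. The reboot mentioned above is still required.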