Introduction to VMA
The NVIDIA® Messaging Accelerator (VMA) library is a dynamically linked, user-space Linux library that offloads network traffic in order to transparently enhance the performance of socket-based, networking-heavy applications over an Ethernet network. VMA is designed for latency-sensitive and throughput-demanding unicast and multicast applications. It can accelerate both producer and consumer applications, and can improve application performance by orders of magnitude without requiring any modification to the application code.
The VMA library accelerates TCP and UDP socket applications by offloading traffic from user space directly to the network interface card (NIC) or Host Channel Adapter (HCA), without going through the kernel and the standard IP stack (kernel bypass). VMA increases the overall packet rate, reduces latency, and improves CPU utilization.
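To make "no modification to the application code" concrete, the following is a minimal sketch of the kind of ordinary socket program VMA targets. It contains nothing VMA-specific: it only uses the standard socket API. The function name `udp_echo_once` is hypothetical, and the loopback address is used only to keep the sketch self-contained; acceleration applies to traffic that goes over a supported NIC, not to loopback.

```python
import socket


def udp_echo_once(payload: bytes, host: str = "127.0.0.1") -> bytes:
    """Send one datagram to a local echo socket and return the reply.

    Hypothetical helper for illustration only; it uses plain BSD-style
    socket calls and has no dependency on VMA.
    """
    # "Server" side: an ordinary UDP socket bound to an ephemeral port.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((host, 0))
    server.settimeout(2.0)
    server_addr = server.getsockname()

    # "Client" side: a second ordinary UDP socket.
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    try:
        # Request: client -> server.
        client.sendto(payload, server_addr)
        data, peer = server.recvfrom(2048)
        # Response: server echoes the datagram back to the sender.
        server.sendto(data, peer)
        reply, _ = client.recvfrom(2048)
        return reply
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(udp_echo_once(b"ping"))
```

Because VMA is a dynamically linked library that intercepts socket API calls, a program like this is typically run under VMA by preloading the library at launch (commonly `LD_PRELOAD=libvma.so` followed by the usual command line). Treat that invocation as an assumption and confirm the library name and path in the installation guide for your release.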
The VMA library utilizes the direct hardware access and advanced polling techniques of RDMA-capable network cards. Direct hardware access over Ethernet is what enables the VMA kernel bypass: the library bypasses the kernel’s network stack for all IP traffic sent or received through socket API calls. As a result, applications using the VMA library gain many benefits, including:
Reduced context switches and interrupts, which result in:
Lower latencies
Improved CPU utilization
Minimal buffer copies between user data and hardware – VMA needs only a single copy to transfer a unicast or multicast offloaded packet between hardware and the application’s data buffers.
Good application candidates for VMA include, but are not limited to:
Fast transaction-based network applications requiring a high rate of request-response type operations over TCP or UDP unicast, such as a Market Data Order Gateway application working with an exchange.
Market-data feed-handler software that produces and consumes multicast data feeds, such as Wombat WDF and Reuters RMDS, or any home-grown feed handlers.
Any other applications that make heavy use of multicast or unicast and require any combination of the following:
Higher packets-per-second (PPS) rates than the kernel network stack provides
Lower data distribution latency
Lower CPU utilization by the multicast consuming/producing application in order to support further application scalability
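As an illustration of a multicast-consuming application of the kind listed above, here is a minimal sketch of a UDP multicast receiver written against the standard socket API. The function names and the example group and port are hypothetical. The code is not VMA-specific; it is the sort of unmodified consumer that the library is described as accelerating.

```python
import socket


def build_membership_request(group: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq structure: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def open_multicast_receiver(group: str, port: int,
                            interface_ip: str = "0.0.0.0") -> socket.socket:
    """Create a UDP socket, bind it to the port, and join the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow several consumers on the same host to bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Ask the IP stack to deliver datagrams addressed to the group.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_membership_request(group, interface_ip))
    return sock


if __name__ == "__main__":
    # Hypothetical group; only the packed request is shown here, so the
    # sketch runs even on a host without a multicast-capable interface.
    print(build_membership_request("239.1.1.1").hex())
```

A consumer would then call `open_multicast_receiver("239.1.1.1", 5000)` and loop on `recvfrom()`; the group address and port here are placeholders, not values from any real feed.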
The VMA library provides several significant advantages:
The underlying wire protocol used for the unicast and multicast solution is standard TCP and UDP over IPv4, which is interoperable with any TCP/UDP/IP networking stack. Thus, the opposite side of the communication can be any machine with any OS, located anywhere on an Ethernet network.
Warning: VMA uses a standard protocol, which enables an application to use VMA for asymmetric acceleration. An application that is only a TCP server, only a multicast consumer, or only a multicast publisher can leverage this while remaining compatible with Ethernet peers.
Kernel bypass for unicast and multicast transmit and receive operations. This delivers much lower CPU overhead, since TCP/IP stack overhead is not incurred.
Reduced number of context switches. All VMA software is implemented in user space, in the user application’s context. This allows the server to process a significantly higher packet rate than would otherwise be possible.
Minimal buffer copies. Data is transferred from the hardware (NIC/HCA) straight to the application buffer in user space, with only a single intermediate user-space buffer and zero kernel I/O buffers.
Fewer hardware interrupts for received/transmitted packets
Fewer of the queue congestion problems seen in standard TCP/IP applications
Supports legacy socket applications – no need for application code rewrite
Maximizes messages-per-second (MPS) rates
Minimizes message latency
Reduces latency spikes (outliers)
Lowers the CPU usage required to handle traffic
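The last few items refer to message rate, latency, and latency outliers. The following is a minimal sketch of how an application might compute those quantities from its own measurements, for example to compare a run with and without VMA. All function names are hypothetical, and the nearest-rank percentile is a choice made here for simplicity, not a method taken from the VMA documentation.

```python
import math


def messages_per_second(message_count: int, elapsed_seconds: float) -> float:
    """Return the average message rate over a measurement window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return message_count / elapsed_seconds


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(latencies_usec):
    """Summarize per-message latencies; a large gap between p99 or max and the
    median indicates latency spikes (outliers)."""
    return {
        "median": percentile(latencies_usec, 50),
        "p99": percentile(latencies_usec, 99),
        "max": max(latencies_usec),
    }
```

Reporting the median alongside a high percentile and the maximum makes spikes visible that an average would hide, which is why the summary keeps all three.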