NVIDIA Messaging Accelerator (VMA) Documentation Rev 9.6.4

Introduction to VMA

VMA Overview

The NVIDIA® Messaging Accelerator (VMA) library is a dynamically linked, user-space Linux library that offloads network traffic to transparently improve the performance of socket-based, networking-heavy applications over an Ethernet network. VMA is designed for latency-sensitive and throughput-demanding applications, both unicast and multicast. It can accelerate producer and consumer applications alike and can enhance application performance by orders of magnitude without requiring any modification to the application code.

The VMA library accelerates TCP and UDP socket applications by offloading traffic from user space directly to the network interface card (NIC) or Host Channel Adapter (HCA), bypassing the kernel and the standard IP stack (kernel bypass). VMA increases the overall packet rate, reduces latency, and improves CPU utilization.
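
Because VMA sits underneath the standard socket API, the application itself remains ordinary socket code. The sketch below is a minimal UDP round trip written against Python's plain `socket` module; nothing in it is VMA-specific. VMA is typically loaded into an unmodified program by preloading the library, for example `LD_PRELOAD=libvma.so python3 app.py`. The script name `app.py` and the helper name `udp_round_trip` are illustrative assumptions, and the loopback address is used only so the sketch runs on a single machine; acceleration applies to traffic that goes through a supported NIC.

```python
import socket


def udp_round_trip(payload: bytes, timeout: float = 2.0) -> bytes:
    """Send one datagram to a local echo socket and return the echoed reply.

    Hypothetical helper for illustration: it uses only the standard socket
    API, which is the interface VMA accelerates transparently.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Bind the "server" side to an ephemeral port on loopback.
        server.bind(("127.0.0.1", 0))
        server.settimeout(timeout)
        client.settimeout(timeout)

        # Client sends a request with an ordinary sendto() call.
        client.sendto(payload, server.getsockname())

        # Server receives the request and echoes it back to the sender.
        data, peer = server.recvfrom(65535)
        server.sendto(data, peer)

        # Client reads the echoed reply.
        reply, _ = client.recvfrom(65535)
        return reply
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(udp_round_trip(b"ping"))
```

The same file runs with or without the preload; only the launch command changes, which is what "without requiring any modification to the application code" means in practice.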

Basic Features

The VMA library uses the direct hardware access and advanced polling techniques of RDMA-capable network cards. Direct hardware access over Ethernet is what enables kernel bypass: the VMA library bypasses the kernel’s network stack for all IP traffic that is transmitted and received through socket API calls. Applications using the VMA library therefore gain many benefits, including:

Reduced context switches and interrupts, which result in:
- Lower latencies
- Improved CPU utilization
Minimal buffer copies between user data and hardware – VMA needs only a single copy to transfer a unicast or multicast offloaded packet between hardware and the application’s data buffers.

Target Applications

Good application candidates for VMA include, but are not limited to:

Fast transaction-based network applications requiring a high rate of request-response type operations over TCP or UDP unicast, such as a Market Data Order Gateway application working with an exchange.
Market-data feed-handler software that produces and consumes multicast data feeds, such as Wombat WDF and Reuters RMDS, or any home-grown feed handlers.
Any other applications that make heavy use of multicast or unicast and require any combination of the following:
- Higher packets-per-second (PPS) rates than the kernel network stack delivers
- Lower data distribution latency
- Lower CPU utilization by the multicast consuming/producing application in order to support further application scalability
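
The multicast producers and consumers listed above use the standard UDP multicast socket options, so a feed handler of this kind needs no VMA-specific calls. The following is a minimal sketch, assuming IPv4 multicast and Python's `socket` module; the helper names `build_mreq`, `open_multicast_receiver`, and `open_multicast_sender`, as well as the group and port in the usage note, are hypothetical and chosen only for illustration.

```python
import socket
import struct


def build_mreq(group: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte ip_mreq structure used by IP_ADD_MEMBERSHIP.

    The first 4 bytes are the multicast group address, the last 4 bytes
    are the local interface address (0.0.0.0 lets the OS choose).
    """
    return struct.pack(
        "4s4s", socket.inet_aton(group), socket.inet_aton(interface_ip)
    )


def open_multicast_receiver(
    group: str, port: int, interface_ip: str = "0.0.0.0"
) -> socket.socket:
    """Create a UDP socket bound to `port` and joined to `group` (consumer side)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group, interface_ip)
    )
    return sock


def open_multicast_sender(ttl: int = 1) -> socket.socket:
    """Create a UDP socket for publishing to a multicast group (producer side)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # TTL 1 keeps datagrams on the local network segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

A consumer would call, for example, `open_multicast_receiver("239.1.1.1", 5000)` and then loop on `recvfrom()`, while a producer would call `open_multicast_sender()` and `sendto(data, ("239.1.1.1", 5000))`. Joining a group requires a multicast-capable interface, so the receiver may fail on hosts without one.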

Advanced VMA Features

The VMA library provides several significant advantages:

The underlying wire protocol for both unicast and multicast is standard TCP and UDP over IPv4, which interoperates with any TCP/UDP/IP networking stack. Thus, the peer on the opposite side of the communication can be any machine running any OS, located on an Ethernet network.

Warning

Because VMA uses standard protocols, an application can use it for asymmetric acceleration. A “TCP server side” only, “multicast consuming” only, or “multicast publishing” only application can be accelerated on its own while remaining compatible with its Ethernet peers.
Kernel bypass for unicast and multicast transmit and receive operations. This delivers much lower CPU overhead since TCP/IP stack overhead is not incurred
Reduced number of context switches. All VMA software is implemented in user space in the user application’s context. This allows the server to process a significantly higher packet rate than would otherwise be possible
Minimal buffer copies. Data is transferred from the hardware (NIC/HCA) straight to the application buffer in user space, with only a single intermediate user-space buffer and zero kernel I/O buffers
Fewer hardware interrupts for received/transmitted packets
Fewer of the queue congestion problems seen in standard TCP/IP applications
Supports legacy socket applications – no need for application code rewrite
Maximizes Messages per second (MPS) rates
Minimizes message latency
Reduces latency spikes (outliers)
Lowers the CPU usage required to handle traffic
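
The last four items (message rate, latency, latency outliers, CPU usage) are measurable, so one way to evaluate them is to run the same workload with and without VMA and compare the numbers. The sketch below is a minimal timing harness, not a VMA tool: the function names are hypothetical, `round_trip` stands for any callable that performs one request-response exchange, and the message rate is derived from serial round trips, so it understates what a pipelined application could reach.

```python
import time
from typing import Callable, Dict, List


def measure_round_trips(round_trip: Callable[[], object], count: int) -> List[int]:
    """Call round_trip() `count` times; return each call's duration in nanoseconds."""
    samples = []
    for _ in range(count):
        start = time.perf_counter_ns()
        round_trip()
        samples.append(time.perf_counter_ns() - start)
    return samples


def percentile(sorted_samples: List[int], percent: int) -> int:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    # Integer ceiling of percent * n / 100, clamped to at least rank 1.
    rank = max(1, (percent * len(sorted_samples) + 99) // 100)
    return sorted_samples[rank - 1]


def summarize(samples_ns: List[int]) -> Dict[str, float]:
    """Summarize latency samples: rate, median, 99th percentile, worst case."""
    if not samples_ns:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ns)
    total_seconds = sum(ordered) / 1e9
    return {
        "messages_per_second": len(ordered) / total_seconds,
        "median_ns": percentile(ordered, 50),
        "p99_ns": percentile(ordered, 99),  # sensitive to latency spikes
        "max_ns": ordered[-1],              # worst outlier observed
    }
```

Reporting the 99th percentile and the maximum alongside the median matters here because the list above distinguishes typical latency from latency spikes; a median alone would hide the outliers.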
