High-Performance IPsec Offload with the Silicom IAONIC SmartNIC
Netadvia Research & Development
The Silicom IAONIC (Intel® architecture on NIC) SmartNIC (Intel codename Phantom Lake), based on the Intel® NetSec Accelerator reference design, provides additional cost-efficient compute resources for existing server infrastructure. It enables x86-based network applications to be offloaded without modification, as it includes an Intel® P5000 series Atom processor (either 8 or 16 cores), an integrated Intel® QuickAssist (QAT) engine and an optional Flexible Packet Processor (FPP).
In a previous article, Optimising Network Packet Processing using the Silicom IAONIC SmartNIC, Netadvia discussed how one of the key use cases for the Silicom IAONIC SmartNIC is the offloading of network security functions. A very popular protocol used to enhance network security is IPsec. It is widely regarded as a secure choice for point-to-point encryption because, in tunnel mode, it encrypts both the original IP header and the payload. This article presents the results of our benchmarking of an IPsec implementation based on FD.io VPP (Vector Packet Processing), DPDK (Data Plane Development Kit) and the Intel® QuickAssist engine available on the 100Gbps version of the Silicom IAONIC SmartNIC.
What is IPsec and Why Offload to a SmartNIC?
IPsec, or Internet Protocol Security, is a suite of protocols that provides authentication, encryption, and data integrity for IP traffic. It is a widely used security protocol and is supported by a wide range of devices and software.
In tunnel mode, IPsec works by encapsulating each IP packet in a new IP packet with additional security headers. The security headers carry a Security Parameters Index (SPI), which identifies the security association that defines the authentication and encryption algorithms in use, along with a sequence number and other security-related information.
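The encapsulation overhead can be estimated with a short calculation. The following is a minimal sketch, assuming ESP in tunnel mode over IPv4 with AES-GCM (8-byte IV, 16-byte integrity check value); the article does not state which cipher suite was used in the benchmarks, so these sizes and the function name esp_tunnel_packet_len are illustrative assumptions only.

```python
def esp_tunnel_packet_len(inner_len: int,
                          outer_ip_hdr: int = 20,  # assumed: IPv4 outer header, no options
                          iv_len: int = 8,         # assumed: AES-GCM IV size
                          icv_len: int = 16,       # assumed: AES-GCM 16-byte ICV
                          align: int = 4) -> int:
    """Estimate the size of an ESP tunnel-mode packet for a given inner IP packet."""
    esp_header = 8      # SPI (4 bytes) + sequence number (4 bytes)
    esp_trailer = 2     # pad length (1 byte) + next header (1 byte)
    # Padding so that inner packet + trailer ends on an `align`-byte boundary.
    pad = (-(inner_len + esp_trailer)) % align
    return (outer_ip_hdr + esp_header + iv_len
            + inner_len + pad + esp_trailer + icv_len)


if __name__ == "__main__":
    # Inner IP packet sizes for 64-, 512- and 1024-byte Ethernet frames,
    # assuming a 14-byte Ethernet header and 4-byte FCS.
    for frame in (64, 512, 1024):
        inner = frame - 18
        outer = esp_tunnel_packet_len(inner)
        print(f"frame {frame:>4} B: inner IP {inner:>4} B -> "
              f"ESP packet {outer:>4} B (+{outer - inner} B)")
```

Under these assumptions the fixed overhead is the same for every packet size, so it weighs far more heavily on small packets than on large ones.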
The most common use of IPsec within organizations is to create secure VPNs between multiple networks. For example, an IPsec-based VPN can be used to connect remote offices, labs or remotely working employees.
There are a number of benefits to deploying an IPsec implementation to a SmartNIC. While routers that support IPsec are readily available, they are typically optimised for neither throughput nor latency. By deploying IPsec to a SmartNIC, such as the Silicom IAONIC SmartNIC, the solution can be optimised for throughput and/or latency, while also providing control over, and scalability of, the IPsec solution across the network.
The test configuration used to benchmark IPsec throughput and latency performance on the Silicom IAONIC SmartNIC included a single HP ProLiant server along with two 100Gbps Silicom IAONIC SmartNICs connected back-to-back. Details of the physical hardware are as follows:
On the host server, CentOS 8.3 Linux was installed and the TRex network traffic generator was configured to use DPDK in order to achieve maximum performance. A port was enabled in TRex for each Silicom IAONIC SmartNIC 100Gbps port. From the server's point of view, these ports appear to DPDK and the operating system as standard network ports, so no modifications or custom drivers are required on the host.
Both Silicom IAONIC SmartNICs also ran CentOS 8.3 Linux and had identical configurations. VPP was installed and configured to use DPDK-enabled network ports. IPsec encrypted tunnels were created between the SmartNICs using the VPP IPsec implementation. To accelerate the encryption and decryption of data, the onboard Intel QAT engine was enabled. The Flexible Packet Processor in each NIC was configured to forward network packets from the physical network port interface and host network interface to the virtual network interfaces in use by VPP.
A number of benchmark tests were executed to identify the base network throughput and latency capabilities of the Silicom IAONIC SmartNIC. Benchmark tests were also developed to highlight the benefits of using the available QAT engine over VPP's software-based IPsec implementation.
Network Throughput Performance
The first benchmark was developed to identify the maximum possible network throughput using VPP's hardware-accelerated IPsec implementation (QAT-enabled) deployed to the Silicom IAONIC SmartNICs. The benchmark test was executed across a range of IPsec tunnel counts (1 to 200 tunnels) to highlight the impact that an increasing number of tunnels might have on performance. It was expected that performance would degrade as more tunnels were enabled, due to the routing of packets across the increasing number of tunnels. The benchmark was executed using three different packet sizes (64, 512 and 1024 bytes) to provide a complete picture of the network throughput achievable with this use case. This benchmark was first executed on a single core to identify the base per-core throughput performance.
The results identified that the maximum achievable network throughput for small packet sizes (64-byte packets) was 1.9Gbps on a single core. This increased to 12.2Gbps for medium packet sizes (512-byte packets), whilst close to 25Gbps was achieved with larger packet sizes (1024-byte packets). It is important to note that this level of network throughput was achieved without a single resource being consumed on the server host. What is more, these results reflect the performance of a single core on the Silicom IAONIC SmartNIC.
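To compare the three results on a common footing, the reported throughput can be converted into an approximate packet rate. This is a rough sketch, assuming the quoted Gbps figures count only the bytes of each packet (no Ethernet preamble or inter-frame gap) and treating "close to 25Gbps" as 25.0; the helper name packets_per_second is hypothetical.

```python
def packets_per_second(throughput_gbps: float, packet_bytes: int) -> float:
    """Convert a throughput in Gbps into packets per second for a fixed packet size."""
    return throughput_gbps * 1e9 / (packet_bytes * 8)


# Single-core results quoted in the article (25.0 is an upper bound for
# "close to 25Gbps").
single_core_results = {64: 1.9, 512: 12.2, 1024: 25.0}

if __name__ == "__main__":
    for size, gbps in single_core_results.items():
        mpps = packets_per_second(gbps, size) / 1e6
        print(f"{size:>5}-byte packets: {gbps:>5.1f} Gbps ~ {mpps:.2f} Mpps")
```

Under these assumptions the per-core packet rate stays in a similar range across all three packet sizes, while the bit rate grows with packet size.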
The benchmark was then executed on 3 cores using a symmetric RSS (Receive Side Scaling) algorithm to load balance the network traffic across the enabled cores.
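The defining property of a symmetric RSS hash is that both directions of a flow map to the same core. The sketch below illustrates that property only: it orders the two endpoints before hashing and uses a general-purpose hash, whereas hardware RSS typically uses a Toeplitz hash with a suitably chosen key. The function names and the choice of hash are assumptions for illustration, not the algorithm used in the benchmark.

```python
import hashlib
from ipaddress import ip_address


def symmetric_flow_hash(src_ip: str, dst_ip: str,
                        src_port: int, dst_port: int) -> int:
    """Hash a flow so that swapping source and destination gives the same value."""
    a = (int(ip_address(src_ip)), src_port)
    b = (int(ip_address(dst_ip)), dst_port)
    # Sorting the endpoints removes any dependence on packet direction.
    lo, hi = sorted((a, b))
    key = f"{lo[0]}:{lo[1]}-{hi[0]}:{hi[1]}".encode()
    digest = hashlib.blake2b(key, digest_size=4).digest()
    return int.from_bytes(digest, "big")


def pick_core(src_ip: str, dst_ip: str,
              src_port: int, dst_port: int, num_cores: int = 3) -> int:
    """Map a flow to one of `num_cores` worker cores."""
    return symmetric_flow_hash(src_ip, dst_ip, src_port, dst_port) % num_cores


if __name__ == "__main__":
    forward = pick_core("10.0.0.1", "10.0.0.2", 1000, 2000)
    reverse = pick_core("10.0.0.2", "10.0.0.1", 2000, 1000)
    print(f"forward -> core {forward}, reverse -> core {reverse}")
```

Keeping both directions of a flow on the same core avoids sharing per-flow state between cores.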
The results showed that, although the increase in network throughput was not exactly linear across cores (this is expected behaviour, as the QAT engines are a shared resource), there was a large increase in network throughput with just 3 cores enabled. As the Silicom IAONIC SmartNIC is available in both 8-core and 16-core models, there is significant headroom to increase this throughput further, or to use the remaining cores for other filtering and/or monitoring functions.
Network Latency Performance
The next benchmark was focused on network latency performance. In this configuration, 64-byte packets were transmitted at a rate of 1Gbps.
In this case, the latency increased with the number of IPsec tunnels, as expected, although it still remained below 250 µs with 200 IPsec tunnels enabled.
IPsec Software vs QAT
The final benchmark was developed to highlight the network throughput performance when using QAT versus using the software-based IPsec implementation. QAT is not always available, and so its availability on the Silicom IAONIC SmartNIC has the potential to provide a significant performance boost.
There was a clear performance benefit identified when using QAT, with the largest difference seen for the larger 1024-byte packets. The IPsec software-only implementation achieved 5.6Gbps, whilst the QAT-enabled IPsec implementation achieved more than four times that performance at just under 25Gbps.
The benchmark results obtained with IPsec deployed on the Silicom IAONIC SmartNIC were extremely impressive. The identified network throughput and latency performance metrics were achieved without consuming a single resource from the host server on which the SmartNICs were installed. As the Silicom IAONIC SmartNIC uses the Intel® P5000 series Atom processor, there was no modification whatsoever required for the VPP IPsec implementation to be configured, including configuration of the QAT engine.
The Silicom IAONIC SmartNIC appears to be a cost-effective, efficient and easy-to-deploy solution for offloading network applications whilst releasing host resources for other critical applications.
If you believe that you can benefit from our expertise in the integration and application of the Silicom IAONIC series of SmartNICs (based on the Intel® NetSec Accelerator reference design), please contact our Netadvia team.