Introduction
Recent advances in Zero Trust and cloud-hosted cybersecurity have significantly enhanced data and network protection with their block-by-default, allow-by-exception approach. However, these security solutions are both negatively impacted by, and contributors to, the massive increase in packet delay variation (PDV), more commonly known as jitter, in today’s network and application environments. This growth in jitter is largely driven by:
- Today’s near real-time web, streaming, IoT, AR, VR, and AI applications that typically transmit data in random bursts.
- Cloud and edge environments that now host most applications, where competition for virtual and physical CPU, memory, storage and network resources creates random delays.
- Last-mile wireless networks impacted by fading and RF interference.
- Newer network technologies such as 5G, which rely on higher frequencies with poorer propagation characteristics, smaller cells, and clear signal paths; any obstacle can deflect, refract, or diffract signals, leading to variation in packet delivery times.
Most importantly, jitter has a far more serious knock-on effect than the added latency of random packet delays: even modest amounts of jitter can cause throughput to collapse and applications to stall. This occurs even when the network isn’t saturated and ample bandwidth is available. The end results are frustrated users, reduced productivity, and lost revenue, and the consequences can be devastating when critical systems are involved.
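A rough sense of scale comes from the widely cited Mathis model of steady-state TCP throughput, where MSS is the maximum segment size, RTT the round-trip time, p the packet loss rate TCP perceives, and C a constant near 1.22. As explained below, jitter inflates the loss rate TCP perceives even when nothing is actually lost:

$$ \text{Throughput} \;\le\; \frac{MSS}{RTT} \cdot \frac{C}{\sqrt{p}} $$

Because throughput falls with the square root of perceived loss, a perceived loss rate climbing from 0.01% to 1% cuts the achievable ceiling by a factor of 10, with no change in actual network capacity.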
The source of jitter-induced throughput collapse lies in the fact that TCP, the most widely used network protocol and the one that predominates in public cloud services like AWS and Microsoft Azure, consistently interprets jitter as a sign of congestion, even when it’s caused by other factors such as application behavior, virtualization, wireless network issues, or security processes like complex encryption/decryption algorithms and DPI. To prevent data loss, TCP responds by retransmitting packets and throttling traffic, so throughput collapses regardless of actual network saturation. This not only affects TCP traffic, but also indirectly impacts UDP and other traffic sharing the network, because TCP’s retransmissions consume bandwidth that would otherwise be available.
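To make the mechanism concrete, here is a minimal Python sketch, illustrative only and not any production TCP stack, of the standard retransmission-timeout estimator from RFC 6298. Because the smoothed RTT and its variance adapt gradually, a single jitter spike on an otherwise stable path can overshoot the timeout, and TCP then retransmits and shrinks its congestion window exactly as if a packet had been lost:

```python
# Illustrative sketch of TCP's retransmission timeout (RTO) estimator
# per RFC 6298 (Jacobson/Karels). Not production code; the RTO floor
# (1 s per the RFC, ~200 ms on Linux) is omitted to keep the math visible.

def rto_after(samples, alpha=1/8, beta=1/4, k=4):
    """Feed RTT samples (seconds) through the estimator; return the RTO."""
    srtt, rttvar = samples[0], samples[0] / 2      # first measurement
    for r in samples[1:]:
        rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
        srtt = (1 - alpha) * srtt + alpha * r
    return srtt + k * rttvar

stable_path = [0.050] * 30          # a steady 50 ms round-trip time
rto = rto_after(stable_path)        # converges to just over 50 ms
spike = 0.120                       # a single 120 ms jitter spike, no loss

print(f"RTO after a stable run: {rto * 1000:.0f} ms")
if spike > rto:
    # TCP treats the late ACK as a lost packet: it retransmits and
    # collapses its congestion window, throttling the whole connection.
    print("120 ms jitter spike exceeds the RTO -> spurious timeout")
```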
Jitter and Zero Trust
Zero Trust security solutions are particularly vulnerable to jitter because they operate on the principle that every access request could pose a threat. Adhering to this principle requires continuous authentication, authorization, frequent encryption/decryption, deep packet inspection (DPI), and analysis. Even small variations in packet delivery to these security processes can slow them down significantly, and the effect is cumulative.
Along with being affected by jitter, Zero Trust can add to it. The more complex the encryption protocols, the greater the variability in packet payload sizes, and the higher the volume of traffic, the more jitter Zero Trust processing generates, compounding the jitter already present in the network. In some cases the performance impact is so significant that organizations turn off or limit features such as DPI or continuous monitoring, or slow the rollout of their Zero Trust deployments.
Moreover, many organizations are moving to cloud-hosted Zero Trust security solutions managed by third-party vendors such as Microsoft Azure Active Directory, Google BeyondCorp, and Okta. While this shift reduces the need for extensive on-premises security infrastructure, it also introduces jitter from resource competition in virtualized environments. Additionally, many Zero Trust solutions rely on Cloud-Native Functions (CNFs) built from distributed application components. By distributing CNF components across cloud and edge, security policies can be enforced closer to users, minimizing latency and improving response times for authentication, encryption, and policy enforcement. However, this approach also introduces jitter and latency due to the multiple network hops between the cloud and edge environments hosting CNF components, and their varying processing speeds.
Why Most Network Performance Solutions Fall Short
Throughput collapse in response to jitter is triggered in the network transport layer (layer 4 of the OSI stack) by TCP’s congestion control algorithms (CCAs). However, the standard network performance solutions most administrators turn to, including those integrated with many Zero Trust and other cybersecurity solutions, generally don’t operate at the transport layer. When they do, they do little or nothing to address jitter’s impact on throughput, and sometimes make it worse:
- Jitter Buffers – Jitter buffers work at the application layer (layer 7) by reordering packets and realigning packet timing to adjust for jitter before packets are sent to an application. This may work for some applications, but packet reordering and realignment create random delays that can ruin performance for real-time applications and create more jitter (a minimal sketch of the trade-off follows this list).
- Bandwidth Upgrades – Bandwidth upgrades are a physical-layer (layer 1) solution that only works in the short run, because the underlying problem isn’t addressed. Traffic grows to fill the added capacity, the incidence of jitter-induced throughput collapse rises in tandem, and another round of upgrades follows.
- SD-WAN – There’s a widespread assumption that SD-WAN can optimize performance merely by choosing the best available path among broadband, LTE, 5G, MPLS, Wi-Fi, or any other available link. The problem is that SD-WAN makes decisions based on measurements at the network edge but has no control beyond it. What if all paths are bad?
- QoS Techniques – Often implemented in conjunction with advanced networking solutions, these include packet prioritization; traffic shaping to smooth out bursts and control the rate of data transmission for selected applications and users; and resource reservation to set aside bandwidth for high-priority applications and users. But these techniques involve performance tradeoffs, and QoS does nothing to alter TCP’s behavior in response to jitter. In some cases QoS adds jitter, because techniques such as packet prioritization create variable delays for lower-priority traffic.
- TCP Optimization – TCP optimization focuses on the CCAs at layer 4 by increasing the size of the congestion window, using selective ACKs, adjusting timeouts, and similar techniques. However, improvements are limited, generally in the range of 10-15%, because these solutions, like all the others, don’t address the fundamental problem of how TCP’s CCAs consistently respond to jitter.
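As promised under Jitter Buffers above, here is a minimal Python sketch of a fixed-playout-delay buffer. It is a hypothetical illustration, not any particular product: the buffer releases each packet at its sender timestamp plus a fixed delay, smoothing delivery at the cost of adding that delay as latency, and dropping anything that arrives too late.

```python
# Hypothetical fixed-playout-delay jitter buffer: smooths packet
# delivery at the cost of added latency and late-packet drops.

import heapq

class JitterBuffer:
    def __init__(self, playout_delay):
        self.playout_delay = playout_delay   # seconds added to every packet
        self.heap = []                       # min-heap of (sent_at, seq, payload)

    def push(self, seq, sent_at, payload, now):
        """Buffer a packet, or drop it if it already missed its slot."""
        if sent_at + self.playout_delay < now:
            return False                     # arrived too late: dropped
        heapq.heappush(self.heap, (sent_at, seq, payload))
        return True

    def pop_ready(self, now):
        """Release, in sender order, every packet whose playout time has come."""
        out = []
        while self.heap and self.heap[0][0] + self.playout_delay <= now:
            out.append(heapq.heappop(self.heap))
        return out
```

Sizing the playout delay is the core dilemma: too small and late packets are dropped, too large and the buffer itself stalls the real-time applications it’s meant to protect.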
This is clearly not an easy problem to overcome. Even MIT researchers, who recently cited TCP’s CCAs as having a significant and growing negative impact on network performance because of their response to jitter, could not offer a practical solution.1
The CCAs would have to be modified or replaced to remove the bottleneck created by their inability to distinguish jitter caused by congestion from jitter caused by other factors. However, to be acceptable and to scale in a production environment, a viable solution can’t require changes to the TCP stack itself, or to any client or server applications that rely on it. It must also co-exist with ADCs, SD-WANs, VPNs, and other network infrastructure already in place.
The Only Proven and Cost-Effective Solution
Only Badu Networks’ patented WarpEngine™ carrier-grade optimization technology meets the key requirements outlined above. WarpEngine’s single-ended proxy architecture requires no modifications to client or server applications, network stacks, or existing network infrastructure, so there’s no rip-and-replace. WarpEngine determines in real time whether jitter is due to congestion, and prevents throughput from collapsing when it’s not. As a result, bandwidth that would otherwise be wasted is recaptured. WarpEngine also includes other performance-enhancing features like optimized flow control, ensuring no session bandwidth is wasted and benefiting not only TCP, but also GTP (used by 5G and LTE), UDP, and other traffic. These capabilities enable WarpEngine to deliver massive network throughput improvements ranging from 2-10x or more for some of the world’s largest mobile network operators, cloud service providers, government agencies, and businesses of all sizes.2 In large part, these results reflect the huge impact jitter now has on network performance, and the lack of alternatives for overcoming it. In addition, WarpEngine delivers these improvements with existing network infrastructure, at a fraction of the cost of upgrades.
WarpEngine is available in three different form factors depending on the implementation environment:
- A hardware appliance for deployment at cell tower base stations, at access points supporting Wi-Fi networks of any scale, in front of hundreds or thousands of servers in a corporate or cloud data center, or in a carrier’s core network.
- Software installed on servers provided by customers and partners.
- A VM form factor designed specifically for cloud and virtualized edge environments, making it ideal for cloud-hosted Zero Trust and other cybersecurity solutions. It works equally well with either VM or container-based network functions.
WarpEngine’s VM form factor, branded as WarpVM™, installs in minutes in AWS, Azure, VMware, or KVM environments. WarpVM has also been certified by Nutanix™ for use with their multi-cloud platform, delivering performance numbers similar to those cited above.3 In addition, WarpVM performance and compatibility tests have been conducted with F5’s BIG-IP, the foundation for F5’s Advanced WAF (web application firewall), which can also integrate with F5 Distributed Cloud Services, including ZTNA (Zero Trust Network Access). When combined with WarpVM, BIG-IP’s performance improved by over 3X in the AWS environment used for testing.
As these proof points demonstrate, WarpVM can be deployed with cloud-hosted cybersecurity and Zero Trust solutions to boost performance for a small fraction of the cost of cloud network and server upgrades as implementations grow. For example, WarpVM has been shown to improve AWS Direct Connect throughput by more than 3X for over 80% less than the cost of Direct Connect upgrades to achieve the same result. The savings are actually much greater when the additional AWS servers required to support a standard Direct Connect upgrade are taken into account. WarpVM can also be deployed in the public and private cloud and edge environments customers and partners use for other applications to deliver the same benefits.
Conclusion
As cloud-hosted cybersecurity and Zero Trust solutions become more widespread, their performance challenges, particularly jitter, will escalate. WarpVM, the VM form factor of WarpEngine’s patented optimization technology, overcomes these challenges in cloud and edge environments without expensive network and server upgrades, or other disruptive changes to existing infrastructure. In its software and hardware appliance form factors, WarpEngine can do the same for non-virtualized environments. Don’t sacrifice performance for security.
Unlock the full potential of your Zero Trust and cybersecurity solutions—and the networks and applications that depend on them—by requesting a 15-day free trial today.
Notes
1. Starvation in End-to-End Congestion Control, August 2022: https://people.csail.mit.edu/venkatar/cc-starvation.pdf
2. Badu Networks Performance Case Studies: https://www.badunetworks.com/wp-content/uploads/2022/11/Performance-Case-Studies.pdf
3. Nutanix Technology Partners: https://www.nutanix.com/partners/technology-alliances/badu-networks