Introduction

Recent developments in networking, such as AI-driven optimization, Multi-Cloud Networking (MCN), 5G-Advanced, and Zero Trust, promise major improvements in performance, scalability, and security. However, these technologies are also affected by massive increases in packet delay variation (PDV), or jitter, in today’s network environments. Real-time and near-real-time applications such as IoT, AR, VR, AI, and streaming video send data in unpredictable bursts, creating jitter. When these applications compete for virtual and physical resources in the cloud environments that host them, random delays in packet transmission result. This virtualization jitter compounds the jitter caused by the applications’ irregular data transmissions. Last-mile mobile and Wi-Fi networks subject to RF interference and fading add still more jitter, affecting the entire network path between the user’s device and the server hosting their application.

Most importantly, jitter has a far more serious knock-on effect than the random delays that add latency: even modest amounts of jitter can cause throughput to collapse and applications to stall. This knock-on effect is increasingly being felt, and its impact can be devastating when critical systems are involved. Ironically, new network technologies like those cited above also produce jitter, partially undermining their own benefits.

The source of jitter-induced throughput collapse lies in the fact that TCP, the most widely used network protocol and the predominant one in public cloud services like AWS and Microsoft Azure, consistently interprets jitter as a sign of congestion, even when it’s caused by other factors like those discussed above. To prevent data loss, TCP responds by retransmitting packets and throttling traffic, even when ample bandwidth is available. As a result, throughput collapses and applications stall, regardless of actual network saturation. This not only affects TCP traffic, but also indirectly impacts UDP and other traffic sharing the network, because TCP’s response to jitter consumes available bandwidth.
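To make the mechanism concrete, here is a minimal Python sketch of the standard TCP retransmission-timeout estimator defined in RFC 6298, fed once with steady round-trip times and once with the same path plus jitter. The RTT values and the 200 ms minimum RTO are illustrative assumptions, not measurements of any particular network.

```python
# rto_sketch.py - illustrative only; the RTT values below are invented, not measured.
# Standard TCP retransmission-timeout estimator (RFC 6298):
#   RTTVAR <- (1 - beta) * RTTVAR + beta * |SRTT - RTT|
#   SRTT   <- (1 - alpha) * SRTT + alpha * RTT
#   RTO    <- SRTT + K * RTTVAR
ALPHA, BETA, K = 1 / 8, 1 / 4, 4
MIN_RTO = 0.2   # many stacks use ~200 ms rather than the RFC's 1-second floor

def final_rto(rtt_samples):
    srtt = rttvar = None
    for rtt in rtt_samples:
        if srtt is None:                 # first measurement
            srtt, rttvar = rtt, rtt / 2
        else:                            # subsequent measurements
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt)
            srtt = (1 - ALPHA) * srtt + ALPHA * rtt
    return srtt, rttvar, max(MIN_RTO, srtt + K * rttvar)

stable  = [0.080] * 20                                               # steady 80 ms path
jittery = [0.080 + (0.060 if i % 2 else -0.060) for i in range(20)]  # same path, +/-60 ms jitter

for label, samples in (("stable", stable), ("jittery", jittery)):
    srtt, rttvar, rto = final_rto(samples)
    print(f"{label:8s} SRTT={srtt*1000:5.1f} ms  RTTVAR={rttvar*1000:5.1f} ms  RTO={rto*1000:6.1f} ms")
```

The jittery path ends up with a much larger RTO because the estimator multiplies the variance term by four. Worse, a single delay spike that outruns an RTO armed during a calm period is treated as a loss, triggering a spurious retransmission and a congestion-window cut even though no packet was dropped.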

In this blog, we’ll explore the benefits of each of these networking advancements, how each one is not only affected by but also contributes to jitter, and why conventional performance solutions fall short in addressing jitter’s impact on throughput. Finally, we’ll look at an approach to overcoming jitter-induced throughput collapse that’s far more effective and less costly than network and server upgrades, or any of the other performance solutions most administrators turn to.

AI-Driven Network Optimization: Vulnerable to Jitter

AI-driven network optimization performs real-time analysis of packet data, flow data, metadata and log files from network devices like routers, switches, and firewalls. Based on this analysis, network configurations are automatically adjusted. This reduces the need for error-prone manual intervention, optimizes resource use, and improves network performance and security.

However, AI networking’s reliance on real-time data makes its analysis highly susceptible to jitter. Minor fluctuations can distort telemetry, causing AI systems to misinterpret network conditions and make suboptimal configuration changes that degrade performance. Moreover, aside from components deployed on network devices at customer sites, AI networking solutions are largely cloud-hosted, making them subject to virtualization jitter.

Furthermore, AI-driven systems can add jitter of their own when they dynamically adjust network parameters in real time. Rapidly changing traffic routing and resource allocation introduces variability in packet transmission times.
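As a rough illustration of both points, the toy Python controller below (purely hypothetical; the threshold, latency figures, and path names are invented) reroutes traffic whenever the latest latency reading on the active path crosses a threshold. Fed raw, jittery samples it flip-flops between paths; a smoothed estimate of the same samples triggers far fewer changes.

```python
# control_loop_sketch.py - hypothetical toy controller, not any vendor's algorithm.
# Path names, threshold, and latency figures are invented for illustration.
import random

random.seed(7)
THRESHOLD_MS = 60   # reroute if the active path appears slower than this

def latency_samples(mean_ms, jitter_ms, n=50):
    """Illustrative latency readings: a fixed mean plus uniform jitter."""
    return [mean_ms + random.uniform(-jitter_ms, jitter_ms) for _ in range(n)]

def count_reroutes(readings, smooth=False):
    active, estimate, reroutes = "path_a", readings[0], 0
    for sample in readings:
        # React to the raw reading, or to an EWMA that filters out the jitter.
        estimate = 0.9 * estimate + 0.1 * sample if smooth else sample
        if estimate > THRESHOLD_MS:
            active = "path_b" if active == "path_a" else "path_a"
            reroutes += 1
            estimate = THRESHOLD_MS * 0.8   # assume the new path initially looks healthy
    return reroutes

jittery_path = latency_samples(mean_ms=50, jitter_ms=25)   # 50 ms path, +/-25 ms jitter
print("raw readings:", count_reroutes(jittery_path), "reroutes")
print("smoothed    :", count_reroutes(jittery_path, smooth=True), "reroutes")
```

Each route flip changes the path, and therefore the delay, that in-flight traffic experiences, which is exactly how an optimizer reacting to noisy inputs can become a jitter source itself.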

Multi-Cloud and Hybrid Cloud Networking (MCN): Flexibility at the Cost of Stability

MCN enables organizations to connect and manage multiple public and private clouds, as well as edge environments, by creating a virtual network fabric across them. A cloud-hosted management control plane sits on top of the MCN virtual network fabric for centralized administration. In addition, MCN delivers flexibility and scalability by implementing network functions such as routers, switches, firewalls, and load balancers as cloud-hosted virtual appliances, reducing the need for specialized hardware.

However, MCN’s reliance on the cloud also makes it subject to virtualization jitter. In busy cloud environments, applications compete for CPU, memory, storage, and network resources, creating random delays in packet transmission. The variability between cloud environments, each with different latencies, routing paths, and congestion levels, amplifies these random delays.

Furthermore, Cloud-Native Network Functions (CNFs) built on distributed, containerized microservices architectures are increasingly replacing traditional VM-based virtualized network functions (VNFs). This means that instead of having all the functionality required to support a virtual router or other network device running entirely within a VM, the set of microservices needed to complete a network function may be distributed across containers running at multiple cloud and edge locations. Although CNFs offer modularity and move some data and processing to the edge to improve response times and reduce bandwidth usage, the number of network hops between cloud and edge environments also goes up, adding latency and jitter.
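A simple way to see the effect of the extra hops is to model each hop as a fixed delay plus an independent jitter component. The Python sketch below does this with invented numbers; the hop counts, delays, and jitter ranges are illustrative assumptions, not measurements of any real VNF or CNF deployment.

```python
# hop_jitter_sketch.py - illustrative numbers only, not measurements of any real CNF.
# End-to-end delay variation grows with the number of network hops a request crosses.
import random, statistics

random.seed(1)

def end_to_end_ms(hops, base_ms=2.0, jitter_ms=1.5, trials=10_000):
    """Each hop adds a fixed base delay plus an independent random jitter component."""
    totals = [sum(base_ms + random.uniform(0, jitter_ms) for _ in range(hops))
              for _ in range(trials)]
    return statistics.mean(totals), statistics.pstdev(totals)

for label, hops in (("VM-based VNF, few hops", 3), ("distributed CNF, more hops", 8)):
    mean, spread = end_to_end_ms(hops)
    print(f"{label:28s} mean={mean:5.1f} ms  delay variation (stdev)={spread:4.2f} ms")
```

Because each hop contributes its own random component, both the average delay and, more importantly, the spread of delays grow as a network function is distributed across more cloud and edge locations.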

5G-Advanced: New Benefits, Same Old Problems?

5G-Advanced builds on 5G’s support for high-volume, low-latency applications with several key improvements:

  • Higher Frequencies: Utilizes additional frequency bands, such as millimeter-wave (mmWave) and sub-terahertz (sub-THz) bands, to increase capacity and data throughput. These bands typically range from 24 GHz to beyond 100 GHz.
  • Enhanced Network Slicing: Enables more dynamic and efficient resource allocation by creating virtual network slices tailored to specific application and user requirements.
  • Improved Cloud and Edge Support: Reduces latency and improves responsiveness by relying heavily on Cloud-Native Functions (CNFs), decreasing reliance on dedicated hardware.
  • AI/ML Integration: Provides new APIs to support AI-driven network optimization.

Despite its enhancements, 5G-Advanced inherits 5G’s jitter-prone architecture, characterized by:

  • Higher Frequency Challenges: While higher frequencies like mmWave provide higher data rates, they suffer from poor propagation, making them more susceptible to interference and signal degradation.
  • Line-of-Sight Requirements: 5G signals often require a clear path; obstacles can cause reflections, refractions, or diffractions, resulting in multiple signal paths with varying lengths and causing packet delivery variations.
  • Dense Deployment and Frequent Handoffs: The dense deployment of small cells improves capacity and coverage but leads to more frequent handoffs and base station switching, especially in urban areas, which contributes to jitter.

Both 5G and 5G-Advanced support lower frequencies (below 6 GHz) that allow for larger cells with greater coverage and penetrate obstacles more effectively, making them less jitter-prone. However, lower frequencies offer less bandwidth, resulting in reduced capacity and slower data rates, which may not meet the demands of today’s high-volume, ultra-low-latency applications.

Additionally, 5G-Advanced’s increased reliance on CNFs running in virtualized cloud and edge environments introduces virtualization jitter. The dynamic allocation of bandwidth slices through network slicing to meet the requirements of specific applications and users can also cause latency variations, further contributing to jitter. Finally, given the jitter-related issues surrounding AI-driven network optimization discussed earlier, AI integration will become yet another contributor to jitter in 5G-Advanced networks.

Zero Trust Security: Strengthening Protection, Weakening Performance

Zero trust security operates on the principle that every access request could pose a threat, necessitating continuous authentication, authorization, packet inspection, and analysis. Supporting this requires frequent encryption and decryption, which delays network traffic. Any jitter present in the network environment amplifies these delays.

Moreover, many organizations are moving to cloud-hosted zero trust security solutions managed by third-party vendors such as Microsoft Azure Active Directory, Google BeyondCorp, and Okta. While this shift can reduce the need for extensive on-premises security infrastructure, it also adds virtualization jitter.

Additionally, zero trust security can contribute to network jitter. Complex encryption protocols, higher traffic volumes, and major differences in payload sizes can lead to significant variation in packet delivery times, compounding jitter from other sources.

Why Most Network Performance Solutions Fail to Address Jitter’s Impact on Throughput

Before any of the technologies discussed above can achieve their full potential, jitter’s negative impact on their performance, as well as its impact on network performance in general, must be overcome.

Jitter-induced throughput collapse is triggered at the network transport layer (layer 4 of the OSI stack) by TCP’s congestion control algorithms (CCAs). These algorithms have no way to determine whether jitter is due to actual congestion or to other factors such as application behavior, virtualization, or wireless network issues. However, the standard approaches network administrators turn to, including those that AI networking solutions also employ to improve network performance, generally don’t operate at the transport layer. When they do, they do little or nothing to address jitter-induced throughput collapse, and sometimes make it worse:

  • Jitter Buffers – Jitter buffers work at the application layer (layer 7) by reordering packets and realigning packet timing to adjust for jitter before packets are passed to an application. While this may work for some applications, packet reordering and realignment creates random delays that can ruin performance for real-time applications and create more jitter.
  • Bandwidth Upgrades – Bandwidth upgrades are a physical-layer (layer 1) solution that only works in the short run, because the underlying problem of jitter-induced throughput collapse isn’t addressed. Traffic grows to fill the added capacity, and the incidence of jitter-induced throughput collapse rises in tandem.
  • SD-WAN – There’s a widespread assumption that SD-WAN can optimize performance merely by choosing the best available path among broadband, LTE, 5G, MPLS, Wi-Fi, or any other available link. The problem is that SD-WAN makes decisions based on measurements at the edge, but has no control beyond it. What if all paths are bad?
  • QoS Techniques – Often implemented in conjunction with advanced networking solutions, these include packet prioritization; traffic shaping to smooth out traffic bursts and control the rate of data transmission for selected applications and users; and resource reservation to set aside bandwidth for high-priority applications and users. But performance tradeoffs are made, and QoS does nothing to alter TCP’s behavior in response to jitter. In some cases, implementing QoS adds jitter, because techniques such as packet prioritization create variable delays for lower-priority traffic.
  • TCP Optimization – Focuses on the CCAs at layer 4 by increasing the size of the congestion window, using selective ACKs, adjusting timeouts, etc. However, improvements are limited, generally in the range of 10-15%. The reason is that these solutions, like all the others, don’t address the fundamental problem of how TCP’s CCAs respond to jitter (the rough model sketched below shows why the gains are capped).
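The cap on those gains follows from a rough, well-known model of loss-limited TCP throughput, the Mathis approximation, sketched below in Python. The RTT, segment size, and loss rates are illustrative, and the model is a simplification rather than a description of any specific CCA.

```python
# mathis_sketch.py - rough model, not a measurement; the numbers are illustrative.
# Mathis et al. approximation for loss-limited TCP throughput:
#   rate ~ (MSS / RTT) * C / sqrt(p),  with C ~ 1.22
# If the CCA misreads jitter as loss, the perceived loss rate p rises and the
# achievable rate falls as 1/sqrt(p), no matter how much link capacity is free.
from math import sqrt

MSS_BITS = 1460 * 8   # segment size in bits
RTT_S    = 0.050      # 50 ms round-trip time
C        = 1.22

def mathis_mbps(loss_rate):
    return (MSS_BITS / RTT_S) * C / sqrt(loss_rate) / 1e6

for p in (0.0001, 0.001, 0.01, 0.05):
    print(f"perceived loss {p*100:5.2f}%  ->  throughput ceiling ~{mathis_mbps(p):6.1f} Mbit/s")
```

Because throughput falls off as 1/√p, every jitter event the CCA counts as a loss pushes the ceiling down, and no amount of window tuning at layer 4 changes the shape of that curve; only preventing jitter from being counted as loss does.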

Clearly, jitter-induced throughput collapse isn’t an easy problem to solve. MIT researchers recently cited TCP’s CCAs as having a significant and growing negative impact on network performance because of their response to jitter, but offered no practical solution.1

TCP’s CCAs would have to be modified or replaced to remove the bottleneck created by their inability to differentiate between jitter caused by congestion and jitter caused by other factors. However, to be acceptable and to scale in a production environment, a viable solution can’t require any changes to the TCP stack itself, or to any client or server applications that rely on it. It must also co-exist with ADCs, SD-WANs, VPNs, and other network infrastructure already in place.

The Only Proven and Cost-Effective Solution

Only Badu Networks’ patented WarpEngine™ carrier-grade optimization meets the key requirements outlined above for eliminating jitter-induced throughput collapse. WarpEngine’s single-ended proxy architecture means no modifications to client or server applications or network stacks are required. It works with existing network infrastructure, so there’s no rip-and-replace. WarpEngine determines in real time whether jitter is due to congestion, and prevents throughput from collapsing and applications from stalling when it’s not. As a result, bandwidth that would otherwise be wasted is recaptured. WarpEngine builds on this with other performance- and security-enhancing features that benefit not only TCP, but also GTP (used by 5G), UDP, and other traffic. These capabilities enable WarpEngine to deliver massive network throughput improvements ranging from 2-10x or more for some of the world’s largest mobile network operators, cloud service providers, government agencies, and businesses of all sizes.2 These results also reflect the huge impact jitter has on network performance, and the lack of effective alternatives for dealing with it. In addition, WarpEngine delivers these improvements with existing network infrastructure, at a fraction of the cost of upgrades.

WarpEngine can be deployed at core locations as well as the network edge as a hardware appliance, or as software installed on servers provided by customers and partners. It can be installed in a carrier’s core network, or in front of hundreds or thousands of servers in a corporate or cloud data center. WarpEngine can also be deployed at cell tower base stations, or with access points supporting public or private Wi-Fi networks of any scale. MCN and AI networking vendors can integrate it into their solutions without any engineering effort. 

WarpVM™, the VM form factor of WarpEngine, is designed specifically for cloud and edge environments. WarpVM installs in minutes in AWS, Azure, VMware, or KVM environments. In addition, WarpVM has been certified by Nutanix™ for use with their multi-cloud platform, which includes Nutanix’s Flow Networking MCN solution. Performance tests conducted during Nutanix’s certification process demonstrated results similar to those cited above.3 Performance and compatibility tests have also been conducted with F5’s BIG-IP – the foundation for F5’s BIG-IP Cloud Edition MCN solution. When combined with WarpVM, BIG-IP’s performance was boosted by over 3X in the cloud environment used for the tests.

As these proof points demonstrate, MCN and AI networking vendors can install WarpVM in the cloud environments hosting their solutions to boost performance for a competitive edge, and to dramatically reduce the cost of cloud network and server upgrades as they grow their install base. For example, WarpVM has been shown to improve AWS Direct Connect throughput by 3X for over 80% less than the cost of cloud network and server upgrades achieving the same result. WarpVM can also be deployed in the public and private cloud environments their customers use for other applications, delivering many of the same benefits at a small fraction of the cost of upgrades.

In addition, MCN and AI networking vendors can offer WarpEngine to their enterprise customers to deploy on-premises with their Wi-Fi access points, or at the edge of their networks between the router and firewall for dramatic WAN, broadband or FWA throughput improvements. 

Conclusion

As AI, IoT, AR, VR and other high-traffic, ultra-low-latency applications increasingly rely on the cloud and emerging network technologies, jitter-related challenges will intensify. By integrating WarpEngine or WarpVM with advanced solutions like MCN, AI-driven networking, 5G-Advanced, and zero trust security, organizations can overcome these challenges. And they can do so at a fraction of the cost and without the disruption of traditional network and server upgrades that only provide a short-term fix. Unlike other solutions that fail to address the root cause – and can even contribute to it – WarpEngine and WarpVM unlock the full potential of your existing network infrastructure, delivering the performance, reliability and security that today’s applications demand.

To learn more and request a free trial, click the button below.

Notes 

  1. Starvation in End-to-End Congestion Control, August 2022: https://people.csail.mit.edu/venkatar/cc-starvation.pdf
  2. Badu Networks Performance Case Studies: https://www.badunetworks.com/wp-content/uploads/2022/11/Performance-Case-Studies.pdf
  3. Nutanix Technology Partners: https://www.nutanix.com/partners/technology-alliances/badu-networks