Introduction

In a recent survey cited by Forbes, analyst firm Futuriom asked senior IT professionals how important it is to have connectivity and a single management control plane across enterprise data centers, edge, telco providers, and cloud services. Of those responding, 90% ranked it at or near the top of their agenda: 59% said it was very important, and 31% said it was critical.1

Futuriom identified three major trends driving these results:

  • Organizations are moving to distributed multi-cloud and AI applications, and need to connect applications and data across multiple cloud and edge environments.
  • Network operators – whether enterprise, cloud, or telco – want a way to connect and manage diverse networks as if they were a single network. Telco providers are especially interested, because they see opportunities for their global networks to become more than just “dumb pipes” by enabling new integration services on top of them.
  • Network managers are struggling with a fragmented collection of networking silos – they need better data, visibility, and cost control across all their networks.

Hence the rise of Multi-Cloud Networking (MCN) vendors such as Aviatrix, Alkira, Arrcus, Graphiant, Itential, Prosimo, and Versa Networks. In addition, we’re seeing acquisitions like Cloudflare’s purchase of Nefeli in March 2024. Legacy vendors including Arista, Cisco, F5, HPE, IBM, Juniper, and VMware (acquired by Broadcom) are also rapidly moving into the MCN space with new product offerings and acquisitions.

While the term ‘MCN’ implies these solutions are entirely cloud-based, this isn’t the case. MCN creates a virtual network fabric that spans multiple public and private clouds as well as on-premises and edge environments, with some MCN components installed on network devices at the edge. The portion that is entirely cloud-based is the management control plane that sits on top of the virtual network, allowing for centralized management across the physical networks that underlie it.

Automating Multi-Cloud Networking with AI

The complexity of managing multiple network environments is compounded by the wider range of traffic types, the larger and less predictable data flows, and the greater security threats networks now handle. These challenges are driven primarily by several factors:

  • Real-time web, streaming, IoT, AR, VR, and, more recently, generative AI applications that typically transmit data in random bursts.
  • The virtualized cloud and edge environments that now host most applications, where competition between applications for virtual and physical CPU, memory, storage and network resources creates random delays.
  • The distributed, containerized microservices architectures that applications increasingly rely on, which generate additional network hops between cloud and edge.
  • Last-mile wireless networks impacted by fading and RF interference.
  • Newer network technologies such as 5G, with smaller cells, poorer propagation, and clear-path requirements, so that any obstacle can deflect, refract, or diffract signals, causing variation in packet delivery times.

MCN vendors, Network as a Service (NaaS) vendors, and IT organizations that manage their own infrastructure have turned to AI Networking, a subset of AIOps, to handle the increasing variability in network traffic from the sources mentioned above. AI Networking solutions monitor real-time packet data, flow data, and metadata about the communication between routers, switches, firewalls, access points, and other network devices, as well as the log files from those devices. Based on an AI analysis of this data, configuration changes are made automatically to avoid performance bottlenecks and protect against cyber threats. Leading solutions in the AI Networking space include Juniper Mist, Cisco DNA, HPE Aruba Networking, and IBM Watson AIOps. Like the MCN and NaaS solutions that often include them, AI Networking solutions are primarily cloud-hosted, with components installed on devices at the network edge as needed. Some vendors deliver pre-packaged integration between their MCN and AI Networking solutions, as in the case of Juniper Contrail Enterprise MultiCloud’s integration with Mist AI. In other cases, multi-cloud and MCN vendors partner with AI Networking vendors: for example, Juniper and Nutanix have partnered to make Juniper Mist available with Nutanix’s multi-cloud platform and Nutanix’s Flow Networking MCN solution.
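To make that monitor-analyze-act loop concrete, the short Python sketch below shows the general pattern in vendor-neutral form. Every name in it (fetch_flow_metrics, apply_config, the device names) is a hypothetical placeholder rather than any vendor’s API, and a simple statistical check stands in for the AI analysis step.

```python
import random
import statistics

# Vendor-neutral sketch of the monitor -> analyze -> act loop described above.
# All names and numbers here are hypothetical placeholders, not a real API.

def fetch_flow_metrics(device: str) -> list[float]:
    # Placeholder: in practice this would pull flow records, streaming
    # telemetry, or device logs. Here we simulate per-flow latency samples (ms),
    # with one device ("edge-rtr-2") exhibiting high delay variation.
    spread = 80.0 if device == "edge-rtr-2" else 5.0
    return [40.0 + random.uniform(0.0, spread) for _ in range(50)]

def apply_config(device: str, change: dict) -> None:
    # Placeholder: in practice this would call the controller's management API.
    print(f"{device}: applying {change}")

def run_once(devices: list[str], jitter_threshold_ms: float = 15.0) -> None:
    for dev in devices:
        samples = fetch_flow_metrics(dev)
        # A basic statistical test standing in for the "AI analysis" step:
        # flag any device whose latency spread (a jitter proxy) is too high.
        if statistics.pstdev(samples) > jitter_threshold_ms:
            apply_config(dev, {"action": "reroute", "reason": "high jitter"})

run_once(["edge-rtr-1", "edge-rtr-2"])
```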

A Leading Cause of Poor Network Performance Still Isn’t Addressed

Even with the powerful combination of MCN and AI Networking, a significant challenge remains unmet. The increasingly massive and unpredictable data flows that characterize today’s network traffic, driven by factors such as application behavior, virtualization, wireless networks, and new technologies like 5G, lead to packet delay variation (PDV), more commonly referred to as jitter. Jitter’s knock-on effects go far beyond the added latency of randomly delayed packets.
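For concreteness, one widely used way to quantify packet delay variation is the interarrival jitter estimator defined for RTP in RFC 3550: a running, smoothed average of how much each packet’s transit time differs from the previous packet’s. The minimal Python sketch below applies it to hypothetical send and receive timestamps invented purely for illustration.

```python
# RFC 3550 interarrival jitter: J += (|D| - J) / 16, where D is the change in
# per-packet transit time between consecutive packets.

def interarrival_jitter(send_times_ms, recv_times_ms):
    jitter = 0.0
    prev_transit = recv_times_ms[0] - send_times_ms[0]
    for s, r in zip(send_times_ms[1:], recv_times_ms[1:]):
        transit = r - s
        jitter += (abs(transit - prev_transit) - jitter) / 16.0
        prev_transit = transit
    return jitter

# Hypothetical example: packets sent every 20 ms, delivered with variable delay.
send = [i * 20 for i in range(6)]            # 0, 20, ..., 100 ms
recv = [5, 27, 61, 72, 95, 140]              # invented arrival times (ms)
print(f"estimated jitter ~ {interarrival_jitter(send, recv):.2f} ms")
```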

TCP, the transport protocol widely used by applications that require guaranteed packet delivery and by public cloud services such as AWS and Microsoft Azure, consistently treats jitter as a sign of congestion. To prevent data loss, TCP responds by retransmitting packets and throttling traffic. As a result, even modest amounts of jitter can cause throughput to collapse and applications to stall when the network isn’t saturated and plenty of bandwidth is available, adversely impacting not only TCP, but also UDP and other non-TCP traffic sharing the network.
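The toy simulation below, a bare-bones additive-increase/multiplicative-decrease (AIMD) sender rather than TCP itself or any vendor’s algorithm, illustrates the effect described above: whenever a delay spike looks like loss, the sender halves its congestion window, so average throughput falls sharply as jitter grows even though there is no real congestion. All timing values are invented for illustration.

```python
import random

# Toy AIMD sender: halves its congestion window (cwnd) whenever a delay spike
# is misread as loss. Not TCP itself; all numbers are invented for illustration.

BASE_RTT_MS = 40                  # assumed uncongested round-trip time
SPURIOUS_LOSS_THRESHOLD_MS = 60   # delay beyond which the sender "sees" loss

def average_cwnd(jitter_ms: float, rounds: int = 2000, seed: int = 1) -> float:
    """Average cwnd (a rough proxy for throughput) at a given jitter level."""
    rng = random.Random(seed)
    cwnd, total = 1.0, 0.0
    for _ in range(rounds):
        rtt = BASE_RTT_MS + rng.uniform(0.0, jitter_ms)   # random delay variation
        if rtt > SPURIOUS_LOSS_THRESHOLD_MS:
            cwnd = max(1.0, cwnd / 2)   # multiplicative decrease on apparent loss
        else:
            cwnd += 1.0                 # additive increase per round trip
        total += cwnd
    return total / rounds

for jitter in (0, 10, 30, 50, 80):
    print(f"jitter up to {jitter:2d} ms -> average cwnd {average_cwnd(jitter):8.1f}")
```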

Throughput collapse in response to jitter is triggered in the network transport layer (layer 4) by TCP’s congestion control algorithms (CCAs). TCP’s CCAs have no way to determine whether jitter is due to actual congestion or to other factors such as application behavior, virtualization, or wireless network issues. However, the standard approaches that network administrators turn to, and that AI Networking solutions also employ to improve network performance, either don’t operate at the transport layer or, if they do, do little or nothing to address jitter-induced throughput collapse, and sometimes make it worse:

  • Jitter Buffers – Jitter buffers work at the application layer (layer 7) by reordering packets and realigning packet timing to adjust for jitter before packets are passed to an application. While this may work for some applications, the reordering and realignment create random delays that can ruin performance for real-time applications and add more jitter.
  • Bandwidth Upgrades – Bandwidth upgrades are a physical layer (layer 1) fix that only works in the short run, because the underlying problem of jitter-induced throughput collapse isn’t addressed. Traffic grows to fill the added capacity, and the incidence of jitter-induced throughput collapse rises in tandem.
  • SD-WAN – There’s a widespread assumption that SD-WAN can optimize performance merely by choosing the best available path among broadband, LTE, 5G, MPLS, Wi-Fi, or any other available link. The problem is that SD-WAN makes path decisions based on measurements at the edge but has no control beyond it; if every path suffers from jitter, there is no good path to choose.
  • QoS Techniques – Often implemented in conjunction with SD-WAN, these include packet prioritization; traffic shaping to smooth out traffic bursts and control the rate of data transmission for selected applications and users; and resource reservation to set aside bandwidth for high-priority applications and users. Each involves performance tradeoffs, and none alters TCP’s behavior in response to jitter. In some cases, implementing QoS adds jitter, because techniques such as packet prioritization can create variable delays for lower-priority traffic.
  • TCP Optimization – Focuses on the CCAs at layer 4 by increasing the size of the congestion window, using selective ACKs, adjusting timeouts, and similar tuning (see the sketch after this list). However, improvements are limited, generally in the range of 10-15%, because these solutions, like all the others, don’t address the fundamental problem of how TCP’s CCAs respond to jitter.
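For reference, the sketch below shows what this kind of layer-4 tuning often looks like in practice: a minimal, Linux-only Python example that swaps the congestion control algorithm and enlarges the send buffer for a single connection. It assumes Python 3.6+ and that the chosen CCA module (bbr here) is loaded in the kernel; the endpoint is hypothetical. Whichever CCA is selected still decides on its own how to react to delay variation, which is why gains from this kind of tuning remain limited.

```python
import socket

# Minimal Linux-only example of per-connection TCP tuning (Python 3.6+).
def open_tuned_connection(host: str, port: int, cca: bytes = b"bbr") -> socket.socket:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Replace the kernel's default congestion control (often cubic) for this
    # socket only; requires the named CCA module to be available.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, cca)
    # A larger send buffer lets the kernel keep more data in flight, another
    # common TCP-optimization knob.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4 * 1024 * 1024)
    s.connect((host, port))
    return s

# Usage against a hypothetical endpoint:
# conn = open_tuned_connection("example.com", 443)
```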

Jitter-induced throughput collapse is not an easy problem to overcome. MIT researchers recently cited TCP’s CCAs as having a significant and growing impact on network performance because of their response to jitter, but offered no practical solution.2

Jitter-induced throughput collapse can only be resolved by modifying or replacing TCP’s congestion control algorithms to remove the bottleneck they create. However, to be acceptable and scale in a production environment, a viable solution can’t require any changes to the TCP stack itself, or any client or server applications. It must also co-exist with ADCs, SD-WANs, VPNs and other network infrastructure already in place.

The Only Proven and Cost-Effective Solution

Only Badu Networks’ patented WarpEngine™ carrier-grade optimization meets the key requirements outlined above for eliminating jitter-induced throughput collapse. WarpEngine’s single-ended proxy architecture means no modifications to client or server applications or network stacks are required. It works with existing network infrastructure, so there’s no rip-and-replace. WarpEngine determines in real-time whether jitter is due to congestion, and prevents throughput from collapsing and applications from stalling when it’s not. As a result, bandwidth that would otherwise be wasted is recaptured. WarpEngine builds on this with other performance and security enhancing features that benefit not only TCP, but also GTP, UDP and other traffic. These capabilities enable WarpEngine to deliver massive network throughput improvements ranging from 2-10x or more for some of the world’s largest mobile network operators, cloud service providers, government agencies and businesses of all sizes.3 These results also reflect the huge impact jitter has on network performance, and the lack of alternatives to effectively deal with it. In addition, WarpEngine delivers these improvements with existing network infrastructure, at a fraction of the cost of upgrades.
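As a point of reference only, the sketch below illustrates the single-ended proxy deployment pattern in generic terms: one box terminates each client’s TCP connection locally and opens its own connection to the server, so neither endpoint needs stack or application changes. This is a heavily simplified, hypothetical relay, not Badu’s implementation, and the addresses are placeholders.

```python
import socket
import threading

# Generic single-ended TCP proxy sketch (hypothetical, heavily simplified).
LISTEN_ADDR = ("0.0.0.0", 8080)                  # where clients connect
UPSTREAM_ADDR = ("server.example.internal", 80)  # placeholder server address

def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until the source side closes."""
    try:
        while chunk := src.recv(65536):
            dst.sendall(chunk)
    finally:
        dst.close()

def handle(client: socket.socket) -> None:
    # The proxy terminates the client's TCP connection here and opens its own
    # connection upstream; both endpoints just see an ordinary TCP peer.
    upstream = socket.create_connection(UPSTREAM_ADDR)
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    threading.Thread(target=pump, args=(upstream, client), daemon=True).start()

def main() -> None:
    with socket.create_server(LISTEN_ADDR) as listener:
        while True:
            conn, _ = listener.accept()
            handle(conn)

if __name__ == "__main__":
    main()
```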

WarpEngine can be deployed at core locations as well as the network edge, either as a hardware appliance or as software installed on a server provided by the customer or partner. It can be installed in a carrier’s core network, or in front of hundreds or thousands of servers in a corporate or cloud data center. WarpEngine can also be deployed at cell tower base stations, or with access points supporting public or private Wi-Fi networks of any scale. AI Networking vendors can integrate it into their solutions without any engineering effort. They can also offer WarpEngine to their enterprise customers to deploy on-prem with their Wi-Fi access points, or at the edge of their networks between the router and firewall, for dramatic WAN, broadband, or FWA throughput improvements.

WarpVM™, the VM form factor of WarpEngine, is designed specifically for cloud and edge environments. WarpVM installs in minutes in AWS, Azure, VMware, or KVM environments. In addition, WarpVM has been certified by Nutanix™ for use with its multi-cloud platform, which includes Nutanix’s Flow Networking MCN solution.4 Performance tests conducted during Nutanix’s certification process demonstrated results similar to those cited above. Performance and compatibility tests have also been conducted with F5’s BIG-IP, the foundation for F5’s BIG-IP Cloud Edition MCN solution. When combined with WarpVM, BIG-IP’s performance was boosted by over 3X in the cloud environment used for the tests.

As these proof points demonstrate, MCN and AI Networking vendors can install WarpVM in the cloud environments hosting their solutions to boost performance for a competitive edge, and to avoid the cost of cloud network and server upgrades as they grow their installed base. Their customers can also deploy WarpVM in the cloud environments they use for other applications to achieve many of the same benefits, at a small fraction of the cost of cloud network and server upgrades.

Conclusion

As AI, IoT, AR, VR and similar applications combine with 5G and other new network technologies to drive innovation, jitter-related performance challenges will only grow. MCN and AI Networking solutions can significantly help manage the complexity by creating an automated virtual network fabric overlaying the cloud, edge and telco provider network infrastructures that support these applications. However, like all other cloud and network performance solutions, they lack the ability to effectively deal with jitter’s impact and the massive hidden costs it imposes.  WarpEngine is the only network optimization solution that tackles TCP’s reaction to jitter head-on at the transport layer, and incorporates other performance enhancing features that benefit not only TCP, but also GTP, UDP and other traffic. By deploying WarpEngine in combination with comprehensive MCN and AI Networking solutions, you can ensure your networks always operate at their full potential.

To learn more and request a free trial, click the button below.

Notes

1. Why Networking is Key to Multicloud and Hybrid Cloud, Forbes, October 26, 2023: https://www.forbes.com/sites/rscottraynovich/2023/10/26/why-networking-is-key-to-multicloud-and-hybrid-cloud/?sh=14ecd4384e21 ;

Futuriom Multicloud Networking and NaaS Survey, October 2023: https://www.futuriom.com/articles/news/2023-mulicloud-networking-and-naas-survey-report/2023/10

2. Starvation in End-to-End Congestion Control, August 2022:   https://people.csail.mit.edu/venkatar/cc-starvation.pdf

3. Badu Networks Performance Case Studies: https://www.badunetworks.com/wp-content/uploads/2022/11/Performance-Case-Studies.pdf

4. Badu Networks on the Nutanix Technology Alliances partner directory: https://www.nutanix.com/partners/technology-alliances/badu-networks