Introduction

Many industry observers believe AI is having its “iPhone moment” with the rapid rise of large language models (LLMs) and generative AI (Gen AI) applications such as OpenAI’s ChatGPT. A big part of what sets Gen AI applications apart is the sheer number of parameters they manage: some process billions, or even trillions, of parameters because of the LLMs they use. As a result, Gen AI workloads require large clusters of very high-end servers equipped with thousands of GPUs, TPUs, and other accelerated processors. Moreover, the massive volume of traffic between these servers requires a data center-scale fabric built on non-standard network infrastructure, with support for technologies like Remote Direct Memory Access (RDMA). RDMA reduces performance overhead by enabling data to be copied directly from one server’s memory to another’s, completely bypassing the operating system’s network stack.

The cost to build and maintain the specialized server and network infrastructure needed to support Gen AI is enormous, and unique skill sets are required, making it impractical for all but the most well-funded AI vendors. The cost of GPUs alone is significant, with Nvidia A100 chips running roughly $10,000 apiece. However, a recent development is making AI accessible to businesses of all sizes, helping AI’s iPhone moment become more of a reality: the advent of public cloud-based AI services such as Amazon Bedrock, Microsoft Azure AI, and Google AI. These offerings are not without their costs, but they eliminate the huge expense of building and maintaining a separate infrastructure for hosting Gen AI platforms. Enterprise developers can use prebuilt configurations and models to test and deploy AI applications on a pay-as-you-go basis, with access to a virtually unlimited pool of resources to scale up or down as needed.

For example, Amazon Bedrock is a fully managed service for AWS users that makes foundation models (FMs) from leading AI companies available through a single API. FMs are very large machine learning models pre-trained on vast amounts of data. According to Amazon, the flexibility of its FMs makes them applicable to a wide range of use cases, powering everything from search to content creation to drug discovery.
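
To make “a single API” concrete, the sketch below shows roughly how a developer might invoke a foundation model through the Bedrock runtime using Python and boto3. The region, model ID, and request fields are illustrative assumptions; the exact payload schema depends on the model family, so consult the current Bedrock documentation before relying on it.

```python
# Minimal sketch: invoking a foundation model through the Amazon Bedrock runtime API.
# The region, model ID, and request fields below are illustrative assumptions; the
# payload schema varies by model family -- check the current Bedrock documentation.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # assumed region

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",            # assumed model ID for illustration
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "prompt": "\n\nHuman: Summarize the benefits of managed AI services.\n\nAssistant:",
        "max_tokens_to_sample": 300,          # parameter name used by this model family
    }),
)

print(json.loads(response["body"].read()))
```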

AI’s Performance Challenges Go Beyond Massive Data Volumes

Even with new cloud services like Amazon Bedrock, the performance challenges go beyond the massive data volumes that characterize AI workloads. AI applications, and the virtualized environments they are typically deployed in, generate enormous amounts of packet delay variation (PDV), more commonly referred to as jitter.
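
To make the term concrete, the sketch below computes the smoothed interarrival jitter estimate defined in RFC 3550 (the RTP specification): jitter is the variation in transit delay between consecutive packets, not the delay itself. The timestamps are hypothetical values used only for illustration.

```python
# Minimal sketch: smoothed interarrival jitter as defined in RFC 3550 (RTP):
# J = J + (|D| - J) / 16, where D is the change in one-way transit delay
# between consecutive packets. Timestamps below are hypothetical (milliseconds).

def update_jitter(jitter, prev_transit, transit):
    d = abs(transit - prev_transit)       # change in transit delay between packets
    return jitter + (d - jitter) / 16.0   # exponentially smoothed estimate

packets = [(0, 40), (20, 62), (40, 85), (60, 98), (80, 145)]  # (send_ms, receive_ms)

jitter = 0.0
prev_transit = packets[0][1] - packets[0][0]
for send, recv in packets[1:]:
    transit = recv - send
    jitter = update_jitter(jitter, prev_transit, transit)
    prev_transit = transit

print(f"estimated jitter: {jitter:.2f} ms")
```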

Gen AI models adapt in real time, improving their responses based on new data and interactions as they occur, which leads to unpredictable changes in packet transmission rates. Moreover, many AI applications are composed of containerized microservices distributed across multiple servers at cloud and edge locations. While it’s often desirable to move some data and processing to the edge to improve response times and reduce bandwidth usage, doing so increases the number of network hops between cloud and edge environments with varying processing speeds. In addition, unpredictable bursts of traffic can result from the frequent synchronization of data models and configurations required between edge and cloud components to maintain consistency and reliability.

Jitter caused by AI application behavior is compounded in the cloud and virtualized edge environments that host these applications, where they compete for virtual and physical CPU, memory, storage, and network resources, causing random delays in packet transfers from the cloud to the network. The resulting virtualization jitter is compounded further by fading and RF interference from last-mile mobile and Wi-Fi networks that cloud vendors have no control over.

Another factor contributing to jitter is that AI applications frequently make use of 5G to take advantage of the high data volumes and low latency it supports. Unfortunately, 5G’s smaller cells, higher frequencies, and mmWave technology have poorer propagation characteristics than LTE, causing signals to fade in and out. Moreover, 5G signals often require a clear line-of-sight path between transmitter and receiver. Any obstacle can cause signals to be reflected, refracted, or diffracted, resulting in multiple signal paths with different lengths and different transmission times, and therefore variation in packet delivery times. 5G networks can use technologies such as beamforming and MIMO (Multiple Input, Multiple Output) to improve signal quality and reduce the effects of multipath interference. However, these technologies only mitigate jitter’s impact; they don’t eliminate it.

Additionally, 5G’s small-cell architecture has much heavier infrastructure requirements than LTE’s, which has driven many network providers to the cloud to reduce costs. However, the shift to cloud-native 5G networks adds virtualization jitter on top of the jitter already generated by 5G’s architecture.

Jitter’s Serious Knock-On Effect

Jitter has a far more serious knock-on effect on cloud network throughput and hosted application performance than the latency its random delays add directly. This knock-on effect can render AI applications, especially those requiring real-time or near-real-time responsiveness, virtually unusable, and even dangerous if critical systems are involved.

TCP, the network protocol widely used by public cloud services such as AWS and MS Azure, consistently treats jitter as a sign of congestion. To guarantee packet delivery and prevent data loss, TCP responds to jitter by retransmitting packets and throttling traffic, even when plenty of bandwidth is available. Even modest amounts of jitter can cause throughput to collapse and applications to stall.
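
The mechanics can be sketched with TCP’s standard retransmission timeout calculation from RFC 6298. The simplified simulation below uses hypothetical RTT samples to show how a delay spike that exceeds the sender’s current timeout is treated as a loss, triggering a retransmission and a congestion window collapse even though nothing was actually dropped. Real stacks add refinements (a minimum RTO, fast retransmit on duplicate ACKs caused by reordering), but the underlying behavior is the same.

```python
# Simplified sketch of how delay variation triggers spurious TCP retransmissions.
# RFC 6298: SRTT = (1-a)*SRTT + a*RTT, RTTVAR = (1-b)*RTTVAR + b*|SRTT - RTT|,
#           RTO = SRTT + 4*RTTVAR   (a = 1/8, b = 1/4)
# A delay spike exceeding the current RTO looks like a lost packet, so the segment
# is retransmitted and the congestion window is cut, even though no data was dropped.
# RTT values are hypothetical; real stacks also enforce a minimum RTO.

ALPHA, BETA = 1 / 8, 1 / 4

def simulate(rtt_samples_ms):
    srtt = rtt_samples_ms[0]
    rttvar = srtt / 2
    cwnd = 10.0                              # congestion window in segments (illustrative)
    for rtt in rtt_samples_ms[1:]:
        rto = srtt + 4 * rttvar
        if rtt > rto:                        # jitter spike mistaken for packet loss
            print(f"RTT {rtt} ms > RTO {rto:.0f} ms: spurious retransmit, cwnd {cwnd:.0f} -> 1")
            cwnd = 1.0                       # timeout forces slow start in Reno-style CCAs
        else:
            cwnd += 1.0                      # rough stand-in for additive increase
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt)
        srtt = (1 - ALPHA) * srtt + ALPHA * rtt
    return cwnd

simulate([50, 52, 48, 51, 400, 50, 49, 380, 51, 50])  # two jitter spikes, no real loss
```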

RDMA technologies avoid this issue because they bypass the TCP stack. However, as noted previously, RDMA requires non-standard network infrastructure, and it only works well at LAN scale within the same data center. To extend RDMA’s capabilities beyond the LAN, iWARP (Internet Wide Area RDMA Protocol) can be used. However, iWARP encapsulates RDMA operations within standard TCP packets. Although this removes the requirement for specialized network infrastructure, it makes iWARP traffic subject to TCP’s behavior.

TCP’s reaction to jitter has become a leading and increasingly common cause of poor network performance. For cloud users, it wastes more than network bandwidth. Cloud vendors typically charge egress fees based on the amount of data transferred out of their networks, and those fees apply whether the data is original or packet retransmissions triggered by TCP’s reaction to jitter.
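
The cost impact is easy to approximate. The figures in the sketch below (egress price, traffic volume, retransmission rate) are assumptions for illustration only, not actual cloud pricing, but they show how retransmitted bytes translate directly into additional egress charges.

```python
# Illustrative estimate of egress fees attributable to TCP retransmissions.
# All figures are assumptions for illustration, not actual cloud pricing.

monthly_egress_gb = 50_000        # assumed data transferred out per month (GB)
egress_price_per_gb = 0.09        # assumed egress price ($/GB)
retransmission_rate = 0.05        # assume 5% of bytes are jitter-induced retransmissions

base_cost = monthly_egress_gb * egress_price_per_gb
retransmit_cost = monthly_egress_gb * retransmission_rate * egress_price_per_gb

print(f"base egress cost:           ${base_cost:,.0f}/month")
print(f"cost added by retransmits:  ${retransmit_cost:,.0f}/month")
```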

Why Most Network Performance Solutions Fall Short

TCP’s response to jitter is triggered in the network transport layer by its congestion control algorithms (CCAs). However, the solutions network administrators typically use to address performance problems caused by jitter, such as increasing bandwidth and deploying jitter buffers, have little or no impact on the CCAs’ response to jitter, and in some cases make it worse. Increasing bandwidth is just a temporary fix: as network traffic grows to match the added bandwidth, the incidence of jitter-induced throughput collapse goes up in tandem, because the root cause hasn’t been addressed.

Jitter buffers, commonly used to mitigate jitter’s effect on network and application performance, can sometimes make it worse. Jitter buffers reorder and realign packets to achieve consistent timing before delivering them to an application, but that reordering and realignment introduces additional, often random, delays, which can worsen jitter and degrade performance for real-time applications like live video streaming.
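
A jitter buffer essentially trades delay for smoothness, as the simplified sketch below illustrates: every packet is held until a fixed playout deadline, so every packet pays the buffer delay, and packets arriving after their deadline are dropped anyway. Real jitter buffers are adaptive, but the added delay is inherent to the approach. All values are hypothetical.

```python
# Simplified fixed-depth jitter (playout) buffer.
# Each packet is scheduled for playout at a fixed offset after it was sent; the
# buffer absorbs delay variation, but every packet pays that extra delay, and
# packets arriving after their deadline are dropped. Values are hypothetical.

def playout_schedule(packets, buffer_ms):
    """packets: list of (seq, send_ms, arrival_ms). Returns (seq, time, outcome)."""
    results = []
    for seq, send_ms, arrival_ms in sorted(packets):
        deadline_ms = send_ms + buffer_ms          # fixed playout deadline
        if arrival_ms <= deadline_ms:
            results.append((seq, deadline_ms, "played"))
        else:
            results.append((seq, arrival_ms, "late/dropped"))
    return results

# Packets sent every 20 ms with jittery arrival times (hypothetical).
packets = [(0, 0, 35), (1, 20, 90), (2, 40, 75), (3, 60, 180)]
for seq, t, outcome in playout_schedule(packets, buffer_ms=80):
    print(f"packet {seq}: {outcome} at {t} ms")
```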

QoS techniques offer some benefit by prioritizing packets and controlling the rate of data transmission for selected applications and users. But these gains come with performance tradeoffs, and QoS does nothing to alter TCP’s behavior in response to jitter. In some cases, implementing QoS adds jitter, because packet prioritization can create variable delays for lower-priority application traffic.

TCP optimization solutions that do focus on the CCAs rely on techniques such as increasing the congestion window size, using selective ACKs, and adjusting timeouts. However, the improvements are limited, generally in the range of 10-15%, because these solutions, like all the others, don’t address the fundamental problem: TCP’s CCAs have no way to determine whether jitter is due to congestion or to other factors like application behavior, virtualization, or wireless network issues.
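
For reference, the sketch below lists the Linux kernel knobs these techniques typically adjust and reads their current values (the paths are Linux-specific). Tuning them can help at the margins, but as noted above, it doesn’t change how the CCAs interpret jitter.

```python
# Minimal sketch: inspecting the Linux TCP tuning knobs that typical TCP optimization
# techniques adjust. Paths are Linux-specific; values vary by distribution and kernel.
from pathlib import Path

KNOBS = [
    "net/ipv4/tcp_sack",                 # selective acknowledgments on/off
    "net/ipv4/tcp_window_scaling",       # allows windows larger than 64 KB
    "net/ipv4/tcp_rmem",                 # min/default/max receive buffer sizes
    "net/ipv4/tcp_wmem",                 # min/default/max send buffer sizes
    "net/ipv4/tcp_congestion_control",   # congestion control algorithm in use
]

for knob in KNOBS:
    path = Path("/proc/sys") / knob
    value = path.read_text().strip() if path.exists() else "unavailable"
    print(f"{knob}: {value}")
```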

This is clearly not a trivial problem to overcome. MIT researchers recently cited TCP’s CCAs as having a significant and growing negative impact on network performance, but were unable to offer a practical solution.1 Resolving it would require modifying or replacing TCP’s CCAs to remove the bottleneck they create. But to be acceptable and scale in a production environment, a viable solution can’t require changes to the TCP stack itself, or to any client or server applications that rely on it. It must also co-exist with ADCs, SD-WANs, VPNs, and other network infrastructure already in place.

There Is a Proven and Cost-Effective Solution

Badu Networks’ patented WarpEngine™ carrier-grade optimization technology, with its single-ended transparent proxy architecture, meets the key requirements outlined above for eliminating jitter-induced throughput collapse. WarpEngine determines in real time whether jitter is due to network congestion, and prevents throughput from collapsing and applications from stalling when it’s not. It builds on this with other performance-enhancing features, such as improved flow control and QoS capabilities like packet prioritization, that benefit not only TCP, but also GTP (used by 5G and LTE), UDP, and other network traffic. As a result, WarpEngine delivers massive performance gains for some of the world’s largest mobile network operators, cloud service providers, government agencies, and businesses of all sizes.

WarpEngine can be deployed at core locations as well as the network edge, either as a hardware appliance or as software installed on servers provided by a customer or partner. It can be installed in a carrier’s core network, or in front of hundreds or thousands of servers in a corporate or cloud data center. WarpEngine can also be deployed at cell tower base stations, or with access points supporting public or private Wi-Fi networks of any scale. Enterprise customers can implement WarpEngine on-prem with their Wi-Fi access points, or at the edge of their networks between the router and the firewall, for dramatic WAN, broadband, and FWA throughput improvements of 2-10X or more.2

WarpVM™, the VM form factor for WarpEngine, is designed specifically for cloud and virtualized edge environments. WarpVM is a VM-based transparent proxy that installs in minutes as a VNF in AWS, MS Azure, VMware, or KVM environments. No modifications to client or server applications or network stacks are required. WarpVM has also been certified by Nutanix™ for use with their multi-cloud platform, and has delivered performance numbers similar to those cited above.3 Nutanix’s multi-cloud platform also supports their recently announced GPT-in-a-Box™ AI solution.

Comparing the Cost of WarpVM to the Cost of Cloud Network and Server Upgrades

If you’re considering any of the managed AI services from vendors such as AWS, or weighing upgrades to improve performance for cloud-native 5G or other cloud- or edge-hosted applications already deployed, you should understand how the cost of implementing WarpVM compares to the cost of cloud server and network upgrades that achieve the same result.

For example, assume an AWS customer currently paying for a Direct Connect port with 10G capacity wants to improve cloud network throughput by WarpVM’s average of 3X. To do this, they would typically pay for two additional AWS Direct Connect 10G ports, plus two additional AWS servers to support them, and then balance the load across the ports at roughly 30% utilization each to leave headroom for peak traffic. This approach is often required to provide the excess bandwidth needed to accommodate TCP’s jitter response. As a result, the customer leaves expensive cloud network and server resources underutilized and wastes money most of the time.

To improve throughput by 3X using WarpVM in this scenario, the customer would only pay for one instance of WarpVM at a cost that’s nearly 80% less than the standard approach described above. The total savings are actually much greater, because no additional AWS servers are needed for load balancing to allow for peak traffic, and no egress fees will be incurred due to unnecessary packet retransmissions. Moreover, network bandwidth that was previously wasted is recaptured, making it less likely that another round of upgrades would be needed in the near future.
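
The structure of that comparison can be sketched as follows. Every price below is a placeholder assumption, not actual AWS or Badu Networks pricing; the point is the shape of the math: additional ports plus supporting servers running at low utilization versus a single optimization VM on the existing port.

```python
# Illustrative cost comparison for reaching ~3X effective throughput.
# Every price below is a placeholder assumption, not actual AWS or Badu Networks pricing.

HOURS_PER_MONTH = 730

# Option 1: add capacity -- two more 10G Direct Connect ports plus two supporting
# servers, all run at ~30% utilization to leave headroom for TCP's jitter response.
port_price_per_hour = 2.25           # assumed 10G port-hour price
server_price_per_hour = 1.00         # assumed supporting server instance price
extra_ports, extra_servers = 2, 2

upgrade_cost = HOURS_PER_MONTH * (extra_ports * port_price_per_hour
                                  + extra_servers * server_price_per_hour)

# Option 2: keep the existing port and run a single WarpVM instance on it.
warpvm_price_per_hour = 1.40         # placeholder -- actual pricing comes from the vendor

warpvm_cost = HOURS_PER_MONTH * warpvm_price_per_hour

print(f"capacity-upgrade option: ${upgrade_cost:,.0f}/month")
print(f"single WarpVM instance:  ${warpvm_cost:,.0f}/month")
print(f"savings:                 {100 * (1 - warpvm_cost / upgrade_cost):.0f}%")
```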

Cloud service providers (CSPs) embedding WarpVM in their services can also benefit by avoiding the cost of provisioning additional infrastructure to deliver the equivalent of a major upgrade. These CSPs could choose to pass some or all of the savings along to their customers and gain an immediate price/performance edge. Additionally, WarpVM would enable these CSPs to onboard more customers and generate greater revenue with their existing infrastructure footprint.

Conclusion

As AI continues to drive innovation and transformation, jitter-related performance challenges will only grow. Failing to address them will result in infrastructure costs that put Gen AI out of reach for too many organizations, and bring its iPhone moment to an end. 

Badu Networks’ WarpVM offers a proven, cost-effective solution for overcoming these challenges, for AI and any other cloud use cases. By tackling TCP’s reaction to jitter head-on at the transport layer, and incorporating other performance enhancing features that benefit TCP, GTP, UDP and other network traffic, WarpVM ensures that your AI, cloud-native 5G and other cloud and edge applications operate at their full potential, and do so at the lowest possible cost.

To learn more about WarpVM and request a free trial with your AI, cloud-native 5G, or other cloud applications to see how much you can save, click the button below. 

Notes

1. Starvation in End-to-End Congestion Control, August 2022: https://people.csail.mit.edu/venkatar/cc-starvation.pdf

2. Badu Networks Performance Case Studies: https://www.badunetworks.com/wp-content/uploads/2022/11/Performance-Case-Studies.pdf

3. https://www.nutanix.com/partners/technology-alliances/badu-networks