The cost benefits of replacing physical network infrastructure with virtualized software are undeniable. That’s why mobile network operators (MNOs) have embraced network functions virtualization (NFV) to support 5G’s vastly greater infrastructure requirements relative to LTE. MNOs are also taking advantage of other NFV benefits in combination with 5G, such as:
- Scalability – NFV enables MNOs to scale network resources up or down based on demand, ensuring they can meet the needs of their customers.
- Service innovation – NFV provides a platform for service innovation that, in combination with 5G’s data speeds, enables MNOs to develop and deploy new services quickly and cost-effectively. MNOs can differentiate themselves, improve customer satisfaction, and drive revenue growth.
- Network slicing – NFV provides the foundation for 5G network slicing, enabling MNOs to create virtualized networks tailored to specific use cases such as autonomous vehicles, healthcare, and IoT, along with a wide range of other applications.
However, the benefits of running network functions in a virtualized environment, especially in combination with 5G, can come at a high price in terms of performance overhead.
The NFV Performance Penalty
NFV’s performance overhead is a product of virtualization. In a cloud environment, multiple virtual machines (VMs) or containers, each running a different network function, share the same physical resources such as CPU, memory, and storage. The more VMs running on a single physical server, the higher the likelihood of resource contention. If one VM monopolizes resources, others wait, delaying packet processing. The result is packet delay variation (PDV), more commonly referred to as jitter (a sketch of how jitter is typically estimated follows the list below). The jitter caused by resource contention can impact all hosted applications, and it’s compounded by other factors in cloud environments:
- Hypervisor Overhead – The hypervisor, which is the software layer that manages the VMs, adds some degree of processing overhead. This can introduce packet delays, especially if the hypervisor is not properly configured or if the system is under heavy load.
- Virtual Switching – Within the host system, network traffic often passes through a virtual switch before reaching the physical network. This step can introduce additional delay, especially under high load or with complex network policies.
- Network Overlays – In many virtualized environments, network overlays such as VXLAN or GRE are used to create virtual networks. The encapsulation and decapsulation process for these overlays can introduce additional delays and hence jitter.
- Inefficient Packet Scheduling/Processing – If the packet scheduling or processing within the virtual network functions (VNFs) is inefficient, it can result in variable packet delay, contributing to jitter.
- Physical Network Issues – Once traffic leaves the virtual environment and enters the physical network, it is subject to the usual sources of jitter, including network congestion, route changes, fading, and RF interference, none of which VNFs can control.
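To make jitter concrete, here is a minimal Python sketch of how it is commonly estimated, using the interarrival-jitter smoothing formula from RFC 3550. The `packets` list of (send_time, receive_time) pairs is hypothetical, standing in for timestamps a monitoring agent might collect:

```python
# Minimal sketch: estimating interarrival jitter from packet timestamps,
# using the RFC 3550 smoothing formula J = J + (|D| - J) / 16.
# `packets` is a hypothetical list of (send_time_s, recv_time_s) tuples.

def interarrival_jitter(packets):
    jitter = 0.0
    prev_transit = None
    for send_time, recv_time in packets:
        transit = recv_time - send_time          # one-way delay for this packet
        if prev_transit is not None:
            d = abs(transit - prev_transit)      # delay variation vs. previous packet
            jitter += (d - jitter) / 16.0        # exponentially smoothed estimate
        prev_transit = transit
    return jitter

# Example: a VM stalled by resource contention delays the third packet.
packets = [(0.000, 0.010), (0.020, 0.030), (0.040, 0.065), (0.060, 0.071)]
print(f"estimated jitter: {interarrival_jitter(packets) * 1000:.2f} ms")
```

A single delayed packet nudges the estimate up only slightly, but sustained contention from any of the factors above drives it steadily higher.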
The impacts of NFV jitter include:
- Poor User Experience – In services that rely on real-time data transmission, like VoIP and video streaming, high jitter can result in poor service quality and a degraded user experience: choppy voice calls, lagging video streams, or even service interruptions.
- Out-of-Order Packets – High jitter can lead to packets arriving at their destination out of order. This not only complicates reassembly, but can also lead to data loss or corruption, impacting overall network performance.
- Increased Buffering – Managing jitter often requires additional buffering, which increases latency and demand on system resources, and can impact other network functions in a shared NFV environment.
5G’s Jittery Nature
5G has several inherent characteristics that compound NFV’s jitter with jitter of its own:
- Higher Frequencies and mmWave Technology – 5G networks can use much higher frequency bands than previous generations, up to and including millimeter wave (mmWave) frequencies. These higher frequencies can deliver high data rates and low latency, but they also have poorer propagation characteristics and are more susceptible to interference and signal degradation, which can lead to increased jitter.
- Network Slicing – 5G introduces the concept of network slicing, where different services are provided on separate logical networks that share the same physical network infrastructure. While this allows for more efficient resource usage and better quality of service (QoS) management for different types of applications as mentioned earlier, it also introduces additional complexity that can lead to jitter, especially if the network slices aren’t properly isolated or adequately provisioned.
- Denser Networks – To provide high data rates and low latency, 5G networks use a denser deployment of base stations, often including small cells. While this helps with capacity and coverage, it also means devices hand over between base stations more frequently, and each handover can introduce jitter.
- Increased Traffic – 5G is designed to support a massive increase in traffic, both from higher data rates and from a larger number of devices. Managing this traffic efficiently without causing jitter can be a challenge, particularly in scenarios with high user mobility or large numbers of devices, as in the case of IoT applications.
- Line-of-Sight Requirement – High-frequency 5G signals often require a clear line-of-sight path between the transmitter and receiver. Any obstacle can cause the signal to be reflected, refracted, or diffracted, resulting in multiple signal paths with different lengths. These varying path lengths cause packets to arrive at different times, leading to jitter, as the quick calculation below illustrates.
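To put a number on that last point: the arrival-time difference between two signal paths is simply the path-length difference divided by the speed of light. A minimal Python sketch, with hypothetical distances:

```python
# Rough illustration: extra delay introduced when a reflected mmWave path
# is longer than the direct line-of-sight path. Distances are hypothetical.
SPEED_OF_LIGHT = 3.0e8          # meters per second

direct_path_m = 150.0           # line-of-sight distance
reflected_path_m = 210.0        # path bounced off a building

delay_spread_s = (reflected_path_m - direct_path_m) / SPEED_OF_LIGHT
print(f"multipath delay spread: {delay_spread_s * 1e9:.0f} ns")   # ~200 ns
```

A couple of hundred nanoseconds per reflection is small on its own, but reflection paths shift as devices and obstacles move, so arrival times vary from packet to packet.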
The random delays that generate jitter impact the performance of VNFs and the 5G networks that use them. These delays slow the movement of data between VNFs on the same virtual subnet in a virtual private cloud (VPC), and more generally between the VMs and containers that support any cloud- or edge-hosted application. Moreover, many of the applications that 5G NFV implementations support, such as IoT and streaming, tend to send data in unpredictable bursts, generating still more jitter.
To make matters worse, the jitter generated by the combination of 5G, NFV, and many of the applications they support has a far more serious knock-on effect on performance than the random delays discussed above. Widely used network protocols such as TCP consistently interpret jitter as a sign of congestion, and respond by retransmitting packets and slowing traffic to prevent data loss, even when the network isn’t saturated and plenty of bandwidth is available. Even modest amounts of jitter can cause throughput to collapse and applications to stall, or in the case of VNFs, disrupt the network services they provide.

And TCP traffic isn’t the only casualty. For operational efficiency, applications using TCP generally share the same network infrastructure, and compete for bandwidth and other resources, with applications using UDP and other protocols. To compensate for TCP’s reaction to jitter, especially under peak load, more bandwidth is often allocated to TCP applications than would otherwise be needed. That bandwidth could have gone to applications using UDP and other protocols; instead it’s wasted, and the performance of every application sharing the network suffers.
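The scale of this collapse can be approximated with the widely cited Mathis model, in which TCP throughput is inversely proportional to the square root of the packet-loss rate the sender perceives. Jitter-triggered spurious retransmissions inflate that perceived loss rate even when nothing is actually dropped. A simplified sketch with hypothetical numbers:

```python
import math

# Simplified illustration using the Mathis et al. model of TCP throughput:
#   throughput ~ (MSS / RTT) * (1 / sqrt(p))
# where p is the packet-loss rate the sender *perceives*. Jitter-triggered
# spurious retransmissions inflate p even when no packets are dropped.
MSS_BYTES = 1460
RTT_S = 0.05                       # 50 ms round trip

def mathis_throughput_mbps(perceived_loss_rate):
    bytes_per_s = (MSS_BYTES / RTT_S) * (1.0 / math.sqrt(perceived_loss_rate))
    return bytes_per_s * 8 / 1e6

for p in (0.0001, 0.001, 0.01):    # hypothetical perceived-loss rates
    print(f"perceived loss {p:.2%} -> ~{mathis_throughput_mbps(p):.1f} Mbps")
```

Under this model, a rise in perceived loss from 0.01% to 1% cuts throughput by an order of magnitude, with no change in actual network capacity.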
Most Network Performance Solutions Fall Short or Make the Problem Worse
TCP’s reaction to jitter is triggered by its congestion control algorithms (CCAs), which operate in the network transport layer (layer 4 of the OSI stack). The solutions network administrators generally rely on to address poor performance either don’t operate at the transport layer, or if they do, have little or no impact on TCP’s CCAs. As a result, these solutions – upgrades, Quality of Service (QoS), jitter buffers, and TCP optimization – fail to address the root cause of jitter-induced throughput collapse, and sometimes make it worse, as explained below:
- Upgrades – In addition to being costly and disruptive, network bandwidth upgrades are a physical layer 1 approach that offers only a temporary solution. Traffic eventually increases to fill the additional capacity, and the incidence of jitter-induced throughput collapse goes up in tandem because the root cause was never dealt with.
- QoS techniques include:
- Packet prioritization to ensure that higher-priority application traffic is given preference over other application traffic.
- Traffic shaping that controls the rate at which data is sent over the network, smoothing out traffic bursts in an attempt to avoid congestion (illustrated in the sketch below).
- Resource reservation that reserves bandwidth for specific applications or users, to maintain a minimum service level.
QoS operates primarily at the network layer (layer 3) and the transport layer, because the mechanisms it relies on use the IP addresses and port numbers managed at those layers to prioritize traffic and avoid congestion. However, these mechanisms do nothing about TCP’s CCAs, which also operate at the transport layer. As a result, the effectiveness of QoS is limited, and bandwidth and other network resources are still wasted due to jitter’s impact on throughput.
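For illustration, the token bucket below is a minimal Python sketch of the traffic-shaping technique mentioned above, not any particular vendor’s implementation. It smooths bursts by releasing packets only as tokens accumulate, but notice that nothing in it touches TCP’s CCAs:

```python
import time

# Minimal token-bucket traffic shaper (illustrative only). It smooths
# bursts by releasing packets only when enough tokens have accumulated,
# but it does nothing about how TCP's CCAs react to jitter.
class TokenBucket:
    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s     # sustained rate
        self.capacity = burst_bytes      # maximum burst size
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, packet_bytes):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True                  # send now
        return False                     # queue or drop: shaping in action

shaper = TokenBucket(rate_bytes_per_s=1_250_000, burst_bytes=15_000)  # ~10 Mbps
print(shaper.allow(1500))  # True: burst allowance covers the first packets
```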
- Jitter buffers are a network application layer solution that reorders packets and realigns packet timing to adjust for jitter before packets are passed to an application. The delays created by holding, reordering, and realigning packets can ruin performance for real-time applications, and become yet another source of latency and jitter contributing to throughput collapse.
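A minimal sketch of the mechanism (illustrative only) shows where that extra latency comes from: packets are held, reordered by sequence number, and released only after a fixed playout delay.

```python
import heapq

# Rough sketch of a fixed-delay jitter buffer (illustrative only). Packets
# are held and reordered by sequence number, then released once they have
# aged past the playout delay, which is the added latency described above.
PLAYOUT_DELAY = 0.040                     # 40 ms of deliberately added latency

def jitter_buffer(arrivals, now):
    """arrivals: list of (arrival_time_s, seq_no, payload) tuples."""
    heap = []
    for arrival_time, seq_no, payload in arrivals:
        heapq.heappush(heap, (seq_no, arrival_time, payload))
    released = []
    while heap and heap[0][1] + PLAYOUT_DELAY <= now:
        seq_no, _, payload = heapq.heappop(heap)
        released.append((seq_no, payload))
    return released                       # in sequence order, 40 ms late

# Packets 2 and 1 arrive out of order; both are released, in order, at t=0.050.
arrivals = [(0.000, 2, b"b"), (0.005, 1, b"a")]
print(jitter_buffer(arrivals, now=0.050))
```

Every packet pays the playout delay whether it needed reordering or not, which is exactly the latency penalty real-time applications can’t afford.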
To eliminate jitter-induced throughput collapse, the focus must be on the transport layer, and specifically on TCP’s CCAs, which become a bottleneck by reducing throughput in reaction to jitter even when the network isn’t saturated and ample bandwidth is available. Most TCP optimization solutions that do focus on the transport layer and the CCAs try to address this bottleneck by managing the size of TCP’s congestion window to let more traffic through a connection, using selective ACKs (SACKs) to notify the sender which packets need to be retransmitted, adjusting idle timeouts, and tweaking a few other parameters. While these techniques can offer some modest improvement, generally in the range of ten to fifteen percent, they don’t eliminate jitter-induced throughput collapse, the resulting waste of bandwidth, or its impact on UDP and other traffic sharing the network.
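For context on why window management alone falls short: the congestion window must at least cover the path’s bandwidth-delay product (BDP) for a connection to fill the pipe. The quick calculation below uses hypothetical numbers:

```python
# Quick bandwidth-delay product (BDP) calculation with hypothetical numbers.
# A TCP connection cannot exceed window_size / RTT, so a congestion window
# held below the path's BDP throttles throughput regardless of capacity.
link_mbps = 100.0
rtt_s = 0.05                                  # 50 ms round trip

bdp_bytes = (link_mbps * 1e6 / 8) * rtt_s     # bytes "in flight" to fill the pipe
print(f"BDP: {bdp_bytes / 1024:.0f} KiB")     # 100 Mbps x 50 ms ~= 610 KiB

window_bytes = 64 * 1024                      # a 64 KiB window, for comparison
max_mbps = window_bytes * 8 / rtt_s / 1e6
print(f"64 KiB window caps throughput at ~{max_mbps:.1f} Mbps")
```

Even a perfectly sized window only raises the ceiling; it doesn’t stop the CCAs from cutting the sending rate whenever jitter is mistaken for congestion.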
Jitter-induced throughput collapse can only be resolved by modifying or replacing TCP’s congestion control algorithms to remove the bottleneck they create, regardless of the network or application environment. However, to be acceptable and to scale in a production environment, a viable solution can’t require any changes to the TCP stack itself, or to any client or server applications. It must also coexist with ADCs, SD-WANs, VPNs, VNFs, and other network infrastructure already in place.
There’s Only One Proven and Cost-Effective Solution
Only Badu Networks’ WarpEngine™ optimization technology, with its single-ended proxy architecture, meets the key requirements outlined above for eliminating jitter-induced throughput collapse. WarpEngine determines in real time whether jitter is due to network congestion, and prevents throughput from collapsing and applications from stalling when it’s not. WarpEngine builds on this with other performance-enhancing features that benefit not only TCP, but also UDP and other traffic sharing a network, delivering massive performance gains for some of the world’s largest mobile network operators, cloud service providers, government agencies, and businesses of all sizes.1
WarpVM™, the VM form factor of WarpEngine, is designed specifically for virtualized environments. WarpVM is deployed as a VNF that acts as a virtual router. With WarpEngine’s capabilities built in, WarpVM optimizes all traffic coming in and out of a cloud environment, such as a VPC supporting a 5G core network. WarpVM can boost cloud network throughput and the performance of VNFs and other hosted applications by up to 80% under normal operating conditions, and 2-10X or more in high-traffic, high-latency, jitter-prone environments. WarpVM achieves these results with existing infrastructure, for over 70% less than the cost of upgrading cloud network bandwidth and servers.
WarpVM’s transparent proxy architecture enables it to be deployed in minutes in AWS, Azure, VMware, or KVM environments. WarpVM has also been certified by Nutanix™ for use with its multi-cloud platform.2 No modifications to network stacks or client or server applications are required. All that’s needed are a few DNS changes at the client site, or simple routing changes in the cloud.
To learn more about WarpVM and request a free trial, click the button below.
Notes
1. Badu Networks Performance Case Studies: https://www.badunetworks.com/wp-content/uploads/2022/11/Performance-Case-Studies.pdf
2. Nutanix Technology Alliances, Badu Networks: https://www.nutanix.com/partners/technology-alliances/badu-networks