By Jake Adriaens, Staff Software Engineer
Google Cloud customers now enjoy significantly improved intra-zone network latency with the release of Andromeda 2.1, a software-defined network (SDN) stack that underpins all of Google Cloud Platform (GCP). The latest version of Andromeda reduces network latency between Compute Engine VMs by 40% over Andromeda 2.0 and by nearly a factor of 8 since we first launched Andromeda in 2014.
This kind of network performance is especially important as more applications move into the cloud and are accessed via web browsers. While the headline metric is often bandwidth, network latency is frequently the more important determinant of application performance. For example, low latency is essential for financial transactions, ad-tech, video, gaming and retail, as well as workloads such as HPC applications, memcache and in-memory databases. Likewise, HTTP-based microservices will see significant improvement in responsiveness with reduced latency.
Andromeda 2.1 latency improvements come from a form of hypervisor bypass that builds on virtio, the Linux paravirtualization standard for device drivers. Andromeda 2.1 enhancements enable the Compute Engine guest VM and the Andromeda software switch to communicate directly via shared memory network queues, bypassing the hypervisor completely for performance-sensitive per-packet operations.
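To make the idea of a shared memory network queue concrete, here is a minimal sketch of a single-producer/single-consumer ring, the general shape of queue that lets two parties exchange packets without a third party copying them. This is not Andromeda's implementation: the class and method names are hypothetical, a Python list stands in for shared memory, and a real virtio queue (a descriptor table plus available and used rings) also needs memory barriers and a notification mechanism that are omitted here.

```python
class SpscRing:
    """Single-producer/single-consumer ring of packet slots.

    Illustrative only: a Python list stands in for memory shared between
    the guest and the software switch, and packets are assumed to be bytes.
    """

    def __init__(self, capacity):
        # A power-of-two capacity lets us wrap indices with a bit mask.
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # next slot to write; advanced only by the producer
        self._tail = 0  # next slot to read; advanced only by the consumer

    def try_push(self, packet):
        """Producer side: enqueue a packet, or return False if the ring is full."""
        if self._head - self._tail == len(self._slots):
            return False
        self._slots[self._head & self._mask] = packet
        self._head += 1  # publish the slot only after it has been written
        return True

    def try_pop(self):
        """Consumer side: dequeue the oldest packet, or return None if empty."""
        if self._tail == self._head:
            return None
        packet = self._slots[self._tail & self._mask]
        self._tail += 1  # release the slot back to the producer
        return packet


# Hypothetical usage: the guest produces, the switch consumes.
tx_queue = SpscRing(capacity=4)
tx_queue.try_push(b"packet-1")
tx_queue.try_push(b"packet-2")
first = tx_queue.try_pop()
```

Because each index is written by only one side, neither party needs a lock or an intermediary on the per-packet path, which is the property the bypass relies on.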
In our previous approach, a hypervisor thread served as a bridge between the guest VM and the Andromeda software switch. Packets flowed from the VM to the hypervisor thread, to the local host’s Andromeda software switch, then over the physical network to another Andromeda software switch, and back up through the hypervisor to the destination VM. Further, any time the thread wasn’t bridging packets, it was descheduled, increasing tail latency for new packet processing. In many cases, a single network round-trip required four costly hypervisor thread wakeups!
Figure: Andromeda 2.1’s optimized datapath using hypervisor bypass.
The new Andromeda 2.1 stack delivers noteworthy reductions in VM-to-VM network latency. The figure below shows the factor by which latency has been reduced over time, relative to the median round-trip time of the original stack.
Figure: Factor by which latency has improved over time.
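The two headline numbers (a 40% reduction over Andromeda 2.0, and nearly a factor of 8 since 2014) can be related with a little arithmetic. The sketch below assumes both figures describe the same latency metric and treats "nearly 8" as exactly 8, so the result is a rough estimate rather than a published number; the function name is hypothetical.

```python
def implied_previous_factor(total_factor, reduction_vs_previous):
    """Improvement factor the previous release must have had over the original.

    total_factor: the latest release's improvement over the original stack
        (8 means latency is 1/8 of the original).
    reduction_vs_previous: fractional latency reduction of the latest release
        relative to the previous one (0.40 for 40%).
    """
    if not 0 <= reduction_vs_previous < 1:
        raise ValueError("reduction must be in [0, 1)")
    # latest = original / total_factor and latest = (1 - reduction) * previous,
    # so previous = original / (total_factor * (1 - reduction)).
    return total_factor * (1 - reduction_vs_previous)


# Assumption: "nearly a factor of 8" is rounded to 8 for this estimate.
factor_2_0 = implied_previous_factor(total_factor=8, reduction_vs_previous=0.40)
print(f"Implied Andromeda 2.0 improvement over the 2014 stack: ~{factor_2_0:.1f}x")
```

Under those assumptions, Andromeda 2.0 would have had latency about 4.8 times lower than the original stack, and 2.1 supplies the rest of the gain.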
This reduction in network round-trip times translates into real-world performance boosts for latency-sensitive applications. Take Aerospike, a high-performance in-memory NoSQL database. The new Andromeda stack delivers both a reduction in request latency and improved request throughput for Aerospike.
Because Andromeda is a foundational building block of Google Cloud, you should see similar improvements in intra-zone latency, regardless of what applications you’re running.
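If you want to check round-trip latency between two of your own VMs, a small TCP ping-pong is one way to do it. The sketch below is not the benchmark behind the figures in this post; it is a minimal example using only the Python standard library, with hypothetical function names, that reports the median and 99th-percentile round-trip time for one-byte messages.

```python
import socket
import statistics
import threading
import time


def serve_echo(listener):
    """Accept one connection and echo every byte back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle's algorithm so small messages are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def measure_rtt(host, port, samples=1000, payload=b"x"):
    """Return a list of round-trip times in microseconds."""
    rtts = []
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(samples):
            start = time.perf_counter()
            sock.sendall(payload)
            received = 0
            while received < len(payload):
                chunk = sock.recv(len(payload) - received)
                if not chunk:
                    raise ConnectionError("peer closed the connection")
                received += len(chunk)
            rtts.append((time.perf_counter() - start) * 1e6)
    return rtts


def summarize(rtts):
    """Report the median and the 99th-percentile (tail) round-trip time."""
    ordered = sorted(rtts)
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return {"median_us": statistics.median(ordered), "p99_us": p99}


if __name__ == "__main__":
    # Loopback demo; on real VMs, run serve_echo on one and measure_rtt on the other.
    listener = socket.create_server(("127.0.0.1", 0))
    threading.Thread(target=serve_echo, args=(listener,), daemon=True).start()
    port = listener.getsockname()[1]
    print(summarize(measure_rtt("127.0.0.1", port, samples=200)))
```

Reporting the tail alongside the median matters here, since descheduled threads show up mainly as tail latency rather than in the median.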
Andromeda SDN enables more flexibility than hardware-based stacks. With SDN, we can quickly develop and overhaul our entire virtual network infrastructure. We can roll out new cloud network services and features, apply security patches and gain significant performance improvements. Better yet, we can confidently deploy to Google Cloud with no downtime, reboots or even VM migrations, because the flexibility of SDN allows us to thoroughly test our code. Watch this space to learn about the new features and enhanced network performance made possible by our Andromeda SDN foundation.