Posted by Vahab Mirrokni, Principal Scientist, Morteza Zadimoghaddam, Research Scientist, NYC Algorithms Team

Running a large-scale web service, such as content hosting, necessarily requires load balancing — distributing clients *uniformly* across multiple servers such that none get overloaded. Further, it is desirable to find an allocation that does not change very much over time in a *dynamic* environment in which both clients and servers can be added or removed at any time. In other words, we need the allocation of clients to servers to be *consistent* over time.

In collaboration with Mikkel Thorup, a visiting researcher from university of Copenhagen, we developed a new efficient allocation algorithm for this problem with *tight guarantees* on the maximum load of each server, and studied it theoretically and empirically. We then worked with our Cloud team to implement it in Google Cloud Pub/Sub, a scalable event streaming service, and observed substantial improvement on uniformity of the load allocation (in terms of the maximum load assigned to servers) while maintaining consistency and stability objectives. In August 2016 we described our algorithm in the paper “Consistent Hashing with Bounded Loads”, and shared it on ArXiv for potential use by the broader research community.

Three months later, Andrew Rodland from Vimeo informed us that he had found the paper, implemented it in haproxy (a widely-used piece of open source software), and used it for their load balancing project at Vimeo. The results were dramatic: applying these algorithmic ideas helped them decrease the cache bandwidth by a factor of almost 8, eliminating a scaling bottleneck. He recently summarized this story in a blog post detailing his use case. Needless to say, we were excited to learn that our theoretical research was not only put into application, but also that it was useful *and* open-sourced.

**Background**

While the concept of consistent hashing has been developed in the past to deal with load balancing in dynamic environments, a fundamental issue with all the previously developed schemes is that, in certain scenarios, they may result in sub-optimal load balancing on many servers.

Additionally, both clients and servers may be added or removed periodically, and with such changes, we do not want to move too many clients. Thus, while the dynamic allocation algorithm has to always ensure a proper load balancing, it should also aim to minimize the number of clients moved after each change to the system. Such allocation problems become even more challenging when we face hard constraints on the capacity of each server – that is, each server has a capacity that the load may not exceed. Typically, we want capacities close to the average loads.

In other words, we want to simultaneously achieve both *uniformity* and *consistency* in the resulting allocations. There is a vast amount of literature on solutions in the much simpler case where the set of servers is fixed and only the client set is updated, but in this post we discuss solutions that are relevant in the fully *dynamic* case where both clients and servers can be added and removed.

**The Algorithm**

We can think about the servers as bins and clients as balls to have a similar notation with well-studied balls-to-bins stochastic processes. The uniformity objective encourages all bins to have a load roughly equal to the average density (the number of balls divided by the number of bins). For some parameter ε, we set the capacity of each bin to either floor or ceiling of the average load times (1+ε). This extra capacity allows us to design an allocation algorithm that meets the consistency objective in addition to the uniformity property.

Imagine a given range of numbers overlaid on a circle. We apply a hash function to balls and a separate hash function to bins to obtain numbers in that range that correspond to positions on that circle. We then start allocating balls in a specific order independent of their hash values (let’s say based on their ID). Then each ball is moved clockwise and is assigned to the first bin with spare capacity.

Consider the example above where 6 balls and 3 bins are assigned using two separate hash functions to random locations on the circle. For the sake of this instance, assume the capacity of each bin is set to 2. We start allocating balls in the increasing order of their ID values. Ball number 1 moves clockwise, and goes to bin C. Ball number 2 goes to A. Balls 3 and 4 go to bin B. Ball number 5 goes to bin C. Then ball number 6 moves clockwise and hits bin B first. However bin B has capacity 2 and already contains balls 3 and 4. So ball 6 keeps moving to reach bin C but that bin is also full. Finally, ball 6 ends up in bin A that has a spare slot for it.

Upon any update in the system (ball or bin insertion/deletion), the allocation is recomputed to keep the uniformity objective. The art of the analysis is to show that a small update (a few number of insertions and deletions) results in minor changes in the state of the allocation and therefore the consistency objective is met. In our paper we show that every ball removal or insertion in the system results in O(1/ε^{2}) movements of other balls. The most important thing about this upper bound is that it is independent of the total number of balls or bins in the system. So if the number of balls or bins are doubled, this bound will not change. Having an upper bound independent of the number of balls or bins introduces room for scalability as the consistency objective is not violated if we move to bigger instances. Simulations for the number of movements (relocations) per update is shown below when an update occurs on a bin/server.

The red curve shows the average number of movements and the blue bars indicate the variance for different values of ε (the x-axis). The dashed curve is the upper bound suggested by our theoretical results which fits nicely as a prediction of the actual number of movements. Furthermore, for any value of ε, we know the load of each bin is at most (1+ε) times the average load. Below we see the load distribution of bins for different values of ε=0.1, ε=0.3 and ε=0.9.

As one can see there is a tradeoff — a lower ε helps with uniformity but not with consistency, while larger ε values help with consistency. A lower ε will ensure that many loads will be equal to the hard capacity limit of (1+ε) times the average, and the rest have a decaying distribution.

When providing content hosting services, one must be ready to face a variety of instances with different characteristics. This consistent hashing scheme is ideal for such scenarios as it performs well even for worst-case instances.

While our internal results are exciting, we are even more pleased that the broader community found our solution useful enough to open-source, allowing anyone to use this algorithm. If you are interested in further details of this research, please see the paper on ArXiv, and stay tuned for more research from the NYC Algorithms Team!

**Acknowledgements:**

We would like to thank Alex Totok, Matt Gruskin, Sergey Kondratyev and Haakon Ringberg from the Google Cloud Pub/Sub team, and of course Mikkel Thorup for his invaluable contributions to this paper.

Source: Consistent Hashing with Bounded Loads

除非特别声明，此文章内容采用知识共享署名 3.0许可，代码示例采用Apache 2.0许可。更多细节请查看我们的服务条款。

Post Views:
0

Tags:
AdWords

- Start planning your Google I/O 2017 schedule!
- 研究博客：Research at Google 与 ICLR 2017
- SafetyNet attestation, a building block for anti-abuse
- Next steps toward more connection security
- Introducing the Google Assistant SDK
- Better VM Rightsizing Recommendations with the Stackdriver Agent
- 通过初始试用在您的网站上测试 amp-bind
- Saddle up and meet us in Texas for OSCON 2017
- Manage your gRPC APIs with Google Cloud Endpoints
- PhotoScan：为冲印的照片拍摄无眩光照片

- 谷歌招聘软件工程师 (18,735)
- Google 推出的 31 套在线课程 (15,519)
- 如何选择 compileSdkVersion, minSdkVersion 和 targetSdkVersion (10,910)
- Seti UI 主题: 让你编辑器焕然一新 (9,068)
- Android Studio 2.0 稳定版 (7,888)
- Android N 最初预览版：开发者 API 和工具 (7,578)
- 像 Sublime Text 一样使用 Chrome DevTools (5,304)
- Google I/O 2016: Android 演讲视频汇总 (5,178)
- 生还是死？Android 进程优先级详解 (4,350)
- 用 Google Cloud 打造你的私有免费 Git 仓库 (4,272)
- 面向普通开发者的机器学习应用方案 (3,856)
- 面向 Web 开发者的 Sublime Text 插件 (3,842)
- 适配 Android N 多窗口特性的 5 个要诀 (3,652)
- 参加 Google I/O Extended，观看 I/O 直播，线下聚会！ (3,347)

© 2017 中国谷歌开发者社区 - ChinaGDG