Cloud Bouncer - Distributed Rate Limiting at Yahoo

Most platforms at Yahoo serve dozens of Yahoo Properties and Apps at any given time (think Mail, Finance, Flickr, etc.) and are operated as multi-tenant services on shared infrastructure for cost efficiency and on-demand elasticity. As such, platforms need to protect themselves against applications exceeding their quotas or erroneously sending a flood of requests and causing systemic outages.

Enter Cloud Bouncer.

Developed and integrated into various Yahoo platforms in 2014, Cloud Bouncer now stands as a pillar, protecting against abusive usage patterns, enforcing quotas and bolstering the stability of our platforms.

Overview

Cloud Bouncer is a distributed rate limiter, packaged as an efficient, lightweight software suite that plugs easily into any host serving requests. Based on user-defined global policies, Cloud Bouncer decides whether an incoming request on a given node should be served or denied. Policies can be based on any attribute counted through the software, such as reads, writes, or bytes uploaded. Although policies are enforced at a global level, the decision of whether to serve a request is made locally, without any external communication during the request. Upon receiving the decision from Cloud Bouncer, the application can take the relevant action.
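
To make the request path concrete, here is a minimal, self-contained sketch of that flow in Python. The class, method, and policy names are illustrative assumptions rather than Cloud Bouncer's actual API; they only mirror the idea that the allow/deny decision is made from state cached on the node.

```python
# Hypothetical stand-in for the Cloud Bouncer library's local decision logic.
# No network call happens on the request path; the check reads in-memory state.

class LocalRateLimiter:
    def __init__(self, policies):
        # policies: {policy_name: allowed units per time period}, cached locally
        self.policies = policies
        self.used = {name: 0 for name in policies}

    def allow(self, policy_name, amount=1):
        # Purely local check against the cached policy limits.
        if self.used[policy_name] + amount <= self.policies[policy_name]:
            self.used[policy_name] += amount
            return True
        return False

limiter = LocalRateLimiter({"reads": 1000, "bytes_uploaded": 5_000_000})

def handle_request(request_bytes):
    if limiter.allow("bytes_uploaded", amount=request_bytes):
        return "served"
    return "denied"  # the application decides the actual action, e.g. HTTP 429
```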

Here are some of the key principles which Cloud Bouncer adheres to:

  1. Distributed and Decentralized - A largely decentralized rate-limiting solution which can enforce policies across a cluster of machines.
  2. Real-time - Responds to changes in traffic patterns quickly.
  3. Scalability - Serving nodes can be added and removed seamlessly. Cloud Bouncer works well for large clusters (1,000+ nodes) and large sets of policies (10,000+ policies).
  4. Low latency - No more than a few microseconds of overhead per request.
  5. Ease of integration - Cloud Bouncer provides a straightforward REST API for managing policies and its library provides a simple set of functions/hooks to use per request.
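
As a hedged illustration of that REST API, registering a policy might look like the snippet below; the endpoint URL and field names are assumptions for the sketch, not the actual Policy Manager schema.

```python
# Hypothetical policy registration against the Policy Manager's REST API.
import requests

policy = {
    "application": "photos",
    "name": "photos.uploads.bytes",   # attribute being rate limited
    "limit": 50_000_000,              # tokens generated per time period
    "period_seconds": 1,
}

# The URL below is a placeholder; the real endpoint is not shown in this post.
resp = requests.post("https://policy-manager.example.com/v1/policies",
                     json=policy, timeout=5)
resp.raise_for_status()
```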

Architecture

Cloud Bouncer is operated through a set of user-defined policies which specify the limits permitted on any attribute of an application.

  • The Policy Manager is a hosted service where users register their application and policies, which are currently stored in a relational database (e.g. MySQL).
  • All the nodes in the cluster where Cloud Bouncer is used run a background process known as the Controller. The Controller periodically polls the Policy Manager to check whether any policies have been added, modified, or removed, and then caches the policies locally in memory (see the sketch after the diagram below).
  • Integrated into the application is the Cloud Bouncer library which exposes a simple API to check whether a policy is able to satisfy the incoming request.
  • All nodes also run a process known as the Distributed Rate Limiter (DRL), which performs Cloud Bouncer’s core functions and coordinates asynchronous communication between nodes.
[Diagram: Cloud Bouncer architecture - Policy Manager, Controller, library, and DRL]
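
The Controller's poll-and-cache behavior could look roughly like the following sketch; the endpoint, polling interval, and cache layout are assumptions for illustration, not the real implementation.

```python
# Hypothetical sketch of the Controller: poll the Policy Manager periodically
# and refresh an in-memory policy cache that the library consults per request.
import threading
import time

import requests

POLICY_MANAGER_URL = "https://policy-manager.example.com/v1/policies"  # placeholder
policy_cache = {}
cache_lock = threading.Lock()

def poll_policy_manager(interval_seconds=30):
    while True:
        try:
            policies = requests.get(POLICY_MANAGER_URL, timeout=5).json()
            with cache_lock:
                policy_cache.clear()
                policy_cache.update({p["name"]: p for p in policies})
        except requests.RequestException:
            # Keep serving from the last known-good cache on transient errors.
            pass
        time.sleep(interval_seconds)

threading.Thread(target=poll_policy_manager, daemon=True).start()
```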

Every node in the cluster stores its traffic information locally, and this information is shared between nodes using a gossip protocol over UDP, which involves very little communication over the wire. To keep the gossip packet size to a minimum, packets carry traffic information for just the last second. This also enables Cloud Bouncer to react quickly to traffic spikes.
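
The essence of that exchange can be sketched as follows: each node shares only its last-second counts, and every node sums what it has heard to approximate cluster-wide usage. The packet layout and function names below are assumptions for illustration, not the actual wire format.

```python
# Simplified model of last-second gossip: build a small packet, merge packets
# received from peers, and estimate the global rate for a policy.
import json
import time

def build_gossip_packet(node_id, local_counts):
    # Only the most recent second of traffic is included, keeping packets small.
    return json.dumps({"node": node_id,
                       "second": int(time.time()),
                       "counts": local_counts}).encode()

def merge_gossip(cluster_view, packet):
    # cluster_view: {node_id: {policy_name: count}} for the current second.
    msg = json.loads(packet)
    cluster_view[msg["node"]] = msg["counts"]
    return cluster_view

def global_usage(cluster_view, policy_name):
    return sum(counts.get(policy_name, 0) for counts in cluster_view.values())

view = {}
view = merge_gossip(view, build_gossip_packet("node-a", {"writes": 3000}))
view = merge_gossip(view, build_gossip_packet("node-b", {"writes": 5200}))
print(global_usage(view, "writes"))  # -> 8200, the approximate global rate
```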

[Graph: cluster-wide traffic and denials for a policy set at 8K during a spike to 12K]

Consensus between nodes on the global traffic usage is near real-time. The graph above illustrates a policy set at 8K, where the time lag between the traffic spiking to 12K and denials commencing is negligible. More precisely, the system converges on the global count in O(log n) rounds, as is characteristic of gossip protocols.

Cloud Bouncer’s rate limiting solution is based on a token bucket algorithm. A token bucket is defined by:

  • the rate at which tokens are added to the bucket
  • the maximum number of tokens the bucket can hold

On receiving the application’s request, the library checks its local token bucket; if enough tokens are available to satisfy the request, the user application is returned a success. At the end of the configured time period (generally a second), new tokens are generated and added to the bucket at the rate configured in the Cloud Bouncer policy. Depending on whether the policy has been exceeded locally or globally, Cloud Bouncer optimizes the gossip communication for efficient network bandwidth consumption.
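
The following is a minimal, generic sketch of the token bucket algorithm described above; the class and parameter names are our own, and it replenishes tokens once per elapsed period to match the per-second behavior described, so it should not be read as Cloud Bouncer's actual implementation.

```python
# Generic token bucket: tokens are added once per period up to a maximum,
# and a request is served only if enough tokens remain.
import time

class TokenBucket:
    def __init__(self, rate_per_period, capacity, period_seconds=1.0):
        self.rate = rate_per_period          # tokens added each period
        self.capacity = capacity             # maximum tokens the bucket can hold
        self.period = period_seconds
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed_periods = int((now - self.last_refill) / self.period)
        if elapsed_periods > 0:
            self.tokens = min(self.capacity, self.tokens + elapsed_periods * self.rate)
            self.last_refill += elapsed_periods * self.period

    def try_consume(self, tokens=1):
        # Returns True (serve) if enough tokens remain, False (deny) otherwise.
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

bucket = TokenBucket(rate_per_period=100, capacity=100)  # roughly 100 requests/second
print(bucket.try_consume())  # True while tokens remain in the current period
```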

Some of the other features available in Cloud Bouncer, illustrated in the sketch after this list, include:

  • Flexible Traffic Distribution - Cloud Bouncer supports two traffic patterns: the first where traffic is expected to be evenly distributed among the nodes, and the second where traffic distribution is uneven and individual nodes can develop hot spots. Applications can choose to allow such uneven distribution, where one node might receive a large proportion of the requests, while still being rate limited correctly at the global limit.
  • Hierarchical Policies - a parent-child relationship between policies allows applications to model complex dependencies.
  • Rate Limit the Rate Limiter - restrict the amount of network I/O the Cloud Bouncer components use at any given time.
  • Configurable Token Period - the time interval after which the tokens in the bucket are replenished.
  • Dynamic Host Addition and Deletion - nodes can simply be added to (or removed from) the cluster for inclusion in the policy’s global count without service interruption.
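
These options might come together in a policy definition along the following lines; the field names and values are assumptions for illustration, not Cloud Bouncer's actual schema.

```python
# Hypothetical policy definition showing how the features above might be expressed.
policy = {
    "name": "photos.uploads.bytes",
    "parent": "photos.uploads",         # hierarchical policies: parent-child link
    "limit": 50_000_000,                # tokens generated per token period
    "token_period_seconds": 1,          # configurable token period
    "distribution": "uneven",           # allow hot spots while enforcing the global limit
    "gossip_bandwidth_limit_kbps": 64,  # rate limit the rate limiter itself
}
```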

Cloud Bouncer in Sherpa (Yahoo’s NoSQL Data Service)

Cloud Bouncer’s versatility has led to its adoption in a variety of contexts, both as a component within systems for generic overload protection and in multi-tenant platforms for quota enforcement. Let’s examine the latter use case, where Cloud Bouncer has been deployed at significant scale.

Sherpa is Yahoo’s NoSQL data service, used by nearly every Yahoo Property for metadata storage. For those not versed in its architecture, the relevant components of this multi-tenant platform are the Storage Units and Routers.

  • The Storage Units persist shards of user tables.
  • All incoming requests from users are load balanced across the Routers in a given data center and the Router subsequently proxies the request to the correct Storage Unit for reading/writing.

Sherpa is present in over 10 geographic regions around the world, hosts thousands of user tables, and serves over 40 billion requests per day with hundreds of Routers and Storage Units per region. Each Sherpa region acts as a separate Cloud Bouncer cluster. Every user table in Sherpa is provisioned for a specified request throughput in each region the table is present in. Based on the provisioned rates and availability planning for traffic failover, Sherpa computes appropriate throughput thresholds. These values are used as the tokens generated per time period to create two Cloud Bouncer policies per user table per region: the first at 90% of the threshold and the second at the threshold itself. When the first policy is exceeded, Sherpa begins redirecting requests to other regions to throttle the application and apply back pressure; when the final threshold is exceeded, Sherpa denies requests.
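
As a hedged sketch of that two-threshold scheme (the function names, headroom factor, and sample numbers are assumptions for illustration, not Sherpa's actual computation):

```python
# Two policies per table per region: a soft threshold at 90% that triggers
# cross-region redirection, and a hard threshold that triggers denials.

def sherpa_policies(provisioned_rps, failover_headroom=1.0):
    # Sherpa derives thresholds from provisioned rates and failover planning;
    # here we simply scale the provisioned rate by an assumed headroom factor.
    threshold = provisioned_rps * failover_headroom
    return {"soft": 0.9 * threshold,   # start redirecting / applying back pressure
            "hard": threshold}         # deny requests outright

def route(current_rps, policies):
    if current_rps >= policies["hard"]:
        return "deny"
    if current_rps >= policies["soft"]:
        return "redirect_to_other_region"
    return "serve_locally"

p = sherpa_policies(provisioned_rps=8000)
print(route(7000, p), route(7500, p), route(9000, p))
# -> serve_locally redirect_to_other_region deny
```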

[Figure: Cloud Bouncer rate-limiting metrics referenced in the example below]

A recent success story is the case of Yahoo Movies. The Movies pipeline uses a Sherpa table to process feeds of showtimes and other flavors of TV data ingested from Hadoop. A faulty version of the pipeline was deployed without the necessary filters in place and began writing into Sherpa at 10 times the table’s provisioned limits. Such a spike in write throughput would have led to excessive I/O on the Storage Units and the messaging layers, impacting other properties and applications in that region. Instead, the Cloud Bouncer policies configured for the table kicked in, and requests to the table were rate limited across the various Routers (metrics as illustrated above).

Looking to the Future

Some of the challenges we face with Cloud Bouncer concern scale and efficiency, and we look forward to solving them.

  • Rate-limiting policies across data centers - the challenge here is that high latencies between data centers compromise Cloud Bouncer’s ability to react quickly to traffic spikes. A possible approach involves aggregating traffic information over groups of hosts rather than gossiping it at the host level. However, initial investigation of this approach reveals a loss of granularity and a reliance on approximations to predict traffic spikes, which runs counter to Cloud Bouncer’s key principles.
  • Reduce network I/O - the gossip protocol produces a fair number of redundant messages; we are looking at building a tree-like communication structure to reduce them.

By Varad Kishore, Software Systems Development Engineer, and Aravind Sethuraman, Senior Software Apps Development Engineer