Flexible Caching for Web Applications

Web applications and services rely heavily on caching to reduce the latency of accessing data and the load on a backing data store. The wide adoption of Memcached and Redis (key-value caching), Guava (local object caching), and Varnish (front-end HTTP caching) speaks to this demand, as does their point-and-click availability on cloud platforms: Heroku via MemCachier, EC2 via ElastiCache, and Azure via Azure Redis Cache.

The performance of a cache is determined by the workload and the caching algorithm, or the strategy for prioritizing items for eviction when the cache is full. All of the above services employ a fixed caching algorithm, typically LRU. But the needs of each application vary, and significant performance gains can be achieved by tailoring the caching strategy to the application: for example, incorporating item size, cost of fetching, or other features; or prioritizing the items in an entirely new way.

Eschewing Data Structures Gives Flexibility

Unfortunately, existing strategies lack the flexibility we seek because they rely on data structures (e.g., priority queues) to track the ordering of cached items. Recency-based algorithms (e.g., LRU and its variants) use time-of-access to order items, a measure that is difficult to extend: incorporating costs, for example, requires a completely new design (e.g., GreedyDual). Frequency-based algorithms (e.g., LFU and its variants) are easier to modify, but any major change to item priorities, such as changing the cost of a group of items, results in expensive operations on the underlying data structure. Some algorithms (e.g., marking algorithms) maintain only a partial ordering, but the coarse resolution actually makes it harder to incorporate new features. Several theoretical studies formulate caching as an optimization problem unconstrained by any data structure, but their solutions are approximated by online heuristics that, once again, rely on data structures.

Therefore, if we eschew ordering data structures entirely, we unlock the design space and can potentially find the best priority function for our needs. But without a data structure, how do we find the next item to evict? Psounis and Prabhakar provide a solution: they show that randomly sampling items and evicting the lowest-priority one works remarkably well, and the technique applies to any explicit priority function. The authors analyzed the theoretical properties of their eviction scheme but did not suggest any new priority functions; in fact, all the priority functions they tested were originally designed for data structures.
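To make the eviction mechanics concrete, here is a minimal Python sketch of sampling-based eviction. The dict-based cache, the function names, and the default sample size are our own illustrative choices, not a reproduction of any cited implementation:

    import random

    def evict_lowest_priority(cache, priority, sample_size=64):
        # Draw a uniform random sample of cached keys; no ordering
        # structure is maintained anywhere.
        keys = random.sample(list(cache.keys()), min(sample_size, len(cache)))
        # Lazily evaluate the (arbitrary) priority function on the
        # sampled candidates only, and evict the lowest-priority one.
        victim = min(keys, key=lambda k: priority(cache[k]))
        del cache[victim]
        return victim

Because only the sampled candidates are ever scored, swapping in a new priority function requires no change to the eviction machinery.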

Our Framework

Motivated by the above, we propose a simple caching framework that combines lazy evaluation of an (arbitrary) priority function with random sampling to evict items. We use this framework to design a new caching algorithm for modern web applications, called hyperbolic caching. We begin with a simple but novel theoretical model for web workloads that leads to an optimal solution based on frequency. However, we overcome the drawbacks of frequency by incorporating the time an item spends in the cache:

p = (number of requests)/(time in cache)

This deceptively simple modification already rules out a data structure approach: item priorities decay at different rates, so their relative order changes continuously. In addition, we can customize hyperbolic caching to different usage scenarios by adding modules to the priority function. We describe modules for item cost, expiration time, and windowing. Notably, we introduce the notion of cost classes to manage groups of related items, e.g., items materialized by the same database query.
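As a sketch of how this plays out in code, the snippet below stores only two pieces of per-item metadata and evaluates the priority lazily at eviction time, which is exactly what the sampling framework above permits. The field names and the epsilon guard are our own:

    import time

    class Item:
        # Minimal per-item metadata; no global ordering is kept.
        def __init__(self, value):
            self.value = value
            self.requests = 1            # number of requests since insertion
            self.inserted = time.time()  # start of the time-in-cache clock

    def hyperbolic_priority(item):
        # p = (number of requests) / (time in cache). Each item's
        # priority decays at its own rate, so relative order changes
        # continuously; evaluating lazily sidesteps any reordering.
        elapsed = max(time.time() - item.inserted, 1e-9)  # guard divide-by-zero
        return item.requests / elapsed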

By using different modules, we design hyperbolic caching variants for several production systems from leading cloud providers, and evaluate them on real traces from those providers. We also implement a cost-aware variant for whole-page caching in the Django web framework and for Redis, supporting both cost classes and per-item costs. Overall, we find that hyperbolic caching reduces misses relative to competitive baselines tailored to each application, and improves end-to-end system throughput by 5-10%. We complement these results with simulations on various static and dynamic workloads.

Project Contributions

To summarize, we make the following contributions:

  • We eschew data structures as a design principle and propose a framework for caching based on lazy evaluation and random sampling. We use this to design hyperbolic caching, a flexible caching scheme that prioritizes items in a radically different way.
  • We define modules for incorporating item cost, expiration time, and windowing, and use them to customize hyperbolic caching to three production systems.
  • We introduce the notion of cost classes to manage groups of related items effectively.
  • We implement hyperbolic caching in Django and Redis and demonstrate performance improvements for several backing applications.

The freedom to prioritize items as we wish means that we can naturally incorporate many variations on our scheme to handle different workloads and usage scenarios. Implementing a variation requires only defining a new priority function, because our eviction strategy simply samples eviction candidates, evaluates each candidate's priority, and evicts the lowest-priority one. In this paper, we design and evaluate priority variations that support per-item costs, grouped costs undergoing potentially dramatic changes, per-item expiration times, and various windowing strategies for dynamic item popularities. In addition to providing better performance in a variety of situations, these variations demonstrate the flexibility gained by decoupling prioritization from the actual mechanics of eviction.
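To illustrate that flexibility, the hedged sketches below show how two of the variations reduce to new priority functions. They build on hyperbolic_priority from the earlier sketch, and the cost and expires_at fields are our own illustrative names:

    import math
    import time

    def cost_aware_priority(item):
        # Cost module: weight the base priority by the cost of
        # refetching or recomputing the item. item.cost is a per-item
        # float here; in the cost-class variant it would instead read a
        # value shared by the item's group, so a single update
        # reprioritizes all related items at once.
        return item.cost * hyperbolic_priority(item)

    def expiration_aware_priority(item):
        # Expiration module: an item past its expiration time is the
        # best possible eviction candidate.
        if time.time() >= item.expires_at:
            return -math.inf
        return hyperbolic_priority(item)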