Cloud Resource Sharing

Cloud platforms like Amazon Web Services have given rise to new business models, in which third-party “tenants” deploy their services within the cloud provider’s datacenters and pay only for the incremental compute, storage, or network resources they use. While shared back-end storage services enjoy wide adoption in commercial clouds, most of these systems provide little or no performance isolation between tenants. At best, today’s approaches to multi-tenant resource allocation are based either on per-VM allocations or on hard rate limits that assume uniform workloads across a tenant’s storage partitions. Instead, we achieve performance isolation and fairness across an entire cluster’s aggregate server resources, while avoiding both low utilization and strong assumptions about the workload’s characteristics.
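The cost of assuming uniform workloads shows up in a toy calculation. The Python sketch below compares a tenant whose total limit is split evenly into hard per-partition limits with the same total limit split in proportion to each partition's observed demand. The demand numbers and function names are hypothetical, the proportional split is a simple illustration rather than PISCES's weight-allocation algorithm, and the model ignores server capacity and other tenants.

```python
def served_with_uniform_limits(tenant_limit, partition_demands):
    """Hard per-partition limits: the tenant's total limit is split evenly."""
    per_partition = tenant_limit / len(partition_demands)
    return sum(min(d, per_partition) for d in partition_demands)


def served_with_demand_proportional_limits(tenant_limit, partition_demands):
    """Same total limit, but each partition's share follows its observed demand."""
    total = sum(partition_demands)
    if total == 0:
        return 0.0
    return sum(min(d, tenant_limit * d / total) for d in partition_demands)


demands = [700.0, 100.0, 100.0, 100.0]   # hypothetical skewed demand, req/s
print(served_with_uniform_limits(1000.0, demands))               # 550.0
print(served_with_demand_proportional_limits(1000.0, demands))   # 1000.0
```

Under this 700/100/100/100 skew, the even split serves only 550 req/s of the tenant's 1,000 req/s entitlement because the hot partition is capped at 250 req/s while the cold partitions leave their allowance unused; letting the split follow demand recovers the full 1,000 req/s.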

Our new shared key-value storage system, PISCES, provides tenants with weighted fair shares (or minimum rates) of the shared service’s aggregate resources. The allocation is fair and work-conserving, even when different tenants’ partitions are colocated and when demand for different partitions is skewed, time-varying, or bottlenecked by different server resources. PISCES achieves this through a novel decomposition of the global fairness problem into four mechanisms, based on the primal-allocation method of distributed convex optimization: partition placement; weight allocation and multilateral weight swapping; congestion-driven replica selection; and weighted fair queuing extended to resource vectors via dominant resource fairness. These mechanisms operate on different timescales and with different levels of system-wide visibility.
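To make the fairness target concrete, the Python sketch below computes an idealized (fluid) weighted dominant-resource-fair allocation by progressive filling. Every unsatisfied tenant's dominant share grows in proportion to its weight until its demand is met or a resource it uses is exhausted; the remaining tenants keep growing into the leftover capacity, which is what makes the result work-conserving. This is only the target rate allocation that a multi-resource fair scheduler would aim for, not PISCES's per-server weighted fair queuing implementation, and the names (`Tenant`, `weighted_drf`) and per-request resource vectors are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    weight: float                  # relative fair-share weight (> 0)
    per_request: dict              # resource -> amount one request consumes
    demand: float = math.inf       # max request rate the tenant wants


def weighted_drf(capacity, tenants):
    """Fluid weighted DRF via progressive filling; returns {name: request rate}."""
    # Dominant share consumed by one request of each tenant.
    dom = {t.name: max(t.per_request.get(r, 0.0) / c for r, c in capacity.items())
           for t in tenants}
    rate = {t.name: 0.0 for t in tenants}
    # Sketch assumption: tenants with zero demand or no modeled usage stay at 0.
    active = {t.name: t for t in tenants if t.demand > 0 and dom[t.name] > 0}
    frozen_use = dict.fromkeys(capacity, 0.0)   # usage of already-fixed tenants
    while active:
        # At fill level L, an active tenant runs at L * weight / dom, so its
        # dominant share is L * weight: shares grow in proportion to weights.
        slope = {r: sum(t.weight / dom[n] * t.per_request.get(r, 0.0)
                        for n, t in active.items())
                 for r in capacity}
        # Next event: some tenant's demand is met, or some resource fills up.
        events = [(t.demand * dom[n] / t.weight, "tenant", n)
                  for n, t in active.items()]
        events += [((c - frozen_use[r]) / slope[r], "resource", r)
                   for r, c in capacity.items() if slope[r] > 0]
        level = min(v for v, _, _ in events)
        tol = 1e-9 * max(1.0, level)
        done = set()
        for v, kind, key in events:
            if v > level + tol:
                continue
            if kind == "tenant":
                done.add(key)
            else:  # saturated resource: every active tenant using it stops
                done.update(n for n, t in active.items()
                            if t.per_request.get(key, 0.0) > 0)
        # Fix the finished tenants; the rest keep growing in the next round.
        for n in done:
            t = active.pop(n)
            rate[n] = min(t.demand, level * t.weight / dom[n])
            for r in capacity:
                frozen_use[r] += rate[n] * t.per_request.get(r, 0.0)
    return rate
```

For example, with 9 units of CPU and 18 of memory, a tenant needing 1 CPU and 4 memory per request and another needing 3 CPU and 1 memory receive about 3 and 2 requests/s with equal weights, so both hold a dominant share of 2/3. Capping the second tenant's demand at 1 request/s lets the first grow to 4.25 requests/s, at which point memory is full.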

In Libra, we extend the PISCES IO scheduler to support tenant reservations expressed as application-level request throughput for disk-IO-bound workloads on an SSD-backed key-value storage system. Providing app-request guarantees has proven elusive because of the complexities inherent to modern storage stacks: non-uniform IO amplification, unpredictable IO interference, and non-linear IO performance. To tackle these challenges, Libra leverages two techniques. First, Libra tracks the IO resource consumption of a tenant’s application-level requests across complex storage-stack interactions, down to low-level IO operations. This allows Libra to allocate per-tenant IO resources for achieving app-request reservations based on each tenant’s dynamic IO usage profile. Second, Libra uses a disk-IO cost model based on virtual IO operations (VOPs) that captures the non-linear relationship between SSD IO bandwidth and IO operation (IOP) throughput. Using VOPs, Libra can both account for the true cost of an IOP and determine the amount of provisionable IO resources available under IO interference.
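The two Libra techniques can be illustrated with a small Python sketch. It assumes a hypothetical calibration table of peak IOPS per IO size and defines the cost of an IOP as the peak IOPS at the smallest calibrated size divided by the peak IOPS at its own size, so that, for any single IO size, a device running at its limit delivers the same number of VOPs per second. A per-tenant profile then tracks VOPs per application request, which converts a request-rate reservation into a VOP allocation. The table values, the EWMA smoothing, and all names (`PROFILE`, `vop_cost`, `TenantIOProfile`, `vop_reservation`, `admissible`) are illustrative assumptions; this is not Libra's actual calibration procedure, and it models neither interference nor separate read and write curves.

```python
from bisect import bisect_left

# Hypothetical calibration: (IO size in KB, peak sustainable IOPS at that size)
# for one SSD. Placeholder values chosen for illustration, not measurements.
PROFILE = [(1, 40000.0), (4, 36000.0), (16, 16000.0), (64, 4500.0), (256, 1200.0)]


def max_iops(size_kb, profile=PROFILE):
    """Piecewise-linear interpolation of the peak-IOPS curve at a given IO size."""
    sizes = [s for s, _ in profile]
    if size_kb <= sizes[0]:
        return profile[0][1]
    if size_kb >= sizes[-1]:
        return profile[-1][1]
    i = bisect_left(sizes, size_kb)  # sizes[i-1] < size_kb <= sizes[i]
    (s0, r0), (s1, r1) = profile[i - 1], profile[i]
    return r0 + (size_kb - s0) / (s1 - s0) * (r1 - r0)


def vop_cost(size_kb, profile=PROFILE):
    """Cost of one IOP in VOPs: 1 VOP = one IOP of the smallest calibrated size."""
    return profile[0][1] / max_iops(size_kb, profile)


# Assumed provisionable capacity in VOPs/s (no interference headroom modeled).
VOP_CAPACITY = PROFILE[0][1]


class TenantIOProfile:
    """Tracks the VOPs consumed per application request with an EWMA."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.vops_per_request = None

    def observe(self, io_sizes_kb):
        # io_sizes_kb: sizes of the low-level IOs attributed to one app request
        # (assumption: any amplified or background IO is already charged back).
        cost = sum(vop_cost(s) for s in io_sizes_kb)
        if self.vops_per_request is None:
            self.vops_per_request = cost
        else:
            self.vops_per_request = (self.alpha * cost
                                     + (1 - self.alpha) * self.vops_per_request)


def vop_reservation(reserved_requests_per_s, profile):
    """Translate an app-request reservation into a VOP/s allocation."""
    return reserved_requests_per_s * profile.vops_per_request


def admissible(reservations_vops, capacity=VOP_CAPACITY):
    """Admission check: total VOP reservations must fit provisionable capacity."""
    return sum(reservations_vops) <= capacity
```

With these placeholder numbers, a 4 KB IOP costs about 1.1 VOPs while a 64 KB IOP costs almost 9, so counting raw IOPS or raw bytes alone would misestimate the load; the VOP unit folds the non-linear curve into a single additive cost.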

Main Publications

  • From application requests to Virtual IOPs: Provisioned key-value storage with Libra
    David Shue and Michael J. Freedman.
    Proc. European Conference on Computer Systems
    (EuroSys ’14). Amsterdam, Netherlands, April 2014. 14 pages.
  • Performance Isolation and Fairness for Multi-Tenant Cloud Storage
    David Shue, Michael J. Freedman, and Anees Shaikh.
    Proc. Symposium on Operating Systems Design and Implementation
    (OSDI ’12). Hollywood, CA, October 2012. 14 pages.

Other Publications

  • Fairness and Isolation in Multi-Tenant Storage as Optimization Decomposition
    David Shue, Michael J. Freedman, and Anees Shaikh.
    ACM SIGOPS Operating Systems Review
    (OSR) Vol. 47, No. 1, January 2013. 6 pages.
  • Towards Predictable Multi-Tenant Shared Cloud Storage
    David Shue, Michael J. Freedman, and Anees Shaikh.
    Proc. Workshop on Large-Scale Distributed Systems and Middleware
    (LADIS ’12). Madeira, Portugal, July 2012. 3 pages.