
Autoscaling Tiered Cloud Storage in Anna

Lucas Bleme
February 07, 2022

Presentation of the paper "Autoscaling Tiered Cloud Storage in Anna", given at the Massive Data Processing course seminar at DCC/UFMG, 2022/01.

Presentation (PT-BR): https://www.youtube.com/watch?v=gzspGRsFVEs
Original paper, published at VLDB 2019: https://dsf.berkeley.edu/jmh/papers/anna_vldb_19.pdf


Transcript

  1. Challenges on cloud KVS
     • Data volume variation: as the overall workload grows, the aggregate throughput of the system must grow, so the system should automatically increase resource allocation. When the workload decreases, resource usage and cost should decrease.
     • Skewness (skewed vs. uniform workloads): even at a fixed volume, a highly skewed workload directs many requests to a small subset of keys (see the sketch after this slide).
     • Shifting hotspots: hot data may become cold and vice versa. The system should prioritize data in the new hot set and demote data in the old one.
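A minimal sketch, not from the paper or from Anna's code, contrasting a uniform key-access pattern with a Zipf-skewed one; the key count, request count, and Zipf exponent are illustrative assumptions.

```python
# Illustrative only: compare how many requests land on the 10 hottest keys
# under a uniform workload vs. a Zipf-skewed workload.
import numpy as np

NUM_KEYS = 10_000          # size of the key space (assumption)
NUM_REQUESTS = 100_000     # requests in one observation window (assumption)

# Uniform workload: every key is equally likely to be requested.
uniform = np.random.randint(0, NUM_KEYS, size=NUM_REQUESTS)

# Skewed workload: a Zipf distribution concentrates accesses on a few hot keys
# (the paper's experiments vary the Zipfian coefficient to control contention).
skewed = np.random.zipf(a=2.0, size=NUM_REQUESTS) % NUM_KEYS


def top_key_share(accesses: np.ndarray, top_n: int = 10) -> float:
    """Fraction of requests that hit the top_n most-accessed keys."""
    counts = np.bincount(accesses, minlength=NUM_KEYS)
    return float(np.sort(counts)[-top_n:].sum()) / len(accesses)


print("uniform, top-10 key share:", top_key_share(uniform))   # ~0.001
print("skewed,  top-10 key share:", top_key_share(skewed))    # most of the traffic
```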
  2. Anna architecture
     • Responds to workload changes and meets the SLO.
     • Modifies resource allocation based on metrics from the policy engine.
     • Vertical tiering for hot data: EC2 RAM to reduce latency.
     • Vertical tiering for cold data: EBS disks to balance cost.
  3. Storage Kernel
     • Worker threads vary according to the hardware: memory nodes run one thread per CPU core; disk (EBS) nodes run four threads per CPU core.
     • Hash rings map keys to resources: a global ring determines which nodes store each key, and a local ring determines which worker threads store it (sketched below).
     • Periodic multicast to peer nodes: a shared-nothing, asynchronous messaging scheme that sustains high CPU utilization (around 90%) while remaining multi-master.
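A minimal consistent-hashing sketch, assuming a simplified model of the global ring (key to nodes) and local ring (key to worker threads); class and member names are illustrative, not Anna's actual implementation.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable integer hash for placing members and keys on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, members, virtual_points=100):
        # Each member gets several virtual positions for smoother balancing.
        self._ring = sorted(
            (_hash(f"{m}:{i}"), m) for m in members for i in range(virtual_points)
        )
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key: str, replicas: int = 1):
        """Walk clockwise from the key's hash and collect distinct members."""
        idx = bisect.bisect(self._positions, _hash(key))
        owners = []
        while len(owners) < replicas:
            member = self._ring[idx % len(self._ring)][1]
            if member not in owners:
                owners.append(member)
            idx += 1
        return owners


# Global ring: which memory-tier nodes own the key (here, 2 replicas).
global_ring = HashRing(["memory-node-0", "memory-node-1", "memory-node-2"])
# Local ring: which worker thread on the chosen node handles the key
# (memory nodes run one worker thread per CPU core).
local_ring = HashRing([f"thread-{t}" for t in range(8)])

key = "user:42"
print(global_ring.lookup(key, replicas=2))
print(local_ring.lookup(key))
```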
  4. Data Movement & Hot-Key Replication
     • If a key's access frequency exceeds a promotion threshold P, Anna promotes an EBS replica to the memory tier; if aggregate storage is full, Anna adds nodes to increase capacity before performing the data movement (see the sketch after this slide).
     • If a key's access frequency falls below a demotion threshold D and the key already has memory-tier replicas, it is demoted to the EBS tier.
     • Hot keys are replicated across the memory tier according to their access frequency. Data is replicated both across nodes and across cores (intra-node replicas); replicating across nodes is preferable, since replicating across cores makes all threads of a single node compete for the same network bandwidth.
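A minimal sketch (not Anna's code) of one policy pass applying the promotion threshold P and demotion threshold D described above; the field and function names are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class KeyStats:
    access_frequency: float      # accesses observed in the last monitoring epoch
    memory_replicas: int = 0     # replicas currently held in the memory tier
    ebs_replicas: int = 1        # replicas currently held in the EBS tier


def plan_data_movement(stats: dict, promote_p: float, demote_d: float):
    """Return (keys to promote to memory, keys to demote to EBS)."""
    promote, demote = [], []
    for key, s in stats.items():
        if s.access_frequency > promote_p and s.memory_replicas == 0:
            promote.append(key)   # hot key currently served only from EBS
        elif s.access_frequency < demote_d and s.memory_replicas > 0:
            demote.append(key)    # cold key still occupying memory-tier space
    return promote, demote


stats = {
    "hot-key": KeyStats(access_frequency=5000.0),
    "cold-key": KeyStats(access_frequency=2.0, memory_replicas=1),
}
print(plan_data_movement(stats, promote_p=1000.0, demote_d=10.0))
# -> (['hot-key'], ['cold-key'])
```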
  5. Elasticity
     • Node addition happens when there is insufficient storage or insufficient compute capacity. The policy engine computes the number of nodes required based on data size, subject to cost constraints (sketched below). Node addition never happens on the EBS tier (the memory tier supports 15x the requests).
     • Node removal: a departing node queries the hash ring, updates it to remove itself, and broadcasts its absence to all nodes in the system (storage, monitoring, and routing).
     • Grace periods: modifying resource allocation briefly increases request latency, so key demotion, hot-key replication, and further elasticity actions are delayed until after the grace period.
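A minimal sketch of the storage-driven side of the node-addition decision: how many memory nodes are needed to hold the replicated data set under a cost cap. All capacities, prices, and names here are assumptions, not the paper's actual policy formula.

```python
import math

NODE_CAPACITY_GB = 60.0       # usable RAM per memory node (assumption)
NODE_COST_PER_HOUR = 1.0      # illustrative hourly price per node
MAX_HOURLY_BUDGET = 20.0      # cost constraint set by the operator


def memory_nodes_to_add(data_size_gb: float, avg_replication: float,
                        current_nodes: int) -> int:
    """How many memory-tier nodes to add so capacity covers the data set."""
    needed = math.ceil(data_size_gb * avg_replication / NODE_CAPACITY_GB)
    affordable = int(MAX_HOURLY_BUDGET // NODE_COST_PER_HOUR)
    # Never plan below the current cluster size, and respect the cost cap.
    target = min(max(needed, current_nodes), affordable)
    return max(target - current_nodes, 0)


# e.g. 500 GB of data at an average replication factor of 2, with 12 nodes running:
print(memory_nodes_to_add(data_size_gb=500, avg_replication=2.0, current_nodes=12))  # -> 5
```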
  6. Dynamic Workload Skew & Volume
     • The latency SLO is satisfied 97% of the time.
     • Setup: 12 memory nodes with a latency objective of 3.3 ms.
     • At minute 3, a high-contention workload starts (Zipfian coefficient = 2); after a brief latency spike, the highly contended keys are replicated.
     • At minute 13, contention decreases and the volume increases 4x (Zipfian coefficient = 0.5); the policy engine reduces the replication factor and adds 4 new nodes.
  7. Limitations
     • The reactive policy design creates barriers to meeting SLOs and SLAs; proactive approaches could be a better fit.
     • Autoscaling overhead imposes a considerable time penalty for adding and removing nodes. Maintaining a "pre-warmed" pool of nodes could help, but is not cost-effective; leveraging micro-VM research such as Firecracker is another option. A short workload spike that triggers elasticity, followed by an immediate decrease, would lead Anna to allocate unnecessary nodes.
  8. Conclusion
     • Anna efficiently handles non-trivial distributions of access patterns for key-value storage.
     • It supports data volume variation by applying its elasticity mechanisms to add and remove nodes.
     • It handles bursts in skewed workloads by moving data between tiers, promoting and demoting frequently accessed keys.
     • Shifting hotspots are supported by hot-key replication, both intra-node and across nodes.
     https://github.com/hydro-project/anna