
Autoscaling Tiered Cloud Storage in Anna

Lucas Bleme
February 07, 2022

Presentation of the paper "Autoscaling Tiered Cloud Storage in Anna", given at the Massive Data Processing course seminar at DCC/UFMG, 2022/01.

Presentation (PT-BR): https://www.youtube.com/watch?v=gzspGRsFVEs
Original paper, published at VLDB 2019: https://dsf.berkeley.edu/jmh/papers/anna_vldb_19.pdf


Transcript

  1. Challenges on cloud KVS
     • Data volume variation: as the overall workload grows, the aggregate throughput of the system must grow, so the system should automatically increase resource allocation. When the workload decreases, resource usage and cost should decrease.
     • Skewness (skewed vs. uniform workloads): even at a fixed volume, a highly skewed workload directs many requests to a small subset of keys (see the sketch after this slide).
     • Shifting hotspots: hot data may become cold and vice versa. The system should prioritize data in the new hot set and demote data in the old one.
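A minimal sketch, not from the paper or from Anna's code, contrasting a uniform key-access pattern with a Zipf-skewed one; the key count, request count, and Zipf exponent are illustrative assumptions.

```python
# Illustrative only: compare how many requests land on the 10 hottest keys
# under a uniform workload vs. a Zipf-skewed workload.
import numpy as np

NUM_KEYS = 10_000          # size of the key space (assumption)
NUM_REQUESTS = 100_000     # requests in one observation window (assumption)

# Uniform workload: every key is equally likely to be requested.
uniform = np.random.randint(0, NUM_KEYS, size=NUM_REQUESTS)

# Skewed workload: a Zipf distribution concentrates accesses on a few hot keys
# (the paper's experiments vary the Zipfian coefficient to control contention).
skewed = np.random.zipf(a=2.0, size=NUM_REQUESTS) % NUM_KEYS


def top_key_share(accesses: np.ndarray, top_n: int = 10) -> float:
    """Fraction of requests that hit the top_n most-accessed keys."""
    counts = np.bincount(accesses, minlength=NUM_KEYS)
    return float(np.sort(counts)[-top_n:].sum()) / len(accesses)


print("uniform, top-10 key share:", top_key_share(uniform))   # ~0.001
print("skewed,  top-10 key share:", top_key_share(skewed))    # most of the traffic
```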
  2. Anna architecture
     • Responds to workload changes and meets the SLO.
     • Modifies resource allocation based on metrics from the policy engine.
     • Vertical tiering for hot data: EC2 RAM to reduce latency.
     • Vertical tiering for cold data: EBS disks to balance cost.
  3. Storage Kernel
     • Worker threads vary according to the hardware: memory nodes run one thread per CPU core; disk (EBS) nodes run four threads per CPU core.
     • Hash rings map keys to resources: a global ring determines which nodes store each key, and a local ring determines which worker threads store it (sketched below).
     • Periodic multicast to peer nodes: a shared-nothing, asynchronous messaging scheme that sustains high CPU utilization (around 90%) while remaining multi-master.
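A minimal consistent-hashing sketch, assuming a simplified model of the global ring (key to nodes) and local ring (key to worker threads); class and member names are illustrative, not Anna's actual implementation.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable integer hash for placing members and keys on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, members, virtual_points=100):
        # Each member gets several virtual positions for smoother balancing.
        self._ring = sorted(
            (_hash(f"{m}:{i}"), m) for m in members for i in range(virtual_points)
        )
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key: str, replicas: int = 1):
        """Walk clockwise from the key's hash and collect distinct members."""
        idx = bisect.bisect(self._positions, _hash(key))
        owners = []
        while len(owners) < replicas:
            member = self._ring[idx % len(self._ring)][1]
            if member not in owners:
                owners.append(member)
            idx += 1
        return owners


# Global ring: which memory-tier nodes own the key (here, 2 replicas).
global_ring = HashRing(["memory-node-0", "memory-node-1", "memory-node-2"])
# Local ring: which worker thread on the chosen node handles the key
# (memory nodes run one worker thread per CPU core).
local_ring = HashRing([f"thread-{t}" for t in range(8)])

key = "user:42"
print(global_ring.lookup(key, replicas=2))
print(local_ring.lookup(key))
```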
  4. Data Movement & Hot-Key Replication
     • If a key's access frequency exceeds a promotion threshold P, Anna promotes an EBS replica to the memory tier; if aggregate storage is full, Anna adds nodes to increase capacity before performing the data movement (see the sketch after this slide).
     • If a key's access frequency falls below a demotion threshold D and the key already has memory-tier replicas, it is demoted to the EBS tier.
     • Hot keys are replicated across the memory tier according to their access frequency. Data is replicated both across nodes and across cores (intra-node replicas); replicating across nodes is preferable, since replicating across cores makes all threads of a single node compete for the same network bandwidth.
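A minimal sketch (not Anna's code) of one policy pass applying the promotion threshold P and demotion threshold D described above; the field and function names are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class KeyStats:
    access_frequency: float      # accesses observed in the last monitoring epoch
    memory_replicas: int = 0     # replicas currently held in the memory tier
    ebs_replicas: int = 1        # replicas currently held in the EBS tier


def plan_data_movement(stats: dict, promote_p: float, demote_d: float):
    """Return (keys to promote to memory, keys to demote to EBS)."""
    promote, demote = [], []
    for key, s in stats.items():
        if s.access_frequency > promote_p and s.memory_replicas == 0:
            promote.append(key)   # hot key currently served only from EBS
        elif s.access_frequency < demote_d and s.memory_replicas > 0:
            demote.append(key)    # cold key still occupying memory-tier space
    return promote, demote


stats = {
    "hot-key": KeyStats(access_frequency=5000.0),
    "cold-key": KeyStats(access_frequency=2.0, memory_replicas=1),
}
print(plan_data_movement(stats, promote_p=1000.0, demote_d=10.0))
# -> (['hot-key'], ['cold-key'])
```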
  5. Elasticity
     • Node addition happens when there is insufficient storage or insufficient compute capacity. The policy engine computes the number of nodes required based on data size, subject to cost constraints (sketched below). Node addition never happens on the EBS tier (the memory tier supports 15x the requests).
     • Node removal: a departing node queries the hash ring, updates it to remove itself, and broadcasts its absence to all nodes in the system (storage, monitoring, and routing).
     • Grace periods: modifying resource allocation briefly increases request latency, so key demotion, hot-key replication, and further elasticity actions are delayed until after the grace period.
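A minimal sketch of the storage-driven side of the node-addition decision: how many memory nodes are needed to hold the replicated data set under a cost cap. All capacities, prices, and names here are assumptions, not the paper's actual policy formula.

```python
import math

NODE_CAPACITY_GB = 60.0       # usable RAM per memory node (assumption)
NODE_COST_PER_HOUR = 1.0      # illustrative hourly price per node
MAX_HOURLY_BUDGET = 20.0      # cost constraint set by the operator


def memory_nodes_to_add(data_size_gb: float, avg_replication: float,
                        current_nodes: int) -> int:
    """How many memory-tier nodes to add so capacity covers the data set."""
    needed = math.ceil(data_size_gb * avg_replication / NODE_CAPACITY_GB)
    affordable = int(MAX_HOURLY_BUDGET // NODE_COST_PER_HOUR)
    # Never plan below the current cluster size, and respect the cost cap.
    target = min(max(needed, current_nodes), affordable)
    return max(target - current_nodes, 0)


# e.g. 500 GB of data at an average replication factor of 2, with 12 nodes running:
print(memory_nodes_to_add(data_size_gb=500, avg_replication=2.0, current_nodes=12))  # -> 5
```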
  6. Dynamic Workload Skew & Volume
     • The latency SLO is satisfied 97% of the time.
     • Setup: 12 memory nodes with a latency objective of 3.3 ms.
     • At minute 3, a high-contention workload starts (Zipfian coefficient = 2); after a brief latency spike, the highly contended keys are replicated.
     • At minute 13, contention decreases and the volume increases 4x (Zipfian coefficient = 0.5); the policy engine reduces the replication factor and adds 4 new nodes.
  7. Limitations
     • The reactive policy design creates barriers to meeting SLOs and SLAs; proactive approaches could be a better fit.
     • Autoscaling overhead imposes a considerable time penalty for adding and removing nodes. Maintaining a "pre-warmed" pool of nodes could help, but is not cost-effective; leveraging micro-VM research such as Firecracker is another option. A short workload spike that triggers elasticity, followed by an immediate decrease, would lead Anna to allocate unnecessary nodes.
  8. Conclusion
     • Anna efficiently handles non-trivial distributions of access patterns for key-value storage.
     • It supports data volume variation by applying its elasticity mechanisms to add and remove nodes.
     • It handles bursts in skewed workloads by moving data between tiers, promoting and demoting frequently accessed keys.
     • Shifting hotspots are supported by hot-key replication, both intra-node and across nodes.
     https://github.com/hydro-project/anna