
Autoscaling Tiered Cloud Storage in Anna

Lucas Bleme
February 07, 2022


Presentation of the paper "Autoscaling Tiered Cloud Storage in Anna", given at the seminar of the Massive Data Processing course at DCC/UFMG, 2022/01.

Presentation (PT-BR): https://www.youtube.com/watch?v=gzspGRsFVEs
Original paper, published at VLDB 2019: https://dsf.berkeley.edu/jmh/papers/anna_vldb_19.pdf


Transcript

  1. Massive Data Processing in the Cloud
    Lucas Bleme



  2. Challenges on cloud KVS
    ● Data volume variation
    As the overall workload grows, the aggregate throughput of the system must grow: the system should automatically increase its resource allocation, and when the workload decreases, resource usage and cost should decrease with it.
    ● Skewness (skewed vs. uniform workloads)
    Even at a fixed volume, a highly skewed workload directs many requests to a small subset of keys.
    ● Shifting hotspots
    Hot data may become cold and vice versa; the system should prioritize data in the new hot set and demote data in the old one.


  3. Anna architecture
    ● Policy engine: responds to workload changes to meet the SLO.
    ● Cluster management: modifies resource allocation based on metrics and decisions from the policy engine.
    ● Memory tier: vertical tiering for hot data; EC2 instance RAM to reduce latency.
    ● EBS tier: vertical tiering for cold data; EBS disks to balance cost.

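    The control loop implied by this architecture can be sketched roughly as follows (a minimal Python sketch; the component names, the SLO check, and the "add_memory_node" action are illustrative assumptions, not Anna's actual interfaces):

        import random
        import time

        class Monitor:
            """Stands in for the monitoring system: reports metrics once per epoch."""
            def collect(self):
                return {"p99_latency_ms": random.uniform(1.0, 6.0)}

        class Policy:
            """Stands in for the policy engine: compares metrics against the SLO."""
            def __init__(self, slo_ms=3.3):
                self.slo_ms = slo_ms

            def decide(self, metrics):
                return ["add_memory_node"] if metrics["p99_latency_ms"] > self.slo_ms else []

        class ClusterManager:
            """Stands in for cluster management: applies the allocation change."""
            def apply(self, action):
                print(f"applying: {action}")

        def control_loop(monitor, policy, cluster, epochs=3, period_secs=0.1):
            # Each epoch: collect metrics, decide on actions, apply them.
            for _ in range(epochs):
                for action in policy.decide(monitor.collect()):
                    cluster.apply(action)
                time.sleep(period_secs)

        control_loop(Monitor(), Policy(), ClusterManager())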


  4. Storage Kernel
    ● Worker threads vary according to the hardware
    Memory nodes: one worker thread per CPU core.
    Disk (EBS) nodes: four worker threads per CPU core.
    ● Hash rings map keys to resources
    A global ring determines which nodes store each key; a local ring determines which worker threads within a node store it (see the sketch below).
    ● Periodic multicast between peer nodes
    Shared-nothing, asynchronous messaging scheme.
    High CPU utilization (around 90%) while being multi-mastered.

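    A minimal Python sketch of the hash-ring lookups described above (not the paper's C++ implementation); the node and thread names, the number of virtual nodes, and the use of MD5 as the hash function are assumptions for illustration:

        import hashlib
        from bisect import bisect

        class HashRing:
            """Consistent-hash ring mapping keys to members (nodes or threads)."""

            def __init__(self, members, vnodes=100):
                # Each member gets `vnodes` virtual positions to smooth the key distribution.
                self._ring = sorted(
                    (self._hash(f"{m}-{i}"), m) for m in members for i in range(vnodes)
                )
                self._points = [p for p, _ in self._ring]

            @staticmethod
            def _hash(s):
                return int(hashlib.md5(s.encode()).hexdigest(), 16)

            def lookup(self, key, replicas=1):
                # Walk clockwise from the key's position, collecting distinct members.
                idx = bisect(self._points, self._hash(key)) % len(self._ring)
                found = []
                while len(found) < replicas:
                    member = self._ring[idx][1]
                    if member not in found:
                        found.append(member)
                    idx = (idx + 1) % len(self._ring)
                return found

        # Global ring: which memory-tier nodes store the key (replication factor 2).
        global_ring = HashRing(["memory-node-0", "memory-node-1", "memory-node-2"])
        print(global_ring.lookup("user:42", replicas=2))

        # Local ring on each chosen node: which of its worker threads handle the key.
        local_ring = HashRing([f"thread-{t}" for t in range(8)])  # 8 cores -> 8 threads
        print(local_ring.lookup("user:42"))

    Virtual nodes keep the amount of key movement small when nodes join or leave, which matters for the elasticity actions discussed later.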


  5. Data Movement & Hot-Key Replication
    ● If a key's access frequency exceeds a promotion threshold P, Anna promotes an EBS replica of the key to the memory tier.
    If the tier's aggregate storage is full, Anna adds nodes to increase capacity before performing the data movement.
    ● If a key's access frequency falls below a demotion threshold D and the key still has memory-tier replicas, it is demoted to the EBS tier (see the sketch below).
    ● Hot keys are replicated across memory-tier nodes, with a replication factor that depends on their access frequency.
    Data can be replicated across nodes and across cores (intra-node replicas); replicating across nodes is preferable, since replicating across cores makes all threads of a single node compete for the same network bandwidth.

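    A rough Python sketch of the promotion/demotion decision above; the threshold values, the per-epoch access counts, and the dictionary-based "tiers" are placeholder assumptions, not Anna's actual policy code:

        # Illustrative thresholds; Anna derives its values from monitoring statistics.
        PROMOTE_THRESHOLD_P = 1000   # accesses per epoch above which a key is promoted
        DEMOTE_THRESHOLD_D = 10      # accesses per epoch below which a key is demoted

        def rebalance(key_stats, memory_tier, ebs_tier, memory_capacity):
            """Move keys between tiers based on last-epoch access counts.

            key_stats: dict key -> access count; memory_tier / ebs_tier: dict key -> value.
            Returns how many extra memory nodes the policy would request.
            """
            extra_nodes = 0
            for key, accesses in key_stats.items():
                if accesses > PROMOTE_THRESHOLD_P and key in ebs_tier and key not in memory_tier:
                    if len(memory_tier) >= memory_capacity:
                        extra_nodes += 1                  # grow capacity before moving data
                        memory_capacity += 1              # assume the new node is available
                    memory_tier[key] = ebs_tier[key]      # promote a copy of the EBS replica
                elif accesses < DEMOTE_THRESHOLD_D and key in memory_tier:
                    ebs_tier[key] = memory_tier.pop(key)  # demote the key to the cold tier
            return extra_nodes

        # Example: "hot" gets promoted into memory, "cold" gets demoted to EBS.
        stats = {"hot": 5000, "cold": 2}
        memory, ebs = {"cold": "v0"}, {"hot": "v1"}
        print(rebalance(stats, memory, ebs, memory_capacity=10), memory, ebs)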


  6. Elasticity
    ● Node addition happens when there is insufficient storage or insufficient compute capacity.
    The policy engine computes the number of nodes required based on the data size, subject to cost constraints (see the sketch below).
    Node addition never happens in the EBS tier (a memory-tier node supports about 15x the requests).
    ● Node removal
    The departing node queries the hash ring and updates it to remove itself, then broadcasts its departure to all nodes in the system (storage, monitoring, and routing).
    ● Grace periods
    Modifying the resource allocation briefly increases request latency, so key demotion, hot-key replication, and further elasticity actions are delayed until the grace period ends.

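    A toy Python sketch of the node-count computation and the grace period described above; the per-node capacity, the cost cap, the replication overhead, and the 120-second grace period are assumed values for illustration, not the paper's exact policy:

        import math
        import time

        NODE_MEMORY_BYTES = 16 * 2**30   # assumed usable RAM per memory-tier node
        MAX_MEMORY_NODES = 32            # assumed cost cap set by the operator
        GRACE_PERIOD_SECS = 120          # assumed settling time after a scaling action

        _last_action_at = float("-inf")

        def memory_nodes_needed(total_data_bytes, replication_overhead=1.5):
            """Nodes required to hold the data set plus replicas, capped by cost."""
            needed = math.ceil(total_data_bytes * replication_overhead / NODE_MEMORY_BYTES)
            return min(needed, MAX_MEMORY_NODES)

        def may_take_action(now=None):
            """Suppress demotion, hot-key replication, and elasticity during the grace period."""
            now = time.monotonic() if now is None else now
            return (now - _last_action_at) >= GRACE_PERIOD_SECS

        def record_action(now=None):
            """Mark that an allocation change just happened, starting a new grace period."""
            global _last_action_at
            _last_action_at = time.monotonic() if now is None else now

        # Example: 300 GiB of data with 1.5x replication overhead needs 29 memory nodes.
        print(memory_nodes_needed(300 * 2**30))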

  7. Dynamic Workload Skew & Volume
    ● Latency SLO satisfied 97% of the time.
    ● Setup: 12 memory nodes with a latency objective of 3.3 ms.
    At minute 3 a high-contention workload starts (Zipfian coefficient = 2).
    After a brief latency spike, the highly contended keys get replicated.
    At minute 13 contention drops and the request volume increases 4x (Zipfian coefficient = 0.5).
    The policy engine reduces the replication factor and adds 4 new nodes.



  8. Limitations
    ● The reactive policy design creates barriers to meeting SLOs and SLAs.
    Proactive (predictive) approaches could be a better fit.
    ● Autoscaling overheads impose a considerable time penalty for adding and removing nodes.
    Maintaining a "pre-warmed" pool of nodes could help, but it is not cost-effective.
    Leveraging research on micro-VMs, such as Firecracker, could reduce this penalty.
    A short workload spike that triggers elasticity, followed by an immediate decrease, would lead Anna to allocate unnecessary nodes.



  9. Conclusion
    Anna is efficient at handling non-trivial access-pattern distributions for key-value storage.
    It supports data volume variation by applying its elasticity mechanisms for adding and removing nodes.
    It handles bursts in skewed workloads by moving data between tiers, promoting and demoting frequently accessed keys.
    Shifting hotspots are supported by hot-key replication, both within a node and across nodes.
    https://github.com/hydro-project/anna


  10. Thank you! https://speakerdeck.com/andreybleme
    Lucas Bleme
    [email protected]
