Sharding Strategies

Let's explore the three main sharding strategies - Lookup, Range, and Hash - and dive into their unique characteristics, benefits, and ideal use cases with simple examples.

Matteo Bertozzi

June 01, 2023

Transcript

  1. Sharding Strategies. Shard A, Shard B, Shard C. Horizontal Scalability. Thread/Shard per core Architecture.
  2. Sharding. Horizontal Scalability: scaling your system by adding more machines to handle increased load or data volume.
  3. Thread/Shard per core Architecture.
  4. Sharding: distributing data for scalable performance.
  5. The Lookup Strategy distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
  6. Shard A
  7. Shard A, Shard B
  8. Shard A, Shard B, Shard C
  9. The Lookup Strategy routing map:

    Key               Shard
    Tenant1:News      Shard A
    Tenant1:*         Shard B
    Tenant2:Products  Shard C
    Tenant2:Users     Shard C
    Tenant2:*         Shard B
    Tenant3:*         Shard A
    Tenant4:*         Shard C
  10. Incoming key: “Tenant2:Users:Th30z:…”
  11. “Tenant2:Users:Th30z:…” -> getShard(“Tenant2:Users”)
  12. “Tenant2:Users:Th30z:…” -> getShard(“Tenant2:Users”) = Shard C
  13. Shard C now holds “Tenant2:Users:Th30z:…”.
  14. “Tenant3:Users:Foo:…” -> getShard(“Tenant3:Users”) = Shard A (no exact entry, so the Tenant3:* row applies)
  15. “Tenant2:Products:PC:…” -> getShard(“Tenant2:Products”) = Shard C
  16. “Tenant2:News:IPO:…” -> getShard(“Tenant2:News”) = Shard B (no exact entry, so the Tenant2:* row applies)
  17. “Tenant1:Products:Car:…” -> getShard(“Tenant1:Products”) = Shard B (no exact entry, so the Tenant1:* row applies)
  18. Final placement: Shard A holds “Tenant3:Users:Foo:…”; Shard B holds “Tenant2:News:IPO:…” and “Tenant1:Products:Car:…”; Shard C holds “Tenant2:Users:Th30z:…” and “Tenant2:Products:PC:…”.
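The lookup walk-through above can be sketched in a few lines of Python. This is a minimal sketch, not the author's implementation: the snake_case name `get_shard` stands in for the slides' `getShard`, and treating a `Tenant:*` row as a per-tenant fallback when no exact `Tenant:Group` row exists is an assumption inferred from the examples.

```python
from typing import Dict

# Routing map copied from the slides. Treating "Tenant:*" as a per-tenant
# fallback is an assumption about how the wildcard rows are meant to work.
ROUTING_MAP: Dict[str, str] = {
    "Tenant1:News": "Shard A",
    "Tenant1:*": "Shard B",
    "Tenant2:Products": "Shard C",
    "Tenant2:Users": "Shard C",
    "Tenant2:*": "Shard B",
    "Tenant3:*": "Shard A",
    "Tenant4:*": "Shard C",
}


def get_shard(key: str) -> str:
    """Map a full key such as 'Tenant2:Users:Th30z' to its shard."""
    parts = key.split(":")
    if len(parts) < 2:
        raise KeyError(f"key has no 'Tenant:Group' prefix: {key!r}")
    tenant, group = parts[0], parts[1]

    # 1. Prefer an exact 'Tenant:Group' entry.
    exact = ROUTING_MAP.get(f"{tenant}:{group}")
    if exact is not None:
        return exact

    # 2. Otherwise fall back to the tenant-wide wildcard row.
    fallback = ROUTING_MAP.get(f"{tenant}:*")
    if fallback is None:
        raise KeyError(f"no route for key: {key!r}")
    return fallback


if __name__ == "__main__":
    for key in ("Tenant2:Users:Th30z", "Tenant3:Users:Foo", "Tenant2:News:IPO"):
        print(key, "->", get_shard(key))
```

Because the routing map is plain data, moving a tenant or a single group to another shard only requires changing one row, which is what makes this strategy convenient for logical segregation.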
  19. The Range Strategy distributes data across shards in a sorted manner, allowing efficient sequential scans and range-based queries.
  20. Shard 1, Shard 2, Shard 3
  21. The Range Strategy key ranges:

    Key        Shard
    aaaa-jjjj  Shard 1
    kkkk-ssss  Shard 2
    tttt-zzzz  Shard 3
  22. Incoming key: “Hello”
  23. “Hello” -> getShard(“Hello”)
  24. “Hello” -> getShard(“Hello”) = Shard 1
  25. “World” -> getShard(“World”) = Shard 3
  26. “Test” -> getShard(“Test”) = Shard 3
  27. “Demo” -> getShard(“Demo”) = Shard 1
  28. “Showcase” -> getShard(“Showcase”) = Shard 2
  29. Placement: Shard 1 (aaaa-jjjj) holds Hello and Demo; Shard 2 (kkkk-ssss) holds Showcase; Shard 3 (tttt-zzzz) holds World and Test.
  30. Range query: get anything from A to E -> getShard(A-E) = Shard 1
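The range walk-through above can be sketched with a sorted list of upper bounds and a binary search. This is a sketch under stated assumptions: keys are compared case-insensitively (the slides place "Hello" in aaaa-jjjj), each shard is assumed to own everything up to its upper bound (so a key falling between "jjjj" and "kkkk" goes to Shard 2), and `get_shards_for_range` is a hypothetical helper name for the "A to E" query.

```python
import bisect
from typing import List, Tuple

# Inclusive upper bound of each range, in sorted order, taken from the slides.
RANGES: List[Tuple[str, str]] = [
    ("jjjj", "Shard 1"),  # aaaa-jjjj
    ("ssss", "Shard 2"),  # kkkk-ssss
    ("zzzz", "Shard 3"),  # tttt-zzzz
]
UPPER_BOUNDS: List[str] = [upper for upper, _ in RANGES]


def _index(key: str) -> int:
    """Index of the first range whose upper bound is >= key."""
    idx = bisect.bisect_left(UPPER_BOUNDS, key.lower())
    if idx == len(RANGES):
        raise KeyError(f"key is beyond the last range: {key!r}")
    return idx


def get_shard(key: str) -> str:
    """Shard that owns a single key."""
    return RANGES[_index(key)][1]


def get_shards_for_range(start: str, end: str) -> List[str]:
    """Shards to visit, in key order, for a scan over [start, end]."""
    first, last = _index(start), _index(end)
    return [RANGES[i][1] for i in range(first, last + 1)]


if __name__ == "__main__":
    for key in ("Hello", "World", "Test", "Demo", "Showcase"):
        print(key, "->", get_shard(key))
    print("A to E ->", get_shards_for_range("A", "E"))
```

A scan that stays inside one range touches a single shard, which is the property the slides highlight. The flip side is that skewed keys (for example many keys starting with the same letter) concentrate load on one shard.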
  31. The Hash Strategy distributes data across shards based on a hash function, ensuring even data distribution and load balancing.
  32. Shard 0, Shard 1, Shard 2
  33. Shard =
  34. Shard = hash(Key)
  35. Shard = hash(Key) % NumShards
  36. Non-cryptographic hash functions: xxh3, murmur3, sipHash, spookyHash, city64, …
  37. Incoming key: “Hello”
  38. “Hello” -> hash(“Hello”) % 3
  39. “Hello” -> hash(“Hello”) % 3 = Shard 1
  40. Shard 1 now holds Hello.
  41. “World” -> hash(“World”) % 3 = Shard 0
  42. “Test” -> hash(“Test”) % 3 = Shard 0
  43. “Demo” -> hash(“Demo”) % 3 = Shard 2
  44. “Showcase” -> hash(“Showcase”) % 3 = Shard 1
  45. Placement: Shard 0 holds World and Test; Shard 1 holds Hello and Showcase; Shard 2 holds Demo.
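The formula Shard = hash(Key) % NumShards can be sketched as follows. The slides do not say which hash produced their example values, so this sketch swaps in 64-bit FNV-1a, a simple non-cryptographic hash that is easy to write in pure Python; the shard numbers it yields will not necessarily match the slides. Python's built-in `hash()` is avoided on purpose because string hashes are randomized per process, which would route the same key to different shards after a restart.

```python
from collections import Counter

# 64-bit FNV-1a parameters (stand-in for xxh3/murmur3/etc. from the slides).
FNV_OFFSET_BASIS = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Stable 64-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK_64
    return h


def get_shard(key: str, num_shards: int) -> int:
    """Shard = hash(Key) % NumShards."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return fnv1a_64(key.encode("utf-8")) % num_shards


if __name__ == "__main__":
    for key in ("Hello", "World", "Test", "Demo", "Showcase"):
        print(key, "-> Shard", get_shard(key, 3))
    # Rough look at how evenly many keys spread over 3 shards.
    print(Counter(get_shard(f"key-{i}", 3) for i in range(3000)))
```

Unlike the lookup and range strategies, no routing table is needed: any node that knows the hash function and the shard count can compute the owner of a key.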
  46. Thread/Shard per core Architecture. The Hash Strategy: Shard = hash(Key) % NumShards. Consistent Lookup Performance.
  47. Balanced Data Distribution.
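One way to picture a thread/shard per core design is one worker thread per shard, each owning a private map that no other thread touches, with requests routed by hash(Key) % NumShards. The sketch below is illustrative only: `ShardWorker` and `ShardedStore` are hypothetical names, `zlib.crc32` is a stand-in for the hash functions listed in the slides, and Python threads are not pinned to cores, so this shows the ownership model rather than the performance.

```python
import queue
import threading
import zlib
from typing import Dict, List, Optional, Tuple


class ShardWorker(threading.Thread):
    """Owns one shard; only this thread ever writes to self.data."""

    def __init__(self, shard_id: int) -> None:
        super().__init__(daemon=True)
        self.shard_id = shard_id
        self.inbox: "queue.Queue[Optional[Tuple[str, str]]]" = queue.Queue()
        self.data: Dict[str, str] = {}

    def run(self) -> None:
        while True:
            item = self.inbox.get()
            if item is None:  # shutdown signal
                break
            key, value = item
            self.data[key] = value  # no lock needed: single owner


class ShardedStore:
    """Routes each write to the worker that owns the key's shard."""

    def __init__(self, num_shards: int) -> None:
        self.workers: List[ShardWorker] = [ShardWorker(i) for i in range(num_shards)]
        for worker in self.workers:
            worker.start()

    def shard_for(self, key: str) -> int:
        return zlib.crc32(key.encode("utf-8")) % len(self.workers)

    def put(self, key: str, value: str) -> None:
        self.workers[self.shard_for(key)].inbox.put((key, value))

    def close(self) -> None:
        """Drain all inboxes and stop the workers."""
        for worker in self.workers:
            worker.inbox.put(None)
        for worker in self.workers:
            worker.join()


if __name__ == "__main__":
    store = ShardedStore(num_shards=3)
    for i in range(10):
        store.put(f"key-{i}", f"value-{i}")
    store.close()
    for worker in store.workers:
        print("Shard", worker.shard_id, "holds", sorted(worker.data))
```

Since every key has exactly one owning thread, the per-shard maps need no locks, and the cost of finding the owner is one hash computation regardless of how much data is stored.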
  61. Mix multiple Strategies: Lookup Strategy to isolate a Tenant's Resources + Hash Strategy to distribute data across them.
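Combining the two strategies might look like the sketch below: a lookup step picks the group of shards reserved for a tenant, and a hash step spreads that tenant's keys across the group. The tenant names, the shard names in `TENANT_SHARDS`, and the use of `zlib.crc32` as the hash are all assumptions made for illustration.

```python
import zlib
from typing import Dict, List

# Hypothetical lookup table: each tenant is isolated on its own shard group.
TENANT_SHARDS: Dict[str, List[str]] = {
    "Tenant1": ["Shard A0", "Shard A1"],
    "Tenant2": ["Shard B0", "Shard B1", "Shard B2"],
}


def get_shard(key: str) -> str:
    """Lookup by tenant, then hash within the tenant's shard group."""
    tenant = key.split(":", 1)[0]
    shards = TENANT_SHARDS[tenant]  # lookup step; KeyError if tenant unknown
    index = zlib.crc32(key.encode("utf-8")) % len(shards)  # hash step
    return shards[index]


if __name__ == "__main__":
    for key in ("Tenant1:Users:Foo", "Tenant2:Users:Th30z", "Tenant2:Products:PC"):
        print(key, "->", get_shard(key))
```

A busy tenant can be given more shards by editing its row in the table, without affecting how other tenants' keys are placed.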
  62. Easy when numShards/machines are fixed. Not so easy when “Machines are moving”:
    - Data Migration
    - Adjust Shard Mapping
    - Rebalance Data
    Consistent Hashing
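To see why a changing shard count is the hard case, the sketch below compares plain modulo placement with a basic consistent-hash ring when a fourth shard is added. It is a minimal sketch, not a production design: `HashRing`, `moved_fraction`, the shard names, and the choice of 100 virtual nodes per shard are assumptions, and SHA-256 is used only as a stable, well-mixed hash from the standard library in place of the non-cryptographic functions named in the slides.

```python
import bisect
import hashlib
from typing import Callable, Dict, List


def stable_hash(value: str) -> int:
    """64-bit integer derived from SHA-256 (stand-in for a fast hash)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_shard(key: str, shards: List[str]) -> str:
    """Plain Shard = hash(Key) % NumShards placement."""
    return shards[stable_hash(key) % len(shards)]


class HashRing:
    """Consistent hashing: each shard owns many points on a ring."""

    def __init__(self, shards: List[str], vnodes: int = 100) -> None:
        self._owner: Dict[int, str] = {}
        for shard in shards:
            for i in range(vnodes):
                self._owner[stable_hash(f"{shard}#{i}")] = shard
        self._points: List[int] = sorted(self._owner)

    def get_shard(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, stable_hash(key))
        return self._owner[self._points[idx % len(self._points)]]


def moved_fraction(before: Callable[[str], str],
                   after: Callable[[str], str],
                   keys: List[str]) -> float:
    """Fraction of keys whose shard changes between two placements."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


if __name__ == "__main__":
    old = ["Shard A", "Shard B", "Shard C"]
    new = old + ["Shard D"]
    keys = [f"key-{i}" for i in range(10_000)]

    modulo = moved_fraction(lambda k: modulo_shard(k, old),
                            lambda k: modulo_shard(k, new), keys)
    ring_old, ring_new = HashRing(old), HashRing(new)
    ring = moved_fraction(ring_old.get_shard, ring_new.get_shard, keys)
    print(f"modulo placement moved {modulo:.0%} of keys")
    print(f"consistent hashing moved {ring:.0%} of keys")
```

With modulo placement, going from 3 to 4 shards is expected to relocate about three quarters of the keys, because a key stays put only when its hash gives the same remainder for both counts. On the ring, only keys that now fall just before one of the new shard's points move, and they all move to the new shard, so the expected share is about one quarter.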