Slide 1

Slide 1 text

Sharding Strategies
Horizontal Scalability and the Thread/Shard-per-Core Architecture
(Diagram: Shard A, Shard B, Shard C)

Slide 2

Slide 2 text

Sharding: partitioning data for enhanced scalability

Slide 3

Slide 3 text

Sharding

Slide 6

Slide 6 text

Sharding
Horizontal Scalability

Slide 7

Slide 7 text

Sharding
Horizontal Scalability: scaling your system by adding more machines to handle increased load or data volume.

Slide 8

Slide 8 text

Sharding
Horizontal Scalability: scaling your system by adding more machines to handle increased load or data volume.
Thread/Shard-per-Core Architecture

Slide 9

Slide 9 text

Sharding
Horizontal Scalability: scaling your system by adding more machines to handle increased load or data volume.
Thread/Shard-per-Core Architecture: distributing data for scalable performance.

Slide 10

Slide 10 text

Sharding Strategies

Slide 11

Slide 11 text

The Lookup Strategy

Slide 12

Slide 12 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.

Slide 13

Slide 13 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A

Slide 14

Slide 14 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B

Slide 15

Slide 15 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B, Shard C

Slide 16

Slide 16 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B, Shard C
Routing map:
Key               Shard
Tenant1:News      Shard A
Tenant1:*         Shard B
Tenant2:Products  Shard C
Tenant2:Users     Shard C
Tenant2:*         Shard B
Tenant3:*         Shard A
Tenant4:*         Shard C

Slide 17

Slide 17 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B, Shard C
Routing map:
Key               Shard
Tenant1:News      Shard A
Tenant1:*         Shard B
Tenant2:Products  Shard C
Tenant2:Users     Shard C
Tenant2:*         Shard B
Tenant3:*         Shard A
Tenant4:*         Shard C
Incoming key: “Tenant2:Users:Th30z:…”

Slide 18

Slide 18 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B, Shard C
Routing map:
Key               Shard
Tenant1:News      Shard A
Tenant1:*         Shard B
Tenant2:Products  Shard C
Tenant2:Users     Shard C
Tenant2:*         Shard B
Tenant3:*         Shard A
Tenant4:*         Shard C
“Tenant2:Users:Th30z:…” -> getShard(“Tenant2:Users”)

Slide 19

Slide 19 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B, Shard C
Routing map:
Key               Shard
Tenant1:News      Shard A
Tenant1:*         Shard B
Tenant2:Products  Shard C
Tenant2:Users     Shard C
Tenant2:*         Shard B
Tenant3:*         Shard A
Tenant4:*         Shard C
“Tenant2:Users:Th30z:…” -> getShard(“Tenant2:Users”) = Shard C

Slide 20

Slide 20 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B, Shard C
Routing map:
Key               Shard
Tenant1:News      Shard A
Tenant1:*         Shard B
Tenant2:Products  Shard C
Tenant2:Users     Shard C
Tenant2:*         Shard B
Tenant3:*         Shard A
Tenant4:*         Shard C
“Tenant2:Users:Th30z:…” -> getShard(“Tenant2:Users”) = Shard C (stored in Shard C)

Slide 21

Slide 21 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B, Shard C
Routing map:
Key               Shard
Tenant1:News      Shard A
Tenant1:*         Shard B
Tenant2:Products  Shard C
Tenant2:Users     Shard C
Tenant2:*         Shard B
Tenant3:*         Shard A
Tenant4:*         Shard C
“Tenant2:Users:Th30z:…” -> getShard(“Tenant2:Users”) = Shard C
“Tenant3:Users:Foo:…” -> getShard(“Tenant3:Users”) = Shard A (no exact entry, so Tenant3:* applies)

Slide 22

Slide 22 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B, Shard C
Routing map:
Key               Shard
Tenant1:News      Shard A
Tenant1:*         Shard B
Tenant2:Products  Shard C
Tenant2:Users     Shard C
Tenant2:*         Shard B
Tenant3:*         Shard A
Tenant4:*         Shard C
“Tenant2:Users:Th30z:…” -> getShard(“Tenant2:Users”) = Shard C
“Tenant3:Users:Foo:…” -> getShard(“Tenant3:Users”) = Shard A (no exact entry, so Tenant3:* applies)
“Tenant2:Products:PC:…” -> getShard(“Tenant2:Products”) = Shard C

Slide 23

Slide 23 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B, Shard C
Routing map:
Key               Shard
Tenant1:News      Shard A
Tenant1:*         Shard B
Tenant2:Products  Shard C
Tenant2:Users     Shard C
Tenant2:*         Shard B
Tenant3:*         Shard A
Tenant4:*         Shard C
“Tenant2:Users:Th30z:…” -> getShard(“Tenant2:Users”) = Shard C
“Tenant3:Users:Foo:…” -> getShard(“Tenant3:Users”) = Shard A (no exact entry, so Tenant3:* applies)
“Tenant2:Products:PC:…” -> getShard(“Tenant2:Products”) = Shard C
“Tenant2:News:IPO:…” -> getShard(“Tenant2:News”) = Shard B (no exact entry, so Tenant2:* applies)

Slide 24

Slide 24 text

Sharding: The Lookup Strategy
Distributes data across shards using a routing map that directs requests based on a specific key, allowing logical categorization/segregation.
Shards: Shard A, Shard B, Shard C
Routing map:
Key               Shard
Tenant1:News      Shard A
Tenant1:*         Shard B
Tenant2:Products  Shard C
Tenant2:Users     Shard C
Tenant2:*         Shard B
Tenant3:*         Shard A
Tenant4:*         Shard C
“Tenant2:Users:Th30z:…” -> getShard(“Tenant2:Users”) = Shard C
“Tenant3:Users:Foo:…” -> getShard(“Tenant3:Users”) = Shard A (no exact entry, so Tenant3:* applies)
“Tenant2:Products:PC:…” -> getShard(“Tenant2:Products”) = Shard C
“Tenant2:News:IPO:…” -> getShard(“Tenant2:News”) = Shard B (no exact entry, so Tenant2:* applies)
“Tenant1:Products:Car:…” -> getShard(“Tenant1:Products”) = Shard B (no exact entry, so Tenant1:* applies)
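
A minimal Python sketch of the lookup walk-through. It rests on two assumptions that come from the slide examples rather than from any stated rule:
- The routing prefix is the first two colon-separated segments of a key.
- An exact entry takes precedence over the tenant's `*` wildcard.

`ROUTING_MAP`, `get_shard` and `route` are illustrative names; the slides' `getShard` is the author's notation. The elided `…` key suffixes are simply left off.

```python
# Hypothetical routing map mirroring the slide's table. Exact "Tenant:Category"
# entries take precedence over the tenant-wide "Tenant:*" wildcard.
ROUTING_MAP = {
    "Tenant1:News": "Shard A",
    "Tenant1:*": "Shard B",
    "Tenant2:Products": "Shard C",
    "Tenant2:Users": "Shard C",
    "Tenant2:*": "Shard B",
    "Tenant3:*": "Shard A",
    "Tenant4:*": "Shard C",
}


def get_shard(prefix: str) -> str:
    """Resolve a "Tenant:Category" prefix: exact entry first, then "Tenant:*"."""
    if prefix in ROUTING_MAP:
        return ROUTING_MAP[prefix]
    tenant = prefix.split(":", 1)[0]
    wildcard = f"{tenant}:*"
    if wildcard in ROUTING_MAP:
        return ROUTING_MAP[wildcard]
    raise KeyError(f"no route for {prefix!r}")


def route(key: str) -> str:
    """Route a full key such as "Tenant2:Users:Th30z" by its first two segments."""
    tenant, category = key.split(":", 2)[:2]
    return get_shard(f"{tenant}:{category}")


# The five keys from the slides, with the elided suffixes left off.
for key in ["Tenant2:Users:Th30z", "Tenant3:Users:Foo", "Tenant2:Products:PC",
            "Tenant2:News:IPO", "Tenant1:Products:Car"]:
    print(f"{key} -> {route(key)}")
# Tenant2:Users:Th30z -> Shard C
# Tenant3:Users:Foo -> Shard A
# Tenant2:Products:PC -> Shard C
# Tenant2:News:IPO -> Shard B
# Tenant1:Products:Car -> Shard B
```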

Slide 26

Slide 26 text

The Range Strategy

Slide 27

Slide 27 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.

Slide 28

Slide 28 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.
Shards: Shard 1, Shard 2, Shard 3

Slide 29

Slide 29 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.
Shards: Shard 1, Shard 2, Shard 3
Range map:
Key        Shard
aaaa-jjjj  Shard 1
kkkk-ssss  Shard 2
tttt-zzzz  Shard 3

Slide 30

Slide 30 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.
Shards: Shard 1, Shard 2, Shard 3
Range map:
Key        Shard
aaaa-jjjj  Shard 1
kkkk-ssss  Shard 2
tttt-zzzz  Shard 3
Incoming key: “Hello”

Slide 31

Slide 31 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.
Shards: Shard 1, Shard 2, Shard 3
Range map:
Key        Shard
aaaa-jjjj  Shard 1
kkkk-ssss  Shard 2
tttt-zzzz  Shard 3
“Hello” -> getShard(“Hello”)

Slide 32

Slide 32 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.
Shards: Shard 1, Shard 2, Shard 3
Range map:
Key        Shard
aaaa-jjjj  Shard 1
kkkk-ssss  Shard 2
tttt-zzzz  Shard 3
“Hello” -> getShard(“Hello”) = Shard 1

Slide 33

Slide 33 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.
Shards: Shard 1, Shard 2, Shard 3
Range map:
Key        Shard
aaaa-jjjj  Shard 1
kkkk-ssss  Shard 2
tttt-zzzz  Shard 3
“Hello” -> getShard(“Hello”) = Shard 1
“World” -> getShard(“World”) = Shard 3

Slide 34

Slide 34 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.
Shards: Shard 1, Shard 2, Shard 3
Range map:
Key        Shard
aaaa-jjjj  Shard 1
kkkk-ssss  Shard 2
tttt-zzzz  Shard 3
“Hello” -> getShard(“Hello”) = Shard 1
“World” -> getShard(“World”) = Shard 3
“Test” -> getShard(“Test”) = Shard 3

Slide 35

Slide 35 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.
Shards: Shard 1, Shard 2, Shard 3
Range map:
Key        Shard
aaaa-jjjj  Shard 1
kkkk-ssss  Shard 2
tttt-zzzz  Shard 3
“Hello” -> getShard(“Hello”) = Shard 1
“World” -> getShard(“World”) = Shard 3
“Test” -> getShard(“Test”) = Shard 3
“Demo” -> getShard(“Demo”) = Shard 1

Slide 36

Slide 36 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.
Shards: Shard 1, Shard 2, Shard 3
Range map:
Key        Shard
aaaa-jjjj  Shard 1
kkkk-ssss  Shard 2
tttt-zzzz  Shard 3
“Hello” -> getShard(“Hello”) = Shard 1
“World” -> getShard(“World”) = Shard 3
“Test” -> getShard(“Test”) = Shard 3
“Demo” -> getShard(“Demo”) = Shard 1
“Showcase” -> getShard(“Showcase”) = Shard 2
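
The range lookup can be sketched in Python with a sorted list of range start keys and `bisect`. The sketch assumes that the slide's `aaaa-jjjj`, `kkkk-ssss` and `tttt-zzzz` mean "first letter a to j, k to s, t to z", compared case-insensitively. Read as literal string bounds, "test" would sort before "tttt" and would not land in Shard 3. `RANGE_STARTS`, `SHARDS` and `get_shard` are hypothetical names.

```python
import bisect

# Hypothetical reading of the slide's range map: each shard owns the keys whose
# first letter falls in its range (a-j, k-s, t-z). Only the sorted lower bound
# of each range is stored.
RANGE_STARTS = ["a", "k", "t"]
SHARDS = ["Shard 1", "Shard 2", "Shard 3"]


def get_shard(key: str) -> str:
    """Return the shard whose range contains key, comparing case-insensitively."""
    k = key.lower()
    # bisect_right counts the range starts <= k; the last of them owns the key.
    idx = bisect.bisect_right(RANGE_STARTS, k) - 1
    if idx < 0:
        raise KeyError(f"{key!r} sorts before the first range")
    return SHARDS[idx]


for key in ["Hello", "World", "Test", "Demo", "Showcase"]:
    print(f"{key} -> {get_shard(key)}")
# Hello -> Shard 1
# World -> Shard 3
# Test -> Shard 3
# Demo -> Shard 1
# Showcase -> Shard 2
```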

Slide 38

Slide 38 text

Sharding: The Range Strategy
Distributes data across shards in sorted key order, allowing efficient sequential scans and range-based queries.
Shards: Shard 1, Shard 2, Shard 3
Range map:
Key        Shard
aaaa-jjjj  Shard 1
kkkk-ssss  Shard 2
tttt-zzzz  Shard 3
“Hello” -> getShard(“Hello”) = Shard 1
“World” -> getShard(“World”) = Shard 3
“Test” -> getShard(“Test”) = Shard 3
“Demo” -> getShard(“Demo”) = Shard 1
“Showcase” -> getShard(“Showcase”) = Shard 2
Get anything from A to E -> getShard(A-E) = Shard 1
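
Range queries are where this strategy pays off: only the shards whose range overlaps the query need to be visited. Below is a hypothetical Python sketch that uses the same first-letter reading of the slide's ranges. `RANGES` and `shards_for_range` are illustrative names.

```python
# Hypothetical first-letter ranges from the slide: (first, last, shard), inclusive.
RANGES = [("a", "j", "Shard 1"), ("k", "s", "Shard 2"), ("t", "z", "Shard 3")]


def shards_for_range(lo: str, hi: str) -> list[str]:
    """Return only the shards whose letter range overlaps the query [lo, hi]."""
    lo_c, hi_c = lo[0].lower(), hi[0].lower()
    # Two inclusive intervals overlap when each one starts before the other ends.
    return [shard for first, last, shard in RANGES if first <= hi_c and lo_c <= last]


print(shards_for_range("A", "E"))  # ['Shard 1']
print(shards_for_range("H", "M"))  # ['Shard 1', 'Shard 2']
```

With the Hash Strategy the same query would have to fan out to every shard, because hashing discards key order.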

Slide 39

Slide 39 text

The Hash Strategy

Slide 40

Slide 40 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.

Slide 41

Slide 41 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2

Slide 42

Slide 42 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard =

Slide 43

Slide 43 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key)

Slide 44

Slide 44 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key) % NumShards

Slide 45

Slide 45 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key) % NumShards
Non-cryptographic hash functions: xxh3, murmur3, sipHash, spookyHash, city64, …
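
A sketch of `Shard = hash(Key) % NumShards` in Python. FNV-1a (64-bit) is swapped in only because it is a non-cryptographic hash that fits in a few lines of pure Python. It is not one of the functions listed on the slide, so its shard numbers will generally differ from the slide's.

Python's built-in `hash()` is deliberately avoided: for `str` it is salted per process (unless `PYTHONHASHSEED` is fixed), so the same key could map to a different shard after a restart. `fnv1a_64` and `get_shard` are illustrative names.

```python
FNV64_OFFSET = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001B3        # FNV 64-bit prime
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def get_shard(key: str, num_shards: int) -> int:
    """Shard = hash(Key) % NumShards, with a hash that is stable across runs."""
    return fnv1a_64(key.encode("utf-8")) % num_shards


for key in ["Hello", "World", "Test", "Demo", "Showcase"]:
    print(f"{key} -> Shard {get_shard(key, 3)}")
```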

Slide 46

Slide 46 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key) % NumShards
Non-cryptographic hash functions: xxh3, murmur3, sipHash, spookyHash, city64, …
Incoming key: “Hello”

Slide 47

Slide 47 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key) % NumShards
Non-cryptographic hash functions: xxh3, murmur3, sipHash, spookyHash, city64, …
“Hello” -> hash(“Hello”) % 3

Slide 48

Slide 48 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key) % NumShards
Non-cryptographic hash functions: xxh3, murmur3, sipHash, spookyHash, city64, …
“Hello” -> hash(“Hello”) % 3 = Shard 1

Slide 49

Slide 49 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key) % NumShards
Non-cryptographic hash functions: xxh3, murmur3, sipHash, spookyHash, city64, …
“Hello” -> hash(“Hello”) % 3 = Shard 1 (stored in Shard 1)

Slide 50

Slide 50 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key) % NumShards
Non-cryptographic hash functions: xxh3, murmur3, sipHash, spookyHash, city64, …
“Hello” -> hash(“Hello”) % 3 = Shard 1
“World” -> hash(“World”) % 3 = Shard 0

Slide 51

Slide 51 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key) % NumShards
Non-cryptographic hash functions: xxh3, murmur3, sipHash, spookyHash, city64, …
“Hello” -> hash(“Hello”) % 3 = Shard 1
“World” -> hash(“World”) % 3 = Shard 0
“Test” -> hash(“Test”) % 3 = Shard 0

Slide 52

Slide 52 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key) % NumShards
Non-cryptographic hash functions: xxh3, murmur3, sipHash, spookyHash, city64, …
“Hello” -> hash(“Hello”) % 3 = Shard 1
“World” -> hash(“World”) % 3 = Shard 0
“Test” -> hash(“Test”) % 3 = Shard 0
“Demo” -> hash(“Demo”) % 3 = Shard 2

Slide 53

Slide 53 text

Sharding: The Hash Strategy
Distributes data across shards based on a hash function; a well-mixed hash spreads keys evenly, balancing both data and load.
Shards: Shard 0, Shard 1, Shard 2
Shard = hash(Key) % NumShards
Non-cryptographic hash functions: xxh3, murmur3, sipHash, spookyHash, city64, …
“Hello” -> hash(“Hello”) % 3 = Shard 1
“World” -> hash(“World”) % 3 = Shard 0
“Test” -> hash(“Test”) % 3 = Shard 0
“Demo” -> hash(“Demo”) % 3 = Shard 2
“Showcase” -> hash(“Showcase”) % 3 = Shard 1

Slide 55

Slide 55 text

Sharding
The Hash Strategy: Shard = hash(Key) % NumShards
Thread/Shard-per-Core Architecture

Slide 56

Slide 56 text

Sharding
The Hash Strategy: Shard = hash(Key) % NumShards
Thread/Shard-per-Core Architecture:
- Consistent Lookup Performance

Slide 57

Slide 57 text

Sharding
The Hash Strategy: Shard = hash(Key) % NumShards
Thread/Shard-per-Core Architecture:
- Consistent Lookup Performance
- Balanced Data Distribution
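
The Thread/Shard-per-Core idea can be sketched as one worker thread per shard that exclusively owns its data. Each request is routed by `hash(Key) % NumShards` to the owning worker's queue. This is a hypothetical Python illustration:
- `zlib.crc32` stands in for a non-cryptographic hash.
- Because of CPython's GIL, it shows the ownership and message-passing pattern rather than real per-core parallelism. Production designs typically pin each thread to its own core.
- `ShardedStore`, `worker` and `shard_for` are made-up names.

```python
import queue
import threading
import zlib

NUM_SHARDS = 3  # hypothetical: one shard per core, each owned by exactly one thread


def shard_for(key: str) -> int:
    """Shard = hash(Key) % NumShards; crc32 stands in for a non-cryptographic hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def worker(inbox: queue.Queue) -> None:
    """Owns one shard. Only this thread touches `store`, so no locks are needed."""
    store = {}
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown signal
            return
        op, key, value, reply = msg
        if op == "put":
            store[key] = value
            reply.put(None)
        else:  # "get"
            reply.put(store.get(key))


class ShardedStore:
    """Routes every request to the single worker that owns the key's shard."""

    def __init__(self) -> None:
        self.inboxes = [queue.Queue() for _ in range(NUM_SHARDS)]
        self.threads = [
            threading.Thread(target=worker, args=(inbox,), daemon=True)
            for inbox in self.inboxes
        ]
        for t in self.threads:
            t.start()

    def _call(self, op: str, key: str, value=None):
        reply: queue.Queue = queue.Queue(maxsize=1)
        self.inboxes[shard_for(key)].put((op, key, value, reply))
        return reply.get()  # block until the owning worker answers

    def put(self, key: str, value) -> None:
        self._call("put", key, value)

    def get(self, key: str):
        return self._call("get", key)

    def close(self) -> None:
        for inbox in self.inboxes:
            inbox.put(None)
        for t in self.threads:
            t.join()


store = ShardedStore()
for word in ["Hello", "World", "Test", "Demo", "Showcase"]:
    store.put(word, len(word))
result = store.get("Showcase")  # 8
missing = store.get("Nope")     # None
store.close()
```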

Slide 71

Slide 71 text

Sharding: Mix Multiple Strategies
Lookup Strategy to isolate a tenant's resources
+ Hash Strategy to distribute data across them
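
A hypothetical Python sketch of combining the two strategies. A lookup table confines each tenant to its own group of shards, and a hash spreads that tenant's keys across the group. The tenant-to-shard assignments, the shard names and `get_shard` are illustrative assumptions, and `zlib.crc32` stands in for a non-cryptographic hash.

```python
import zlib

# Hypothetical tenant -> shard-group map (Lookup Strategy): each tenant's data
# is confined to its own set of shards.
TENANT_SHARDS = {
    "Tenant1": ["shard-a1", "shard-a2"],
    "Tenant2": ["shard-b1", "shard-b2", "shard-b3"],
}


def get_shard(tenant: str, key: str) -> str:
    group = TENANT_SHARDS[tenant]  # lookup: isolate the tenant's resources
    # hash: spread the tenant's keys evenly across its own shards
    return group[zlib.crc32(key.encode("utf-8")) % len(group)]


print(get_shard("Tenant2", "Users:Th30z"))
```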

Slide 72

Slide 72 text

Sharding
Easy when numShards/machines are fixed

Slide 73

Slide 73 text

Sharding
Easy when numShards/machines are fixed
Not so easy when “machines are moving”
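
Why moving machines hurts with plain modulo sharding: with a uniform hash `h`, a key keeps its shard when `NumShards` goes from 3 to 4 only if `h % 3 == h % 4`. That holds for just 3 of the 12 residues of `h % 12` (0, 1 and 2), so about three quarters of all keys must migrate. The Python sketch below measures this on synthetic keys. `blake2b` is used only as a stable stdlib hash, and `stable_hash` and `moved_fraction` are hypothetical helpers.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Any stable, well-mixed hash works here; blake2b is used only because it is in the stdlib.
    return int.from_bytes(hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest(), "big")


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when NumShards goes from old_n to new_n."""
    moved = sum(1 for k in keys if stable_hash(k) % old_n != stable_hash(k) % new_n)
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
frac = moved_fraction(keys, 3, 4)
print(f"{frac:.1%} of keys change shard when going from 3 to 4 shards")
```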

Slide 74

Slide 74 text

Sharding
Easy when numShards/machines are fixed
Not so easy when “machines are moving”:
- Data Migration
- Adjust Shard Mapping
- Rebalance Data
Consistent Hashing
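
Consistent Hashing limits that migration. The Python sketch below uses a common formulation; it is an assumption, not the deck's specific design:
- Each node places several virtual points on a hash ring.
- Each key belongs to the first point clockwise from its hash.

Adding a node then moves only the keys that fall on the new node's arcs, roughly 1/NumShards of them, and every moved key lands on the new node. `HashRing`, the 100 virtual nodes per shard and the `node#i` point naming are illustrative choices.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    return int.from_bytes(hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest(), "big")


class HashRing:
    """Consistent hashing: each node owns several arcs of the ring via virtual points."""

    def __init__(self, nodes, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._points, (stable_hash(f"{node}#{i}"), node))

    def get_node(self, key: str) -> str:
        # The first virtual point at or after the key's hash owns it, wrapping around.
        idx = bisect.bisect_right(self._points, (stable_hash(key), ""))
        return self._points[idx % len(self._points)][1]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["shard-0", "shard-1", "shard-2"])
before = {k: ring.get_node(k) for k in keys}
ring.add_node("shard-3")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved, all of them to shard-3")
```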