Slide 1

Slide 1 text

How & When to Shard

Slide 2

Slide 2 text

Solution Architect
Based in London
http://www.10gen.com/
@dmroberts
[email protected]

Slide 3

Slide 3 text

Agenda
• Architecture
• How it Works
• Choosing a shard key
• When to Shard

Slide 4

Slide 4 text

http://community.qlikview.com/cfs-filesystemfile.ashx/__key/CommunityServer.Blogs.Components.WeblogFiles/theqlikviewblog/Cutting-Grass-with-Scissors-_2D00_-2.jpg

Slide 5

Slide 5 text

http://www.bitquill.net/blog/wp-content/uploads/2008/07/pack_of_harvesters.jpg

Slide 6

Slide 6 text

MongoDB Scaling - Single Node
[Diagram: write and read arrows pointing at a single node, node_a1]

Slide 7

Slide 7 text

Read scaling - add Replicas
[Diagram: write and read arrows; nodes node_a1 and node_b1]

Slide 8

Slide 8 text

Read scaling - add Replicas
[Diagram: write and read arrows; nodes node_a1, node_b1 and node_c1]

Slide 9

Slide 9 text

Write scaling - Sharding
[Diagram: write and read arrows; shard1 contains node_a1, node_b1 and node_c1]

Slide 10

Slide 10 text

Write scaling - add Shards
[Diagram: write and read arrows; shard1 contains node_a1, node_b1 and node_c1; shard2 contains node_a2, node_b2 and node_c2]

Slide 11

Slide 11 text

Write scaling - add Shards
[Diagram: write and read arrows; shard1 contains node_a1, node_b1 and node_c1; shard2 contains node_a2, node_b2 and node_c2; shard3 contains node_a3, node_b3 and node_c3]

Slide 12

Slide 12 text

MongoDB Sharding
• Automatic partitioning and management
• Range based
• Convert to sharded system with no downtime
• Fully consistent

Slide 13

Slide 13 text

Range Based Partitioning
> db.posts.save( {age:40} )
[Diagram: the range -∞ to +∞ is split into -∞ to 40 and 41 to +∞]
• Data is inserted
• Ranges are split into more “chunks”

Slide 14

Slide 14 text

How MongoDB Sharding works
> db.posts.save( {age:40} )
> db.posts.save( {age:50} )
[Diagram: -∞ to +∞ is split into -∞ to 40 and 41 to +∞; 41 to +∞ is then split into 41 to 50 and 51 to +∞]
• More data is inserted
• Ranges are split into more “chunks”

Slide 15

Slide 15 text

How MongoDB Sharding works
> db.posts.save( {age:40} )
> db.posts.save( {age:50} )
> db.posts.save( {age:60} )
[Diagram: -∞ to +∞ is split into -∞ to 40 and 41 to +∞; 41 to +∞ into 41 to 50 and 51 to +∞; 51 to +∞ into 51 to 60 and 61 to +∞]

Slide 16

Slide 16 text

How MongoDB Sharding works
> db.posts.save( {age:40} )
> db.posts.save( {age:50} )
> db.posts.save( {age:60} )
[Diagram: the resulting chunks -∞ to 40, 41 to 50, 51 to 60 and 61 to +∞ all live on shard1]
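The splitting shown on these slides can be sketched as a toy model in plain JavaScript. This is not MongoDB's implementation: the names `findChunk` and `insertKey` and the limit `MAX_KEYS_PER_CHUNK` are assumptions made for illustration, and the split point is simply the median key. It also assumes distinct keys.

```javascript
// Toy model of range-based chunks. Each chunk covers [min, max) of the key.
const chunks = [{ min: -Infinity, max: Infinity, keys: [] }];
const MAX_KEYS_PER_CHUNK = 2; // assumption: tiny limit so splits are visible

// Find the chunk whose range contains the key.
function findChunk(key) {
  return chunks.find((c) => key >= c.min && key < c.max);
}

// Insert a key and split the chunk if it has grown past the limit.
// Assumes distinct keys; duplicates would need extra handling.
function insertKey(key) {
  const chunk = findChunk(key);
  chunk.keys.push(key);
  chunk.keys.sort((a, b) => a - b);
  if (chunk.keys.length > MAX_KEYS_PER_CHUNK) {
    // Split at the median key: lower half stays, upper half becomes a new chunk.
    const mid = Math.floor(chunk.keys.length / 2);
    const splitPoint = chunk.keys[mid];
    const upper = { min: splitPoint, max: chunk.max, keys: chunk.keys.slice(mid) };
    chunk.keys = chunk.keys.slice(0, mid);
    chunk.max = splitPoint;
    chunks.splice(chunks.indexOf(chunk) + 1, 0, upper);
  }
}

[40, 50, 60, 45, 55].forEach(insertKey);
console.log(chunks.map((c) => `[${c.min}, ${c.max}) holds ${c.keys.join(",")}`));
```

Every key always falls in exactly one chunk, because a split only moves the boundary between two adjacent ranges.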

Slide 17

Slide 17 text

How MongoDB Sharding works
> db.runCommand( { addshard : "shard2" } );
> db.runCommand( { addshard : "shard3" } );
[Diagram: the chunks -∞ to 40, 41 to 50, 51 to 60 and 61 to +∞ shown across shard1, shard2 and shard3]

Slide 18

Slide 18 text

SHARDING ARCHITECTURE

Slide 19

Slide 19 text

Architecture

Slide 20

Slide 20 text

mongos
• Shard Router
• Acts just like a mongod
• 1 or as many as you want
• Can run on App Servers
• Caches meta-data from config servers

Slide 21

Slide 21 text

Config Server
• 3 of them
• Changes use 2 phase commit
• If any are down, meta data goes read only
• System is online as long as 1 of the 3 is up

Slide 22

Slide 22 text

HOW IT WORKS

Slide 23

Slide 23 text

Keys
{ name: "Jared", email: "[email protected]" }
{ name: "Scott", email: "[email protected]" }
{ name: "Dan", email: "[email protected]" }
> db.runCommand( { shardcollection: "test.users", key: { email: 1 } } )

Slide 24

Slide 24 text

Chunks
[Diagram: a single range from -∞ to +∞]

Slide 25

Slide 25 text

Slide 26

Slide 26 text

Slide 27

Slide 27 text

Chunks
[Diagram: the range -∞ to +∞ with three email keys ([email protected]) marked on it. Split! Each side of the split is labelled "This is a chunk".]

Slide 28

Slide 28 text

Slide 29

Slide 29 text

Slide 30

Slide 30 text

Slide 31

Slide 31 text

Chunks
Min Key              Max Key              Shard
-∞                   [email protected]     1
[email protected]     [email protected]     1
[email protected]     [email protected]     1
[email protected]     +∞                   1
• Stored in the config servers
• Cached in mongos
• Used to route requests and keep cluster balanced
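A chunk table like the one above can be searched with a binary search over the split points. The sketch below is illustrative only: the email addresses, the shard names and the functions `chunkIndexFor` and `shardFor` are hypothetical, and string comparison stands in for the real key ordering.

```javascript
// Hypothetical split points for a collection sharded on { email: 1 }.
// Chunk i covers [splitPoints[i - 1], splitPoints[i]); the first chunk starts
// at -infinity and the last one ends at +infinity.
const splitPoints = ["d@example.com", "k@example.com", "s@example.com"];

// Shard that owns each chunk (always one more chunk than split points).
const chunkOwners = ["shard1", "shard1", "shard2", "shard3"];

// Binary search for the first split point greater than the key. That index
// is also the index of the chunk containing the key.
function chunkIndexFor(key) {
  let lo = 0;
  let hi = splitPoints.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (key < splitPoints[mid]) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Route a shard-key value to the shard that owns its chunk.
function shardFor(key) {
  return chunkOwners[chunkIndexFor(key)];
}

console.log(shardFor("alice@example.com"));
console.log(shardFor("k@example.com"));
console.log(shardFor("zoe@example.com"));
```

Because the table is small and changes rarely, caching it in the router and refreshing it when chunks move keeps lookups cheap.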

Slide 32

Slide 32 text

Balancing
[Diagram: mongos with the balancer, three config servers and four shards. Shard 1 holds chunks 1-12, Shard 2 holds 13-24, Shard 3 holds 25-36, Shard 4 holds 37-48.]
Chunks!

Slide 33

Slide 33 text

Balancing
[Diagram: mongos with the balancer, three config servers and four shards. Shard 1 holds chunks 1-12, Shard 2 holds 21-24, Shard 3 holds 33-36, Shard 4 holds 45-48.]
Imbalance

Slide 34

Slide 34 text

Balancing
[Diagram: mongos with the balancer, three config servers and four shards. Shard 1 holds chunks 1-12, Shard 2 holds 21-24, Shard 3 holds 33-36, Shard 4 holds 45-48.]
Move chunk 1 to Shard 2

Slide 35

Slide 35 text

Balancing
[Diagram: chunk 1 shown moving out of Shard 1. Shard 1 holds chunks 2-12, Shard 2 holds 21-24, Shard 3 holds 33-36, Shard 4 holds 45-48.]

Slide 36

Slide 36 text

Balancing
[Diagram: Shard 1 holds chunks 2-12, Shard 2 holds chunk 1 and 21-24, Shard 3 holds 33-36, Shard 4 holds 45-48.]
Chunk 1 now lives on Shard 2
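The balancing rounds above can be sketched as a loop that moves one chunk at a time from the fullest shard to the emptiest. This is a simplified model, not the real balancer: `balanceOnce` and `THRESHOLD` are hypothetical names, and the rule "stop when the difference is below 2 chunks" is an assumption chosen for the example.

```javascript
// Chunk layout from the "Imbalance" slide: Shard 1 holds chunks 1-12,
// the other shards hold four chunks each.
const shards = {
  shard1: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
  shard2: [21, 22, 23, 24],
  shard3: [33, 34, 35, 36],
  shard4: [45, 46, 47, 48],
};
const THRESHOLD = 2; // assumption: stop once (max - min) drops below this

// Perform one balancing round. Returns the move made, or null when balanced.
function balanceOnce(layout) {
  const names = Object.keys(layout);
  let most = names[0];
  let least = names[0];
  for (const n of names) {
    if (layout[n].length > layout[most].length) most = n;
    if (layout[n].length < layout[least].length) least = n;
  }
  if (layout[most].length - layout[least].length < THRESHOLD) return null;
  const chunk = layout[most].shift(); // take the first chunk of the fullest shard
  layout[least].push(chunk);
  return { chunk, from: most, to: least };
}

const moves = [];
let move;
while ((move = balanceOnce(shards)) !== null) moves.push(move);

console.log(moves[0]);
console.log(Object.values(shards).map((c) => c.length));
```

With this layout the first move is chunk 1 from shard1 to shard2, matching the slide, and the loop ends with six chunks on every shard.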

Slide 37

Slide 37 text

ROUTING

Slide 38

Slide 38 text

Routed Request
[Diagram: mongos in front of Shard 1, Shard 2 and Shard 3, with arrows numbered 1 to 4]
1. Query arrives at mongos
2. mongos routes query to a single shard
3. Shard returns results of query
4. Results returned to client

Slide 39

Slide 39 text

Scatter Gather
[Diagram: mongos in front of Shard 1, Shard 2 and Shard 3; steps 2 and 3 appear once per shard]
1. Query arrives at mongos
2. mongos broadcasts query to all shards
3. Each shard returns results for query
4. Results combined and returned to client
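The difference between a routed request and scatter gather comes down to whether the query contains the shard key. A minimal sketch, assuming a collection sharded on `email`: the placement function `shardForKey`, the email addresses and the in-memory `dataByShard` are hypothetical stand-ins, and only an exact match on the shard key is treated as routable.

```javascript
const SHARD_KEY = "email";
const ALL_SHARDS = ["shard1", "shard2", "shard3"];

// Hypothetical placement rule standing in for the chunk-table lookup.
function shardForKey(value) {
  if (value < "h") return "shard1";
  if (value < "q") return "shard2";
  return "shard3";
}

// Hypothetical documents, already placed according to shardForKey.
const dataByShard = {
  shard1: [{ email: "dan@example.com", name: "Dan" }],
  shard2: [{ email: "jared@example.com", name: "Jared" }],
  shard3: [{ email: "scott@example.com", name: "Scott" }],
};

// Decide which shards must see the query.
function targetShards(query) {
  const value = query[SHARD_KEY];
  if (typeof value === "string") return [shardForKey(value)]; // routed request
  return ALL_SHARDS.slice(); // scatter gather
}

// Run the query on the targeted shards and combine the results.
function find(query) {
  const targets = targetShards(query);
  const results = targets.flatMap((shard) =>
    dataByShard[shard].filter((doc) =>
      Object.entries(query).every(([field, v]) => doc[field] === v)
    )
  );
  return { targets, results };
}

console.log(find({ email: "jared@example.com" }));
console.log(find({ name: "Scott" }));
```

Both queries return one document, but the second one costs a round trip to every shard, which is why the most common queries should include the shard key.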

Slide 40

Slide 40 text

Distributed Merge Sort
[Diagram: mongos in front of Shard 1, Shard 2 and Shard 3; steps 2, 3 and 4 appear once per shard]
1. Query arrives at mongos
2. mongos broadcasts query to all shards
3. Each shard locally sorts results
4. Results returned to mongos
5. mongos merge sorts individual results
6. Combined sorted result returned to client
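Steps 3 to 5 above amount to a k-way merge of already sorted lists. The sketch below uses made-up numbers as the per-shard results and a hypothetical helper `mergeSorted`; it uses a simple linear scan over the list heads rather than a heap, which is fine for a handful of shards.

```javascript
// Step 3: each shard sorts its own matching values locally.
const shardResults = [
  [42, 7, 19],
  [3, 88, 25],
  [61, 11],
].map((values) => values.slice().sort((a, b) => a - b));

// Step 5: the router repeatedly takes the smallest head element
// among the per-shard lists until all lists are exhausted.
function mergeSorted(lists) {
  const positions = lists.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // this list is exhausted
      if (best === -1 || lists[i][positions[i]] < lists[best][positions[best]]) {
        best = i;
      }
    }
    if (best === -1) return merged;
    merged.push(lists[best][positions[best]]);
    positions[best] += 1;
  }
}

console.log(mergeSorted(shardResults)); // [3, 7, 11, 19, 25, 42, 61, 88]
```

The router never has to re-sort everything: it only compares one candidate per shard at a time.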

Slide 41

Slide 41 text

Choosing a shard key
• How does your application query the data?
• Most common queries
• Value of the key is important
• Random distribution of values
• Cardinality
• Not incremental
• Could be compound {a:1,b:1} or concatenated 'a+b'

Slide 42

Slide 42 text

Incremental Right Balanced Access
• Time Based
• ObjectId
• Auto Increment
• Only have to keep small portion in RAM
• Right shard "hot"

Slide 43

Slide 43 text

Random distribution
• Hash
• Have to keep entire index in RAM
• All shards "warm"

Slide 44

Slide 44 text

Segmented access
• Month + Hash
• Have to keep entire index in RAM
• Some shards "warm"
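The "hot" versus "warm" behaviour on the last three slides can be illustrated by counting where new writes land. This is a rough simulation under stated assumptions: three shards with equal-width key ranges, existing ids 0 to 9999, and FNV-1a used purely as a stand-in for "some hash of the key". The helper names are hypothetical.

```javascript
const SHARDS = 3;

// FNV-1a 32-bit hash, used only as a stand-in for "some hash of the key".
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Equal-width range partitioning over [0, keySpace). Values beyond the
// known key space fall into the last ("right-most") range.
function shardForRange(value, keySpace) {
  const idx = Math.floor((value / keySpace) * SHARDS);
  return Math.min(Math.max(idx, 0), SHARDS - 1);
}

// Count how many writes each shard receives for a given key mapping.
function countWrites(keys, toValue, keySpace) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[shardForRange(toValue(k), keySpace)] += 1;
  return counts;
}

// Assumption: existing data used ids 0..9999; 3000 new ids arrive after that.
const newIds = Array.from({ length: 3000 }, (_, i) => 10000 + i);

const incremental = countWrites(newIds, (id) => id, 10000);
const hashed = countWrites(newIds, (id) => fnv1a(String(id)), 2 ** 32);

console.log("incremental key:", incremental); // every write lands on the last shard
console.log("hashed key:", hashed);
```

With the incremental key all 3000 writes go to the right-most shard, while the hashed key spreads them over all three. The price, as the slides note, is that random access touches the whole index.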

Slide 45

Slide 45 text

Impact on Schema Design
{ _id : "alvin", display: "jonnyeight",
  addresses: [ { state : "CA", country: "USA" }, { country: "UK" } ] }
Shard on { _id : 1 }
Lookup by _id hits 1 node
Index on { "addresses.country" : 1 }

Slide 46

Slide 46 text

Multiple Identities - Example
User can have multiple identities
• twitter name
• email address
• facebook name
• etc.
What is the best sharding key & schema design?

Slide 47

Slide 47 text

Multiple Identities - Solution 1
{ _id: "alvin", display: "jonnyeight",
  fb: "alvin.richards", // facebook
  li: "alvin.j.richards", // linkedin
  addresses : [ { state : "CA", country: "USA" }, { country: "UK" } ] }
Shard on { _id: 1 }
Lookup by _id hits 1 node
Lookup by li or fb is scatter gather
Cannot create a unique index on li or fb

Slide 48

Slide 48 text

Multiple Identities - Solution 2
identities
{ type: "_id", val: "alvin", info: "1200-42" }
{ type: "fb", val: "alvin.richards", info: "1200-42" }
{ type: "li", val: "alvin.j.richards", info: "1200-42" }
info
{ _id: "1200-42", addresses : [ { state : "CA", country: "USA" }, { country: "UK" } ] }
Shard identities on { type : 1, val : 1 }
Lookup by type & val hits 1 node
Can create unique index on type & val
Shard info on { _id: 1 }
Lookup info on _id hits one node
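Solution 2 trades one scatter gather query for two targeted lookups. The sketch below models the two collections with in-memory Maps to show the access pattern; `addIdentity` and `lookup` are hypothetical helper names, and the duplicate check only imitates what a unique index on type & val would enforce.

```javascript
// In-memory stand-ins for the two collections on the slide.
const identities = new Map(); // "type|val" -> info id
const info = new Map(); // _id -> document

// Register an identity. The duplicate check imitates a unique index on
// { type, val }, which is possible because that is also the shard key.
function addIdentity(type, val, infoId) {
  const key = `${type}|${val}`;
  if (identities.has(key)) throw new Error(`duplicate identity ${key}`);
  identities.set(key, infoId);
}

// Two targeted lookups: identities by (type, val), then info by _id.
function lookup(type, val) {
  const infoId = identities.get(`${type}|${val}`);
  if (infoId === undefined) return null;
  return info.get(infoId) || null;
}

info.set("1200-42", {
  _id: "1200-42",
  addresses: [{ state: "CA", country: "USA" }, { country: "UK" }],
});
addIdentity("_id", "alvin", "1200-42");
addIdentity("fb", "alvin.richards", "1200-42");
addIdentity("li", "alvin.j.richards", "1200-42");

console.log(lookup("fb", "alvin.richards"));
console.log(lookup("li", "someone.else"));
```

Each step uses the full shard key of its collection, so each one can be routed to a single shard.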

Slide 49

Slide 49 text

When to shard?
• When you are running out of hardware resources
• Need to scale RAM or Disk IO?
• Throughput or data size?
• Shard only if you need to
• Use Monitoring Tools
• mongostat, db.serverStatus(), iostat
• MMS - http://mms.10gen.com/
• Working Set and Indexes in RAM
• Page faults and BTree index misses

Slide 50

Slide 50 text

Data Set larger than RAM?
[Diagram: write and read arrows; shard1 holds the ranges A-M, N-P and R-Z. 300 GB Data, 96 GB Mem, 3:1 Data/Mem]

Slide 51

Slide 51 text

Cache everything in RAM
[Diagram: write and read arrows; shard1 holds A-M, shard2 holds N-P, shard3 holds R-Z. 300 GB Data, 96 GB Mem, 1:1 Data/Mem]
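The ratios on these two slides are simple arithmetic. The sketch below assumes the 96 GB figure is RAM per shard, which the slide does not state explicitly; `dataToMemRatio` and `shardsToFitInRam` are hypothetical helper names.

```javascript
// Data-to-memory ratio for a cluster of equally sized shards.
function dataToMemRatio(dataGb, ramPerShardGb, shardCount) {
  return dataGb / (ramPerShardGb * shardCount);
}

// Smallest shard count whose combined RAM holds the whole data set.
function shardsToFitInRam(dataGb, ramPerShardGb) {
  return Math.ceil(dataGb / ramPerShardGb);
}

const dataGb = 300;
const ramPerShardGb = 96; // assumption: the slide's "96 GB Mem" is per shard

console.log(dataToMemRatio(dataGb, ramPerShardGb, 1).toFixed(2)); // "3.13", roughly 3:1
console.log(dataToMemRatio(dataGb, ramPerShardGb, 3).toFixed(2)); // "1.04", roughly 1:1
console.log(shardsToFitInRam(dataGb, ramPerShardGb)); // 4
```

Three shards give 288 GB of RAM, which is close to 1:1 but just short of 300 GB; holding every byte in memory would strictly take a fourth shard under these assumptions.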

Slide 52

Slide 52 text

Summary
• Shard to horizontally scale your application
• Choose Shard Keys wisely
• Sharding may affect your schema design
• Shard when you need to:
• Listen to the metrics
• Monitor and watch the trends
• Shard early

Slide 53

Slide 53 text

Solution Architect
Based in London
http://www.10gen.com/
@dmroberts
[email protected]