Slide 1

Slide 1 text

Introduction to Sharding
Brandon Black, Software Engineer, 10gen
@brandonmblack #MongoDBDays

Slide 2

Slide 2 text

Agenda
•  Scaling Data
•  MongoDB's Approach
•  Architecture
•  Configuration
•  Mechanics

Slide 3

Slide 3 text

Scaling Data

Slide 4

Slide 4 text

Examining Growth
•  User Growth
   –  1995: 0.4% of the world’s population
   –  Today: 30% of the world is online (~2.2B)
   –  Emerging Markets & Mobile
•  Data Set Growth
   –  Facebook’s data set is around 100 petabytes
   –  4 billion photos taken in the last year (4x a decade ago)

Slide 5

Slide 5 text

Read/Write Throughput Exceeds I/O

Slide 6

Slide 6 text

Working Set Exceeds Physical Memory

Slide 7

Slide 7 text

Vertical Scalability (Scale Up)

Slide 8

Slide 8 text

Horizontal Scalability (Scale Out)

Slide 9

Slide 9 text

Data Store Scalability
•  Custom Hardware
   –  Oracle
•  Custom Software
   –  Facebook + MySQL
   –  Google

Slide 10

Slide 10 text

Data Store Scalability Today
•  MongoDB Auto-Sharding
•  A data store that is
   –  100% Free
   –  Publicly available
   –  Open-source (https://github.com/mongodb/mongo)
   –  Horizontally scalable
   –  Application independent

Slide 11

Slide 11 text

MongoDB's Approach to Sharding

Slide 12

Slide 12 text

Partitioning
•  User defines shard key
•  Shard key defines range of data
•  Key space is like points on a line
•  Range is a segment of that line
[Diagram: key space drawn as a line from -∞ to +∞]

Slide 13

Slide 13 text

Data Distribution
•  Initially 1 chunk
•  Default max chunk size: 64 MB
•  MongoDB automatically splits & migrates chunks when max reached
[Diagram: config server, three mongos routers, Shard 1, Shard 2]

Slide 14

Slide 14 text

Routing and Balancing
•  Queries routed to specific shards
•  MongoDB balances cluster
•  MongoDB migrates data to new nodes
[Diagram: mongos in front of three shards, steps 1 to 4]

Slide 15

Slide 15 text

MongoDB Auto-Sharding
•  Minimal effort required
   –  Same interface as single mongod
•  Two steps
   –  Enable sharding for a database
   –  Shard collection within database

Slide 16

Slide 16 text

Architecture

Slide 17

Slide 17 text

What is a Shard?
•  Shard is a node of the cluster
•  Shard can be a single mongod or a replica set
[Diagram: a shard as a replica set (primary and two secondaries) or as a single mongod]

Slide 18

Slide 18 text

Meta Data Storage
•  Config Server
   –  Stores cluster chunk ranges and locations
   –  Can have only 1 or 3 (production must have 3)
   –  Not a replica set
[Diagram: one config server or three config servers]

Slide 19

Slide 19 text

Routing and Managing Data
•  Mongos
   –  Acts as a router / balancer
   –  No local data (persists to config database)
   –  Can have 1 or many
[Diagram: app servers connecting to one or many mongos routers]

Slide 20

Slide 20 text

Sharding infrastructure
[Diagram: three config servers, three shards, three mongos routers, three app servers]

Slide 21

Slide 21 text

Configuration

Slide 22

Slide 22 text

Example Cluster
•  Don’t use this setup in production!
   -  Only one config server (No Fault Tolerance)
   -  Shard not in a replica set (Low Availability)
   -  Only one mongos and shard (No Performance Improvement)
   -  Useful for development or demonstrating configuration mechanics
[Diagram: one config server, one mongos, two mongod]

Slide 23

Slide 23 text

Starting the Configuration Server
•  mongod --configsvr
•  Starts a configuration server on the default port (27019)
[Diagram: config server]

Slide 24

Slide 24 text

Start the mongos Router
•  mongos --configdb :27019
•  For 3 configuration servers: mongos --configdb :,:,:
•  This is always how to start a new mongos, even if the cluster is already running
[Diagram: config server and mongos]

Slide 25

Slide 25 text

Start the shard database
•  mongod --shardsvr
•  Starts a mongod with the default shard port (27018)
•  Shard is not yet connected to the rest of the cluster
•  Shard may have already been running in production
[Diagram: config server, mongos, and a mongod shard]

Slide 26

Slide 26 text

Add the Shard
•  On mongos:
   -  sh.addShard(':27018')
•  Adding a replica set:
   -  sh.addShard('/')
[Diagram: config server, mongos, and a mongod shard]

Slide 27

Slide 27 text

Verify that the shard was added
•  db.runCommand({ listshards: 1 })
   { "shards" : [ { "_id" : "shard0000", "host" : ":27018" } ], "ok" : 1 }
[Diagram: config server, mongos, and a mongod shard]

Slide 28

Slide 28 text

Enabling Sharding
•  Enable sharding on a database
   sh.enableSharding("")
•  Shard a collection with the given key
   sh.shardCollection(".people", {"country": 1})
•  Use a compound shard key to prevent duplicates
   sh.shardCollection(".cars", {"year": 1, "uniqueid": 1})

Slide 29

Slide 29 text

Tag Aware Sharding
•  Tag aware sharding allows you to control the distribution of your data
•  Tag a range of shard keys
   sh.addTagRange(,,,)
•  Tag a shard
   sh.addShardTag(,)
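
A minimal sketch of the idea behind tag aware sharding, using plain JavaScript objects rather than a live cluster. The shard names, tags, and key ranges are hypothetical, and the lookup is only an illustration of "a tagged range may live only on shards carrying that tag", not MongoDB's balancer code.

```javascript
// Hypothetical model: which tags each shard carries (cf. sh.addShardTag).
const shardTags = { shard0000: ["EU"], shard0001: ["US"] };

// Hypothetical tag ranges over the shard key (cf. sh.addTagRange).
// Lower bound is inclusive, upper bound is exclusive.
const tagRanges = [
  { min: "A", max: "M", tag: "EU" },
  { min: "M", max: "Z", tag: "US" },
];

// Return the shards that are allowed to hold a document with this key.
function eligibleShards(key) {
  const range = tagRanges.find(r => key >= r.min && key < r.max);
  if (!range) return Object.keys(shardTags);   // no tag range covers the key: any shard
  return Object.keys(shardTags).filter(s => shardTags[s].includes(range.tag));
}

console.log(eligibleShards("France"));   // [ 'shard0000' ]
```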

Slide 30

Slide 30 text

Mechanics

Slide 31

Slide 31 text

Partitioning
•  Remember it's based on ranges
[Diagram: key space drawn as a line from -∞ to +∞]

Slide 32

Slide 32 text

Chunk is a section of the entire range
[Diagram: key space from minKey to maxKey divided into chunks at {x: -20}, {x: 13}, {x: 25}, and {x: 100,000}; chunk size 64 MB]
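
A minimal sketch of how a shard key value maps to a chunk, assuming the boundaries shown on this slide. The shard assignments are hypothetical, and -Infinity / Infinity stand in for minKey / maxKey.

```javascript
// Hypothetical chunk table for a collection sharded on x.
const chunks = [
  { min: -Infinity, max: -20,      shard: "shard0000" },  // minKey .. {x: -20}
  { min: -20,       max: 13,       shard: "shard0001" },
  { min: 13,        max: 25,       shard: "shard0000" },
  { min: 25,        max: 100000,   shard: "shard0001" },
  { min: 100000,    max: Infinity, shard: "shard0000" },  // {x: 100,000} .. maxKey
];

// Each chunk owns keys in [min, max): lower bound inclusive, upper bound exclusive.
function findChunk(x) {
  return chunks.find(c => x >= c.min && x < c.max);
}

console.log(findChunk(13).shard);   // shard0000
console.log(findChunk(12).shard);   // shard0001
```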

Slide 33

Slide 33 text

Chunk splitting
•  A chunk is split once it exceeds the maximum size
•  There is no split point if all documents have the same shard key
•  Chunk split is a logical operation (no data is moved)
[Diagram: chunk minKey..maxKey split into minKey..13 and 14..maxKey]
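
A rough sketch of the splitting rules above. The chunk object and the median-based choice of split point are assumptions made for illustration; the slide does not say how MongoDB picks the point.

```javascript
// Hypothetical model: a chunk is a key range, its owning shard, and the
// shard-key values of the documents it contains.
function splitChunk(chunk) {
  const keys = [...chunk.keys].sort((a, b) => a - b);
  const median = keys[Math.floor(keys.length / 2)];
  // A split point must be larger than the smallest key, otherwise the
  // left half would be empty.
  const point = median > keys[0] ? median : keys.find(k => k > keys[0]);
  if (point === undefined) return [chunk];   // all keys identical: no split point
  // Both halves stay on the same shard: only the range metadata changes.
  return [
    { min: chunk.min, max: point, shard: chunk.shard, keys: keys.filter(k => k < point) },
    { min: point, max: chunk.max, shard: chunk.shard, keys: keys.filter(k => k >= point) },
  ];
}

const halves = splitChunk({
  min: -Infinity, max: Infinity, shard: "shard0000", keys: [27, 1, 14, 5, 20, 13],
});
console.log(halves.map(c => [c.min, c.max]));
```

With these keys the split point is 14, so 13 is the last key of the left chunk and 14 the first key of the right one.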

Slide 34

Slide 34 text

Balancing
•  Balancer is running on mongos
•  Once the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts
[Diagram: config server, three mongos routers, Shard 1, Shard 2]
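
The threshold rule can be sketched as follows. The chunk counts are hypothetical, and the threshold is passed in as a parameter because the slide does not give a value.

```javascript
// chunkCounts: hypothetical map of shard name -> number of chunks.
// Returns the migration to perform, or null when the cluster is balanced.
function planMigration(chunkCounts, threshold) {
  const shards = Object.keys(chunkCounts);
  if (shards.length < 2) return null;
  shards.sort((a, b) => chunkCounts[a] - chunkCounts[b]);
  const least = shards[0];
  const most = shards[shards.length - 1];
  // Following the slide's wording, a round starts only when the difference
  // is strictly above the threshold.
  if (chunkCounts[most] - chunkCounts[least] <= threshold) return null;
  return { from: most, to: least };
}

console.log(planMigration({ shard0000: 12, shard0001: 3 }, 8));
console.log(planMigration({ shard0000: 5, shard0001: 3 }, 2));   // null
```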

Slide 35

Slide 35 text

Acquiring the Balancer Lock
•  The balancer on mongos takes out a “balancer lock”
•  To see the status of these locks:
   use config
   db.locks.find({ _id: "balancer" })
[Diagram: config server, three mongos routers, Shard 1, Shard 2]

Slide 36

Slide 36 text

Moving the chunk
•  The mongos sends a moveChunk command to source shard
•  The source shard then notifies destination shard
•  Destination shard starts pulling documents from source shard
[Diagram: config server, mongos, Shard 1, Shard 2]

Slide 37

Slide 37 text

Committing Migration
•  When complete, destination shard updates config server
   -  Provides new locations of the chunks
[Diagram: config server, mongos, Shard 1, Shard 2]

Slide 38

Slide 38 text

Cleanup
•  Source shard deletes moved data
   -  Must wait for open cursors to either close or time out
   -  NoTimeout cursors may prevent the release of the lock
•  The mongos releases the balancer lock after old chunks are deleted
[Diagram: config server, three mongos routers, Shard 1, Shard 2]
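
The migration sequence (lock, move, commit, cleanup) can be sketched in one function over an in-memory stand-in for the cluster. Everything here is hypothetical: the cluster object, the chunk id, and the predicate that selects a chunk's documents. Waiting for open cursors is not modeled.

```javascript
// cluster.shards: shard name -> array of documents
// cluster.config: chunk id -> owning shard (stand-in for the config server)
function migrateChunk(cluster, chunkId, inChunk, from, to) {
  if (cluster.balancerLock) throw new Error("balancer lock already held");
  cluster.balancerLock = true;                           // 1. take the balancer lock
  try {
    const moving = cluster.shards[from].filter(inChunk);
    cluster.shards[to].push(...moving);                  // 2. destination pulls documents
    cluster.config[chunkId] = to;                        // 3. commit the new chunk location
    cluster.shards[from] =
      cluster.shards[from].filter(d => !inChunk(d));     // 4. source deletes moved data
  } finally {
    cluster.balancerLock = false;                        // 5. release the lock after cleanup
  }
}

const demo = {
  balancerLock: false,
  config: { chunkA: "shard0000" },
  shards: { shard0000: [{ x: 1 }, { x: 30 }], shard0001: [] },
};
migrateChunk(demo, "chunkA", d => d.x >= 25, "shard0000", "shard0001");
console.log(demo.config.chunkA);   // shard0001
```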

Slide 39

Slide 39 text

Routing Requests

Slide 40

Slide 40 text

Cluster Request Routing
•  Targeted Queries
•  Scatter Gather Queries
•  Scatter Gather Queries with Sort
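
A minimal sketch of the difference between a targeted and a scatter gather query, assuming a collection sharded on a hypothetical `country` key. The routing table and shard names are made up for illustration.

```javascript
// Hypothetical routing table for a collection sharded on { country: 1 }.
const shardKey = "country";
const ranges = [
  { min: "",  max: "H",  shard: "shard0000" },
  { min: "H", max: "P",  shard: "shard0001" },
  { min: "P", max: null, shard: "shard0002" },   // null: no upper bound
];
const allShards = [...new Set(ranges.map(r => r.shard))];

// Returns the shards a query must visit. Only equality matches on the shard
// key are targeted in this sketch; real routers also target range predicates.
function route(query) {
  const value = query[shardKey];
  if (typeof value !== "string") {
    return { type: "scatter-gather", shards: allShards };
  }
  const hit = ranges.find(r => value >= r.min && (r.max === null || value < r.max));
  return { type: "targeted", shards: [hit.shard] };
}

console.log(route({ country: "Japan" }));   // targeted, shard0001 only
console.log(route({ age: 30 }));            // scatter-gather, all three shards
```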

Slide 41

Slide 41 text

Cluster Request Routing: Targeted Query
[Diagram: mongos in front of three shards]

Slide 42

Slide 42 text

Step 1: Routable request received

Slide 43

Slide 43 text

Step 2: Request routed to appropriate shard

Slide 44

Slide 44 text

Step 3: Shard returns results

Slide 45

Slide 45 text

Step 4: Mongos returns results to client

Slide 46

Slide 46 text

Cluster Request Routing: Non-Targeted Query
[Diagram: mongos in front of three shards]

Slide 47

Slide 47 text

Step 1: Non-targeted request received

Slide 48

Slide 48 text

Step 2: Request sent to all shards

Slide 49

Slide 49 text

Step 3: Shards return results to mongos

Slide 50

Slide 50 text

Step 4: Mongos returns results to client

Slide 51

Slide 51 text

Cluster Request Routing: Non-Targeted Query with Sort
[Diagram: mongos in front of three shards]

Slide 52

Slide 52 text

Step 1: Non-targeted request with sort received

Slide 53

Slide 53 text

Step 2: Request sent to all shards

Slide 54

Slide 54 text

Step 3: Query and sort performed locally

Slide 55

Slide 55 text

Step 4: Shards return results to mongos

Slide 56

Slide 56 text

Step 5: Mongos merges sorted results

Slide 57

Slide 57 text

Step 6: Mongos returns results to client
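
The merge step in this sequence can be illustrated with a small k-way merge: each shard has already sorted its own results, so the router only has to pick the smallest head element repeatedly. This is a sketch of the general technique, not the mongos implementation; `mergeSorted` is a name invented for this example.

```javascript
// Merge arrays that are each already sorted by `compare` into one sorted array.
function mergeSorted(shardResults, compare) {
  const cursors = shardResults.map(() => 0);   // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (cursors[i] >= shardResults[i].length) continue;   // this shard is drained
      if (best === -1 ||
          compare(shardResults[i][cursors[i]], shardResults[best][cursors[best]]) < 0) {
        best = i;
      }
    }
    if (best === -1) return merged;            // every shard is drained
    merged.push(shardResults[best][cursors[best]++]);
  }
}

console.log(mergeSorted([[1, 4, 9], [2, 3], [5]], (a, b) => a - b));
// [ 1, 2, 3, 4, 5, 9 ]
```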

Slide 58

Slide 58 text

Shard Key

Slide 59

Slide 59 text

Shard Key
•  Shard key is immutable
•  Shard key values are immutable
•  Shard key must be indexed
•  Shard key limited to 512 bytes in size
•  Shard key used to route queries
   –  Choose a field commonly used in queries
•  Only shard key can be unique across shards
   –  `_id` field is only unique within individual shard

Slide 60

Slide 60 text

Shard Key Considerations
•  Cardinality
•  Write Distribution
•  Query Isolation
•  Reliability
•  Index Locality

Slide 61

Slide 61 text

Conclusion

Slide 62

Slide 62 text

Read/Write Throughput Exceeds I/O

Slide 63

Slide 63 text

Working Set Exceeds Physical Memory

Slide 64

Slide 64 text

Sharding Enables Scale
•  MongoDB’s Auto-Sharding
   –  Easy to Configure
   –  Consistent Interface
   –  Free and Open-Source

Slide 65

Slide 65 text

•  What’s next?
   –  Hash-Based Sharding in MongoDB 2.4 (2:50pm)
   –  Webinar: Indexing and Query Optimization (May 22nd)
   –  Online Education Program
   –  MongoDB User Group
•  Resources
   https://education.10gen.com/
   http://www.10gen.com/presentations
   http://www.10gen.com/events
   http://github.com/brandonblack/presentations

Slide 66

Slide 66 text

Thank You
Brandon Black, Software Engineer, 10gen
@brandonmblack #MongoDBDays