
DIBI workshop Replication

rozza
October 07, 2013


Transcript

  1. Agenda •  Replica Sets Lifecycle •  Developing with Replica Sets

    •  Operational Considerations •  How Replication works
  2. Why Replication? •  How many have faced node failures? •  How many have been woken up to perform a fail-over? •  How many have experienced issues due to network latency? •  Different uses for data –  Normal processing –  Simple analytics
  3. Replica Set – Initialize: Node 1 (Secondary), Node 2 (Secondary), Node 3 (Primary), linked by replication and heartbeats
  4. Replica Set – Recovery: Node 1 (Secondary), Node 2 (Primary), Node 3 (Recovery)
  5. Replica Set – Recovered: Node 1 (Secondary), Node 2 (Primary), Node 3 (Secondary)
  6.–10. Configuration Options
     > conf = { _id : "mySet",
         members : [
           {_id : 0, host : "A", priority : 3},                       // Primary DC
           {_id : 1, host : "B", priority : 2},                       // Primary DC
           {_id : 2, host : "C"},                                     // Secondary DC – default priority = 1
           {_id : 3, host : "D", hidden : true},                      // Analytics node
           {_id : 4, host : "E", hidden : true, slaveDelay : 3600}    // Backup node
         ] }
     > rs.initiate(conf)
  11. Write Concern •  Network acknowledgement •  Wait for error • 

    Wait for journal sync •  Wait for replication
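    As a minimal shell sketch of combining these levels with getLastError (collection, values and timeout are illustrative):
    > db.orders.insert({ item : "abc", qty : 1 })
    > // wait for replication to a majority of members and a journal sync,
    > // giving up after 5 seconds
    > db.runCommand({ getLastError : 1, w : "majority", j : true, wtimeout : 5000 })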
  12. Tagging •  New in 2.0.0 •  Control where data is

    written to, and read from •  Each member can have one or more tags –  tags: {dc: "ny"} –  tags: {dc: "ny", subnet: "192.168", rack: "row3rk7"} •  Replica set defines rules for write concerns •  Rules can change without changing app code
  13. Tagging Example
     { _id : "mySet",
       members : [
         {_id : 0, host : "A", tags : {"dc": "ny"}},
         {_id : 1, host : "B", tags : {"dc": "ny"}},
         {_id : 2, host : "C", tags : {"dc": "sf"}},
         {_id : 3, host : "D", tags : {"dc": "sf"}},
         {_id : 4, host : "E", tags : {"dc": "cloud"}} ],
       settings : {
         getLastErrorModes : {
           allDCs : {"dc" : 3},
           someDCs : {"dc" : 2} } } }
     > db.blogs.insert({...})
     > db.runCommand({getLastError : 1, w : "someDCs"})
  14. Wait for Replication (Tagging): the driver issues a write followed by getLastError with w : allDCs against the Primary (SF); the write is applied in memory and replicated to the Secondaries (NY and Cloud)
  15. Read Preference Modes •  5 modes (new in 2.2) –  primary (only) – default –  primaryPreferred –  secondary –  secondaryPreferred –  nearest •  When more than one node is possible, the closest node is used for reads (all modes but primary)
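    A minimal shell sketch of picking a mode per query (collection and filter are illustrative):
    > // use a secondary if one is available, otherwise fall back to the primary
    > db.orders.find({ status : "open" }).readPref("secondaryPreferred")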
  16. Tagged Read Preference •  Custom read preferences •  Control where

    you read from by (node) tags –  E.g. { "disk": "ssd", "use": "reporting" } •  Use in conjunction with standard read preferences –  Except primary
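    As a sketch, the shell's readPref() also takes an array of tag sets (tags and collection are illustrative):
    > // prefer secondaries tagged as SSD-backed reporting nodes
    > db.orders.find().readPref("secondaryPreferred", [ { "disk" : "ssd", "use" : "reporting" } ])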
  17. Replica Set – 1 Data Center •  Single datacenter •  Single switch & power •  Points of failure: –  Power –  Network –  Data center –  Two node failure •  Automatic recovery of single node crash
  18. Replica Set – 2 Data Centers •  Multi data center •  DR node for safety •  Can’t do a durable multi data center write safely since there is only 1 node in the distant DC (Members 1 and 2 in Datacenter 1, Member 3 in Datacenter 2)
  19. Replica Set – 3 Data Centers •  Three data centers •  Can survive full data center loss •  Can do w = { dc : 2 } (with tags) to guarantee a write reaches 2 data centers (Members 1 and 2 in Datacenter 1, Members 3 and 4 in Datacenter 2, Member 5 in Datacenter 3)
  20. Implementation details •  Heartbeat every 2 seconds –  Times out

    in 10 seconds •  Local DB (not replicated) –  system.replset –  oplog.rs •  Capped collection •  Idempotent version of operation stored
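    A quick way to inspect the oplog from the shell (a sketch; output depends on the deployment):
    > use local
    > db.oplog.rs.find().sort({ $natural : -1 }).limit(1).pretty()   // most recent oplog entry
    > db.getReplicationInfo()                                        // oplog size and time window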
  21. Op(erations) Log is idempotent
     > db.replsettest.insert({_id:1,value:1})
     { "ts" : Timestamp(1350539727000, 1), "h" : NumberLong("6375186941486301201"), "op" : "i",
       "ns" : "test.replsettest", "o" : { "_id" : 1, "value" : 1 } }
     > db.replsettest.update({_id:1},{$inc:{value:10}})
     { "ts" : Timestamp(1350539786000, 1), "h" : NumberLong("5484673652472424968"), "op" : "u",
       "ns" : "test.replsettest", "o2" : { "_id" : 1 }, "o" : { "$set" : { "value" : 11 } } }
  22. Single operation can have many entries
     > db.replsettest.update({}, {$set : {name : "foo"}}, false, true)
     { "ts" : Timestamp(1350540395000, 1), "h" : NumberLong("-4727576249368135876"), "op" : "u",
       "ns" : "test.replsettest", "o2" : { "_id" : 2 }, "o" : { "$set" : { "name" : "foo" } } }
     { "ts" : Timestamp(1350540395000, 2), "h" : NumberLong("-7292949613259260138"), "op" : "u",
       "ns" : "test.replsettest", "o2" : { "_id" : 3 }, "o" : { "$set" : { "name" : "foo" } } }
     { "ts" : Timestamp(1350540395000, 3), "h" : NumberLong("-1888768148831990635"), "op" : "u",
       "ns" : "test.replsettest", "o2" : { "_id" : 1 }, "o" : { "$set" : { "name" : "foo" } } }
  23. Replica sets •  Use replica sets •  Easy to set up –  Try on a single machine (see the sketch below) •  Check the docs for replica set tutorials –  http://docs.mongodb.org/manual/replication/#tutorials
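    A sketch of a three-member set on a single machine (ports, data paths and set name are illustrative):
    $ mongod --replSet mySet --port 27017 --dbpath /data/rs0 --smallfiles --oplogSize 128
    $ mongod --replSet mySet --port 27018 --dbpath /data/rs1 --smallfiles --oplogSize 128
    $ mongod --replSet mySet --port 27019 --dbpath /data/rs2 --smallfiles --oplogSize 128
    > rs.initiate({ _id : "mySet", members : [
        { _id : 0, host : "localhost:27017" },
        { _id : 1, host : "localhost:27018" },
        { _id : 2, host : "localhost:27019" } ] })
    > rs.status()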
  24. Agenda •  Why shard •  MongoDB's approach •  Architecture • 

    Configuration •  Mechanics •  Solutions
  25. Partition data based on ranges •  User defines shard key •  Shard key defines range of data •  Key space is like points on a line •  Range is a segment of that line (the key space runs from -∞ to +∞)
  26. Distribute data in chunks across nodes •  Initially 1 chunk •  Default max chunk size: 64MB •  MongoDB automatically splits & migrates chunks when the max is reached
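    The 64MB default can be changed through the config database; a sketch, run against a mongos (value is in MB):
    > use config
    > db.settings.save({ _id : "chunksize", value : 64 })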
  27. MongoDB manages data •  Queries routed to specific shards •  MongoDB balances cluster •  MongoDB migrates data to new nodes
  28. MongoDB Auto-Sharding •  Minimal effort required –  Same interface as

    single mongod •  Two steps –  Enable Sharding for a database –  Shard collection within database
  29. Data stored in shards •  A shard is a node of the cluster •  A shard can be a single mongod or a replica set
  30. Config server stores metadata •  Config Server –  Stores cluster chunk ranges and locations –  Can have only 1 or 3 (production must have 3) –  Uses two-phase commit (not a replica set)
  31. Mongos manages the data •  Mongos –  Acts as a router / balancer –  No local data (persists to the config database) –  Can have 1 or many
  32. Sharding infrastructure: three config servers, three shards, and a mongos alongside each app server
  33. Example cluster setup •  Don’t use this setup in production! -  Only one Config server (No Fault Tolerance) -  Shard not in a replica set (Low Availability) -  Only one Mongos and shard (No Performance Improvement) -  Useful for development or demonstrating configuration mechanics
  34. Start the config server •  "mongod --configsvr" •  Starts a config server on the default port (27019)
  35. Start the mongos router •  "mongos --configdb <hostname>:27019" •  For 3 config servers: "mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>" •  This is always how to start a new mongos, even if the cluster is already running
  36. Start the shard database •  "mongod --shardsvr" •  Starts a mongod with the default shard port (27018) •  Shard is not yet connected to the rest of the cluster •  Shard may have already been running in production
  37. Add the shard •  On mongos: "sh.addShard('<host>:27018')" •  Adding a replica set: "sh.addShard('<rsname>/<seedlist>')" •  In 2.2 and later can use sh.addShard('<host>:<port>')
  38. Verify that the shard was added •  db.runCommand({ listshards : 1 }) •  { "shards" : [ { "_id": "shard0000", "host": "<hostname>:27018" } ], "ok" : 1 }
  39. Enabling Sharding •  Enable sharding on a database –  sh.enableSharding("<dbname>")

    •  Shard a collection with the given key –  sh.shardCollection("<dbname>.people",{"country":1}) •  Use a compound shard key to prevent duplicates –  sh.shardCollection("<dbname>.cars",{"year":1, "uniqueid":1})
  40. Tag Aware Sharding •  Tag aware sharding allows you to

    control the distribution of your data •  Tag a range of shard keys –  sh.addTagRange(<collection>,<min>,<max>,<tag>) •  Tag a shard –  sh.addShardTag(<shard>,<tag>)
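    A short sketch (shard name, namespace, key range and tag are illustrative):
    > // pin a range of shard key values to shards tagged "NYC"
    > sh.addShardTag("shard0000", "NYC")
    > sh.addTagRange("records.users", { zipcode : "10001" }, { zipcode : "11201" }, "NYC")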
  41. Chunk is a section of the entire range: each chunk runs from a min key to a max key (e.g. boundaries at {x: -20}, {x: 13}, {x: 25}, {x: 100,000}), up to 64MB
  42. Chunk splitting •  A chunk is split once it exceeds the maximum size •  There is no split point if all documents have the same shard key •  A chunk split is a logical operation (no data is moved) •  If the split creates too large a discrepancy in chunk counts across the cluster, a balancing round starts
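    Chunk ranges can be inspected from a mongos; a sketch (namespace is illustrative):
    > sh.status()                                                // summary of chunks per shard
    > use config
    > db.chunks.find({ ns : "mydb.people" }).sort({ min : 1 })   // individual chunk ranges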
  43. Balancing •  The balancer runs on mongos •  Once the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts
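    To check whether the balancer is enabled and currently active, a sketch:
    > sh.getBalancerState()       // true if the balancer is enabled
    > sh.isBalancerRunning()      // true if a balancing round is in progress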
  44. Acquiring the Balancer Lock •  The balancer on mongos takes out a "balancer lock" •  To see the status of these locks: -  use config -  db.locks.find({ _id: "balancer" })
  45. Moving the chunk •  The mongos sends a "moveChunk" command to the source shard •  The source shard then notifies the destination shard •  The destination claims the chunk’s shard-key range •  The destination shard starts pulling documents from the source shard
  46. Committing Migration •  When complete, the destination shard updates the config server -  Provides the new locations of the chunks
  47. Cleanup •  The source shard deletes the moved data -  Must wait for open cursors to either close or time out -  NoTimeout cursors may prevent the release of the lock •  Mongos releases the balancer lock after the old chunks are deleted
  48. Shards return results to mongos
  49. Mongos returns results to client
  50. Query and sort performed locally
  51. Shards return results to mongos
  52. Mongos merges sorted results
  53. Mongos returns results to client
  54. Shard Key •  Choose a field commonly used in queries •  The shard key is immutable •  Shard key values are immutable •  The shard key requires an index on the fields contained in the key •  Uniqueness of the `_id` field is only guaranteed within an individual shard •  The shard key is limited to 512 bytes in size
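    For a non-empty collection the supporting index must exist before sharding it; a sketch (database, collection and key are illustrative):
    > db.people.ensureIndex({ country : 1, _id : 1 })
    > sh.shardCollection("mydb.people", { country : 1, _id : 1 })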