
replication

mpobrien
March 12, 2012


Transcript

  1. What we’ll cover
     • What a replica set is and why you want it
     • The mechanics of how a replica set works
     • How to set it up
     • How to handle it with drivers
     • Some dos + don’ts for deployment
  2. Replica Set
     • DB nodes whose goal is to keep a complete copy of the data on each node
     • Only one primary at a time
     • All other nodes are secondaries
     • If the primary fails, a secondary is chosen to take over
     (diagram: PRIMARY with secondary1 and secondary2)
  3. Replica Set
     • Only one primary at any time
     • Only the primary accepts writes (i.e. writes are strongly consistent)
     • Secondaries are read-only
     • Secondaries talk to the primary to keep their own copies in sync
     (diagram: PRIMARY with secondary1 and secondary2)
  4. Why
     • App can survive a database node failure
     • Extra copies of data = redundancy
     • Scaling reads: sending .find() queries to secondaries
     • Makes backups easier
     • Use hidden replicas for secondary workloads: analytics, integration with other systems, etc.
     • Data-center awareness: survive an entire data center outage
  5. What happens when a node fails?
     • Replica set members monitor each other with heartbeats - a ping every 2 seconds
     • If the primary can’t be reached, an election is triggered - each node gets a vote and knows the total # of available votes
     • If no node can reach a majority, the replica set becomes read-only
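The majority rule above can be sketched in a few lines of plain Python (a hypothetical helper, not part of any MongoDB API): a primary can only be elected, or remain primary, if the members that can still see each other hold a strict majority of all votes.

```python
def can_elect_primary(reachable_votes, total_votes):
    """True if the reachable members hold a strict majority of all votes."""
    return reachable_votes > total_votes // 2

# 3-member set, 1 member down: 2 of 3 votes is still a majority
print(can_elect_primary(2, 3))  # True
# 4-member set, 2 members down: 2 of 4 is NOT a majority -> read-only
print(can_elect_primary(2, 4))  # False
```

This is why an even number of members buys you nothing extra: half the votes is not a majority.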
  6. How it Works
     • Change operations are written to an oplog (a capped collection) on the primary
     • Secondaries query the oplog and apply the changes
     • All replicas get their own oplog
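The pull-based mechanism above can be illustrated with a minimal in-memory sketch (hypothetical names, no real MongoDB involved): the primary appends every change to its oplog, and a secondary repeatedly fetches only the entries it has not applied yet.

```python
# Toy model of oplog-based replication (illustration only, not real MongoDB).
primary_data, primary_oplog = {}, []
secondary_data, applied = {}, 0

def primary_write(key, value):
    """Apply a write on the primary and record it in the oplog."""
    primary_data[key] = value
    primary_oplog.append(("set", key, value))

def secondary_sync():
    """The secondary pulls and applies every oplog entry it hasn't seen."""
    global applied
    for op, key, value in primary_oplog[applied:]:
        if op == "set":
            secondary_data[key] = value
    applied = len(primary_oplog)

primary_write("a", 1)
primary_write("b", 2)
secondary_sync()
print(secondary_data == primary_data)  # True
```

The key design point mirrored here: replication is asynchronous and secondary-driven - the secondary tracks its own position in the primary's oplog.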
  7. oplog is fixed size
     • capped collection - like a circular queue
     • defaults to 5% of disk space (on 64-bit); this is usually plenty
     • eventually it will fill up...
     • if a slave falls too far behind, it will need to resync
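The circular-queue behavior is easy to see with Python's `collections.deque` and a fixed `maxlen` (a stand-in for a capped collection, not how MongoDB stores it): once the queue is full, the oldest entries are silently dropped.

```python
from collections import deque

# A capped collection behaves like a fixed-size circular queue.
oplog = deque(maxlen=3)
for i in range(5):
    oplog.append({"op": "i", "o": {"_id": i}})

print([e["o"]["_id"] for e in oplog])  # [2, 3, 4] -- entries 0 and 1 are gone
```

A secondary whose last applied entry was 0 or 1 can no longer find its position in this oplog - that is exactly the "fell too far behind, must resync" case.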
  8. An oplog entry looks like this:
     {
       "ts" : { "t" : 1329197785000, "i" : 1 },    <- timestamp
       "h"  : NumberLong("4816916911793111057"),   <- unique identifier
       "op" : "i",                                 <- operation type
       "ns" : "test.stuff",                        <- namespace
       "o"  : { "_id" : ObjectId("4f39f2d91b645b4d80fb2e86"), "a" : 1 }   <- operation
     }
  9. oplog entries are idempotent
     i.e. replaying the same entry yields the same result.
     One update command can produce multiple oplog entries:
     update({age:{$lt:5}}, {$inc:{x:1}})
     becomes
     set x=1 for ObjectId("4f3bf37bdbb51e2beb325867")
     set x=1 for ObjectId("4f3bf37ddbb51e2beb325868")
     set x=1 for ObjectId("4f3bf37ddbb51e2beb325869")
     etc...
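The contrast between the two forms can be demonstrated in plain Python (hypothetical helpers for illustration): a "set to resulting value" entry can be replayed safely, while a raw increment cannot - which is why the oplog records the result, not the original `$inc`.

```python
def apply_set(doc, field, value):
    # How the oplog records an update: the *resulting* value.
    doc[field] = value

def apply_inc(doc, field, amount):
    # The original operation: NOT safe to replay.
    doc[field] = doc.get(field, 0) + amount

set_doc = {"_id": 1, "x": 0}
apply_set(set_doc, "x", 1)
apply_set(set_doc, "x", 1)   # replaying the same entry
print(set_doc["x"])          # 1 -- replay is harmless (idempotent)

inc_doc = {"_id": 1, "x": 0}
apply_inc(inc_doc, "x", 1)
apply_inc(inc_doc, "x", 1)   # replaying the increment
print(inc_doc["x"])          # 2 -- replay changed the result (not idempotent)
```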
  10. Launching a Replica Set
     Start your mongod processes with --replSet <name>, then:
     > var config = {
         _id: "austin",
         members: [
           {_id: 0, host: "host1.weylandyutani.com"},
           {_id: 1, host: "host2.weylandyutani.com"},
           {_id: 2, host: "host3.weylandyutani.com"}
         ]
       }
     > use admin
     > rs.initiate(config)
  11. Replication Utilities
     • rs.add("hostname:port") - add a new member
     • rs.remove("hostname:port") - remove a member
     • rs.status() - get an overview of replica set health
     • rs.stepDown() - step down as primary
     • rs.reconfig(config) - update the replica set config
     • rs.slaveOk() - on a secondary, enable read queries
  12. Replica Set Options
     • {arbiterOnly: true} - makes this node an arbiter: it votes in elections, but stores no data
     • {priority: p} - sets a preference for election as primary; priority: 0 means the node can never become primary - useful for backups, reporting, etc.
     • {slaveDelay: <seconds>} - number of seconds to remain behind the primary; useful for accident recovery, rolling backups, etc.
  13. Drivers are replica-set aware!
     By passing options to getLastError(), we can get a guarantee of successful replication:
     from pymongo import ReplicaSetConnection
     db = ReplicaSetConnection().test
     # Ensure the write reaches >= 2 nodes
     db.u.update({"name": "bob"}, {"$inc": {"age": 1}}, safe=True, w=2)
     # Ensure the write reaches a majority of nodes
     db.u.update({"name": "bob"}, {"$inc": {"age": 1}}, safe=True, w="majority")
  14. Scaling reads with Secondary Nodes
     • .slaveOk() enables read queries on secondary nodes
     • Good for read-heavy situations
     • Not necessarily helpful for write-heavy situations
     • This does not increase your working set size (for that you need sharding)
  15. Drivers can handle sending read queries to secondaries
     from pymongo import ReplicaSetConnection, ReadPreference
     c = ReplicaSetConnection(read_preference=ReadPreference.SECONDARY)
     db = c.test
     db.u.find_one({"name": "bob"})
     These reads are eventually consistent.
     If you need strong consistency, stick with ReadPreference.PRIMARY.
  16. Deployment Strategies
     • Odd # of members for elections
     • Minimum of 3 members
  17. How many nodes?
     • 1 node - bad: this isn’t even a replica set, actually
     • 2 nodes - bad: becomes read-only on loss of a single member
     • 3 nodes - good: survives 1 failure; on 1 failure, elects a new primary
     • 4 nodes - bad: survives only 1 failure; on 2 failures, the remaining 2 nodes become read-only
     • 5 nodes - good: on loss of <= 2 nodes, survivors can elect a new primary
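The whole table above falls out of one formula (a hypothetical helper for illustration): a set of n members tolerates n minus the strict-majority size failures.

```python
def fault_tolerance(n):
    """Members that can fail while the survivors still form a strict majority."""
    majority = n // 2 + 1
    return n - majority

for n in range(1, 6):
    print(n, "nodes ->", fault_tolerance(n), "failure(s) tolerated")
# 1 -> 0, 2 -> 0, 3 -> 1, 4 -> 1, 5 -> 2
```

Note that 4 nodes tolerate no more failures than 3, and 2 no more than 1 - hence the "odd number of members" rule on the previous slide.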
  18. Network Setup
     • Each member should have its own machine
     • Use arbiters for a more lightweight setup
     • If sharding, each shard should be a complete replica set
     • Up to 12 replica set members, 7 of which are allowed to vote
  19. (diagram: primary + a recovering secondary)
     After you unlock it, the secondary catches up automatically
     (make sure your oplog size is big enough)
  20. Similar idea: build indexes on secondaries
     For each server in secondaries:
       1. shut down the server
       2. restart it as a standalone
       3. log in and build the index
       4. shut down the server
       5. restart it as a secondary again
     Then step down the primary and repeat there (a rolling update).