Replication and Replicasets

rozza
March 20, 2012

MongoDB supports replication for failover and redundancy. In this session we will introduce the basic concepts around replica sets, which provide automated failover and recovery of nodes. We'll show you how to set up, configure, and initiate a replica set, and how to use replication to scale reads.

We ran out of time to discuss proper architectures for durability, but examples are in the slides.

Transcript

  1. Use cases
     • High Availability (auto-failover)
     • Read Scaling (extra copies to read from)
     • Backups
       • Online, Delayed Copy (fat finger)
       • Point in Time (PiT) backups
     • Use (hidden) replica for secondary workload
       • Analytics
       • Data-processing
       • Integration with external systems
  2. Types of outage
     Planned
     • Hardware upgrade
     • O/S or file-system tuning
     • Relocation of data to new file-system / storage
     • Software upgrade
     Unplanned
     • Hardware failure
     • Data center failure
     • Region outage
     • Human error
     • Application corruption
  3. Replica Set features
     • A cluster of N servers
     • All writes to primary
     • Reads can be to primary (default) or a secondary
     • Any (one) node can be primary
     • Consensus election of primary
     • Automatic failover
     • Automatic recovery
  4. How MongoDB Replication works
     [Diagram: Member 1, Member 2, Member 3]
     • Set is made up of 2 or more nodes
  5. How MongoDB Replication works
     • Election establishes the PRIMARY
     • Data replication from PRIMARY to SECONDARY
     [Diagram: Member 1, Member 2 (Primary), Member 3]
  6. How MongoDB Replication works
     • PRIMARY may fail
     • Automatic election of new PRIMARY if majority exists
     [Diagram: Member 1, Member 2 (DOWN), Member 3; remaining members negotiate a new master]
  7. How MongoDB Replication works
     • New PRIMARY elected
     • Replica Set re-established
     [Diagram: Member 1, Member 2 (DOWN), Member 3 (Primary)]
  8. How MongoDB Replication works
     • Automatic recovery
     [Diagram: Member 1, Member 3 (Primary), Member 2 (Recovering)]
  9. How MongoDB Replication works
     • Replica Set re-established
     [Diagram: Member 1, Member 3 (Primary), Member 2]
  10. How's Replication work?
      • Change operations are written to the oplog
      • The oplog is a capped collection (fixed size)
        • Must have enough space to allow new secondaries to catch up after copying from a primary
        • Must have enough space to cope with any applicable slaveDelay
      • Secondaries query the primary's oplog and apply what they find
      • All replicas contain an oplog
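      A minimal shell sketch for inspecting the oplog on a member (in a replica set the collection is local.oplog.rs):

      > use local
      > db.oplog.rs.find().sort({$natural : -1}).limit(1)   // most recent oplog entry
      > db.printReplicationInfo()                           // configured oplog size and the time window it covers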
  11. Creating a Replica Set
      mongod --replSet <replica_name> --oplogSize <MB>

      > cfg = { _id : "myset",
                members : [ { _id : 0, host : "berlin1.acme.com" },
                            { _id : 1, host : "berlin2.acme.com" },
                            { _id : 2, host : "berlin3.acme.com" } ] }
      > use admin
      > db.runCommand( { replSetInitiate : cfg } )
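      As a sketch only (the port, dbpath and oplog size below are illustrative assumptions), each member runs its own mongod with the same --replSet name, and the set can equally be initiated with the shell helper shown on the next slide:

      mongod --replSet myset --oplogSize 512 --port 27017 --dbpath /data/myset

      > rs.initiate(cfg)      // shell helper equivalent of replSetInitiate
      > rs.status()           // confirm one PRIMARY and two SECONDARY members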
  12. Managing a Replica Set
      • rs.conf()
        Shell helper: get current configuration
      • rs.initiate(<cfg>)
        Shell helper: initiate replica set
      • rs.reconfig(<cfg>)
        Shell helper: reconfigure a replica set
      • rs.add("hostname:<port>")
        Shell helper: add a new member
      • rs.remove("hostname:<port>")
        Shell helper: remove a member
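      A brief sketch of these helpers in use (the host berlin4.acme.com is an assumption, following the naming on slide 11):

      > rs.add("berlin4.acme.com:27017")        // add a new member to the running set
      > rs.remove("berlin4.acme.com:27017")     // remove it again
      > cfg = rs.conf()                         // fetch, edit and re-apply the configuration
      > cfg.members[1].priority = 5
      > rs.reconfig(cfg)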
  13. Managing a Replica Set
      • rs.status()
        Reports status of the replica set from one node's point of view
      • rs.stepDown(<secs>)
        Request the primary to step down
      • rs.freeze(<secs>)
        Prevents any changes to the current replica set configuration (primary/secondary status)
        Use during backups
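      A minimal maintenance sketch using these helpers (the durations are illustrative):

      > rs.freeze(120)     // run on a secondary you do not want elected for the next 120s
      > rs.stepDown(60)    // run on the primary: step down and stay ineligible for 60s
      > rs.status()        // watch the set elect a new primary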
  14. Priorities
      • Priority: a floating point number between 0 and 100
      • Used during an election:
        • Most up to date
        • Highest priority
        • Less than 10s behind failed Primary
      • Allows weighting of members during failover
  15. Priorities - example
      A p:10   B p:10   C p:1   D p:1   E p:0
      • Assuming all members are up to date
      • Members A or B will be chosen first
        • Highest priority
      • Members C or D will be chosen when:
        • A and B are unavailable
        • A and B are not up to date
      • Member E is never chosen
        • priority:0 means it cannot be elected
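      A configuration sketch matching this example (hostnames are assumptions; priority is set per member):

      > cfg = { _id : "myset",
                members : [ { _id : 0, host : "A.acme.com", priority : 10 },
                            { _id : 1, host : "B.acme.com", priority : 10 },
                            { _id : 2, host : "C.acme.com", priority : 1 },
                            { _id : 3, host : "D.acme.com", priority : 1 },
                            { _id : 4, host : "E.acme.com", priority : 0 } ] }   // E can never become primary
      > rs.initiate(cfg)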
  16. Write Concerns
      db.runCommand({getLastError: 1, w : 1})
      • ensure write is synchronous
      • command returns after primary has written to memory

      w: n or w: 'majority'
      • n is the number of nodes data must be replicated to
      • driver will always send writes to Primary

      w: 'my_tag'
      • Each member is "tagged" e.g. "allDCs"
      • Ensure that the write is executed in each tagged "region"
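      A brief sketch of a replicated write check (the collection name and wtimeout value are illustrative assumptions):

      > db.orders.insert({ state : "shipped" })
      > db.runCommand({ getLastError : 1, w : "majority", wtimeout : 5000 })
      // returns once a majority of members have the write, or after 5 seconds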
  17. Write Concerns
      fsync: true
      • Ensures changed disk blocks are flushed to disk

      j: true
      • Ensures changes are flushed to the Journal
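      A minimal sketch combining journal acknowledgement with replication (the w value is an illustrative assumption):

      > db.runCommand({ getLastError : 1, j : true })           // wait for the journal on the primary
      > db.runCommand({ getLastError : 1, w : 2, j : true })    // journal on the primary and replicate to one secondary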
  18. Tagging
      • Control over where data is written to
      • Each member can have one or more tags:
        tags: {dc: "ber"}
        tags: {dc: "ber", ip: "192.168", rack: "row3-rk7"}
      • Replica set defines rules for where data resides
      • Rules can change without changing application code
  19. {_id : "mySet", members : [ {_id : 0, host

    : "A", tags : {"dc": "ber"}}, {_id : 1, host : "B", tags : {"dc": "ber"}}, {_id : 2, host : "C", tags : {"dc": "lon"}}, {_id : 4, host : "E", tags : {"dc": "nyc"}}] settings : { getLastErrorModes : { allDCs : {"dc" : 3}, someDCs : {"dc" : 2}}} } > db.post.insert({...}) > db.runCommand({getLastError : 1, w : "allDCs"}) Tagging - example Tuesday, 20 March 12
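      As a follow-up sketch, the someDCs mode defined above is used the same way (the wtimeout is an illustrative assumption):

      > db.runCommand({getLastError : 1, w : "someDCs", wtimeout : 5000})
      // acknowledged once the write has reached 2 distinct "dc" regions, or after 5 seconds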
  20. Using Replicas for Reads
      • Read Preference / Slave Okay
        • driver will always send writes to Primary
        • driver will send read requests to Secondaries
      • Python examples
        • Connection(read_preference=ReadPreference.PRIMARY)
        • db.read_preference = ReadPreference.SECONDARY_ONLY
        • db.test.read_preference = ReadPreference.SECONDARY
        • db.test.find(read_preference=ReadPreference.SECONDARY)
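      For completeness, a minimal sketch of the equivalent "slave okay" switch in the mongo shell (it applies to the current connection only):

      > db.getMongo().setSlaveOk()    // allow this connection to read from secondaries
      > db.test.find()                // may now be served by a secondary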
  21. Using Replicas for Reads
      • Warning!
        • Secondaries may be out of date
        • Not applicable for all applications
      • Sharding provides consistent scaling of reads
  22. Replication features
      • Reads from Primary are always consistent
      • Reads from Secondaries are eventually consistent
      • Automatic failover if a Primary fails
      • Automatic recovery when a node joins the set
      • Full control of where writes occur
  23. Single Node
      • Will have downtime
      • If the node crashes, human intervention might be needed
  24. Replica Set 1
      [Diagram: replica set with an Arbiter]
      • Single datacenter
      • Single switch & power
      • One node failure
      • Automatic recovery of single node crash
      • Points of failure:
        • Power
        • Network
        • Datacenter
  25. Replica Set 2
      [Diagram: replica set with an Arbiter]
      • Single datacenter
      • Multiple power/network zones
      • Automatic recovery of single node crash
      • w=2 not viable as losing 1 node means no writes
      • Points of failure:
        • Datacenter
        • Two node failure
  26. Replica Set 3
      • Single datacenter
      • Multiple power/network zones
      • Automatic recovery of single node crash
      • w=2 viable as 2/3 online
      • Points of failure:
        • Datacenter
        • Two node failure
  27. Replica Set 4
      • Multi datacenter
      • DR node for safety
      • Can't do a multi data center durable write safely, since there is only 1 node in the distant DC
  28. Replica Set 5
      • Three data centers
      • Can survive full data center loss
      • Can do w = { dc : 2 } (a getLastErrorModes tag rule, as in slide 19) to guarantee the write reaches 2 data centers
  29. Typical Deployments

      Use? | Set size | Data Protection | High Availability  | Notes
      X    | One      | No              | No                 | Must use --journal to protect against crashes
           | Two      | Yes             | No                 | On loss of one member, surviving member is read only
           | Three    | Yes             | Yes - 1 failure    | On loss of one member, surviving two members can elect a new primary
      X    | Four     | Yes             | Yes - 1 failure *  | * On loss of two members, surviving two members are read only
           | Five     | Yes             | Yes - 2 failures   | On loss of two members, surviving three members can elect a new primary
  30. @mongodb
      conferences, appearances, and meetups: http://www.10gen.com/events
      Facebook | Twitter | LinkedIn: http://bit.ly/mongofb, http://linkd.in/joinmongo
      download at mongodb.org
      support, training, and this talk brought to you by