
Replication and Replica Sets

mongodb
April 20, 2012


MongoDB Stockholm - Replication and Replica Sets - Ross Lawley, Software Engineer, 10gen

MongoDB supports replication for failover and redundancy. In this session we will introduce the basic concepts around replica sets, which provide automated failover and recovery of nodes. We'll show you how to set up, configure, and initiate a replica set, and methods for using replication to scale reads. We'll also discuss proper architecture for durability.


Transcript

  1. Use cases
     • High Availability (auto-failover)
     • Read Scaling (extra copies to read from)
     • Backups
       • Online, delayed copy (fat finger)
       • Point in Time (PiT) backups
     • Use a (hidden) replica for secondary workloads
       • Analytics
       • Data processing
       • Integration with external systems
  2. Types of outage
     Planned
     • Hardware upgrade
     • O/S or file-system tuning
     • Relocation of data to a new file-system / storage
     • Software upgrade
     Unplanned
     • Hardware failure
     • Data center failure
     • Region outage
     • Human error
     • Application corruption
  3. Replica Set features
     • A cluster of N servers
     • All writes go to the primary
     • Reads can go to the primary (default) or a secondary
     • Any (one) node can be primary
     • Consensus election of the primary
     • Automatic failover
     • Automatic recovery
  4. How MongoDB replication works
     • A set is made up of 2 or more nodes
     (diagram: Member 1, Member 2, Member 3)
  5. How MongoDB replication works
     • An election establishes the PRIMARY
     • Data replicates from the PRIMARY to the SECONDARIES
     (diagram: Member 2 is primary; Members 1 and 3 are secondaries)
  6. How MongoDB replication works
     • The PRIMARY may fail
     • Automatic election of a new PRIMARY if a majority exists
     (diagram: Member 2 down; Members 1 and 3 negotiate a new master)
  7. How MongoDB replication works
     • New PRIMARY elected
     • Replica set re-established
     (diagram: Member 2 down; Member 3 is now primary)
  8. How MongoDB replication works
     • Automatic recovery
     (diagram: Member 3 primary; Member 2 recovering)
  9. How MongoDB replication works
     • Replica set re-established
     (diagram: Member 3 primary; Member 2 back as a secondary)
  10. How does replication work?
     • Change operations are written to the oplog
     • The oplog is a capped collection (fixed size)
       • Must have enough space to allow new secondaries to catch up after copying from a primary
       • Must have enough space to cope with any applicable slaveDelay
     • Secondaries query the primary's oplog and apply what they find
     • All replicas contain an oplog
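     The oplog described above can be inspected directly from the shell. A sketch against a running replica set member (the local database and the db.getReplicationInfo() helper are standard; output omitted):

     ```javascript
     // On any replica set member, the oplog lives in the "local" database
     use local
     db.oplog.rs.find().sort({$natural: -1}).limit(1)  // most recent operation
     db.getReplicationInfo()  // configured oplog size and the time range it covers
     ```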
  11. Creating a Replica Set

     mongod --replSet <replica_name> --oplogSize <MB>

     > cfg = {
         _id : "myset",
         members : [
           { _id : 0, host : "stockholm1.acme.com" },
           { _id : 1, host : "stockholm2.acme.com" },
           { _id : 2, host : "stockholm3.acme.com" }
         ]
       }
     > use admin
     > db.runCommand( { replSetInitiate : cfg } )
  12. Managing a Replica Set
     • rs.conf()
       • Shell helper: get the current configuration
     • rs.initiate(<cfg>)
       • Shell helper: initiate the replica set
     • rs.reconfig(<cfg>)
       • Shell helper: reconfigure a replica set
     • rs.add("hostname:<port>")
       • Shell helper: add a new member
     • rs.remove("hostname:<port>")
       • Shell helper: remove a member
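     A hypothetical session using the helpers above to grow and then shrink the set (the hostname is made up):

     ```javascript
     rs.add("stockholm4.acme.com:27017")     // add a new secondary
     rs.conf()                               // confirm the updated member list
     rs.remove("stockholm4.acme.com:27017")  // take it out again
     ```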
  13. Managing a Replica Set
     • rs.status()
       • Reports the status of the replica set from one node's point of view
     • rs.stepDown(<secs>)
       • Request that the primary step down
     • rs.freeze(<secs>)
       • Prevents the node from seeking election (becoming primary) for the given period
       • Use during backups
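     As a sketch, a maintenance window might combine these helpers like so (the durations are illustrative):

     ```javascript
     rs.status()      // check the health and optime of every member first
     rs.stepDown(60)  // on the primary: step down, stay ineligible for 60 seconds
     rs.freeze(60)    // on a secondary: do not seek election for 60 seconds
     ```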
  14. Priorities
     • Priority: a floating point number between 0 and 100
     • Used during an election; to be elected a member must be:
       • The most up to date of the eligible members
       • The highest priority
       • Less than 10s behind the failed primary
     • Allows weighting of members during failover
  15. Priorities - example
     • Members: A (p:10), B (p:10), C (p:1), D (p:1), E (p:0)
     • Assuming all members are up to date:
     • Members A or B will be chosen first
       • Highest priority
     • Members C or D will be chosen when:
       • A and B are unavailable
       • A and B are not up to date
     • Member E is never chosen
       • priority:0 means it cannot be elected
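     The election rules above can be sketched as a simplified model in Python. The member names, priorities, and optimes below are made up, and a real election also involves votes and network state; this only illustrates the eligibility and tie-breaking described on the slide:

     ```python
     # Simplified model of the slide's rules: a member is electable only if
     # its priority is > 0 and it is no more than 10s behind the most
     # up-to-date member; among electable members, the highest priority wins.

     def elect(members, max_lag_secs=10):
         """members: list of dicts with 'name', 'priority', 'optime' (seconds)."""
         newest = max(m["optime"] for m in members)
         electable = [
             m for m in members
             if m["priority"] > 0 and newest - m["optime"] <= max_lag_secs
         ]
         if not electable:
             return None  # no eligible member: the set has no primary
         # prefer highest priority, then most up to date
         return max(electable, key=lambda m: (m["priority"], m["optime"]))["name"]

     members = [
         {"name": "A", "priority": 10, "optime": 100},
         {"name": "B", "priority": 10, "optime": 95},
         {"name": "C", "priority": 1,  "optime": 100},
         {"name": "E", "priority": 0,  "optime": 100},  # priority:0 - never primary
     ]
     print(elect(members))  # A: highest priority and most up to date
     ```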
  16. Write Concerns
     db.runCommand({getLastError: 1, w : 1})
     • Ensures the write is synchronous
     • Command returns after the primary has written to memory
     w: n or w: 'majority'
     • n is the number of nodes the data must be replicated to
     • The driver will always send writes to the primary
     w: 'my_tag'
     • Each member is "tagged", e.g. "allDCs"
     • Ensures that the write is executed in each tagged "region"
  17. Write Concerns (continued)
     fsync: true
     • Ensures changed disk blocks are flushed to disk
     j: true
     • Ensures changes are flushed to the journal
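     Combining the options from the last two slides, a durable, majority-acknowledged write might look like this in the shell of the time (the collection name is made up):

     ```javascript
     db.post.insert({title: "hello"})
     db.runCommand({getLastError: 1, w: "majority", j: true})
     // returns once a majority of members have the write and it has
     // been flushed to this node's journal
     ```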
  18. Tagging
     • Control over where data is written to
     • Each member can have one or more tags:
       tags: {dc: "stockholm"}
       tags: {dc: "stockholm", ip: "192.168", rack: "row3-rk7"}
     • The replica set defines rules for where data resides
     • Rules can change without changing application code
  19. Tagging - example

     {_id : "mySet",
      members : [
        {_id : 0, host : "A", tags : {"dc": "sto"}},
        {_id : 1, host : "B", tags : {"dc": "ber"}},
        {_id : 2, host : "C", tags : {"dc": "lon"}},
        {_id : 4, host : "D", tags : {"dc": "nyc"}}],
      settings : {
        getLastErrorModes : {
          allDCs : {"dc" : 3},
          someDCs : {"dc" : 2}}}
     }

     > db.post.insert({...})
     > db.runCommand({getLastError : 1, w : "allDCs"})
  20. Using Replicas for Reads
     • Read Preference / Slave Okay
       • The driver will always send writes to the primary
       • The driver will send read requests to secondaries
     • Python examples:
       • Connection(read_preference=ReadPreference.PRIMARY)
       • db.read_preference = ReadPreference.SECONDARY_ONLY
       • db.test.read_preference = ReadPreference.SECONDARY
       • db.test.find(read_preference=ReadPreference.SECONDARY)
  21. Using Replicas for Reads
     • Warning!
       • Secondaries may be out of date
       • Not applicable for all applications
     • Sharding provides consistent scaling of reads
  22. Replication features
     • Reads from the primary are always consistent
     • Reads from secondaries are eventually consistent
     • Automatic failover if a primary fails
     • Automatic recovery when a node joins the set
     • Full control of where writes occur
  23. Single Node
     • Will have downtime
     • If the node crashes, human intervention might be needed
  24. Replica Set 1 (with an arbiter)
     • Single datacenter
     • Single switch & power
     • Tolerates one node failure
     • Automatic recovery from a single node crash
     • Points of failure:
       • Power
       • Network
       • Datacenter
  25. Replica Set 2 (with an arbiter)
     • Single datacenter
     • Multiple power/network zones
     • Automatic recovery from a single node crash
     • w=2 not viable, as losing 1 node means no writes
     • Points of failure:
       • Datacenter
       • Two node failure
  26. Replica Set 3
     • Single datacenter
     • Multiple power/network zones
     • Automatic recovery from a single node crash
     • w=2 viable, as 2/3 of the nodes remain online
     • Points of failure:
       • Datacenter
       • Two node failure
  27. Replica Set 4
     • Multi datacenter
     • DR node for safety
     • Can't do a multi data center durable write safely, since there is only 1 node in the distant DC
  28. Replica Set 5
     • Three data centers
     • Can survive a full data center loss
     • Can do w = { dc : 2 } to guarantee a write in 2 data centers
  29. Typical Deployments

     Use? | Set size | Data Protection | High Availability | Notes
     -----|----------|-----------------|-------------------|------------------------------------------------------------------
     X    | One      | No              | No                | Must use --journal to protect against crashes
          | Two      | Yes             | No                | On loss of one member, the surviving member is read only
          | Three    | Yes             | Yes - 1 failure   | On loss of one member, the surviving two members can elect a new primary
     X    | Four     | Yes             | Yes - 1 failure*  | * On loss of two members, the surviving two members are read only
          | Five     | Yes             | Yes - 2 failures  | On loss of two members, the surviving three members can elect a new primary
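     The availability column follows from simple majority arithmetic, sketched below: a set can elect a primary only while a strict majority of the configured members is still up.

     ```python
     # Sketch of the arithmetic behind the table above: electing a primary
     # requires a strict majority of the configured set size.

     def failures_tolerated(set_size):
         """Largest number of down members that still leaves a majority up."""
         majority = set_size // 2 + 1
         return set_size - majority

     for size in (1, 2, 3, 4, 5):
         print(size, failures_tolerated(size))
     ```

     Note that a four-member set tolerates no more failures than a three-member set, which is why the even sizes in the table are marked with an X.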
  30. @mongodb
     conferences, appearances, and meetups: http://www.10gen.com/events
     Facebook | Twitter | LinkedIn: http://bit.ly/mongofb | http://linkd.in/joinmongo
     download at mongodb.org
     support, training, and this talk brought to you by 10gen