
Replication and Replica Sets - Robert Stam, 10gen

mongodb
February 06, 2012

MongoDB Boulder 2012

MongoDB supports replication for failover and redundancy. In this session we will introduce the basic concepts around replica sets, which provide automated failover and recovery of nodes. We'll show you how to set up, configure, and initiate a replica set, and methods for using replication to scale reads. We'll also discuss proper architecture for durability.


Transcript

1. Replica Set features
   • A cluster of N servers
   • Any (one) node can be primary
   • Consensus election of primary
   • Automatic failover
   • Automatic recovery
   • All writes to primary
   • Reads can be to primary (default) or a secondary

2. How MongoDB Replication works (diagram: three members, one elected PRIMARY)
   • Election establishes the PRIMARY
   • Data replication from PRIMARY to SECONDARY

3. How MongoDB Replication works (diagram: the PRIMARY is DOWN; the remaining members negotiate a new master)
   • PRIMARY may fail
   • Automatic election of new PRIMARY if majority exists

4. How MongoDB Replication works (diagram: one member still DOWN; another member is now PRIMARY)
   • New PRIMARY elected
   • Replication Set re-established

5. How MongoDB Replication works (diagram: all three members back up, one as PRIMARY)
   • Replication Set re-established

6. When are elections triggered? When a given member
   • Sees that the Primary is not reachable
   • Is not an Arbiter
   • Is caught up and has a priority greater than the current primary

7. Creating a Replica Set

   $ ./mongod --replSet <name>

   > cfg = {
         _id : "<name>",
         members : [
             { _id : 0, host : "sf1.acme.com" },
             { _id : 1, host : "sf2.acme.com" },
             { _id : 2, host : "sf3.acme.com" }
         ]
     }
   > use admin
   > rs.initiate(cfg)

8. Managing a Replica Set
   • rs.add("hostname:<port>"): shell helper to add a new member
   • rs.remove("hostname:<port>"): shell helper to remove a member
   • rs.reconfig(configDoc): used by the other helpers; gives more control and applies multiple changes (add/remove/etc.) in a single step
   • rs.stepDown(ineligibleDuration): issued on the Primary; "step down" to a new node

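   A minimal mongo shell sketch of how these helpers are typically used; the host name, member _id, and the 60-second step-down window below are hypothetical:

   > rs.add("sf4.acme.com:27017")                          // add a new member to the set
   > rs.remove("sf4.acme.com:27017")                       // remove that member again
   > cfg = rs.conf()                                       // fetch the current configuration
   > cfg.members.push({ _id : 3, host : "sf4.acme.com" })  // batch up several changes ...
   > rs.reconfig(cfg)                                      // ... and apply them in a single step
   > rs.stepDown(60)                                       // on the primary: step down, stay ineligible for 60 seconds
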
9. Some Administrative Commands
   • rs.status(): reports the status of the replica set from one node's point of view
   • rs.freeze(timeInSeconds): prevents any changes to the current replica set node (Secondaries stay that way); use during backups or when shuffling the primary
   • ...more...

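   For example (the 120-second freeze window is an arbitrary illustrative value):

   > rs.status()      // member states, optimes, and health from this node's point of view
   > rs.freeze(120)   // keep this secondary from seeking election for 120 seconds, e.g. during a backup
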
10. Durability Options
    • Fire and forget
    • Wait for error
    • Wait for journal sync
    • Wait for fsync
    • Wait for replication

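    A sketch of how these options map onto getLastError in the mongo shell of that era; the collection name, w value, and wtimeout are illustrative:

    > db.blogs.insert({ title : "hello" })                          // fire and forget: no acknowledgement requested
    > db.runCommand({ getLastError : 1 })                           // wait for error on the primary
    > db.runCommand({ getLastError : 1, j : true })                 // wait for journal sync
    > db.runCommand({ getLastError : 1, fsync : true })             // wait for fsync to disk
    > db.runCommand({ getLastError : 1, w : 2, wtimeout : 5000 })   // wait for replication to 2 members (5s timeout)
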
11. Priorities
    • Floating point number between 0..1000
    • The highest-priority member that is up to date wins
    • Up to date == within 10 seconds of the primary
    • If a higher-priority member catches up, it will force an election and win

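    A minimal sketch of raising a member's priority via reconfiguration; the member index and value are hypothetical:

    > cfg = rs.conf()
    > cfg.members[2].priority = 10   // prefer this member as primary once it is caught up
    > rs.reconfig(cfg)
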
12. Slave Delay
    • Lags behind the master by a configurable time delay
    • Automatically hidden from clients
    • Protects against operator errors
      • Fat fingering
      • Application corrupts data

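    A sketch of configuring a delayed member; the member index and the one-hour delay are arbitrary illustrative values:

    > cfg = rs.conf()
    > cfg.members[3].priority = 0        // a delayed member should never become primary
    > cfg.members[3].slaveDelay = 3600   // apply operations one hour behind the master
    > rs.reconfig(cfg)
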
13. Tagging
    • New in 2.0
    • Tags represent properties of a member
    • Each member can have one or more tags, e.g.
      • tags: {dc: "ny"}
      • tags: {dc: "ny", ip: "192.168", rack: "row3rk7"}
    • getLastErrorModes represent sets of servers
    • getLastError with w="mode" waits for writes to replicate to a set of servers that match the mode

14. Tagging - example

    {
        _id : "mySet",
        members : [
            { _id : 0, host : "A", tags : {"dc": "ny"} },
            { _id : 1, host : "B", tags : {"dc": "ny"} },
            { _id : 2, host : "C", tags : {"dc": "sf"} },
            { _id : 3, host : "D", tags : {"dc": "sf"} },
            { _id : 4, host : "E", tags : {"dc": "cloud"} }
        ],
        settings : {
            getLastErrorModes : {
                allDCs : { "dc" : 3 },
                someDCs : { "dc" : 2 }
            }
        }
    }

    > db.blogs.insert({...})
    > db.runCommand({getLastError : 1, w : "allDCs"})

15. Other member types
    • Arbiters
      • Don't store a copy of the data
      • Vote in elections
      • Used as a tie breaker
    • Hidden
      • Not reported in isMaster
      • Hidden from slaveOk reads

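    A sketch of adding an arbiter and hiding an existing member; the host name and member index are hypothetical:

    > rs.addArb("arb1.acme.com:27017")   // arbiter: votes in elections but stores no data
    > cfg = rs.conf()
    > cfg.members[1].priority = 0        // a hidden member must not be electable
    > cfg.members[1].hidden = true       // not reported in isMaster, excluded from slaveOk reads
    > rs.reconfig(cfg)
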
16. Replica Set – 1 Data Center (diagram: Members 1, 2, and 3 in a single data center)
    • Single datacenter
    • Single switch & power
    • Points of failure:
      • Power
      • Network
      • Datacenter
      • Two node failure
    • Automatic recovery of single node crash

17. Replica Set – 2 Data Centers (diagram: Members 1 and 2 in DC1; Member 3 as DR in DC2)
    • Multi datacenter
    • DR node for safety
    • Can't do a multi data center durable write safely since there is only 1 node in the distant DC

18. Replica Set – 3 Data Centers (diagram: five members spread across DC1, DC2, and DC3, with a DR node in DC3)
    • Three data centers
    • Can survive full data center loss
    • Can do w="twoDCs" to guarantee a write in 2 data centers

19. Typical Deployments

    Use? | Set size | Data Protection | High Availability | Notes
    X    | One      | No              | No                | Must use --journal to protect against crashes
         | Two      | Yes             | No                | On loss of one member, surviving member is read only
         | Three    | Yes             | Yes - 1 failure   | On loss of one member, surviving two members can elect a new primary
    X    | Four     | Yes             | Yes - 1 failure*  | * On loss of two members, surviving two members are read only
         | Five     | Yes             | Yes - 2 failures  | On loss of two members, surviving three members can elect a new primary

20. How Is Data Replicated?
    • Change operations are written to the oplog
    • The oplog is a capped collection (fixed size)
      • Must have enough space to allow new secondaries to catch up (from scratch or from a backup)
      • Must have enough space to cope with any applicable slaveDelay
    • Secondaries query the primary's oplog and apply what they find
    • All replicas contain an oplog

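    The oplog can be inspected directly from the shell, for example:

    > use local
    > db.oplog.rs.find().sort({ $natural : -1 }).limit(1)   // most recent replicated operation
    > db.printReplicationInfo()                             // configured oplog size and the time window it covers
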
21. © Copyright 2010 10gen Inc.
    @mongodb
    conferences, appearances, and meetups: http://www.10gen.com/events
    http://bit.ly/mongoW   Facebook   Twitter   LinkedIn   http://linkd.in/joinmongo
    download at mongodb.org
    We're Hiring! [email protected]