
Replication and Replica Sets - Kyle Banker, 10gen

mongodb
November 04, 2011

MongoChicago 2011

MongoDB supports replication for failover and redundancy. In this session we will introduce the basic concepts around replica sets, which provide automated failover and recovery of nodes. We'll show you how to set up, configure, and initiate a replica set, and methods for using replication to scale reads. We'll also discuss proper architecture for durability.


Transcript

  1. What is Replication for?
     • High availability: if a node fails, another node can step in
     • Extra copies of data for recovery
     • Offline operations: backups and exports are cheap
     • Scaling reads: read-heavy applications can read from replicas

  2. What Does Replication Look Like?
     • Replica set: a set of mongod servers
     • Minimum of 3; can use "arbiters"
     • Consensus election of a "primary"
     • All writes go to the primary
     • "Secondaries" replicate from the primary

  3. Configuring a Replica Set
     • Start mongod processes with --replSet
     • Then:
        > rs.initiate();
        > rs.add("localhost:30000");
        > rs.add("localhost:30001");

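     A fuller sketch of the same steps, assuming three mongod processes on one host (the set name "myset", ports, and dbpaths are illustrative):

        mongod --replSet myset --port 30000 --dbpath /data/rs0
        mongod --replSet myset --port 30001 --dbpath /data/rs1
        mongod --replSet myset --port 30002 --dbpath /data/rs2

     Then, from a mongo shell connected to localhost:30000:

        > rs.initiate()
        > rs.add("localhost:30001")
        > rs.add("localhost:30002")
        > rs.status()   // confirm one PRIMARY and two SECONDARY members
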
  4. How Does it Work?
     • Change operations are written to the oplog
     • The oplog is a capped collection
       – Must have enough space to allow new secondaries to catch up after copying from a primary
       – Must have enough space to cope with any applicable slaveDelay
     • Secondaries query the primary's oplog and apply what they find

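     A quick way to size and inspect the oplog; the --oplogSize value (in megabytes) and dbpath are illustrative:

        mongod --replSet myset --oplogSize 1024 --dbpath /data/rs0

        > db.printReplicationInfo()        // oplog size and the time window it covers
        > db.printSlaveReplicationInfo()   // how far behind each secondary is
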
  5. Primary Election
     (diagram: a three-member set, one primary and two secondaries)
     • As long as a partition can see a majority (>50%) of the cluster, it will elect a primary.

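     A quick sketch for checking which member the set has elected, from any connected mongo shell:

        > rs.status().members.forEach(function(m) { print(m.name + "  " + m.stateStr); })
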
  6. Simple Failure
     (diagram: primary and secondary up, one node failed)
     • 66% of the cluster is visible, so a primary is elected.

  7. Simple Failure
     (diagram: one secondary up, two nodes failed)
     • Only 33% of the cluster is visible: read-only mode.

  8. Network Partition
     (diagram: the majority side of the partition, primary plus secondaries)
     • 66% of the cluster is visible, so a primary is elected.

  9. Network Partition
     (diagram: the minority side of the partition)
     • Only 33% of the cluster is visible: read-only mode.

  10. Even Cluster Size
     (diagram: an even-sized set split in half)
     • Only 50% of the cluster is visible: not a majority, so read-only mode.

  11. Even Cluster Size
     (diagram: the other half of the same split)
     • Again only 50% of the cluster is visible: read-only mode on both sides.

  12. Avoid Single Points of Failure
     (diagram: primary and two secondaries behind one top-of-rack switch; the switch fails, or the rack falls over)

  13. Priorities
     (diagram: primary and a secondary in San Francisco with priority 1; a secondary in Dallas with priority 0)
     • The priority-0 member is a disaster-recovery data center copy and will never become primary automatically.

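     A sketch of setting a member's priority from the mongo shell, assuming the Dallas member is members[2] in the configuration:

        > cfg = rs.conf()
        > cfg.members[2].priority = 0
        > rs.reconfig(cfg)
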
  14. 2 Replicas + Arbiter??
     (diagram, three panels: primary + arbiter + secondary; the secondary is lost; a replacement secondary full-syncs from the primary)
     • Uh oh: the full sync is going to use a lot of resources on the primary, so you may have downtime or degraded performance.

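     For reference, an arbiter is added with its own shell helper; this is a sketch (host and port are illustrative), and the next slide shows why three data-bearing members are usually preferable:

        > rs.addArb("localhost:30002")
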
  15. With 3 Replicas
     (diagram, three panels: primary + two secondaries; one secondary is lost; the replacement full-syncs from the surviving secondary)
     • The full sync can happen from a secondary, which will not impact traffic on the primary.

  16. Replica Set Topology
     • Avoid single points of failure
       – Separate racks
       – Separate data centers
     • Avoid long recovery downtime
       – Use journaling
       – Use 3+ replicas
     • Keep your actives close
       – Use priority to control where failovers happen

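     Journaling is on by default on 64-bit 2.0 builds; it can also be requested explicitly at startup (a sketch, set name and path illustrative):

        mongod --replSet myset --journal --dbpath /data/rs0
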
  17. For Applications
     • getLastError( { w : … } )
       – The application waits until changes are written to the specified number of servers
       – Defaults can be set in the replica set's configuration
     • "Safe mode" for critical writes: setWriteConcern()
       – Another way to force writes to a number of servers
     • Drivers support "read preference" for sending queries to a secondary
       – Careful: secondary reads are not guaranteed to be consistent with the primary

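     A sketch of these options from the mongo shell (the collection name and numbers are illustrative; drivers expose the same knobs through their own write-concern and read-preference APIs):

        // wait until the write has reached 2 members, or time out after 5 seconds
        > db.things.insert({x: 1})
        > db.runCommand({getLastError: 1, w: 2, wtimeout: 5000})

        // set a default write concern in the replica set configuration
        > cfg = rs.conf()
        > cfg.settings = {getLastErrorDefaults: {w: 2, wtimeout: 5000}}
        > rs.reconfig(cfg)

        // allow reads from a secondary when connected to one in the shell
        > rs.slaveOk()
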
  18. Replication and Sharding
     • Each shard needs its own replica set
     • Drivers use a mongos process to route queries to the appropriate shard(s)
     • Configuration servers maintain the shard key range metadata

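     A sketch of adding an entire replica set as a shard from a mongos shell (the set name and hosts are illustrative):

        mongos> sh.addShard("shard0/host1:27017,host2:27017,host3:27017")
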
  19. Data Center Awareness
     • Tag nodes in the replica set configuration
       – Apply hierarchical labels to replica set members
     • Define getLastError modes
       – Require the number of nodes a write must go to
       – Require the locations of nodes a write must go to
       – Combinations of both
     • Available in 2.0.0+

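     A sketch of tagging members and defining a getLastError mode (the tag values, member indexes, and the mode name multiDC are illustrative):

        > cfg = rs.conf()
        > cfg.members[0].tags = {dc: "ny"}
        > cfg.members[1].tags = {dc: "ny"}
        > cfg.members[2].tags = {dc: "sf"}
        > cfg.settings = {getLastErrorModes: {multiDC: {dc: 2}}}
        > rs.reconfig(cfg)

        // require the write to be acknowledged in two distinct data centers
        > db.runCommand({getLastError: 1, w: "multiDC"})
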