MongoDB - Replication and Replica Sets

Sridhar Nanjundeswaran
February 24, 2012

Presented at MongoDB LA. Jan 2012

Transcript

  1. Sridhar Nanjundeswaran, 10gen
    [email protected]
    @snanjund
    © Copyright 2010 10gen Inc.

  2. Agenda
    • Introduction to replica sets
    • Durability and Consistency
    • Options and configuration
    • Common deployment scenarios
    • Behind the Scenes

  3. INTRO TO REPLICA SETS

  4. Replica Set - Creation
    • 2 or more nodes form the set
    Node 1, Node 2, Node 3

  5. Replica Set - Initialize
    • Initialize -> Election
    • Primary + data replication from primary to secondary
    Node 1: Secondary
    Node 2: Secondary
    Node 3: Primary
    Links: Replication (x2), Heartbeat
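
As a rough illustration of the initialize step, the sketch below builds a three-member configuration document of the kind passed to `rs.initiate()` in the mongo shell. The set name is borrowed from the tagging example later in the deck; the host names are placeholders, not taken from the talk.

```javascript
// Configuration document for a three-member replica set.
// Host names are hypothetical placeholders.
const replSetConfig = {
  _id: "mySet",
  members: [
    { _id: 0, host: "node1.example.net:27017" },
    { _id: 1, host: "node2.example.net:27017" },
    { _id: 2, host: "node3.example.net:27017" },
  ],
};

// Each member needs a distinct _id within the set.
const distinctIds = new Set(replSetConfig.members.map((m) => m._id));
console.log(distinctIds.size === replSetConfig.members.length); // true
```

In the mongo shell, connected to one of the nodes, this document would be passed as `rs.initiate(replSetConfig)`; the election described on the slide follows.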

  6. Replica Set - Failure
    • Primary down/network failure
    • Automatic election of new primary if majority exists
    Node 1: Secondary
    Node 2: Secondary
    Node 3: Primary (down)
    Links: Heartbeat, Primary Election
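
The "if majority exists" condition can be sketched as a simple vote count. This is a simplified model that assumes one vote per member; `hasMajority` is a hypothetical helper, not a MongoDB function.

```javascript
// Simplified model: a new primary can be elected only if the members that
// can still reach each other hold a strict majority of all votes in the set.
function hasMajority(reachableVotes, totalVotes) {
  return reachableVotes > Math.floor(totalVotes / 2);
}

// Three-member set from the slides, one vote each (assumed).
console.log(hasMajority(2, 3)); // true: Node 1 and Node 2 can elect a primary
console.log(hasMajority(1, 3)); // false: a lone node cannot
console.log(hasMajority(2, 4)); // false: an even split has no majority
```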

  7. Replica Set - Failover
    • New primary elected
    • Replication established from new primary
    Node 1: Secondary
    Node 2: Primary (newly elected)
    Node 3: former Primary (down)
    Links: Heartbeat

  8. Replica Set - Recovery
    • Down node comes up
    • Rejoins the set
    • Goes through recovery, then becomes a secondary
    Node 1: Secondary
    Node 2: Primary
    Node 3: Secondary
    Links: Replication (x2), Heartbeat

  9. DURABILITY AND CONSISTENCY

  10. Strong Consistency
    [Diagram: Client -> Driver -> Primary for both Read and Write; two Secondaries receive no client traffic]

  11. Eventual Consistency
    [Diagram: Client Application -> Driver; Write and Read go to the Primary; Read also goes to the Secondaries]
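
The difference between the two consistency slides comes down to which members a driver may read from. The sketch below models that choice; `readCandidates` and the member list are hypothetical and only illustrate the routing rule, not any driver's actual API.

```javascript
// Simplified model of read routing. With slaveOk off, only the primary
// serves reads (strong consistency). With slaveOk on, secondaries may
// serve reads too, and they can lag the primary (eventual consistency).
function readCandidates(members, slaveOk) {
  return members
    .filter((m) => m.state === "PRIMARY" || (slaveOk && m.state === "SECONDARY"))
    .map((m) => m.name);
}

const replicaSet = [
  { name: "Node 1", state: "SECONDARY" },
  { name: "Node 2", state: "PRIMARY" },
  { name: "Node 3", state: "SECONDARY" },
];

console.log(readCandidates(replicaSet, false)); // [ 'Node 2' ]
console.log(readCandidates(replicaSet, true));  // [ 'Node 1', 'Node 2', 'Node 3' ]
```

Writes go to the primary in both cases.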

  12. Durability
    • Fire and forget
    • Wait for error
    • Wait for journal sync
    • Wait for fsync
    • Wait for replication

  13. Fire and Forget
    Driver -> Primary: write
    Primary: apply in memory

  14. Wait for error
    Driver -> Primary: getLastError
    Primary: apply in memory

  15. Wait for journal sync
    Driver -> Primary: write
    Primary: apply in memory
    Driver -> Primary: getLastError with j:true
    Primary: write to journal

  16. Wait for fsync
    Driver -> Primary: write
    Primary: apply in memory
    Driver -> Primary: getLastError with fsync:true
    Primary: fsync

  17. Wait for replication
    Driver -> Primary: write
    Primary: apply in memory
    Driver -> Primary: getLastError with w:2
    Primary -> Secondary: replicate
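
The five durability levels differ only in what, if anything, is sent with `getLastError` after the write. The sketch below maps each level to its command document; `getLastErrorFor` and the level names are hypothetical labels for this illustration, while the `j`, `fsync`, and `w` options are the ones shown on the slides.

```javascript
// Hypothetical helper: maps each durability level named on the slides to
// the getLastError command document a client would send after a write.
function getLastErrorFor(level) {
  switch (level) {
    case "fire-and-forget":
      return null; // no getLastError is sent at all
    case "error":
      return { getLastError: 1 };
    case "journal":
      return { getLastError: 1, j: true };
    case "fsync":
      return { getLastError: 1, fsync: true };
    case "replication":
      return { getLastError: 1, w: 2 };
    default:
      throw new Error("unknown durability level: " + level);
  }
}

console.log(getLastErrorFor("journal")); // { getLastError: 1, j: true }
```

In the mongo shell, the result would be passed to `db.runCommand(...)` right after the write, on the same connection.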

  18. OPTIONS AND CONFIGURATION

  19. Priorities
    • Floating-point number between 0 and 1000
    • Highest-priority member that is up to date wins
    • Up to date == within 10 seconds of primary
    • If a higher-priority member catches up, it will force an election and win
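
The priority rule can be modeled in a few lines. This is a simplified sketch of the rule as stated on the slide, not MongoDB's election code: `pickPrimary` is hypothetical, optimes are plain seconds, and the newest optime among reachable members stands in for the primary's.

```javascript
// Simplified model of the priority rule; not MongoDB's election code.
// "Up to date" is taken to mean within 10 seconds of the newest optime.
const FRESHNESS_WINDOW_SECS = 10;

function pickPrimary(members) {
  const reachable = members.filter((m) => m.up);
  if (reachable.length === 0) return null;
  const newest = Math.max(...reachable.map((m) => m.optime));
  // Priority 0 members are never eligible to become primary.
  const candidates = reachable.filter(
    (m) => m.priority > 0 && newest - m.optime <= FRESHNESS_WINDOW_SECS
  );
  if (candidates.length === 0) return null;
  return candidates.reduce((best, m) => (m.priority > best.priority ? m : best));
}

// A has the higher priority but is 20 s behind, so B wins.
const lagging = [
  { name: "A", priority: 2, optime: 80, up: true },
  { name: "B", priority: 1, optime: 100, up: true },
];
console.log(pickPrimary(lagging).name); // B

// Once A has caught up to within 10 s, it wins.
const caughtUp = [
  { name: "A", priority: 2, optime: 95, up: true },
  { name: "B", priority: 1, optime: 100, up: true },
];
console.log(pickPrimary(caughtUp).name); // A
```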

  20. Slave Delay
    • Lags behind master by configurable time delay
    • Automatically hidden from clients
    • Protects against operator errors
      • Fat fingering
      • Application corrupts data
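
A delayed member is declared in the replica set configuration. The member document below is a sketch: the host name, `_id`, and one-hour delay are placeholders, and `hidden` is set explicitly here rather than relied on.

```javascript
// Member document for a delayed secondary. Host name, _id, and the
// one-hour delay are placeholders chosen for illustration.
const delayedMember = {
  _id: 3,
  host: "delayed.example.net:27017",
  priority: 0,      // a delayed member must never become primary
  hidden: true,     // set explicitly to keep client reads away from it
  slaveDelay: 3600, // seconds to lag behind the primary
};

console.log(delayedMember.slaveDelay / 60); // 60 (minutes of protection window)
```

In the mongo shell, such a document could be appended to `rs.conf().members` and applied with `rs.reconfig(...)`. The delay defines how long an operator has to notice a bad write before it reaches this member.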

  21. Tagging
    • New in 2.0.0
    • Control over where data is written to
    • Each member can have one or more tags, e.g.
      • tags: {dc: "ny"}
      • tags: {dc: "ny", ip: "192.168", rack: "row3rk7"}
    • Replica set defines rules for where data resides
    • Rules can change without changing application code

  22. Tagging - example
    {
      _id : "mySet",
      members : [
        {_id : 0, host : "A", tags : {"dc": "ny"}},
        {_id : 1, host : "B", tags : {"dc": "ny"}},
        {_id : 2, host : "C", tags : {"dc": "sf"}},
        {_id : 3, host : "D", tags : {"dc": "sf"}},
        {_id : 4, host : "E", tags : {"dc": "cloud"}}],
      settings : {
        getLastErrorModes : {
          allDCs : {"dc" : 3},
          someDCs : {"dc" : 2}} }
    }
    > db.blogs.insert({...})
    > db.runCommand({getLastError : 1, w : "allDCs"})
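
A mode such as `allDCs : {"dc" : 3}` is satisfied once the write has reached members carrying three distinct values of the `dc` tag. The sketch below checks that rule against the configuration from the slide; `modeSatisfied` is a hypothetical helper that models the rule and is not part of MongoDB.

```javascript
// Members and modes copied from the slide's configuration.
const members = [
  { host: "A", tags: { dc: "ny" } },
  { host: "B", tags: { dc: "ny" } },
  { host: "C", tags: { dc: "sf" } },
  { host: "D", tags: { dc: "sf" } },
  { host: "E", tags: { dc: "cloud" } },
];
const getLastErrorModes = { allDCs: { dc: 3 }, someDCs: { dc: 2 } };

// Hypothetical helper: true when the acknowledging hosts cover at least
// the required number of distinct values for every tag in the mode.
function modeSatisfied(mode, ackedHosts) {
  return Object.entries(mode).every(([tag, needed]) => {
    const values = new Set(
      members
        .filter((m) => ackedHosts.includes(m.host) && tag in m.tags)
        .map((m) => m.tags[tag])
    );
    return values.size >= needed;
  });
}

console.log(modeSatisfied(getLastErrorModes.allDCs, ["A", "B", "C"])); // false: only ny and sf
console.log(modeSatisfied(getLastErrorModes.allDCs, ["A", "C", "E"])); // true: ny, sf, cloud
console.log(modeSatisfied(getLastErrorModes.someDCs, ["A", "C"]));     // true: ny, sf
```

Three acknowledging members are not enough on their own; what counts is how many distinct data centers they cover.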

  23. Others
    • Arbiters
      • Vote in elections
      • Don’t store a copy of data
      • Use as tie breaker
    • Hidden
      • Not reported in isMaster
      • Hidden from slaveOk reads
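
Both member types are declared with flags in the member document: `arbiterOnly` for arbiters, and `hidden` (together with `priority: 0`) for hidden members. The sketch below uses placeholder host names and a hypothetical helper to show which members hold data and which are offered to clients for reads.

```javascript
// Host names are placeholders for illustration.
const memberConfigs = [
  { _id: 0, host: "a.example.net:27017" },
  { _id: 1, host: "b.example.net:27017" },
  { _id: 2, host: "backup.example.net:27017", priority: 0, hidden: true },
  { _id: 3, host: "arbiter.example.net:27017", arbiterOnly: true },
];

// Arbiters vote but hold no data.
const dataBearing = memberConfigs.filter((m) => !m.arbiterOnly);

// Hypothetical helper: members a client could be routed to for reads.
function clientVisible(configs) {
  return configs.filter((m) => !m.arbiterOnly && !m.hidden).map((m) => m.host);
}

console.log(dataBearing.length);           // 3
console.log(clientVisible(memberConfigs)); // [ 'a.example.net:27017', 'b.example.net:27017' ]
```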

  24. COMMON DEPLOYMENT SCENARIOS

  25. Replica Set – 1 Data Center
    • Single datacenter
    • Single switch & power
    • Points of failure:
      • Power
      • Network
      • Datacenter
      • Two node failure
    • Automatic recovery of single node crash
    Member 1, Member 2, Member 3

  26. Replica Set – 2 data centers
    • Multi datacenter
    • DR node for safety
    • Can’t do multi data center durable write safely since only 1 node in distant DC
    Members: Member 1, Member 2, Member 3
    Data centers: DC1, DC2

  27. Replica Set – 3 Data Centers
    • Three data centers
    • Can survive full data center loss
    • Can do w= { dc : 2 } to guarantee write in 2 data centers
    Members: Member 1, Member 2, Member 3, Member 4, Member 5 - DR
    Data centers: DC1, DC2, DC3

  28. BEHIND THE SCENES

  29. Local database
    • config
    • Oplog
      • Capped collection
      • Idempotent version of operation stored
    db.replsettest.update({},{$inc:{set:1}})
    { "ts" : …, "h" : …, "op" : "u", "ns" : "mongo_la.replsettest", "o2" : { "_id" : ObjectId("4f177bc768dad821278224d8") }, "o" : { "$set" : { "set" : 2 } } }
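
The oplog entry above records a `$set` to the resulting value rather than the original `$inc`. Replaying a `$set` leaves the document unchanged, while replaying an `$inc` would not. The sketch below shows the difference with minimal in-memory stand-ins for the two operators; `applyInc` and `applySet` are hypothetical helpers, not MongoDB code.

```javascript
// Minimal in-memory stand-ins for the two update operators.
function applyInc(doc, field, by) {
  return { ...doc, [field]: (doc[field] || 0) + by };
}
function applySet(doc, field, value) {
  return { ...doc, [field]: value };
}

const before = { set: 1 };

// Replaying the original $inc changes the result on every application.
const incTwice = applyInc(applyInc(before, "set", 1), "set", 1);
console.log(incTwice.set); // 3

// Replaying the logged $set gives the same result however often it runs.
const setTwice = applySet(applySet(before, "set", 2), "set", 2);
console.log(setTwice.set); // 2
```

This property lets a secondary safely reapply oplog entries it may already have applied.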

  30. @mongodb
    © Copyright 2010 10gen Inc.
    Conferences, appearances, and meetups: http://www.10gen.com/events
    Facebook: http://bit.ly/mongofb | Twitter: @mongodb | LinkedIn: http://linkd.in/joinmongo
    Download at mongodb.org
    We’re hiring!
    [email protected] @snanjund
