Automatic Failover in RethinkDB

Jorge Silva

July 27, 2015

Transcript

  1. Automatic Failover
    Using Raspberry Pis to
    understand automatic
    failover
    RethinkDB Meetup
    San Francisco, California
    July 27, 2015

  2. Jorge Silva
    @thejsj
    Developer Evangelist @ RethinkDB

  3. Distributed Systems
    What makes RethinkDB
    distributed?

  4. What is RethinkDB?
    • Open-source database for building realtime web applications
    • NoSQL database that stores schemaless JSON documents
    • Distributed database that is easy to scale

  5. What makes it distributed?
    • Allows simple sharding and replication of tables
    • Allows you to easily connect nodes to a cluster using `--join`
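For example, joining a second node to a running cluster might look like this (a minimal command sketch; the hostname is a placeholder, and 29015 is RethinkDB's default intracluster port):

```shell
# Start the first node, listening on all interfaces
rethinkdb --bind all

# On a second machine, join the cluster through any existing node
rethinkdb --bind all --join first-node.example.com:29015
```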

  6. The problem
    • When one of your nodes goes down, you need to manually decide what to do

  7. Automatic Failover
    RethinkDB 2.1

  8. What's new in 2.1
    • RethinkDB 2.1 introduces automatic failover
    • It uses Raft as the consensus algorithm

  9. Replicas
    • Primary replicas serve as the authoritative copy of the data
    • Secondary replicas serve as mirrors of the primary replica

  10. Automatic Failover
    • In RethinkDB, automatic failover promotes a secondary replica to primary when the primary replica becomes unavailable
    • The cluster elects new primaries by voting; a new primary needs a majority of votes
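The majority rule can be sketched with plain arithmetic (illustrative JavaScript, not RethinkDB's actual implementation):

```javascript
// Smallest number of voting replicas that forms a majority
function majority(votingReplicas) {
  return Math.floor(votingReplicas / 2) + 1;
}

// How many replicas can fail while a new primary can still be elected
function faultTolerance(votingReplicas) {
  return votingReplicas - majority(votingReplicas);
}

console.log(majority(3));       // 2
console.log(faultTolerance(3)); // 1: three replicas survive one failure
console.log(faultTolerance(1)); // 0: a single replica has no failover
```

This is why odd replica counts are preferred: going from 3 to 4 replicas raises the quorum from 2 to 3 but still tolerates only 1 failure.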

  11. Automatic Failover
    Cluster with Raspberry Pis

  12. Step #1: Start RethinkDB
    // Check that RethinkDB is running
    $ ssh [email protected]

  13. Step #2: SSH into the Raspberry Pis
    // Check for devices on the network
    $ nmap -sn *.*.*.0/24
    // SSH into the Raspberry Pis
    $ ssh pi@redisgeek
    $ ssh pi@pishark

  14. Step #3: Start RethinkDB
    // Start RethinkDB on both Raspberry Pis
    pi@mrpi1 ~ $ rethinkdb \
      -n redisgeek \
      -t pi -t redisgeek \
      --bind all \
      --join 104.236.171.225

  15. Step #4: Check servers
    r.db('rethinkdb').table('server_config')

  16. Step #5: Insert test data
    // Insert data from Reddit into the table
    r.table('data')
      .insert(
        r.http('reddit.com/r/rethinkdb.json')
          ('data')('children').map(r.row('data'))
      )
    // Query the data
    r.table('data')

  17. Step #6: Check replica
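One way to check replica state is RethinkDB's `table_status` system table (a ReQL fragment for the data explorer; it needs a live cluster to run):

```javascript
// Show shard and replica state for every table
r.db('rethinkdb').table('table_status')
```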

  18. Automatic Failover
    Demo #1

  19. Step #1: Move data
    // Move all data to `redisgeek`
    r.table('data')
      .reconfigure({
        shards: 1,
        replicas: { 'redisgeek': 1 },
        primaryReplicaTag: 'redisgeek'
      })

  20. Step #2: Disconnect primary

  21. Step #3: Query data
    // Query table
    r.table('data') // Returns Error

  22. What happened?
    • We moved all our data to 'redisgeek'
    • We disconnected 'redisgeek' from the network
    • Because we can't communicate with 'redisgeek' (the primary replica), our data is inaccessible

  23. Step #4: Replicate data
    // Configure 3 replicas
    r.table('data')
      .reconfigure({
        shards: 1,
        replicas: 3
      })

  24. Step #5: Check replicas

  25. Step #6: Disconnect primary

  26. Step #7: Query data
    // Query table
    r.table('data') // We have data!

  27. Step #8: Insert data
    // Insert data
    r.table('data').insert({ hello: 'world' })

  28. What happened?
    • A secondary replica gets promoted to primary replica

  29. Step #9: Reconnect primary
    • 'redisgeek' comes back as primary

  30. Automatic Failover
    Replication and failover #2

  31. Step #1: Make main primary
    // Make `main` the primary replica
    r.table('data')
      .reconfigure({
        shards: 1,
        replicas: {
          'main': 1,
          'redisgeek': 1,
          'pishark': 1
        },
        primaryReplicaTag: 'main'
      })

  32. Step #2: Disconnect secondaries

  33. Step #3: Query data
    // Query table
    r.table('data').count() // 25

  34. Step #4: Attempt Insert
    r.table('data')
    .insert({ hello: 'world' }) // Error

  35. What happened?
    • Because the primary replica is still connected, data can be read
    • Because a majority of replicas are disconnected, data can't be written
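The read/write split in this demo can be modeled in a few lines (an illustrative sketch, not RethinkDB's API; all names here are made up):

```javascript
// Illustrative model: reads only need a reachable primary,
// writes additionally need a majority of voting replicas.
function availability(totalReplicas, reachableReplicas, primaryReachable) {
  const quorum = Math.floor(totalReplicas / 2) + 1;
  return {
    canRead: primaryReachable,
    canWrite: primaryReachable && reachableReplicas >= quorum
  };
}

// Demo #2: 3 replicas, both secondaries disconnected, primary still up
console.log(availability(3, 1, true)); // { canRead: true, canWrite: false }
```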

  36. Questions
    • RethinkDB website: http://rethinkdb.com
    • New failover documentation: http://docs.rethinkdb.com/2.1/docs/failover/
    • Email me: [email protected]
    • Tweet: @thejsj, @rethinkdb
