Distributed Systems
What makes RethinkDB distributed?
What is RethinkDB?
• Open source database for building realtime web applications
• NoSQL database that stores schemaless JSON documents
• Distributed database that is easy to scale
What makes it distributed?
• Allows simple sharding and replication of tables
• Allows you to easily connect nodes to a cluster using `--join`
The problem
• When one of your nodes went down, you needed to manually decide what to do
Automatic Failover
RethinkDB 2.1
What's new in 2.1
• RethinkDB 2.1 introduces automatic failover
• It uses Raft as the consensus algorithm
Replicas
• Primary replicas serve as the authoritative copy of the data
• Secondary replicas serve as a mirror of the primary replica
Automatic Failover
• In RethinkDB, automatic failover takes care of promoting secondary replicas into primary replicas when a primary replica is unavailable
• The cluster picks new primaries by voting. A new primary needs a majority of the votes.
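The majority rule can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not RethinkDB's Raft implementation; the function names and the servers `serverB` and the replica counts are made up for the example.

```javascript
// Simplified illustration of majority voting; not RethinkDB's actual Raft code.

// Smallest number of votes that is a strict majority of `voterCount` replicas.
function majorityThreshold(voterCount) {
  return Math.floor(voterCount / 2) + 1;
}

// `votes` maps a candidate server name to the number of votes it received.
// Returns the winning candidate, or null if nobody reached a majority.
function electPrimary(votes, voterCount) {
  const needed = majorityThreshold(voterCount);
  for (const [candidate, count] of Object.entries(votes)) {
    if (count >= needed) {
      return candidate;
    }
  }
  return null;
}

// Hypothetical three-replica table where one replica is unreachable:
console.log(electPrimary({ serverB: 2 }, 3)); // serverB (2 of 3 is a majority)

// Hypothetical two-replica table where one replica is unreachable:
console.log(electPrimary({ serverB: 1 }, 2)); // null (1 of 2 is not a majority)
```

Under this rule a table with only two replicas cannot elect a new primary after losing one of them, which is why an odd number of replicas (three or more) is the usual choice.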
Step #5: Insert test data
// Insert data into table
r.table('data')
  .insert(
    // Insert data from Reddit
    r.http('reddit.com/r/rethinkdb.json')
      ('data')('children').map(r.row('data'))
  )
// Query data
r.table('data')
Step #6: Check replica
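One way to check where the replicas live is to run `r.table('data').status()` in the Data Explorer and read the `shards` field of the result. The sketch below summarizes such a document in plain JavaScript. The sample document and the server names `server1` and `server2` are hypothetical, and it is trimmed to the fields the helper reads.

```javascript
// Summarize the document returned by r.table('data').status().
// `summarizeStatus` is a helper written for this example, not a driver API.
function summarizeStatus(statusDoc) {
  return statusDoc.shards.map((shard, index) => ({
    shard: index,
    primaries: shard.primary_replicas,
    replicas: shard.replicas.map((rep) => `${rep.server}: ${rep.state}`),
  }));
}

// Hypothetical sample, trimmed to the fields used above.
const sampleStatus = {
  name: 'data',
  status: { ready_for_reads: true, ready_for_writes: true },
  shards: [
    {
      primary_replicas: ['server1'],
      replicas: [
        { server: 'server1', state: 'ready' },
        { server: 'server2', state: 'ready' },
      ],
    },
  ],
};

console.log(JSON.stringify(summarizeStatus(sampleStatus), null, 2));
```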
Automatic Failover
Demo #1
Step #1: Move data
// Move all data to `redisgeek`
r.table('data')
  .reconfigure({
    shards: 1,
    replicas: { 'redisgeek': 1 },
    primaryReplicaTag: 'redisgeek'
  })
What happened?
• We moved all our data to 'redisgeek'
• We disconnected 'redisgeek' from the network
• Because we can't communicate with 'redisgeek' (the primary replica), our data is inaccessible
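The outcome of this demo can be modeled with a small reachability check. This is a simplified sketch, not RethinkDB's actual availability logic: it assumes a table stays usable only while a majority of its replicas can be reached. The servers `serverB` and `serverC` are hypothetical and only show how the result would change with more replicas.

```javascript
// Simplified model of the demo; not RethinkDB's actual availability logic.
// Assumption: a table stays usable only if a majority of its replicas is reachable.
function isTableAvailable(replicaServers, reachableServers) {
  const reachableCount = replicaServers.filter((server) =>
    reachableServers.includes(server)
  ).length;
  return reachableCount > replicaServers.length / 2;
}

// The demo: the only replica lives on 'redisgeek', which is disconnected.
console.log(isTableAvailable(['redisgeek'], [])); // false

// Hypothetical alternative: two extra replicas on made-up servers stay reachable.
console.log(
  isTableAvailable(['redisgeek', 'serverB', 'serverC'], ['serverB', 'serverC'])
); // true
```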