Mongo Replica Set
● asynchronous replication
● multiple nodes that are copies of each other
● one primary, multiple secondaries (slaves)
● automatic election
● writes are handled by the primary only!
Mongo Replica Set
● oplog - special collection (capped)
● oplog - records each write operation
● replicas “tail the oplog” for new updates
● new ops are replayed on secondaries
Custom oplog replay
Q: What do we want?
A: Cross-cluster oplog replay (replay oplogs
from one mongo cluster to another).
Custom oplog replay
Q: How do we do that?
A: Using OplogReplay (now Open Source)
./oplog-replay localhost:27017
How does it work?
A: Very similar to MongoDB's own oplog-based replication :)
tail the oplog for new entries:
    apply oplog entry
    save timestamp of last entry
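The loop above can be sketched in a few lines of Python. This is a minimal illustration, not OplogReplay's actual code: the oplog is simulated with an in-memory list, the destination is a plain dict, timestamps are simple tuples, and `apply_entry` and `replay` are hypothetical names. Only the entry fields (`ts`, `op`, `ns`, `o`, `o2`) follow the real oplog layout, and updates are simplified to `$set` modifiers.

```python
def apply_entry(dest, entry):
    """Apply one oplog entry to `dest`, a dict of {ns: {_id: document}}."""
    coll = dest.setdefault(entry["ns"], {})
    op = entry["op"]
    if op == "i":
        # insert: "o" holds the full document
        coll[entry["o"]["_id"]] = dict(entry["o"])
    elif op == "u":
        # update: "o2" selects the document, "o" describes the change
        # (simplification: only "$set" modifiers are handled here)
        key = entry["o2"]["_id"]
        doc = coll.setdefault(key, {"_id": key})
        doc.update(entry["o"].get("$set", {}))
    elif op == "d":
        # delete: "o" holds the _id of the document to remove
        coll.pop(entry["o"]["_id"], None)


def replay(oplog, dest, last_ts=None):
    """Apply entries newer than `last_ts`; return the new last timestamp."""
    for entry in oplog:
        if last_ts is not None and entry["ts"] <= last_ts:
            continue  # already applied on a previous run
        apply_entry(dest, entry)
        last_ts = entry["ts"]  # save timestamp of last entry
    return last_ts


# Simulated oplog; a real one would be read with a tailable cursor.
oplog = [
    {"ts": (1, 1), "op": "i", "ns": "blog.posts", "o": {"_id": 1, "title": "a"}},
    {"ts": (1, 2), "op": "u", "ns": "blog.posts",
     "o2": {"_id": 1}, "o": {"$set": {"title": "b"}}},
    {"ts": (2, 1), "op": "i", "ns": "blog.posts", "o": {"_id": 2, "title": "c"}},
    {"ts": (3, 1), "op": "d", "ns": "blog.posts", "o": {"_id": 2}},
]
dest = {}
last_ts = replay(oplog, dest)
```

Calling `replay(oplog, dest, last_ts)` a second time applies nothing, because every entry is at or before the saved timestamp.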
How does it work?
● last timestamp is persisted on destination
> oplogreplay.settings.findOne()
{ "_id" : "misc-lastts",
  "value" : { "t" : 1335960424000, "i" : 770 } }
● restarting will replay entries newer than the last saved timestamp
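A rough sketch of how such a checkpoint could drive a restart. The document shape is the one shown above; everything else is an assumption for illustration: the settings collection is simulated with a dict keyed by `_id`, and `ts_key`, `entries_to_replay` and `save_checkpoint` are hypothetical helper names, not part of OplogReplay.

```python
def ts_key(ts):
    """Order timestamps by time "t" first, then by the counter "i"."""
    return (ts["t"], ts["i"])


def entries_to_replay(oplog, settings):
    """Return the entries strictly newer than the saved checkpoint."""
    checkpoint = settings.get("misc-lastts")
    if checkpoint is None:
        return list(oplog)  # first run: nothing was applied yet
    last = ts_key(checkpoint["value"])
    return [e for e in oplog if ts_key(e["ts"]) > last]


def save_checkpoint(settings, entry):
    """Persist the timestamp of the entry that was just applied."""
    settings["misc-lastts"] = {"_id": "misc-lastts", "value": dict(entry["ts"])}


# Simulated settings collection on the destination, keyed by _id.
settings = {
    "misc-lastts": {"_id": "misc-lastts",
                    "value": {"t": 1335960424000, "i": 770}},
}
oplog = [
    {"n": 1, "ts": {"t": 1335960424000, "i": 769}},
    {"n": 2, "ts": {"t": 1335960424000, "i": 770}},
    {"n": 3, "ts": {"t": 1335960424000, "i": 771}},
    {"n": 4, "ts": {"t": 1335960425000, "i": 1}},
]
pending = entries_to_replay(oplog, settings)
for entry in pending:
    save_checkpoint(settings, entry)
```

Comparing `(t, i)` as a pair matters: several entries can share the same `t`, and only the counter `i` tells them apart.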
Other features?
TODO - explain what else it can do: db &
collection regexp, start from point-in-time
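As an illustration of the db & collection regexp idea, here is a minimal sketch of a namespace filter. It assumes nothing about OplogReplay's real options: `make_ns_filter` and its parameters are hypothetical. The only fact it relies on is that an oplog entry's `ns` field has the form `<database>.<collection>`.

```python
import re


def make_ns_filter(db_pattern=".*", coll_pattern=".*"):
    """Build a predicate that accepts entries whose namespace matches."""
    db_re = re.compile(db_pattern)
    coll_re = re.compile(coll_pattern)

    def accepts(entry):
        # Split at the first dot only: collection names may contain dots.
        db, _, coll = entry["ns"].partition(".")
        return bool(db_re.fullmatch(db) and coll_re.fullmatch(coll))

    return accepts


accepts = make_ns_filter(db_pattern="blog", coll_pattern="posts|comments")
entries = [
    {"ns": "blog.posts"},
    {"ns": "blog.system.indexes"},
    {"ns": "shop.posts"},
]
kept = [e["ns"] for e in entries if accepts(e)]
```

Using `fullmatch` rather than `search` keeps a pattern like `posts` from accidentally matching `posts_archive`.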
Want more?
TODO - show how it can be easily extended:
inheritance + skip deletes
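One way the "inheritance + skip deletes" idea might look, as a sketch only. The `Replayer` base class below is hypothetical and stands in for whatever class OplogReplay actually exposes; the destination is a plain dict and updates are simplified to `$set` modifiers.

```python
class Replayer:
    """Hypothetical base class, not OplogReplay's real API."""

    def __init__(self):
        self.dest = {}  # stand-in for the destination: {ns: {_id: doc}}

    def process(self, entry):
        # Dispatch on the oplog "op" code; unknown codes are ignored.
        handlers = {"i": self.insert, "u": self.update, "d": self.delete}
        handler = handlers.get(entry["op"])
        if handler is not None:
            handler(entry)

    def insert(self, entry):
        coll = self.dest.setdefault(entry["ns"], {})
        coll[entry["o"]["_id"]] = dict(entry["o"])

    def update(self, entry):
        # Simplification: only "$set" modifiers are handled.
        doc = self.dest.setdefault(entry["ns"], {}).get(entry["o2"]["_id"])
        if doc is not None:
            doc.update(entry["o"].get("$set", {}))

    def delete(self, entry):
        self.dest.get(entry["ns"], {}).pop(entry["o"]["_id"], None)


class NoDeleteReplayer(Replayer):
    """Replays inserts and updates but ignores deletes."""

    def delete(self, entry):
        pass  # keep the document on the destination


ops = [
    {"op": "i", "ns": "blog.posts", "o": {"_id": 1, "title": "a"}},
    {"op": "d", "ns": "blog.posts", "o": {"_id": 1}},
]
plain, keeper = Replayer(), NoDeleteReplayer()
for entry in ops:
    plain.process(entry)
    keeper.process(entry)
```

After the loop, `plain` has removed the document while `keeper` still holds it, which is the behaviour a destination that must retain data deleted at the source would need.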
Inverted pyramid
Recent data is more important
TODO - add picture here (see notes)
Inverted pyramid
Q: How can we store historical data more cheaply?
A: Keep data in two distinct mongo clusters:
● recent - only the last 30 days, more resources
● historical - all data, but fewer resources
(or even more clusters...)
Q: Why bother with distinct mongo clusters?
A: Several reasons:
● different # of shards
● different # of replica sets
● more / less RAM
● adjust storage size
Implementation (1)
Set up an oplog replay between clusters:
TODO - add picture
( recent ) ---oplogreplay---> ( historical )
Implementation (2)
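Modify your code to be aware of the separation
def get_data(since, until):
    results = []
    # T separates historical data (older) from recent data (newer)
    T = compute_time_threshold()
    if since < T:
        # historical part, clamped so it never runs past `until`
        results += get_hist_data(since, min(until, T))
    if until > T:
        # recent part, clamped so it never starts before `since`
        results += get_rcnt_data(max(since, T), until)
    return results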
Splitting Replica Sets
TODO - show a diagram of before and after
before split            after split
| blogs    |            | blogs |
| posts    |  ------>   | posts |   | comments |
| comments |
Splitting Replica Sets
● split one mongo cluster in two
with no downtime
Splitting Replica Sets
Q: How is it done?
Splitting Replica Sets
● create a new node (Secondary), wait for it to
catch up
● stop the node, remove it from the ReplicaSet
● hack its internal state to look like a NEW
replica set
● start oplogreplay from point-in-time
● redirect your app code (all at once or one at
a time, depending on your application)
TODO - mongos → one oplogreplay per shard,
but balancer deletes are an issue!
TODO - how to overcome balancer issue?
TODO - if the oplogreplay falls behind for too
long, there is no recovery procedure
OplogReplay @ uberVU
TODO - recent / historical split