MongoDB UK 2012: MongoDB Oplog Magic

mongodb
June 28, 2012

Mihnea Giurgea, Technical Lead, uberVU.
MongoDB oplogs are quite easy to manipulate, and doing so can be very rewarding, especially when you need to perform live database migrations. This talk gives a short intro to replication and how oplogs work, then dives deeper into what can be achieved by redirecting oplogs from one cluster to another. For example, you could force two distinct clusters to mirror each other, or just have one cluster copy another.


Transcript

  1. Mongo Replica Set

    • asynchronous replication
    • multiple nodes that are copies of each other
    • one primary, multiple secondaries (slaves)
    • automatic election
    • writes are handled by the primary only!
  2. Oplog

    • oplog - special collection (capped)
    • oplog - records each write operation
    • replicas “tail the oplog” for new updates
    • new ops are replayed on secondaries
  3. Oplog example: insert

    {u'h': -1469300750073380169L,
     u'ns': u'mydb.tweets',
     u'o': {u'_id': ObjectId('4e95ae77a20e6164850761cd'),
            u'content': u'Lorem ipsum',
            u'nr': 16},
     u'op': u'i',
     u'ts': Timestamp(1318432375, 1)}
  4. Oplog example: update

    {u'h': -5295451122737468990L,
     u'ns': u'mydb.tweets',
     u'o': {u'$set': {u'content': u'Lorem ipsum'}},
     u'o2': {u'_id': ObjectId('4e95ae3616692111bb000001')},
     u'op': u'u',
     u'ts': Timestamp(1318432339, 1)}
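The two entries above carry everything a replayer needs: `op` (the operation type), `ns` (the namespace), `o` (the document or modifier) and, for updates, `o2` (the selector). A minimal sketch of replaying such entries, assuming an in-memory dict in place of a real collection and handling only `$set` updates; `apply_entry` is a hypothetical helper for illustration, not part of MongoDB or OplogReplay:

```python
def apply_entry(store, entry):
    """Apply one oplog-style entry to `store`, a dict of {ns: {_id: doc}}."""
    coll = store.setdefault(entry["ns"], {})
    op = entry["op"]
    if op == "i":
        # insert: `o` is the full document
        doc = dict(entry["o"])
        coll[doc["_id"]] = doc
    elif op == "u":
        # update: `o2` selects the document, `o` holds the modifier
        target = coll.get(entry["o2"]["_id"])
        if target is not None:
            target.update(entry["o"].get("$set", {}))
    elif op == "d":
        # delete: `o` holds the _id of the document to remove
        coll.pop(entry["o"]["_id"], None)
    # any other op type is ignored in this sketch


store = {}
apply_entry(store, {"op": "i", "ns": "mydb.tweets",
                    "o": {"_id": 1, "content": "Lorem ipsum", "nr": 16}})
apply_entry(store, {"op": "u", "ns": "mydb.tweets",
                    "o": {"$set": {"content": "Dolor sit"}},
                    "o2": {"_id": 1}})
```

After both calls the stored document keeps `nr` from the insert and picks up the new `content` from the update.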
  5. Custom oplog replay

    Q: What do we want?
    A: Cross-cluster oplog replay (replaying oplogs from one mongo cluster to another).
  6. Custom oplog replay

    Q: How do we do that?
    A: Using OplogReplay (now Open Source)

        ./oplog-replay localhost:27017 localhost:27018
  7. How does it work?

    A: Very similar to how MongoDB uses the oplog internally :)

    tail the oplog for new entries:
        apply oplog entry
        save timestamp of last entry
  8. How does it work?

    • the last timestamp is persisted on the destination

        > oplogreplay.settings.findOne()
        { "_id" : "misc-lastts", "value" : { "t" : 1335960424000, "i" : 770 } }

    • restarting will replay entries newer than the last timestamp
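The loop and the restart behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not OplogReplay's actual code: the oplog is a plain list rather than a tailable cursor on the source, timestamps are `(seconds, increment)` tuples, the checkpoint lives in a dict rather than in a settings collection on the destination, and `replay` is a hypothetical name.

```python
def replay(entries, apply, state):
    """Apply entries newer than state['last_ts'], checkpointing after each one."""
    applied = 0
    for entry in entries:
        if state["last_ts"] is not None and entry["ts"] <= state["last_ts"]:
            continue  # already replayed before the restart
        apply(entry)
        state["last_ts"] = entry["ts"]  # save timestamp of last entry
        applied += 1
    return applied


oplog = [
    {"ts": (1318432339, 1), "op": "u"},
    {"ts": (1318432375, 1), "op": "i"},
]
seen = []
state = {"last_ts": None}

first_run = replay(oplog, seen.append, state)   # fresh start: applies both entries
oplog.append({"ts": (1318432375, 2), "op": "d"})
second_run = replay(oplog, seen.append, state)  # simulated restart: applies only the new entry
```

Saving the timestamp after each applied entry means that a crash between the two steps replays at most one entry twice.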
  9. Other features?

    TODO - explain what else it can do
    - also db & collection regexp, start from point-in-time
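Filtering by database and collection regexp can be pictured as a predicate over each entry's `ns` field. A rough sketch, assuming the whole name has to match the pattern; `make_ns_filter` and its parameters are hypothetical and do not reflect OplogReplay's real options:

```python
import re


def make_ns_filter(db_pattern=".*", coll_pattern=".*"):
    """Build a predicate that keeps entries whose db and collection both match."""
    db_re = re.compile(db_pattern)
    coll_re = re.compile(coll_pattern)

    def matches(entry):
        # Database names cannot contain dots, so split on the first one only.
        db, _, coll = entry["ns"].partition(".")
        return bool(db_re.fullmatch(db) and coll_re.fullmatch(coll))

    return matches


only_tweets = make_ns_filter(db_pattern="mydb", coll_pattern="tweets.*")
kept = [ns for ns in ("mydb.tweets", "mydb.tweets.archive", "mydb.users", "otherdb.tweets")
        if only_tweets({"ns": ns})]
```

Here `kept` contains only the two `mydb.tweets...` namespaces.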
  10. Want more?

    TODO - show how it can be easily extended: inheritance + skip deletes
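The "inheritance + skip deletes" idea might look roughly like this. The class and method names are invented for illustration and are not OplogReplay's API; the base class only records what it would have applied.

```python
class Replayer:
    """Hypothetical base class: dispatches each entry to a per-operation handler."""

    def __init__(self):
        self.applied = []

    def process(self, entry):
        handlers = {"i": self.insert, "u": self.update, "d": self.delete}
        handler = handlers.get(entry["op"])
        if handler is not None:
            handler(entry)

    def insert(self, entry):
        self.applied.append(("insert", entry["ns"]))

    def update(self, entry):
        self.applied.append(("update", entry["ns"]))

    def delete(self, entry):
        self.applied.append(("delete", entry["ns"]))


class NoDeleteReplayer(Replayer):
    """Replays inserts and updates but silently drops deletes."""

    def delete(self, entry):
        pass


replayer = NoDeleteReplayer()
for op in ("i", "u", "d"):
    replayer.process({"op": op, "ns": "mydb.tweets"})
```

One plausible use is a destination that should keep documents even after the source has purged them.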
  11. Inverted pyramid

    Q: How can we store historical data more cheaply?
    A: Keep data in two distinct mongo clusters:
    • recent - only the last 30 days, more resources
    • historical - all data, but fewer resources
    (or even more clusters...)
  12. Advantages

    Q: Why bother with distinct mongo clusters?
    A: Several reasons:
    • different # of shards
    • different # of replica sets
    • more / less RAM
    • adjust storage size
  13. Implementation (1)

    Set up an oplog-replay between clusters:

    TODO - add picture

        ( recent ) ---oplogreplay---> ( historical )
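  14. Implementation (2)

    Modify your code to know about the separation:

    def get_data(since, until):
        results = []
        # T is the boundary between the historical and recent clusters
        T = compute_time_threshold()
        if since <= T:
            # clamp so a purely historical query stops at `until`
            results += get_hist_data(since, min(until, T))
        if until > T:
            # clamp so a purely recent query starts at `since`
            results += get_rcnt_data(max(since, T), until)
        return results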
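To see how a query that straddles the threshold is stitched together, here is a self-contained sketch with in-memory stand-ins for the two clusters. It differs from the slide in ways that are assumptions of this example: the threshold and the two readers are passed in as parameters, integers stand in for timestamps, and ranges are half-open (`since <= t < until`) so that a row exactly at the threshold is returned once.

```python
def get_data(since, until, threshold, get_hist_data, get_rcnt_data):
    """Return rows with since <= t < until, reading each side of `threshold` from its own store."""
    results = []
    if since < threshold:
        results += get_hist_data(since, min(until, threshold))
    if until > threshold:
        results += get_rcnt_data(max(since, threshold), until)
    return results


def make_reader(rows):
    """Stand-in for a cluster query: returns rows in the half-open range [start, end)."""
    def read(start, end):
        return [r for r in rows if start <= r < end]
    return read


historical = make_reader(range(0, 100))  # all data
recent = make_reader(range(70, 100))     # only the most recent rows

rows = get_data(60, 80, threshold=70,
                get_hist_data=historical, get_rcnt_data=recent)
```

With these stand-ins, `rows` holds 60 through 69 from the historical reader followed by 70 through 79 from the recent one.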
  15. Splitting Replica Sets

    TODO - show a diagram of before and after split

        before split          after split
        | blogs    |          | blogs    |
        | posts    | ------>  | posts    |
        | comments |          | comments |
  16. Splitting Replica Sets

    i. create a new node (Secondary), wait for it to catch up
    ii. stop the node, remove it from the ReplicaSet
    iii. hack its internal state to look like a NEW replica set
    iv. start oplogreplay from point-in-time
    v. redirect your app code (all at once or one at a time, depending on your application needs)
  17. Limitations

    TODO - if the oplogreplay falls behind for too long, there is no recovery procedure
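Because the oplog is a capped collection, its oldest entries are overwritten as new ones arrive, which is why falling too far behind cannot be repaired by simply resuming. The slide gives no recovery procedure, but the condition can at least be detected. A small sketch, assuming timestamps are `(seconds, increment)` tuples; `has_fallen_off` is a hypothetical helper, not something OplogReplay is known to provide:

```python
def has_fallen_off(oldest_oplog_ts, last_applied_ts):
    """True if entries after last_applied_ts may already have been overwritten."""
    if last_applied_ts is None:
        return False  # nothing replayed yet, so there is no gap to detect
    return last_applied_ts < oldest_oplog_ts


# Last applied entry is older than anything still in the source oplog: gap possible.
stale = has_fallen_off((1318432375, 1), (1318432339, 1))

# Last applied entry is still covered by the source oplog: safe to resume.
fresh = has_fallen_off((1318432339, 1), (1318432375, 1))
```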