Implementing Real-Time Geo-Replication with
ElasticSearch

ElasticSearch is a phenomenal data store -- its easy approach to scalability using symmetric nodes has dramatically improved the way we operate scalable persistence services. Despite this huge success, some challenges remain -- especially in operating ElasticSearch across geographically distant clusters for fault-tolerance and disaster recovery. In this talk, I'd like to share a set of new, open source ElasticSearch plugins I'm building that use the PubNub fault-tolerant global data stream network as a medium for cross-cluster document replication and indexing. This includes a storage event listener for document change propagation and a new ElasticSearch River for indexing. I'd love to get feedback on these use cases and on the plugin design and implementation, and also to hear about the geo-replication challenges other folks might be facing. I'll have the code up on GitHub with a HOWTO and a downloadable demo bundle that folks can try if they'd like to follow along during the presentation.
Presenter: Sunny Gleason is founder and Cloud Guy at SunnyCloud, a company that provides Cloud, Web & Mobile application development, hosting and operations to businesses in the cloud. He specializes in real-time protocols and scalable persistence solutions. Before all that, Sunny was a Platform Engineer developing Cloud Computing solutions at Ning and Amazon.com.

Sunny Gleason

October 14, 2014

Transcript

  1. var self = this;
     • Sunny Gleason, All-Stack Engineer
     • Previous: Amazon (web services), Ning, Startup(1..N)
     • Current: SunnyCloud
     • Short Story: A-Team you can call on to help you build (or rescue) cloud services, web & mobile applications
     • Longer Story: Network of developers aiming to change the way businesses build applications
  2. take these…
     • PubNub Account: http://goo.gl/oJnTpv
     • Quickstart Guide: http://goo.gl/eFHYLc
     • Quickstart Bundle: http://goo.gl/d05uwg
     • Changes Plugin: http://goo.gl/tLVhhP
     • River Plugin: http://goo.gl/WhqvAu
  3. elasticsearch clustering
     • NODE : process / unit (~JVM)
     • PRIMARY : master of a shard *
     • REPLICA : copies of a shard
     * at a particular moment in time
  4. global network model
     • Availability zone : a single data center *
     • Region : a collection of data centers within ~1ms
     * for the purposes of fault-tolerance
  5. how to geo-replicate in ElasticSearch?
     • Create routing configuration for global index and shard placement
     • Update each ElasticSearch cluster with its own version of the configuration
     • IAD: [me, SFO, DUB]
     • SFO: [IAD, me, DUB]
     • DUB: [IAD, SFO, me]
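
The per-site configuration on this slide can be sketched in a few lines. This is only an illustration of the idea, not ElasticSearch's actual routing configuration format: the function names and the list-of-sites representation are assumptions, and only the site codes come from the slide.

```python
# Site codes from the slide; the data layout is a hypothetical illustration.
SITES = ["IAD", "SFO", "DUB"]


def routing_view(local_site, sites=SITES):
    """Return the global site list as seen from local_site, marking itself as 'me'."""
    if local_site not in sites:
        raise ValueError(f"unknown site: {local_site}")
    return ["me" if site == local_site else site for site in sites]


def all_views(sites=SITES):
    """Build the configuration that would have to be pushed to every site."""
    return {site: routing_view(site, sites) for site in sites}


def config_entries(sites=SITES):
    """Count site references kept in sync globally: every site lists every site."""
    return len(sites) ** 2


if __name__ == "__main__":
    for site, view in all_views().items():
        print(f"{site}: {view}")
    print("entries to keep consistent:", config_entries())
```

Because every site carries its own copy of the full list, the amount of configuration to keep consistent grows with the square of the number of sites.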
  6. issues w/ ElasticSearch geo-replication out-of-the-box
     • lots of global configuration state
     • geo-distant sites see each other’s internals (violates encapsulation)
     • N:N networking - topologically inefficient
     • requires network connectivity among all nodes
     • reasoning about failure is extremely difficult
  7. our vision
     • what if each logical data store had a publish/subscribe channel?
     • what if each primary cluster could publish changes to that channel?
     • what if each replica cluster could simply listen on that channel and apply updates to its local index?
     • what if there was smart routing so that global update propagation follows a minimal spanning tree?
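
To make the "minimal spanning tree" idea concrete, here is a small sketch using Prim's algorithm over inter-site latencies. The latency figures are made-up illustrative values, and the function names are hypothetical; nothing here describes how PubNub actually routes messages.

```python
import heapq

# Hypothetical round-trip latencies in milliseconds (illustrative values only).
LATENCY_MS = {
    ("IAD", "SFO"): 70,
    ("IAD", "DUB"): 80,
    ("SFO", "DUB"): 140,
}


def build_graph(latencies):
    """Turn the undirected edge table into an adjacency list."""
    graph = {}
    for (a, b), ms in latencies.items():
        graph.setdefault(a, []).append((ms, b))
        graph.setdefault(b, []).append((ms, a))
    return graph


def minimum_spanning_tree(latencies, root):
    """Prim's algorithm: return (from, to, ms) edges that connect all sites at minimum total latency."""
    graph = build_graph(latencies)
    visited = {root}
    frontier = [(ms, root, neighbor) for ms, neighbor in graph[root]]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(visited) < len(graph):
        ms, src, dst = heapq.heappop(frontier)
        if dst in visited:
            continue  # a cheaper edge already reached this site
        visited.add(dst)
        tree.append((src, dst, ms))
        for next_ms, neighbor in graph[dst]:
            if neighbor not in visited:
                heapq.heappush(frontier, (next_ms, dst, neighbor))
    return tree


if __name__ == "__main__":
    for src, dst, ms in minimum_spanning_tree(LATENCY_MS, "IAD"):
        print(f"{src} -> {dst} ({ms} ms)")
```

With these sample numbers an update published in IAD travels directly to SFO and DUB, and the slower SFO-DUB link is never used.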
  8. what we’d do
     • write an ElasticSearch Changes plugin so that updates are published to a channel
     • write an ElasticSearch River plugin so that updates from channel(s) could be applied to the local index
  9. doesn’t this already exist?
     • Amazon Simple Queue Service (SQS): 1:1 messaging, not global, not easy to make M:N
     • RabbitMQ / AMQP : probably more challenging than ElasticSearch to set up globally in a fault-tolerant manner
     • It’s hard to find or create a global system with consistent availability & real-time performance
  10. PubNub properties
      • global data stream network
      • 14 global data centers
      • global update propagation in < 250ms
      • publish/subscribe with presence & history
  11. PubNub Changes Plugin
      • extends IndexingOperationListener
      • Attaches to PRIMARY indexes, publishes 3 types of events to PubNub channel
      • CREATE, INDEX, DELETE
      • Create & Index require OpType to be set
      • Feedback welcome!
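
As a rough illustration of what a change event on the channel might look like, the sketch below serializes the three event types to JSON. The field names (`type`, `index`, `id`, `version`, `source`) and both helper functions are assumptions made for this example; they are not the plugin's actual wire format, and no PubNub client calls are shown.

```python
import json

# The three event types named on the slide.
EVENT_TYPES = ("CREATE", "INDEX", "DELETE")


def make_change_event(event_type, index, doc_id, version, source=None):
    """Serialize one document change as a JSON message (hypothetical format)."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unsupported event type: {event_type}")
    if event_type == "DELETE" and source is not None:
        raise ValueError("DELETE events carry no document body")
    if event_type != "DELETE" and source is None:
        raise ValueError(f"{event_type} events need a document body")
    event = {"type": event_type, "index": index, "id": doc_id, "version": version}
    if source is not None:
        event["source"] = source
    return json.dumps(event, sort_keys=True)


def parse_change_event(message):
    """Decode a message received from the channel and check its event type."""
    event = json.loads(message)
    if event.get("type") not in EVENT_TYPES:
        raise ValueError("unknown or missing event type")
    return event


if __name__ == "__main__":
    print(make_change_event("INDEX", "docs", "42", 3, {"title": "hello"}))
    print(make_change_event("DELETE", "docs", "42", 4))
```

Carrying the operation type explicitly in each message mirrors the slide's note that Create and Index need their OpType set, so the receiving side can tell the two apart.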
  12. PubNub River Plugin
      • implements River
      • Subscribes to PubNub channel
      • Replays operations from channel against local index(es)
      • Not quite happy when version conflicts occur
      • Feedback welcome!
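
One possible way to think about replay and version conflicts is a "highest version wins" rule, sketched below against an in-memory stand-in for the local index. This is an assumption for illustration, not the River plugin's behavior: the `LocalIndex` class, the event dictionary fields, and the conflict rule are all hypothetical.

```python
class LocalIndex:
    """In-memory stand-in for a replica's local index (hypothetical)."""

    def __init__(self):
        self.docs = {}      # doc_id -> document body
        self.versions = {}  # doc_id -> highest version applied, kept after deletes too

    def apply(self, event):
        """Apply one change event; skip it if it is not newer than what we have."""
        doc_id = event["id"]
        version = event["version"]
        if version <= self.versions.get(doc_id, 0):
            return False  # stale or duplicate message: ignore instead of failing
        self.versions[doc_id] = version
        if event["type"] == "DELETE":
            self.docs.pop(doc_id, None)
        else:  # CREATE and INDEX both store the document body
            self.docs[doc_id] = event["source"]
        return True


def replay(index, events):
    """Replay events in arrival order and return how many were actually applied."""
    return sum(1 for event in events if index.apply(event))


if __name__ == "__main__":
    index = LocalIndex()
    arrived = [
        {"type": "INDEX", "id": "1", "version": 2, "source": {"title": "second"}},
        {"type": "CREATE", "id": "1", "version": 1, "source": {"title": "first"}},
    ]
    print(replay(index, arrived), index.docs)
```

Remembering the version of deleted documents is what keeps a late-arriving older write from resurrecting them.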
  13. interesting aspects
      • presence support allows operational insight (similar to a chat room where “users” are the cluster members)
      • history support allows messages to be replayed (configurable message retention)
      • built-in transport & message-level encryption can provide a reasonable level of security for many use cases
  14. related work
      • PubNub MongoDB Plugin : http://goo.gl/4etuYK
      • PubNub Redis Plugin : http://goo.gl/2Sf33N
      Allow updates to be propagated to/replayed from a PubNub channel
  15. future work
      • handling batch calls
      • versioning in a multi-master world
      • finding and fixing failures in a distributed model
      • semantics and better ordering guarantees to support higher update rates
      • anti-entropy, possibly using: ElasticSearch transaction log, PubNub history, checksum trees
      • operational insight using presence features
      • more polyglot persistence use cases
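
The checksum-tree idea mentioned under anti-entropy can be sketched as a two-level tree: documents are hashed into buckets, each bucket gets a checksum, and a root checksum covers all buckets. This is a simplified illustration under assumed names (`checksum_tree`, `differing_buckets`) with a fixed bucket count; it is not part of either plugin.

```python
import hashlib


def _digest(text):
    """SHA-256 hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def bucket_of(doc_id, buckets):
    """Assign a document to a bucket by hashing its id."""
    return int(_digest(doc_id), 16) % buckets


def checksum_tree(docs, buckets=4):
    """Build (root, leaves) from a mapping of doc_id -> version."""
    entries = [[] for _ in range(buckets)]
    for doc_id in sorted(docs):  # sorted so both sides hash in the same order
        entries[bucket_of(doc_id, buckets)].append(f"{doc_id}:{docs[doc_id]}")
    leaves = [_digest("|".join(items)) for items in entries]
    root = _digest("".join(leaves))
    return root, leaves


def differing_buckets(local, remote, buckets=4):
    """Compare two clusters' trees and return the bucket numbers that need repair."""
    local_root, local_leaves = checksum_tree(local, buckets)
    remote_root, remote_leaves = checksum_tree(remote, buckets)
    if local_root == remote_root:
        return []  # one comparison is enough when the clusters agree
    return [i for i in range(buckets) if local_leaves[i] != remote_leaves[i]]


if __name__ == "__main__":
    primary = {"1": 1, "2": 1, "3": 2}
    replica = {"1": 1, "2": 5, "3": 2}
    print("buckets to resync:", differing_buckets(primary, replica))
```

Only the documents in the differing buckets would then need to be re-fetched, for example from the retained channel history, rather than re-copying the whole index.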