Implementing Real-Time Geo-Replication with ElasticSearch

http://goo.gl/mfQcmo
Implementing Real-Time Geo-Replication with ElasticSearch
Sunny Gleason
Boston ElasticSearch Meetup, October 14, 2014

var self = this;
• Sunny Gleason, All-Stack Engineer
• Previous: Amazon (web services), Ning, Startup(1..N)
• Current: SunnyCloud
• Short Story: A-Team you can call on to help you build (or rescue) cloud services, web & mobile applications
• Longer Story: Network of developers aiming to change the way businesses build applications

take these…
• PubNub Account: http://goo.gl/oJnTpv
• Quickstart Guide: http://goo.gl/eFHYLc
• Quickstart Bundle: http://goo.gl/d05uwg
• Changes Plugin: http://goo.gl/tLVhhP
• River Plugin: http://goo.gl/WhqvAu

elasticsearch clustering
• NODE: process / unit (~JVM)
• PRIMARY: master of a shard *
• REPLICA: copies of a shard
* at a particular moment in time

global network model
• Availability zone: a single data center *
• Region: a collection of data centers within ~1ms
* for the purposes of fault-tolerance

geo-replication

how to geo-replicate in ElasticSearch?
• Create routing configuration for global index and shard placement
• Update each ElasticSearch cluster with its own version of the configuration
  • IAD: [me, SFO, DUB]
  • SFO: [IAD, me, DUB]
  • DUB: [IAD, SFO, me]
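A minimal sketch of the per-site configuration above. The site names come from the slide; the function names and the list representation are assumptions for illustration, not ElasticSearch settings.

```python
# Hypothetical illustration: derive each site's view of the global site list.
SITES = ["IAD", "SFO", "DUB"]


def site_view(local_site, sites=SITES):
    """Return the global site list with the local site shown as 'me'."""
    if local_site not in sites:
        raise ValueError(f"unknown site: {local_site}")
    return ["me" if s == local_site else s for s in sites]


def all_views(sites=SITES):
    """Build one view per site; every site has to be handed its own copy."""
    return {s: site_view(s, sites) for s in sites}


if __name__ == "__main__":
    for site, view in all_views().items():
        print(f"{site}: {view}")
```

Adding a site changes every existing view, which is one way to see why the global configuration state grows quickly.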

issues w/ ElasticSearch geo-replication out-of-the-box
• lots of global configuration state
• geo-distant sites see each other’s internals (violates encapsulation)
• N:N networking - topologically inefficient
• requires network connectivity among all nodes
• reasoning about failure is extremely difficult
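To put rough numbers on the N:N point: a full mesh among n endpoints needs n(n-1)/2 links, while a spanning tree over the same endpoints needs n-1. The node counts below are made up for illustration.

```python
def full_mesh_links(n):
    """Number of point-to-point links when every endpoint talks to every other."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


def tree_links(n):
    """Number of links in any spanning tree over n endpoints."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return max(n - 1, 0)


if __name__ == "__main__":
    # Hypothetical sizes: 3 sites, then 3 sites x 10 nodes each.
    for n in (3, 30):
        print(n, full_mesh_links(n), tree_links(n))
```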

our vision
• what if each logical data store had a publish/subscribe channel?
• what if each primary cluster could publish changes to that channel?
• what if each replica cluster could simply listen on that channel and apply updates to its local index?
• what if there was smart routing so that global update propagation follows a minimal spanning tree?
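A minimal sketch of the spanning-tree idea in the last bullet, using Prim's algorithm. The inter-site latencies are invented for illustration, and nothing here reflects how PubNub actually routes messages.

```python
import heapq

# Assumed, made-up latencies in milliseconds between sites.
LATENCY_MS = {
    ("IAD", "SFO"): 70,
    ("IAD", "DUB"): 80,
    ("SFO", "DUB"): 140,
}


def build_graph(latency):
    """Turn the undirected edge table into an adjacency list."""
    graph = {}
    for (a, b), ms in latency.items():
        graph.setdefault(a, []).append((ms, b))
        graph.setdefault(b, []).append((ms, a))
    return graph


def minimum_spanning_tree(latency, root):
    """Prim's algorithm; returns (parent, child, ms) edges in the order added."""
    graph = build_graph(latency)
    visited = {root}
    frontier = [(ms, root, nbr) for ms, nbr in graph[root]]
    heapq.heapify(frontier)
    edges = []
    while frontier and len(visited) < len(graph):
        ms, parent, child = heapq.heappop(frontier)
        if child in visited:
            continue  # a cheaper edge already reached this site
        visited.add(child)
        edges.append((parent, child, ms))
        for next_ms, nbr in graph[child]:
            if nbr not in visited:
                heapq.heappush(frontier, (next_ms, child, nbr))
    return edges


if __name__ == "__main__":
    for parent, child, ms in minimum_spanning_tree(LATENCY_MS, "IAD"):
        print(f"{parent} -> {child} ({ms} ms)")
```

With these assumed latencies, updates originating in IAD would go directly to SFO and DUB, and the expensive SFO-DUB link is never used.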

what we’d do
• write an ElasticSearch Changes plugin so that updates are published to a channel
• write an ElasticSearch River plugin so that updates from channel(s) could be applied to the local index
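The two halves can be sketched with an in-memory stand-in. The `Channel`, `PrimaryCluster`, and `ReplicaCluster` classes are hypothetical names for illustration; they are neither the PubNub API nor the ElasticSearch plugin interfaces.

```python
class Channel:
    """In-memory stand-in for a publish/subscribe channel."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in list(self._subscribers):
            callback(message)


class ReplicaCluster:
    """Plays the 'River' role: listens and applies changes to a local index."""

    def __init__(self, name):
        self.name = name
        self.index = {}

    def apply(self, change):
        if change["op"] == "DELETE":
            self.index.pop(change["id"], None)
        else:
            self.index[change["id"]] = change["doc"]


class PrimaryCluster:
    """Plays the 'Changes' role: every local write is also published."""

    def __init__(self, channel):
        self.channel = channel
        self.index = {}

    def index_doc(self, doc_id, doc):
        self.index[doc_id] = doc
        self.channel.publish({"op": "INDEX", "id": doc_id, "doc": doc})

    def delete_doc(self, doc_id):
        self.index.pop(doc_id, None)
        self.channel.publish({"op": "DELETE", "id": doc_id})


if __name__ == "__main__":
    channel = Channel()
    replicas = [ReplicaCluster("SFO"), ReplicaCluster("DUB")]
    for replica in replicas:
        channel.subscribe(replica.apply)
    primary = PrimaryCluster(channel)
    primary.index_doc("1", {"title": "hello"})
    print([r.index for r in replicas])
```

The primary knows only the channel, not the replicas, which is the encapsulation the deck is after.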

doesn’t this already exist?
• Amazon Simple Queue Service (SQS): 1:1 messaging, not global, not easy to make M:N
• RabbitMQ / AMQP: probably more challenging than ElasticSearch to set up globally in a fault-tolerant manner
• It’s hard to find or create a global system with consistent availability & real-time performance

hello pubnub
OK, we have a Data Stream Network…

PubNub properties
• global data stream network
• 14 global data centers
• global update propagation in < 250ms
• publish/subscribe with presence & history

hello elasticsearch
and away we go…

hello elasticsearch

PubNub Changes Plugin
• extends IndexingOperationListener
• Attaches to PRIMARY indexes, publishes 3 types of events to PubNub channel
  • CREATE, INDEX, DELETE
• Create & Index require OpType to be set
• Feedback welcome!
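One way the three event types could be carried as channel messages. The field names (`index`, `doc_id`, `version`, `source`) and the JSON encoding are assumptions for illustration; the plugin's actual wire format is not shown in the deck.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional


class EventType(str, Enum):
    CREATE = "CREATE"
    INDEX = "INDEX"
    DELETE = "DELETE"


@dataclass
class ChangeEvent:
    """Hypothetical message shape for one change to one document."""
    type: EventType
    index: str
    doc_id: str
    version: int
    source: Optional[dict] = None  # document body; None for DELETE

    def to_json(self):
        payload = asdict(self)
        payload["type"] = self.type.value
        return json.dumps(payload, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        data["type"] = EventType(data["type"])
        return cls(**data)


if __name__ == "__main__":
    event = ChangeEvent(EventType.INDEX, "products", "42", 3, {"name": "widget"})
    wire = event.to_json()
    print(wire)
    print(ChangeEvent.from_json(wire) == event)
```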

PubNub River Plugin
• implements River
• Subscribes to PubNub channel
• Replays operations from channel against local index(es)
• Not quite happy when version conflicts occur
• Feedback welcome!
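A sketch of one possible replay policy for the version-conflict case: apply an event only if its version is newer than what the local index holds, and set stale events aside. This is an assumed policy for illustration, not the River plugin's behavior, and the event fields are hypothetical.

```python
class LocalIndex:
    """Toy local index that replays change events with a version check."""

    def __init__(self):
        self.docs = {}       # doc_id -> (version, source or None)
        self.conflicts = []  # events skipped because they were stale

    def replay(self, event):
        """Apply one event; return True if applied, False on a conflict."""
        doc_id, version = event["id"], event["version"]
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            self.conflicts.append(event)
            return False
        if event["type"] == "DELETE":
            # Keep a tombstone so an older INDEX cannot resurrect the doc.
            self.docs[doc_id] = (version, None)
        else:
            self.docs[doc_id] = (version, event["source"])
        return True

    def get(self, doc_id):
        entry = self.docs.get(doc_id)
        return None if entry is None else entry[1]


if __name__ == "__main__":
    index = LocalIndex()
    index.replay({"type": "INDEX", "id": "1", "version": 2, "source": {"v": "new"}})
    # An older update arriving late is recorded as a conflict, not applied.
    index.replay({"type": "INDEX", "id": "1", "version": 1, "source": {"v": "old"}})
    print(index.get("1"), len(index.conflicts))
```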

interesting aspects
• presence support allows operational insight (similar to a chat room where “users” are the cluster members)
• history support allows messages to be replayed (configurable message retention)
• built-in transport & message-level encryption can provide a reasonable level of security for many use cases
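The history bullet can be pictured as a bounded message log that a late or restarted subscriber reads to catch up. The `HistoryChannel` class and its sequence numbers are hypothetical; this is not the PubNub history API.

```python
from collections import deque


class HistoryChannel:
    """In-memory channel that retains only the most recent messages."""

    def __init__(self, retention=100):
        self._history = deque(maxlen=retention)  # oldest entries fall off
        self._next_seq = 0

    def publish(self, message):
        seq = self._next_seq
        self._history.append((seq, message))
        self._next_seq += 1
        return seq

    def history_since(self, last_seen_seq):
        """Return retained (seq, message) pairs newer than last_seen_seq."""
        return [(s, m) for s, m in self._history if s > last_seen_seq]

    def oldest_retained_seq(self):
        return self._history[0][0] if self._history else None


if __name__ == "__main__":
    channel = HistoryChannel(retention=3)
    for name in "abcde":
        channel.publish(name)
    # A subscriber that last saw seq 3 only needs the final message.
    print(channel.history_since(3))
    print(channel.oldest_retained_seq())
```

A subscriber whose last seen sequence is older than `oldest_retained_seq()` has missed messages that replay alone cannot recover.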

related work
• PubNub MongoDB Plugin: http://goo.gl/4etuYK
• PubNub Redis Plugin: http://goo.gl/2Sf33N
Allow updates to be propagated to/replayed from a PubNub channel

future work
• handling batch calls
• versioning in a multi-master world
• finding and fixing failures in a distributed model
• semantics and better ordering guarantees to support higher update rates
• anti-entropy, possibly using: ElasticSearch transaction log, PubNub history, checksum trees
• operational insight using presence features
• more polyglot persistence use cases
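A simplified, one-level take on the checksum idea in the anti-entropy bullet: hash documents into buckets, checksum each bucket, and compare checksums between two copies to find where they diverge. A real checksum tree would add further levels; the bucket count and function names are assumptions.

```python
import hashlib
import json


def bucket_of(doc_id, buckets):
    """Assign a document id to a bucket using a stable hash."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def bucket_checksums(index, buckets=16):
    """Checksum each bucket over its (id, doc) pairs in sorted id order."""
    hashers = [hashlib.sha256() for _ in range(buckets)]
    for doc_id in sorted(index):
        entry = json.dumps([doc_id, index[doc_id]], sort_keys=True)
        hashers[bucket_of(doc_id, buckets)].update(entry.encode("utf-8"))
    return [h.hexdigest() for h in hashers]


def diverging_buckets(index_a, index_b, buckets=16):
    """Return bucket numbers whose checksums differ between the two copies."""
    sums_a = bucket_checksums(index_a, buckets)
    sums_b = bucket_checksums(index_b, buckets)
    return [i for i in range(buckets) if sums_a[i] != sums_b[i]]


if __name__ == "__main__":
    primary = {"1": {"title": "hello"}, "2": {"title": "world"}}
    replica = {"1": {"title": "hello"}, "2": {"title": "wrold"}}
    print(diverging_buckets(primary, replica))
```

Only the documents in the diverging buckets need to be compared or re-sent, rather than the whole index.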

… and you’re done! (for now)
questions/feedback?
thank you so much!
