
From the event loop to the distributed system

mloughran
November 10, 2011

Talk for RubyConf Brazil introducing Pusher, some patterns for managing complexity in evented code, and some thoughts on distributed systems - specifically how we manage state and messaging at Pusher.


Transcript

  1. From the event loop to the distributed system
     RubyConf Brazil
     Martyn Loughran – @mloughran
     3rd November, 2011
  2. From the event loop to the distributed system
     • An introduction to Pusher
     • The event loop
       • Why you’d use it
       • Managing complexity
     • The distributed system
       • Some general considerations
       • Some specific problems and how we solved them
  3. Who am I?
     • Martyn Loughran
     • CTO of Pusher
     • We’re based in London, England
     • Rubyist and EventMachine enthusiast
     • Started building Pusher in January 2010
     • I don’t speak Portuguese (Eu não falo Português)
  4. So what is Pusher anyway?
     • A web service which helps developers add real-time functionality to their web applications
     • It makes scaling easy
     • A complex distributed system
  5. WebSocket, the basics:
     • A pretty silly logo
     • Sockets for the web
     • Bidirectional
     • Low latency
     • Bandwidth efficient
     • Already supported in Safari, Chrome, and Firefox
     • Coming to IE in version 10
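
     To make this concrete, here is a minimal WebSocket echo server, a sketch assuming the em-websocket gem (not part of the talk); the port is arbitrary:

     require 'em-websocket'

     EM.run do
       # Accept WebSocket connections and echo every message back
       EM::WebSocket.start(:host => '0.0.0.0', :port => 8080) do |ws|
         ws.onopen    { ws.send 'hello from the server' }
         ws.onmessage { |msg| ws.send msg }
         ws.onclose   { puts 'connection closed' }
       end
     end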
  6. Why use an event loop?
     • To handle massive numbers of connections
     • To share data without Mutexes
     • Efficient scheduling of work
  7. It’s really easy to use in Ruby

     require 'eventmachine'

     EM.run do
       # Start a server
       # Make some network connections
       # Create a timer
       # etc.
     end
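
     Filling in those placeholder comments with real EventMachine calls, purely as an illustration (EchoServer and the ports are made up for this sketch):

     require 'eventmachine'

     # Trivial handler used by the server below
     module EchoServer
       def receive_data(data)
         send_data(data)
       end
     end

     EM.run do
       # Start a server
       EM.start_server('0.0.0.0', 8081, EchoServer)

       # Make a network connection
       EM.connect('example.com', 80)

       # Create a timer
       EM.add_periodic_timer(1) { puts 'tick' }
     end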
  8. Using callbacks and deferrable objects

     EM.run {
       stream = TwitterStream.new('yourtwitterusername', 'pass', 'term')
       stream.ontweet { |tweet|
         LanguageDetector.new(tweet).callback { |lang|
           puts "New tweet in #{lang}: #{tweet}"
         }
       }
     }
  9. Return a deferrable from a function

     def do_something_complex
       df = EM::DefaultDeferrable.new
       use_lots_of_callbacks {
         ... { df.succeed(result) }
         ...
         df.fail(error)
       }
       return df
     end
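
     As a concrete example of the pattern (mine, not from the talk), a function wrapping em-http-request that returns a deferrable which succeeds with the response body or fails with the error:

     require 'em-http'

     # Returns a deferrable which succeeds with the response body,
     # or fails with the error
     def fetch_page(url)
       df = EM::DefaultDeferrable.new
       http = EM::HttpRequest.new(url).get
       http.callback { df.succeed(http.response) }
       http.errback  { df.fail(http.error) }
       df
     end

     EM.run do
       df = fetch_page('http://example.com/')
       df.callback { |body| puts "Fetched #{body.length} bytes" }
       df.errback  { |error| puts "Failed: #{error}" }
     end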
  10. Pass a deferrable to a strategy

     Juggler.juggle(:send_webhook, 100) do |df, job_params|
       http = EM::HttpRequest.new(job_params['url']).post({ :body => job_params["data"] })
       http.callback do |response|
         df.succeed
       end
       http.errback do
         df.fail
       end
     end
  11. “A distributed system is a collection of independent computers that appears to its users as a single coherent system”
      Distributed Systems: Principles and Paradigms, Tanenbaum and Steen 2006
  12. The distributed system
      • Why would I build one?
        • Work doesn’t fit on a single machine any more
        • You need better availability
      • How can I make one?
        • Decouple the application so that each function is handled by a separate component
        • Scale components horizontally, and independently
        • Make components tolerant to failure
  13. “Do not communicate by sharing memory; instead, share memory by communicating.”
      Effective Go, Google

      State          Messaging
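
      The same idea applies inside a single EventMachine process: instead of guarding a shared structure with a lock, pass messages. A small sketch using EM::Channel (illustrative, not Pusher code):

      require 'eventmachine'

      EM.run do
        # A channel fans each pushed message out to every subscriber;
        # there is no shared mutable state and no locking
        channel = EM::Channel.new

        channel.subscribe { |msg| puts "logger saw #{msg}" }
        channel.subscribe { |msg| puts "stats saw #{msg}" }

        EM.add_periodic_timer(1) { channel.push(Time.now.to_s) }
      end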
  14. State: CAP theorem
      It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
      - Consistency (all nodes see the same data at the same time)
      - Availability (a guarantee that every request receives a response about whether it was successful or failed)
      - Partition tolerance (the system continues to operate despite arbitrary message loss)
      http://en.wikipedia.org/wiki/CAP_theorem
  15. State: More questions
      • What performance do you need?
      • How durable does it need to be?
      • How much data do you need to store?
      • Does it need to be highly available?
      • Does it need to be consistent / eventually consistent?
  16. MySQL ~ 20GB
      • Consistent
      • Durable
      • Not highly available - but this doesn’t matter
      • Rails models
      • Aggregated usage statistics
  17. Redis ~ 500MB
      • Consistent
      • Very fast
      • Shared memory for all processes
      • Some current statistics, waiting to be aggregated
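
      Treating Redis as shared memory from evented Ruby might look like this, a sketch assuming the em-hiredis gem; the key name is illustrative:

      require 'em-hiredis'

      EM.run do
        redis = EM::Hiredis.connect

        # Any process can bump the shared counter...
        redis.incr('stats:connections')

        # ...and any process can read it back; commands return deferrables
        redis.get('stats:connections').callback do |count|
          puts "connections so far: #{count}"
        end
      end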
  18. ZooKeeper ~ 1MB
      • Slow
      • Consistent
      • Highly available
      • Not partition tolerant
      • Processes state, and assignment of roles
  19. Messaging
      • Central broker
        • AMQP - the SQL of messaging?
        • A single all-powerful box
        • Simple, but hard to scale
      • Custom messaging topologies
        • ZeroMQ - point to point, fanout, pubsub, load balanced (see the sketch below)
        • Lots of choices, therefore complex
        • This is the future, but we’re not quite there yet
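
      For a flavour of the custom-topology approach, a tiny pub/sub pair using the ffi-rzmq gem; this is an illustration only, not Pusher's topology, and the port and topic are made up:

      require 'ffi-rzmq'

      context = ZMQ::Context.new

      if ARGV[0] == 'publisher'
        # Fan messages out to however many subscribers are connected
        publisher = context.socket(ZMQ::PUB)
        publisher.bind('tcp://*:5556')
        loop do
          publisher.send_string("stats #{Time.now.to_i}")
          sleep 1
        end
      else
        # Receive everything published with the "stats" prefix
        subscriber = context.socket(ZMQ::SUB)
        subscriber.connect('tcp://localhost:5556')
        subscriber.setsockopt(ZMQ::SUBSCRIBE, 'stats')
        loop do
          message = ''
          subscriber.recv_string(message)
          puts "received: #{message}"
        end
      end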
  20. Usage statistics and latency metrics
      • Loads of events
      • Collect incrementers and distributions in memory
      • Flush to redis every minute (see the sketch below)
      • Eventually consistent state
      In memory → Redis → MySQL
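
      A stripped-down sketch of that pattern, assuming em-hiredis; counter names are illustrative and the real code also handles distributions:

      require 'em-hiredis'

      EM.run do
        redis = EM::Hiredis.connect

        # Counters are bumped in local memory on the hot path...
        counters = Hash.new(0)
        increment = lambda { |name| counters[name] += 1 }

        increment.call('messages_sent')
        increment.call('messages_sent')

        # ...and flushed to Redis once a minute, off the hot path
        EM.add_periodic_timer(60) do
          counters.each { |name, value| redis.incrby("stats:#{name}", value) }
          counters.clear
        end
      end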
  21. Storing presence information
      • Need to know when a user joins or leaves a channel
      • Needs to be consistent across processes
        • Use redis incrementers
      • Needs to survive process failure
        • Use a global hash, and a hash per process, with redis transactions (see the sketch below)
      • Consistent state
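
      One way the transaction part might look, shown with the synchronous redis gem for brevity (Pusher's evented implementation will differ; the key names and helper are hypothetical). The per-process hash is what lets a leader reconcile the global hash if a process dies:

      require 'redis'

      redis = Redis.new

      # Record a join in the global hash and in this process's own hash
      # atomically, so a leader can reconcile the global counts if the
      # process later dies without cleaning up
      def user_joined(redis, process_uuid, channel)
        redis.multi do |tx|
          tx.hincrby('presence:global', channel, 1)
          tx.hincrby("presence:process:#{process_uuid}", channel, 1)
        end
      end

      user_joined(redis, 'process-uuid-42', 'presence-chat')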
  22. Optimising internal messaging
      • Debug console shows all events for all connections
      • Unnecessary messaging, most of the time
      • Only publish data when it’s needed
      • Eventually consistent, distributed state, cached in memory
  23. Redis caches, and live caches

      # (pseudo simplified version)
      set = RedisLiveSet.new("debug_open")

      set.add('42')
      # redis.sadd("debug_open", 42)
      # redis.publish("debug_open", ["sadd", "42"])

      # On another process
      set.member?('42') # Checks the in memory set
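
      RedisLiveSet is not a public library, so purely as an illustration, a class with that shape could combine a Redis set, a local in-memory mirror, and a pub/sub channel carrying mutations; the pub/sub subscription wiring is omitted and all names here are guesses:

      require 'redis'
      require 'set'
      require 'json'

      # Illustrative only: a set stored in Redis, mirrored into local memory,
      # and kept fresh by replaying mutations published on a Redis channel
      class LiveSet
        def initialize(redis, key)
          @redis = redis
          @key = key
          @members = Set.new(redis.smembers(key)) # warm the local cache
        end

        def add(member)
          @members.add(member)
          @redis.sadd(@key, member)
          # Tell other processes about the mutation; each of them subscribes
          # to this channel and calls #apply with the payload
          @redis.publish(@key, ['sadd', member].to_json)
        end

        # Reads never touch Redis
        def member?(member)
          @members.include?(member)
        end

        # Called by this process's subscriber when a message arrives on @key
        def apply(payload)
          op, member = JSON.parse(payload)
          op == 'sadd' ? @members.add(member) : @members.delete(member)
        end
      end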
  24. Recovering from process failure
      • Store process UUIDs in ZooKeeper as ephemeral files
      • Leader process notices process failure, and takes required action
      • Low volume, highly available, and consistent
  25. In Conclusion
      • Consider an event loop for concurrency
      • EventMachine is great, you don’t need to use node.js
      • Think about state & messaging
      • It’s all about compromises; there are no right answers
      • Find creative solutions to your problems