
From the event loop to the distributed system

mloughran
November 10, 2011

Talk for RubyConf Brazil introducing Pusher, some patterns for managing complexity in evented code, and some thoughts on distributed systems - specifically how we manage state and messaging at Pusher.


Transcript

  1. From the event loop to the distributed system
     RubyConf Brazil
     Martyn Loughran – @mloughran
     3rd November, 2011
  2. From the event loop to the distributed system
     • An introduction to Pusher
     • The event loop
       • Why you’d use it
       • Managing complexity
     • The distributed system
       • Some general considerations
       • Some specific problems and how we solved them
  3. Who am I?
     • Martyn Loughran
     • CTO of Pusher
     • We’re based in London, England
     • Rubyist and EventMachine enthusiast
     • Started building Pusher in January 2010
     • I don’t speak Portuguese (Eu não falo Português)
  4. So what is Pusher anyway?
     • A web service which helps developers add real-time functionality to their web applications
     • It makes scaling easy
     • A complex distributed system
  5. WebSocket, the basics:
     • A pretty silly logo
     • Sockets for the web
     • Bidirectional
     • Low latency
     • Bandwidth efficient
     • Already supported in Safari, Chrome, and Firefox
     • Coming to IE in version 10
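
     To make this concrete, here is a minimal WebSocket echo server, a sketch assuming the em-websocket gem (not part of the talk); the port is arbitrary:

     require 'em-websocket'

     EM.run do
       # Accept WebSocket connections and echo every message back
       EM::WebSocket.start(:host => '0.0.0.0', :port => 8080) do |ws|
         ws.onopen    { ws.send 'hello from the server' }
         ws.onmessage { |msg| ws.send msg }
         ws.onclose   { puts 'connection closed' }
       end
     end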
  6. Why use an event loop?
     • To handle massive numbers of connections
     • To share data without Mutexes
     • Efficient scheduling of work
  7. It’s really easy to use in Ruby

     require 'eventmachine'

     EM.run do
       # Start a server
       # Make some network connections
       # Create a timer
       # etc.
     end
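
     Filling in those placeholder comments with real EventMachine calls, purely as an illustration (EchoServer and the ports are made up for this sketch):

     require 'eventmachine'

     # Trivial handler used by the server below
     module EchoServer
       def receive_data(data)
         send_data(data)
       end
     end

     EM.run do
       # Start a server
       EM.start_server('0.0.0.0', 8081, EchoServer)

       # Make a network connection
       EM.connect('example.com', 80)

       # Create a timer
       EM.add_periodic_timer(1) { puts 'tick' }
     end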
  8. Using callbacks and deferrable objects

     EM.run {
       stream = TwitterStream.new('yourtwitterusername', 'pass', 'term')
       stream.ontweet { |tweet|
         LanguageDetector.new(tweet).callback { |lang|
           puts "New tweet in #{lang}: #{tweet}"
         }
       }
     }
  9. Return a deferrable from a function

     def do_something_complex
       df = EM::DefaultDeferrable.new
       use_lots_of_callbacks {
         ... { df.succeed(result) }
         ...
         df.fail(error)
       }
       return df
     end
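
     As a concrete example of the pattern (mine, not from the talk), a function wrapping em-http-request that returns a deferrable which succeeds with the response body or fails with the error:

     require 'em-http'

     # Returns a deferrable which succeeds with the response body,
     # or fails with the error
     def fetch_page(url)
       df = EM::DefaultDeferrable.new
       http = EM::HttpRequest.new(url).get
       http.callback { df.succeed(http.response) }
       http.errback  { df.fail(http.error) }
       df
     end

     EM.run do
       df = fetch_page('http://example.com/')
       df.callback { |body| puts "Fetched #{body.length} bytes" }
       df.errback  { |error| puts "Failed: #{error}" }
     end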
  10. Pass a deferrable to a strategy

     Juggler.juggle(:send_webhook, 100) do |df, job_params|
       http = EM::HttpRequest.new(job_params['url']).post({ :body => job_params["data"] })
       http.callback do |response|
         df.succeed
       end
       http.errback do
         df.fail
       end
     end
  11. “A distributed system is a collection of independent computers that appears to its users as a single coherent system”
      Distributed Systems: Principles and Paradigms, Tanenbaum and Steen 2006
  12. The distributed system
      • Why would I build one?
        • Work doesn’t fit on a single machine any more
        • You need better availability
      • How can I make one?
        • Decouple the application so that each function is handled by a separate component
        • Scale components horizontally, and independently
        • Make components tolerant to failure
  13. “Do not communicate by sharing memory; instead, share memory by communicating.”
      Effective Go, Google

      State          Messaging
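
      The same idea applies inside a single EventMachine process: instead of guarding a shared structure with a lock, pass messages. A small sketch using EM::Channel (illustrative, not Pusher code):

      require 'eventmachine'

      EM.run do
        # A channel fans each pushed message out to every subscriber;
        # there is no shared mutable state and no locking
        channel = EM::Channel.new

        channel.subscribe { |msg| puts "logger saw #{msg}" }
        channel.subscribe { |msg| puts "stats saw #{msg}" }

        EM.add_periodic_timer(1) { channel.push(Time.now.to_s) }
      end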
  14. State: CAP theorem
      It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
      - Consistency (all nodes see the same data at the same time)
      - Availability (a guarantee that every request receives a response about whether it was successful or failed)
      - Partition tolerance (the system continues to operate despite arbitrary message loss)
      http://en.wikipedia.org/wiki/CAP_theorem
  15. State: More questions
      • What performance do you need?
      • How durable does it need to be?
      • How much data do you need to store?
      • Does it need to be highly available?
      • Does it need to be consistent / eventually consistent?
  16. MySQL ~ 20GB
      • Consistent
      • Durable
      • Not highly available - but this doesn’t matter
      • Rails models
      • Aggregated usage statistics
  17. Redis ~ 500MB
      • Consistent
      • Very fast
      • Shared memory for all processes
      • Some current statistics, waiting to be aggregated
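
      Treating Redis as shared memory from evented Ruby might look like this, a sketch assuming the em-hiredis gem; the key name is illustrative:

      require 'em-hiredis'

      EM.run do
        redis = EM::Hiredis.connect

        # Any process can bump the shared counter...
        redis.incr('stats:connections')

        # ...and any process can read it back; commands return deferrables
        redis.get('stats:connections').callback do |count|
          puts "connections so far: #{count}"
        end
      end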
  18. ZooKeeper ~ 1MB
      • Slow
      • Consistent
      • Highly available
      • Not partition tolerant
      • Processes state, and assignment of roles
  19. Messaging
      • Central broker
        • AMQP - the SQL of messaging?
        • A single all-powerful box
        • Simple, but hard to scale
      • Custom messaging topologies
        • ZeroMQ - point to point, fanout, pubsub, load balanced (see the sketch below)
        • Lots of choices, therefore complex
        • This is the future, but we’re not quite there yet
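
      For a flavour of the custom-topology approach, a tiny pub/sub pair using the ffi-rzmq gem; this is an illustration only, not Pusher's topology, and the port and topic are made up:

      require 'ffi-rzmq'

      context = ZMQ::Context.new

      if ARGV[0] == 'publisher'
        # Fan messages out to however many subscribers are connected
        publisher = context.socket(ZMQ::PUB)
        publisher.bind('tcp://*:5556')
        loop do
          publisher.send_string("stats #{Time.now.to_i}")
          sleep 1
        end
      else
        # Receive everything published with the "stats" prefix
        subscriber = context.socket(ZMQ::SUB)
        subscriber.connect('tcp://localhost:5556')
        subscriber.setsockopt(ZMQ::SUBSCRIBE, 'stats')
        loop do
          message = ''
          subscriber.recv_string(message)
          puts "received: #{message}"
        end
      end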
  20. Usage statistics and latency metrics
      • Loads of events
      • Collect incrementers and distributions in memory
      • Flush to redis every minute (see the sketch below)
      • Eventually consistent state
      In memory → Redis → MySQL
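
      A stripped-down sketch of that pattern, assuming em-hiredis; counter names are illustrative and the real code also handles distributions:

      require 'em-hiredis'

      EM.run do
        redis = EM::Hiredis.connect

        # Counters are bumped in local memory on the hot path...
        counters = Hash.new(0)
        increment = lambda { |name| counters[name] += 1 }

        increment.call('messages_sent')
        increment.call('messages_sent')

        # ...and flushed to Redis once a minute, off the hot path
        EM.add_periodic_timer(60) do
          counters.each { |name, value| redis.incrby("stats:#{name}", value) }
          counters.clear
        end
      end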
  21. Storing presence information
      • Need to know when a user joins or leaves a channel
      • Needs to be consistent across processes
        • Use redis incrementers
      • Needs to survive process failure
        • Use a global hash, and a hash per process, with redis transactions (see the sketch below)
      • Consistent state
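
      One way the transaction part might look, shown with the synchronous redis gem for brevity (Pusher's evented implementation will differ; the key names and helper are hypothetical). The per-process hash is what lets a leader reconcile the global hash if a process dies:

      require 'redis'

      redis = Redis.new

      # Record a join in the global hash and in this process's own hash
      # atomically, so a leader can reconcile the global counts if the
      # process later dies without cleaning up
      def user_joined(redis, process_uuid, channel)
        redis.multi do |tx|
          tx.hincrby('presence:global', channel, 1)
          tx.hincrby("presence:process:#{process_uuid}", channel, 1)
        end
      end

      user_joined(redis, 'process-uuid-42', 'presence-chat')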
  22. Optimising internal messaging
      • Debug console shows all events for all connections
      • Unnecessary messaging, most of the time
      • Only publish data when it’s needed
      • Eventually consistent, distributed state, cached in memory
  23. Redis caches, and live caches

      # (pseudo simplified version)
      set = RedisLiveSet.new("debug_open")

      set.add('42')
      # redis.sadd("debug_open", 42)
      # redis.publish("debug_open", ["sadd", "42"])

      # On another process
      set.member?('42') # Checks the in memory set
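
      RedisLiveSet is not a public library, so purely as an illustration, a class with that shape could combine a Redis set, a local in-memory mirror, and a pub/sub channel carrying mutations; the pub/sub subscription wiring is omitted and all names here are guesses:

      require 'redis'
      require 'set'
      require 'json'

      # Illustrative only: a set stored in Redis, mirrored into local memory,
      # and kept fresh by replaying mutations published on a Redis channel
      class LiveSet
        def initialize(redis, key)
          @redis = redis
          @key = key
          @members = Set.new(redis.smembers(key)) # warm the local cache
        end

        def add(member)
          @members.add(member)
          @redis.sadd(@key, member)
          # Tell other processes about the mutation; each of them subscribes
          # to this channel and calls #apply with the payload
          @redis.publish(@key, ['sadd', member].to_json)
        end

        # Reads never touch Redis
        def member?(member)
          @members.include?(member)
        end

        # Called by this process's subscriber when a message arrives on @key
        def apply(payload)
          op, member = JSON.parse(payload)
          op == 'sadd' ? @members.add(member) : @members.delete(member)
        end
      end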
  24. Recovering from process failure
      • Store process UUIDs in ZooKeeper as ephemeral files
      • Leader process notices process failure, and takes required action
      • Low volume, highly available, and consistent
  25. In Conclusion
      • Consider an event loop for concurrency
      • EventMachine is great, you don’t need to use node.js
      • Think about state & messaging
      • It’s all about compromises; there are no right answers
      • Find creative solutions to your problems