Slide 1

Slide 1 text

Managing state in realtime distributed systems
Keeping it Realtime
Martyn Loughran (@mloughran)
7th November, 2011

Slide 2

Slide 2 text

Pusher:
• Is a web service which helps developers add real-time functionality to their web applications
• Scales the last-mile delivery to the browser, adding higher-level concepts
• Is a real-time distributed system

Slide 3

Slide 3 text

Distributed system

Slide 4

Slide 4 text

“A distributed system is a collection of independent computers that appears to its users as a single coherent system”
Distributed Systems: Principles and Paradigms, Tanenbaum and van Steen, 2006

Slide 5

Slide 5 text

Why?
• Scaling
• High Availability

Slide 6

Slide 6 text

How to build a distributed system:
• Decouple the application so that each function is handled by a separate component
• Scale components horizontally, and independently
• Make components tolerant to failure

Slide 7

Slide 7 text

Easier said than done
• Components need to share state, which constantly changes
• Components need to communicate
• Handling failure is hard

Slide 8

Slide 8 text

The problems of state & solving problems in real time

Slide 9

Slide 9 text

State

Slide 10

Slide 10 text

SQL

Slide 11

Slide 11 text

What is state?

Slide 12

Slide 12 text

Long term state

Slide 13

Slide 13 text

State specific to one process

Slide 14

Slide 14 text

The kind of questions we’d like to answer in a real time distributed system

Slide 15

Slide 15 text

How many times was a URL tweeted?

Slide 16

Slide 16 text

Is a user currently online?

Slide 17

Slide 17 text

Consistent global state

Slide 18

Slide 18 text

How many users subscribed to a channel in the last 20s?
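A question like this is easy to answer inside one process, which is part of why it becomes interesting once subscriptions arrive on many machines. As a minimal single-process sketch (the class name `SlidingWindowCounter` and the 20-second default are illustrative assumptions, not anything from Pusher), a sliding window can be kept as a queue of timestamps:

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds=20.0):
        self.window_seconds = window_seconds
        self._timestamps = deque()  # event times, oldest first

    def record(self, now):
        """Record one subscription event that happened at time `now`."""
        self._timestamps.append(now)

    def count(self, now):
        """Return how many recorded events fall inside the window ending at `now`."""
        cutoff = now - self.window_seconds
        # Drop events that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


# Hypothetical subscription times, in seconds.
counter = SlidingWindowCounter(window_seconds=20.0)
for t in (0.0, 5.0, 12.0, 30.0):
    counter.record(t)

recent = counter.count(now=31.0)  # only the events at 12.0 and 30.0 remain
```

The hard part the talk is pointing at is not this data structure but getting every node's events into one place, or combining per-node windows, without giving up availability.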

Slide 19

Slide 19 text

CAP theorem
It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss)
http://en.wikipedia.org/wiki/CAP_theorem

Slide 20

Slide 20 text

Consistency isn’t free

Slide 21

Slide 21 text

Eventual consistency
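One way to make eventual consistency concrete is a grow-only counter, where each node only increments its own slot and replicas merge by taking the per-node maximum. This is a sketch of one well-known technique, not something the slides describe; the class `GCounter` and the node names are assumptions made for illustration.

```python
class GCounter:
    """Grow-only counter: each node increments only its own slot."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> increments observed from that node

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        """Fold in another replica's view; safe to repeat and order-independent."""
        for node, n in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), n)

    def value(self):
        return sum(self.counts.values())


# Two replicas count tweets of the same URL independently.
a = GCounter("node-a")
b = GCounter("node-b")
a.increment()
a.increment()
b.increment()

before = (a.value(), b.value())  # replicas disagree until they exchange state

a.merge(b)
b.merge(a)
after = (a.value(), b.value())  # both replicas now report the same total
```

Between merges the replicas give different answers, which is the price paid for letting each node accept writes without coordination.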

Slide 22

Slide 22 text

Answer questions with map/reduce
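As a rough illustration of the idea, the earlier question "How many times was a URL tweeted?" can be split into a map step that each node runs over its own data and a reduce step that combines the partial answers. The tweet structure and the function names below are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_tweets(tweets):
    """Map step: count URL occurrences within one node's batch of tweets."""
    return Counter(url for tweet in tweets for url in tweet["urls"])


def reduce_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# Hypothetical batches, one per node.
node_batches = [
    [{"urls": ["http://example.com/a"]},
     {"urls": ["http://example.com/a", "http://example.com/b"]}],
    [{"urls": ["http://example.com/a"]},
     {"urls": []}],
]

partials = [map_tweets(batch) for batch in node_batches]
totals = reduce(reduce_counts, partials, Counter())
```

Because the reduce step is associative, partial counts can be combined in any order and on any node, which is what makes the approach fit a distributed system.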

Slide 23

Slide 23 text

“Do not communicate by sharing memory; instead, share memory by communicating.”
Effective Go, Google
State Messaging
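The quote comes from Go, where channels make this style natural. A comparable sketch in Python, assuming threads and `queue.Queue` stand in for goroutines and channels, has exactly one thread own the counter while the others only send it messages:

```python
import queue
import threading


def counter_owner(inbox, result):
    """The only code that touches `total`; everyone else sends messages."""
    total = 0
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more work
            break
        total += message
    result.put(total)


def worker(inbox, n):
    """Report `n` events by sending messages rather than mutating shared state."""
    for _ in range(n):
        inbox.put(1)


inbox = queue.Queue()
result = queue.Queue()

owner = threading.Thread(target=counter_owner, args=(inbox, result))
owner.start()

workers = [threading.Thread(target=worker, args=(inbox, 100)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

inbox.put(None)  # all workers are done, so tell the owner to finish
owner.join()
total = result.get()
```

No lock appears in this code because the counter is never shared; the queue is the only point of contact between threads.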

Slide 24

Slide 24 text

Actors
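An actor owns some private state and a mailbox, and handles one message at a time. The following is a minimal sketch of that idea, assuming a thread per actor and tuple-shaped messages; `PresenceActor` and its message names are invented here to answer the earlier question "Is a user currently online?".

```python
import queue
import threading


class PresenceActor:
    """Owns the set of online users; all access goes through its mailbox."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._online = set()  # private state, touched only by the actor thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        """Fire-and-forget: enqueue a message and return immediately."""
        self._mailbox.put(message)

    def ask_is_online(self, user):
        """Request/reply: include a private reply queue in the message."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put(("is_online", user, reply))
        return reply.get(timeout=1.0)

    def stop(self):
        self._mailbox.put(("stop",))
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            kind = message[0]
            if kind == "connected":
                self._online.add(message[1])
            elif kind == "disconnected":
                self._online.discard(message[1])
            elif kind == "is_online":
                _, user, reply = message
                reply.put(user in self._online)
            elif kind == "stop":
                break


actor = PresenceActor()
actor.send(("connected", "alice"))
actor.send(("connected", "bob"))
actor.send(("disconnected", "bob"))

alice_online = actor.ask_is_online("alice")
bob_online = actor.ask_is_online("bob")
actor.stop()
```

Messages from a single sender are handled in the order they were sent, so the queries above see the effect of the earlier connect and disconnect messages.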

Slide 25

Slide 25 text

Messaging

Slide 26

Slide 26 text

AMQP The SQL of messaging?

Slide 27

Slide 27 text

AMQP
• Centralised message broker
• Complex
• Hard to scale
• Hard to get high availability

Slide 28

Slide 28 text

ZeroMQ Build your own messaging

Slide 29

Slide 29 text

ZeroMQ: What is it?
• Socket abstraction designed for messaging (not bytes)
• Sockets include queuing
• Abstracts the underlying sockets
• Connect the sockets to form topologies
• Messaging patterns
• Devices

Slide 30

Slide 30 text

Share state by communicating
• Components publish events
• Package up state in bundles to solve problems
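The two points above can be sketched in-process: publishers emit events without knowing who listens, and a consumer bundles up just the state it needs to answer one question. This is an illustrative assumption about what such a design might look like, not Pusher's implementation; `EventBus`, `ChannelOccupancy`, and the topic names are made up, and a real system would publish over a messaging layer rather than direct function calls.

```python
from collections import defaultdict


class EventBus:
    """Minimal synchronous publish/subscribe, standing in for real messaging."""

    def __init__(self):
        self._handlers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._handlers[topic]:
            handler(event)


class ChannelOccupancy:
    """A bundle of state built purely from events: subscribers per channel."""

    def __init__(self, bus):
        self.counts = defaultdict(int)
        bus.subscribe("channel.subscribed", self._on_subscribed)
        bus.subscribe("channel.unsubscribed", self._on_unsubscribed)

    def _on_subscribed(self, event):
        self.counts[event["channel"]] += 1

    def _on_unsubscribed(self, event):
        self.counts[event["channel"]] -= 1


bus = EventBus()
occupancy = ChannelOccupancy(bus)

bus.publish("channel.subscribed", {"channel": "chat", "user": "alice"})
bus.publish("channel.subscribed", {"channel": "chat", "user": "bob"})
bus.publish("channel.unsubscribed", {"channel": "chat", "user": "alice"})
bus.publish("channel.subscribed", {"channel": "news", "user": "carol"})
```

The publisher never reads `occupancy.counts`, and the consumer never reaches into the publisher; the events are the only shared contract.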

Slide 31

Slide 31 text

Problems with this approach
• Scaling: how to shard your state
• Event publishers and consumers easily become coupled
• Handling failure is hard
• Testing the stack is hard
• ZeroMQ is still too low level most of the time
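For the sharding point in the list above, the simplest scheme routes each key to a shard by hashing it. The sketch below assumes string keys and a fixed shard count; `shard_for` is a hypothetical helper, and SHA-1 is used only because it gives the same answer on every machine (unlike Python's built-in `hash`, which is randomised per process for strings).

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


SHARDS = 4
channels = ["chat", "news", "presence-room-1", "presence-room-2"]

# Every component that computes this mapping gets the same result,
# so events for one channel always reach the same shard.
assignment = {name: shard_for(name, SHARDS) for name in channels}
```

A known weakness of plain modulo hashing is that changing `shard_count` remaps most keys; consistent hashing is the usual way to reduce that churn.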

Slide 32

Slide 32 text

The wish list
• Make sharding state easy
• Define the messaging problem, rather than configuring ZMQ
• Handle failure automatically
• Make testing easy

Slide 33

Slide 33 text

Storm The solution?

Slide 34

Slide 34 text

Realtime map/reduce http://howfuckedismydatabase.com/nosql/

Slide 35

Slide 35 text

Storm
• Stream processing
• Topologies: describe computation as a graph
• Graph built from sensible primitives:
  • Spouts & Bolts
  • Stream groupings: shuffle, fields, all, global
• Automatic management of workers
• Handles failure
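To show how the primitives listed above fit together, here is a toy, single-process imitation of a word-count topology. It does not use Storm's API (Storm topologies are normally written on the JVM); every function name is an assumption, round-robin stands in for shuffle grouping, and a CRC32 hash stands in for fields grouping.

```python
import zlib
from collections import Counter
from itertools import cycle


def sentence_spout():
    """Spout: the source of the stream."""
    yield from ["the cat sat", "the dog sat", "the cat ran"]


def split_bolt(sentence):
    """Bolt: turn one sentence tuple into several word tuples."""
    return sentence.split()


def fields_grouping(word, task_count):
    """Fields grouping: equal field values always go to the same task."""
    return zlib.crc32(word.encode("utf-8")) % task_count


SPLIT_TASKS = 2
COUNT_TASKS = 2

# Shuffle grouping stand-in: spread sentences evenly over the split tasks.
split_task_picker = cycle(range(SPLIT_TASKS))
split_load = Counter()

# Each counting task keeps its own private state.
count_tasks = [Counter() for _ in range(COUNT_TASKS)]

for sentence in sentence_spout():
    split_load[next(split_task_picker)] += 1
    for word in split_bolt(sentence):
        count_tasks[fields_grouping(word, COUNT_TASKS)][word] += 1

# Combining the per-task counts gives the global answer.
totals = sum(count_tasks, Counter())
```

The choice of grouping matters: splitting is stateless, so any task will do, while counting is stateful, so all tuples for a given word have to land on the task that holds that word's count.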

Slide 36

Slide 36 text

In Conclusion
• Storing state is a mess of compromises
• Share state by communicating
• Package up state in bundles to solve problems
• Real-time map/reduce is a world of new possibilities

Slide 37

Slide 37 text

Thanks!
Martyn Loughran
@mloughran
Come