Making DISQUS Realtime - Speaker Deck

Slide 1

Adam Hitchcock (@NorthIsUp)
Making DISQUS Realtime
Thursday, March 21, 2013
Notes: This is supposed to be an advanced talk, but I don’t know what that means (I assume you know something), so I hope this doesn’t bore anybody.
- I HATE LONG TALKS; OPEN SPACE.
- I TALK FAST. If you can’t understand me, please tell me to slow down.

Slide 2

what is DISQUS?

Slide 3

Notes:
- community platform
- main product is a comment widget (JavaScript)
- main backend is Django + Postgres + Redis
- but we are adding Flask into the mix with our realtime system

Slide 4

why do realtime?
- getting new data to the user ASAP
- increased engagement
- looks awesome
- we can sell it
Notes: We define this as ‘less than 10 seconds’, but my goal was less than one second.

Slide 5

Notes: live votes, typing, posts

Slide 6

how many of you currently have a realtime component?
Notes: Well, so do we. Try to keep it short, because you are probably smarter than me, and I want to hear what you have to say (@NorthIsUp).

Slide 7

realtime
- polls memcache
- is kinda #failscale
Notes:
- we set a per-thread key, and this key gets polled every five seconds
- the problem is... that it is #failscale
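Here is a minimal sketch of the polling scheme described above. It assumes a plain dict stands in for memcache and a counter serves as the per-thread key. The key format and function names are hypothetical. The point is the cost model: every client pays for a cache read every five seconds, whether or not anything changed.

```python
POLL_INTERVAL_S = 5  # from the notes: the per-thread key is polled every five seconds

cache = {}  # hypothetical stand-in for memcache


def thread_key(thread_id):
    # hypothetical key format for the per-thread key
    return "thread:%s:version" % thread_id


def on_new_post(thread_id):
    """Writer side: bump the per-thread key whenever the thread changes."""
    key = thread_key(thread_id)
    cache[key] = cache.get(key, 0) + 1


def poll(thread_id, last_seen):
    """Client side, run every POLL_INTERVAL_S seconds.

    Returns (changed, current_version). It costs one cache read per client
    per interval even when nothing has changed, which is why it #failscales.
    """
    current = cache.get(thread_key(thread_id), 0)
    return current != last_seen, current
```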

Slide 8

DISQUS sees a lot of traffic
Google Analytics: May 29, 2012 - June 28, 2012
Notes: The problem is that it was at max capacity with fewer than 100 thousand concurrent users, so I was charged with fixing this.

Slide 9

realertime
- currently active on all DISQUS 2012 sites
- tested ‘dark’ on ~50% of our network
- 1.5 million concurrently connected users
- 45 thousand new connections per second
- 165 thousand messages/second
- ~0.2 seconds latency end to end
Notes:
- I’ll revisit what ‘dark’ means later, on the testing slides
- describe the heavy-tail distribution of popularity
- end to end does NOT include the DISQUS app
- DEMO IT, then “so how did we build this?”

Slide 10

so, how did we do it?

Slide 11

technology
- node.js and mongodb for webscale

Slide 12

technology
- just kidding :) we used python

Slide 13

technology
- gevent
- gunicorn
- flask
- thoonk (a queue built on redis)
- redis
- nginx
- haproxy
Notes:
- pub/sub == redis
- thoonk == queues (each message is delivered to only one subscriber)
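The notes distinguish between two delivery models. This tiny in-process model makes the difference concrete; it is not Redis or Thoonk code. A pub/sub channel copies each message to every subscriber, while a work queue hands each message to exactly one consumer. The class names are hypothetical.

```python
import queue
from collections import defaultdict


class PubSub:
    """Fan-out: every subscriber of a channel gets its own copy (like Redis PUBLISH)."""

    def __init__(self):
        self._subs = defaultdict(list)  # channel -> list of subscriber inboxes

    def subscribe(self, channel):
        inbox = []
        self._subs[channel].append(inbox)
        return inbox

    def publish(self, channel, message):
        for inbox in self._subs[channel]:
            inbox.append(message)
        # Redis PUBLISH likewise returns the number of receivers
        return len(self._subs[channel])


class WorkQueue:
    """Work distribution: each job is taken by only one worker (like a Thoonk queue)."""

    def __init__(self):
        self._q = queue.Queue()

    def put(self, job):
        self._q.put(job)

    def take(self):
        # raises queue.Empty when there is no job left
        return self._q.get_nowait()
```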

Slide 14

architecture overview

Slide 15

architecture overview
[Diagram: New Posts (django); redis queue; “Backend” Gevent server; redis pub/sub; “Frontend” Gunicorn and Flask; nginx + haproxy]
Notes:
- HARDWARE
- HA
- 3 flows:
  - new info -> pub/sub
  - new subscriptions
  - pub/sub -> subscriptions

Slide 16

architecture overview
[Diagram: DISQUS (New Posts); thoonk queue; Formatter, Multiplexer, Publisher; redis pub/sub; Listener, Sub Pool, Requests; incoming HTTP requests from the interwebs]
Notes: DON’T DESCRIBE THESE THINGS HERE; you do that later.

Slide 17

the backend

Slide 18

the backend
- listens to a Thoonk queue
- cleans & formats message
  - this is the final format before HTTP publish
- compress data now
- publish message to pub/sub
  - forum:id, thread:id, user:id, post:id
[Diagram: Formatter, Multiplexer, Publisher]
Notes:
- pipeline semantics (draw gthreads + loops)
- end-to-end ACK via Thoonk (a message is not removed until it is fully published)
- not end-to-end for public consumption, just paid
- if message is over 15 seconds old...
- how is this part of the system HA?
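The formatter and multiplexer steps can be sketched roughly as follows. The sketch assumes a post arrives as a dict with hypothetical `id`, `forum`, `thread`, `author`, and `message` fields. It uses zlib as the compressor, which is an assumption; the slide only says "compress". The message is formatted and compressed once, then one (channel, payload) pair is emitted per channel named on the slide. In the real system each pair would become a redis PUBLISH.

```python
import json
import zlib


def format_message(post):
    """Clean and format once: this is the final wire format before publishing."""
    # hypothetical field selection; the real message format is not shown in the deck
    body = {"id": post["id"], "thread": post["thread"], "message": post["message"].strip()}
    # compress now, once, instead of once per subscriber
    return zlib.compress(json.dumps(body, sort_keys=True).encode("utf-8"))


def multiplex(post, payload):
    """Fan one formatted payload out to every channel that cares about this post."""
    channels = [
        "forum:%s" % post["forum"],
        "thread:%s" % post["thread"],
        "user:%s" % post["author"],  # assumes user:id means the post's author
        "post:%s" % post["id"],
    ]
    return [(channel, payload) for channel in channels]
```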

Slide 19

the backend
- average processing time is ~0.2 seconds
- queue maintenance
  - ACK timeouts (5 seconds-ish)
Notes: HA before maintenance (ZooKeeper)

Slide 20

random redis lessons
- separate pub/sub and non-pub/sub redis usage by physical node
- transactions can be prickly
Notes: Transactions can trip you up: atomic is good, but they are way more expensive.

Slide 21

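the backend
from time import time

# redis key for the 'claimed' zset; scores are claim times in milliseconds
claimed = thoonk_worker.feed_claimed
# what jobs to re-queue: anything claimed before this cutoff
# (MAX_AGE, in seconds, is defined elsewhere)
too_late = int((time() - MAX_AGE) * 1000)
# get jobs by score (claim time); zrange would select by index instead
job_ids = redis.zrangebyscore(claimed, 0, too_late)
# cancel them so they go back on the queue
for job_id in job_ids:
    thoonk_worker.cancel(job_id)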
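To check the cutoff logic without a redis server, here is an in-memory stand-in for the 'claimed' sorted set. It is a dict of job id to claim time in milliseconds, an assumption that matches the `* 1000` in the slide. It selects by score, as ZRANGEBYSCORE does; selecting by index, as ZRANGE does, cannot express a time cutoff. The function name is hypothetical.

```python
import time


def stale_job_ids(claimed, max_age_s, now_s=None):
    """Return ids of jobs claimed more than max_age_s seconds ago, oldest first.

    `claimed` maps job id -> claim time in milliseconds, mirroring the scores
    of the 'claimed' sorted set (hypothetical stand-in for redis).
    """
    if now_s is None:
        now_s = time.time()
    too_late = int((now_s - max_age_s) * 1000)
    # equivalent of ZRANGEBYSCORE claimed 0 too_late (ordered by score)
    by_score = sorted(claimed.items(), key=lambda item: item[1])
    return [job_id for job_id, score in by_score if 0 <= score <= too_late]
```

For example, with a 5 second maximum age at `now_s=1010.0`, jobs claimed at 1,000,000 ms and 1,004,000 ms are stale, while one claimed at 1,009,500 ms is still within its ACK window.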

Slide 22

gevent is nice
# the code is too big to show here, so just import it
# http://bitly.com/geventspawn
from realertime.lib.spawn import Watchdog
from realertime.lib.spawn import TimeSensitiveBackoff
Notes:
- /lots/ of greenlets, so we need a way to manage them. NAY! They manage themselves.
- sleep(0)
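The Watchdog and TimeSensitiveBackoff helpers live behind the link and are not reproduced here. As a rough, hypothetical illustration of work that "manages itself", the function below restarts a failing task with exponential backoff. The name and parameters are invented for this sketch and are not the realertime API. The `sleep` function is injectable, so under gevent you could pass `gevent.sleep`.

```python
import time


def run_with_backoff(task, max_restarts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call task(); if it raises, wait and try again with exponential backoff.

    Returns task()'s result, or re-raises the last error once max_restarts
    restarts have been used up.
    """
    delay = base_delay
    for attempt in range(max_restarts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_restarts:
                raise
            sleep(delay)
            # double the wait after each failure, capped at max_delay
            delay = min(delay * 2, max_delay)
```

Under gevent, something like `gevent.spawn(run_with_backoff, task, sleep=gevent.sleep)` would run this supervisor as its own greenlet.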

Slide 23

the frontend

Slide 24

the frontend
- needs to be fast!
- pools redis connections
- routes messages from pub/sub to HTTP
Notes: How is this part of the system HA?

Slide 25

the frontend
- new request!
- create/register a subscription with the pool
- sub pool returns a (python) queue based on the channel
[Diagram: Listener, Sub Pool, Requests]

Slide 26

the frontend
- Listener receives a message on a pub/sub channel
- if that channel has a subscriber, pass it on
- the subscriber then passes the message on to all appropriate requests
[Diagram: Listener, Sub Pool, Requests]
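Here is a minimal sketch of the sub pool behaviour from the last two slides. It assumes one `queue.Queue` per HTTP request. The class and method names are hypothetical, not the realertime code. The Listener calls `dispatch` with each pub/sub message, and each request blocks on its own queue.

```python
import queue
import threading
from collections import defaultdict


class SubPool:
    """Hypothetical sub pool: maps pub/sub channels to per-request Python queues."""

    def __init__(self):
        # requests and the Listener may run in different threads
        self._lock = threading.Lock()
        self._queues = defaultdict(set)  # channel -> set of queue.Queue

    def subscribe(self, channel):
        """New request: register it on `channel` and hand back its own queue."""
        q = queue.Queue()
        with self._lock:
            self._queues[channel].add(q)
        return q

    def unsubscribe(self, channel, q):
        """Request finished: drop its queue, and forget the channel once nobody listens."""
        with self._lock:
            subscribers = self._queues.get(channel)
            if subscribers is not None:
                subscribers.discard(q)
                if not subscribers:
                    del self._queues[channel]

    def has_subscribers(self, channel):
        with self._lock:
            return channel in self._queues

    def dispatch(self, channel, data):
        """Called by the Listener per message; returns how many requests received it."""
        with self._lock:
            subscribers = list(self._queues.get(channel, ()))
        for q in subscribers:
            q.put(data)
        return len(subscribers)
```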

Slide 27

long pollingish
- long-held HTTP connection
- stream JSON over this HTTP connection
Notes: Long-held, because why close it? You just put all that work into opening it!
Why not WebSockets?
- the standard was still in flux when we started
- HTTP gives maximum compatibility
- we are going to add support in v2
- our problem does not require a symmetric communication pipe

Slide 28

long pollingish
from gevent import Timeout

def __subscription_generator(self, q):
    """Return a generator for the WSGI response."""
    to = Timeout(self.timeout_duration)
    to.start()
    try:
        while True:
            queue_data = q.get()
            # one message per line
            yield queue_data['data'] + '\n'
    except Timeout as t:
        # swallow only our own timeout; re-raise anyone else's
        if t is not to:
            raise
    finally:
        to.cancel()
        self.unsubscribe(q)
Notes:
- WSGI can take a generator, and it will yield that data as it gets it.
- Content-Type needs to be ‘application/json’?
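For comparison, here is a stdlib-only sketch of the same streaming idea: a WSGI app whose body is a generator. It replaces gevent's `Timeout` with a deadline passed to `queue.Queue.get(timeout=...)`. The names `make_stream_app`, `get_queue`, and `release` are hypothetical. The `application/json` Content-Type echoes the open question in the notes rather than a confirmed choice.

```python
import json
import queue
import time


def make_stream_app(get_queue, timeout_s=30.0, release=None):
    """Build a WSGI app that streams one JSON document per line from a queue.

    get_queue(environ) returns this request's queue.Queue (e.g. from a sub pool);
    release(q), if given, runs when the response ends, like unsubscribe().
    """
    def app(environ, start_response):
        q = get_queue(environ)
        start_response("200 OK", [("Content-Type", "application/json")])

        def body():
            deadline = time.monotonic() + timeout_s
            try:
                while True:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        return  # our own timeout: end the response quietly
                    try:
                        item = q.get(timeout=remaining)
                    except queue.Empty:
                        return
                    # WSGI bodies are bytes on Python 3; newline-delimited JSON
                    yield (json.dumps(item) + "\n").encode("utf-8")
            finally:
                if release is not None:
                    release(q)

        return body()

    return app
```

This assumes a gevent-based server with monkey patching (or `gevent.queue.Queue` in place of `queue.Queue`). In that setup, a blocked `q.get` parks only its own greenlet, so holding many connections open stays cheap.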

Slide 29

pooling redis pub/sub
from gevent import Timeout

# old way was pretty failscale: one redis connection per HTTP connection
def subscribe(redis, channel):
    pubsub = redis.pubsub()
    pubsub.subscribe(channel)
    with Timeout(30):
        # yield each message (yielding pubsub.listen() itself would hand back the generator)
        for message in pubsub.listen():
            yield message
Notes:
- works surprisingly well
- results in a 1:1 ratio of redis connections <-> HTTP connections
- anybody remember that number? It’s currently 1.5 million and growing

Slide 30

pooling redis pub/sub
from queue import Queue

pipe = Queue()
# each command is one (action, channel) tuple
pipe.put(('subscribe', 'thread:12345'))
pipe.put(('unsubscribe', 'forum:cnn'))

# ... elsewhere ...

# new way is
def listener(pubsub, pipe):
    for data in pubsub.listen():
        # handle data here...
        # handle new subscriptions
        if not pipe.empty():
            action, channel = pipe.get_nowait()
            getattr(pubsub, action)(channel)
Notes:
- this is spawned in a thread
- heartbeat
- the goal is to minimize redis connections, but pubsub isn’t thread safe, and they mean it
- [q.put(data) for q in channel_proxy[data.channel]]
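The pattern on this slide is that only one thread ever touches the pub/sub object; everyone else sends it commands through a queue. Below is a runnable sketch of that pattern with a hypothetical `FakePubSub` standing in for redis-py's PubSub. Unlike the slide, it drains every pending command after each message; that is a design choice, not the original code. It also assumes that a periodic message on a `heartbeat` channel keeps the loop waking up, which is one reading of the "heartbeat" note.

```python
import queue


class FakePubSub:
    """Hypothetical stand-in for a redis PubSub object (not thread safe)."""

    def __init__(self, messages):
        self._messages = list(messages)  # (channel, payload) pairs, in arrival order
        self.channels = set()

    def subscribe(self, channel):
        self.channels.add(channel)

    def unsubscribe(self, channel):
        self.channels.discard(channel)

    def listen(self):
        for channel, payload in self._messages:
            # like redis, only deliver messages for channels we are subscribed to
            if channel in self.channels:
                yield {"type": "message", "channel": channel, "data": payload}


def listener(pubsub, pipe, deliver):
    """Only this function touches `pubsub`; other threads send commands via `pipe`."""
    for data in pubsub.listen():
        if data["channel"] != "heartbeat":
            deliver(data)
        # apply every queued (action, channel) command before the next message
        while True:
            try:
                action, channel = pipe.get_nowait()
            except queue.Empty:
                break
            getattr(pubsub, action)(channel)
```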

Slide 31

timeouts?
- needless reclaiming of ‘resources’
- maximize usage of cheap things
  - connection count
- minimize expensive things
  - requests per second
Notes:
- cheap == memory
- expensive == CPU
- timeout == usually it means something took too long
- a timeout forces a new request, so let’s let the browser decide when that should happen: ~30 sec on mobile, ~300 sec on desktop
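Here is some back-of-the-envelope arithmetic for the notes above. If every held connection is closed and re-opened once per timeout, then timeouts alone generate connections / timeout requests per second. Purely for illustration, the figures below assume that all 1.5 million concurrent connections from the earlier slide use the same timeout.

```python
def reconnects_per_second(connections, timeout_s):
    """Requests/second caused only by timeouts: each connection reconnects once per timeout_s."""
    return connections / timeout_s


CONNECTIONS = 1500000  # concurrent connections quoted earlier in the talk

for label, timeout_s in (("mobile, ~30 s", 30), ("desktop, ~300 s", 300)):
    print("%-16s %8.0f req/s" % (label, reconnects_per_second(CONNECTIONS, timeout_s)))
```

A 10x longer timeout cuts this reconnect load 10x. That is the trade the slide argues for: spend cheap memory on held connections to save expensive CPU on new requests.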

Slide 32

test, measure, repeat

Slide 33

testing
- Darktime
  - use existing network to load test
  - (user complaints when it didn’t work...)
- Darkesttime
  - load testing a single thread
- have knobs you can twiddle
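The deck doesn't show how the dark tests picked their traffic. One hypothetical knob for something like "tested ‘dark’ on ~50% of our network" is a deterministic percentage gate. It hashes a user (or page load) id into 100 buckets and enables the realtime connection when the bucket is under the dial. The names here are invented, and CRC32 is just a convenient stable hash.

```python
import zlib


def in_dark_test(user_id, percent):
    """Return True if this id falls inside the first `percent` of 100 buckets.

    The same id always maps to the same bucket, so raising `percent` only ever
    adds users; nobody flips in and out between page loads.
    """
    bucket = zlib.crc32(str(user_id).encode("utf-8")) % 100
    return bucket < percent
```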

Slide 34

stats
- measure all the things!
  - especially when the numbers don’t line up
- measuring is hard in distributed systems
- try to express things as +1 and -1 if you can
- I used scales from Greplin (“metrics for py”)
Notes: scales! gauges and aggregation
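Here is a minimal sketch of the "+1 and -1" advice. It deliberately does not use the scales API. Each process records connection opens and closes as increments and decrements, and a cluster-wide gauge is just the sum. Because each event is a signed delta, totals from different processes can simply be added. If the gauge drifts away from zero once traffic stops, that points straight at a leaked or double-counted connection. The class and function names are hypothetical.

```python
class ConnectionStats:
    """Per-process counters that only ever move by +1 / -1."""

    def __init__(self):
        self.open_connections = 0  # gauge: +1 on open, -1 on close
        self.total_opened = 0      # counter: +1 on open only

    def opened(self):
        self.open_connections += 1
        self.total_opened += 1

    def closed(self):
        self.open_connections -= 1


def cluster_open_connections(stats):
    """Aggregate: the cluster-wide gauge is the sum of every process's deltas."""
    return sum(s.open_connections for s in stats)
```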

Slide 35

lessons
- do hard work early
- defer work that you might never need
- end-to-end ACKs are good, but expensive
- timeouts are not free
- greenlets are effectively free
- pub/sub is effectively free
Notes:
- data processing and JSON formatting done once, not 1000 times
- gzipping done once, not 1000 times
- defer setting up the work in the generator until as late as possible
- ditched end-to-end ACKs from the frontend; they cost way too much

Slide 36

nginx lessons
location / {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_fo
    proxy_set_header Host $http_host;
    proxy_redirect off;
    # this line is really important
    proxy_buffering off;

    if (!-f $request_filename) {
        proxy_pass http://app_server;
        break;
    }
}
http://gunicorn.org/deploy.html
Notes:
- only one lesson, really: this line is really, really important for streaming data
- we currently compress here (in production); THIS IS NOT SCALABLE

Slide 37

slide full o’ links
- Gevent (python coroutines and greenlets) http://gevent.org/
- Gunicorn (python pre-fork WSGI server) http://gunicorn.org/
- Thoonk (redis queue) https://github.com/andyet/thoonk.py
- Sentry (log aggregation) https://github.com/dcramer/sentry
- Scales (in-app metrics) https://github.com/Greplin/scales
code.disqus.com
Notes: Tell me your thoughts! @NorthIsUp

Slide 38

special thanks
- the team at DISQUS
- especially our dev-ops guys
- and je who had to review all my code

Slide 39

open questions
- best system config for thousands of rps?
- how to make the front end faster?
  - something faster than pywsgi?
    - FapWS?
  - libevent -> libev? (i.e. gevent 1.0)
  - dump WSGI for raw sockets? (last resort)
- best internal python pub/sub option?

Slide 40

DISQUSsion?
psst, we’re hiring: disqus.com/jobs
