
Making DISQUS Realtime

Adam
July 05, 2012


tl;dr: #Python #Architecture #Scalability #DISQUS

What does it take to add realtime functionality to a truly “web scale” app? The result is the DISQUS realtime system: a highly concurrent system that lets web clients subscribe to arbitrary events in the DISQUS infrastructure.


Transcript

  1. Adam Hitchcock (@NorthIsUp): Making DISQUS Realtime

     This is supposed to be an advanced talk, but I don’t know what that means (I assume you know some things), so I hope this doesn’t bore anybody. - I hate long talks; save the rest for open space. - I talk fast; if you can’t understand me, please tell me to slow down.
  2. (about DISQUS) - a community platform - the main product is a comment widget (JavaScript) - the main backend is Django + Postgres + Redis - but we are adding Flask into the mix with our realtime system
  3. why do realtime? ๏ getting new data to the user asap ๏ increased engagement ๏ looks awesome ๏ we can sell it - We define “realtime” as less than 10 seconds, but my goal was less than one.
  4. how many of you currently have a realtime component? - Well, so do we. I’ll try to keep this short, because you are probably smarter than me, and I want to hear what you have to say (@NorthIsUp).
  5. realtime ๏ polls memcache ๏ is kinda #failscale - we set a per-thread key, and this key gets polled every five seconds - the problem is that this is #failscale
  6. DISQUS sees a lot of traffic (Google Analytics: May 29 2012 - June 28 2012) - the problem is that at max capacity the old system supported fewer than 100 thousand concurrent users - so I was charged with fixing this
  7. realertime ๏ currently active on all DISQUS 2012 sites ๏ tested ‘dark’ on ~50% of our network ๏ 1.5 million concurrently connected users ๏ 45 thousand new connections per second ๏ 165 thousand messages/second ๏ ~0.2 seconds latency end to end - I’ll revisit what ‘dark’ means later, on the testing slides - thread popularity follows a heavy-tail distribution - end-to-end latency does NOT include the DISQUS app - (demo) so how did we build this?
  8. technology ๏ gevent ๏ gunicorn ๏ flask ๏ thoonk (a queue built on redis) ๏ redis ๏ nginx ๏ haproxy - pubsub == redis - queues == thoonk (each message is delivered to only one subscriber)
  9. architecture overview [diagram] ๏ “Frontend”: Gunicorn and Flask ๏ “Backend”: gevent server ๏ new posts flow from django through a redis (thoonk) queue, then out over redis pub/sub ๏ nginx + haproxy in front - hardware - HA - three flows: new info -> pubsub, new subscriptions, pubsub -> subscriptions
  10. architecture overview [diagram] ๏ backend: DISQUS -> Formatter -> Multiplexer -> Publisher ๏ frontend: Listener -> Sub Pool -> Requests ๏ incoming HTTP requests from the interwebs ๏ redis pub/sub on both sides, thoonk queue in between - these pieces are described individually on the next slides
  11. the backend ๏ listens to a Thoonk queue ๏ cleans & formats the message ๏ this is the final format before HTTP publish ๏ compress the data now ๏ publish the message to pubsub ๏ forum:id, thread:id, user:id, post:id (Formatter -> Multiplexer -> Publisher) - pipeline semantics (greenlets + loops) - end-to-end ack via thoonk (a message is not removed until fully published) - not e2e for public consumption, just paid - if a message is over 15 seconds old... - how is this part of the system HA? (a sketch of the publish step follows below)
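
     A minimal sketch of the format-once/compress-once/multiplex idea from this slide, with hypothetical post fields and helper names (the real Formatter, Multiplexer, and Publisher are separate gevent pipeline stages):

      import json
      import zlib

      import redis

      r = redis.StrictRedis()

      def handle_post(post):
          # clean & format once: this is the final wire format
          payload = json.dumps({'id': post['id'], 'body': post['body']})
          # compress now, once, instead of per-subscriber at HTTP time
          data = zlib.compress(payload.encode('utf-8'))
          # multiplex: one publish per channel the post belongs to
          for channel in ('forum:%s' % post['forum_id'],
                          'thread:%s' % post['thread_id'],
                          'user:%s' % post['user_id'],
                          'post:%s' % post['id']):
              r.publish(channel, data)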
  12. the backend ๏ average processing time is ~0.2 seconds ๏ queue maintenance ๏ ACK timeouts (5 seconds-ish) - HA before maintenance - zookeeper
  13. random redis lessons ๏ separate pub/sub and non-pub/sub redis usage by physical node ๏ transactions can be prickly - transactions can trip you up; atomic is good, but they are way more expensive (see the sketch below)
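
     For example, the standard redis-py WATCH/MULTI/EXEC retry loop is atomic but costs several round trips per attempt; a sketch, assuming a hypothetical counter key:

      import redis

      r = redis.StrictRedis()

      with r.pipeline() as pipe:
          while True:
              try:
                  # WATCH the key; EXEC fails if another client writes it first
                  pipe.watch('thread:12345:post_count')
                  current = int(pipe.get('thread:12345:post_count') or 0)
                  pipe.multi()
                  pipe.set('thread:12345:post_count', current + 1)
                  pipe.execute()
                  break
              except redis.WatchError:
                  # the key changed under us; pay for another round trip
                  continue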
  14. the backend

      from time import time

      # redis key for the 'claimed' zset (scored by claim time in ms)
      claimed = thoonk_worker.feed_claimed

      # cutoff for jobs to re-queue
      too_late = int((time() - MAX_AGE) * 1000)

      # get and cancel jobs claimed before the cutoff
      # (zrangebyscore, since the cutoff is a score, not a rank;
      # the original slide used zrange here)
      job_ids = redis.zrangebyscore(claimed, 0, too_late)
      for job_id in job_ids:
          thoonk_worker.cancel(job_id)
  15. gevent is nice

      # the code is too big to show here, so just import it
      # http://bitly.com/geventspawn
      from realertime.lib.spawn import Watchdog
      from realertime.lib.spawn import TimeSensitiveBackoff

      - /lots/ of greenlets, so you need a way to manage them? Nay! They manage themselves. - sleep(0) (see the sketch below)
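
     The imports above are DISQUS-internal; a minimal sketch of the self-managing idea under that assumption (hypothetical names, not the real Watchdog):

      import gevent

      def pubsub_loop():
          # stand-in for a loop that pulls from redis pubsub
          gevent.sleep(0.1)

      def watchdog(fn):
          # a greenlet that manages itself: restart on crash, and
          # call gevent.sleep(0) to yield cooperatively to the others
          while True:
              try:
                  fn()
              except Exception:
                  gevent.sleep(1)  # crude backoff before restarting
              gevent.sleep(0)

      worker = gevent.spawn(watchdog, pubsub_loop)
      worker.join(timeout=1)  # in the real system these run forever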
  16. the frontend ๏ needs to be fast! ๏ pools redis connections ๏ routes messages from pubsub to http - how is this part of the system HA?
  17. the frontend ๏ new request! ๏ create/register a subscription with the pool ๏ the sub pool returns a (python) queue based on the channel (Listener -> Sub Pool -> Requests)
  18. the frontend ๏ the Listener receives a message on a pubsub channel ๏ if that channel has a subscriber, pass it on ๏ the subscriber then passes the message on to all appropriate requests (Listener -> Sub Pool -> Requests; a sketch of the pool follows below)
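
     A minimal sketch of the Listener -> Sub Pool -> Requests flow from slides 17-18; the class and method names are assumptions, not the real code:

      from collections import defaultdict
      from gevent.queue import Queue

      class SubPool(object):
          def __init__(self):
              # channel name -> set of per-request gevent queues
              self.channels = defaultdict(set)

          def subscribe(self, channel):
              # called per incoming HTTP request; the returned queue is
              # what the WSGI generator (slide 20) blocks on
              q = Queue()
              self.channels[channel].add(q)
              return q

          def unsubscribe(self, channel, q):
              self.channels[channel].discard(q)

          def dispatch(self, channel, data):
              # called by the Listener for each pubsub message
              for q in self.channels[channel]:
                  q.put(data)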
  19. long pollingish ๏ long-held http connection ๏ stream JSON over this http connection - long-held, because why close it? you just put all that work into opening it! - why not websockets? the standard was still in flux when we started; http is maximum compatibility; we are going to add support in v2; our problem does not require a symmetric communication pipe
  20. long pollingish

      from gevent import Timeout

      def __subscription_generator(self, q):
          # returns a generator for the WSGI response
          to = Timeout(self.timeout_duration)
          to.start()
          try:
              while True:
                  queue_data = q.get()
                  # one JSON document per line
                  yield queue_data['data'] + '\n'
          except Timeout as t:
              if t is not to:
                  raise
          finally:
              to.cancel()  # don't leave a pending timeout behind
              self.unsubscribe(q)

      - WSGI can take a generator and will yield that data to the client as it gets it (see the toy app below) - Content-Type needs to be ‘application/json’?
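
     A self-contained sketch of that WSGI behavior: gevent’s pywsgi flushes each chunk as the generator yields it, which is what makes the long poll stream (a toy app, not the DISQUS handler):

      import gevent
      from gevent.pywsgi import WSGIServer

      def app(environ, start_response):
          start_response('200 OK', [('Content-Type', 'application/json')])

          def body():
              for i in range(3):
                  # each yield is flushed to the client as it happens
                  yield ('{"n": %d}\n' % i).encode('utf-8')
                  gevent.sleep(1)

          return body()

      if __name__ == '__main__':
          WSGIServer(('', 8080), app).serve_forever()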
  21. pooling redis pub/sub

      # old way was pretty failscale
      def subscribe(redis, channel):
          pubsub = redis.pubsub()
          pubsub.subscribe(channel)
          with Timeout(30):
              # iterate the messages; the slide's `yield pubsub.listen()`
              # would yield the generator itself rather than its messages
              for message in pubsub.listen():
                  yield message

      - works surprisingly well - but results in a 1:1 ratio of redis connections to http connections - anybody remember that number? it’s currently 1.5 million and growing
  22. pooling redis pub/sub

      pipe = Queue()
      # queue (action, channel) tuples; the slide passed two arguments,
      # but Queue.put takes a single item
      pipe.put(('subscribe', 'thread:12345'))
      pipe.put(('unsubscribe', 'forum:cnn'))

      # ... elsewhere ...

      # new way is
      def listener(pubsub, pipe):
          for data in pubsub.listen():
              # handle data here...
              # handle new subscriptions
              if not pipe.empty():
                  action, channel = pipe.get_nowait()
                  getattr(pubsub, action)(channel)

      - this is spawned in a thread - a heartbeat keeps the listen() loop ticking so queued subscriptions get applied - the goal is to minimize redis connections - but pubsub isn’t thread safe, and they mean it - fan-out to request queues: [q.put(data) for q in channel_proxy[data.channel]]
  23. timeouts? ๏ needless reclaiming of ‘resources’ ๏ maximize usage of cheap things ๏ connection count ๏ minimize expensive things ๏ requests per second - cheap == memory - expensive == cpu - a timeout usually means something took too long - a timeout forces a new request, so let the browser decide when that should happen: ~30 sec on mobile, ~300 sec on desktop
  24. testing ๏ Darktime ๏ use the existing network to loadtest ๏ (user complaints when it didn’t work...) ๏ Darkesttime ๏ load testing a single thread ๏ have knobs you can twiddle
  25. stats ๏ measure all the things! ๏ especially when the numbers don’t line up ๏ measuring is hard in distributed systems ๏ try to express things as +1 and -1 if you can ๏ i used scales from greplin (“metrics for py”) - scales! gauges and aggregation (see the sketch below)
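
     A sketch of the +1/-1 idea using greplin/scales; the collection/IntStat API is from the scales README, but these stat names are made up:

      from greplin import scales

      STATS = scales.collection('/realtime',
                                scales.IntStat('connections'),
                                scales.IntStat('messages'))

      def on_connect():
          STATS.connections += 1   # +1 on open

      def on_disconnect():
          STATS.connections -= 1   # -1 on close; a gauge that never
                                   # returns to zero points at a leak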
  26. lessons ๏ do hard work early ๏ defer work that you might never need ๏ end-to-end acks are good, but expensive ๏ timeouts are not free ๏ greenlets are effectively free ๏ pubsub is effectively free - data processing and json formatting are done once, not 1000x - gzipping is done once, not 1000x - defer setting up the work in the generator until as late as possible - we ditched e2e acks from the frontend; they cost way too much
  27. nginx lessons

      location / {
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          proxy_set_header Host $http_host;
          proxy_redirect off;
          # this line is really important
          proxy_buffering off;
          if (!-f $request_filename) {
              proxy_pass http://app_server;
              break;
          }
      }

      http://gunicorn.org/deploy.html

      - only one lesson, really: proxy_buffering off is really, really important for streaming data - we currently compress here in nginx (in production); THIS IS NOT SCALABLE
  28. slide full o’ links ๏ Gevent (python coroutines and greenlets) http://gevent.org/ ๏ Gunicorn (python pre-fork WSGI server) http://gunicorn.org/ ๏ Thoonk (redis queue) https://github.com/andyet/thoonk.py ๏ Sentry (log aggregation) https://github.com/dcramer/sentry ๏ Scales (in-app metrics) https://github.com/Greplin/scales ๏ code.disqus.com - Tell me your thoughts! @NorthIsUp
  29. special thanks ๏ the team at DISQUS ๏ especially our dev-ops guys ๏ and Jeff, who had to review all my code - Tell me your thoughts! @NorthIsUp
  30. open questions ๏ best system config for thousands of rps? ๏ how to make the frontend faster? ๏ something faster than pywsgi? ๏ FapWS? ๏ libevent -> libev? (i.e. gevent 1.0) ๏ dump wsgi for raw sockets? (last resort) ๏ best internal python pub/sub option? - Tell me your thoughts! @NorthIsUp