Scaling Realtime at DISQUS by Adam Hitchcock

PyCon 2013

March 17, 2013

Transcript

  1. why do realtime?
    ๏ getting new data to the user asap
    ๏ for increased engagement
    ๏ and it looks awesome
    ๏ and we can sell (or trade) it
    Sunday, 17 March, 13
  2. realertime
    ๏ currently active on all DISQUS sites
    ๏ tested ‘dark’ on our existing network
    ๏ during testing:
      ๏ 1.5 million concurrently connected users
      ๏ 45 thousand new connections per second
      ๏ 165 thousand messages/second
      ๏ <.2 seconds latency end to end
  3. thoonk redis queue
    some python glue
    nginx push stream
    and long(er) polling
  4. old-june: memcache
    [architecture diagram; labels: New Posts, DISQUS, memcache, DISQUS embed clients]
    poll memcache every 5 seconds
  5. june-july: redis pub/sub
    [architecture diagram; labels: New Posts, DISQUS, redis pub/sub, HA Proxy, Flask FE cluster, DISQUS embed clients]
  6. july-october
    [architecture diagram; labels: New Posts, DISQUS, redis queue, “python glue” Gevent server, redis pub/sub, Flask FE cluster, HA Proxy, DISQUS embed clients]
  7. august-october
    [architecture diagram; labels: New Posts, DISQUS, redis queue, “python glue” Gevent server, redis pub/sub, Flask FE cluster, HA Proxy, DISQUS embed clients]
    [server-count annotations on the diagram: 2, 14 BIG, 6 servers, 5 servers]
  8. august-october
    [architecture diagram; labels: New Posts, DISQUS, redis queue, “python glue” Gevent server, redis pub/sub, Flask FE cluster, HA Proxy, DISQUS embed clients]
    [server-count annotations on the diagram: 2, 6 servers, 5 servers, 2 for 14 BIG]
    lots of servers, we can do better
  9. october-now
    [architecture diagram; labels: New Posts, DISQUS, redis queue, “python glue” Gevent server, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients]
  10. october-now
    [architecture diagram; labels: New Posts, DISQUS, redis queue, “python glue” Gevent server, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients]
    [server-count annotations on the diagram: 2, 5]
    Why still 5 for this? Network memory restriction; we can’t fix this without kernel hacking, tweaking, etc. (if you know how, tell us, then apply for a job, then fix it for us)
  11. october-now
    [architecture diagram; labels: New Posts, django, thoonk queue, Formatter, Publishers, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients, other realtime stuff]
  12. thoonk redis queue
    some python glue
    nginx push stream
    and long(er) polling
  13. the thoonk queue
    ๏ django post_save and post_delete hooks
    ๏ thoonk is a queue on top of redis
    ๏ implemented as a DFA
    ๏ provides job semantics
    ๏ useful for end to end acking
    ๏ reliable job processing in a distributed system
    ๏ did I mention it’s on top of redis?
    ๏ uses zset to store items == ranged queries
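The job semantics on this slide (put, claim, ack, with waiting items kept in a sorted set so they can be queried by range) can be sketched in plain Python. This is not Thoonk's API: the class and method names below are hypothetical, and a sorted list stands in for the Redis zset.

```python
import bisect
import itertools


class SketchJobQueue:
    """In-memory stand-in for a job queue with ack semantics (hypothetical API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._available = []   # sorted list of (score, job_id), playing the zset
        self._claimed = {}     # job_id -> score for jobs handed out but not acked
        self._payloads = {}    # job_id -> payload

    def put(self, payload, score):
        job_id = next(self._ids)
        self._payloads[job_id] = payload
        bisect.insort(self._available, (score, job_id))
        return job_id

    def claim(self):
        """Hand out the lowest-scored job; it stays tracked until acked."""
        if not self._available:
            return None
        score, job_id = self._available.pop(0)
        self._claimed[job_id] = score
        return job_id, self._payloads[job_id]

    def ack(self, job_id):
        """Worker finished: forget the job entirely."""
        del self._claimed[job_id]
        del self._payloads[job_id]

    def retry(self, job_id):
        """Worker failed or timed out: make the job available again."""
        score = self._claimed.pop(job_id)
        bisect.insort(self._available, (score, job_id))

    def range_by_score(self, low, high):
        """Ranged query over waiting jobs, the kind of lookup a zset allows."""
        start = bisect.bisect_left(self._available, (low, 0))
        end = bisect.bisect_right(self._available, (high, float("inf")))
        return [job_id for _, job_id in self._available[start:end]]
```

A claimed job is never lost: it is either acked or put back with `retry`, which is the property that makes end-to-end acking possible.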
  14. thoonk redis queue
    some python glue
    nginx push stream
    and long(er) polling
  15. the python glue
    ๏ listens to a thoonk queue
    ๏ cleans & formats message
    ๏ this is the final format for end clients
    ๏ compress data now
    ๏ publish message to nginx and other firehoses
    ๏ forum:id, thread:id, user:id, post:id
    [diagram labels: Formatter, Publishers]
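A minimal sketch of the glue step described on this slide: format the message once, compress it, and publish the same bytes to each `kind:id` channel. The field names in `post`, the use of zlib for "compress", and the `publish` callback are assumptions made for illustration, not the DISQUS implementation.

```python
import json
import zlib


def channels_for(post):
    """Build 'kind:id' channel names for one post (field names are assumed)."""
    return [
        "forum:%s" % post["forum"],
        "thread:%s" % post["thread"],
        "user:%s" % post["user"],
        "post:%s" % post["id"],
    ]


def format_message(post):
    """Do the expensive work once: serialize and compress the client payload."""
    body = json.dumps(post, sort_keys=True).encode("utf-8")
    return zlib.compress(body)


def fan_out(post, publish):
    """Format once, then hand the same bytes to every channel."""
    payload = format_message(post)
    for channel in channels_for(post):
        publish(channel, payload)
    return payload
```

Because the payload is already in its final client format, nothing downstream of `publish` has to touch it again.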
  16. gevent is nice
    # the code is too big to show here, so just import it
    # http://bitly.com/geventspawn
    from realertime.lib.spawn import Watchdog
    from realertime.lib.spawn import TimeSensitiveBackoff
  17. data pipelines
    class Pipeline(object):
        # Each stage raises until a mixin overrides it.
        def parse_data(self, data):
            raise NotImplementedError('No ParserMixin used')

        def compute_data(self, data, parsed_data):
            raise NotImplementedError('No ComputeMixin used')

        def publish_data(self, data, parsed_data, computed_data):
            raise NotImplementedError('No PublisherMixin used')

        def handle(self, data):
            parsed_data = self.parse_data(data)
            computed_data = self.compute_data(data, parsed_data)
            return self.publish_data(data, parsed_data, computed_data)
  18. Example Mixins
    import json
    import urllib2

    class JSONParserMixin(Pipeline):
        def parse_data(self, data):
            return json.loads(data)

    # Compute mixins override compute_data, the second pipeline stage.
    class AnnomizeDataMixin(Pipeline):
        def compute_data(self, data, parsed_data):
            return {}

    class SuperSecureEncryptDataMixin(Pipeline):
        def compute_data(self, data, parsed_data):
            return parsed_data.encode('rot13')

    # Publisher mixins override publish_data, the final pipeline stage.
    class HTTPPublisherMixin(Pipeline):
        def publish_data(self, data, parsed_data, computed_data):
            u = urllib2.urlopen(self.dat_url, computed_data)
            return u

    class FilePublisherMixin(Pipeline):
        def publish_data(self, data, parsed_data, computed_data):
            with open(self.output, 'a') as f:
                f.write(computed_data)
  19. Finished Pipeline
    class JSONAnnonHTTPPipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            HTTPPublisherMixin):
        pass

    class JSONSecureHTTPPipeline(
            JSONParserMixin,
            SuperSecureEncryptDataMixin,
            HTTPPublisherMixin):
        pass

    class JSONAnnonFilePipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            FilePublisherMixin):
        pass
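The composition pattern from the last three slides can be tried end to end with a self-contained Python 3 sketch. `Pipeline` and `JSONParserMixin` follow the slides; `UppercaseKeysMixin`, `ListPublisherMixin` and `JSONUppercaseListPipeline` are hypothetical stand-ins chosen so the example runs without a network or filesystem.

```python
import json


class Pipeline:
    def parse_data(self, data):
        raise NotImplementedError("No ParserMixin used")

    def compute_data(self, data, parsed_data):
        raise NotImplementedError("No ComputeMixin used")

    def publish_data(self, data, parsed_data, computed_data):
        raise NotImplementedError("No PublisherMixin used")

    def handle(self, data):
        parsed_data = self.parse_data(data)
        computed_data = self.compute_data(data, parsed_data)
        return self.publish_data(data, parsed_data, computed_data)


class JSONParserMixin(Pipeline):
    def parse_data(self, data):
        return json.loads(data)


class UppercaseKeysMixin(Pipeline):  # hypothetical compute step
    def compute_data(self, data, parsed_data):
        return {key.upper(): value for key, value in parsed_data.items()}


class ListPublisherMixin(Pipeline):  # hypothetical publisher: collects in memory
    def __init__(self):
        self.published = []

    def publish_data(self, data, parsed_data, computed_data):
        self.published.append(computed_data)
        return computed_data


class JSONUppercaseListPipeline(
        JSONParserMixin, UppercaseKeysMixin, ListPublisherMixin):
    pass


pipeline = JSONUppercaseListPipeline()
result = pipeline.handle('{"forum": "cnn"}')
```

Each mixin overrides exactly one stage, so Python's method resolution order picks the parser, compute step and publisher from the three bases, and a stage that no mixin supplies still fails loudly with `NotImplementedError`.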
  20. real live DISQUS code
    class FEOrbitalNginxMultiplexer(
            SchemaTransformerMixin,
            JSONFormatterMixin,
            SelfChannelsMixin,
            HTTPPublisherMixin):
        def __init__(self, domains, api_version=1):
            schema_namespace = 'orbital'
            self.channels = ('orbital', )
            super(FEOrbitalNginxMultiplexer, self).__init__(domains=domain

    class FEPublicAckingMultiplexer(
            PublicTransformerMixin,
            JSONFormatterMixin,
            FEChannelsMixin,
            ThoonkQueuePubSubPublisherMixin):
        def __init__(self, domains, api_version):
            schema_namespace = 'general'
            super(FEPublicAckingMultiplexer, self).__init__(domains=domain
  21. thoonk redis queue
    some python glue
    nginx push stream
    and long(er) polling
  22. nginx push stream
    ๏ follow John Watson (@wizputer) for updated #humblebrags as we ramp up traffic
    ๏ an example config can be found here: http://bit.ly/disqus-nginx-push-stream
    http://wiki.nginx.org/HttpPushStreamModule
  23. nginx push stream
    ๏ Replaced webservers and Redis Pub/Sub
    ๏ But starting with Pub/Sub was important for us
    ๏ Encouraged us to over publish on keys
  24. nginx push stream
    ๏ Turned on for 70% of our network...
    ๏ ~950K subscribers (peak single machine)
    ๏ peak 40 MBytes/second (per machine)
    ๏ CPU usage is still well under 15%
    ๏ 99.845% active writes (the socket is written to often enough to come up as ACTIVE)
  25. config push stream
    location = /pub {
        allow 127.0.0.1;
        deny all;
        push_stream_publisher admin;
        set $push_stream_channel_id $arg_channel;
    }

    location ^~ /sub/ {
        # to maintain api compatibility we need this
        location ~ /sub/(.*)/(.*)$ {
            # Url encoding things? $1%3A2$2
            set $push_stream_channels_path $1:$2;
            push_stream_subscriber streaming;
            push_stream_content_type application/json;
        }
    }
  26. examples
    # Subs
    curl -s 'localhost/sub/forum/cnn'
    curl -s 'localhost/sub/thread/907824578'
    curl -s 'localhost/sub/user/northisup'

    # Pubs
    curl -s -X POST 'localhost/pub?channel=forum:cnn' \
        -d '{"some sort": "of json data"}'
    curl -s -X POST 'localhost/pub?channel=thread:907824578' \
        -d '{"more": "json data"}'
    curl -s -X POST 'localhost/pub?channel=user:northisup' \
        -d '{"the idea": "I think you get it by now"}'
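The relationship between a channel id such as `forum:cnn`, its subscribe path, and its publish URL follows from the config and curl examples. A small Python sketch of that mapping, with hypothetical helper names and the same regular expression as the nested `location` block:

```python
import re

# Same pattern as the nested nginx location: /sub/<kind>/<id>
SUB_PATH = re.compile(r"/sub/(.*)/(.*)$")


def sub_path(channel):
    """'forum:cnn' -> '/sub/forum/cnn' (what embed clients request)."""
    kind, _, ident = channel.partition(":")
    return "/sub/%s/%s" % (kind, ident)


def pub_path(channel):
    """'forum:cnn' -> '/pub?channel=forum:cnn' (what the glue POSTs to)."""
    # The colon is left unescaped, matching the curl examples.
    return "/pub?channel=%s" % channel


def channel_from_sub_path(path):
    """Mirror of 'set $push_stream_channels_path $1:$2' in the config."""
    match = SUB_PATH.search(path)
    if match is None:
        raise ValueError("not a subscribe path: %r" % path)
    return "%s:%s" % match.groups()
```

Keeping the public subscribe URLs in the `/sub/<kind>/<id>` shape is what lets the channel naming change behind nginx without touching clients.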
  27. measure nginx
    location = /push-stream-status {
        allow 127.0.0.1;
        deny all;
        push_stream_channels_statistics;
        set $push_stream_channel_id $arg_channel;
    }
  28. thoonk redis queue
    some python glue
    nginx push stream
    and long(er) polling
  29. long(er) polling
    onProgress: function () {
        var self = this;
        var resp = self.xhr.responseText;
        var advance = 0;
        var rows;

        // If server didn't push anything new, do nothing.
        if (!resp || self.len === resp.length) return;

        // Server returns JSON objects, one per line.
        rows = resp.slice(self.len).split('\n');
        _.each(rows, function (obj) {
            advance += (obj.length + 1);
            obj = JSON.parse(obj);
            self.trigger('progress', obj);
        });
        self.len += advance;
    }
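The same idea, reading a response body that keeps growing and emitting each newline-delimited JSON object exactly once, can be sketched in Python. This is an illustration rather than a port of the client code: the class name is hypothetical, and it adds one assumption the slide does not show, namely that a chunk may end in the middle of a line.

```python
import json


class LineStreamParser:
    """Consume a growing response body and emit each complete JSON line once."""

    def __init__(self):
        self.consumed = 0  # number of characters already parsed

    def feed(self, response_text):
        """Return objects from lines completed since the last call."""
        new_text = response_text[self.consumed:]
        last_newline = new_text.rfind("\n")
        if last_newline == -1:
            return []  # nothing new, or only a partial line so far
        complete = new_text[:last_newline]
        # Advance past the final newline; any partial tail is re-read next time.
        self.consumed += last_newline + 1
        return [json.loads(line) for line in complete.split("\n") if line]
```

Only complete lines advance the offset, so a half-received object is parsed on a later call instead of raising a JSON error.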
  30. Soon... EventSource
    // Currently EventSource has CORS issues
    var ev = new EventSource(dat_url);
    ev.addEventListener("Post", handlePostEvent);
  31. test
    ๏ Darktime
    ๏ use existing network to load test
    ๏ (user complaints when it didn’t work...)
    ๏ Darkesttime
    ๏ load testing a single thread
    ๏ have knobs you can twiddle
  32. measure
    ๏ measure all the things!
    ๏ especially when the numbers don’t line up
    ๏ measuring is hard in distributed systems
    ๏ try to express things as +1 and -1 if you can
    ๏ Sentry for measuring exceptions
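One reading of "express things as +1 and -1" is that each process records only increments and decrements, so totals from many machines can be summed without coordination. A minimal Python sketch under that assumption; `DeltaStats` and its methods are hypothetical names, not part of Scales or any DISQUS code.

```python
from collections import Counter


class DeltaStats:
    """Per-process metrics recorded purely as signed deltas."""

    def __init__(self):
        self.totals = Counter()

    def incr(self, name, delta=1):
        """Record a +1 (connect, publish) or a -1 (disconnect)."""
        self.totals[name] += delta

    def merge(self, other):
        """Fold another process's deltas into this one by simple addition."""
        self.totals.update(other.totals)


machine_a = DeltaStats()
machine_b = DeltaStats()
machine_a.incr("connections")       # client connects
machine_a.incr("connections")       # another client connects
machine_a.incr("connections", -1)   # one disconnects
machine_b.incr("connections")
machine_b.incr("messages", 5)
machine_a.merge(machine_b)
```

Because addition is commutative, the order in which machines report does not change the merged totals, which makes mismatched numbers easier to track down.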
  33. wha?
    ๏ People do weird stuff with your stuff
    ๏ turned off this server in Oct 2012
    ๏ Still getting 100 req/sec
  34. lessons
    ๏ do hard (computation) work early
    ๏ end-to-end acks are good, but expensive
    ๏ redis/nginx pubsub is effectively free
  35. special thanks
    ๏ the team at DISQUS
    ๏ like jeff a.k.a. @nfluxx who had to review all my code
    ๏ and especially our dev-ops guys
    ๏ like john watson a.k.a. @wizputer who found the nginx-push-stream module
    psst, we’re hiring: disqus.com/jobs
  36. slide full o’ links
    ๏ Nginx push stream module http://wiki.nginx.org/HttpPushStreamModule
    ๏ Thoonk (redis queue) http://github.com/andyet/thoonk.py
    ๏ Sentry (distributed traceback aggregation) http://github.com/dcramer/sentry
    ๏ Gevent (python coroutines and greenlets) http://gevent.org/
    ๏ Scales (in-app metrics) http://github.com/Greplin/scales
    code.disqus.com
  37. Come find me here!
    PyCon 2013, Santa Clara Convention Center, Hall A-B, Santa Clara, CA
    [expo hall floor plan]
  38. Questions I have
    ๏ What is the best kernel config for webscale concurrency? Nginx?
    ๏ I <3 gevent, but what if I want to pypy?
    ๏ Nginx + lua? Seems kind of awesome.
    ๏ Composing data pipelines: good or bad?
    ๏ I didn’t have time to mention:
      ๏ Kafka, what is it good for?
      ๏ Seriously, why not RabbitMQ?