Slide 1

Slide 1 text

Adam Hitchcock @NorthIsUp Scaling Realtime at DISQUS Sunday, 17 March, 13

Slide 2

Slide 2 text


Slide 3

Slide 3 text

Adam Hitchcock @NorthIsUp Scaling Realtime at DISQUS

Slide 4

Slide 4 text

If this is interesting to you...
we’re hiring
disqus.com/jobs

Slide 5

Slide 5 text

what is DISQUS?

Slide 6

Slide 6 text


Slide 7

Slide 7 text

why do realtime?
๏ getting new data to the user asap
๏ for increased engagement
๏ and it looks awesome
๏ and we can sell (or trade) it

Slide 8

Slide 8 text

http://github.com/NorthIsUp/orbital2
http://map.labs.disqus.com

Slide 9

Slide 9 text

DISQUS sees a lot of traffic
Google Analytics: March 2012 - Feb 2013

Slide 10

Slide 10 text

realertime
๏ currently active on all DISQUS sites
๏ tested ‘dark’ on our existing network
๏ during testing:
  ๏ 1.5 million concurrently connected users
  ๏ 45 thousand new connections per second
  ๏ 165 thousand messages/second
  ๏ < 0.2 seconds latency end to end

Slide 11

Slide 11 text

so, how did we do it?

Slide 12

Slide 12 text

Node.js and MongoDB!

Slide 13

Slide 13 text

Node.js and MongoDB!

Slide 14

Slide 14 text

This is PyCon. We used Python.

Slide 15

Slide 15 text

and some other Technology You Know™

Slide 16

Slide 16 text

thoonk redis queue
some python glue
nginx push stream
and long(er) polling

Slide 17

Slide 17 text

architecture overview

Slide 18

Slide 18 text

old-june: memcache
Diagram labels: DISQUS, New Posts, memcache, DISQUS embed clients
Clients poll memcache every 5 seconds.

Slide 19

Slide 19 text

june-july: redis pub/sub
Diagram labels: DISQUS, New Posts, redis pub/sub, Flask FE cluster, HA Proxy, DISQUS embed clients

Slide 20

Slide 20 text

july-october
Diagram labels: DISQUS, New Posts, redis queue, “python glue” Gevent server (x2), redis pub/sub (x2), Flask FE cluster, HA Proxy, DISQUS embed clients

Slide 21

Slide 21 text

august-october
Diagram labels: DISQUS, New Posts, redis queue, “python glue” Gevent server (x2), redis pub/sub (x2), Flask FE cluster, HA Proxy, DISQUS embed clients
Server counts on the diagram: 2, 14 BIG, 6 servers, 5 servers

Slide 22

Slide 22 text

august-october
Diagram labels: DISQUS, New Posts, redis queue, “python glue” Gevent server (x2), redis pub/sub (x2), Flask FE cluster, HA Proxy, DISQUS embed clients
Server counts on the diagram: 2, 6 servers, 5 servers, 2 for 14 BIG
lots of servers, we can do better

Slide 23

Slide 23 text

october-now
Diagram labels: DISQUS, New Posts, redis queue, “python glue” Gevent server, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients

Slide 24

Slide 24 text

october-now
Diagram labels: DISQUS, New Posts, redis queue, “python glue” Gevent server, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients
Server counts on the diagram: 2, 5
Why still 5 for this? Network memory restriction. We can’t fix this without kernel hacking, tweaking, etc. (If you know how, tell us, then apply for a job, then fix it for us.)

Slide 25

Slide 25 text

october-now
Diagram labels: django, New Posts, thoonk queue, Formatter, Publishers, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients, other realtime stuff

Slide 26

Slide 26 text

thoonk redis queue
some python glue
nginx push stream
and long(er) polling

Slide 27

Slide 27 text

the thoonk queue
๏ django post_save and post_delete hooks
๏ thoonk is a queue on top of redis
๏ implemented as a DFA
๏ provides job semantics
๏ useful for end to end acking
๏ reliable job processing in distributed system
๏ did I mention it’s on top of redis?
๏ uses zset to store items == ranged queries
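The job semantics listed above can be illustrated with a small in-memory stand-in. This is a sketch only, not thoonk's API: the class and method names are hypothetical, and a sorted Python list plays the role of the redis zset. A job is claimed, then either acked (finished) or returned to the queue, and the sorted order makes ranged queries by timestamp cheap.

```python
import bisect
import itertools


class TinyJobQueue(object):
    """In-memory stand-in for a job queue with acks (hypothetical, not thoonk's API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._available = []   # sorted list of (timestamp, job_id), like a zset
        self._claimed = {}     # job_id -> timestamp, waiting for an ack
        self._payloads = {}    # job_id -> payload

    def put(self, payload, timestamp):
        job_id = next(self._ids)
        self._payloads[job_id] = payload
        bisect.insort(self._available, (timestamp, job_id))
        return job_id

    def get(self):
        """Claim the oldest available job; it stays claimed until finish() or retry()."""
        timestamp, job_id = self._available.pop(0)
        self._claimed[job_id] = timestamp
        return job_id, self._payloads[job_id]

    def finish(self, job_id):
        """Ack: the job was fully processed and can be forgotten."""
        del self._claimed[job_id]
        del self._payloads[job_id]

    def retry(self, job_id):
        """Put a claimed job back in the queue at its original position."""
        timestamp = self._claimed.pop(job_id)
        bisect.insort(self._available, (timestamp, job_id))

    def in_range(self, start, end):
        """Ranged query by timestamp over the jobs that are still available."""
        lo = bisect.bisect_left(self._available, (start, 0))
        hi = bisect.bisect_right(self._available, (end, float('inf')))
        return [job_id for _, job_id in self._available[lo:hi]]
```

A worker that crashes before calling finish() leaves the job in the claimed set, which is what makes end to end acking possible: something else can notice the stale claim and call retry().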

Slide 28

Slide 28 text

thoonk redis queue
some python glue
nginx push stream
and long(er) polling

Slide 29

Slide 29 text

the python glue
๏ listens to a thoonk queue
๏ cleans & formats message
๏ this is the final format for end clients
๏ compress data now
๏ publish message to nginx and other firehoses
๏ forum:id, thread:id, user:id, post:id
Diagram labels: Formatter, Publishers
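A minimal sketch of the glue step described above: format once, compress once, then publish the same bytes to each channel. The field names, the choice of zlib, and the `publish` callback are assumptions for illustration; the slides only say that the message is cleaned, formatted, compressed, and published on forum, thread, user, and post channels.

```python
import json
import zlib


def channels_for(post):
    """Channel names in the forum:id, thread:id, user:id, post:id style."""
    return [
        'forum:%s' % post['forum'],
        'thread:%s' % post['thread'],
        'user:%s' % post['user'],
        'post:%s' % post['id'],
    ]


def format_message(post):
    """Keep only the fields clients need, serialize once, and compress."""
    cleaned = {
        'id': post['id'],
        'forum': post['forum'],
        'thread': post['thread'],
        'user': post['user'],
        'message': post['message'].strip(),
    }
    body = json.dumps(cleaned, sort_keys=True).encode('utf-8')
    return zlib.compress(body)  # zlib is an assumed codec


def fan_out(post, publish):
    """Do the expensive work once, then hand the same bytes to every channel."""
    payload = format_message(post)
    for channel in channels_for(post):
        publish(channel, payload)
    return payload
```

Because the payload is already in its final client format, everything downstream of this step only has to copy bytes.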

Slide 30

Slide 30 text

gevent is nice
    # the code is too big to show here, so just import it
    # http://bitly.com/geventspawn
    from realertime.lib.spawn import Watchdog
    from realertime.lib.spawn import TimeSensitiveBackoff

Slide 31

Slide 31 text

data pipelines
    class Pipeline(object):
        # Each stage raises until a mixin overrides it.
        # (NotImplementedError is the exception; NotImplemented is not callable.)

        def parse_data(self, data):
            raise NotImplementedError('No ParserMixin used')

        def compute_data(self, data, parsed_data):
            raise NotImplementedError('No ComputeMixin used')

        def publish_data(self, data, parsed_data, computed_data):
            raise NotImplementedError('No PublisherMixin used')

        def handle(self, data):
            # Run the three stages in order: parse, compute, publish.
            parsed_data = self.parse_data(data)
            computed_data = self.compute_data(data, parsed_data)
            return self.publish_data(data, parsed_data, computed_data)

Slide 32

Slide 32 text

Example Mixins
    import json
    import urllib2  # Python 2

    class JSONParserMixin(Pipeline):
        def parse_data(self, data):
            return json.loads(data)

    class AnnomizeDataMixin(Pipeline):
        # compute stage: throw everything away
        def compute_data(self, data, parsed_data):
            return {}

    class SuperSecureEncryptDataMixin(Pipeline):
        # compute stage: rot13 the raw string (Python 2 str codec);
        # parsed_data is a dict and has no encode()
        def compute_data(self, data, parsed_data):
            return data.encode('rot13')

    class HTTPPublisherMixin(Pipeline):
        # publish stage: POST the computed data to self.dat_url
        def publish_data(self, data, parsed_data, computed_data):
            u = urllib2.urlopen(self.dat_url, computed_data)
            return u

    class FilePublisherMixin(Pipeline):
        # publish stage: append the computed data to self.output
        def publish_data(self, data, parsed_data, computed_data):
            with open(self.output, 'a') as f:
                f.write(computed_data)

Slide 33

Slide 33 text

Finished Pipeline
    class JSONAnnonHTTPPipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            HTTPPublisherMixin):
        pass

    class JSONSecureHTTPPipeline(
            JSONParserMixin,
            SuperSecureEncryptDataMixin,
            HTTPPublisherMixin):
        pass

    class JSONAnnonFilePipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            FilePublisherMixin):
        pass
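To see the composition run end to end, here is a self-contained sketch of the same pattern. The `Pipeline` base and `JSONParserMixin` follow the slides; `DropAuthorMixin` and `ListPublisherMixin` are hypothetical stand-ins chosen so the example needs no network or files.

```python
import json


class Pipeline(object):
    def parse_data(self, data):
        raise NotImplementedError('No ParserMixin used')

    def compute_data(self, data, parsed_data):
        raise NotImplementedError('No ComputeMixin used')

    def publish_data(self, data, parsed_data, computed_data):
        raise NotImplementedError('No PublisherMixin used')

    def handle(self, data):
        parsed_data = self.parse_data(data)
        computed_data = self.compute_data(data, parsed_data)
        return self.publish_data(data, parsed_data, computed_data)


class JSONParserMixin(Pipeline):
    def parse_data(self, data):
        return json.loads(data)


class DropAuthorMixin(Pipeline):  # hypothetical compute stage
    def compute_data(self, data, parsed_data):
        return {k: v for k, v in parsed_data.items() if k != 'author'}


class ListPublisherMixin(Pipeline):  # hypothetical publisher that records output
    def __init__(self):
        self.published = []

    def publish_data(self, data, parsed_data, computed_data):
        self.published.append(computed_data)
        return computed_data


class JSONDropAuthorListPipeline(
        JSONParserMixin,
        DropAuthorMixin,
        ListPublisherMixin):
    pass


pipeline = JSONDropAuthorListPipeline()
result = pipeline.handle('{"author": "adam", "text": "hi"}')
```

Each mixin overrides exactly one stage, so Python's method resolution order picks one implementation per stage and falls through to the raising base method for any stage that was left out.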

Slide 34

Slide 34 text

real live DISQUS code
    class FEOrbitalNginxMultiplexer(
            SchemaTransformerMixin,
            JSONFormatterMixin,
            SelfChannelsMixin,
            HTTPPublisherMixin):

        def __init__(self, domains, api_version=1):
            schema_namespace = 'orbital'
            self.channels = ('orbital', )
            super(FEOrbitalNginxMultiplexer, self).__init__(domains=domain

    class FEPublicAckingMultiplexer(
            PublicTransformerMixin,
            JSONFormatterMixin,
            FEChannelsMixin,
            ThoonkQueuePubSubPublisherMixin):

        def __init__(self, domains, api_version):
            schema_namespace = 'general'
            super(FEPublicAckingMultiplexer, self).__init__(domains=domain

Slide 35

Slide 35 text

thoonk redis queue
some python glue
nginx push stream
and long(er) polling

Slide 36

Slide 36 text

nginx push stream
๏ follow John Watson (@wizputer) for updated #humblebrags as we ramp up traffic
๏ an example config can be found here: http://bit.ly/disqus-nginx-push-stream
http://wiki.nginx.org/HttpPushStreamModule

Slide 37

Slide 37 text

nginx push stream
๏ Replaced webservers and Redis Pub/Sub
๏ But starting with Pub/Sub was important for us
๏ Encouraged us to over publish on keys

Slide 38

Slide 38 text

nginx push stream
๏ Turned on for 70% of our network...
๏ ~950K subscribers (peak single machine)
๏ peak 40 MBytes/second (per machine)
๏ CPU usage is still well under 15%
๏ 99.845% active writes (the socket is written to often enough to come up as ACTIVE)
http://wiki.nginx.org/HttpPushStreamModule

Slide 39

Slide 39 text

config push stream
    location = /pub {
        allow 127.0.0.1;
        deny all;
        push_stream_publisher admin;
        set $push_stream_channel_id $arg_channel;
    }

    location ^~ /sub/ {
        # to maintain api compatibility we need this
        location ~ /sub/(.*)/(.*)$ {
            # Url encoding things? $1%3A2$2
            set $push_stream_channels_path $1:$2;
            push_stream_subscriber streaming;
            push_stream_content_type application/json;
        }
    }
http://wiki.nginx.org/HttpPushStreamModule

Slide 40

Slide 40 text

examples
    # Subs
    curl -s 'localhost/sub/forum/cnn'
    curl -s 'localhost/sub/thread/907824578'
    curl -s 'localhost/sub/user/northisup'

    # Pubs
    curl -s -X POST 'localhost/pub?channel=forum:cnn' \
        -d '{"some sort": "of json data"}'
    curl -s -X POST 'localhost/pub?channel=thread:907824578' \
        -d '{"more": "json data"}'
    curl -s -X POST 'localhost/pub?channel=user:northisup' \
        -d '{"the idea": "I think you get it by now"}'
http://wiki.nginx.org/HttpPushStreamModule
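The curl examples map one channel name such as forum:cnn onto two URL shapes: a path for subscribers and a query argument for publishers. A small Python sketch of that mapping, with hypothetical helper names, might look like this. The colon is left unescaped in the publish URL to match the curl commands above.

```python
from urllib.parse import quote, urlencode


def sub_path(channel):
    """'forum:cnn' -> '/sub/forum/cnn', the path style used by subscribers."""
    kind, _, ident = channel.partition(':')
    return '/sub/%s/%s' % (quote(kind, safe=''), quote(ident, safe=''))


def pub_path(channel):
    """'forum:cnn' -> '/pub?channel=forum:cnn', the query style used by publishers."""
    # safe=':' keeps the colon literal, as in the curl examples.
    return '/pub?' + urlencode({'channel': channel}, safe=':')
```

Keeping both helpers next to each other makes it harder for the publisher and subscriber sides to drift apart on channel naming.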

Slide 41

Slide 41 text

measure nginx
    location = /push-stream-status {
        allow 127.0.0.1;
        deny all;
        push_stream_channels_statistics;
        set $push_stream_channel_id $arg_channel;
    }
http://wiki.nginx.org/HttpPushStreamModule

Slide 42

Slide 42 text

thoonk redis queue
some python glue
nginx push stream
and long(er) polling

Slide 43

Slide 43 text

long(er) polling
    onProgress: function () {
        var self = this;
        var resp = self.xhr.responseText;
        var advance = 0;
        var rows;

        // If server didn't push anything new, do nothing.
        if (!resp || self.len === resp.length) return;

        // Server returns JSON objects, one per line.
        rows = resp.slice(self.len).split('\n');

        _.each(rows, function (obj) {
            advance += (obj.length + 1);
            obj = JSON.parse(obj);
            self.trigger('progress', obj);
        });

        self.len += advance;
    }
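The same incremental-read idea can be sketched in Python: keep an offset into a response body that only grows, and parse whatever complete lines arrived since the last call. This sketch assumes every JSON object is terminated by a newline, and it deliberately holds back a trailing partial line until the rest arrives, which is a small difference from the handler on the slide. The class name is hypothetical.

```python
import json


class LineStreamReader(object):
    """Tracks how much of a growing response body has already been consumed."""

    def __init__(self):
        self.consumed = 0

    def feed(self, response_text):
        """Return the JSON objects that completed since the last call."""
        new_text = response_text[self.consumed:]
        last_newline = new_text.rfind('\n')
        if last_newline == -1:
            return []  # nothing complete yet; wait for more data
        complete = new_text[:last_newline]
        self.consumed += last_newline + 1
        # Skip empty lines so keep-alive newlines do not break parsing.
        return [json.loads(line) for line in complete.split('\n') if line]
```

One connection can then deliver many messages, which is the "long(er)" part: the client reconnects only when the response ends, not after every message.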

Slide 44

Slide 44 text

Soon... EventSource
    // Currently EventSource has CORS issues
    var ev = new EventSource(dat_url);
    ev.addEventListener("Post", handlePostEvent);

Slide 45

Slide 45 text

test, measure, repeat

Slide 46

Slide 46 text

test
๏ Darktime
  ๏ use existing network to load test
  ๏ (user complaints when it didn’t work...)
๏ Darkesttime
  ๏ load testing a single thread
  ๏ have knobs you can twiddle

Slide 47

Slide 47 text

measure
๏ measure all the things!
๏ especially when the numbers don’t line up
๏ measuring is hard in distributed systems
๏ try to express things as +1 and -1 if you can
๏ Sentry for measuring exceptions
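One way to read the "+1 and -1" advice is that every event becomes a unit increment or decrement on a named counter, so counts from many machines can simply be summed and compared. The sketch below is hypothetical and in-process only; it does not use the API of any metrics library mentioned in the talk.

```python
from collections import Counter


class Meter(object):
    """Hypothetical in-process counters; every event is recorded as +1 or -1."""

    def __init__(self):
        self.counts = Counter()

    def incr(self, name):
        self.counts[name] += 1

    def decr(self, name):
        self.counts[name] -= 1

    def imbalance(self, produced, consumed):
        """Non-zero means the two stages disagree about how many messages they saw."""
        return self.counts[produced] - self.counts[consumed]
```

For example, +1 on connect and -1 on disconnect gives the number of open connections, and comparing a "queued" counter against a "published" counter shows where the numbers stop lining up.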

Slide 48

Slide 48 text

pretty graphs

Slide 49

Slide 49 text

how does it really scale?
Graph annotations: POPE, white smoke, francis announced

Slide 50

Slide 50 text

maths

Slide 51

Slide 51 text

it’s been a busy few weeks

Slide 52

Slide 52 text

wha?
๏ People do weird stuff with your stuff
๏ turned off this server in Oct 2012
๏ Still getting 100 req/sec

Slide 53

Slide 53 text

lessons
๏ do hard (computation) work early
๏ end-to-end acks are good, but expensive
๏ redis/nginx pubsub is effectively free

Slide 54

Slide 54 text

If this was interesting to you...
psst, we’re hiring
disqus.com/jobs

Slide 55

Slide 55 text

special thanks
๏ the team at DISQUS
  ๏ like jeff a.k.a. @nfluxx who had to review all my code
๏ and especially our dev-ops guys
  ๏ like john watson a.k.a. @wizputer who found the nginx-push-stream module

Slide 56

Slide 56 text

slide full o’ links
๏ Nginx push stream module http://wiki.nginx.org/HttpPushStreamModule
๏ Thoonk (redis queue) http://github.com/andyet/thoonk.py
๏ Sentry (distributed traceback aggregation) http://github.com/dcramer/sentry
๏ Gevent (python coroutines and greenlets) http://gevent.org/
๏ Scales (in-app metrics) http://github.com/Greplin/scales
code.disqus.com

Slide 57

Slide 57 text

Come find me here!
[Floor plan: PyCon 2013, Santa Clara Convention Center, Hall A-B, Santa Clara, CA]

Slide 58

Slide 58 text

we are still hiring
disqus.com/jobs

Slide 59

Slide 59 text

Questions I have
๏ What is the best kernel config for webscale concurrency? Nginx?
๏ I <3 gevent, but what if I want to pypy?
๏ Nginx + lua? Seems kind of awesome.
๏ Composing data pipelines: good or bad?
๏ I didn’t have time to mention:
  ๏ Kafka, what is it good for?
  ๏ Seriously, why not RabbitMQ?

Slide 60

Slide 60 text

Adam Hitchcock @NorthIsUp
DISQUSsion?