asap
๏ for increased engagement
๏ and it looks awesome
๏ and we can sell (or trade) it
Thursday, March 21, 13
We define this as 'less than 10 seconds' but my goal was less than one.
tested 'dark' on our existing network
๏ 1.5 million concurrently connected users
๏ 45 thousand new connections per second
๏ 165 thousand messages/second
๏ <0.2 seconds latency end to end
- describe Disqus
- I'll revisit what 'dark' means later, on the testing slides
- describe the heavy-tail distribution of popularity
- end to end does NOT include the DISQUS app
DEMO IT, then "so how did we build this?"
[diagram] "glue" Gevent server: New Posts → nginx /pub endpoint (http post) → DISQUS embed clients
- HARDWARE
- HA
- 3 flows: new info -> pubsub, new subscriptions, pubsub -> subscriptions
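The middle flow above can be sketched as a small helper the "glue" server might use to forward a new post to nginx's push-stream /pub endpoint via an HTTP POST. The host name, channel scheme, and function name are illustrative, not Disqus's real config:

```python
# Minimal sketch of publishing to the push-stream /pub endpoint.
# `base_url` and the channel naming are assumptions for illustration.
import json
from urllib.parse import urlencode

def build_publish_request(base_url, channel, message):
    """Return (url, body) for an HTTP POST that publishes `message` to `channel`."""
    url = '%s/pub?%s' % (base_url, urlencode({'channel': channel}))
    body = json.dumps(message).encode('utf-8')
    return url, body

url, body = build_publish_request(
    'http://nginx.internal', 'thread:12345',
    {'post_id': 42, 'text': 'hello'})
# In the real system a gevent-friendly HTTP client would POST `body` to `url`.
```

The channel id travels in the query string because, as the nginx config later shows, the publisher location reads it from `$arg_channel`.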
[diagram] post → nginx pub endpoint → DISQUS embed clients; other realtime stuff
nginx + push stream module
- post save and post delete hooks
- other realtime stuff + thoonk
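The save/delete hooks might look like the sketch below. The handler names and the list standing in for the thoonk queue are hypothetical; the real code hangs these off Django's post_save/post_delete signals:

```python
# Sketch of the post save/delete hooks (hypothetical names).
# `queue` is a plain list standing in for the thoonk queue.
queue = []

def on_post_save(sender, instance, created, **kwargs):
    # New and edited posts both enter the realtime pipeline.
    queue.append({'event': 'new' if created else 'edit',
                  'post_id': instance['id']})

def on_post_delete(sender, instance, **kwargs):
    queue.append({'event': 'delete', 'post_id': instance['id']})

# In Django these would be wired up with:
#   post_save.connect(on_post_save, sender=Post)
#   post_delete.connect(on_post_delete, sender=Post)
on_post_save(None, {'id': 7}, created=True)
on_post_delete(None, {'id': 7})
```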
of redis
๏ implemented as a DFA
๏ provides job semantics
๏ useful for end-to-end acking
๏ reliable job processing in a distributed system
๏ did I mention it's on top of redis?
uses a zset to store items so you have ranged queries (can't do that on rabbit)
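To see why the zset matters: redis sorted sets keep members ordered by score (e.g. a timestamp), so you can ask for "everything between score A and B" — the ZRANGEBYSCORE query a broker like rabbit can't give you. A stdlib mimic of that behaviour, purely for illustration:

```python
# Toy stand-in for a redis sorted set, mimicking ZADD / ZRANGEBYSCORE.
import bisect

class TinyZSet:
    def __init__(self):
        self._items = []  # kept sorted as (score, member) pairs

    def zadd(self, score, member):
        bisect.insort(self._items, (score, member))

    def zrangebyscore(self, lo, hi):
        # Inclusive ranged query over scores, like redis ZRANGEBYSCORE.
        return [m for s, m in self._items if lo <= s <= hi]

z = TinyZSet()
z.zadd(100, 'job:a')
z.zadd(105, 'job:b')
z.zadd(200, 'job:c')
print(z.zrangebyscore(100, 110))  # ['job:a', 'job:b']
```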
cleans & formats message
๏ this is the final format for end clients
๏ compress data now
๏ publish message to nginx and other firehoses
๏ forum:id, thread:id, user:id, post:id
Formatter → Publishers
- django post save & post delete signals
- thoonk was easy and fun!
- end-to-end ack via thoonk (not removed until fully published to nginx)
- allows for multiple publishers; we publish to nginx, pubsubhubbub, commercial consumers
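The formatter/publisher stage could be sketched like this — field names and the channel scheme are assumptions, but the shape follows the slide: format once, compress once, then fan out to every channel and every publisher:

```python
# Sketch of the formatter -> publishers stage (assumed field names).
import json
import zlib

def format_message(raw):
    # Produce the final client-facing format; drop anything private.
    return {'post': raw['id'], 'thread': raw['thread'],
            'forum': raw['forum'], 'body': raw['body']}

def channels_for(msg):
    # e.g. forum:id and thread:id channels from the slide.
    return ['forum:%s' % msg['forum'], 'thread:%s' % msg['thread']]

def publish(raw, publishers):
    msg = format_message(raw)
    payload = zlib.compress(json.dumps(msg).encode('utf-8'))  # compress once
    for chan in channels_for(msg):
        for pub in publishers:  # nginx, pubsubhubbub, commercial firehoses...
            pub(chan, payload)
```

Because the payload is built before the fan-out, adding another publisher costs a function call, not another round of formatting and gzipping.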
#humblebrags as we ramp up traffic
๏ an example config can be found here: http://bit.ly/disqus-nginx-push-stream
http://wiki.nginx.org/HttpPushStreamModule
network...
๏ ~950K subscribers (peak, single machine)
๏ 144 Mbits/second (per machine)
๏ CPU usage is still well under 50%
http://wiki.nginx.org/HttpPushStreamModule
Have NOT tested SSL yet
all;
    push_stream_publisher admin;
    set $push_stream_channel_id $arg_channel;
}

location ^~ /sub/ {
    # to maintain api compatibility we need this
    location ~ /sub/(.*)/(.*)$ {
        set $push_stream_channels_path $1:$2;
        push_stream_subscriber streaming;
        push_stream_content_type application/json;
    }
}

http://wiki.nginx.org/HttpPushStreamModule
and nginx does the rest
    push_stream_channels_statistics;
    set $push_stream_channel_id $arg_channel;
}

http://wiki.nginx.org/HttpPushStreamModule
actually used this to build a realtime stream of popular threads on disqus
var resp = self.xhr.responseText;
var advance = 0;
var rows;

// If server didn't push anything new, do nothing.
if (!resp || self.len === resp.length) return;

// Server returns JSON objects, one per line.
rows = resp.slice(self.len).split('\n');
_.each(rows, function (obj) {
    advance += (obj.length + 1);  // +1 for the newline delimiter
    obj = JSON.parse(obj);
    self.trigger('progress', obj);
});
self.len += advance;
}

because on a busy thread this matters; 99% of the time it doesn't (IGN E3)
- peak post rate ~40 msg/sec
- peak delivery ~164K msg/sec
numbers don't line up
๏ measuring is hard in distributed systems
๏ try to express things as +1 and -1 if you can
๏ Sentry for measuring exceptions
so when you give a talk you can #humblebrag "peak delivery rate" and such
scales! gauges and aggregation
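The "+1 and -1" point can be made concrete: a gauge like "current subscribers" sampled across many machines never lines up, but increments and decrements aggregate cleanly because each event is counted exactly once, where it happens. A toy statsd-style aggregator (names are illustrative):

```python
# Toy counter aggregation: express state changes as +1/-1 events.
from collections import Counter

counters = Counter()

def incr(name, delta=1):
    counters[name] += delta

# connect/disconnect events arriving from different machines:
for _ in range(5):
    incr('subscribers')        # +1 on each connect
for _ in range(2):
    incr('subscribers', -1)    # -1 on each disconnect

print(counters['subscribers'])  # 3 -- sums correctly no matter the source
```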
are good, but expensive
๏ redis/nginx pubsub is effectively free
- data processing and json formatting done once, not 1000x times
- gzipping done once, not 1000x times
- defer setting up the work in the generator until as late as possible
- ditched e2e acks from the FE, cost way too much
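The first three notes above can be sketched together — this is an illustrative shape, not the actual Disqus code: do the expensive formatting and compression once before the fan-out, and use a generator so none of it runs until a consumer actually pulls:

```python
# Sketch: expensive work once, deferred via a generator.
import json
import zlib

def deliveries(msg, subscribers):
    # Runs only when the generator is first consumed, as late as possible.
    payload = zlib.compress(json.dumps(msg).encode('utf-8'))  # once, not per client
    for sub in subscribers:
        yield sub, payload  # every subscriber shares the same bytes

gen = deliveries({'id': 1}, ['a', 'b', 'c'])  # no work has happened yet
out = list(gen)                               # now the payload is built, once
```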
who had to review all my code
๏ and especially our dev-ops guys
๏ like John Watson a.k.a. @wizputer, a.k.a. the one who made me rewrite this talk
psst, we're hiring: disqus.com/jobs
Tell me your thoughts! @NorthIsUp