
Scaling Realtime at DISQUS by Adam Hitchcock


PyCon 2013

March 17, 2013

Transcript

  1. Adam Hitchcock
    @NorthIsUp
    Scaling Realtime at DISQUS
    Sunday, 17 March, 13


  2. Sunday, 17 March, 13


  3. Adam Hitchcock
    @NorthIsUp
    Scaling Realtime at DISQUS

  4. we’re hiring
    disqus.com/jobs
    If this is interesting to you...

  5. what is DISQUS?

  6. Sunday, 17 March, 13


  7. why do realtime?
    ๏ getting new data to the user asap
    ๏ for increased engagement
    ๏ and it looks awesome
    ๏ and we can sell (or trade) it

  8. http://github.com/NorthIsUp/orbital2
    http://map.labs.disqus.com

  9. DISQUS sees a lot of traffic
    Google Analytics: March 2012 - Feb 2013

  10. realertime
    ๏ currently active on all DISQUS sites
    ๏ tested ‘dark’ on our existing network
    ๏ during testing:
      ๏ 1.5 million concurrently connected users
      ๏ 45 thousand new connections per second
      ๏ 165 thousand messages/second
      ๏ < 0.2 seconds latency end to end

  11. so, how did we do it?

  12. Node.js and MongoDB!

  13. Node.js and MongoDB!

  14. This is PyCon.
    We used Python.

  15. and some other
    Technology You Know™

  16. thoonk redis queue
    some python glue
    nginx push stream
    and long(er) polling

  17. architecture overview

  18. old-june
    [architecture diagram: DISQUS, New Posts, memcache, DISQUS embed clients]
    label: poll memcache every 5 seconds

  19. june-july
    [architecture diagram: DISQUS, New Posts, redis pub/sub, Flask FE cluster, HA Proxy, DISQUS embed clients]

  20. july-october
    [architecture diagram: DISQUS, New Posts, redis queue, “python glue” Gevent server, redis pub/sub, Flask FE cluster, HA Proxy, DISQUS embed clients]

  21. august-october
    [architecture diagram: DISQUS, New Posts, redis queue, “python glue” Gevent server, redis pub/sub, Flask FE cluster, HA Proxy, DISQUS embed clients]
    server-count labels: 2, 14 BIG, 6 servers, 5 servers

  22. august-october
    [architecture diagram: DISQUS, New Posts, redis queue, “python glue” Gevent server, redis pub/sub, Flask FE cluster, HA Proxy, DISQUS embed clients]
    server-count labels: 2, 6 servers, 5 servers, 2 for, 14 BIG
    lots of servers, we can do better

  23. october-now
    [architecture diagram: DISQUS, New Posts, redis queue, “python glue” Gevent server, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients]

  24. october-now
    [architecture diagram: DISQUS, New Posts, redis queue, “python glue” Gevent server, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients]
    server-count labels: 2, 5
    Why still 5 for this?
    Network memory restriction; we can’t fix this without kernel hacking, tweaking, etc.
    (if you know how, tell us, then apply for a job, then fix it for us)

  25. october-now
    [architecture diagram: django, New Posts, thoonk queue, Formatter, Publishers, http post, nginx pub endpoint, nginx + push stream module, DISQUS embed clients, other realtime stuff]

  26. thoonk redis queue
    some python glue
    nginx push stream
    and long(er) polling

  27. the thoonk queue
    ๏ django post_save and post_delete hooks
    ๏ thoonk is a queue on top of redis
    ๏ implemented as a DFA
    ๏ provides job semantics
    ๏ useful for end to end acking
    ๏ reliable job processing in a distributed system
    ๏ did I mention it’s on top of redis?
    ๏ uses zset to store items == ranged queries
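
    The job semantics listed above can be sketched as a small state machine. This is an in-memory illustration only, not Thoonk's API: the class, the state names, and the `retry_stalled` helper are all hypothetical, and a plain dict of claim timestamps stands in for the sorted set that would make the ranged query cheap in Redis.

```python
import time

# Hypothetical state names for illustration; Thoonk's real ones may differ.
AVAILABLE, IN_FLIGHT, FINISHED = "available", "in_flight", "finished"


class JobQueue:
    """In-memory stand-in for a Redis-backed job queue with acking."""

    def __init__(self):
        self.state = {}       # job_id -> current state
        self.payload = {}     # job_id -> message body
        self.claimed_at = {}  # job_id -> claim time (plays the zset score)
        self._next_id = 0

    def put(self, body):
        self._next_id += 1
        self.state[self._next_id] = AVAILABLE
        self.payload[self._next_id] = body
        return self._next_id

    def get(self, now=None):
        """Claim the oldest available job, or return None if there is none."""
        for job_id in sorted(self.state):
            if self.state[job_id] == AVAILABLE:
                self.state[job_id] = IN_FLIGHT
                self.claimed_at[job_id] = time.time() if now is None else now
                return job_id, self.payload[job_id]
        return None

    def finish(self, job_id):
        """Ack a job; only an in-flight job may move to finished."""
        if self.state.get(job_id) != IN_FLIGHT:
            raise ValueError("job %r is not in flight" % (job_id,))
        self.state[job_id] = FINISHED
        del self.claimed_at[job_id]

    def retry_stalled(self, older_than, now=None):
        """Ranged query: requeue jobs claimed more than older_than seconds ago."""
        now = time.time() if now is None else now
        stalled = [j for j, t in self.claimed_at.items() if now - t > older_than]
        for job_id in stalled:
            self.state[job_id] = AVAILABLE
            del self.claimed_at[job_id]
        return stalled
```

    A worker that dies between `get` and `finish` leaves its job in flight, and `retry_stalled` puts it back in the queue, which is the property an end-to-end ack relies on.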

  28. thoonk redis queue
    some python glue
    nginx push stream
    and long(er) polling

  29. the python glue
    ๏ listens to a thoonk queue
    ๏ cleans & formats message
    ๏ this is the final format for end clients
    ๏ compress data now
    ๏ publish message to nginx and other firehoses
    ๏ forum:id, thread:id, user:id, post:id
    [diagram labels: Formatter, Publishers]
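
    A minimal sketch of the format-once, publish-to-many-channels step described above. The post field names, the choice of zlib, and the publisher call signature are assumptions made for illustration; the slides only name the channel pattern (forum:id, thread:id, user:id, post:id).

```python
import json
import zlib


def channels_for(post):
    """Build 'kind:id' channel names for a post (field names are assumed)."""
    return [
        "forum:%s" % post["forum"],
        "thread:%s" % post["thread"],
        "user:%s" % post["author"],
        "post:%s" % post["id"],
    ]


def format_message(post):
    """Serialize to the final client format and compress, exactly once."""
    body = json.dumps(post, sort_keys=True, separators=(",", ":"))
    return zlib.compress(body.encode("utf-8"))


def publish(post, publishers):
    """Send the same pre-built payload to every channel and every publisher."""
    payload = format_message(post)
    for channel in channels_for(post):
        for send in publishers:
            send(channel, payload)
```

    Doing the formatting and compression before fan-out means the cost is paid once per post rather than once per channel or per subscriber.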

  30. gevent is nice
    # the code is too big to show here, so just import it
    # http://bitly.com/geventspawn
    from realertime.lib.spawn import Watchdog
    from realertime.lib.spawn import TimeSensitiveBackoff

  31. data pipelines
    class Pipeline(object):
        def parse_data(self, data):
            raise NotImplementedError('No ParserMixin used')
        def compute_data(self, data, parsed_data):
            raise NotImplementedError('No ComputeMixin used')
        def publish_data(self, data, parsed_data, computed_data):
            raise NotImplementedError('No PublisherMixin used')
        def handle(self, data):
            # parse -> compute -> publish; each step is supplied by a mixin
            parsed_data = self.parse_data(data)
            computed_data = self.compute_data(data, parsed_data)
            return self.publish_data(data, parsed_data, computed_data)

  32. Example Mixins
    import json
    import urllib2  # Python 2, as on the original slides
    class JSONParserMixin(Pipeline):
        def parse_data(self, data):
            return json.loads(data)
    class AnnomizeDataMixin(Pipeline):
        # compute step: throw all the data away
        def compute_data(self, data, parsed_data):
            return {}
    class SuperSecureEncryptDataMixin(Pipeline):
        # compute step (a joke: rot13 is not encryption)
        def compute_data(self, data, parsed_data):
            return json.dumps(parsed_data).encode('rot13')
    class HTTPPublisherMixin(Pipeline):
        def publish_data(self, data, parsed_data, computed_data):
            u = urllib2.urlopen(self.dat_url, computed_data)
            return u
    class FilePublisherMixin(Pipeline):
        def publish_data(self, data, parsed_data, computed_data):
            with open(self.output, 'a') as f:
                f.write(computed_data)

  33. Finished Pipeline
    class JSONAnnonHTTPPipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            HTTPPublisherMixin):
        pass
    class JSONSecureHTTPPipeline(
            JSONParserMixin,
            SuperSecureEncryptDataMixin,
            HTTPPublisherMixin):
        pass
    class JSONAnnonFilePipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            FilePublisherMixin):
        pass
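
    To see how the composition resolves, here is a self-contained Python 3 sketch. `Pipeline` and `JSONParserMixin` mirror the slides; `CountKeysMixin` and `ListPublisherMixin` are hypothetical stand-ins so the example runs without a network or a file. Because every mixin subclasses `Pipeline`, the method resolution order finds each mixin's override before the base class stub that raises.

```python
import json


class Pipeline(object):
    def parse_data(self, data):
        raise NotImplementedError("No ParserMixin used")

    def compute_data(self, data, parsed_data):
        raise NotImplementedError("No ComputeMixin used")

    def publish_data(self, data, parsed_data, computed_data):
        raise NotImplementedError("No PublisherMixin used")

    def handle(self, data):
        parsed_data = self.parse_data(data)
        computed_data = self.compute_data(data, parsed_data)
        return self.publish_data(data, parsed_data, computed_data)


class JSONParserMixin(Pipeline):
    def parse_data(self, data):
        return json.loads(data)


class CountKeysMixin(Pipeline):  # hypothetical compute step
    def compute_data(self, data, parsed_data):
        return {"keys": sorted(parsed_data)}


class ListPublisherMixin(Pipeline):  # hypothetical publisher, keeps output in memory
    def __init__(self):
        self.published = []

    def publish_data(self, data, parsed_data, computed_data):
        self.published.append(computed_data)
        return computed_data


class JSONCountListPipeline(JSONParserMixin, CountKeysMixin, ListPublisherMixin):
    pass


pipeline = JSONCountListPipeline()
result = pipeline.handle('{"b": 1, "a": 2}')
```

    A pipeline missing one of the three steps still constructs, but `handle` raises `NotImplementedError` naming the kind of mixin that was left out.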

  34. real live DISQUS code
    class FEOrbitalNginxMultiplexer(
            SchemaTransformerMixin,
            JSONFormatterMixin,
            SelfChannelsMixin,
            HTTPPublisherMixin):
        def __init__(self, domains, api_version=1):
            schema_namespace = 'orbital'
            self.channels = ('orbital', )
            super(FEOrbitalNginxMultiplexer, self).__init__(domains=domain
    class FEPublicAckingMultiplexer(
            PublicTransformerMixin,
            JSONFormatterMixin,
            FEChannelsMixin,
            ThoonkQueuePubSubPublisherMixin):
        def __init__(self, domains, api_version):
            schema_namespace = 'general'
            super(FEPublicAckingMultiplexer, self).__init__(domains=domain

  35. thoonk redis queue
    some python glue
    nginx push stream
    and long(er) polling

  36. nginx push stream
    ๏ follow John Watson (@wizputer) for updated #humblebrags as we ramp up traffic
    ๏ an example config can be found here:
    http://bit.ly/disqus-nginx-push-stream
    http://wiki.nginx.org/HttpPushStreamModule

  37. nginx push stream
    ๏ Replaced webservers and Redis Pub/Sub
    ๏ But starting with Pub/Sub was important for us
    ๏ Encouraged us to over publish on keys

  38. nginx push stream
    ๏ Turned on for 70% of our network...
    ๏ ~950K subscribers (peak single machine)
    ๏ peak 40 MBytes/second (per machine)
    ๏ CPU usage is still well under 15%
    ๏ 99.845% active writes (the socket is written to often enough to come up as ACTIVE)

  39. config push stream
    location = /pub {
        allow 127.0.0.1;
        deny all;
        push_stream_publisher admin;
        set $push_stream_channel_id $arg_channel;
    }
    location ^~ /sub/ {
        # to maintain api compatibility we need this
        location ~ /sub/(.*)/(.*)$ {
            # Url encoding things? $1%3A2$2
            set $push_stream_channels_path $1:$2;
            push_stream_subscriber streaming;
            push_stream_content_type application/json;
        }
    }
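
    The subscriber location above turns a two-segment path into a `kind:id` channel name. A small Python sketch of that mapping, reusing the same regular expression; the function name is hypothetical and this only mirrors what the `set $push_stream_channels_path $1:$2;` line does, it is not part of the module.

```python
import re

# Same pattern as the nested location block, anchored at the start of the path.
SUB_PATH = re.compile(r"^/sub/(.*)/(.*)$")


def channel_for_path(path):
    """Return the channel a subscriber path maps to, or None if it does not match."""
    match = SUB_PATH.match(path)
    if match is None:
        return None
    return "%s:%s" % match.groups()
```

    So a client requesting `/sub/forum/cnn` listens on the same `forum:cnn` channel that a publisher names in `?channel=forum:cnn`.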

  40. examples
    # Subs
    curl -s 'localhost/sub/forum/cnn'
    curl -s 'localhost/sub/thread/907824578'
    curl -s 'localhost/sub/user/northisup'
    # Pubs
    curl -s -X POST 'localhost/pub?channel=forum:cnn' \
        -d '{"some sort": "of json data"}'
    curl -s -X POST 'localhost/pub?channel=thread:907824578' \
        -d '{"more": "json data"}'
    curl -s -X POST 'localhost/pub?channel=user:northisup' \
        -d '{"the idea": "I think you get it by now"}'
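
    The publish calls above could be made from Python roughly as follows. This is a sketch using only the standard library; the function name and `base_url` default are assumptions, and the request is built but not sent so the example runs without an nginx instance. The colon in the channel name is kept literal to match the curl examples.

```python
import json
import urllib.parse
import urllib.request


def build_publish_request(channel, message, base_url="http://localhost"):
    """Build (but do not send) a POST to the /pub endpoint for one channel."""
    # safe=":" keeps 'forum:cnn' unescaped, as in the curl examples.
    query = urllib.parse.urlencode({"channel": channel}, safe=":")
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        "%s/pub?%s" % (base_url, query), data=body, method="POST"
    )


req = build_publish_request("forum:cnn", {"some sort": "of json data"})
# urllib.request.urlopen(req) would send it to a running nginx.
```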

  41. measure nginx
    location = /push-stream-status {
        allow 127.0.0.1;
        deny all;
        push_stream_channels_statistics;
        set $push_stream_channel_id $arg_channel;
    }

  42. thoonk redis queue
    some python glue
    nginx push stream
    and long(er) polling

  43. long(er) polling
    onProgress: function () {
        var self = this;
        var resp = self.xhr.responseText;
        var end;
        var rows;
        // If server didn't push anything new, do nothing.
        if (!resp || self.len === resp.length)
            return;
        // Server returns JSON objects, one per line.
        // Only consume complete lines; a partial line waits for more data.
        end = resp.lastIndexOf('\n') + 1;
        if (end <= self.len)
            return;
        rows = resp.slice(self.len, end).split('\n');
        _.each(rows, function (obj) {
            if (!obj)
                return;  // skip the empty string after the last '\n'
            self.trigger('progress', JSON.parse(obj));
        });
        self.len = end;
    }
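
    The onProgress handler above reads newline-delimited JSON out of a response that keeps growing. The same idea in Python, as a sketch: the `LineBuffer` class is hypothetical, and it takes new chunks rather than the whole response text, but the rule is the same: parse only complete lines and hold back any trailing partial line.

```python
import json


class LineBuffer:
    """Accumulate streamed text and yield one parsed object per complete line."""

    def __init__(self):
        self.pending = ""  # text received after the last newline

    def feed(self, chunk):
        self.pending += chunk
        # Everything before the final '\n' is complete; the remainder waits.
        *lines, self.pending = self.pending.split("\n")
        return [json.loads(line) for line in lines if line]


buf = LineBuffer()
first = buf.feed('{"a": 1}\n{"b"')
second = buf.feed(': 2}\n')
```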

  44. Soon... EventSource
    // Currently EventSource has CORS issues
    var ev = new EventSource(dat_url);
    ev.addEventListener("Post", handlePostEvent);

  45. test, measure, repeat

  46. test
    ๏ Darktime
      ๏ use existing network to load test
      ๏ (user complaints when it didn’t work...)
    ๏ Darkesttime
      ๏ load testing a single thread
      ๏ have knobs you can twiddle

  47. measure
    ๏ measure all the things!
    ๏ especially when the numbers don’t line up
    ๏ measuring is hard in distributed systems
    ๏ try to express things as +1 and -1 if you can
    ๏ Sentry for measuring exceptions
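
    One reading of the "+1 and -1" advice, sketched in Python: if every machine records only increments and decrements, the per-machine numbers can be summed in any order to get a global value. The class and function names are hypothetical and this is not the Scales API.

```python
from collections import Counter


class DeltaMetrics:
    """Per-node metrics recorded purely as increments and decrements."""

    def __init__(self):
        self.deltas = Counter()

    def incr(self, name, amount=1):
        self.deltas[name] += amount

    def decr(self, name, amount=1):
        self.deltas[name] -= amount


def merge(nodes):
    """Sum deltas from all nodes; addition is order-independent."""
    total = Counter()
    for node in nodes:
        total.update(node.deltas)  # update() adds counts, keeping negatives
    return total


node_a, node_b = DeltaMetrics(), DeltaMetrics()
for _ in range(3):
    node_a.incr("connections")   # three clients connect to node A
node_a.decr("connections")       # one disconnects
node_b.incr("connections", 2)    # two clients connect to node B
totals = merge([node_a, node_b])
```

    An absolute gauge, by contrast, has to be reconciled across machines before it means anything.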

  48. pretty graphs

  49. how does it really scale?
    [graph annotations: POPE, white smoke, francis announced]

  50. maths

  51. it’s been a busy few weeks

  52. wha?
    ๏ People do weird stuff with your stuff
    ๏ turned off this server in Oct 2012
    ๏ Still getting 100 req/sec

  53. lessons
    ๏ do hard (computation) work early
    ๏ end-to-end acks are good, but expensive
    ๏ redis/nginx pubsub is effectively free

  54. If this was interesting to you...
    psst, we’re hiring
    disqus.com/jobs

  55. special thanks
    ๏ the team at DISQUS
    ๏ like jeff a.k.a. @nfluxx who had to review all my code
    ๏ and especially our dev-ops guys
    ๏ like john watson a.k.a. @wizputer who found the nginx-push-stream module
    psst, we’re hiring
    disqus.com/jobs

  56. slide full o’ links
    ๏ Nginx push stream module
    http://wiki.nginx.org/HttpPushStreamModule
    ๏ Thoonk (redis queue)
    http://github.com/andyet/thoonk.py
    ๏ Sentry (distributed traceback aggregation)
    http://github.com/dcramer/sentry
    ๏ Gevent (python coroutines and greenlets)
    http://gevent.org/
    ๏ Scales (in-app metrics)
    http://github.com/Greplin/scales
    code.disqus.com

  57. Come find me here!
    PyCon 2013
    Santa Clara Convention Center
    Hall A-B
    Santa Clara, CA
    [expo hall floor plan: booth grid with a LUNCH & BREAKS area]

  58. we are still hiring
    psst, we’re hiring
    disqus.com/jobs

  59. Questions I have
    ๏ What is the best kernel config for webscale concurrency? Nginx?
    ๏ I <3 gevent, but what if I want to pypy?
    ๏ Nginx + lua? Seems kind of awesome.
    ๏ Composing data pipelines: good or bad?
    ๏ I didn’t have time to mention:
      ๏ Kafka, what is it good for?
      ๏ Seriously, why not RabbitMQ?

  60. Adam Hitchcock
    @NorthIsUp
    DISQUSsion?