
How Disqus does "it" when "it" isn't Django

Adam
July 15, 2013


Over the past few years, Disqus has become one of the biggest Django apps in existence, with over a billion unique visitors a month. But sometimes Django isn't the right tool for the job.

Join Disqus engineer Adam Hitchcock to learn how nginx modules and Lua can replace Python services, what infrastructure was used to launch the realtime and ad services, and which failures the team ran into along the way.

For more great Python videos and training resources, as well as Adam's slides from this talk, head to http://mrkn.co/0h7ey


Transcript

  1. white smoke; Francis announced; Boston Marathon Bombs Detonate; MIT Officer Shot; FBI Releases Photo; Pursuit and Firefight Begin; Ricin Letters; 2nd Suspect in Custody; Waco Fertilizer Explosion
  2. why python?
    ๏ because it is fast…
    ๏ to develop in
    ๏ good community
    ๏ lots of libraries
    ๏ so active we can’t keep up with it
    ๏ is anybody using 3.3 yet?
    ๏ Disqus has a really good dev loop for it
  3. how does our dev loop work?
    ๏ diff code (with Phabricator)
    ๏ get that reviewed
    ๏ review with Phabricator
    ๏ CI done by Jenkins
    ๏ land it on master (git push)
    ๏ wait for the daily deploy (ops team)
    ๏ and we are moving to auto deploy (we’ll revisit this one)
  4. break it down
    ๏ mostly Django
    ๏ two years ago 95% Django*
    ๏ today ~70% Django*
    ๏ what is in that growing gap?
    ๏ *totally made up numbers
  5. disqus-web (kinda monolithic)
    ๏ Django + Celery on Postgres + RabbitMQ
    ๏ we roll up commits for a deploy
    ๏ risk of a revert that isn’t your fault
    ๏ you bring down the whole thing if you forgot to remove that pdb.set_trace()
    ๏ high scrutiny in code review (see above)
    ๏ high volume of code review (code review RTT can be a full day)
    ๏ lots of legacy code to work around (and not break)
  6. problems that don’t fit
    ๏ high concurrency
    ๏ isolation: feature, failure
    ๏ speed (cpu cycles)
    ๏ speed (dev iteration cycle)
    ๏ fun
  7. for fun and concurrency!
    ๏ nginx + nginx-modules
      ๏ https://speakerdeck.com/northisup/scaling-realtime-at-disqus?slide=39
    ๏ nginx + lua
      ๏ https://github.com/NorthIsUp/nginx-oauth-on-dotcloud/blob/master/nginx/access.persona.lua.in
    ๏ go
      ๏ http://blog.disqus.com/post/51155103801/trying-out-this-go-thing
  8. embedly in nginx + lua
    local http = require "resty.http"
    local cjson = require "cjson"

    local url = "https://api.embed.ly/1/oembed?key=" .. ngx.var.api_key
        .. "&url=" .. ngx.var.url
    local hc = http:new()
    local ok, code, headers, status, body = hc:request {
        url = url,
        method = "GET",
    }
    if code ~= 200 then
        ngx.exit(code)
    end
    local thumbnail = cjson.decode(body).thumbnail_url
    ngx.var.thumbnail = thumbnail
  9. isolation and iteration speed
    ๏ failure should be isolated to a small service
    ๏ successes should be allowed to occur quickly
    ๏ python is still great for these use cases
  10. when we needed it right now!
    ๏ if __name__ == '__main__':
    ๏ from wsgiref.simple_server import make_server
    ๏ from xmlrpclib import ServerProxy
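Slide 10's "right now" recipe needs nothing beyond the standard library. A minimal Python 3 sketch of that idea, assuming a JSON echo endpoint as the hypothetical service; `app`, `serve` and port 8000 are illustrative choices, not anything from the talk.

```python
import json
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Hypothetical tiny WSGI service: echoes the request path as JSON."""
    body = json.dumps({"path": environ.get("PATH_INFO", "/")}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Serve `app` until interrupted; port 8000 is an arbitrary choice."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

In a script you would call `serve()` under the `if __name__ == '__main__':` guard from the slide. On Python 3 the slide's `xmlrpclib` import becomes `from xmlrpc.client import ServerProxy`.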
  11. lessons
    ๏ consistency is good!
    ๏ protects you from somebody quitting or getting hit by a bus
    ๏ anybody can just pick it up and run
    ๏ simplicity is good
    ๏ modularity is good
    ๏ the ability to borrow/combine features from other projects
    ๏ but copypasta is bad
    ๏ and not bugging ops for a deploy is the best
  12. disqus-service
    ๏ the goal is consistent/free access to…
    ๏ config
    ๏ switches
    ๏ logging
    ๏ stats
    ๏ other systems
    ๏ convention over configuration
    ๏ but allow for configuration
    ๏ should be easy to run
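One way to read "convention over configuration, but allow for configuration" is that every service gets the same defaults, and any of them can be overridden without a code change. A minimal sketch under that assumption, using environment variables as the override source; `DefaultConfig`, `load_config` and the `SERVICE_` prefix are hypothetical, not disqus-service's API.

```python
import os
from types import SimpleNamespace


class DefaultConfig:
    """Conventional defaults every service gets for free.

    The setting names and values are hypothetical examples.
    """
    REDIS_HOST = "localhost"
    REDIS_PORT = 6379
    LOG_LEVEL = "INFO"


def load_config(defaults=DefaultConfig, environ=None, prefix="SERVICE_"):
    """Start from the defaults, then apply PREFIX_NAME environment overrides.

    Each override is cast to the type of the default it replaces, so
    SERVICE_REDIS_PORT=6380 yields the int 6380. (This naive cast would be
    wrong for bool defaults, which the sketch avoids.)
    """
    environ = os.environ if environ is None else environ
    settings = {}
    for name in dir(defaults):
        if not name.isupper():
            continue  # only UPPER_CASE attributes count as settings
        value = getattr(defaults, name)
        override = environ.get(prefix + name)
        if override is not None:
            value = type(value)(override)
        settings[name] = value
    return SimpleNamespace(**settings)


config = load_config(environ={"SERVICE_REDIS_PORT": "6380"})
print(config.REDIS_HOST, config.REDIS_PORT)  # localhost 6380
```

Returning an attribute-style namespace matches the `self.config.REDIS_HOST` access used in the hello-world slide.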
  13. but you want to do more!
    ๏ class Service:
    ๏ @handler
    ๏ @on_message
    ๏ @pre_config
    ๏ @post_config
    ๏ @pre_setup
    ๏ @post_setup
    ๏ @pre_update_config
    ๏ @post_update_config
  14. service lifecycle
    ๏ config
      ๏ @pre_config
      ๏ self.config()
      ๏ @post_config
    ๏ setup
      ๏ @pre_setup
      ๏ self.setup()
      ๏ @post_setup
    ๏ run
      ๏ @handler
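The lifecycle is a fixed ordering: config, then setup, then the handler, with pre/post hooks around the first two phases. A minimal sketch of that ordering, assuming plain method overrides in place of disqus-service's decorators (which aren't public); `LifecycleService`, `Hello` and the `trace` list are hypothetical names for illustration.

```python
class LifecycleService:
    """Hypothetical sketch of the slide's ordering, not disqus-service.

    run() walks config, then setup, bracketing each with pre_/post_
    steps, then calls the handler. Subclasses override what they need.
    """

    PHASES = ("config", "setup")

    def run(self):
        self.trace = []  # records the order the steps were called in
        for phase in self.PHASES:
            for step in ("pre_" + phase, phase, "post_" + phase):
                self.trace.append(step)
                getattr(self, step)()
        self.trace.append("handler")
        return self.handler()

    # Default steps do nothing; only the handler is mandatory.
    def pre_config(self):
        pass

    def config(self):
        pass

    def post_config(self):
        pass

    def pre_setup(self):
        pass

    def setup(self):
        pass

    def post_setup(self):
        pass

    def handler(self):
        raise NotImplementedError("a service must define handler()")


class Hello(LifecycleService):
    def post_config(self):
        # Stands in for e.g. opening a Redis connection from the config.
        self.greeting = "hello world"

    def handler(self):
        return self.greeting


svc = Hello()
print(svc.run())  # hello world
```

The decorator version on the later slides buys one thing this sketch lacks: several hooks can attach to the same phase without subclasses having to call `super()`.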
  15. hello-v2.0.py
    from redis import Redis
    from disqus.service.application import (Service, handler, post_config)

    class World(Service):
        @post_config
        def setup_redis(self):
            self.redis = Redis(
                host=self.config.REDIS_HOST,
                port=self.config.REDIS_PORT)

        @handler
        def world(self):
            print(self.redis.get("hello world"))

    world = World()
  16. helper mixins
    ๏ FlaskService - the service instance is also a wsgi app
    ๏ QueueService - lets services talk over many queue types
    ๏ GeventService - concurrency helpers
    ๏ RedisService (pre-config redis)
    ๏ KafkaService (kind of like a queue)
    ๏ DjangoORMService (db access via django)
  17. real life Disqus code!
    from IPython.frontend.terminal.embed import InteractiveShellEmbed
    from disqus.service.application import handler
    from tempest.services.mixins.data import RedisMixin
    import tempest

    class Shell(RedisMixin):
        @handler
        def shell(self, *args, **kwargs):
            """ runs an ipython shell loaded in the tempest module """
            ipshell = InteractiveShellEmbed()
            ipshell(module=tempest)
  18. The ‘tempest’ ad server
    [diagram components: “the internet”; nginx + uwsgi + @service; redis; @service; django: “Not involved!”]
  19. The ‘tempest’ ad server
    [diagram components: “the internet”; redis; warmer; pre-filter; scorer; post-filter; pusher; webservice (flask); DISQUS: “Not involved!”]
  20. The ‘tempest’ ad server
    ๏ warmer - ‘warm’ up redis; fetches a lot of data from slow places
    ๏ pre-filter - remove bad data
    ๏ scorer - score ads
    ๏ post-filter - remove more bad data
    ๏ stasher - put it in redis
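The tempest stages form a straight pipeline: each one consumes the previous stage's output. A minimal in-memory sketch of that flow, assuming hypothetical ad records, a click-through-rate score and a plain dict standing in for Redis; none of these names or rules come from the real tempest code.

```python
def warm():
    """Stand-in for the slow fetch: returns hypothetical ad records."""
    return [
        {"id": 1, "clicks": 30, "impressions": 1000},
        {"id": 2, "clicks": 0, "impressions": 0},  # bad data: never shown
        {"id": 3, "clicks": 5, "impressions": 1000},
        {"id": 4, "clicks": 50, "impressions": 1000},
    ]


def pre_filter(ads):
    """Remove bad data: ads that cannot be scored."""
    return [ad for ad in ads if ad["impressions"] > 0]


def score(ads):
    """Score ads by click-through rate (an assumed scoring rule)."""
    return [dict(ad, score=ad["clicks"] / ad["impressions"]) for ad in ads]


def post_filter(ads, min_score=0.01):
    """Remove more bad data: ads scoring below an assumed threshold."""
    return [ad for ad in ads if ad["score"] >= min_score]


def push(ads, store):
    """Put ranked ad ids in the store (a dict standing in for Redis)."""
    ranked = sorted(ads, key=lambda ad: ad["score"], reverse=True)
    store["ads:ranked"] = [ad["id"] for ad in ranked]
    return store


store = push(post_filter(score(pre_filter(warm()))), {})
print(store)  # {'ads:ranked': [4, 1]}
```

Because each stage is a plain function of its input, any one of them can later move into its own process without touching the others.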
  21. scaling services
    ๏ You don’t know which part will need more...
      ๏ CPU
      ๏ I/O
      ๏ GPU
      ๏ sockets
      ๏ bits or bytes
      ๏ etc.
    ๏ so keep it simple, stupid
  22. looks kinda like this
    class Tempest(Service):
        @post_config
        def setup_redis(self):
            self.redis = Redis(
                host=self.config.REDIS_HOST,
                port=self.config.REDIS_PORT)

        @handler
        def do_all_of_it(self):
            self.warm_cache()
            self.pre_filter()
            self.score()
            self.post_filter()
            self.push_to_redis()

    $ toil run tempest.tempest
  23. after running it we found that
    ๏ warmer - really, really slow (10 min/task)
    ๏ pre-filter - super fast
    ๏ scorer - kinda ok (several seconds/task)
    ๏ post-filter - super fast
    ๏ pusher - super fast
  24. The ‘tempest’ ad server 2.0
    [diagram components: “the internet”; redis; post-filter; pusher; webservice; warmer; pre-filter; scorer; DISQUS: “Not involved!”]
  25. looks kinda like this
    class RedisService(Service):
        @post_config
        def setup_redis(self):
            self.redis = Redis(
                host=self.config.REDIS_HOST,
                port=self.config.REDIS_PORT)

    class Tempest(RedisService):
        @handler
        def do_it(self):
            self.warm_cache()
            ads = self.pre_filter()
            ads = self.score(ads)
            ads = self.post_filter(ads)
            self.push_to_redis(ads)
  26. looks kinda like this
    class Warmer(RedisService):
        @handler
        def warm_ads(self):
            self.warm_cache()
            self.pre_filter()

    class Scorer(QueueService):
        @handler
        def score_ads(self):
            ads = self.score()
            self.q.put(ads)

    class Pusher(QueueService, RedisService):
        @on_message
        def handle_ads(self, ads):
            ads = self.post_filter(ads)
            self.push_to_redis(ads)
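The split only works because the scorer and the pusher talk over a queue. A minimal single-process sketch of that hand-off, assuming Python's `queue.Queue` and a thread in place of QueueService and real processes; the ad records, scoring rule, threshold and `STOP` sentinel are all hypothetical.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel: tells the consumer to shut down


def scorer(batches, q):
    """Producer: score each batch of ads and put it on the queue."""
    for ads in batches:
        q.put([dict(ad, score=ad["clicks"] / ad["impressions"]) for ad in ads])
    q.put(STOP)


def pusher(q, store, min_score=0.01):
    """Consumer: handles one message per batch, like an @on_message hook."""
    while True:
        ads = q.get()
        if ads is STOP:
            break
        for ad in ads:
            if ad["score"] >= min_score:  # post-filter
                store[ad["id"]] = ad["score"]  # dict stands in for Redis


batches = [
    [{"id": 1, "clicks": 30, "impressions": 1000}],
    [{"id": 2, "clicks": 1, "impressions": 1000},
     {"id": 3, "clicks": 20, "impressions": 1000}],
]
q = queue.Queue()
store = {}
consumer = threading.Thread(target=pusher, args=(q, store))
consumer.start()
scorer(batches, q)
consumer.join()
print(sorted(store))  # [1, 3]
```

With a real broker in the middle, each side can run as its own set of processes and be scaled independently of the other.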
  27. What actually runs
    $ toil run tempest.warmer   # run 16 procs
    $ toil run tempest.scorer   # run 2 procs
    $ toil run tempest.pusher   # run 2 procs
  28. start simple, then iterate
    ๏ services are good for when you need to break out work
    ๏ but don’t prematurely break out work
    ๏ because you are probably wrong
    ๏ measure first
    ๏ make decisions on data
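The timings that justified the split came from running the single-service version and looking at where the time went. A minimal sketch of that kind of measurement, assuming each stage is a function that feeds the next; `time_stages`, `slowest` and `slow_warm` are hypothetical helpers, and the sleep stands in for the slow warmer.

```python
import time


def time_stages(stages, data):
    """Run (name, fn) pairs in order, passing each output to the next fn.

    Returns the final output and a dict of wall-clock seconds per stage.
    """
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    return data, timings


def slowest(timings):
    """Return the name of the stage that took the longest."""
    return max(timings, key=timings.get)


def slow_warm(_):
    time.sleep(0.05)  # pretend this is the 10 min/task fetch
    return list(range(10))


result, timings = time_stages([
    ("warmer", slow_warm),
    ("pre-filter", lambda xs: [x for x in xs if x % 2 == 0]),
    ("scorer", lambda xs: [x * 10 for x in xs]),
], None)
print(result)            # [0, 20, 40, 60, 80]
print(slowest(timings))  # warmer
```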
  29. stolen from exam package
    ๏ written by @nfluxx for Disqus
    ๏ literally the same code but for tests
    ๏ replace setUp with @before
    ๏ makes test fixtures easy with @fixture
    ๏ check out exam/cases.py, line 35
    ๏ github.com/Fluxx/exam
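The exam idea is the same hook trick applied to `unittest`. A minimal sketch of the two pieces, not exam's own code: a hypothetical `before` marker that `setUp` collects from the MRO, and `fixture` aliased to `functools.cached_property` (Python 3.8+) so each fixture is built lazily, once per test. `Case` and `ExampleTest` are hypothetical names.

```python
import functools
import unittest


def before(method):
    """Mark a method to run before every test (exam-like, hypothetical)."""
    method.run_before = True
    return method


# Lazily built, per-test value: unittest creates a fresh TestCase
# instance for each test method, so the cache never leaks between tests.
fixture = functools.cached_property


class Case(unittest.TestCase):
    """Runs every @before method from setUp, base classes first."""

    def setUp(self):
        super().setUp()
        seen = set()
        for klass in reversed(type(self).__mro__):
            for name, attr in vars(klass).items():
                if getattr(attr, "run_before", False) and name not in seen:
                    seen.add(name)
                    getattr(self, name)()


class ExampleTest(Case):
    @fixture
    def items(self):
        return []

    @before
    def add_item(self):
        self.items.append("from before")

    def test_before_ran_once(self):
        self.assertEqual(self.items, ["from before"])
```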
  30. how the decorators work
    class base(object):
        """ Base for the decorators. Allows a decorator to access the callable. """

        def __init__(self, *things):
            self.init_callables = things

        def __call__(self, instance):
            return self.init_callables[0](instance)

    class post_config(base):
        """ Runs after the config phase. """
        pass
  31. how the decorators work
    import inspect

    class BaseService(object):
        def __attrs_of_type(self, kind):
            # walk the MRO from object down to the concrete class
            for base in reversed(inspect.getmro(type(self))):
                for attr, class_value in vars(base).items():
                    resolved_value = getattr(type(self), attr, False)
                    if not isinstance(resolved_value, kind):
                        continue
                    elif class_value is not resolved_value:
                        # overridden further down the MRO; skip this copy
                        continue
                    else:
                        yield attr, resolved_value

        def __run_hooks(self, hook):
            # call every hook of this type, passing the service instance
            return [value(self) for _, value in self.__attrs_of_type(hook)]

        # ... snip ...
  32. how the decorators work
    from contextlib import contextmanager

    class BaseService(object):
        # ... snip ...

        def __call__(self):
            results = self.__run_hooks(handler)
            return results

        @contextmanager
        def around_config(self):
            self.__run_hooks(pre_config)
            yield
            self.__run_hooks(post_config)
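The decorator slides are fragments. Stitched into one runnable Python 3 sketch, assuming `pre_config`, `post_config` and `handler` all subclass `base` the way `post_config` does, and with a hypothetical `Hello` service to exercise them. Walking the MRO in reverse runs base-class hooks before subclass hooks, and the identity check skips a hook that a subclass has overridden, so each hook name runs once.

```python
import inspect
from contextlib import contextmanager


class base:
    """Base for the decorators. Allows a decorator to access the callable."""

    def __init__(self, *things):
        self.init_callables = things

    def __call__(self, instance):
        return self.init_callables[0](instance)


class pre_config(base):
    """Runs before the config phase."""


class post_config(base):
    """Runs after the config phase."""


class handler(base):
    """The service's main work."""


class BaseService:
    def __attrs_of_type(self, kind):
        # Walk the MRO from object down to the concrete class.
        for klass in reversed(inspect.getmro(type(self))):
            for attr, class_value in vars(klass).items():
                resolved_value = getattr(type(self), attr, False)
                if not isinstance(resolved_value, kind):
                    continue
                if class_value is not resolved_value:
                    continue  # overridden further down the MRO
                yield attr, resolved_value

    def __run_hooks(self, hook):
        return [value(self) for _, value in self.__attrs_of_type(hook)]

    @contextmanager
    def around_config(self):
        self.__run_hooks(pre_config)
        yield
        self.__run_hooks(post_config)

    def __call__(self):
        with self.around_config():
            pass  # assumption: real config loading would happen here
        return self.__run_hooks(handler)


class Hello(BaseService):
    @pre_config
    def start_log(self):
        self.log = ["pre_config"]

    @post_config
    def connect(self):
        self.log.append("post_config")

    @handler
    def world(self):
        self.log.append("handler")
        return "hello world"


svc = Hello()
print(svc())    # ['hello world']
print(svc.log)  # ['pre_config', 'post_config', 'handler']
```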
  33. data pipelines
    class Pipeline(object):
        def parse(self, data):
            raise NotImplementedError('No ParserMixin used')

        def compute(self, data, parsed_data):
            raise NotImplementedError('No ComputeMixin used')

        def publish(self, data, parsed_data, computed_data):
            raise NotImplementedError('No PublisherMixin used')

        def pipe_data(self, data):
            parsed_data = self.parse(data)
            computed_data = self.compute(data, parsed_data)
            return self.publish(data, parsed_data, computed_data)
  34. example mixins
    import json
    import urllib2

    class JSONParserMixin(Pipeline):
        def parse(self, data):
            return json.loads(data)

    class SuperSecureEncryptDataMixin(Pipeline):
        def compute(self, data, parsed_data):
            return parsed_data.encode('rot13')

    class AnnomizeDataMixin(Pipeline):
        def compute(self, data, parsed_data):
            return {}

    class HTTPPublisher(Pipeline):
        def publish(self, data, parsed_data, computed_data):
            u = urllib2.urlopen(self.dat_url, computed_data)
            return u

    class FilePublisher(Pipeline):
        def publish(self, data, parsed_data, computed_data):
            with open(self.output, 'a') as f:
                f.write(computed_data)
  35. final pipeline
    class JSONAnnonHTTPPipeline(
            JSONParserMixin, AnnomizeDataMixin, HTTPPublisher):
        pass

    class JSONSecureHTTPPipeline(
            JSONParserMixin, SuperSecureEncryptDataMixin, HTTPPublisher):
        pass

    class JSONAnnonFilePipeline(
            JSONParserMixin, AnnomizeDataMixin, FilePublisher):
        pass
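The pipeline slides are Python 2 (`urllib2`, `str.encode('rot13')`). A minimal Python 3 sketch of the same mixin pattern, assuming an in-memory publisher in place of HTTP or files; `Rot13NameMixin`, `ListPublisherMixin` and `JSONRot13ListPipeline` are hypothetical names. Swapping in an in-memory publisher like this is also one way mixins make testing easy.

```python
import codecs
import json


class Pipeline:
    def parse(self, data):
        raise NotImplementedError("No ParserMixin used")

    def compute(self, data, parsed_data):
        raise NotImplementedError("No ComputeMixin used")

    def publish(self, data, parsed_data, computed_data):
        raise NotImplementedError("No PublisherMixin used")

    def pipe_data(self, data):
        parsed_data = self.parse(data)
        computed_data = self.compute(data, parsed_data)
        return self.publish(data, parsed_data, computed_data)


class JSONParserMixin(Pipeline):
    def parse(self, data):
        return json.loads(data)


class Rot13NameMixin(Pipeline):
    """'Encrypts' the name field with rot13 (still a joke, not security)."""

    def compute(self, data, parsed_data):
        return dict(parsed_data, name=codecs.encode(parsed_data["name"], "rot13"))


class ListPublisherMixin(Pipeline):
    """Collects output in memory instead of sending it anywhere."""

    def __init__(self):
        self.published = []

    def publish(self, data, parsed_data, computed_data):
        self.published.append(computed_data)
        return computed_data


class JSONRot13ListPipeline(JSONParserMixin, Rot13NameMixin, ListPublisherMixin):
    pass


p = JSONRot13ListPipeline()
print(p.pipe_data('{"name": "adam", "id": 7}'))  # {'name': 'nqnz', 'id': 7}
```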
  36. open source as output
    ๏ we have lots of open source stuff
    ๏ sentry for errors
    ๏ nydus (for redis connections)
    ๏ django-mailviews
    ๏ gargoyle (v2.0 renamed to gutter)
    ๏ and more! https://github.com/disqus/
  37. what about disqus-service?
    ๏ not open source, yet
    ๏ but I want your feedback!
    ๏ come talk to me about
      ๏ how you solved the same issue
      ๏ how my solution is stupid
      ๏ how my solution is awesome
    ๏ would you actually use it?
  38. What did I just talk about?
    ๏ lots of python at Disqus!
    ๏ but not all of it
    ๏ but most of it
    ๏ because sometimes other tools are better
    ๏ services are awesome
    ๏ cool decorator hacking
    ๏ mixins make testing super easy
  39. I don’t have all the answers...
    ๏ How do you run services?
    ๏ RPC vs HTTP vs Queues (I like queues)
    ๏ what do you use for highly concurrent systems?
    ๏ do you lint for `print`, `pdb.set_trace()`, or `import debug`?
    ๏ if yes, can I have it?
    ๏ dev != prod, or does it?
    ๏ if you run uwsgi + nginx in prod
    ๏ why dev with ./manage.py runserver?
    @NorthIsUp
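For the lint question: a minimal sketch using the standard `ast` module that flags `print` calls, `pdb.set_trace()` and `import debug`, assuming Python 3 source (Python 2 `print x` statements won't parse). `find_debug_leftovers` is a hypothetical name; wiring it into a commit hook or the review tooling is left out.

```python
import ast


def find_debug_leftovers(source, filename="<string>"):
    """Return sorted (lineno, message) pairs for debugging leftovers."""
    problems = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id == "print":
                problems.append((node.lineno, "print() call"))
            elif (isinstance(func, ast.Attribute)
                  and func.attr == "set_trace"
                  and isinstance(func.value, ast.Name)
                  and func.value.id == "pdb"):
                problems.append((node.lineno, "pdb.set_trace() call"))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "debug":
                    problems.append((node.lineno, "import debug"))
    # ast.walk is breadth-first, so sort to report in file order.
    return sorted(problems)


sample = "import pdb\nimport debug\nx = 1\nprint(x)\npdb.set_trace()\n"
for lineno, message in find_debug_leftovers(sample):
    print(lineno, message)
```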