How Disqus does "it" when "it" isn't Django

Adam
July 15, 2013

Over the past few years, Disqus has become one of the biggest Django apps in existence, crossing over a billion unique visitors a month. But sometimes Django isn't the right tool for the job.

Join Disqus engineer Adam Hitchcock to learn how nginx modules and Lua can replace Python services, about the infrastructure that launched realtime and ad services, as well as about some of the failures they've encountered along the way.

** For more great Python videos and training resources, as well as Adam's slides from this talk, head to http://mrkn.co/0h7ey

Transcript

  1. Adam Hitchcock @NorthIsUp How DISQUS does ‘it’ when ‘it’ isn’t Django
  2. None
  3. we’re hiring disqus.com/jobs If this is interesting to you...

  4. what is DISQUS?

  5. None
  6. DISQUS sees a lot of traffic Google Analytics: June 2013

  7. POPE white smoke francis announced

  8. white smoke francis announced Boston Marathon Bombs Detonate MIT Officer Shot FBI Releases Photo Pursuit and Firefight Begin Ricin Letters 2nd Suspect in Custody Waco Fertilizer Explosion
  9. what makes up Disqus?

  10. well, there are a few answers

  11. what makes up Disqus?

  12. mostly Python

  13. (and javascript)

  14. but mostly Python

  15. and we love it!

  16. why python? ๏ because it is fast… ๏ to develop in ๏ good community ๏ lots of libraries ๏ so active we can’t keep up with it ๏ is anybody using 3.3 yet? ๏ Disqus has a really good dev loop for it
  17. what makes up Disqus?

  18. a HUUUGE Django app

  19. which is a pain

  20. (sometimes)

  21. but our dev loop is good

  22. how does our dev loop work? ๏ diff code (with phabricator) ๏ get that reviewed ๏ review with phabricator ๏ ci done by jenkins ๏ land it on master (git push) ๏ wait for the daily deploy (ops team) ๏ and we are moving to auto deploy (we’ll revisit this one)
  23. what makes up Disqus?

  24. emerging service architecture

  25. break it down ๏ mostly Django ๏ two years ago 95% Django* ๏ today ~70% Django* ๏ what is in that growing gap? ๏ *totally made up numbers
  26. why not just add more code to the monolithic app?

  27. close look at some pain points

  28. disqus-web (kinda monolithic) ๏ Django + celery on postgres + rabbit ๏ we roll up commits for a deploy ๏ risk of revert that isn’t your fault ๏ you bring down the whole thing if you forgot to remove that pdb.set_trace() ๏ high scrutiny in code review (see above) ๏ high volume of code review (code review RTT can be a full day) ๏ lots of legacy code to work around (and not break)
  29. problems that don’t fit ๏ high concurrency ๏ isolation ๏ feature ๏ failure ๏ speed (cpu cycles) ๏ speed (dev iteration cycle) ๏ fun
  30. what did we play with?

  31. for fun and concurrency! ๏ nginx + nginx-modules ๏ https://speakerdeck.com/northisup/scaling-realtime-at-disqus?slide=39 ๏ nginx + lua ๏ https://github.com/NorthIsUp/nginx-oauth-on-dotcloud/blob/master/nginx/access.persona.lua.in ๏ go ๏ http://blog.disqus.com/post/51155103801/trying-out-this-go-thing
  32. embedly in nginx + lua
    local http = require "resty.http"
    local cjson = require "cjson"

    -- build the oEmbed request from nginx variables
    local url = "https://api.embed.ly/1/oembed?key=" .. ngx.var.api_key .. "&url=" .. ngx.var.url

    local hc = http:new()
    local ok, code, headers, status, body = hc:request {
        url = url,
        method = "GET"
    }
    -- pass any non-200 upstream status straight back to the client
    if code ~= 200 then
        ngx.exit(code)
    end

    local thumbnail = cjson.decode(body).thumbnail_url
    ngx.var.thumbnail = thumbnail
  33. python still works too

  34. isolation and iteration speed ๏ failure should be isolated to a small service ๏ successes should be allowed to occur quickly ๏ python is still great for these use cases
  35. when we needed it right now! ๏ if __name__ == '__main__': ๏ from wsgiref.simple_server import make_server ๏ from xmlrpclib import ServerProxy
  36. it works great! ๏ for one deploy ๏ then you have to update it…
  37. lessons ๏ consistency is good! ๏ protects you from somebody quitting or getting hit by a bus ๏ anybody can just pick it up and run ๏ simplicity is good ๏ modularity is good ๏ the ability to borrow/combine features from other projects ๏ but copypasta is bad ๏ and not bugging ops for a deploy is the best
  38. enter disqus-service

  39. disqus-service ๏ the goal is consistent/free access to… ๏ config ๏ switches ๏ logging ๏ stats ๏ other systems ๏ convention over configuration ๏ but allow for configuration ๏ should be easy to run
  40. @service ๏ decorate a function with @service ๏ that is basically it, you can now run it
  41. hello.py
    from disqus.service.application import service

    @service
    def world():
        print "hello world"

    $ toil run hello.world
    hello world
  42. but you want to do more! ๏ class Service: ๏ @handler ๏ @on_message ๏ @pre_config ๏ @post_config ๏ @pre_setup ๏ @post_setup ๏ @pre_update_config ๏ @post_update_config
  43. service lifecycle ๏ config: @pre_config, self.config(), @post_config ๏ setup: @pre_setup, self.setup(), @post_setup ๏ run: @handler
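The ordering on slide 43 can be sketched in a few lines. This is an illustration under assumptions, not the disqus-service implementation: the `hook` helper, the `_phase` attribute, and the `log` list are all made up for the example, and hooks are found by a simple attribute tag.

```python
def hook(phase):
    """Tag a method as a hook for the given phase (hypothetical helper)."""
    def mark(func):
        func._phase = phase
        return func
    return mark


class Service(object):
    def __init__(self):
        self.log = []  # records the order in which things ran

    def config(self):
        self.log.append("config")

    def setup(self):
        self.log.append("setup")

    def _run_hooks(self, phase):
        # Look at every attribute on the class and call the tagged ones.
        for name in sorted(dir(type(self))):
            attr = getattr(type(self), name, None)
            if getattr(attr, "_phase", None) == phase:
                attr(self)

    def run(self):
        self._run_hooks("pre_config")
        self.config()
        self._run_hooks("post_config")
        self._run_hooks("pre_setup")
        self.setup()
        self._run_hooks("post_setup")
        self._run_hooks("handler")
        return self.log


class World(Service):
    @hook("post_config")
    def connect(self):
        self.log.append("post_config: connect")

    @hook("handler")
    def world(self):
        self.log.append("handler: world")
```

`World().run()` returns the log, which shows the config phase, its post hook, the setup phase, and finally the handler, in that order.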
  44. hello-v2.0.py
    from redis import Redis
    from disqus.service.application import (Service, handler, post_config)

    class World(Service):
        @post_config
        def setup_redis(self):
            # runs after the config phase, so self.config is available
            self.redis = Redis(
                host=self.config.REDIS_HOST,
                port=self.config.REDIS_PORT)

        @handler
        def world(self):
            print self.redis.get("hello world")

    world = World()
  45. helper mixins ๏ FlaskService ๏ the service instance is also a wsgi app ๏ QueueService ๏ lets services talk over many queue types ๏ GeventService ๏ concurrency helpers ๏ RedisService (pre-config redis) ๏ KafkaService (kind of like a queue) ๏ DjangoORMService (db access via django)
  46. real life Disqus code!
    from IPython.frontend.terminal.embed import InteractiveShellEmbed
    from disqus.service.application import handler
    from tempest.services.mixins.data import RedisMixin
    import tempest

    class Shell(RedisMixin):
        @handler
        def shell(self, *args, **kwargs):
            """ runs an ipython shell loaded in the tempest module """
            ipshell = InteractiveShellEmbed()
            ipshell(module=tempest)
  47. growing a service

  48. The ‘old’ ad server “the internet” DISQUS

  49. The ‘tempest’ ad server django nginx + uwsgi + @service Not involved! “the internet” redis @service
  50. The ‘tempest’ ad server DISQUS Not involved! “the internet” redis warmer pre-filter scorer post-filter pusher webservice (flask)
  51. The ‘tempest’ ad server ๏ warmer - ‘warm’ up redis ๏ fetches a lot of data from slow places ๏ pre-filter - remove bad data ๏ scorer - score ads ๏ post-filter - remove more bad data ๏ stasher - put it in redis
  52. scaling services ๏ You don’t know which part will need more... ๏ CPU ๏ I/O ๏ GPU ๏ sockets ๏ bits or bytes ๏ etc. ๏ so keep it simple stupid
  53. looks kinda like this
    class Tempest(Service):
        @post_config
        def setup_redis(self):
            self.redis = Redis(
                host=self.config.REDIS_HOST,
                port=self.config.REDIS_PORT)

        @handler
        def do_all_of_it(self):
            self.warm_cache()
            self.pre_filter()
            self.score()
            self.post_filter()
            self.push_to_redis()

    $ toil run tempest.tempest
  54. after running it we found that ๏ warmer - really, really slow (10 min/task) ๏ pre-filter - super fast ๏ scorer - kinda ok (several seconds/task) ๏ post-filter - super fast ๏ pusher - super fast
  55. The ‘tempest’ ad server 2.0 DISQUS Not involved! “the internet” redis post-filter pusher webservice warmer pre-filter scorer
  56. looks kinda like this
    class RedisService(Service):
        @post_config
        def setup_redis(self):
            self.redis = Redis(
                host=self.config.REDIS_HOST,
                port=self.config.REDIS_PORT
            )

    class Tempest(RedisService):
        @handler
        def do_it(self):
            self.warm_cache()
            ads = self.pre_filter()
            ads = self.score(ads)
            ads = self.post_filter(ads)
            self.push_to_redis(ads)
  57. looks kinda like this
    class Warmer(RedisService):
        @handler
        def warm_ads(self):
            self.warm_cache()
            self.pre_filter()

    class Scorer(QueueService):
        @handler
        def score_ads(self):
            ads = self.score()
            self.q.put(ads)

    class Pusher(QueueService, RedisService):
        @on_message
        def handle_ads(self, ads):
            ads = self.post_filter(ads)
            self.push_to_redis(ads)
  58. What actually runs
    $ toil run tempest.warmer # run 16 procs
    $ toil run tempest.scorer # run 2 procs
    $ toil run tempest.pusher # run 2 procs
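The hand-off between the scorer and the pusher in slides 57 and 58 can be imitated with the standard library. This is a toy sketch under assumptions: a `queue.Queue` stands in for whatever queue type `QueueService` wraps, a dict stands in for Redis, the scoring and filtering rules are invented, and it is written for Python 3.

```python
import queue
import threading

STOP = object()  # sentinel telling the consumer to exit


def scorer(ads, q):
    """Score each ad and hand it to the next stage over the queue."""
    for ad in ads:
        q.put((ad, len(ad)))  # toy score: length of the ad name
    q.put(STOP)


def pusher(q, store):
    """Post-filter scored ads and write the survivors to the store."""
    while True:
        item = q.get()
        if item is STOP:
            break
        ad, score = item
        if score > 3:          # toy post-filter
            store[ad] = score  # a dict stands in for Redis here


def run_pipeline(ads):
    q, store = queue.Queue(), {}
    consumer = threading.Thread(target=pusher, args=(q, store))
    consumer.start()
    scorer(ads, q)
    consumer.join()
    return store
```

Because the stages only share the queue, each one can be given its own number of workers, which is the point of running 16 warmers next to 2 scorers and 2 pushers.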
  59. start simple, then iterate ๏ services are good for when you need to break out work ๏ but don’t prematurely break out work ๏ because you are probably wrong ๏ measure first ๏ make decisions on data
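One cheap way to "measure first" is to time each stage of the single-process version before splitting anything out. The helper below is a hypothetical sketch, not Disqus code; the stage names and functions passed in are up to the caller.

```python
import time


def time_stages(stages, data):
    """Run (name, func) stages in order; return the result and seconds per stage."""
    timings = {}
    for name, func in stages:
        start = time.perf_counter()
        data = func(data)
        timings[name] = time.perf_counter() - start
    return data, timings
```

Feeding it something like `[("warm", warm), ("score", score)]` returns the per-stage timings, which is the kind of data that showed the warmer taking minutes while the filters were nearly free.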
  60. how do those decorators work? they were cool

  61. stolen from exam package ๏ written by @nfluxx for Disqus ๏ literally the same code but for tests ๏ replace setUp with @before ๏ makes test fixtures easy with @fixture ๏ check out exam/cases.py, line 35 ๏ github.com/Fluxx/exam
  62. how the decorators work
    class base(object):
        """
        Base for the decorators. Allows a decorator to access the callable.
        """
        def __init__(self, *things):
            self.init_callables = things

        def __call__(self, instance):
            return self.init_callables[0](instance)

    class post_config(base):
        """
        Runs after the config phase.
        """
        pass
  63. how the decorators work
    import inspect

    class BaseService(object):
        def __attrs_of_type(self, kind):
            # walk the MRO from the most basic class to the concrete one
            for base in reversed(inspect.getmro(type(self))):
                for attr, class_value in vars(base).items():
                    resolved_value = getattr(type(self), attr, False)
                    if not isinstance(resolved_value, kind):
                        continue
                    elif class_value is not resolved_value:
                        # overridden in a subclass, skip the stale copy
                        continue
                    else:
                        yield attr, resolved_value

        def __run_hooks(self, hook):
            return [value(self) for _, value in self.__attrs_of_type(hook)]

        # ... snip ...
  64. how the decorators work
    from contextlib import contextmanager

    class BaseService(object):
        # ... snip ...

        def __call__(self):
            results = self.__run_hooks(handler)

        @contextmanager
        def around_config(self):
            self.__run_hooks(pre_config)
            yield
            self.__run_hooks(post_config)
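Slides 62 to 64 can be condensed into a version that runs on its own. It follows the same idea (hook classes found with `isinstance` while walking the MRO) but it is a simplified sketch: `RedisService` and `Tempest` here are stand-ins that return strings, `run_hooks` is made public for the demo, and the code targets Python 3.

```python
import inspect


class base(object):
    """Wraps a function so hooks can be recognised with isinstance()."""
    def __init__(self, func):
        self.func = func

    def __call__(self, instance):
        return self.func(instance)


class post_config(base):
    """Marks a method that should run after the config phase."""


class BaseService(object):
    def _attrs_of_type(self, kind):
        # Walk the MRO from the most basic class down to the concrete one.
        for klass in reversed(inspect.getmro(type(self))):
            for attr, class_value in vars(klass).items():
                resolved = getattr(type(self), attr, None)
                # Skip non-hooks and hooks that a subclass has overridden.
                if isinstance(resolved, kind) and class_value is resolved:
                    yield attr, resolved

    def run_hooks(self, kind):
        return [hook(self) for _, hook in self._attrs_of_type(kind)]


class RedisService(BaseService):
    @post_config
    def setup_redis(self):
        return "redis ready"


class Tempest(RedisService):
    @post_config
    def setup_scorer(self):
        return "scorer ready"
```

Because the MRO is reversed, `Tempest().run_hooks(post_config)` runs the mixin's hook before the subclass's own, so a service can rely on what its base classes set up.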
  65. more about mixins

  66. or, with great power comes great responsibility

  67. data pipelines
    class Pipeline(object):
        def parse(self, data):
            raise NotImplementedError('No ParserMixin used')

        def compute(self, data, parsed_data):
            raise NotImplementedError('No ComputeMixin used')

        def publish(self, data, parsed_data, computed_data):
            raise NotImplementedError('No PublisherMixin used')

        def pipe_data(self, data):
            parsed_data = self.parse(data)
            computed_data = self.compute(data, parsed_data)
            return self.publish(data, parsed_data, computed_data)
  68. example mixins
    import json
    import urllib2

    class JSONParserMixin(Pipeline):
        def parse(self, data):
            return json.loads(data)

    class SuperSecureEncryptDataMixin(Pipeline):
        def compute(self, data, parsed_data):
            return parsed_data.encode('rot13')

    class AnnomizeDataMixin(Pipeline):
        def compute(self, data, parsed_data):
            return {}

    class HTTPPublisherMixin(Pipeline):
        def publish(self, data, parsed_data, computed_data):
            u = urllib2.urlopen(self.dat_url, computed_data)
            return u

    class FilePublisherMixin(Pipeline):
        def publish(self, data, parsed_data, computed_data):
            with open(self.output, 'a') as f:
                f.write(computed_data)
  69. None
  70. final pipeline
    class JSONAnnonHTTPPipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            HTTPPublisherMixin):
        pass

    class JSONSecureHTTPPipeline(
            JSONParserMixin,
            SuperSecureEncryptDataMixin,
            HTTPPublisherMixin):
        pass

    class JSONAnnonFilePipeline(
            JSONParserMixin,
            AnnomizeDataMixin,
            FilePublisherMixin):
        pass
  71. what about tests?
    class JSONAnnonHTTPPipelineTest(
            BasePipelineTest,
            JSONParserMixinTest,
            AnnomizeDataMixinTest,
            HTTPPublisherMixinTest):
        pass
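To make the pipeline and test-mixin pattern from slides 67 to 71 concrete, here is a small self-contained version. `KeyCountMixin`, `ListPublisherMixin`, and the two test mixins are hypothetical stand-ins chosen so the example runs without a network or a file. It is a sketch for Python 3, not the code from the talk.

```python
import json
import unittest


class Pipeline(object):
    def parse(self, data):
        raise NotImplementedError("No ParserMixin used")

    def compute(self, data, parsed_data):
        raise NotImplementedError("No ComputeMixin used")

    def publish(self, data, parsed_data, computed_data):
        raise NotImplementedError("No PublisherMixin used")

    def pipe_data(self, data):
        parsed_data = self.parse(data)
        computed_data = self.compute(data, parsed_data)
        return self.publish(data, parsed_data, computed_data)


class JSONParserMixin(Pipeline):
    def parse(self, data):
        return json.loads(data)


class KeyCountMixin(Pipeline):  # hypothetical compute step
    def compute(self, data, parsed_data):
        return len(parsed_data)


class ListPublisherMixin(Pipeline):  # hypothetical in-memory publisher
    def __init__(self):
        self.published = []

    def publish(self, data, parsed_data, computed_data):
        self.published.append(computed_data)
        return computed_data


class JSONKeyCountListPipeline(JSONParserMixin, KeyCountMixin, ListPublisherMixin):
    pass


# Each mixin ships a matching test mixin; a pipeline's test case combines them.
class JSONParserMixinTest(object):
    def test_parse(self):
        self.assertEqual(self.pipeline.parse('{"a": 1}'), {"a": 1})


class KeyCountMixinTest(object):
    def test_compute(self):
        self.assertEqual(self.pipeline.compute(None, {"a": 1}), 1)


class JSONKeyCountListPipelineTest(JSONParserMixinTest, KeyCountMixinTest,
                                   unittest.TestCase):
    def setUp(self):
        self.pipeline = JSONKeyCountListPipeline()
```

The composed test case inherits one test per mixin, so adding a new pipeline variant costs a class statement and a `setUp`.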

  72. open source as output ๏ we have lots of open source stuff ๏ sentry for errors ๏ nydus (for redis connections) ๏ django-mailviews ๏ gargoyle (and v2.0 renamed to gutter) ๏ and more! https://github.com/disqus/
  73. what about disqus-service? ๏ not open source, yet ๏ but I want your feedback! ๏ come talk to me about how ๏ you solved the same issue ๏ how my solution is stupid ๏ how my solution is awesome ๏ would you actually use it?
  74. What did I just talk about? ๏ lots of python at Disqus! ๏ but not all of it ๏ but most of it ๏ because sometimes other tools are better ๏ services are awesome ๏ cool decorator hacking ๏ mixins make testing super easy
  75. psst, we’re hiring disqus.com/jobs If this was interesting to you...

  76. I don’t have all the answers... ๏ How do you run services? ๏ RPC vs HTTP vs Queues (I like queues) ๏ what do you use for highly concurrent systems? ๏ do you lint for `print`, `pdb.set_trace()`, or `import debug`? ๏ if yes, can I have it? ๏ dev != prod, or does it? ๏ if you run uwsgi + nginx in prod ๏ why dev with ./manage.py runserver? @NorthIsUp
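As one possible answer to the lint question on the last slide, the standard `ast` module can flag leftover debug calls. This is a minimal sketch with assumptions: it parses Python 3 source (where `print` is a function call, unlike the Python 2 code in the talk), it only catches the literal spelling `pdb.set_trace()`, and `find_debug_calls` is a made-up name.

```python
import ast


def find_debug_calls(source):
    """Return sorted (line, name) pairs for print() and pdb.set_trace() calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id == "print":
            hits.append((node.lineno, "print"))
        elif (isinstance(func, ast.Attribute) and func.attr == "set_trace"
              and isinstance(func.value, ast.Name) and func.value.id == "pdb"):
            hits.append((node.lineno, "pdb.set_trace"))
    return sorted(hits)
```

Run over each changed file in a pre-commit hook or a CI step, a non-empty result would be enough to block the diff before review.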