Slide 1

Slide 1 text

Adam Hitchcock @NorthIsUp How DISQUS does ‘it’ when ‘it’ isn’t Django

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

we’re hiring disqus.com/jobs If this is interesting to you...

Slide 4

Slide 4 text

what is DISQUS?

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

DISQUS sees a lot of traffic Google Analytics: June 2013

Slide 7

Slide 7 text

POPE white smoke francis announced

Slide 8

Slide 8 text

white smoke francis announced Boston Marathon Bombs Detonate MIT Officer Shot FBI Releases Photo Pursuit and Firefight Begin Ricin Letters 2nd Suspect in Custody Waco Fertilizer Explosion

Slide 9

Slide 9 text

what makes up Disqus?

Slide 10

Slide 10 text

well, there are a few answers

Slide 11

Slide 11 text

what makes up Disqus?

Slide 12

Slide 12 text

mostly Python

Slide 13

Slide 13 text

(and javascript)

Slide 14

Slide 14 text

but mostly Python

Slide 15

Slide 15 text

and we love it!

Slide 16

Slide 16 text

why python?
๏ because it is fast…
๏ to develop in
๏ good community
๏ lots of libraries
๏ so active we can’t keep up with it
๏ is anybody using 3.3 yet?
๏ Disqus has a really good dev loop for it

Slide 17

Slide 17 text

what makes up Disqus?

Slide 18

Slide 18 text

a HUUUGE Django app

Slide 19

Slide 19 text

which is a pain

Slide 20

Slide 20 text

(sometimes)

Slide 21

Slide 21 text

but our dev loop is good

Slide 22

Slide 22 text

how does our dev loop work?
๏ diff code (with phabricator)
๏ get that reviewed
๏ review with phabricator
๏ ci done by jenkins
๏ land it on master (git push)
๏ wait for the daily deploy (ops team)
๏ and we are moving to auto deploy (we’ll revisit this one)

Slide 23

Slide 23 text

what makes up Disqus?

Slide 24

Slide 24 text

emerging service architecture

Slide 25

Slide 25 text

break it down
๏ mostly Django
๏ two years ago 95% Django*
๏ today ~70% Django*
๏ what is in that growing gap?
๏ *totally made up numbers

Slide 26

Slide 26 text

why not just add more code to the monolithic app?

Slide 27

Slide 27 text

close look at some pain points

Slide 28

Slide 28 text

disqus-web (kinda monolithic)
๏ Django + celery on postgres + rabbit
๏ we roll up commits for a deploy
๏ risk of revert that isn’t your fault
๏ you bring down the whole thing if you forgot to remove that pdb.set_trace()
๏ high scrutiny in code review (see above)
๏ high volume of code review (code review RTT can be a full day)
๏ lots of legacy code to work around (and not break)

Slide 29

Slide 29 text

problems that don’t fit
๏ high concurrency
๏ isolation
๏ feature
๏ failure
๏ speed (cpu cycles)
๏ speed (dev iteration cycle)
๏ fun

Slide 30

Slide 30 text

what did we play with?

Slide 31

Slide 31 text

for fun and concurrency!
๏ nginx + nginx-modules
๏ https://speakerdeck.com/northisup/scaling-realtime-at-disqus?slide=39
๏ nginx + lua
๏ https://github.com/NorthIsUp/nginx-oauth-on-dotcloud/blob/master/nginx/access.persona.lua.in
๏ go
๏ http://blog.disqus.com/post/51155103801/trying-out-this-go-thing

Slide 32

Slide 32 text

embedly in nginx + lua
    local http = require "resty.http"
    local cjson = require "cjson"
    -- build the embedly oembed request from nginx variables
    local url = "https://api.embed.ly/1/oembed?key=" .. ngx.var.api_key .. "&url=" .. ngx.var.url
    local hc = http:new()
    local ok, code, headers, status, body = hc:request { url = url, method = "GET" }
    if code ~= 200 then
        ngx.exit(code)
    end
    -- hand the thumbnail url back to nginx
    local thumbnail = cjson.decode(body).thumbnail_url
    ngx.var.thumbnail = thumbnail

Slide 33

Slide 33 text

python still works too

Slide 34

Slide 34 text

isolation and iteration speed
๏ failure should be isolated to a small service
๏ successes should be allowed to occur quickly
๏ python is still great for these use cases

Slide 35

Slide 35 text

when we needed it right now!
๏ if __name__ == '__main__':
๏ from wsgiref.simple_server import make_server
๏ from xmlrpclib import ServerProxy
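
A minimal sketch of the "right now" style of service, using only the standard library's wsgiref. The `app`, `call_once`, and `serve` names and the port are assumptions for illustration, not Disqus code, and the sketch is written for Python 3 rather than the Python 2 of the slides.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A tiny WSGI app: answers every request with plain text."""
    body = b"hello world\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_once(wsgi_app, path="/"):
    """Call the app in-process, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body


def serve(port=8000):
    """Block forever serving `app`; the port is an arbitrary choice."""
    make_server("", port, app).serve_forever()
```

Calling `serve()` under an `if __name__ == '__main__':` guard gives the one-file service the slide describes. It is quick to write, which is exactly why every such script ends up slightly different.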

Slide 36

Slide 36 text

it works great!
๏ for one deploy
๏ then you have to update it…

Slide 37

Slide 37 text

lessons
๏ consistency is good!
๏ protects you from somebody quitting or getting hit by a bus
๏ anybody can just pick it up and run
๏ simplicity is good
๏ modularity is good
๏ the ability to borrow/combine features from other projects
๏ but copypasta is bad
๏ and not bugging ops for a deploy is the best

Slide 38

Slide 38 text

enter disqus-service

Slide 39

Slide 39 text

disqus-service
๏ the goal is consistent/free access to…
    ๏ config
    ๏ switches
    ๏ logging
    ๏ stats
    ๏ other systems
๏ convention over configuration
๏ but allow for configuration
๏ should be easy to run

Slide 40

Slide 40 text

@service
๏ decorate a function with @service
๏ that is basically it, you can now run it

Slide 41

Slide 41 text

hello.py
    from disqus.service.application import service
    @service
    def world():
        print "hello world"
$ toil run hello.world
hello world

Slide 42

Slide 42 text

but you want to do more!
๏ class Service:
๏ @handler
๏ @on_message
๏ @pre_config
๏ @post_config
๏ @pre_setup
๏ @post_setup
๏ @pre_update_config
๏ @post_update_config

Slide 43

Slide 43 text

service lifecycle
๏ config
    ๏ @pre_config
    ๏ self.config()
    ๏ @post_config
๏ setup
    ๏ @pre_setup
    ๏ self.setup()
    ๏ @post_setup
๏ run
    ๏ @handler

Slide 44

Slide 44 text

hello-v2.0.py
    from redis import Redis
    from disqus.service.application import (Service, handler, post_config)
    class World(Service):
        @post_config
        def setup_redis(self):
            # runs after the config phase, so self.config is available
            self.redis = Redis(
                host=self.config.REDIS_HOST,
                port=self.config.REDIS_PORT)
        @handler
        def world(self):
            print self.redis.get("hello world")
    world = World()

Slide 45

Slide 45 text

helper mixins
๏ FlaskService
    ๏ the service instance is also a wsgi app
๏ QueueService
    ๏ lets services talk over many queue types
๏ GeventService
    ๏ concurrency helpers
๏ RedisService (pre-config redis)
๏ KafkaService (kind of like a queue)
๏ DjangoORMService (db access via django)

Slide 46

Slide 46 text

real life Disqus code!
    from IPython.frontend.terminal.embed import InteractiveShellEmbed
    from disqus.service.application import handler
    from tempest.services.mixins.data import RedisMixin
    import tempest
    class Shell(RedisMixin):
        @handler
        def shell(self, *args, **kwargs):
            """ runs an ipython shell loaded in the tempest module """
            ipshell = InteractiveShellEmbed()
            ipshell(module=tempest)

Slide 47

Slide 47 text

growing a service

Slide 48

Slide 48 text

The ‘old’ ad server “the internet” DISQUS

Slide 49

Slide 49 text

The ‘tempest’ ad server django nginx + uwsgi + @service Not involved! “the internet” redis @service

Slide 50

Slide 50 text

The ‘tempest’ ad server DISQUS Not involved! “the internet” redis warmer pre-filter scorer post-filter pusher webservice (flask)

Slide 51

Slide 51 text

The ‘tempest’ ad server
๏ warmer - ‘warm’ up redis
๏ fetches a lot of data from slow places
๏ pre-filter - remove bad data
๏ scorer - score ads
๏ post-filter - remove more bad data
๏ pusher - put it in redis

Slide 52

Slide 52 text

scaling services
๏ You don’t know which part will need more...
    ๏ CPU
    ๏ I/O
    ๏ GPU
    ๏ sockets
    ๏ bits or bytes
    ๏ etc.
๏ so keep it simple stupid

Slide 53

Slide 53 text

looks kinda like this
    class Tempest(Service):
        @post_config
        def setup_redis(self):
            self.redis = Redis(
                host=self.config.REDIS_HOST,
                port=self.config.REDIS_PORT)
        @handler
        def do_all_of_it(self):
            # every stage runs in one process, one after the other
            self.warm_cache()
            self.pre_filter()
            self.score()
            self.post_filter()
            self.push_to_redis()
$ toil run tempest.tempest

Slide 54

Slide 54 text

after running it we found that
๏ warmer - really, really, slow (10 min/task)
๏ pre-filter - super fast
๏ scorer - kinda ok, (several seconds/task)
๏ post-filter - super fast
๏ pusher - super fast
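
One way numbers like these could be collected is a small timing helper wrapped around each stage. This is a sketch under assumptions: the `timed` context manager, the `timings` dict, and `slowest` are illustrative names, and the talk does not say how the real measurements were taken.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage):
    """Accumulate wall-clock seconds spent in a named stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = timings.get(stage, 0.0) + elapsed


def slowest(stage_timings):
    """Return the stage name with the largest accumulated time."""
    return max(stage_timings, key=stage_timings.get)
```

Wrapping each call, for example `with timed("warmer"): self.warm_cache()`, leaves a per-stage total in `timings` that can be logged or sent to a stats system before deciding what to split out.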

Slide 55

Slide 55 text

The ‘tempest’ ad server 2.0 DISQUS Not involved! “the internet” redis post-filter pusher webservice warmer pre-filter scorer

Slide 56

Slide 56 text

looks kinda like this
    class RedisService(Service):
        @post_config
        def setup_redis(self):
            self.redis = Redis(
                host=self.config.REDIS_HOST,
                port=self.config.REDIS_PORT
            )
    class Tempest(RedisService):
        @handler
        def do_it(self):
            self.warm_cache()
            ads = self.pre_filter()
            ads = self.score(ads)
            ads = self.post_filter(ads)
            self.push_to_redis(ads)

Slide 57

Slide 57 text

looks kinda like this
    class Warmer(RedisService):
        @handler
        def warm_ads(self):
            self.warm_cache()
            self.pre_filter()
    class Scorer(QueueService):
        @handler
        def score_ads(self):
            ads = self.score()
            self.q.put(ads)  # hand scored ads to the pusher
    class Pusher(QueueService, RedisService):
        @on_message
        def handle_ads(self, ads):
            ads = self.post_filter(ads)
            self.push_to_redis(ads)

Slide 58

Slide 58 text

What actually runs
$ toil run tempest.warmer # run 16 procs
$ toil run tempest.scorer # run 2 procs
$ toil run tempest.pusher # run 2 procs
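
The scorer/pusher split can be sketched with the standard library alone. This assumes an in-process `queue.Queue` and a thread in place of the real queue service and separate processes, and a dict in place of redis; the scoring rule and threshold are placeholders.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def scorer(ads, out_q):
    """Score each ad and hand it to the next stage via the queue."""
    for ad in ads:
        out_q.put({"id": ad, "score": len(ad)})  # len() is a placeholder score
    out_q.put(SENTINEL)


def pusher(in_q, store):
    """Consume scored ads, drop low scores (post-filter), store the rest."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        if item["score"] >= 3:
            store[item["id"]] = item["score"]


def run_pipeline(ads):
    q = queue.Queue()
    store = {}  # stands in for redis
    worker = threading.Thread(target=pusher, args=(q, store))
    worker.start()
    scorer(ads, q)
    worker.join()
    return store
```

Since the stages only share the queue, each one can be given its own process count, as in the `toil run` lines above.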

Slide 59

Slide 59 text

start simple, then iterate
๏ services are good for when you need to break out work
๏ but don’t prematurely break out work
๏ because you are probably wrong
๏ measure first
๏ make decisions on data

Slide 60

Slide 60 text

how do those decorators work? they were cool

Slide 61

Slide 61 text

stolen from exam package
๏ written by @nfluxx for Disqus
๏ literally the same code but for tests
๏ replace setUp with @before
๏ makes test fixtures easy with @fixture
๏ check out exam/cases.py, line 35
๏ github.com/Fluxx/exam

Slide 62

Slide 62 text

how the decorators work
    class base(object):
        """
        Base for the decorators.
        Allows a decorator to access the callable.
        """
        def __init__(self, *things):
            self.init_callables = things
        def __call__(self, instance):
            return self.init_callables[0](instance)
    class post_config(base):
        """ Runs after the config phase. """
        pass

Slide 63

Slide 63 text

how the decorators work
    import inspect
    class BaseService(object):
        def __attrs_of_type(self, kind):
            # walk the MRO from the most generic class to the most specific
            for base in reversed(inspect.getmro(type(self))):
                for attr, class_value in vars(base).items():
                    resolved_value = getattr(type(self), attr, False)
                    if not isinstance(resolved_value, kind):
                        continue
                    elif class_value is not resolved_value:
                        # overridden in a subclass, skip the shadowed one
                        continue
                    else:
                        yield attr, resolved_value
        def __run_hooks(self, hook):
            return [value(self) for _, value in self.__attrs_of_type(hook)]
        # ... snip ...

Slide 64

Slide 64 text

how the decorators work
    from contextlib import contextmanager
    class BaseService(object):
        # ... snip ...
        def __call__(self):
            results = self.__run_hooks(handler)
        @contextmanager
        def around_config(self):
            self.__run_hooks(pre_config)
            yield
            self.__run_hooks(post_config)
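
The three decorator slides can be stitched into one runnable sketch. This is a Python 3 reconstruction under assumptions: the snipped parts are filled with the simplest thing that works, single-underscore method names replace the name-mangled ones, and the `Hello` service is a made-up example.

```python
import inspect
from contextlib import contextmanager


class base(object):
    """Wraps a method so it can be found later by its decorator type."""

    def __init__(self, func):
        self.func = func

    def __call__(self, instance):
        return self.func(instance)


class pre_config(base):
    pass


class post_config(base):
    pass


class handler(base):
    pass


class BaseService(object):
    def _attrs_of_type(self, kind):
        # Walk the MRO from the most generic class down, so parent hooks run first.
        for klass in reversed(inspect.getmro(type(self))):
            for attr, class_value in vars(klass).items():
                resolved = getattr(type(self), attr, None)
                # Keep only hooks of this kind that were not overridden later.
                if isinstance(resolved, kind) and class_value is resolved:
                    yield attr, resolved

    def _run_hooks(self, kind):
        return [hook(self) for _, hook in self._attrs_of_type(kind)]

    @contextmanager
    def around_config(self):
        self._run_hooks(pre_config)
        yield
        self._run_hooks(post_config)

    def __call__(self):
        with self.around_config():
            pass  # a real service would load its config here
        return self._run_hooks(handler)


class Hello(BaseService):
    @post_config
    def setup_greeting(self):
        self.greeting = "hello"

    @handler
    def world(self):
        return self.greeting + " world"
```

The decorators do no work themselves. They only tag methods with a type, and the service finds the tags by scanning its own class hierarchy at run time.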

Slide 65

Slide 65 text

more about mixins

Slide 66

Slide 66 text

or, with great power comes great responsibility

Slide 67

Slide 67 text

data pipelines
    class Pipeline(object):
        def parse(self, data):
            raise NotImplementedError('No ParserMixin used')
        def compute(self, data, parsed_data):
            raise NotImplementedError('No ComputeMixin used')
        def publish(self, data, parsed_data, computed_data):
            raise NotImplementedError('No PublisherMixin used')
        def pipe_data(self, data):
            # each step is supplied by a mixin
            parsed_data = self.parse(data)
            computed_data = self.compute(data, parsed_data)
            return self.publish(data, parsed_data, computed_data)

Slide 68

Slide 68 text

example mixins
    import json
    import urllib2
    class JSONParserMixin(Pipeline):
        def parse(self, data):
            return json.loads(data)
    class SuperSecureEncryptDataMixin(Pipeline):
        def compute(self, data, parsed_data):
            return parsed_data.encode('rot13')
    class AnnomizeDataMixin(Pipeline):
        def compute(self, data, parsed_data):
            return {}
    class HTTPPublisherMixin(Pipeline):
        def publish(self, data, parsed_data, computed_data):
            u = urllib2.urlopen(self.dat_url, computed_data)
            return u
    class FilePublisherMixin(Pipeline):
        def publish(self, data, parsed_data, computed_data):
            with open(self.output, 'a') as f:
                f.write(computed_data)

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

final pipeline
    class JSONAnnonHTTPPipeline(
            JSONParserMixin, AnnomizeDataMixin, HTTPPublisherMixin):
        pass
    class JSONSecureHTTPPipeline(
            JSONParserMixin, SuperSecureEncryptDataMixin, HTTPPublisherMixin):
        pass
    class JSONAnnonFilePipeline(
            JSONParserMixin, AnnomizeDataMixin, FilePublisherMixin):
        pass
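
The composition can be tried end to end without a network or a file. In this Python 3 sketch `Pipeline` and `JSONParserMixin` follow the slides, while `KeysOnlyMixin`, `ListPublisherMixin`, and the two pipeline classes are hypothetical stand-ins chosen so the result is easy to inspect.

```python
import json


class Pipeline(object):
    def parse(self, data):
        raise NotImplementedError("No parser mixin used")

    def compute(self, data, parsed_data):
        raise NotImplementedError("No compute mixin used")

    def publish(self, data, parsed_data, computed_data):
        raise NotImplementedError("No publisher mixin used")

    def pipe_data(self, data):
        parsed_data = self.parse(data)
        computed_data = self.compute(data, parsed_data)
        return self.publish(data, parsed_data, computed_data)


class JSONParserMixin(Pipeline):
    def parse(self, data):
        return json.loads(data)


class KeysOnlyMixin(Pipeline):  # hypothetical compute step
    def compute(self, data, parsed_data):
        return sorted(parsed_data)


class ListPublisherMixin(Pipeline):  # hypothetical in-memory publisher
    def __init__(self):
        self.published = []

    def publish(self, data, parsed_data, computed_data):
        self.published.append(computed_data)
        return computed_data


class JSONKeysListPipeline(JSONParserMixin, KeysOnlyMixin, ListPublisherMixin):
    pass


class JSONKeysNoPublisher(JSONParserMixin, KeysOnlyMixin):
    pass  # forgot the publisher: pipe_data raises NotImplementedError
```

Each mixin overrides exactly one step, so the method resolution order picks one implementation per step, and a forgotten mixin fails loudly through the base class instead of silently doing nothing.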

Slide 71

Slide 71 text

what about tests?
    class JSONAnnonHTTPPipelineTest(
            BasePipelineTest,
            JSONParserMixinTest,
            AnnomizeDataMixinTest,
            HTTPPublisherMixinTest):
        pass
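
A sketch of how test mixins like these might be wired up with unittest. Everything here is illustrative: the `pipeline_class` attribute, `UpperCaseMixin`, and the individual test methods are assumptions, since the slide shows only the final class.

```python
import json
import unittest


class JSONParserMixin(object):
    def parse(self, data):
        return json.loads(data)


class UpperCaseMixin(object):  # hypothetical compute step
    def compute(self, data, parsed_data):
        return {k.upper(): v for k, v in parsed_data.items()}


class JSONUpperPipeline(JSONParserMixin, UpperCaseMixin):
    pass


class BasePipelineTest(object):
    """Not a TestCase itself; concrete tests set `pipeline_class`."""
    pipeline_class = None

    def setUp(self):
        super().setUp()
        self.pipeline = self.pipeline_class()


class JSONParserMixinTest(object):
    def test_parse_returns_dict(self):
        self.assertEqual(self.pipeline.parse('{"a": 1}'), {"a": 1})


class UpperCaseMixinTest(object):
    def test_compute_uppercases_keys(self):
        self.assertEqual(self.pipeline.compute(None, {"a": 1}), {"A": 1})


class JSONUpperPipelineTest(
        BasePipelineTest, JSONParserMixinTest, UpperCaseMixinTest,
        unittest.TestCase):
    pipeline_class = JSONUpperPipeline


def run_tests():
    """Run the composed test case and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(JSONUpperPipelineTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The test mixins inherit from `object` rather than `unittest.TestCase`, so the test runner does not collect them on their own. They only run once combined with a concrete pipeline.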

Slide 72

Slide 72 text

open source as output
๏ we have lots of open source stuff
๏ sentry for errors
๏ nydus (for redis connections)
๏ django-mailviews
๏ gargoyle (and v2.0 renamed to gutter)
๏ and more! https://github.com/disqus/

Slide 73

Slide 73 text

what about disqus-service?
๏ not open source, yet
๏ but I want your feedback!
๏ come talk to me about how
๏ you solved the same issue
๏ how my solution is stupid
๏ how my solution is awesome
๏ would you actually use it?

Slide 74

Slide 74 text

What did I just talk about?
๏ lots of python at Disqus!
๏ but not all of it
๏ but most of it
๏ because sometimes other tools are better
๏ services are awesome
๏ cool decorator hacking
๏ mixins make testing super easy

Slide 75

Slide 75 text

psst, we’re hiring disqus.com/jobs If this was interesting to you...

Slide 76

Slide 76 text

I don’t have all the answers...
๏ How do you run services?
๏ RPC vs HTTP vs Queues (I like queues)
๏ what do you use for highly concurrent systems?
๏ do you lint for `print`, `pdb.set_trace()`, or `import debug`?
๏ if yes, can I have it?
๏ dev != prod, or does it?
๏ if you run uwsgi + nginx in prod
๏ why dev with ./manage.py runserver?
@NorthIsUp
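
On the lint question, one possible starting point is a small check built on the standard library's `ast` module. This is a sketch, not an existing tool: `find_debug_calls` is a made-up name, it only catches direct `print(...)` and `pdb.set_trace()` calls, and because it parses with Python 3 it cannot read Python 2 print statements.

```python
import ast


def find_debug_calls(source):
    """Return sorted (line, description) pairs for leftover debug calls."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id == "print":
            problems.append((node.lineno, "print call"))
        elif (isinstance(func, ast.Attribute)
                and func.attr == "set_trace"
                and isinstance(func.value, ast.Name)
                and func.value.id == "pdb"):
            problems.append((node.lineno, "pdb.set_trace call"))
    return sorted(problems)
```

Run over every changed file in CI, a check like this could fail the build before a stray breakpoint reaches a deploy.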