How Disqus Does SOA on Django

Adam
July 09, 2014

Over the past few years, Disqus has become one of the biggest Django apps in existence, crossing over a billion unique visitors a month. But sometimes Django isn't the right tool for the job.

Join Disqus engineer Adam Hitchcock to learn how nginx modules and Lua can replace Python services, about the infrastructure that launched realtime and ad services, as well as about some of the failures they've encountered along the way.

** For more resources and a video of this presentation, head to http://mrkn.co/0h7ey

Transcript

  1. TOC ๏ What is a Disqus? ๏ Why did you lie to us last year, Adam? ๏ What is SOA? and Why should you SOA? ๏ Different Data patterns in SOA ๏ How Disqus does SOA ๏ Legacy example ๏ New service example ๏ Is SOA at Disqus a success?
  2. Why do I sit on a throne of lies? ๏ “Double down on Django” - my CTO ๏ leverage Django community ๏ standard practices make hiring easier ๏ we are already really good at Django stuff ๏ Challenge assumptions and find ways to use Django outside of the normal web/request pattern
  3. What is a SOA? ๏ Architecting systems to contain… ๏ discrete software applications (services) ๏ simple, well defined interfaces (APIs) ๏ loose cooperation to perform a required function ๏ Two software roles in SOA ๏ service provider ๏ service consumer ๏ an app may play both roles
  4. SOA is not a new idea ๏ “Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.” - Doug McIlroy
  5. Services you already use ๏ Databases ๏ Postgres, MySQL, redis, etc. ๏ Queues ๏ Rabbit, Kafka (I know it’s not a queue…) ๏ External APIs ๏ Twitter, Facebook, etc. ๏ Single page javascript apps
  6. Why SOA? ๏ Allows for heterogeneous environment ๏ Data location transparency ๏ Small stable apis ๏ Independent scalability ๏ Easier testability ๏ Easier deployment ๏ Easier to maintain conceptual integrity
  7. Data patterns of services ๏ Transactional Data ๏ REST ๏ Model access ๏ RPC (procedural) ๏ High logic endpoints (recommendations) ๏ Auth Systems ๏ Async Data ๏ Queues ๏ Pub/Sub
  8. Stable APIs ๏ Pick your interface definition language (IDL) ๏ JSON, Protobuf, Thrift, etc. ๏ Pick your transport protocol ๏ HTTP, Thrift, etc. ๏ I like HTTP + JSON, Django is pretty good at it ๏ “Accept” header or “format” param ๏ “Connection: Keep-Alive”
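The “Accept” header or “format” param choice on slide 8 can be sketched in plain Python. This is only an illustration, not Disqus code: the function name `choose_format`, the `SUPPORTED` table, and the `application/x-protobuf` media type are assumptions, and q-values in the header are ignored for brevity.

```python
# Hypothetical mapping of short format names to media types.
SUPPORTED = {
    "json": "application/json",
    "protobuf": "application/x-protobuf",  # assumed media type
}


def choose_format(accept_header="", format_param=None, default="json"):
    """Pick a response format from a "format" param or an "Accept" header."""
    # An explicit, supported "format" param wins over the header.
    if format_param:
        name = format_param.strip().lower()
        if name in SUPPORTED:
            return name
    # Otherwise take the first supported media type listed in "Accept".
    # (q-values are ignored in this sketch.)
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        for name, mime in SUPPORTED.items():
            if media_type == mime:
                return name
    # Nothing matched: fall back to the default format.
    return default
```

In a Django view the two inputs would typically come from `request.META.get("HTTP_ACCEPT", "")` and `request.GET.get("format")`.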
  9. REST + Django ๏ This is where Django already excels ๏ Django Rest Framework ๏ Or roll your own thin API
  10. RPC + Django ๏ Useful for logic heavy APIs ๏ recommendation ๏ authentication and authorization ๏ Prone to overspecialized APIs ๏ RPC systems can hide network costs too much ๏ Thrift ๏ zerorpc
  11. Async + Django ๏ High cpu or long running task ๏ Django Management Commands ๏ while True: do_work() ๏ Celery ๏ post_save hook + celery task ๏ easy to parse celery in any language ๏ go-celery ๏ Celery Beat for periodic tasks
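The `while True: do_work()` pattern from slide 11 might look roughly like the loop below. It is a minimal sketch in plain Python so it runs without Django; `run_worker`, `do_work`, `should_stop`, and `idle_sleep` are hypothetical names. In a real project the loop would usually sit inside the `handle()` method of a `BaseCommand` subclass.

```python
import logging
import time

logger = logging.getLogger(__name__)


def run_worker(do_work, should_stop, idle_sleep=1.0, sleep=time.sleep):
    """Call do_work() repeatedly until should_stop() returns True.

    do_work() is expected to return True when it processed a job and
    False when there was nothing to do. Returns the number of jobs done.
    """
    processed = 0
    while not should_stop():
        try:
            if do_work():
                processed += 1
            else:
                # Nothing to do: back off instead of spinning the CPU.
                sleep(idle_sleep)
        except Exception:
            # One bad job should not kill a long-running worker.
            logger.exception("do_work() failed; continuing")
            sleep(idle_sleep)
    return processed
```

Passing `should_stop` and `sleep` in as parameters is a design choice for this sketch: it makes the loop testable and gives a signal handler somewhere to hook in a clean shutdown.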
  12. Django is easy to run ๏ Django IO Loop ๏ easy to run ๏ easy to understand ๏ Multiple entry points into Django ๏ WSGI ๏ management command ๏ Celery task ๏ Celery beat
  13. Disqus Web ๏ Monolithic Django project ๏ 183,108 lines of code ๏ Over 7 years old ๏ Lots of bad decisions
  14. Deployment ๏ Deploy the entire code base (as a lib) ๏ Cluster machines by purpose ๏ cpu/memory/network patterns emerge ๏ makes scale planning easier ๏ Requests routed to services based on hostname + path ๏ Three phase deployment
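Routing on hostname + path, as described on slide 14, can be sketched as a longest-prefix lookup. The hostnames, path prefixes, and cluster names below are invented for illustration; the slides do not say how Disqus implements routing, and in practice this usually lives in a load balancer or nginx config rather than in Python.

```python
# Hypothetical routing table: (hostname, path prefix, cluster name).
ROUTES = [
    ("api.example.com", "/internal/", "internal-objects-api"),
    ("api.example.com", "/", "public-api"),
    ("www.example.com", "/", "web"),
]


def route(host, path, routes=ROUTES):
    """Return the cluster for a request, or None if nothing matches.

    When several prefixes match on the same host, the longest one wins.
    """
    best = None
    for route_host, prefix, cluster in routes:
        if host == route_host and path.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, cluster)
    return best[1] if best else None
```

Because each cluster serves one purpose, its cpu, memory, and network profile stays predictable, which is the scale-planning benefit the slide mentions.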
  15. Entry points ๏ Different entry points to change purpose ๏ DJANGO_SETTINGS_MODULE ๏ multiple settings.py ๏ multiple urls.py ๏ Using different settings.py files we can… ๏ load different middleware ๏ load different url resolvers ๏ url resolution is expensive ๏ different template request contexts
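One way to picture the entry points on slide 15: each service starts the same code base with a different `DJANGO_SETTINGS_MODULE`. The sketch below is an assumption-laden illustration; the module paths (`myproject.settings.public_api` and so on) and the helper `select_settings` are hypothetical, and only the `DJANGO_SETTINGS_MODULE` environment variable and the `ROOT_URLCONF` setting are real Django names.

```python
import os

# Hypothetical settings modules, one per entry point. Each module would
# set its own ROOT_URLCONF (its own urls.py) and its own middleware list.
SETTINGS_BY_ENTRY_POINT = {
    "public_api": "myproject.settings.public_api",      # lots of middleware
    "internal_api": "myproject.settings.internal_api",  # no middleware
}


def select_settings(entry_point, environ=None):
    """Point Django at the settings module for the given entry point.

    Must be called before Django is imported and configured.
    Raises KeyError for an unknown entry point.
    """
    environ = os.environ if environ is None else environ
    module = SETTINGS_BY_ENTRY_POINT[entry_point]
    environ["DJANGO_SETTINGS_MODULE"] = module
    return module
```

A WSGI file, a management command wrapper, or a Celery worker could each call `select_settings(...)` with its own entry point name before loading Django.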
  16. Example Services ๏ Public api ๏ a ton of middleware ๏ hundreds of url routes ๏ lots of automatic request context ๏ Internal objects api ๏ no middleware ๏ one url route ๏ no request context for transformers
  17. Did it work? ๏ Problems typical of a large code base ๏ Version conflicts still problematic ๏ internal function api changes ๏ external package upgrades ๏ Conceptual integrity still hard ๏ you can only remember so many lines of code ๏ Constantly integrating with entire code base
  18. Did it work? ✓ Allows for heterogeneous environment ✓ Data location transparency ✗ Small stable apis ✓ Independent scalability ✗ Easier testability ✓ Easier deployment ✗ Easier to maintain conceptual integrity
  19. The Disqus Ads server ๏ Use Django apps for encapsulation ๏ Leverage Django beyond WSGI ๏ Multiple code bases ๏ only one codebase can access the DB directly ๏ others access via REST or RPC APIs
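The rule on slide 19 (one codebase owns the database, everything else goes through an API) implies that consumers hold a thin HTTP client instead of ORM models. A minimal sketch follows, assuming a JSON-over-HTTP endpoint; the class `AdsDataClient`, the `/ads/<id>/` URL layout, and the host name in the usage note are all hypothetical and are not taken from the talk.

```python
import json
from urllib.request import urlopen


class AdsDataClient:
    """Hypothetical consumer-side client for a data-owning service."""

    def __init__(self, base_url, fetch=None):
        self.base_url = base_url.rstrip("/")
        # The transport is injectable so tests can avoid the network.
        self._fetch = fetch or self._http_get

    @staticmethod
    def _http_get(url):
        # A short timeout keeps a slow provider from stalling the consumer.
        with urlopen(url, timeout=2) as response:
            return response.read().decode("utf-8")

    def get_ad(self, ad_id):
        """Fetch one ad as a dict; this service never touches the DB."""
        url = "{}/ads/{}/?format=json".format(self.base_url, ad_id)
        return json.loads(self._fetch(url))
```

For example, `AdsDataClient("http://ads-data.internal").get_ad(7)` would issue a GET to `http://ads-data.internal/ads/7/?format=json` and return the decoded JSON body.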
  20. Lots o’ services ๏ Ads Data API ๏ Django REST framework ๏ minimal RPC endpoints ๏ Ads Serving API ๏ RPC endpoint ๏ Ads Scoring & Ads Cache/Time-Series Warming ๏ Management command ๏ Ads Data Import ๏ Celery + Celery Beat
  21. Code organization ๏ Ads Data service ๏ Ads Data Import service ๏ 11,400 lines ๏ Ads Serving service ๏ Ads Scoring service ๏ Ads Cache Warming service ๏ Ads Time-Series Warming service ๏ 11,185 lines
  22. What does it look like? [Architecture diagram; only its labels survive. Components: The Internet, Ads Data API, Ads Serving API, Cache, Legacy Disqus Web Monolithic, Ads Scoring Service, Ads Data Import Service, Internal Ads Tooling, Advertiser dashboard, Disqus Embed, Gutter Feature Switch Service, Ads Cache Warming Service, Ads Time-Series Warming Service. Legend: Has ORM access, No ORM access.]
  23. What does it look like? [Architecture diagram labeled by technology; only its labels survive, listed in extraction order: The Internet, Django uwsgi REST, Django uwsgi RPC, Redis, Django + a million custom things, Django Celery Beat, javascript backbone, javascript backbone, Disqus Embed, Django uwsgi, Django manage.py command, Django manage.py command, Django Celery Beat. Legend: Has ORM access, No ORM access.]
  24. Did it work? ๏ Harder to share code between services ๏ need to use external packages ๏ Django best practices help a lot long term ๏ Easy to understand the entire system ๏ easy to quickly add + test code ๏ integration tests are more important ๏ service apis live longer, need more support ๏ Fast deploys and tests ๏ Ease of scalability
  25. Did it work? ✓ Allows for heterogeneous environment ✓ Location transparency ✓ Small stable apis ✓ Independent scalability ✓ Easier testability ✓ Easier deployment ✓ Easier to maintain conceptual integrity
  26. Is SOA a success for Disqus? ๏ Easier to run overall ๏ Easier to understand new systems ๏ Easier to not break existing systems
  27. Roundup ๏ “Do one thing and do it well” ๏ Know what data pattern you are solving for ๏ Stick to your API decisions ๏ protocol ๏ transport ๏ Django has multiple entry points, use them
  28. Links ๏ Support Django REST Framework on Kickstarter ๏ kickstarter.com/projects/tomchristie/django-rest-framework-3 ๏ django-rest-framework.org ๏ github.com/mattrobenolt/go-celery ๏ lincolnloop.com/django-best-practices/ ๏ en.wikipedia.org/wiki/Unix_philosophy
  29. Questions for me ๏ “What was the most challenging part of designing the system?” - Tom Christie of Django REST Framework ๏ “I thought you would say ‘designing it so it doesn't go horribly horribly wrong when one part breaks’” - Also Tom Christie of Django REST Framework
  30. Questions for you ๏ How do you make maintainable RPC endpoints? ๏ What are the best methods of service discovery? ๏ Which is better: one codebase or many? ๏ What is a microservice?