
Scaling DISQUS - PyCon 2011

David Cramer
September 26, 2011

Transcript

  1. DISQUS • Jason Yan @jasonyan • David Cramer @zeeg • Python at 400 500 million visitors • Got feedback? Use hashtag #sckrw
  2. Agenda • What is DISQUS? • An Overview of the Infrastructure • Iterative Development and Deployment • Why We Love Python
  3. What is DISQUS? dis·cuss • dĭ-skŭs' • We are a comment system with an emphasis on connecting communities. http://disqus.com/about/
  4. Startup-ish • Founded just about 4 years ago • 16 employees, 8 engineers • Traffic increasing 15-20% a month • Flat organizational structure, every engineer is a product manager • Fast turnaround, new feature launches every week (sometimes daily)
  5. Traffic • [chart: number of visitors, March 2008 through March 2011, climbing from 0 to roughly 500 million]
  6. DjangoCon 2010 • 17,000 requests/second peak • 450,000 websites • 15 million profiles • 75 million comments • 250 million visitors
  7. Six Months Later • 25,000 requests/second peak • 700,000 websites • 30 million profiles • 170 million comments • 500 million visitors • (vs. DjangoCon 2010: 17,000 requests/second peak • 450,000 websites • 15 million profiles • 75 million comments • 250 million visitors)
  8. Six Months Later • September 2010: 250 million uniques • March 2011: 500 million uniques • Handling over 2x the traffic
  9. Six Months Later • September 2010: ~100 servers • March 2011: ~100 servers • Scale diagonally
  10. Scaling Diagonally • We still rent hardware, so there is no “commodity hardware” • Cheaper to upgrade • Everything is redundant • Partition data where you need to, scale partitions vertically • Upgrade hardware (more RAM, more drives, more cores) • Python apps tend to be CPU bound
  11. Infrastructure • 35% Web Servers (Apache + mod_wsgi) • 15% Utility Servers (Python scripts, background workers) • 20% Databases (PostgreSQL, Redis, Membase) • 20% Load Balancing / High Availability (HAProxy + Heartbeat) • 10% Caching servers (Memcached, Varnish) • Half of our servers run Python
  12. Python Web Servers • Use what you’re comfortable with • Apache + mod_wsgi vs nginx + uWSGI • Bottleneck is in the application • [charts: requests/sec (min/avg/max, 0–600) and memory usage for mod_wsgi vs uWSGI]
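
Whichever server pair is chosen, both ultimately load the same WSGI callable, which is part of why the bottleneck stays in the application. A minimal Django entry point of that era might look like the sketch below; the settings module name is an assumption for illustration, not DISQUS's actual configuration:

    # Minimal WSGI entry point served by Apache + mod_wsgi or nginx + uWSGI.
    # 'disqus.settings' is a hypothetical settings module for this sketch.
    import os

    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'disqus.settings')

    import django.core.handlers.wsgi

    # The WSGI callable both servers run; the application code, not the
    # server in front of it, is usually where the time goes.
    application = django.core.handlers.wsgi.WSGIHandler()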
  13. Background Workers • Lots of tasks that don’t need to be done in the web application process: • Crawling URLs • Updating avatars • Email notifications • Analytics • Counters
  14. Background Workers (cont’d) • Most jobs are I/O bound • Slow external calls • Twitter is slow • Facebook is slow • Could parallelize with multiple processes, but...
  15. Background Workers (cont’d) • Multiple processes are a waste of memory • Use non-blocking I/O instead • Celery 2.2 adds support for gevent/eventlet
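
As a rough sketch of the non-blocking setup described above (Celery 2.2 with a gevent/eventlet worker pool), an I/O-bound task might look like the following; the task itself and the worker flags are illustrative assumptions, not DISQUS's actual code:

    # Hypothetical I/O-bound background task (Celery 2.x-style decorator).
    import urllib2

    from celery.task import task

    @task
    def crawl_url(url):
        """Fetch a URL outside the web request/response cycle."""
        return urllib2.urlopen(url, timeout=10).read()

    # Run the worker with a non-blocking pool so slow external calls
    # (Twitter, Facebook) don't each hold an entire process, e.g.:
    #   celeryd --pool=gevent --concurrency=100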
  16. Monitoring • Application side: Graphite • Real-time(ish) graphing • Django front-end, Python backend • Etsy’s StatsD proxy to Graphite (sketch below) • UDP (fire and forget) • Batches updates
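
The fire-and-forget part is just a UDP datagram in StatsD's simple text format; a minimal client sketch follows, where the address and metric names are assumptions for the example:

    # Minimal StatsD-style counter over UDP; the address and metric names
    # are illustrative, not DISQUS's configuration.
    import socket

    STATSD_ADDR = ('127.0.0.1', 8125)
    _sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def incr(metric, value=1):
        """Send a counter increment; if the packet is dropped, nobody waits."""
        _sock.sendto('%s:%d|c' % (metric, value), STATSD_ADDR)

    # Track application metrics from anywhere in the app:
    incr('comments.new')
    incr('exceptions.raised')

Because it is UDP, a lost packet costs a data point, never a blocked request.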
  17. Monitoring • Track application metrics • Errors, exceptions • New comments, users, sites, etc. • Anything
  18. Monitoring • Check out Etsy’s posts: • Measure Anything, Measure Everything http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/ • Tracking Every Release http://codeascraft.etsy.com/2010/12/08/track-every-release/
  19. Which means... • Largest Django-powered web application • We fork, and sometimes even monkey patch, to make it scale to our needs (see the sketch below) • Fortunately, we don’t have to do too much (Yay, Django!) • Unfortunately, we can’t use all of Django’s internal components (and when we do, we often use them in atypical ways)
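
The monkey-patching mentioned above generally amounts to wrapping a Django internal and re-assigning it; the example below is purely hypothetical instrumentation, not a patch DISQUS actually ships:

    # Hypothetical illustration of the monkey-patching pattern only.
    from django.db.models.query import QuerySet

    _original_iterator = QuerySet.iterator

    def instrumented_iterator(self):
        # Defer to Django's own iterator, e.g. to count rows as they stream.
        for obj in _original_iterator(self):
            yield obj

    QuerySet.iterator = instrumented_iterator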
  20. Iterating Quickly • Abstracting our application environment • Fewer dependencies locally • Rely on CI for dependency coverage • Heavy use of open source packages • No NIH syndrome • Deploy frequently, 3-7 times a day • Lots of branches, but master is “stable” • Realtime reporting on exceptions, metrics • Our test suite is the main blocker (slow)
  21. Gargoyle • Being users of our product, we actively use early versions of features before public release • Deploy features to portions of the user base at a time to ensure smooth, measurable releases
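
In practice a Gargoyle switch is checked per request, so a feature can be turned on for a slice of users (or just staff) without a deploy. A short sketch, with an invented switch name and view:

    # Checking a feature switch per request with Gargoyle; the switch name
    # 'new_comment_ui' and the view are made up for this example.
    from django.http import HttpResponse
    from gargoyle import gargoyle

    def thread(request):
        if gargoyle.is_active('new_comment_ui', request):
            return HttpResponse('early version, shown to a portion of users')
        return HttpResponse('current version, shown to everyone else')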
  22. The Deployment Problem • Make some changes locally • Run a subset of the test suite • Push your commits • CI server begins running tests • ....
  23. Rinse and Repeat • 30 minutes later tests fail, start over • Finally, deploy to a subset of servers • Open Sentry (our exception logger) • Monitor Graphite • Deploy to 35 servers (~8 minutes) • Full rollback in < 30 seconds
  24. Testing Code • Test suite usually takes around 25 minutes • “Stuck” with Hudson (or Jenkins) • Most tightly integrated plugins are geared towards Java developers • Which framework do we use? • unittest(2), nose, doctests, Lettuce? • We use unittest and nose (example below) • Need to report code coverage, speed of tests, pylint (or pyflakes)
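
For reference, a trivial unittest-style test that nose will collect, with the kind of CI reporting the slide alludes to noted in a comment (the exact flags depend on which nose plugins are installed; the fixture is imaginary):

    # Minimal nose-collectible unittest test case.
    import unittest

    class CommentCountTest(unittest.TestCase):
        def test_empty_thread_has_no_comments(self):
            comments = []  # stand-in for a real Thread fixture
            self.assertEqual(len(comments), 0)

    if __name__ == '__main__':
        unittest.main()

    # Typical CI invocation for coverage plus JUnit-style XML that
    # Jenkins/Hudson can read, assuming the plugins are installed:
    #   nosetests --with-coverage --with-xunit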
  25. Love-ish • Many of us started with PHP or Rails • Clean syntax, clear standards • All languages need PEP8.py and PyFlakes • Interpreted, fast... enough • Very easy to learn • We all started by learning Django first, then Python
  26. Haters Gonna Hate • If you could choose one thing in Python to hate on...
  27. What can we do? • Too many forks, too many frameworks • We need fewer clones, and more combined effort • Improving existing Python solutions • More Python solutions for existing products
  28. References • Sentry (our exception tracking tool) http://github.com/dcramer/django-sentry • Gargoyle (feature switches) https://github.com/disqus/gargoyle • Django DB Utils (collection of db helpers for Django) https://github.com/disqus/django-db-utils • Jenkins CI http://jenkins-ci.org/ • code.disqus.com