Slide 1

Slide 1 text

DISQUS
Jason Yan @jasonyan
David Cramer @zeeg
Python at 400 500 million visitors
Got feedback? Use hashtag #sckrw
Sunday, March 13, 2011

Slide 2

Slide 2 text

Agenda
• What is DISQUS?
• An Overview of the Infrastructure
• Iterative Development and Deployment
• Why We Love Python

Slide 3

Slide 3 text

What is DISQUS?
dis·cuss • dĭ-skŭs'
We are a comment system with an emphasis on connecting communities
http://disqus.com/about/

Slide 4

Slide 4 text

Embeddable Comments

Slide 5

Slide 5 text

A Brief History

Slide 6

Slide 6 text

Startup-ish
• Founded just about 4 years ago
• 16 employees, 8 engineers
• Traffic increasing 15-20% a month
• Flat organizational structure, every engineer is a product manager
• Fast turnaround, new feature launches every week (sometimes daily)

Slide 7

Slide 7 text

Traffic
[Chart: number of visitors, March 2008 through March 2011; axis from 0M to 500M]

Slide 8

Slide 8 text

DjangoCon 2010
• 17,000 requests/second peak
• 450,000 websites
• 15 million profiles
• 75 million comments
• 250 million visitors

Slide 9

Slide 9 text

Six Months Later
• 25,000 requests/second peak
• 700,000 websites
• 30 million profiles
• 170 million comments
• 500 million visitors
DjangoCon 2010 figures, repeated on the slide for comparison:
• 17,000 requests/second peak
• 450,000 websites
• 15 million profiles
• 75 million comments
• 250 million visitors

Slide 10

Slide 10 text

Six Months Later
• September 2010: 250 million uniques
• March 2011: 500 million uniques
• Handling over 2x the traffic

Slide 11

Slide 11 text

Six Months Later
• September 2010: ~100 servers
• March 2011: ~100 servers
• Scale diagonally

Slide 12

Slide 12 text

Scaling Diagonally
• We still rent hardware, so there is no “commodity hardware”
• Cheaper to upgrade
• Everything is redundant
• Partition data where you need to, scale partitions vertically
• Upgrade hardware (more RAM, more drives, more cores)
• Python apps tend to be CPU bound

Slide 13

Slide 13 text

Infrastructure
• 35% Web Servers (Apache + mod_wsgi)
• 15% Utility Servers (Python scripts, background workers)
• 20% Databases (PostgreSQL, Redis, Membase)
• 20% Load Balancing / High Availability (HAProxy + Heartbeat)
• 10% Caching servers (Memcached, Varnish)
• Half of our servers run Python

Slide 14

Slide 14 text

Python Web Servers
• Use what you’re comfortable with
• Apache + mod_wsgi vs nginx + uWSGI
• Bottleneck is in the application
[Charts: requests/sec (min, avg, max) and memory, mod_wsgi vs uWSGI]

Slide 15

Slide 15 text

Background Workers
• Lots of tasks that don’t need to be done in the web application process:
• Crawling URLs
• Updating avatars
• Email notifications
• Analytics
• Counters

Slide 16

Slide 16 text

Background Workers (cont’d)
• Most jobs are I/O bound
• Slow external calls
• Twitter is slow
• Facebook is slow
• Could parallelize with multiple processes, but...

Slide 17

Slide 17 text

Background Workers (cont’d)
• Waste of memory
• Use non-blocking I/O
• Celery 2.2 adds support for gevent/eventlet
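The point about non-blocking I/O can be illustrated with a minimal sketch. It uses the standard library's asyncio as a stand-in for gevent/eventlet (an assumption made so the example runs without extra packages), and the slow external calls are simulated with sleeps. The function names and the job list are hypothetical, not taken from the talk.

```python
import asyncio
import time


async def fetch_profile(service: str, delay: float) -> str:
    """Stand-in for a slow external call (for example a social network API)."""
    # Awaiting the sleep yields control, so other jobs run while this one waits.
    await asyncio.sleep(delay)
    return f"{service}: done"


async def run_jobs(jobs):
    """Run all jobs concurrently in one process; results keep the input order."""
    return await asyncio.gather(
        *(fetch_profile(name, delay) for name, delay in jobs)
    )


def timed_run(jobs):
    """Return (results, elapsed_seconds) for one batch of jobs."""
    start = time.perf_counter()
    results = asyncio.run(run_jobs(jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical job list: three external calls of 0.2 s each.
    results, elapsed = timed_run(
        [("twitter", 0.2), ("facebook", 0.2), ("avatar-host", 0.2)]
    )
    print(results, f"{elapsed:.2f}s")
```

Because the waits overlap, the batch takes roughly as long as its slowest call rather than the sum of all calls, and it does so in a single process instead of one process per job.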

Slide 18

Slide 18 text

Monitoring
• Application side: Graphite
• Real-time(ish) graphing
• Django front-end, Python backend
• Etsy’s StatsD proxy to Graphite
• UDP (fire and forget)
• Batches updates
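As a rough sketch of what "UDP (fire and forget)" means in practice, the snippet below sends a counter increment in StatsD's plain-text format (`name:value|c`) and never waits for a reply. The helper names are hypothetical, and the host and port defaults are assumptions (8125 is StatsD's usual default); this is not the client DISQUS used.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter increment in StatsD's text format: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one counter update over UDP without waiting for any reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(name, value), (host, port))
    except OSError:
        # Metrics should never break the request; drop the sample on error.
        pass
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical metric name; nothing needs to be listening on the port.
    send_counter("comments.created")
```

Since UDP has no handshake and no acknowledgement, the call returns immediately even when no collector is running, which is what makes it cheap enough to sprinkle through request-handling code.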

Slide 19

Slide 19 text

Monitoring
• Track application metrics
• Errors, exceptions
• New comments, users, sites, etc.
• Anything

Slide 20

Slide 20 text

Monitoring
• Check out Etsy’s posts:
• Measure Anything, Measure Everything: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
• Tracking Every Release: http://codeascraft.etsy.com/2010/12/08/track-every-release/

Slide 21

Slide 21 text

What about the code?

Slide 22

Slide 22 text

Powered By Django

Slide 23

Slide 23 text

Which means...
• Largest Django-powered web application
• We fork, and sometimes even monkey patch, to make it scale to our needs
• Fortunately, we don’t have to do too much (Yay, Django!)
• Unfortunately, we can’t use all of Django’s internal components (and when we do, we use them in atypical ways)

Slide 24

Slide 24 text

Iterative Development
Release Early, Release Often

Slide 25

Slide 25 text

Iterating Quickly
• Abstracting our application environment
• Fewer dependencies locally
• Rely on CI for dependency coverage
• Heavy use of open source packages
• No NIH syndrome
• Deploy frequently, 3-7 times a day
• Lots of branches, but master is “stable”
• Realtime reporting on exceptions, metrics
• Our test suite is the main blocker (slow)

Slide 26

Slide 26 text

Dealing with Deploys

Slide 27

Slide 27 text

Gargoyle
Being users of our product, we actively use early versions of features before public release.
Deploy features to portions of a user base at a time to ensure smooth, measurable releases.
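One common way to release a feature to a portion of a user base is to hash each user into a stable bucket and enable the feature for the first N buckets. The sketch below shows that idea only; it is not Gargoyle's API, and `bucket_for`, `is_enabled`, and the feature name are hypothetical.

```python
import hashlib


def bucket_for(feature: str, user_id: int) -> int:
    """Map a (feature, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha1(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user_id: int, percent: int) -> bool:
    """Return True if the user falls inside the first `percent` buckets."""
    return bucket_for(feature, user_id) < percent


if __name__ == "__main__":
    # Hypothetical feature name, rolled out to 10% of users.
    enabled = [uid for uid in range(20) if is_enabled("new-editor", uid, 10)]
    print(enabled)
```

Hashing keeps each user's answer stable between requests, and raising the percentage only ever adds users, so a rollout can be widened step by step while error and metric graphs are watched.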

Slide 28

Slide 28 text

The Deployment Problem
• Make some changes locally
• Run a subset of the test suite
• Push your commits
• CI server begins running tests
• ....

Slide 29

Slide 29 text

Waiting on the test suite...

Slide 30

Slide 30 text

Rinse and Repeat
• 30 minutes later tests fail, start over
• Finally, deploy to a subset of servers
• Open Sentry (our exception logger)
• Monitor Graphite
• Deploy to 35 servers (~8 minutes)
• Full rollback in < 30 seconds
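The flow above (deploy to a subset, watch errors and graphs, then deploy everywhere or roll back) can be sketched as a small staged-deploy loop. Everything here is hypothetical: the `deploy`, `healthy`, and `rollback` callables stand in for whatever tooling actually pushes code and reads Sentry or Graphite, which the slides do not describe.

```python
from typing import Callable, Iterable, List


def staged_deploy(
    servers: List[str],
    canary_count: int,
    deploy: Callable[[str], None],
    healthy: Callable[[Iterable[str]], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Deploy to a few servers first; continue only if they look healthy."""
    canaries, rest = servers[:canary_count], servers[canary_count:]
    done: List[str] = []
    for group in (canaries, rest):
        for server in group:
            deploy(server)
            done.append(server)
        if not healthy(done):
            # Undo everything deployed so far, newest first.
            for server in reversed(done):
                rollback(server)
            return False
    return True


if __name__ == "__main__":
    # Hypothetical host names and no-op actions, for illustration only.
    hosts = [f"web{n}" for n in range(1, 6)]
    ok = staged_deploy(hosts, 2, print, lambda done: True, print)
    print("deployed" if ok else "rolled back")
```

Checking health after the small first group limits how many servers ever run a bad build, which is the reason to deploy to a subset before the full fleet.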

Slide 31

Slide 31 text

Wait, Sentry?

Slide 32

Slide 32 text

Testing

Slide 33

Slide 33 text

Testing Code
• Test suite takes around 25 minutes usually
• “Stuck” with Hudson (or Jenkins)
• Most tightly integrated plugins are geared towards Java developers
• Which framework do we use?
• unittest(2), nose, doctests, LETTUCE?
• We use unittest and nose
• Need to report code coverage, speed of tests, pylint (or pyflakes)
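Reporting the speed of tests can be done with plain unittest by recording a timestamp around each test. The sketch below is one possible approach, not the setup described in the talk; `TimingResult`, `ExampleTests`, and `run_with_timings` are hypothetical names.

```python
import io
import time
import unittest


class TimingResult(unittest.TextTestResult):
    """Test result that records how long each test took, in seconds."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timings = {}
        self._started = {}

    def startTest(self, test):
        self._started[test.id()] = time.perf_counter()
        super().startTest(test)

    def stopTest(self, test):
        super().stopTest(test)
        elapsed = time.perf_counter() - self._started.pop(test.id())
        self.timings[test.id()] = elapsed


class ExampleTests(unittest.TestCase):
    def test_fast(self):
        self.assertEqual(1 + 1, 2)

    def test_slow(self):
        time.sleep(0.05)  # simulate a slow test
        self.assertTrue(True)


def run_with_timings(test_case):
    """Run one TestCase class and return (result, timings sorted slowest first)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    runner = unittest.TextTestRunner(stream=io.StringIO(),
                                     resultclass=TimingResult)
    result = runner.run(suite)
    slowest_first = sorted(result.timings.items(),
                           key=lambda item: item[1], reverse=True)
    return result, slowest_first


if __name__ == "__main__":
    _, slowest = run_with_timings(ExampleTests)
    for test_id, seconds in slowest:
        print(f"{seconds:.3f}s  {test_id}")
```

A sorted list like this makes it easy to see which tests dominate a long-running suite before deciding what to speed up or split out.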

Slide 34

Slide 34 text

We Love Python

Slide 35

Slide 35 text

Love-ish
• Many of us started with PHP or Rails
• Clean syntax, clear standards
• All languages need PEP8.py and PyFlakes
• Interpreted, fast... enough
• Very easy to learn
• We all started by learning Django first, then Python

Slide 36

Slide 36 text

Haters Gonna Hate
If you could choose one thing in Python to hate on...

Slide 37

Slide 37 text

Better package management

Slide 38

Slide 38 text

What can we do?
• Too many forks, too many frameworks
• We need fewer clones, and more combined effort
• Improving existing Python solutions
• More Python solutions for existing products

Slide 39

Slide 39 text

Python Rocks!

Slide 40

Slide 40 text

DISQUS
Questions?
psst, we’re hiring [email protected]

Slide 41

Slide 41 text

References
• Sentry (our exception tracking tool): http://github.com/dcramer/django-sentry
• Gargoyle (feature switches): https://github.com/disqus/gargoyle
• Django DB Utils (collection of db helpers for Django): https://github.com/disqus/django-db-utils
• Jenkins CI: http://jenkins-ci.org/
code.disqus.com