Slide 1

Slide 1 text

DISQUS Practicing Continuous Deployment David Cramer @zeeg Saturday, March 10, 12

Slide 2

Slide 2 text

Shipping new code as soon as it’s ready

Slide 3

Slide 3 text

Continuous Deployment

    # Update the site every 5 minutes
    */5 * * * * cd /www/example.com && git pull && service apache restart

Slide 4

Slide 4 text


Slide 5

Slide 5 text

When it’s ready

Slide 6

Slide 6 text

When is it ready?
- Reviewed by peers
- Passes automated tests
- Some level of QA

Slide 7

Slide 7 text

Focus on Stability and Iteration

Slide 8

Slide 8 text

Workflow (diagram labels): Commit, Review, Integration, Failed Build, Deploy, Reporting, Rollback

Slide 9

Slide 9 text

The Good
- Develop features incrementally
- Release frequently
- Smaller doses of QA

The Bad
- Culture Shock
- Stability depends on test coverage
- Initial time investment

Slide 10

Slide 10 text

Keep Development Simple

Slide 11

Slide 11 text

Development
- Automate testing of complicated processes and architecture
- Simple can be better than complete
  - Especially for local development
- python setup.py {develop,test}
- Puppet, Chef, Buildout, Fabric, etc.

Slide 12

Slide 12 text

Production: PostgreSQL, Memcache, Redis, Solr, Apache, Nginx, RabbitMQ
Staging: PostgreSQL, Memcache, Redis, Solr, Apache, Nginx, RabbitMQ
CI Server: PostgreSQL, Memcache, Redis, Solr, Apache, Nginx, RabbitMQ
Macbook: PostgreSQL, Apache, Memcache, Redis, Solr, Nginx, RabbitMQ
(and 100 other painful-to-configure services)

Slide 13

Slide 13 text

Bootstrapping Local
- Simplify local setup
  - git clone dcramer@disqus:disqus.git
  - make
  - python manage.py runserver
- Need to test dependencies?
  - virtualbox + vagrant up

Slide 14

Slide 14 text

Progressive Rollout

We actively use early versions of features before public release

Slide 15

Slide 15 text

Deploy features to portions of a user base at a time to ensure smooth, measurable releases

https://github.com/disqus/gargoyle

Slide 16

Slide 16 text

    from gargoyle import gargoyle

    def my_view(request):
        if gargoyle.is_active('awesome', request):
            return 'new happy version :D'
        else:
            return 'old sad version :('

• Iterate quickly by hiding features
• Early adopters are free QA

Slide 17

Slide 17 text

    SWITCHES = {
        # enable my_feature for 50%
        'my_feature': range(0, 50),
    }

    def is_active(switch, ip_address):
        # ip_address: dotted IPv4 string of the requesting client
        try:
            pct_range = SWITCHES[switch]
        except KeyError:
            return False

        # bucket the client into 0..99 by summing the IPv4 octets
        ip_hash = sum(int(x) for x in ip_address.split('.'))
        return ip_hash % 100 in pct_range
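A minimal sketch of the same percentage rollout keyed on a user identifier instead of an IP address, so a user keeps the same bucket across networks. The `ROLLOUT` table and the `bucket_for` helper are hypothetical names for illustration; this is not how Gargoyle itself is implemented.

```python
import hashlib

# Hypothetical switch table: feature name -> percentage of users enabled.
ROLLOUT = {"my_feature": 50}


def bucket_for(switch, user_key):
    """Map a (switch, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha1(f"{switch}:{user_key}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_active(switch, user_key):
    """Return True if the user's bucket falls below the rollout percentage."""
    percent = ROLLOUT.get(switch, 0)  # unknown switches are always off
    return bucket_for(switch, user_key) < percent
```

Including the switch name in the hash means a given user does not land in the early group for every feature at once.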

Slide 18

Slide 18 text

Review ALL the Commits

phabricator.org

Slide 19

Slide 19 text


Slide 20

Slide 20 text


Slide 21

Slide 21 text


Slide 22

Slide 22 text

Integration (or as we like to call it)

Slide 23

Slide 23 text


Slide 24

Slide 24 text

Integration Requirements
- Developers must know when they’ve broken something
  - IRC, Email, IM
- Support proper reporting
  - XUnit, Pylint, Coverage.py
- Painless setup
  - apt-get install jenkins *

* https://wiki.jenkins-ci.org/display/JENKINS/Installing+Jenkins+on+Ubuntu

Slide 25

Slide 25 text

Shortcomings
- False positives
  - Reporting isn't accurate
  - Services fail
  - Bad Tests
- Test coverage
  - Regressions on untested code
- Feedback delay
  - Integration tests vs Unit tests

Slide 26

Slide 26 text

Fixing False Positives
- Re-run tests several times on a failure
- Report continually failing tests
- Replace external service tests with a functional test suite
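Re-running a test before reporting it as failed can be sketched as a decorator. The name `retry_on_failure` is hypothetical and not taken from the talk or from any test runner; `attempts` is assumed to be at least 1.

```python
import functools


def retry_on_failure(attempts=3):
    """Re-run a test up to `attempts` times before reporting the failure."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(attempts):
                try:
                    return test_func(*args, **kwargs)
                except AssertionError as exc:
                    # Remember the failure and try again.
                    last_error = exc
            # Every attempt failed: surface the last assertion error.
            raise last_error
        return wrapper
    return decorator
```

A test that only passes on a retry is still worth reporting, since it is likely one of the continually failing tests the slide mentions.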

Slide 27

Slide 27 text

Maintaining Coverage
- Raise awareness with reporting
- Fail/alert when coverage drops on a build
- Commit tests with code
- Coverage against commit diff for untested regressions
- Utilize code review
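Checking coverage against a commit diff comes down to a set difference. The sketch below assumes both inputs are already available as dictionaries mapping a file path to a set of line numbers; extracting them from a diff and from a coverage report is left out, and `untested_changes` is a hypothetical helper name.

```python
def untested_changes(changed_lines, covered_lines):
    """Return the changed lines in each file that no test executed.

    Both arguments map a file path to a set of line numbers.
    """
    report = {}
    for path, lines in changed_lines.items():
        missing = sorted(lines - covered_lines.get(path, set()))
        if missing:
            report[path] = missing
    return report
```

A build could alert or fail whenever the returned report is non-empty.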

Slide 28

Slide 28 text

Speeding Up Tests
- Write true unit tests
  - vs slower integration tests
- Mock external services
- Distributed and parallel testing
  - Matrix builds
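Mocking an external service might look like the sketch below, which uses the standard library's `unittest.mock`. The function under test, `fetch_comment_count`, and the client's `get` method are hypothetical stand-ins for a real service client.

```python
from unittest import mock


def fetch_comment_count(client, thread_id):
    """Hypothetical code under test: ask a remote service for a count."""
    response = client.get(f"/threads/{thread_id}/count")
    return int(response["count"])


def test_fetch_comment_count():
    # The mock replaces the network client, so the test never leaves the process.
    client = mock.Mock()
    client.get.return_value = {"count": "42"}

    assert fetch_comment_count(client, 7) == 42
    client.get.assert_called_once_with("/threads/7/count")
```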

Slide 29

Slide 29 text

Reporting

Slide 30

Slide 30 text

Why is mongodb-1 down? It’s down? Must have crashed again

Slide 31

Slide 31 text

Meaningful Metrics
- Rate of traffic (not just hits!)
  - Business vs system
- Response time (database, web)
- Exceptions
- Social media
  - Twitter

Slide 32

Slide 32 text

Graphite (Traffic across a cluster of servers)

graphite.wikidot.com
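A minimal sketch of reporting one metric to Graphite over its plaintext protocol, where each line is a metric path, a value, and a Unix timestamp. The host, port 2003, and the metric name in the usage note are assumptions about a typical Carbon setup, not details from the talk.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003):
    """Send one metric to a Carbon listener (host and port are assumptions)."""
    line = format_metric(path, value)
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, `send_metric("web.requests.rate", 120)` would record a request rate under a hypothetical metric name.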

Slide 33

Slide 33 text

Sentry

sentry.readthedocs.org

Slide 34

Slide 34 text

Wrap Up

Slide 35

Slide 35 text

Getting Started
- Package your app
- Value code review
- Ease deployment; fast rollbacks
- Setup automated tests
- Gather some easy metrics
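One common way to get fast rollbacks, not described in the talk itself, is to keep each release in its own directory and repoint a `current` symlink. The sketch below assumes a POSIX filesystem; `activate_release` and the directory layout are hypothetical.

```python
import os


def activate_release(releases_dir, release_name, current_link):
    """Point `current_link` at a release directory by swapping a symlink."""
    target = os.path.join(releases_dir, release_name)
    if not os.path.isdir(target):
        raise ValueError(f"unknown release: {release_name}")

    # Build the new link beside the old one, then rename it into place,
    # so the live link is replaced in a single step.
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, current_link)
```

Under this layout a rollback is the same call with the previous release name.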

Slide 36

Slide 36 text

Going Further
- Build an immune system
  - Automate deploys, rollbacks (maybe)
- Adjust to your culture
  - There is no “right way”
- SOA == great success

Slide 37

Slide 37 text

DISQUS

Questions?

psst, we’re hiring
disqus.com/jobs

Slide 38

Slide 38 text

References
- Gargoyle (feature switches) https://github.com/disqus/gargoyle
- Sentry (log aggregation) https://github.com/dcramer/sentry
- Jenkins CI (continuous integration) http://jenkins-ci.org/
- Phabricator (code reviews, bug tracking) https://phabricator.org
- Graphite (metrics) http://graphite.wikidot.com/

code.disqus.com