Configuration Management is an Antipattern

Slides from SCaLE 15x, Pasadena, CA

Jonah Horowitz

March 05, 2017

Transcript

  1. @jonahhorowitz Configuration Management is an anti-pattern

  2. jonah@laptop$ cvs update website
    jonah@laptop$ tar zcvf website.tar.gz website
    jonah@laptop$ scp website.tar.gz root@server1:/var/something/
    jonah@laptop$ ssh root@server1
    server1# cd /var/something
    server1# mv website website-`date`
    server1# tar zxf website.tar.gz
    server1# /etc/init.d/website restart
    server1# ^D
    … Rinse, Repeat …
  3. #!/bin/bash
    BOX=$1
    NEWCODE=$2
    scp $NEWCODE root@$BOX:/var/something/
    ssh root@$BOX "(cd /var/something ; tar zxf $NEWCODE ; /etc/init.d/tomcat restart)"
  4. #!/bin/bash
    BOX=$1
    NEWCODE=$2
    scp $NEWCODE root@$BOX:/var/something/
    ssh root@$BOX "(cd /var/something ; tar zxf $NEWCODE ; /etc/init.d/tomcat restart)"
    jonah@laptop$ cvs update website
    jonah@laptop$ tar zcvf website.tar.gz website
    jonah@laptop$ for box in `cat serverlist\boxen.txt` ; do \
        tools/update-code.sh $box website.tar.gz
    done
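The loop on slide 4 carries on silently when one box fails, which is one way a fleet ends up out of sync. A minimal sketch of the same loop that records failures is shown below. `PUSH_CMD` and `push_all` are hypothetical names, not from the talk; `PUSH_CMD` stands in for `tools/update-code.sh` and defaults to a dry-run `echo` so nothing touches a real server.

```shell
#!/bin/bash
# Hypothetical sketch: push to every box and remember the ones that failed.
# PUSH_CMD is an assumed stand-in for tools/update-code.sh; the default is a
# dry-run echo so the sketch runs without any servers.
PUSH_CMD=${PUSH_CMD:-echo would-push}

push_all() {
  local tarball=$1; shift
  local failed=()
  local box
  for box in "$@"; do
    # Word-splitting on PUSH_CMD is intentional so it can carry arguments.
    # Its output goes to stderr, keeping stdout for the failure list.
    if ! $PUSH_CMD "$box" "$tarball" >&2; then
      failed+=("$box")
    fi
  done
  # Print the failed boxes, one per line, so the caller can retry them.
  if [ "${#failed[@]}" -gt 0 ]; then
    printf '%s\n' "${failed[@]}"
    return 1
  fi
  return 0
}
```

With the default dry-run command, `push_all website.tar.gz server1 server2` prints nothing on stdout and returns 0; any box whose push fails is listed on stdout and the function returns 1.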
  5. Server Install Process (2001)
    • Install server in rack
    • Use Mandrake Linux CD to install OS
    • Run through long manual configuration checklist - some of which was eventually scripted
    • Push latest code (using the earlier script)
    • Add to load balancer

  6. Server Install Process (2012+)
    • Launch new Amazon AMI
    • Use the current version of Amazon Linux
    • Run through long manual configuration checklist - some of which was eventually scripted
    • Push latest code (using the earlier script)
    • Add to ELB
  7. So, who am I?
    Jonah Horowitz
    Site Reliability Engineer
    Soon to be at Stripe
    incoming@jonahhorowitz.com

  8. Automating Linux and Unix (Nate Campi, Kirk Bauer)

  9. CFEngine (2.x) was great... for its time
    Before CFEngine
    • Time to provision a new server: 1 Day
    • Chance a mistake was made: 50/50
    • Percentage of fleet we understood: 70

  10. CFEngine (2.x) was great... for its time
    Before CFEngine
    • Time to provision a new server: 1 Day
    • Chance a mistake was made: 50/50
    • Percentage of fleet we understood: 70
    After CFEngine 2
    • Time to provision a new server: 1 hour
    • Chance a mistake was made: 1%
    • Percentage of fleet we understood: 99
  11. @jonahhorowitz

  12. Puppet Chef Salt Ansible RedHat

  13. What sucks about Config Management?

  14. What sucks about Config Management?

  15. What sucks about Config Management?
    Bad Option #1: Ops owns all configuration management

  16. What sucks about Config Management?
    Bad Option #1: Ops owns all configuration management
    Bad Option #2: Ops doesn’t own all configuration management
  17. Broken/Buggy/Out-of-Sync Deployments

  18. Broken/Buggy/Out-of-Sync Deployments
    That one server…

  19. Release Engineering Still Sucks

  20. What’s the alternative?

  21. What’s the alternative?

  22. What’s the alternative?

  23. Let’s walk through that again, slowly

  24. • Base or Foundation AMI
    • Security patches
    • Infrastructure Packages (monitoring, logging, etc.)

  25. • Your application package and its dependencies

  26. Tools Required
    • Package Build System (Gradle)
    • Image Build System (Aminator/Bakery/Docker/Packer)
    • Deployment System (Spinnaker/Terraform/CloudFormation)
    • Service Discovery (Eureka/Zookeeper/ELBs/DNS?/Swarm/Kubernetes)
    • Dynamic Configuration (Feature Flags/Fast Properties)
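The first three tools on slide 26 chain together as package, then image, then deployment. The sketch below shows only that ordering. Every function name is hypothetical, and each stage is a placeholder that prints what a real tool (for example Gradle, Packer or Spinnaker) would be asked to do; no real tool is invoked.

```shell
#!/bin/bash
# Hypothetical pipeline sketch. Each stage is a placeholder for a real tool;
# none of these functions call Gradle, Packer, Spinnaker or any cloud API.
build_package() { echo "package: $1"; }
bake_image()    { echo "image: base + $1"; }
deploy_image()  { echo "deploy: new instances from image of $1"; }

run_pipeline() {
  local app=$1
  local stage
  for stage in build_package bake_image deploy_image; do
    # Stop at the first failing stage so a bad build never reaches deploy.
    "$stage" "$app" || { echo "stage failed: $stage" >&2; return 1; }
  done
}
```

The point of the ordering is that configuration is fixed when the image is baked, so a running instance is never modified in place; a change means a new image and new instances.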
  27. Benefits

  28. Benefits
    • Simpler Operations

  29. Benefits
    • Continuous Deployments

  30. Benefits
    • Faster startup times
    • Horizontal/Auto-scaling
    • Instance Failure
    • Chaos Monkey
    • Cloud Reboots

  31. Benefits
    • Configuration in-sync / no “cruft” / always a known state

  32. Benefits
    • Same application code in Dev/Test/Prod

  33. Benefits
    • Easier to respond to security threats

  34. Benefits
    • Multi-region operations

  35. Benefits
    • That one server… sticks out like a sore thumb
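When every instance is launched from the same baked image, finding "that one server" reduces to comparing image IDs. A minimal sketch follows, assuming a hypothetical inventory format of one `<instance-id> <image-id>` pair per line; the function name and the sample IDs are illustrative only.

```shell
#!/bin/bash
# Hypothetical sketch: read "<instance-id> <image-id>" lines on stdin and
# print the instances that are not running the expected image.
find_outliers() {
  local expected=$1
  awk -v want="$expected" '$2 != want { print $1 }'
}

printf '%s\n' 'i-aaa ami-123' 'i-bbb ami-123' 'i-ccc ami-099' | find_outliers ami-123
# prints: i-ccc
```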
  36. Release Strategies
    • Rolling Release
    • Blue/Green Releases
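A rolling release replaces the fleet a few instances at a time. The sketch below shows only the batching logic. `replace_batch` is a hypothetical placeholder; a real one would launch instances from the new image, wait for health checks, and then retire the old instances.

```shell
#!/bin/bash
# Hypothetical sketch of a rolling release: walk the instance list in
# fixed-size batches. replace_batch only prints what it would replace.
replace_batch() { echo "replacing: $*"; }

rolling_release() {
  local batch_size=$1; shift
  # Guard against a zero or negative batch size, which would never finish.
  [ "$batch_size" -gt 0 ] || return 2
  while [ "$#" -gt 0 ]; do
    # Take up to batch_size instances off the front of the list.
    local batch=("${@:1:$batch_size}")
    replace_batch "${batch[@]}" || return 1
    shift "${#batch[@]}"
  done
}

rolling_release 2 i-1 i-2 i-3 i-4 i-5
# prints:
#   replacing: i-1 i-2
#   replacing: i-3 i-4
#   replacing: i-5
```

A blue/green release would instead bring up a complete second fleet from the new image and switch traffic to it in one step, keeping the old fleet available for rollback.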

  37. Caveats

  38. Oh, that database thing…

  39. Jonah Horowitz
    Site Reliability Engineer
    @jonahhorowitz
    incoming@jonahhorowitz.com
    https://jonahhorowitz.com/