
Configuration Management is an Antipattern

Slides from SCaLE 15x, Pasadena, CA

Jonah Horowitz

March 05, 2017

Transcript

  1. @jonahhorowitz
    Configuration Management
    is an anti-pattern


  2. @jonahhorowitz
    [email protected]$ cvs update website
    [email protected]$ tar zcvf website.tar.gz website
    [email protected]$ scp website.tar.gz [email protected]:/var/something/
    [email protected]$ ssh [email protected]
    server1# cd /var/something
    server1# mv website website-`date`
    server1# tar zxf website.tar.gz
    server1# /etc/init.d/website restart
    server1# ^D
    … Rinse, Repeat …


  3. @jonahhorowitz
    #!/bin/bash
    BOX=$1
    NEWCODE=$2
    scp $NEWCODE [email protected]$BOX:/var/something/
    ssh [email protected]$BOX "(cd /var/something ; tar zxf $NEWCODE ; /etc/init.d/tomcat restart)"


  4. @jonahhorowitz
    #!/bin/bash
    BOX=$1
    NEWCODE=$2
    scp $NEWCODE [email protected]$BOX:/var/something/
    ssh [email protected]$BOX "(cd /var/something ; tar zxf $NEWCODE ; /etc/init.d/tomcat restart)"
    [email protected]$ cvs update website
    [email protected]$ tar zcvf website.tar.gz website
    [email protected]$ for box in `cat serverlist/boxen.txt` ; do \
    tools/update-code.sh $box website.tar.gz
    done
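
The loop above stops telling you anything useful once one host fails partway through the list. A minimal sketch of the same push with per-host failure tracking follows. `deploy_all` and the `UPDATE_CMD` hook are hypothetical names introduced for illustration; the hook defaults to the `tools/update-code.sh` script from the slide.

```bash
#!/bin/bash
# Sketch only: push one tarball to every host in a list and report failures.
# UPDATE_CMD is a hypothetical hook; it defaults to the slide's script.
UPDATE_CMD=${UPDATE_CMD:-tools/update-code.sh}

deploy_all() {
    local list=$1 tarball=$2 box failed=0
    while IFS= read -r box || [ -n "$box" ]; do
        # Skip blank lines and comment lines in the server list.
        case "$box" in ''|'#'*) continue ;; esac
        # Redirect stdin so an ssh-based command cannot swallow the list.
        if "$UPDATE_CMD" "$box" "$tarball" </dev/null; then
            echo "ok $box"
        else
            echo "FAILED $box"
            failed=$((failed + 1))
        fi
    done < "$list"
    # Return non-zero if any host failed.
    [ "$failed" -eq 0 ]
}
```

Calling `deploy_all` with the server list file and `website.tar.gz` would print one status line per host. It still leaves every failed host in an unknown state, which is the problem the rest of the talk is about.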


  5. @jonahhorowitz
    Server Install Process (2001)
    • Install server in rack
    • Use Mandrake Linux CD to install OS
    • Run through long manual configuration checklist
    - some of which was eventually scripted
    • Push latest code (using the earlier script)
    • Add to load balancer


  6. @jonahhorowitz
    Server Install Process (2012+)
    • Launch new Amazon AMI
    • Use the current version of Amazon Linux
    • Run through long manual configuration checklist
    - some of which was eventually scripted
    • Push latest code (using the earlier script)
    • Add to ELB


  7. @jonahhorowitz
    So, who am I?
    Jonah Horowitz
    Site Reliability Engineer
    Soon to be at Stripe


  8. @jonahhorowitz
    Automating
    Linux and
    Unix
    Nate Campi
    Kirk Bauer


  10. @jonahhorowitz
    CFEngine (2.x) was great... for its time
    Before CFEngine
    • Time to provision a new server: 1 day
    • Chance a mistake was made: 50/50
    • Percentage of fleet we understood: 70%
    After CFEngine 2
    • Time to provision a new server: 1 hour
    • Chance a mistake was made: 1%
    • Percentage of fleet we understood: 99%


  11. @jonahhorowitz


  12. @jonahhorowitz
    Puppet
    Chef
    Salt
    Ansible
    RedHat


  13. @jonahhorowitz
    What sucks about
    Config Management?


  16. @jonahhorowitz
    What sucks about Config Management?
    Bad Option #1: Ops owns all configuration management
    Bad Option #2: Ops doesn’t own all configuration management


  18. @jonahhorowitz
    Broken/Buggy/Out-of-Sync
    Deployments
    That one server…


  19. @jonahhorowitz
    Release Engineering
    Still Sucks


  20. @jonahhorowitz
    What’s the
    alternative?


  23. @jonahhorowitz
    Let’s walk through
    that again, slowly


  24. @jonahhorowitz
    • Base or Foundation AMI
    • Security patches
    • Infrastructure Packages (monitoring,
    logging, etc)


  25. @jonahhorowitz
    • Your application package and its
    dependencies


  26. @jonahhorowitz
    Tools Required
    • Package Build System (Gradle)
    • Image Build System (Aminator/Bakery/Docker/Packer)
    • Deployment System (Spinnaker/Terraform/CloudFormation)
    • Service Discovery (Eureka/Zookeeper/ELBs/DNS?/Swarm/Kubernetes)
    • Dynamic Configuration (Feature Flags/Fast Properties)
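
To show how the pieces in this list fit together, here is a dry-run outline of the stages in order. It only prints what each stage would do and calls none of the real tools. `run_pipeline`, the image naming scheme, and the example arguments are assumptions made for illustration, not anything from the talk.

```bash
# Sketch only: print the stages of a bake-and-deploy pipeline in order.
# Nothing here invokes a real build, bake, or deploy tool.
run_pipeline() {
    local app=$1 version=$2 base=$3
    # Assumed naming scheme: the image name records exactly what went in.
    local image="${app}-${version}-${base}"
    echo "build package ${app}-${version}"
    echo "bake image ${image} on top of ${base}"
    echo "deploy ${image} as a new server group"
    echo "register ${image} instances with service discovery"
}
```

For example, `run_pipeline website 1.4.2 base-2017-03` prints four lines, one per stage. The design point is that the image name is derived entirely from its inputs, so two servers running the same image name are running the same bits.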


  27. @jonahhorowitz
    Benefits


  28. @jonahhorowitz
    Benefits
    • Simpler Operations


  29. @jonahhorowitz
    Benefits
    • Continuous Deployments


  30. @jonahhorowitz
    Benefits
    • Faster startup times
    • Horizontal/Auto-scaling
    • Instance Failure
    • Chaos Monkey
    • Cloud Reboots


  31. @jonahhorowitz
    Benefits
    • Configuration in-sync / no “cruft” /
    always a known state


  32. @jonahhorowitz
    Benefits
    • Same application code in Dev/Test/Prod


  33. @jonahhorowitz
    Benefits
    • Easier to respond to security threats


  34. @jonahhorowitz
    Benefits
    • Multi-region operations


  35. @jonahhorowitz
    Benefits
    • That one server… sticks out like a sore
    thumb
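
When every instance is launched from a baked image, finding "that one server" can be reduced to comparing image IDs. The sketch below assumes an inventory of `host image-id` lines on stdin; that input format, the function name `find_outliers`, and the IDs in the usage note are all hypothetical.

```bash
# Sketch only: read "host image-id" lines on stdin and print the hosts
# that are not running the most common image. Ties are broken arbitrarily.
find_outliers() {
    awk '
        NF >= 2 { image[$1] = $2; count[$2]++ }
        END {
            # Find the image ID that the most hosts are running.
            for (id in count)
                if (count[id] > best) { best = count[id]; common = id }
            # Report every host on a different image.
            for (host in image)
                if (image[host] != common) print host
        }
    ' | sort
}
```

Fed three hosts on `ami-aaa` and one on `ami-bbb`, it prints only the host on `ami-bbb`.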


  36. @jonahhorowitz
    Release Strategies
    • Rolling Release
    • Blue/Green Releases
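
As a rough illustration of the rolling case, the sketch below splits a fleet into fixed-size batches. A real rolling release would replace each batch with instances from the new image and wait for health checks before continuing; a blue/green release would instead launch a complete new group and switch traffic over in one step. `rolling_batches` is a hypothetical helper, not part of any tool named in the talk.

```bash
# Sketch only: split a list of hosts into batches for a rolling release.
# Usage: rolling_batches BATCH_SIZE host...
# Prints one batch per line, hosts separated by spaces.
rolling_batches() {
    local size=$1 count=0 line="" host
    shift
    for host in "$@"; do
        line="${line:+$line }$host"
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            echo "$line"
            line=""
            count=0
        fi
    done
    # Emit the final partial batch, if there is one.
    [ -z "$line" ] || echo "$line"
}
```

With a batch size of 2 and five hosts, this prints three lines: two full batches and one batch holding the last host.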


  37. @jonahhorowitz
    Caveats


  38. @jonahhorowitz
    Oh, that database
    thing…


  39. @jonahhorowitz
    Jonah Horowitz
    Site Reliability Engineer
    @jonahhorowitz
    https://jonahhorowitz.com/
