Operations Concepts & Tools

An intro to some important high-level aspects of code running in production given to some NCSU students -- mostly intended as "these are things you may encounter and may wish to look up" rather than something in-depth. (Not supremely valuable without the accompanying talk but maybe useful, idk)

Michael DeHaan

February 24, 2017


  1. WHY
    • Familiarity with how your code is deployed and managed in production leads to better code
    • Better understanding of performance and failure modes
    • It’s good to be friends with the ops team (inverse: even more true!)
    • Increasing shifts in shared responsibility, sometimes filed under overloaded umbrella-term “DevOps”.
  2. THINGS TO COVER
    • Classic vs Microservices Architectures
    • IaaS / Cloud APIs and Tools
    • Configuration Management / Immutable Systems
    • Monitoring / Log Collection
    • Load Balancers / Update Strategies
    • Backup / Disaster Recovery
    • Security Policy
    • Continuous Integration / Continuous Deployment
  3. IN THE BEGINNING
    • In the beginning (and still a lot today), software installs were largely run by systems administrators writing their own custom scripts
    • These scripts grew unmaintainable over time
    • Scripts could fail
    • Much of the install process was not fully automated even if some scripts existed
    • Upgrades were a frequent cause of widespread system failure
  4. IAAS / CLOUD
    • Misleading assumption that Cloud services (ex: Amazon, GCE) are primarily about renting IP addresses
    • ALSO: storage, databases, load balancers, firewalls/security, messaging, etc
    • Cloud topology control examples: CloudFormation (AWS), Terraform (generic)
    • Cloud API examples: Boto (AWS Python)
    • CLI Tools
  5. CONFIGURATION MANAGEMENT
    • Declarative description of what should be on a system
    • “Idempotence” & the GPS Analogy: F(x) = F(F(x))
    • Typically “push” or “pull” based
    • Designed around Pull: Puppet, Chef
    • Designed around Push: Ansible
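The idempotence property on this slide, F(x) = F(F(x)), can be illustrated with a small sketch. This is not how Puppet, Chef, or Ansible are implemented; `ensure_line` and the sample config lines are hypothetical names chosen for illustration. The idea is that the function describes a desired end state, so applying it a second time changes nothing.

```python
def ensure_line(lines, wanted):
    """Return a copy of `lines` that contains `wanted` (hypothetical helper).

    Declarative style: we state the desired end state ("this line is present")
    rather than the action ("append this line").
    """
    if wanted in lines:
        return list(lines)          # already in the desired state: no change
    return list(lines) + [wanted]   # converge to the desired state


def append_line(lines, wanted):
    """Non-idempotent contrast: blindly appends on every run."""
    return list(lines) + [wanted]


config = ["PermitRootLogin no"]
setting = "PasswordAuthentication no"

once = ensure_line(config, setting)
twice = ensure_line(once, setting)      # F(F(x))
appended_twice = append_line(append_line(config, setting), setting)

print(once == twice)                    # idempotent: second run is a no-op
print(appended_twice.count(setting))    # the naive script duplicated the line
```

Running the idempotent version any number of times leaves one copy of the setting, which is why re-running a configuration management tool against an already-configured system is safe.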
  6. IMMUTABLE SYSTEMS
    • Alternative strategy to configuration management
    • New images replace old images, rather than upgrading systems in place
    • Increases reliability and potentially decreases upgrade times
    • Cannot be as easily applied to stateful servers (databases, etc)
    • Can slow down development process
    • Image building: Packer, Docker
    • Image management: EC2, Mesos/Kubernetes
  7. MONITORING
    • On-site:
      • Graphite, Ganglia, Nagios, Cacti, Munin
    • Hosted / Off-site:
      • New Relic
    • Alerting vs trending
    • Application Performance Management (APM):
      • AppDynamics
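One way to read the "alerting vs trending" distinction is sketched below, assuming that alerting means checking the latest sample against a threshold and trending means looking at how a metric moves over a longer window. The function names, the sample data, and the 90% threshold are all hypothetical and are not taken from any of the tools listed above.

```python
from statistics import mean


def should_alert(latest, threshold):
    """Alerting: is the most recent sample past the line right now?"""
    return latest >= threshold


def trend(samples):
    """Trending: how much did the average change between the first and
    second half of the window? Positive means the metric is growing."""
    half = len(samples) // 2
    return mean(samples[half:]) - mean(samples[:half])


# Hypothetical daily disk usage percentages for one host.
disk_used_pct = [61, 63, 66, 70, 73, 77]

alert = should_alert(disk_used_pct[-1], threshold=90)
growth = trend(disk_used_pct)

print(alert)    # nothing is on fire today
print(growth)   # but usage is climbing steadily
```

In this sample no alert fires, yet the trend shows the disk filling up, which is the kind of problem trending data catches before it becomes a page.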
  8. LOAD BALANCERS & AUTO SCALING
    • Typically more than one instance of a service is deployed
    • Routes requests between services
    • Closely related: auto-scaling groups
    • Warming up problems and solutions
    • TV show voting example
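A minimal sketch of how a load balancer might spread requests across several instances of a service. Round-robin is used here as an assumption; the slide does not name a specific strategy, and real load balancers also handle health checks, weighting, and connection draining. `RoundRobinBalancer` and the backend names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order (illustrative only)."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [lb.pick() for _ in range(5)]
print(picks)
```

With three instances, the fourth request wraps around to the first instance again. An auto-scaling group would change the backend list over time, which this sketch does not model.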
  9. BACKUP / DISASTER RECOVERY
    • You must be able to restore everything from backup
    • Minimize number/types of data sources
    • If backups are not tested they do not exist
    • Understanding multi-region and multi-datacenter
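The point that untested backups do not exist can be sketched as a restore check: take a backup, restore it somewhere else, and compare checksums against the source. This is a toy example using file copies in a temporary directory; the file name `orders.db` and the helper names are hypothetical, and a real setup would restore into a separate environment and verify the application can read the data.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Checksum of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def backup_and_verify(source, backup_dir, restore_dir):
    """Back up `source`, restore it from the backup, and confirm the
    restored copy matches the original byte for byte."""
    backup = shutil.copy2(source, backup_dir)      # "take the backup"
    restored = shutil.copy2(backup, restore_dir)   # "restore from backup"
    return sha256_of(source) == sha256_of(restored)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "backups").mkdir()
    (root / "restore").mkdir()

    data = root / "orders.db"                      # hypothetical data file
    data.write_bytes(b"order-1\norder-2\n")

    restore_ok = backup_and_verify(data, root / "backups", root / "restore")

print(restore_ok)
```

Running a check like this on a schedule turns "we have backups" into something that is actually verified.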
  10. HIDDEN MANAGEMENT COMPLEXITY
    • As you add management software, the management software often needs management
    • Be aware what happens when you lose a shard or key server
    • Some software upgrades “weird”
    • Holes in bucket: This software requires ZooKeeper, which requires etcd, …
  11. SECURITY POLICY
    • As the number of teams engaged in “self-service” type deployments grows…
    • Security scans increasingly need to happen at build-time
    • Consistency is mandatory
    • Code-review checks need to be in place and not simply rubber stamps
  12. CONTINUOUS INTEGRATION
    • Automatically build code when checked in
    • Ideally: run unit tests as part of build step.
    • Typically: Jenkins. Also Travis CI, CircleCI, TeamCity, Bamboo, others.
    • Dangers of inconsistent build job rules.
  13. CONTINUOUS DEPLOYMENT
    • Can’t get here overnight - This is a spectrum.
    • First requires full automation of a deploy, and a solid C.I. setup
    • When C.I. completes, at least deploy to stage and run functional tests
    • Next step: if FTs pass, consider a deploy to prod
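The progression on this slide can be sketched as a gated pipeline: deploy to stage only after C.I. passes, run functional tests there, and promote to prod only if they pass and the team has opted in. All names here (`run_pipeline`, `promote_to_prod`, the `deploy` and `functional_tests` callables) are hypothetical stand-ins, not the API of Jenkins or any other C.I. tool.

```python
def run_pipeline(ci_passed, deploy, functional_tests, promote_to_prod=False):
    """Return the list of environments that were deployed to.

    ci_passed        -- result of the C.I. build and unit tests
    deploy           -- callable taking an environment name (fully automated)
    functional_tests -- callable taking an environment name, returns bool
    promote_to_prod  -- opt-in flag: the "next step" on the spectrum
    """
    deployed = []
    if not ci_passed:
        return deployed                 # never deploy a broken build

    deploy("stage")
    deployed.append("stage")

    if functional_tests("stage") and promote_to_prod:
        deploy("prod")
        deployed.append("prod")
    return deployed


deploy_log = []
all_green = run_pipeline(True, deploy_log.append, lambda env: True,
                         promote_to_prod=True)
failing_fts = run_pipeline(True, lambda env: None, lambda env: False,
                           promote_to_prod=True)
broken_build = run_pipeline(False, lambda env: None, lambda env: True)

print(all_green)
print(failing_fts)
print(broken_build)
```

Leaving `promote_to_prod` off by default reflects the "spectrum" idea: a team can run with stage-only deploys until it trusts its functional tests.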
  14. ADDITIONAL RESOURCES
    • Unfortunately, this field moves fast.
    • Latest tech, but advice of varying quality:
      • news.ycombinator.com
      • Reddit.com/r/devops
      • Reddit.com/r/sysadmin