Operations Concepts & Tools

INTRO TO IT OPERATIONS CONCEPTS & TOOLS
MICHAEL DEHAAN
NCSU, SPRING 2017

WHY
• Familiarity with how your code is deployed and managed in production leads to better code
• Better understanding of performance and failure modes
• It’s good to be friends with the ops team (inverse: even more true!)
• Increasing shifts in shared responsibility, sometimes filed under the overloaded umbrella term “DevOps”

THINGS TO COVER
• Classic vs Microservices Architectures
• IaaS / Cloud APIs and Tools
• Configuration Management / Immutable Systems
• Monitoring / Log Collection
• Load Balancers / Update Strategies
• Backup / Disaster Recovery
• Security Policy
• Continuous Integration / Continuous Deployment

“TYPICAL” WEB ARCHITECTURE
• Load Balancer
• www1, www2, www3
• db1, db2
• message bus
• job1, job2

MICROSERVICES “ARCHITECTURES”

IN THE BEGINNING
• In the beginning (and still a lot today), software installs were largely run by systems administrators writing their own custom scripts
• These scripts grew unmaintainable over time
• Scripts could fail
• Much of the install process was not fully automated, even where some scripts existed
• Upgrades were a frequent cause of widespread system failure

IAAS / CLOUD
• Misleading assumption that Cloud services (e.g. Amazon, GCE) are primarily about renting IP addresses
• ALSO: storage, databases, load balancers, firewalls/security, messaging, etc.
• Cloud topology control examples: CloudFormation (AWS), Terraform (generic)
• Cloud API examples: Boto (AWS Python)
• CLI Tools

CONFIGURATION MANAGEMENT
• Declarative description of what should be on a system
• “Idempotence” & the GPS Analogy: F(x) = F(F(x))
• Typically “push” or “pull” based
• Designed around Pull: Puppet, Chef
• Designed around Push: Ansible
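The idempotence property F(x) = F(F(x)) can be illustrated with a minimal Python sketch. This is not how Puppet, Chef, or Ansible are implemented; `ensure_line`, the file name, and the config line are hypothetical names chosen for illustration. The point is that the function describes a desired state, so applying it a second time changes nothing.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Ensure `line` is present in the file at `path` (hypothetical helper).

    Returns True if the file had to be changed, False if the desired
    state already held.
    """
    text = ""
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    if line in text.splitlines():
        return False  # already in the desired state: do nothing
    # Keep the new line on its own row even if the file lacks a trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with open(path, "a") as handle:
        handle.write(separator + line + "\n")
    return True


def demo():
    """Apply the same desired state twice and report what happened."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "example.conf")  # illustrative file name
        first = ensure_line(path, "max_connections = 100")
        second = ensure_line(path, "max_connections = 100")
        with open(path) as handle:
            content = handle.read()
    return first, second, content
```

Calling `demo()` applies the change once and reports no change on the second run, which is the behavior a declarative tool aims for when it is re-run against an already-configured system.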

IMMUTABLE SYSTEMS
• Alternative strategy to configuration management
• New images replace old images, rather than upgrading systems in place
• Increases reliability and potentially decreases upgrade times
• Cannot be as easily applied to stateful servers (databases, etc.)
• Can slow down development process
• Image building: Packer, Docker
• Image management: EC2, Mesos/Kubernetes

MONITORING
• On-site:
  • Graphite, Ganglia, Nagios, Cacti, Munin
• Hosted / Off-site:
  • New Relic
• Alerting vs trending
• Application Performance Management (APM):
  • AppDynamics

LOG COLLECTION/SEARCH
• Off-site:
  • Splunk
  • SumoLogic
  • Loggly
• On-site:
  • Logstash / “ELK Stack”

LOAD BALANCERS & AUTO SCALING
• Typically more than one instance of a service is deployed
• Routes requests between services
• Closely related: auto-scaling groups
• Warm-up problems and solutions
• TV show voting example
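As a rough illustration of routing requests across several instances, here is a minimal round-robin sketch in Python. Round-robin is only one possible policy and is an assumption here, not something the slides prescribe; the class name, the `mark_down`/`mark_up` methods, and the host names are hypothetical.

```python
class RoundRobinBalancer:
    """Hand out backends in rotation, skipping any marked unhealthy."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        """Stop routing to a backend (e.g. after a failed health check)."""
        self._healthy.discard(backend)

    def mark_up(self, backend):
        """Resume routing to a known backend."""
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        """Return the next healthy backend, or raise if none is available."""
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backends available")
```

A caller would construct it with something like `RoundRobinBalancer(["www1", "www2", "www3"])` and call `pick()` per request. Real load balancers add health checks, connection draining, and weighting, which are left out of this sketch.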

BACKUP / DISASTER RECOVERY
• You must be able to restore everything from backup
• Minimize number/types of data sources
• If backups are not tested they do not exist
• Understanding multi-region and multi-datacenter
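The "untested backups do not exist" point can be sketched as a backup routine that immediately performs a trial restore and compares checksums. This is a simplified illustration for a single file on local disk; `backup_and_verify`, the `.bak` suffix, and the use of SHA-256 are assumptions made for the example, not a recommendation of a specific tool.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: str, backup_dir: str) -> bool:
    """Copy `source` into `backup_dir`, then prove the copy can be restored.

    Returns True only if a trial restore matches the original byte for byte.
    """
    backup_path = os.path.join(backup_dir, os.path.basename(source) + ".bak")
    shutil.copy2(source, backup_path)
    # Restore into a scratch directory so the live file is never touched.
    with tempfile.TemporaryDirectory() as restore_dir:
        restored = os.path.join(restore_dir, os.path.basename(source))
        shutil.copy2(backup_path, restored)
        return sha256_of(restored) == sha256_of(source)
```

In practice the restore test would run on a schedule against a separate environment, since a backup that verified once may still become unreadable later.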

HIDDEN MANAGEMENT COMPLEXITY
• As you add management software, the management software often needs management
• Be aware of what happens when you lose a shard or key server
• Some software upgrades in “weird” ways
• Holes in bucket: This software requires ZooKeeper, which requires etcd, …

UPDATE STRATEGIES
• Outages
• Rolling updates
• “Red/green, blue/green, whatever” updates
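A rolling update can be sketched as updating hosts in small batches and halting as soon as a batch fails its health check, so the remaining hosts keep serving the old version. The function below is a minimal sketch under that assumption; `deploy` and `healthy` are hypothetical callables supplied by the caller, standing in for whatever deployment and health-check mechanism is actually in use.

```python
def rolling_update(hosts, batch_size, deploy, healthy):
    """Update `hosts` in batches of `batch_size`, stopping on the first failure.

    `deploy(host)` performs the update; `healthy(host)` returns True when the
    host is serving correctly afterwards. Returns the list of updated hosts.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    updated = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            deploy(host)
        failed = [host for host in batch if not healthy(host)]
        if failed:
            # Halt here: hosts not yet touched are still on the old version.
            raise RuntimeError("update halted, unhealthy hosts: %s" % failed)
        updated.extend(batch)
    return updated
```

A blue/green update differs in that a complete second set of hosts is brought up on the new version and traffic is switched over at once, rather than batch by batch.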

SECURITY POLICY
• As the number of teams engaged in “self-service” type deployments grows…
• Security scans increasingly need to happen at build-time
• Consistency is mandatory
• Code-review checks need to be in place and not simply rubber stamps

CONTINUOUS INTEGRATION
• Automatically build code when checked in
• Ideally: run unit tests as part of build step
• Typically: Jenkins. Also Travis CI, CircleCI, TeamCity, Bamboo, others
• Dangers of inconsistent build job rules

CONTINUOUS DEPLOYMENT
• Can’t get here overnight: this is a spectrum
• First requires full automation of a deploy, and a solid C.I. setup
• When C.I. completes, at least deploy to stage and run functional tests
• Next step: if functional tests (FTs) pass, consider a deploy to prod
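The gating described above (build, then stage, then functional tests, then possibly prod) can be sketched as an ordered list of stages where a failure stops everything after it. The stage names and the `run_pipeline` function are hypothetical; a real setup would express this in the CI server's own job configuration.

```python
def run_pipeline(stages):
    """Run `(name, step)` pairs in order and stop at the first failing step.

    Each `step` is a zero-argument callable returning True on success.
    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, step in stages:
        if not step():
            break  # later stages, such as a prod deploy, never run
        completed.append(name)
    return completed


def example_stages(functional_tests_pass):
    """Build an illustrative pipeline; every step here is a stand-in."""
    return [
        ("build and unit tests", lambda: True),
        ("deploy to stage", lambda: True),
        ("functional tests", lambda: functional_tests_pass),
        ("deploy to prod", lambda: True),
    ]
```

With `example_stages(False)` the prod deploy is never reached, which is the property that makes automated promotion to production safe to consider.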

ADDITIONAL RESOURCES
• Unfortunately, the field moves fast
• Latest tech, but advice of varying quality:
  • news.ycombinator.com
  • Reddit.com/r/devops
  • Reddit.com/r/sysadmin
