
maxvt
August 03, 2017

I Got A Lot Of Problems With Infrastructure-As-Code And Now You're Gonna Hear About It

Infrastructure as code (IaC). There’s an O’Reilly book about it, and it’s considered “a key attribute of enabling best practices of DevOps” by Wikipedia, so it must be awesome, mustn’t it? However, our experience at PagerDuty of moving towards infrastructure as code has not been smooth. In fact, there has been plenty of grumbling and pushing away from the everything-as-code approach. So is IaC the best thing since sliced bread or an abstract idea that does not survive the challenges of making day-to-day operations faster?

This talk examines some of the principles and arguments for IaC, and shows where these principles and benefits can fall short based on the evolution of IaC at PagerDuty. It explores several alternatives and seeds some ideas about the possible future directions of Infrastructure as Code. Audience members will leave with a more nuanced, less buzzwordy understanding of the rationale for IaC and the ways in which the principles of IaC could be applied to ship things faster while still allowing fine-grained control.


Transcript

  1. @maxvt, CfgMgmtCamp PDX ’17

    I got a lot of problems with Infrastructure-as-Code and now you’re gonna hear about it
    Max Timchenko, SRE
  2. Agenda, aka “the plan for the airing of grievances”

    • What is Infra-as-Code?
    • Promises of Infra-as-Code
    • Problems of Infra-as-Code
    • What can we do about it?
  3. A Long, Long Time Ago…

    Common Tasks, Software, Computers, Networking & More
    Computers per operator: 10^-1 to 10^1
    Image: UNIVAC 1108, https://en.wikipedia.org/wiki/File:UnivacII.jpg
  4. Pets to Cattle: We Are Here

    Common Tasks, Software, Computers, Networking & More
    Computers per DevOps engineer: 10^2 to 10^3
    Image: Chef, Puppet, SaltStack, and Ansible logos
  5. Cattle to Bacteria: We Want This

    Common Tasks, Software, Computers, Networking & More
    Computers per infrastructure engineer: 10^3 to 10^6 and beyond
    Image: http://www.thinkgeek.com/product/1cb0/
  6. !wtf “Infrastructure as code”

    “IaC is the process of managing and provisioning computing, network, and other infrastructure through machine-readable definition files.”
    Definitions: https://en.wikipedia.org/wiki/Infrastructure_as_Code, https://martinfowler.com/bliki/InfrastructureAsCode.html
  7. Business Case for IaC

    • Faster
    • Less Risky
    • Cheaper
    • Better Availability
  8. IaC Reduces Risk

    • Repeatable, human-free process
    • All changes reviewed
    • No snowflakes
    • Provider independence
  9. IaC Improves Availability %

    • Faster MTTR from incidents
    • Automated remediation
    • Blue/Green deploy models
    • Parallel change models
  10. “Infrastructure is just code”

    Use the same workflow as used for (modern) software development
    Now supports AWS
  11. © 2008 Focus Shift / OSNews / Thom Holwerda. http://www.osnews.com/comics
  12. commit 8a4394e...

    Author: <redacted>
    Date: Tue Feb 2 13:24:25 2016 -0500

        plz halp i am bad at compooter how do i berksfile.lock?

    diff --git a/Berksfile.lock b/Berksfile.lock
  13. “Infrastructure is just code”

    Deployment pipeline
    • Unit and system testing
    • Continuous integration
    • Small change deployments
    • Deployment automation
    Now supports AWS
  14. “Even developers can easily engage in the activities, because they can easily write infrastructure code in the languages that they are familiar with. In addition to this, the learning curve for most descriptive languages used by tools like ansible is not very steep. This makes devops even simpler for a developer.”

    https://www.thoughtworks.com/insights/blog/infrastructure-code-reason-smile
  15. “IaC in familiar languages”

    1. Java
    2. C/C++
    3. Python (boto, Ansible)
    4. C#
    5. VB.net
    7. PHP
    . . .
    10. Ruby (Chef, Puppet, SparkleForm)
    . . .
    14. Golang
  16. We’ll program our infra in YAML!

    YAML is not a programming language!
  17. We’ll program our infra in HCL!

    HCL is not a programming language either!

    {{ range $dc := datacenters }}{{ range service (printf "%s.hostgroupname@%s" (env "SERVICE_TAG") $dc) }} {{ if ne $dc $this_dc }} server {{.Node}} {{.Address}}:{{.Port}} check inter 5s backup{{end}}{{end}}{{end}}
  18. “Even developers can easily engage in the activities, because they can easily write infrastructure code in the languages that they are familiar with. In addition to this, the learning curve for most descriptive languages used by tools like ansible is not very steep. This makes devops even simpler for a developer.”

    https://www.thoughtworks.com/insights/blog/infrastructure-code-reason-smile
  22. Problems of IaC

    Risk, Availability, Cost, Speed
    • Infrastructure expressed as Code
    • Complexity of IaC
    • Fidelity of Versioning
    • Deployment Processes
  23. Boilerplate

    • Chef: 1 simple service (install and cron), 166 lines of code
    • Terraform: 2 AWS S3 buckets with ACL (IAM), 193 lines of code (not including cross-account roles)
  24. Infrastructure abstraction

    “Terraform is cloud-agnostic”, i.e. it supports multiple clouds.
    https://www.terraform.io/intro/use-cases.html

    resource "azurerm_virtual_machine" "host" {
      location = "West US"
      vm_size = "Standard_A0"
      storage_image_reference {
        sku = "gentoo-linux"
      }
      …

    resource "aws_instance" "host" {
      ami = "ami-badc0ffee"
      availability_zone = "ca-central-1a"
      instance_type = "t2.medium"
      associate_public_ip_address = true
      ipv6_address_count = "1"
      …
  25. Abstract resource

    An abstract resource can be instantiated on any compatible cloud.

    resource "virtual_machine" "host" {
      base_os = "16.04"
      region = "us-west"
      instance_size = "medium"
      associate_public_ip_address = true
      with_ipv6 = true
      …
  26. Infra Independence is Bespoke

    instance_size = "medium"
    AWS m3.xlarge: 4 vCPU, 30 GB RAM, 2x80 GB SSD

    Azure    vCPU   RAM   SSD
    D4_v3    4      16    32
    D8_v3    8      32    64
    D3_v2    4      14    200
    A6       4      28    285
  27. Infra Independence is Bespoke

    instance_size = "medium"
    AWS m3.xlarge: 4 vCPU, 30 GB RAM, 2x80 GB SSD

    Azure    vCPU   RAM   SSD   ACU
    D4_v3    4      16    32    160-190
    D8_v3    8      32    64    160-190
    D3_v2    4      14    200   210-250
    A6       4      28    285   50-100

    ACU is Azure Compute Units
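To make the comparison on the two slides above concrete, here is a minimal Python sketch. It copies the figures from the slides (summing 2x80 GB SSD to 160 GB is an assumption), and the `Size`, `covers`, and `equivalents` names are hypothetical helpers, not part of any tool. It asks which Azure sizes meet or exceed the AWS "medium" on every dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    name: str
    vcpu: int
    ram_gb: int
    ssd_gb: int


# Figures copied from the slides; 2x80 GB SSD is summed to 160 GB (assumption).
AWS_MEDIUM = Size("m3.xlarge", 4, 30, 160)
AZURE_SIZES = [
    Size("D4_v3", 4, 16, 32),
    Size("D8_v3", 8, 32, 64),
    Size("D3_v2", 4, 14, 200),
    Size("A6", 4, 28, 285),
]


def covers(candidate, target, dims):
    """True if candidate meets or exceeds target on every listed dimension."""
    return all(getattr(candidate, d) >= getattr(target, d) for d in dims)


def equivalents(target, candidates, dims=("vcpu", "ram_gb", "ssd_gb")):
    """Names of candidates that cover the target on the chosen dimensions."""
    return [c.name for c in candidates if covers(c, target, dims)]


# Strict comparison on all three dimensions.
full_match = equivalents(AWS_MEDIUM, AZURE_SIZES)
# Relaxed comparison that ignores RAM.
cpu_and_disk_only = equivalents(AWS_MEDIUM, AZURE_SIZES, dims=("vcpu", "ssd_gb"))

print(full_match)
print(cpu_and_disk_only)
```

With these numbers the strict comparison finds no match at all, and only dropping a dimension produces candidates. Someone still has to decide which dimension is allowed to give, which is the bespoke part.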
  28. Abstraction in Terraform

    • Terraform: 17 different Go types for AWS tags. https://github.com/hashicorp/terraform/pull/14321/files
    • Abstraction story: write your own abstract providers in Golang against the TF plugin API.
  29. Code-related Problems: Recap

    • New language, new concepts
    • Boilerplate did not go away
    • High-level docs still needed
    • For now, remains provider specific
    • Complex logic is harder to express
  30. Complexity of IaC

    • Learning curve can be steep
    • More code, more deploy processes
    • Another layer of indirection on top of infrastructure APIs, with its own bugs
    • Infrastructure for IaC
    • Fine-grained access controls a big pain
  31. AWS Limits on IAM entities

    • 10 policies per group, hard limit
    • limited policies/user, default 10
    • limited policies/role, default 10
    • 2k user policy size
    • 5k group/managed policy size
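The size limits above are easy to hit once a policy enumerates resources. The following Python sketch checks a policy document against them before it is applied. Reading "2k" and "5k" as 2048 and 5120 characters and measuring size with whitespace stripped are assumptions of this sketch, and the bucket names are made up.

```python
import json

# Limits as quoted on the slide; the exact character counts are an assumption.
LIMITS = {"user": 2048, "group": 5120}


def policy_size(policy):
    """Length of the serialized policy with all whitespace removed."""
    return len("".join(json.dumps(policy).split()))


def fits(policy, entity):
    """True if the policy is within the size limit for the entity kind."""
    return policy_size(policy) <= LIMITS[entity]


def s3_policy(bucket_count):
    """Hypothetical policy granting read access to N made-up buckets."""
    resources = [f"arn:aws:s3:::example-bucket-{i}/*" for i in range(bucket_count)]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": resources,
            }
        ],
    }


small = s3_policy(2)
large = s3_policy(100)

for label, policy in (("small", small), ("large", large)):
    print(label, policy_size(policy), fits(policy, "user"), fits(policy, "group"))
```

A fine-grained policy that lists a hundred resources no longer fits on a user under these limits, so it has to be split or attached elsewhere. That is the kind of friction the "big pain" bullet on the previous slide refers to.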
  32. Which IAM capabilities are needed?

    • “…non-trivial problem to solve due to the many and varied capabilities of IAM across all of the AWS services and the difficulty of predicting the right granularity for a policy. Therefore we (the Terraform team) are not planning to move forward with any specific feature in this area…”
    • “…it’s often hard to determine from the Terraform documentation exactly what actions are being executed for a given resource, and thus know how to map what's in the AWS documentation…”
    https://github.com/hashicorp/terraform/issues/2834
  33. Versioning in IaC

    • The promise of easy rollback is only achievable if everything is versioned.

    FROM ubuntu:latest
    sudo apt-get update && sudo apt-get upgrade -y
    gem "cucumber", ">0.1.0"
  34. Versioning in IaC

    If you use IaC to merely wrap processes that do not guarantee repeatability, you do not gain repeatability by using IaC.
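One way to act on the point above is to flag unpinned inputs before they reach a build. This is a minimal Python sketch under strict assumptions: only a digest reference counts as a pinned container image (tags, including version tags, can be moved), and only an exact version counts as a pinned gem requirement. The function names are hypothetical.

```python
import re

# Matches "1.2.3" or "= 1.2.3"; range operators such as ">" or "~>" do not match.
EXACT_GEM = re.compile(r"^(=\s*)?\d+(\.\d+)*$")


def image_is_pinned(ref):
    """Treat only digest references (name@sha256:...) as pinned."""
    return "@sha256:" in ref


def gem_is_pinned(requirement):
    """Treat only an exact version requirement as pinned."""
    return bool(EXACT_GEM.match(requirement.strip()))


# The two references from the slide, plus pinned counterparts for contrast.
images = ["ubuntu:latest", "ubuntu@sha256:" + "0" * 64]
gems = [">0.1.0", "= 0.1.0"]

for ref in images:
    print(ref[:24], image_is_pinned(ref))
for req in gems:
    print(req, gem_is_pinned(req))
```

The `apt-get upgrade -y` line from the slide has no equivalent check here: it pulls whatever the package mirror serves at run time, so there is no version in the text to inspect.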
  35. IaC Deployment: Code Location

    • Single repository for infrastructure
    • Multiple small repositories for infra
    • Infra code in service code repositories
  36. IaC Deployment: Code Location

    Columns: IaC Monorepo, Multiple IaC repos, IaC in Service repos
    SRE ++ +
    Developers ++
    Testing +
    Rollout + +
    Permissions ++ +
    OpenSourcing + +
  37. IaC Deployment Processes: Speed

    PR, CI unit test for IaC, CI unit test for service, merge, pull, CD system test, refresh Terraform state, apply changes…

    “…change management processes are commonly ignored, bypassed, or overruled by people who need to get things done.”
    From “Infrastructure as Code”, Kief Morris, O’Reilly
  38. IaC Deployment Processes: Workflow

    • Code repository (GitHub for most of us)
    • PR / Review / Plan process
    • Commit / Execution process
    • Rollbacks?
    • State management? Backups?
    • High availability?
  39. Focus on the goals of Infra as Code

    • Give developers the power
    • Make it fast(er)
    • Automated infrastructure tests
    • Audit trail and context of changes
    • Reproducible past versions
    • Recovery from catastrophic incidents
    • Isolated development environments as a service
    • Reuse best practices / infrastructure design patterns
  40. My problem with Infrastructure as Code

    Code is not one of the goals but it’s right there in the name
  41. My problem with Infrastructure as Code

    Code is not one of the goals but it’s right there in the name and this constrains our thinking.
  42. “Code” thinking

    • “…code should be written to describe the desired state”
    • “Human readable documentation can be generated from code”
    • “they can easily write infrastructure code”
    • “Done correctly, the scripts can be run on any cloud”
  43. What if there is no code?

    • Can default behaviour be good enough?
    • Can code be relegated to an expert feature?
    • Can we experiment easily and safely?
  44. Code as an expert feature

    • Nobody wants to write IaC code
    • Simplified UI or text spec that expands into IaC code behind the scenes
    • WYSIWYG, validation, fast feedback.
    vs. Office
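The "text spec that expands into IaC code" idea can be sketched in a few lines of Python. Everything here is hypothetical: the two-field spec, the defaults, and the size table are made up for illustration, the attribute names follow the `aws_instance` example from the earlier slide, and the AMI ID is that slide's placeholder rather than a real image. The output uses the JSON form of Terraform configuration.

```python
import json

# Hypothetical defaults an infrastructure team might choose once for everyone.
DEFAULTS = {
    "ami": "ami-badc0ffee",  # placeholder from the earlier slide, not a real image
    "availability_zone": "ca-central-1a",
    "associate_public_ip_address": True,
}
# Hypothetical mapping from a friendly size name to an instance type.
SIZES = {"small": "t2.small", "medium": "t2.medium", "large": "t2.large"}
ALLOWED_KEYS = {"name", "size"}


def expand(spec):
    """Validate a two-field spec and expand it into a Terraform JSON document."""
    unknown = set(spec) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    if spec.get("size") not in SIZES:
        raise ValueError(f"size must be one of {sorted(SIZES)}")
    attrs = dict(DEFAULTS, instance_type=SIZES[spec["size"]])
    return {"resource": {"aws_instance": {spec["name"]: attrs}}}


doc = expand({"name": "host", "size": "medium"})
print(json.dumps(doc, indent=2))
```

Validation happens before anything is generated, which gives the fast feedback the slide asks for, and the generated document stays available for experts who need to edit it directly.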
  45. But if we must code…

    • Modern, full featured language, not YAML or template based
    • Modularization, packaging, versioning, well-known central public repository…
  46. Addressing Complexity

    • Making complexity “expert” helps
    • Teach the underlying concepts
    • More “full-service” solutions
    • Must make fine-grained ACLs easier
    • https://github.com/Clever/terrafam, perhaps?
  47. Versioning and Rollback

    • “Inactive” instead of “deleted”
    • Immutable snapshots for base OS, or make the host OS not matter
    • Immutable snapshots for services
    • Containers help a lot here
  48. Deployment Processes

    • Need best practices for medium+ orgs
    • Single repo is easiest to start with
    • Tool (CI/CD) support for repo views and extracting IaC parts from service repos
    • More CI/CD workflows for IaC
  49. PagerDuty and Managed Infrastructure

    • Terraform, because multi-provider
    • Disposable interviewing environments
    • Disposable sales/support setups
    • AWS resource automation
    • VMs are harder for us (custom tooling)
  50. Thank you

    @maxvt
    https://www.pagerduty.com/careers
    and thanks to everyone working on open-source managed infrastructure tools
    building managed infrastructure makes you go ?