Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Lessons Learned from Scaling Infrastructure as Code

Lessons Learned from Scaling Infrastructure as Code

Originally presented at Summer Systems @Scale 2022.

You adopted an infrastructure as code tool like Terraform. What started as one person writing some configuration and deploying new infrastructure scales to everyone in the company writing their own infrastructure configuration and deploying their own systems. In this talk, we’ll share some of the lessons learned across the Terraform community when scaling infrastructure as code practices from one team to an entire company and its users. We’ll cover the patterns and practices that help address challenges of updating infrastructure, managing infrastructure modules, maintaining security, streamlining cost, and even upgrading and migrating tools.

Be8b596c46f4c9a1aec6a7586af33134?s=128

Rosemary Wang

June 29, 2022
Tweet

More Decks by Rosemary Wang

Other Decks in Technology

Transcript

  1. Copyright © 2020 HashiCorp June 29, 2022 Lessons Learned from

    Scaling Infrastructure as Code
  2. We started using __________. (Insert infrastructure as code tool.)

  3. Other teams started using it too. Now we have _____________.

    (Insert problem here.)
  4. Developer Advocate at HashiCorp 
 she/her 
 
 @joatmon08 joatmon08.github.io

    Rosemary Wang
  5. Delivery Development Security Cost

  6. 01 Development Lessons Learned / Challenges

  7. Problem: Changes break infrastructure.

  8. Empower every team to responsibly contribute to infrastructure.

  9. Standardize infrastructure resources. TERMINAL module "boundary" { depends_on = [module.vpc]

    source = "joatmon08/boundary/aws" version = "0.2.0" vpc_id = module.vpc.vpc_id vpc_cidr_block = module.vpc.vpc_cidr_block public_subnet_ids = module.vpc.public_subnets private_subnet_ids = module.vpc.database_subnets name = var.name key_pair_name = var.key_pair_name allow_cidr_blocks_to_workers = var.client_cidr_block allow_cidr_blocks_to_api = ["0.0.0.0/0"] boundary_db_password = random_password.boundary_database.result }
  10. Infrastructure Modules 1. Set opinionated defaults. 2. Allow collaboration (pull

    requests). 3. Apply a testing strategy. 
 unit, contract, & integration tests 4. Version modules.
  11. Infrastructure Configuration 1. Minimize blast radius of changes. 2. Use

    immutability to roll forward. 
 terraform apply -target, terraform taint 3. Use version control when possible.
  12. Note: If you use a monorepo, make sure your build

    tool can handle it.
  13. Decouple dependencies. TERMINAL data "aws_eks_cluster" "cluster" { name = var.aws_eks_cluster_id

    == "" ? data.terraform_remote_state.infrastructure.outputs.eks_cluster_id : var.aws_eks_cluster_i d } data "aws_eks_cluster_auth" "cluster" { name = var.aws_eks_cluster_id == "" ? data.terraform_remote_state.infrastructure.outputs.eks_cluster_id : var.aws_eks_cluster_i d } provider "kubernetes" { host = data.aws_eks_cluster.cluster.endpoin t cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data ) token = data.aws_eks_cluster_auth.cluster.toke n experiments { manifest_resource = tru e } }
  14. Dependency Injection 1. Reference outputs stored in state 2. Use

    the infrastructure API 3. Store and reference from configuration manager
  15. Note: Multiple infrastructure providers or environments will require additional abstraction

    or automation.
  16. 02 Delivery Lessons Learned / Challenges

  17. Problem: It takes too long to deploy changes.

  18. Minimize time from commit to production.

  19. terraform.io/plugin

  20. Download modules and plugins. github.com/hashicorp/go- getter TERMINAL > terraform ini

    t Initializing the backend.. . Initializing provider plugins.. . - terraform.io/builtin/terraform is built in to Terrafor m - Reusing previous version of hashicorp/aws from the dependency lock fil e - Installing hashicorp/aws v4.15.0.. . - Installed hashicorp/aws v4.15.0 (signed by HashiCorp ) - Installing hashicorp/boundary v1.0.6.. . - Installed hashicorp/boundary v1.0.6 (signed by HashiCorp )
  21. 1. Use internal artifact repository. 2. Cache providers & modules

    on local filesystem. 
 git submodule add terraform.io/language/providers/requirements#in-house-providers
  22. Refresh state. Reads information from infrastructure API. > terraform appl

    y module.hcp.data.aws_region.current: Reading.. . module.hcp.data.aws_region.current: Read complete after 0s [id=us-west-2 ] module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-REDACTED ] ## omitted for clarit y Plan: 105 to add, 0 to change, 0 to destroy . CODE EDITOR
  23. Apply changes. Create, read, update, and delete resources with infrastructure

    API. > terraform appl y module.hcp.data.aws_region.current: Reading.. . module.hcp.data.aws_region.current: Read complete after 0s [id=us-west-2 ] module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-REDACTED ] ## omitted for clarit y Plan: 105 to add, 0 to change, 0 to destroy . CODE EDITOR
  24. > terraform graph terraform.io/internals/graph

  25. 1. Enable concurrent operations. 
 terraform apply -parallelism=n 2. Tune

    infrastructure API (rate limiting). 
 create, read, update, delete 3. Modularize into fewer resources. 
 faster refresh & apply, fewer state locking conflicts
  26. Note: Infrastructure usually has a manual approval step.

  27. 03 Security Lessons Learned / Challenges

  28. Problem: Misconfiguration of infrastructure could compromise security.

  29. Use infrastructure as code to enforce security.

  30. CODE EDITOR import "tfplan/v2" as tfplan database_only_has_non_permissive_firewall_rules = rule {

    all database_firewall_rules as firewall_rule { firewall_rule.values.start_ip_address is not "0.0.0.0" and firewall_rule.values.end_ip_address is not "255.255.255.255" } } resources_with_tag_field_have_defined_tags = rule { all resources_with_tag_field as resource { resource.values.tags is not null } }
  31. 1. Standardize tests for static analysis of IaC. 
 secure

    standards & defaults 2. Enforce changes through IaC. 
 terraform apply -target, terraform taint 3. Enable dynamic analysis of infrastructure. 
 drift detection, automated reconciliation
  32. Note: Control access to infrastructure. Store secrets outside of IaC.

  33. 04 Cost Lessons Learned / Challenges

  34. Problem: We could be more efficient with our infrastructure.

  35. Apply cost management techniques to infrastructure as code.

  36. Commit changes. Run unit tests. Run integration tests. Estimate cost.

    Deploy changes. ✓ Security ✓ Cost compliance test_cpu_size_less_than_or_equal_to_32()
  37. Cost Compliance 1. Enforce tags. 
 expiration date, standard tagging

    2. Implement reboot schedule. 3. Set resource type, size, or reservation. 4. Check autoscaling enabled.
  38. Note: Testing in production can eliminate some development environments.

  39. Delivery Development Security Cost

  40. Thank You Rosemary Wang @joatmon08 joatmon08.github.io