Slide 1

Slide 1 text

Lessons Learned from Scaling Infrastructure as Code
June 29, 2022
Copyright © 2020 HashiCorp

Slide 2

Slide 2 text

We started using __________. (Insert infrastructure as code tool.)

Slide 3

Slide 3 text

Other teams started using it too. Now we have _____________. (Insert problem here.)

Slide 4

Slide 4 text

Rosemary Wang
Developer Advocate at HashiCorp
she/her
@joatmon08
joatmon08.github.io

Slide 5

Slide 5 text

Delivery Development Security Cost

Slide 6

Slide 6 text

01 Development Lessons Learned / Challenges

Slide 7

Slide 7 text

Problem: Changes break infrastructure.

Slide 8

Slide 8 text

Empower every team to responsibly contribute to infrastructure.

Slide 9

Slide 9 text

Standardize infrastructure resources.

    module "boundary" {
      depends_on = [module.vpc]
      source     = "joatmon08/boundary/aws"
      version    = "0.2.0"

      vpc_id             = module.vpc.vpc_id
      vpc_cidr_block     = module.vpc.vpc_cidr_block
      public_subnet_ids  = module.vpc.public_subnets
      private_subnet_ids = module.vpc.database_subnets

      name                         = var.name
      key_pair_name                = var.key_pair_name
      allow_cidr_blocks_to_workers = var.client_cidr_block
      allow_cidr_blocks_to_api     = ["0.0.0.0/0"]
      boundary_db_password         = random_password.boundary_database.result
    }

Slide 10

Slide 10 text

Infrastructure Modules
1. Set opinionated defaults.
2. Allow collaboration (pull requests).
3. Apply a testing strategy.
   unit, contract, & integration tests
4. Version modules.
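
Items 1 and 4 can be sketched as module input variables that default to the safe choice and reject out-of-range overrides. This is a minimal sketch: the variable names and limits are hypothetical and are not taken from the boundary module on slide 9.

```hcl
# Hypothetical inputs for an internal database module (names are illustrative).

variable "storage_encrypted" {
  description = "Encrypt storage at rest. Defaults to the secure choice."
  type        = bool
  default     = true
}

variable "backup_retention_days" {
  description = "Days to keep automated backups."
  type        = number
  default     = 7

  # Callers may override the default, but only within an agreed range.
  validation {
    condition     = var.backup_retention_days >= 1 && var.backup_retention_days <= 35
    error_message = "Backup retention must be between 1 and 35 days."
  }
}
```

Consumers would then pin a released version of the module, as the module block on slide 9 does with version = "0.2.0".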

Slide 11

Slide 11 text

Infrastructure Configuration
1. Minimize blast radius of changes.
2. Use immutability to roll forward.
   terraform apply -target, terraform taint
3. Use version control when possible.
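
One way to express item 2 in configuration, rather than through CLI flags, is to let a changed input force a replacement and to create the new resource before the old one is destroyed. The sketch below assumes an AWS instance; the variable name and instance type are placeholders.

```hcl
# Hypothetical variable; the AMI ID would come from an image build pipeline.
variable "ami_id" {
  description = "Machine image to run. A new image means a new instance."
  type        = string
}

resource "aws_instance" "app" {
  # Changing the AMI replaces the instance instead of updating it in place.
  ami           = var.ami_id
  instance_type = "t3.micro"

  lifecycle {
    # Build the replacement before destroying the old instance.
    create_before_destroy = true
  }
}
```

In recent Terraform versions, terraform apply -replace=aws_instance.app forces the same replacement that terraform taint marks for the next apply.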

Slide 12

Slide 12 text

Note: If you use a monorepo, make sure your build tool can handle it.

Slide 13

Slide 13 text

Decouple dependencies.

    data "aws_eks_cluster" "cluster" {
      name = var.aws_eks_cluster_id == "" ? data.terraform_remote_state.infrastructure.outputs.eks_cluster_id : var.aws_eks_cluster_id
    }

    data "aws_eks_cluster_auth" "cluster" {
      name = var.aws_eks_cluster_id == "" ? data.terraform_remote_state.infrastructure.outputs.eks_cluster_id : var.aws_eks_cluster_id
    }

    provider "kubernetes" {
      host                   = data.aws_eks_cluster.cluster.endpoint
      cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
      token                  = data.aws_eks_cluster_auth.cluster.token

      experiments {
        manifest_resource = true
      }
    }

Slide 14

Slide 14 text

Dependency Injection
1. Reference outputs stored in state
2. Use the infrastructure API
3. Store and reference from configuration manager
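
The three options can be sketched side by side. Everything named here is an assumption for illustration: the S3 bucket, state key, region, cluster name, and parameter path are placeholders, and AWS Systems Manager Parameter Store stands in for whichever configuration manager a team actually uses.

```hcl
# 1. Reference outputs stored in another configuration's state.
data "terraform_remote_state" "infrastructure" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "infrastructure/terraform.tfstate"
    region = "us-west-2"
  }
}

# 2. Use the infrastructure API: look the cluster up by a known name.
data "aws_eks_cluster" "cluster" {
  name = "example-cluster"
}

# 3. Read a value that another team published to a configuration store.
data "aws_ssm_parameter" "cluster_id" {
  name = "/platform/eks/cluster_id"
}

locals {
  cluster_id_from_state = data.terraform_remote_state.infrastructure.outputs.eks_cluster_id
  cluster_endpoint      = data.aws_eks_cluster.cluster.endpoint
  cluster_id_from_store = data.aws_ssm_parameter.cluster_id.value
}
```

Option 1 couples the consumer to the producer's state layout and backend. Options 2 and 3 only require an agreed name or path.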

Slide 15

Slide 15 text

Note: Multiple infrastructure providers or environments will require additional abstraction or automation.

Slide 16

Slide 16 text

02 Delivery Lessons Learned / Challenges

Slide 17

Slide 17 text

Problem: It takes too long to deploy changes.

Slide 18

Slide 18 text

Minimize time from commit to production.

Slide 19

Slide 19 text

terraform.io/plugin

Slide 20

Slide 20 text

Download modules and plugins.
github.com/hashicorp/go-getter

    > terraform init
    Initializing the backend...
    Initializing provider plugins...
    - terraform.io/builtin/terraform is built in to Terraform
    - Reusing previous version of hashicorp/aws from the dependency lock file
    - Installing hashicorp/aws v4.15.0...
    - Installed hashicorp/aws v4.15.0 (signed by HashiCorp)
    - Installing hashicorp/boundary v1.0.6...
    - Installed hashicorp/boundary v1.0.6 (signed by HashiCorp)

Slide 21

Slide 21 text

1. Use internal artifact repository.
2. Cache providers & modules on local filesystem.
   git submodule add
   terraform.io/language/providers/requirements#in-house-providers
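
For providers, both items map onto settings in the Terraform CLI configuration file. The following is a sketch under assumptions: the mirror URL and the example.com hostname are placeholders, and the cache directory must already exist.

```hcl
# Terraform CLI configuration (~/.terraformrc or terraform.rc), not module code.

# Reuse downloaded provider plugins across working directories.
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"

provider_installation {
  # Serve in-house providers from an internal mirror (placeholder URL).
  network_mirror {
    url     = "https://terraform-mirror.example.com/providers/"
    include = ["example.com/*/*"]
  }

  # Fetch everything else from its origin registry.
  direct {
    exclude = ["example.com/*/*"]
  }
}
```

A filesystem_mirror block with a path argument can replace network_mirror when providers are vendored onto the build agent's disk.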

Slide 22

Slide 22 text

Refresh state. Reads information from infrastructure API.

    > terraform apply
    module.hcp.data.aws_region.current: Reading...
    module.hcp.data.aws_region.current: Read complete after 0s [id=us-west-2]
    module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-REDACTED]
    ## omitted for clarity
    Plan: 105 to add, 0 to change, 0 to destroy.

Slide 23

Slide 23 text

Apply changes. Create, read, update, and delete resources with infrastructure API.

    > terraform apply
    module.hcp.data.aws_region.current: Reading...
    module.hcp.data.aws_region.current: Read complete after 0s [id=us-west-2]
    module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-REDACTED]
    ## omitted for clarity
    Plan: 105 to add, 0 to change, 0 to destroy.

Slide 24

Slide 24 text

    > terraform graph

terraform.io/internals/graph

Slide 25

Slide 25 text

1. Enable concurrent operations.
   terraform apply -parallelism=n
2. Tune infrastructure API (rate limiting).
   create, read, update, delete
3. Modularize into fewer resources.
   faster refresh & apply, fewer state locking conflicts

Slide 26

Slide 26 text

Note: Infrastructure usually has a manual approval step.

Slide 27

Slide 27 text

03 Security Lessons Learned / Challenges

Slide 28

Slide 28 text

Problem: Misconfiguration of infrastructure could compromise security.

Slide 29

Slide 29 text

Use infrastructure as code to enforce security.

Slide 30

Slide 30 text

    import "tfplan/v2" as tfplan

    database_only_has_non_permissive_firewall_rules = rule {
      all database_firewall_rules as firewall_rule {
        firewall_rule.values.start_ip_address is not "0.0.0.0" and
        firewall_rule.values.end_ip_address is not "255.255.255.255"
      }
    }

    resources_with_tag_field_have_defined_tags = rule {
      all resources_with_tag_field as resource {
        resource.values.tags is not null
      }
    }

Slide 31

Slide 31 text

1. Standardize tests for static analysis of IaC.
   secure standards & defaults
2. Enforce changes through IaC.
   terraform apply -target, terraform taint
3. Enable dynamic analysis of infrastructure.
   drift detection, automated reconciliation
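
Secure standards and defaults (item 1) can also live inside a module, so that a permissive value fails at plan time before any policy check runs. This sketch mirrors the firewall rule from slide 30. The variable name is hypothetical.

```hcl
# Hypothetical module input; an empty default means "no access until granted".
variable "database_allowed_cidr_blocks" {
  description = "CIDR blocks allowed to reach the database."
  type        = list(string)
  default     = []

  # Reject entries that are not well-formed CIDR blocks.
  validation {
    condition     = alltrue([for cidr in var.database_allowed_cidr_blocks : can(cidrhost(cidr, 0))])
    error_message = "Every entry must be a valid CIDR block."
  }

  # Reject the fully permissive range.
  validation {
    condition     = !contains(var.database_allowed_cidr_blocks, "0.0.0.0/0")
    error_message = "The database must not be open to 0.0.0.0/0."
  }
}
```

Validation inside the module complements an organization-wide policy rather than replacing it: the policy still catches configurations that bypass the module.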

Slide 32

Slide 32 text

Note: Control access to infrastructure. Store secrets outside of IaC.

Slide 33

Slide 33 text

04 Cost Lessons Learned / Challenges

Slide 34

Slide 34 text

Problem: We could be more efficient with our infrastructure.

Slide 35

Slide 35 text

Apply cost management techniques to infrastructure as code.

Slide 36

Slide 36 text

1. Commit changes.
2. Run unit tests.
3. Run integration tests.
4. Estimate cost.
5. Deploy changes.

✓ Security
✓ Cost compliance
test_cpu_size_less_than_or_equal_to_32()

Slide 37

Slide 37 text

Cost Compliance
1. Enforce tags.
   expiration date, standard tagging
2. Implement reboot schedule.
3. Set resource type, size, or reservation.
4. Check autoscaling enabled.
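
Items 1 and 3 can be sketched in configuration. The tag keys, owner value, and list of allowed instance types below are assumptions for illustration, not an organizational standard from the talk.

```hcl
# Hypothetical input: every environment declares when it may be torn down.
variable "expiration_date" {
  description = "Date after which these resources may be removed (YYYY-MM-DD)."
  type        = string

  validation {
    condition     = can(regex("^\\d{4}-\\d{2}-\\d{2}$", var.expiration_date))
    error_message = "The expiration date must use the YYYY-MM-DD format."
  }
}

# Hypothetical input: restrict instances to an approved set of sizes.
variable "instance_type" {
  description = "Instance type, limited to approved sizes."
  type        = string
  default     = "t3.micro"

  validation {
    condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
    error_message = "The instance type must be t3.micro, t3.small, or t3.medium."
  }
}

provider "aws" {
  # Apply standard tags to every taggable resource this provider creates.
  default_tags {
    tags = {
      expiration = var.expiration_date
      owner      = "platform-team"
    }
  }
}
```

A scheduled job could then query resources by the expiration tag and remove or stop those past their date.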

Slide 38

Slide 38 text

Note: Testing in production can eliminate some development environments.

Slide 39

Slide 39 text

Delivery Development Security Cost

Slide 40

Slide 40 text

Thank You
Rosemary Wang
@joatmon08
joatmon08.github.io