Lessons Learned from Scaling Infrastructure as Code

Originally presented at Summer Systems @Scale 2022.

You adopted an infrastructure as code tool like Terraform. What started as one person writing some configuration and deploying new infrastructure scales to everyone in the company writing their own infrastructure configuration and deploying their own systems. In this talk, we’ll share some of the lessons learned across the Terraform community when scaling infrastructure as code practices from one team to an entire company and its users. We’ll cover the patterns and practices that help address challenges of updating infrastructure, managing infrastructure modules, maintaining security, streamlining cost, and even upgrading and migrating tools.

Rosemary Wang

June 29, 2022

Transcript

  1. Copyright © 2020 HashiCorp
    June 29, 2022
    Lessons Learned from Scaling Infrastructure as Code

  2. We started using __________.


    (Insert infrastructure as code tool.)

  3. Other teams started using it too.


    Now we have _____________.


    (Insert problem here.)

  4. Rosemary Wang
    Developer Advocate at HashiCorp
    she/her
    @joatmon08
    joatmon08.github.io

  5. Development, Delivery, Security, Cost

  6. 01
    Development
    Lessons Learned / Challenges

  7. Problem:


    Changes break infrastructure.

  8. Empower every team to responsibly contribute to infrastructure.

  9. Standardize infrastructure resources.
    TERMINAL
    module "boundary" {
      depends_on = [module.vpc]
      source     = "joatmon08/boundary/aws"
      version    = "0.2.0"

      vpc_id             = module.vpc.vpc_id
      vpc_cidr_block     = module.vpc.vpc_cidr_block
      public_subnet_ids  = module.vpc.public_subnets
      private_subnet_ids = module.vpc.database_subnets

      name          = var.name
      key_pair_name = var.key_pair_name

      allow_cidr_blocks_to_workers = var.client_cidr_block
      allow_cidr_blocks_to_api     = ["0.0.0.0/0"]

      boundary_db_password = random_password.boundary_database.result
    }

  10. Infrastructure Modules


    1. Set opinionated defaults.


    2. Allow collaboration (pull requests).


    3. Apply a testing strategy.

    unit, contract, & integration tests


    4. Version modules.
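
The first and last points can be sketched in Terraform itself. This is a minimal illustration and not taken from the talk: the module path, the variable names, and the registry address `example-org/network/aws` are all hypothetical.

```hcl
# modules/network/variables.tf (hypothetical module)

# Opinionated default: flow logs are on unless a team opts out.
variable "enable_flow_logs" {
  description = "Whether to enable VPC flow logs."
  type        = bool
  default     = true
}

# Opinionated default plus a guardrail on what callers may pass in.
variable "cidr_block" {
  description = "IPv4 CIDR range for the network."
  type        = string
  default     = "10.0.0.0/16"

  validation {
    condition     = can(cidrhost(var.cidr_block, 0))
    error_message = "The cidr_block value must be a valid IPv4 CIDR range."
  }
}

# Caller configuration: pin the module to a version range so that
# module releases do not change infrastructure unexpectedly.
module "network" {
  source  = "example-org/network/aws" # hypothetical registry address
  version = "~> 1.2"                  # any 1.x release at or above 1.2

  cidr_block = "10.1.0.0/16"
}
```

With a constraint like `~> 1.2`, a consuming team picks up minor releases on its next `terraform init -upgrade` but has to opt in to a new major version.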

  11. Infrastructure Configuration


    1. Minimize blast radius of changes.


    2. Use immutability to roll forward.

    terraform apply -target, terraform taint


    3. Use version control when possible.
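
One way to express "roll forward" in configuration, as a rough sketch: replace a resource with a new one instead of changing it in place. The variable name `ami_id` and the instance size are assumptions for illustration. Note that since Terraform v0.15.2, `terraform taint` is deprecated in favor of `terraform apply -replace=ADDRESS`.

```hcl
variable "ami_id" {
  description = "Machine image to run (hypothetical input)."
  type        = string
}

resource "aws_instance" "app" {
  ami           = var.ami_id # changing the image forces a replacement
  instance_type = "t3.micro"

  lifecycle {
    # Build the replacement before destroying the old instance,
    # so a new image rolls forward instead of mutating in place.
    create_before_destroy = true
  }
}

# To force a replacement of only this resource without a configuration change:
#   terraform apply -replace="aws_instance.app"
```

Scoping a forced replacement to a single resource address is also a way to keep the blast radius of the change small.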

  12. Note:

    If you use a monorepo, make sure your build tool can handle it.

  13. Decouple dependencies.
    TERMINAL
    data "aws_eks_cluster" "cluster" {
      name = var.aws_eks_cluster_id == "" ? data.terraform_remote_state.infrastructure.outputs.eks_cluster_id : var.aws_eks_cluster_id
    }

    data "aws_eks_cluster_auth" "cluster" {
      name = var.aws_eks_cluster_id == "" ? data.terraform_remote_state.infrastructure.outputs.eks_cluster_id : var.aws_eks_cluster_id
    }

    provider "kubernetes" {
      host                   = data.aws_eks_cluster.cluster.endpoint
      cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
      token                  = data.aws_eks_cluster_auth.cluster.token

      experiments {
        manifest_resource = true
      }
    }

  14. Dependency Injection


    1. Reference outputs stored in state


    2. Use the infrastructure API


    3. Store and reference from configuration
    manager
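
The three options above could look roughly like this side by side. This is a sketch under stated assumptions: the S3 bucket, state key, cluster name variable, and parameter path are hypothetical, and AWS Systems Manager Parameter Store stands in for whichever configuration manager is actually used.

```hcl
# 1. Reference outputs stored in state.
data "terraform_remote_state" "infrastructure" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"          # hypothetical bucket
    key    = "infrastructure/terraform.tfstate" # hypothetical key
    region = "us-west-2"
  }
}

# 2. Use the infrastructure API: look the cluster up by a known name.
variable "cluster_name" {
  description = "Name of the existing EKS cluster (hypothetical input)."
  type        = string
}

data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

# 3. Store and reference from a configuration manager.
data "aws_ssm_parameter" "eks_cluster_id" {
  name = "/platform/eks/cluster_id" # hypothetical parameter path
}

locals {
  cluster_id_from_state  = data.terraform_remote_state.infrastructure.outputs.eks_cluster_id
  cluster_endpoint       = data.aws_eks_cluster.cluster.endpoint
  cluster_id_from_config = data.aws_ssm_parameter.eks_cluster_id.value
}
```

Option 1 couples the consumer to the producer's state backend and output names. Options 2 and 3 only require an agreed name or path.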

  15. Note:

    Multiple infrastructure providers or environments will require additional abstraction or automation.

  16. 02
    Delivery
    Lessons Learned / Challenges

  17. Problem:

    It takes too long to deploy changes.

  18. Minimize time from commit to production.

  19. terraform.io/plugin

  20. Download modules and plugins.

    github.com/hashicorp/go-getter
    TERMINAL
    > terraform init

    Initializing the backend...

    Initializing provider plugins...
    - terraform.io/builtin/terraform is built in to Terraform
    - Reusing previous version of hashicorp/aws from the dependency lock file
    - Installing hashicorp/aws v4.15.0...
    - Installed hashicorp/aws v4.15.0 (signed by HashiCorp)
    - Installing hashicorp/boundary v1.0.6...
    - Installed hashicorp/boundary v1.0.6 (signed by HashiCorp)

  21. 1. Use internal artifact repository.


    2. Cache providers & modules on local filesystem.

    git submodule add
    terraform.io/language/providers/requirements#in-house-providers
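
For providers, both points map onto settings in the Terraform CLI configuration file. The following is a minimal sketch, assuming in-house providers are published under a hypothetical `example.com` hostname and mirrored to a local directory. The paths are illustrative.

```hcl
# ~/.terraformrc (Terraform CLI configuration file)

# Reuse downloaded provider plugins across working directories.
# The directory must already exist; Terraform does not create it.
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"

provider_installation {
  # Install in-house providers from a local mirror directory.
  filesystem_mirror {
    path    = "/usr/share/terraform/providers" # illustrative path
    include = ["example.com/*/*"]              # hypothetical hostname
  }

  # Everything else still comes from its origin registry.
  direct {
    exclude = ["example.com/*/*"]
  }
}
```

A `network_mirror` block with a `url` argument can take the place of `filesystem_mirror` when the internal artifact repository serves providers over HTTPS.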

  22. Refresh state.

    Reads information from infrastructure API.
    CODE EDITOR
    > terraform apply
    module.hcp.data.aws_region.current: Reading...
    module.hcp.data.aws_region.current: Read complete after 0s [id=us-west-2]
    module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-REDACTED]
    ## omitted for clarity
    Plan: 105 to add, 0 to change, 0 to destroy.

  23. Apply changes.

    Create, read, update, and delete resources with infrastructure API.
    CODE EDITOR
    > terraform apply
    module.hcp.data.aws_region.current: Reading...
    module.hcp.data.aws_region.current: Read complete after 0s [id=us-west-2]
    module.vpc.aws_eip.nat[0]: Refreshing state... [id=eipalloc-REDACTED]
    ## omitted for clarity
    Plan: 105 to add, 0 to change, 0 to destroy.

  24. > terraform graph
    terraform.io/internals/graph

  25. 1. Enable concurrent operations.

    terraform apply -parallelism=n


    2. Tune infrastructure API (rate limiting).

    create, read, update, delete


    3. Modularize into fewer resources.

    faster refresh & apply, fewer state locking conflicts

  26. Note:

    Infrastructure usually has a manual approval step.

  27. 03
    Security
    Lessons Learned / Challenges

  28. Problem:

    Misconfiguration of infrastructure could compromise security.

  29. Use infrastructure as code to enforce security.

  30. CODE EDITOR
    import "tfplan/v2" as tfplan

    database_only_has_non_permissive_firewall_rules = rule {
      all database_firewall_rules as firewall_rule {
        firewall_rule.values.start_ip_address is not "0.0.0.0" and
        firewall_rule.values.end_ip_address is not "255.255.255.255"
      }
    }

    resources_with_tag_field_have_defined_tags = rule {
      all resources_with_tag_field as resource {
        resource.values.tags is not null
      }
    }

  31. 1. Standardize tests for static analysis of IaC.

    secure standards & defaults


    2. Enforce changes through IaC.

    terraform apply -target, terraform taint


    3. Enable dynamic analysis of infrastructure.

    drift detection, automated reconciliation

  32. Note:


    Control access to infrastructure.


    Store secrets outside of IaC.
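
As one possible reading of "store secrets outside of IaC", the configuration can read a secret at plan time instead of hard-coding it. This sketch assumes AWS Secrets Manager as the secret store and a hypothetical secret name. Any secrets manager with a Terraform data source would follow the same shape.

```hcl
# Option A: accept the secret as an input and keep it out of CLI output.
variable "database_password" {
  description = "Database password supplied at run time, never committed."
  type        = string
  sensitive   = true
}

# Option B: read the secret from a secrets manager.
data "aws_secretsmanager_secret_version" "database" {
  secret_id = "example/boundary/database-password" # hypothetical secret name
}

locals {
  database_password = data.aws_secretsmanager_secret_version.database.secret_string
}
```

Either way, the value still ends up in Terraform state, so access to the state backend needs the same controls as access to the infrastructure.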

  33. 04
    Cost
    Lessons Learned / Challenges

  34. Problem:

    We could be more efficient with our infrastructure.

  35. Apply cost management techniques to infrastructure as code.

  36. Commit changes.
    Run unit tests.
    Run integration tests.
    Estimate cost.
    Deploy changes.

    ✓ Security
    ✓ Cost compliance
    test_cpu_size_less_than_or_equal_to_32()

  37. Cost Compliance


    1. Enforce tags.

    expiration date, standard tagging


    2. Implement reboot schedule.


    3. Set resource type, size, or reservation.


    4. Check autoscaling enabled.
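
Points 1 and 3 can be partly enforced in the configuration itself, before any policy check runs. A minimal sketch, assuming the AWS provider: the tag keys, variable names, and list of approved instance types are hypothetical.

```hcl
variable "owner" {
  description = "Team responsible for the resources."
  type        = string
}

variable "expiration_date" {
  description = "Date after which the resources may be removed."
  type        = string
}

# 1. Enforce tags: every taggable AWS resource inherits these.
provider "aws" {
  region = "us-west-2"

  default_tags {
    tags = {
      owner      = var.owner
      expiration = var.expiration_date
    }
  }
}

# 3. Set resource type and size: reject sizes outside the approved list.
variable "instance_type" {
  description = "EC2 instance type for the workload."
  type        = string
  default     = "t3.medium"

  validation {
    condition     = contains(["t3.micro", "t3.small", "t3.medium"], var.instance_type)
    error_message = "The instance_type value must be one of the approved sizes."
  }
}
```

A validation failure stops `terraform plan`, so an oversized request is caught before the cost estimation step in the pipeline.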

  38. Note:

    Testing in production can eliminate some development environments.

  39. Development, Delivery, Security, Cost

  40. Thank You


    Rosemary Wang


    @joatmon08


    joatmon08.github.io
