Testing Your Cloud Infrastructure (as Code)

Presented at Odyssey 2021.

You update your infrastructure modules or configurations and quickly push production changes. Meanwhile, you’re secretly hoping they all work but dreading the alert that something has gone wrong. Is there a way to alleviate some of the dread and catch errors before production?

In this talk, Rosemary will apply the test pyramid to testing cloud infrastructure and show how some types of tests can flag potential misconfigurations in infrastructure as code before you push to production. She'll also compare the financial and time costs of running each test type. By the end of the session, you’ll leave with a better understanding of when to write tests for your cloud infrastructure, how to manage the cost of running tests, and which tests can benefit your changes.

Rosemary Wang

March 31, 2021

Transcript

  1. Copyright © 2020 HashiCorp Testing your Cloud Infrastructure (as Code)

    Rosemary Wang (@joatmon08) OdysseyConf | March 31, 2021
  2. Migrate to public cloud infrastructure. @JOATMON08

  3. Infrastructure as Code: Declare the configuration you want and store it in source control.

    Immutable: Create a new resource for every update, delete the old resource.
    On-Demand: Create and delete resources when you need them (or don’t).
  4. We wrote some tests for our cloud infrastructure.

    Testing environments cost us 70% of our cloud provider bill.
    We pushed a change that affected production.
    We deactivated tests and deleted the testing environments.
  5. Rosemary Wang (She/Her), Developer Advocate at HashiCorp. @joatmon08, joatmon08.github.io

  6. Infrastructure tests to catch problems before production.

    Cost of environments for running tests.
  7. Test pyramid, from base to top: Unit Tests, Contract Tests, Integration Tests, End-to-End Tests.

    Axis: Cost* (Time, $$$)
  8. Unit Tests

  9. Unit Tests: Test configuration or state metadata

    ▪ Lint and check syntax
    ▪ Any testing framework or language that parses data formats
      – Programming languages
      – Behavior-driven development (BDD)
      – Other tools (e.g., CloudFormation linter, HashiCorp Sentinel, terraform-compliance, Helm unittest plugin)
  10. You write some configuration. (INFRASTRUCTURE AS CODE CONFIGURATION)

    Tool uses configuration to generate changes to infrastructure state. (LIST OF CHANGES TO INFRASTRUCTURE STATE)
    Unit tests parse configuration or list of changes.
  11. Unit Tests: Cost assessment

    ▪ Worth writing a few…
      – Communicate expectations, including security
      – No infrastructure required
    ▪ Do not…
      – Work offline (i.e., retrieve remote state)
      – Prevent system failure
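The unit tests described above parse configuration or a list of planned changes without creating any infrastructure. A minimal sketch in Python, assuming the plan has been exported as JSON shaped like the output of `terraform show -json` (a `resource_changes` list whose entries carry `type`, `address`, and `change`). The sample plan, the `planned_creates` helper, and the tag expectation are hypothetical illustrations, not taken from the talk:

```python
import json

# Hypothetical, hand-written sample; a real plan would be read from a file
# exported by the infrastructure as code tool.
SAMPLE_PLAN_JSON = """
{
  "resource_changes": [
    {
      "address": "aws_instance.server",
      "type": "aws_instance",
      "change": {
        "actions": ["create"],
        "after": {"instance_type": "t2.micro", "tags": {"Environment": "testing"}}
      }
    }
  ]
}
"""


def planned_creates(plan, resource_type):
    """Return the planned attributes of resources of this type that will be created."""
    return [
        rc["change"]["after"]
        for rc in plan.get("resource_changes", [])
        if rc["type"] == resource_type and "create" in rc["change"]["actions"]
    ]


def test_plan_creates_one_tagged_instance():
    plan = json.loads(SAMPLE_PLAN_JSON)
    servers = planned_creates(plan, "aws_instance")
    assert len(servers) == 1
    assert servers[0]["tags"]["Environment"] == "testing"
```

Because the test only reads JSON, it can run with pytest on a laptop or in a pipeline step before anything is applied.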
  12. Contract Tests

  13. Contract Tests: Test configuration inputs and outputs

    ▪ Validate inputs or outputs
      – Password requirements
      – Special values (e.g., port ranges, constants)
    ▪ Public cloud APIs constantly change
    ▪ Check infrastructure dependencies
    ▪ Any testing framework or language for validation
  14. CODE EDITOR

        variable "listener_rule_priority" {
          type        = number
          default     = 1
          description = "Priority of listener rule between 1 and 50000"

          validation {
            condition     = var.listener_rule_priority > 0 && var.listener_rule_priority <= 50000
            error_message = "The priority of listener rule must be between 1 and 50000."
          }
        }
  15. Contract Tests: Cost assessment

    ▪ Worth writing a few…
      – (Usually) no infrastructure required
      – Fail fast for invalid input values
    ▪ Do not…
      – Work offline (i.e., retrieve remote state)
      – Prevent system failure
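A contract test can fail fast on invalid inputs before any infrastructure exists. A minimal sketch in Python, assuming a hypothetical `validate_module_inputs` helper and input names (`listener_rule_priority`, `port`) chosen for illustration. The 1 to 50000 range follows the listener rule description on the earlier slide; the port range is the standard TCP range:

```python
def validate_module_inputs(inputs):
    """Return a list of human-readable contract violations (empty if valid)."""
    errors = []

    priority = inputs.get("listener_rule_priority")
    if not isinstance(priority, int) or not 1 <= priority <= 50000:
        errors.append("listener_rule_priority must be an integer between 1 and 50000")

    port = inputs.get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        errors.append("port must be an integer between 1 and 65535")

    return errors


def test_valid_inputs_satisfy_contract():
    assert validate_module_inputs({"listener_rule_priority": 1, "port": 443}) == []


def test_out_of_range_inputs_are_rejected():
    errors = validate_module_inputs({"listener_rule_priority": 0, "port": 70000})
    assert len(errors) == 2
```

Returning every violation at once, rather than raising on the first, lets one test run report all the broken inputs.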
  16. Integration Tests

  17. Integration Tests: Test if the configuration creates infrastructure

    ▪ Can you even apply the changes to infrastructure?
    ▪ Verify dependencies with real infrastructure
    ▪ Any testing framework or language with setup, test, teardown
      – Integration testing frameworks (e.g., TaskCat, kitchen, terratest, goss, Inspec)
      – Programming language
      – Add to delivery pipeline
  18. You push some configuration to source control. (INFRASTRUCTURE AS CODE CONFIGURATION)

    Pipeline sets up resources in testing environment.
    Pipeline tears down resources in testing environment.
    (INTEGRATION TESTS)
  19. CODE EDITOR

        import os

        import pytest

        # generate_json, test_utils, NodeState, and the upper-case constants
        # are defined elsewhere in the presenter's project.


        @pytest.fixture(scope='session')
        def apply_changes():
            # SETUP
            generate_json(TEST_SERVER_NAME)
            assert os.path.exists(SERVER_CONFIGURATION_FILE)
            assert test_utils.initialize() == 0
            yield test_utils.apply()
            # TEARDOWN
            assert test_utils.destroy() == 0
            os.remove(SERVER_CONFIGURATION_FILE)


        # TEST
        def test_changes_should_add_1_resource(apply_changes):
            output = apply_changes[1].decode(encoding='utf-8').split('\n')
            assert 'Apply complete! Resources: 1 added, 0 changed, 0 destroyed' \
                in output[-2]


        def test_server_is_in_running_state(apply_changes):
            gcp_server = test_utils.get_server(TEST_SERVER_NAME)
            assert gcp_server.state == NodeState.RUNNING
  20. Integration Tests: Cost assessment

    ▪ Worth writing…
      – Test dependencies
      – Build confidence in “successful” changes
    ▪ Do not…
      – Eliminate cost of resources
      – Speed up feedback loop
  21. End-to-End Tests

  22. End-to-End Tests: Test if infrastructure supports the workflows

    ▪ Can you create or access resources?
    ▪ Verify end-to-end functionality
    ▪ Run after you apply changes in testing / production
    ▪ Includes smoke tests
    ▪ Any testing framework or language with enough access to create or read resources
  23. CODE EDITOR

        $ kitchen test
        -----> Starting Kitchen (v2.3.3)
        …
        Profile: End-to-End Tests for Application (default)

          ✔  db: Database: check routing from public to private subnet
             ✔  Host 10.128.0.43 port 27017 proto tcp should be reachable
             ✔  Host 10.128.0.43 port 27017 proto tcp should be resolvable
             ✔  Host 10.128.0.43 port 80 proto tcp should not be reachable
          ✔  outbound: Public Subnet: check routing out to public internet
             ✔  HTTP GET on https://hashicorp.com status should cmp == 301

        Profile Summary: 2 successful controls, 0 control failures, 0 controls skipped
        Test Summary: 3 successful, 0 failures, 0 skipped
  24. End-to-End Tests: Cost assessment

    ▪ Worth writing…
      – Test functionality
      – Check system works before “release”
    ▪ Do not…
      – Eliminate cost of resources
      – Speed up feedback loop
  25. Look at our cloud bill!

  26. Test pyramid, from base to top: Unit Tests, Contract Tests, Integration Tests, End-to-End Tests.

    Axis: Cost* (Time, $$$)
    Annotations: REDUCE ERRORS IN CONFIGURATION; RUN ON EXISTING INFRASTRUCTURE
  27. Use Cheaper Resources: ✅ Smaller size ✅ Shorter lifecycle ❌ Not always accurate ❌ Drift between testing & production

    Use Infrastructure API Mocks: ✅ No infrastructure ✅ Can run offline ❌ Dependencies? ❌ Drift between mock & actual APIs
    Delete Long-Lived Environments: ✅ Elasticity ✅ Shorter lifecycle ❌ Confidence? ❌ Time spent creating environments
  28. How much is really from testing?

    Use resource tagging to answer this question.
  29. Resource Tagging: Identify why you have them in the first place

    ▪ Environment (testing/production)
    ▪ Test Type (integration/end-to-end)
    ▪ Repository (joatmon08/terraform-aws-listenerrule-nia)
    ▪ Teardown (true/false)
  30. AWS Listener Rule Integration Tests (02/2021)

    $15.12   1 Application Load Balancer (us-east-1, 672 hours)
    $0.00    1 Elastic IP (us-east-1)
    $0.82    1 EC2 Instance (us-east-1, t2.micro, 72 hours)
    $15.94   Total (us-east-1)
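Once resources are tagged, the bill can be grouped by tag to see how much comes from testing. A minimal sketch in Python that reuses the three costs from the slide; the `cost_by_tag` helper, the `TestType` and `Environment` tag keys, and the tag values attached to each line item are assumptions for illustration, and a real report would come from the cloud provider's billing export:

```python
from collections import defaultdict
from decimal import Decimal

# Costs are from the slide; the tags on each line item are assumed.
TAGS = {"Environment": "testing", "TestType": "integration"}
LINE_ITEMS = [
    {"resource": "Application Load Balancer", "cost": Decimal("15.12"), "tags": TAGS},
    {"resource": "Elastic IP", "cost": Decimal("0.00"), "tags": TAGS},
    {"resource": "EC2 Instance", "cost": Decimal("0.82"), "tags": TAGS},
]


def cost_by_tag(line_items, tag_key):
    """Sum cost per value of tag_key; items without the tag count as 'untagged'."""
    totals = defaultdict(Decimal)
    for item in line_items:
        totals[item["tags"].get(tag_key, "untagged")] += item["cost"]
    return dict(totals)


if __name__ == "__main__":
    for value, total in cost_by_tag(LINE_ITEMS, "TestType").items():
        print(f"{value}: ${total}")
```

`Decimal` is used instead of `float` so that currency sums stay exact.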
  31. Solution: Use elasticity!

    Create and delete resources before and after testing.
  32. Problem: Assumes immutability.

    You might make some changes in-place.
  33. Reality: Some long-lived resources.

    e.g., networking, databases, Kubernetes control planes
  34. Time to Change: Shorter = Setup & Teardown

    Number of Dependencies: Fewer = Setup & Teardown
    Frequency of Change: Less = Setup & Teardown
    Statefulness: Less = Setup & Teardown
  35. We write some tests for our cloud infrastructure.

    We assess the cost based on tagging.
    We push the change to production.
    We can replace the long-lived resources for this set of tests.
  36. References: Infrastructure testing is a heuristic.

    ▪ github.com/joatmon08/tdd-infrastructure
    ▪ puppet.com/blog/hitchhikers-guide-to-testing-infrastructure-as-and-code
    ▪ github.com/joatmon08/terraform-aws-listenerrule-nia
    ▪ hashicorp.com/resources/testing-your-hcl-modules-in-terraform
  37. Thank you! Rosemary Wang @joatmon08 joatmon08.github.io