
Test in Production: Infrastructure Edition

Rosemary Wang
November 04, 2019

Originally presented at Test in Production (Berlin) MeetUp on November 4, 2019.

How do we feature toggle, canary test, and A/B test on production infrastructure? In this talk, we'll give a brief example of each approach.

CORRECTIONS

- The initial example in the talk only briefly showed a feature toggle for the network. The updated example frames more clearly what the feature toggle looks like.
- The updated canary example explicitly calls out a "green" VPC, as denoted in the diagram.

Transcript

  1. Copyright © 2019 HashiCorp. Test in Production: Infrastructure Edition. Test in Production Berlin | Nov. 4, 2019. Rosemary Wang | @joatmon08
  2. Network engineers live dangerously.

  3. My users = developers
    ▪ Push and deliver code at any time
    ▪ Availability of application depends on system
  4. How do we change infrastructure without impacting applications?*

  5. * While not spending extra money on staging.

  6. Approaches
    ▪ Shift-left testing (e.g., staging)
    ▪ Feature Toggles
    ▪ Canary Testing
    ▪ A/B Testing
  7. Feature Toggles
    ▪ Preserve state, if possible
    ▪ Inject with roll forward mindset
    ▪ Don’t write toggles at the start
  8. Feature Toggles

        resource "aws_instance" "example_bionic" {
          count                  = var.enable_new_ami ? 1 : 0
          instance_type          = "t2.micro"
          ami                    = data.aws_ami.ubuntu_bionic.id
          vpc_security_group_ids = [aws_security_group.instances.1.id]
          subnet_id              = aws_subnet.public.1.id
          tags = {
            Terraform  = "true"
            Owner      = var.owner
            Has_Toggle = var.enable_new_ami
          }
        }
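
The toggle on slide 8 reads `var.enable_new_ami`, which the slide itself does not declare. A minimal sketch of such a declaration, assuming a boolean that defaults to off so the new instance only exists once the toggle is flipped (the description text and the default are assumptions, not taken from the talk):

```hcl
# Hypothetical declaration for the toggle referenced on slide 8.
# With the default of false, count evaluates to 0 and no instance is created.
variable "enable_new_ami" {
  type        = bool
  description = "Feature toggle: create the instance built from the new AMI."
  default     = false
}
```

Flipping it for a single run could look like `terraform apply -var="enable_new_ami=true"`. Setting it back to `false` makes `count` evaluate to 0, so the next apply removes the instance.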
  9. Canary Testing
    ▪ Smoke test before release
    ▪ Easier with container architectures
      – e.g., VM images for Kubernetes worker nodes
  10. Canary Testing

        resource "aws_instance" "canary" {
          count                  = var.enable_new_network ? 1 : 0
          instance_type          = "t2.micro"
          ami                    = data.aws_ami.ubuntu.id
          vpc_security_group_ids = [aws_security_group.instances_green.id]
          subnet_id              = aws_subnet.public_green.id
          tags = {
            Name  = "${var.prefix}-canary"
            Owner = var.owner
          }
        }
  11. Diagram: VPC (blue) 10.128.0.0/24 and VPC (green) 10.128.0.0/28, showing APP instances, a Kitchen instance, and a canary asking "Can I connect?"
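
To run a "Can I connect?" check, a smoke test needs the canary's address. A small sketch, assuming the `aws_instance.canary` resource from slide 10; the output name `canary_private_ip` is hypothetical:

```hcl
# Hypothetical output exposing the canary's private IP for a smoke test.
# The resource uses count, so it is a list with zero or one element.
# join() yields an empty string when the toggle is off and the list is empty.
output "canary_private_ip" {
  description = "Private IP of the canary instance, or empty if it is disabled."
  value       = join("", aws_instance.canary[*].private_ip)
}
```

A test script could then read the value with `terraform output canary_private_ip` and skip the connectivity check when it comes back empty.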
  12. Diagram: Kubernetes control plane with two node groups, one on an insecure OS and one on a secure OS, running INTERNAL and EXTERNAL workloads. Command shown: kubectl taint nodes external=true:NoExecute
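
A `NoExecute` taint evicts any running pod that does not tolerate it. One reading of the diagram on slide 12 is that only external workloads should stay on the tainted nodes; the talk transcript does not confirm this. Under that assumption, an external workload would carry a matching toleration. A sketch using the Terraform Kubernetes provider, where the pod name and image are placeholders:

```hcl
# Hypothetical pod that tolerates the external=true:NoExecute taint.
# The name and image are placeholders, not taken from the talk.
resource "kubernetes_pod" "external_app" {
  metadata {
    name = "external-app"
  }

  spec {
    container {
      name  = "app"
      image = "nginx:1.17"
    }

    # Matches the taint key, value, and effect shown on the slide.
    toleration {
      key      = "external"
      operator = "Equal"
      value    = "true"
      effect   = "NoExecute"
    }
  }
}
```

A toleration only permits scheduling onto the tainted nodes. It does not force the pod there, so pinning a workload to a specific node group would also need a node selector or affinity rule.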
  13. A/B Testing
    ▪ Infrastructure that affects upstream Service Level Objectives
    ▪ Hypotheses:
      – Does X batch process more quickly than Y?
      – Does X cost more than Y?
  14. Diagram: Kafka + FaaS “Data Lake” versus Kafka + Spark “Data Lake”, each fed by applications. Does Spark + Kafka architecture process faster with lower cost?
  15. Conclusions
    ▪ Test in production organizes infrastructure blast radius
    ▪ Risk mitigation over risk aversion
    ▪ “Infrastructure-as-Code” is heuristic
  16. github.com/joatmon08/test-in-production-for-infrastructure. Rosemary Wang (she/her), Developer Advocate at HashiCorp. @joatmon08 | joatmon08 | linkedin.com/in/rosemarywang/