Chaos Monkey for Fun and Profit

The core idea of Chaos Engineering is to inject failures proactively in a controlled manner in order to gain confidence in our systems. Automating fault injection with tools like Chaos Monkey represents one of the advanced principles of Chaos Engineering. In this talk, Mathias is going to show you how to run your very own Chaos Monkey with Docker, and how to use it for both automated and manual fault injection.

(Talk given at Chaos Engineering Hamburg meetup: http://www.meetup.com/Chaos-Engineering-Hamburg/events/231567152/)

Mathias Lafeldt

June 15, 2016

Transcript

  1. Chaos Monkey for Fun and Profit

  2. Hello, I'm Mathias Lafeldt. @mlafeldt

  3. Chaos Engineering 101
    • Trigger failures before they happen in production
    • Gain confidence that our systems can withstand failures
    • Verify that things behave as we expect
    • Fix them if they don't
    • Netflix: http://principlesofchaos.org/
  4. I work at Jimdo. We ❤ Chaos Engineering

  5. GameDays at Jimdo
    1. Gather the team in front of a big screen
    2. Think up failure modes and estimate expected impact
    3. Go through chaos experiments together
    4. Write down measured impact
    5. Create follow-up ticket for each flaw
  6. You don't need to automate chaos experiments to run continuously

  7. But: Automating experiments may further increase confidence in your systems
  8. 8

  9. Chaos Monkey
    • Most famous member of Netflix's Simian Army
    • Randomly terminates EC2 instances during business hours
    • Goal: survive terminations without customer impact
    • Change frequency, probability, and type of terminations
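
The "during business hours" rule can be sketched in a few lines of shell. This is only an illustration, not how Chaos Monkey is implemented: the helper name `is_monkey_time` and the 09:00 to 15:00 weekday window are assumptions, so adjust them to whatever your own calendar settings say.

```shell
# Hypothetical helper (not part of Chaos Monkey): exits 0 when the monkey
# may act. Arguments: hour of day (0-23) and ISO weekday (1 = Monday,
# 7 = Sunday). OPEN_HOUR and CLOSE_HOUR are assumed values.
OPEN_HOUR=${OPEN_HOUR:-9}
CLOSE_HOUR=${CLOSE_HOUR:-15}

is_monkey_time() {
  hour=${1#0}        # strip one leading zero, e.g. "09" -> "9"
  hour=${hour:-0}    # a bare "0" would otherwise become empty
  weekday=$2
  [ "$weekday" -ge 1 ] && [ "$weekday" -le 5 ] &&
    [ "$hour" -ge "$OPEN_HOUR" ] && [ "$hour" -lt "$CLOSE_HOUR" ]
}

# Check the current time (date +%u prints the ISO weekday).
if is_monkey_time "$(date +%H)" "$(date +%u)"; then
  echo "monkey time"
else
  echo "outside business hours"
fi
```

Restricting terminations to a window like this means engineers are at their desks when something breaks.
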
  10. Quick start
    $ git clone https://github.com/Netflix/SimianArmy .
    $ ./gradlew build   # Java 8
    $ vim src/main/resources/{client,simianarmy,chaos}.properties
    $ ./gradlew jettyRun
  11. Configuration
    $ grep -c simianarmy src/main/resources/*.properties
    src/main/resources/chaos.properties:40
    src/main/resources/client.properties:18
    src/main/resources/conformity.properties:27
    src/main/resources/janitor.properties:55
    src/main/resources/log4j.properties:0
    src/main/resources/simianarmy.properties:13
    src/main/resources/volumeTagging.properties:4
  12. docker pull mlafeldt/simianarmy
    github.com/mlafeldt/docker-simianarmy

  13. Unleash the monkey!
    $ docker run -it --rm \
        -e SIMIANARMY_CLIENT_AWS_ACCOUNTKEY=$AWS_ACCESS_KEY_ID \
        -e SIMIANARMY_CLIENT_AWS_SECRETKEY=$AWS_SECRET_ACCESS_KEY \
        -e SIMIANARMY_CLIENT_AWS_REGION=$AWS_REGION \
        -e SIMIANARMY_CALENDAR_ISMONKEYTIME=true \
        -e SIMIANARMY_CHAOS_ASG_ENABLED=true \
        -e SIMIANARMY_CHAOS_LEASHED=false \
        mlafeldt/simianarmy
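
The variable names on this slide appear to follow a simple pattern: take the property name from the `.properties` files, replace dots with underscores, and uppercase the result. The sketch below encodes that inferred pattern. `prop_to_env` is a hypothetical helper, and the mapping is an assumption drawn from the names shown here, so check the docker-simianarmy README before relying on it.

```shell
# Hypothetical helper: derive the environment variable name for a
# Simian Army property (assumed mapping: dots -> underscores, uppercase).
prop_to_env() {
  printf '%s\n' "$1" | tr '.' '_' | tr '[:lower:]' '[:upper:]'
}

prop_to_env simianarmy.chaos.leashed          # prints SIMIANARMY_CHAOS_LEASHED
prop_to_env simianarmy.calendar.isMonkeyTime  # prints SIMIANARMY_CALENDAR_ISMONKEYTIME
```
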
  14. Configuration via etcd
    $ export ETCDCTL_ENDPOINT=http://$YOUR_ETCD_IP:2379
    $ etcdctl set /simianarmy/client/aws/accountkey $AWS_ACCESS_KEY_ID
    $ etcdctl set /simianarmy/client/aws/secretkey $AWS_SECRET_KEY
    $ etcdctl set ...
    $ docker run -it --rm \
        -e CONFD_OPTS="-backend=etcd -node=$ETCDCTL_ENDPOINT" \
        mlafeldt/simianarmy
    # More confd backends: Consul, Vault, DynamoDB, etc.
  15. Chaos Monkey REST API
    /simianarmy/api/v1/chaos
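
To make the endpoint concrete, here is a rough sketch of building a request body for an on-demand termination. The field names (`eventType`, `groupType`, `groupName`, `chaosType`) are as I recall them from the Simian Army wiki, so treat them as an assumption and verify against the version you run. `chaos_payload` is a hypothetical helper.

```shell
# Hypothetical helper: print the JSON body for an on-demand termination.
# $1 = auto scaling group name, $2 = optional chaos strategy.
# Values are inserted verbatim, so they must not contain quotes or
# backslashes. Field names are assumed; check the Simian Army docs.
chaos_payload() {
  group=$1
  strategy=${2:-}
  if [ -n "$strategy" ]; then
    printf '{"eventType":"CHAOS_TERMINATION","groupType":"ASG","groupName":"%s","chaosType":"%s"}\n' \
      "$group" "$strategy"
  else
    printf '{"eventType":"CHAOS_TERMINATION","groupType":"ASG","groupName":"%s"}\n' "$group"
  fi
}

chaos_payload ExampleAutoScalingGroup ShutdownInstance
```

The output could then be piped to something like `curl -s -X POST -d @- http://example.com:8080/simianarmy/api/v1/chaos`, where the host is a placeholder.
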

  16. On-demand termination during GameDay events or whenever you feel like it
  17. Go client
    github.com/mlafeldt/chaosmonkey

  18. Installation
    $ go get -u github.com/mlafeldt/chaosmonkey
    $ chaosmonkey --version
    chaosmonkey v0.3.0 darwin/amd64 go1.6.2
  19. Trigger a chaos event
    $ chaosmonkey -endpoint http://example.com:8080 \
        -group ExampleAutoScalingGroup \
        -strategy ShutdownInstance
  20. Get past chaos events
    $ chaosmonkey -endpoint http://example.com:8080
    InstanceID  AutoScalingGroupName     Region     Strategy          TriggeredAt
    i-741a78f8  ExampleAutoScalingGroup  eu-west-1  DetachVolumes     2016-06-13T14:17:36Z
    i-c538184f  ExampleAutoScalingGroup  eu-west-1  ShutdownInstance  2016-06-13T14:11:18Z
    i-615272eb  ExampleAutoScalingGroup  eu-west-1  ShutdownInstance  2016-06-13T13:48:33Z
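
Because the event list is plain whitespace-separated text, it is easy to summarize after a GameDay. A minimal sketch, assuming the five-column layout and header row shown on this slide (`count_by_strategy` is a hypothetical helper, not part of the client):

```shell
# Hypothetical helper: count past chaos events per strategy.
# Reads the event table on stdin, skips the header row, and assumes
# the strategy is the fourth whitespace-separated column.
count_by_strategy() {
  awk 'NR > 1 && NF >= 5 { count[$4]++ }
       END { for (s in count) print s, count[s] }' | sort
}

count_by_strategy <<'EOF'
InstanceID AutoScalingGroupName Region Strategy TriggeredAt
i-741a78f8 ExampleAutoScalingGroup eu-west-1 DetachVolumes 2016-06-13T14:17:36Z
i-c538184f ExampleAutoScalingGroup eu-west-1 ShutdownInstance 2016-06-13T14:11:18Z
i-615272eb ExampleAutoScalingGroup eu-west-1 ShutdownInstance 2016-06-13T13:48:33Z
EOF
# prints:
# DetachVolumes 1
# ShutdownInstance 2
```

In practice you would pipe the client output straight in, e.g. `chaosmonkey -endpoint http://example.com:8080 | count_by_strategy`.
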
  21. List chaos strategies
    $ chaosmonkey -list-strategies
    ShutdownInstance
    BlockAllNetworkTraffic
    DetachVolumes
    BurnCpu
    BurnIo
    KillProcesses
    NullRoute
    FailEc2
    FailDns
    FailDynamoDb
    FailS3
    FillDisk
    NetworkCorruption
    NetworkLatency
    NetworkLoss
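
A small guard against typos can help before triggering anything by hand. The sketch below checks a name against the strategies listed on this slide. The list is copied from the slide and may differ in other versions, and `is_known_strategy` is a hypothetical helper.

```shell
# Strategy names copied from the slide above; other versions may differ.
KNOWN_STRATEGIES="ShutdownInstance BlockAllNetworkTraffic DetachVolumes
BurnCpu BurnIo KillProcesses NullRoute FailEc2 FailDns FailDynamoDb FailS3
FillDisk NetworkCorruption NetworkLatency NetworkLoss"

# Hypothetical helper: exits 0 if $1 is one of the known strategy names.
# Relies on sh/bash word splitting of the unquoted list.
is_known_strategy() {
  for s in $KNOWN_STRATEGIES; do
    [ "$s" = "$1" ] && return 0
  done
  return 1
}

if is_known_strategy "BurnCpu"; then
  echo "BurnCpu is a known strategy"
fi
```
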
  22. AWS integration
    $ export AWS_ACCESS_KEY_ID=...
    $ export AWS_SECRET_ACCESS_KEY=...
    $ export AWS_REGION=...
    # List auto scaling groups
    $ chaosmonkey -list-groups
    # Delete all data from SimpleDB
    $ chaosmonkey -wipe-state SIMIAN_ARMY
  23. Use with Docker
    $ docker run -it --rm -p 8080:8080 \
        -e SIMIANARMY_CHAOS_LEASHED=false \
        -e SIMIANARMY_CHAOS_TERMINATEONDEMAND_ENABLED=true \
        ... mlafeldt/simianarmy
    $ chaosmonkey -endpoint http://$DOCKER_HOST_IP:8080 ...
  24. Chaos Monkey in Wonderland
    • Just another service running on Jimdo's PaaS
    • One monkey per environment (prod/stage)
    • Auth proxy to protect public API endpoint
    • /simianarmy/ as HTTP health check
    • Two replicas behind ELB for high availability
    • Currently on-demand termination only
  25. Tip: Slack notifications

  26. Live Chaos Demo
    Let's destroy something in production!

  27. Production Ready newsletter
    https://tinyletter.com/production-ready
    Published articles include:
      Chaos Engineering 101
      Chaos Engineering: A Shift in Mindset
      Chaos Monkey for Fun and Profit
      A Little Story about Amazon ECS, systemd, and Chaos Monkey
  28. Thank you.
    https://tinyletter.com/production-ready
    @mlafeldt