Slide 1

Slide 1 text

Chaos Monkey for Fun and Profit

Slide 2

Slide 2 text

Hello, I'm Mathias Lafeldt.
@mlafeldt

Slide 3

Slide 3 text

Chaos Engineering 101
• Trigger failures before they happen in production
• Gain confidence that our systems can withstand failures
• Verify that things behave as we expect
• Fix them if they don't
• Netflix: http://principlesofchaos.org/

Slide 4

Slide 4 text

I work at Jimdo
We ❤ Chaos Engineering

Slide 5

Slide 5 text

GameDays at Jimdo
1. Gather the team in front of a big screen
2. Think up failure modes and estimate expected impact
3. Go through chaos experiments together
4. Write down measured impact
5. Create a follow-up ticket for each flaw

Slide 6

Slide 6 text

You don't need to automate chaos experiments to run continuously

Slide 7

Slide 7 text

But: Automating experiments may further increase confidence in your systems

Slide 8

Slide 8 text


Slide 9

Slide 9 text

Chaos Monkey
• Most famous member of Netflix's Simian Army
• Randomly terminates EC2 instances during business hours
• Goal: survive terminations without customer impact
• Change frequency, probability, and type of terminations
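The "during business hours" restriction can be pictured as a simple time-window check. The sketch below is illustrative only: the function name `is_monkey_time` and the Monday-to-Friday, 09:00 to 15:00 window are assumptions made for the example, not values taken from the talk or from Simian Army's configuration.

```shell
#!/bin/sh
# is_monkey_time DAY HOUR
#   DAY  is 1-7 with Monday=1 (the numbering `date +%u` prints)
#   HOUR is 0-23, without a leading zero
# Succeeds only on weekdays inside the assumed 09:00-15:00 window.
is_monkey_time() {
    day=$1
    hour=$2
    [ "$day" -ge 1 ] && [ "$day" -le 5 ] && [ "$hour" -ge 9 ] && [ "$hour" -lt 15 ]
}

# Wednesday at 10:00 falls inside the assumed window.
if is_monkey_time 3 10; then
    echo "inside the termination window"
else
    echo "outside the termination window"
fi
```

Restricting terminations to a window like this means engineers are at their desks when an instance disappears, which is the point of the business-hours rule.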

Slide 10

Slide 10 text

Quick start
$ git clone https://github.com/Netflix/SimianArmy .
$ ./gradlew build # Java 8
$ vim src/main/resources/{client,simianarmy,chaos}.properties
$ ./gradlew jettyRun

Slide 11

Slide 11 text

Configuration!
$ grep -c simianarmy src/main/resources/*.properties
src/main/resources/chaos.properties:40
src/main/resources/client.properties:18
src/main/resources/conformity.properties:27
src/main/resources/janitor.properties:55
src/main/resources/log4j.properties:0
src/main/resources/simianarmy.properties:13
src/main/resources/volumeTagging.properties:4

Slide 12

Slide 12 text

docker pull mlafeldt/simianarmy
github.com/mlafeldt/docker-simianarmy

Slide 13

Slide 13 text

Unleash the monkey!
$ docker run -it --rm \
    -e SIMIANARMY_CLIENT_AWS_ACCOUNTKEY=$AWS_ACCESS_KEY_ID \
    -e SIMIANARMY_CLIENT_AWS_SECRETKEY=$AWS_SECRET_ACCESS_KEY \
    -e SIMIANARMY_CLIENT_AWS_REGION=$AWS_REGION \
    -e SIMIANARMY_CALENDAR_ISMONKEYTIME=true \
    -e SIMIANARMY_CHAOS_ASG_ENABLED=true \
    -e SIMIANARMY_CHAOS_LEASHED=false \
    mlafeldt/simianarmy

Slide 14

Slide 14 text

Configuration via etcd
$ export ETCDCTL_ENDPOINT=http://$YOUR_ETCD_IP:2379
$ etcdctl set /simianarmy/client/aws/accountkey $AWS_ACCESS_KEY_ID
$ etcdctl set /simianarmy/client/aws/secretkey $AWS_SECRET_ACCESS_KEY
$ etcdctl set ...
$ docker run -it --rm \
    -e CONFD_OPTS="-backend=etcd -node=$ETCDCTL_ENDPOINT" \
    mlafeldt/simianarmy
# More confd backends: Consul, Vault, DynamoDB, etc.
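Comparing the etcd keys here with the `-e` variables used for `docker run`, the two naming schemes appear to line up: drop the leading slash, turn the remaining slashes into underscores, and uppercase. The helper below is a small sketch of that apparent mapping; the function name `key_to_env` is made up for illustration, and the mapping is inferred from the slides rather than confirmed against the image's documentation.

```shell
#!/bin/sh
# Convert a key such as /simianarmy/chaos/leashed into the matching
# environment variable name (mapping inferred from the slides, not verified).
key_to_env() {
    printf '%s\n' "$1" | sed -e 's|^/||' -e 's|/|_|g' | tr '[:lower:]' '[:upper:]'
}

key_to_env /simianarmy/client/aws/accountkey
key_to_env /simianarmy/chaos/leashed
```

Run as is, this prints `SIMIANARMY_CLIENT_AWS_ACCOUNTKEY` and `SIMIANARMY_CHAOS_LEASHED`, the same names passed to the container with `-e`.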

Slide 15

Slide 15 text

Chaos Monkey REST API
/simianarmy/api/v1/chaos
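As a rough idea of what talking to this endpoint could look like without a dedicated client, here is a hedged sketch using `curl`. The JSON field names (`eventType`, `groupType`, `groupName`, `chaosType`) are what I recall from the Simian Army wiki and should be verified against it; the function names and the base URL are placeholders invented for this example.

```shell
#!/bin/sh
# Build the JSON body for an on-demand termination request.
# Field names are assumed from the Simian Army wiki; verify before use.
# Note: no JSON escaping is done, so keep group names to plain characters.
chaos_payload() {
    group=$1
    strategy=${2:-ShutdownInstance}
    printf '{"eventType":"CHAOS_TERMINATION","groupType":"ASG","groupName":"%s","chaosType":"%s"}\n' \
        "$group" "$strategy"
}

# POST the payload to the chaos endpoint.
# Usage: trigger_chaos BASE_URL GROUP [STRATEGY]
trigger_chaos() {
    base_url=$1
    shift
    curl -sS -X POST -d "$(chaos_payload "$@")" "$base_url/simianarmy/api/v1/chaos"
}

# Print the request body only; nothing is sent until trigger_chaos is called.
chaos_payload ExampleAutoScalingGroup ShutdownInstance
```

A call such as `trigger_chaos http://example.com:8080 ExampleAutoScalingGroup` would then send the request, assuming on-demand termination is enabled on the server.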

Slide 16

Slide 16 text

On-demand termination
during GameDay events or whenever you feel like it!

Slide 17

Slide 17 text

Go client
github.com/mlafeldt/chaosmonkey

Slide 18

Slide 18 text

Installation
$ go get -u github.com/mlafeldt/chaosmonkey
$ chaosmonkey --version
chaosmonkey v0.3.0 darwin/amd64 go1.6.2

Slide 19

Slide 19 text

Trigger a chaos event
$ chaosmonkey -endpoint http://example.com:8080 \
    -group ExampleAutoScalingGroup \
    -strategy ShutdownInstance

Slide 20

Slide 20 text

Get past chaos events
$ chaosmonkey -endpoint http://example.com:8080
InstanceID  AutoScalingGroupName     Region     Strategy          TriggeredAt
i-741a78f8  ExampleAutoScalingGroup  eu-west-1  DetachVolumes     2016-06-13T14:17:36Z
i-c538184f  ExampleAutoScalingGroup  eu-west-1  ShutdownInstance  2016-06-13T14:11:18Z
i-615272eb  ExampleAutoScalingGroup  eu-west-1  ShutdownInstance  2016-06-13T13:48:33Z
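Because the event history is plain whitespace-separated text, it can be post-processed with standard tools, for example when writing down measured impact after a GameDay. The sketch below counts events per strategy; it assumes the output keeps the five-column layout with a header line shown on this slide, and the function name `count_by_strategy` is made up for illustration.

```shell
#!/bin/sh
# Count chaos events per strategy from the tabular output on stdin.
# Assumes a header line followed by rows whose 4th column is the strategy.
count_by_strategy() {
    awk 'NR > 1 && NF >= 5 { n[$4]++ } END { for (s in n) print s, n[s] }' | sort
}

# Sample input copied from the slide.
count_by_strategy <<'EOF'
InstanceID AutoScalingGroupName Region Strategy TriggeredAt
i-741a78f8 ExampleAutoScalingGroup eu-west-1 DetachVolumes 2016-06-13T14:17:36Z
i-c538184f ExampleAutoScalingGroup eu-west-1 ShutdownInstance 2016-06-13T14:11:18Z
i-615272eb ExampleAutoScalingGroup eu-west-1 ShutdownInstance 2016-06-13T13:48:33Z
EOF
```

For the three sample rows this prints `DetachVolumes 1` and `ShutdownInstance 2`. Against a live server the input would instead come from piping the `chaosmonkey -endpoint ...` command into the function.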

Slide 21

Slide 21 text

List chaos strategies
$ chaosmonkey -list-strategies
ShutdownInstance
BlockAllNetworkTraffic
DetachVolumes
BurnCpu
BurnIo
KillProcesses
NullRoute
FailEc2
FailDns
FailDynamoDb
FailS3
FillDisk
NetworkCorruption
NetworkLatency
NetworkLoss
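When scripting chaos experiments, it can help to reject a mistyped strategy name before anything is sent to the server. The sketch below hard-codes the names listed on this slide; the variable `KNOWN_STRATEGIES` and the function `is_known_strategy` are hypothetical helpers, and a real script might prefer to read the list from `chaosmonkey -list-strategies` instead.

```shell
#!/bin/sh
# Strategy names copied from the slide (case-sensitive).
KNOWN_STRATEGIES="ShutdownInstance BlockAllNetworkTraffic DetachVolumes BurnCpu BurnIo KillProcesses NullRoute FailEc2 FailDns FailDynamoDb FailS3 FillDisk NetworkCorruption NetworkLatency NetworkLoss"

# Succeeds if the first argument exactly matches one of the known names.
is_known_strategy() {
    for s in $KNOWN_STRATEGIES; do
        [ "$s" = "$1" ] && return 0
    done
    return 1
}

is_known_strategy BurnCpu && echo "BurnCpu: known"
is_known_strategy burncpu || echo "burncpu: unknown"
```

The comparison is exact, so `burncpu` is rejected even though `BurnCpu` is accepted.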

Slide 22

Slide 22 text

AWS integration
$ export AWS_ACCESS_KEY_ID=...
$ export AWS_SECRET_ACCESS_KEY=...
$ export AWS_REGION=...
# List auto scaling groups
$ chaosmonkey -list-groups
# Delete all data from SimpleDB
$ chaosmonkey -wipe-state SIMIAN_ARMY

Slide 23

Slide 23 text

Use with Docker
$ docker run -it --rm -p 8080:8080 \
    -e SIMIANARMY_CHAOS_LEASHED=false \
    -e SIMIANARMY_CHAOS_TERMINATEONDEMAND_ENABLED=true \
    ... mlafeldt/simianarmy
$ chaosmonkey -endpoint http://$DOCKER_HOST_IP:8080 ...

Slide 24

Slide 24 text

Chaos Monkey in Wonderland
• Just another service running on Jimdo's PaaS
• One monkey per environment (prod/stage)
• Auth proxy to protect public API endpoint
• /simianarmy/ as HTTP health check
• Two replicas behind ELB for high availability
• Currently on-demand termination only

Slide 25

Slide 25 text

Tip: Slack notifications

Slide 26

Slide 26 text

Live Chaos Demo
Let's destroy something in production!

Slide 27

Slide 27 text

Production Ready newsletter
https://tinyletter.com/production-ready
Published articles include:
Chaos Engineering 101
Chaos Engineering: A Shift in Mindset
Chaos Monkey for Fun and Profit
A Little Story about Amazon ECS, systemd, and Chaos Monkey

Slide 28

Slide 28 text

Thank you.
https://tinyletter.com/production-ready
@mlafeldt