
Chaos Monkey for Fun and Profit

The core idea of Chaos Engineering is to inject failures proactively in a controlled manner in order to gain confidence in our systems. Automating fault injection with tools like Chaos Monkey represents one of the advanced principles of Chaos Engineering. In this talk, Mathias is going to show you how to run your very own Chaos Monkey with Docker, and how to use it for both automated and manual fault injection.

(Talk given at Chaos Engineering Hamburg meetup: http://www.meetup.com/Chaos-Engineering-Hamburg/events/231567152/)

Mathias Lafeldt

June 15, 2016

Transcript

  1. Chaos Monkey
    for Fun and Profit

  2. Hello
    I'm Mathias Lafeldt.
    @mlafeldt

  3. Chaos Engineering 101
    • Trigger failures deliberately, before they strike unplanned in production
    • Gain confidence that our systems can withstand failures
    • Verify that things behave as we expect
    • Fix them if they don't
    • Netflix: http://principlesofchaos.org/

  4. I work at Jimdo
    We ❤ Chaos Engineering

  5. GameDays at Jimdo
    1. Gather the team in front of a big screen
    2. Think up failure modes and estimate expected impact
    3. Go through chaos experiments together
    4. Write down measured impact
    5. Create follow-up ticket for each flaw

  6. You don't need to
    automate chaos experiments
    to run continuously

  7. But:
    Automating experiments
    may further increase
    confidence in your systems

  8. [Slide with no transcribed text]

  9. Chaos Monkey
    • Most famous member of Netflix's Simian Army
    • Randomly terminates EC2 instances during business hours
    • Goal: survive terminations without customer impact
    • Change frequency, probability, and type of terminations

  10. Quick start
    $ git clone https://github.com/Netflix/SimianArmy .
    $ ./gradlew build # Java 8
    $ vim src/main/resources/{client,simianarmy,chaos}.properties
    $ ./gradlew jettyRun

  11. Configuration
    $ grep -c simianarmy src/main/resources/*.properties
    src/main/resources/chaos.properties:40
    src/main/resources/client.properties:18
    src/main/resources/conformity.properties:27
    src/main/resources/janitor.properties:55
    src/main/resources/log4j.properties:0
    src/main/resources/simianarmy.properties:13
    src/main/resources/volumeTagging.properties:4

  12. docker pull mlafeldt/simianarmy
    github.com/mlafeldt/docker-simianarmy

  13. Unleash the monkey!
    $ docker run -it --rm \
        -e SIMIANARMY_CLIENT_AWS_ACCOUNTKEY=$AWS_ACCESS_KEY_ID \
        -e SIMIANARMY_CLIENT_AWS_SECRETKEY=$AWS_SECRET_ACCESS_KEY \
        -e SIMIANARMY_CLIENT_AWS_REGION=$AWS_REGION \
        -e SIMIANARMY_CALENDAR_ISMONKEYTIME=true \
        -e SIMIANARMY_CHAOS_ASG_ENABLED=true \
        -e SIMIANARMY_CHAOS_LEASHED=false \
        mlafeldt/simianarmy

  14. Configuration via etcd
    $ export ETCDCTL_ENDPOINT=http://$YOUR_ETCD_IP:2379
    $ etcdctl set /simianarmy/client/aws/accountkey $AWS_ACCESS_KEY_ID
    $ etcdctl set /simianarmy/client/aws/secretkey $AWS_SECRET_ACCESS_KEY
    $ etcdctl set ...
    $ docker run -it --rm \
        -e CONFD_OPTS="-backend=etcd -node=$ETCDCTL_ENDPOINT" \
        mlafeldt/simianarmy
    # More confd backends: Consul, Vault, DynamoDB, etc.
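
The etcd keys on slide 14 and the environment variables on slide 13 appear to follow one naming scheme: drop the leading slash, replace the remaining slashes with underscores, and upper-case the result. The sketch below shows that mapping. The helper name `key_to_env` is hypothetical and not part of the Docker image; the rule is inferred from the two slides and from how confd's environment backend names its keys.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with mlafeldt/simianarmy): derive the
# environment variable name that corresponds to an etcd key.
key_to_env() {
  # Strip the leading slash, turn the other slashes into underscores,
  # then upper-case everything.
  printf '%s\n' "$1" | sed -e 's|^/||' -e 's|/|_|g' | tr '[:lower:]' '[:upper:]'
}

key_to_env /simianarmy/client/aws/accountkey   # prints SIMIANARMY_CLIENT_AWS_ACCOUNTKEY
key_to_env /simianarmy/chaos/leashed           # prints SIMIANARMY_CHAOS_LEASHED
```

This can help when moving a setup from `-e` flags to etcd, or back, without retyping every setting by hand.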

  15. Chaos Monkey
    REST API
    /simianarmy/api/v1/chaos

  16. On-demand
    termination
    during GameDay events
    or whenever you feel like it

  17. Go client
    github.com/mlafeldt/chaosmonkey

  18. Installation
    $ go get -u github.com/mlafeldt/chaosmonkey
    $ chaosmonkey --version
    chaosmonkey v0.3.0 darwin/amd64 go1.6.2

  19. Trigger a chaos event
    $ chaosmonkey -endpoint http://example.com:8080 \
        -group ExampleAutoScalingGroup \
        -strategy ShutdownInstance
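
For readers without the Go client, the same chaos event can probably be triggered by talking to the REST API from slide 15 directly. The following is a rough sketch, not the client's actual implementation. The JSON field names (`eventType`, `groupType`, `groupName`, `chaosType`) are an assumption based on the SimianArmy REST documentation and should be verified against your version. Both function names are hypothetical, and the payload builder does no JSON escaping, so it only suits group names without quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for an on-demand termination.
# Field names are assumed from the SimianArmy REST docs; verify before use.
chaos_payload() {
  # $1 = auto scaling group name, $2 = chaos strategy
  printf '{"eventType":"CHAOS_TERMINATION","groupType":"ASG","groupName":"%s","chaosType":"%s"}\n' "$1" "$2"
}

# Hypothetical helper: POST the payload to the Chaos Monkey REST API.
# Defined here but deliberately not called, since it terminates an instance.
trigger_chaos() {
  # $1 = endpoint, $2 = auto scaling group name, $3 = chaos strategy
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(chaos_payload "$2" "$3")" \
    "$1/simianarmy/api/v1/chaos"
}

# Show the request body only; nothing is sent.
chaos_payload ExampleAutoScalingGroup ShutdownInstance
```

To send the request, one would call `trigger_chaos http://example.com:8080 ExampleAutoScalingGroup ShutdownInstance` against a monkey that is unleashed and has on-demand termination enabled.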

  20. Get past chaos events
    $ chaosmonkey -endpoint http://example.com:8080
    InstanceID  AutoScalingGroupName     Region     Strategy          TriggeredAt
    i-741a78f8  ExampleAutoScalingGroup  eu-west-1  DetachVolumes     2016-06-13T14:17:36Z
    i-c538184f  ExampleAutoScalingGroup  eu-west-1  ShutdownInstance  2016-06-13T14:11:18Z
    i-615272eb  ExampleAutoScalingGroup  eu-west-1  ShutdownInstance  2016-06-13T13:48:33Z
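
Because the event list is a whitespace-separated table, it is easy to post-process with standard tools, for example when writing down measured impact after a GameDay. The sketch below counts events per strategy. The function name `count_by_strategy` is hypothetical, and it assumes the column layout shown on slide 20, with the strategy in the fourth column and a single header line. The sample input is copied from that slide.

```shell
#!/bin/sh
# Hypothetical helper: count chaos events per strategy.
# Reads the table printed by `chaosmonkey -endpoint ...` on stdin.
count_by_strategy() {
  # Skip the header line, tally column 4, and sort for stable output.
  awk 'NR > 1 { n[$4]++ } END { for (s in n) print s, n[s] }' | sort
}

count_by_strategy <<'EOF'
InstanceID AutoScalingGroupName Region Strategy TriggeredAt
i-741a78f8 ExampleAutoScalingGroup eu-west-1 DetachVolumes 2016-06-13T14:17:36Z
i-c538184f ExampleAutoScalingGroup eu-west-1 ShutdownInstance 2016-06-13T14:11:18Z
i-615272eb ExampleAutoScalingGroup eu-west-1 ShutdownInstance 2016-06-13T13:48:33Z
EOF
# prints:
# DetachVolumes 1
# ShutdownInstance 2
```

In practice the heredoc would be replaced by a pipe from the client itself.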

  21. List chaos strategies
    $ chaosmonkey -list-strategies
    ShutdownInstance
    BlockAllNetworkTraffic
    DetachVolumes
    BurnCpu
    BurnIo
    KillProcesses
    NullRoute
    FailEc2
    FailDns
    FailDynamoDb
    FailS3
    FillDisk
    NetworkCorruption
    NetworkLatency
    NetworkLoss
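
When chaos events are triggered from scripts, it may be worth checking the strategy name before sending anything to the monkey. Here is a minimal sketch that hard-codes the list from slide 21. The variable and function names are hypothetical, the list can differ between SimianArmy versions, and the loop relies on word splitting as done by sh and bash.

```shell
#!/bin/sh
# Strategy names copied from the `chaosmonkey -list-strategies` output above.
STRATEGIES="ShutdownInstance BlockAllNetworkTraffic DetachVolumes BurnCpu BurnIo
KillProcesses NullRoute FailEc2 FailDns FailDynamoDb FailS3 FillDisk
NetworkCorruption NetworkLatency NetworkLoss"

# Hypothetical helper: succeed only if $1 exactly matches a known strategy.
is_known_strategy() {
  for s in $STRATEGIES; do
    [ "$s" = "$1" ] && return 0
  done
  return 1
}

is_known_strategy BurnCpu && echo "BurnCpu: known"
is_known_strategy burncpu || echo "burncpu: unknown (names are case-sensitive here)"
```

A wrapper script could call `is_known_strategy "$strategy" || exit 1` before invoking the client, so that typos fail fast.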

  22. AWS integration
    $ export AWS_ACCESS_KEY_ID=...
    $ export AWS_SECRET_ACCESS_KEY=...
    $ export AWS_REGION=...
    # List auto scaling groups
    $ chaosmonkey -list-groups
    # Wipe Chaos Monkey state by deleting the SimpleDB domain SIMIAN_ARMY
    $ chaosmonkey -wipe-state SIMIAN_ARMY

  23. Use with Docker
    $ docker run -it --rm -p 8080:8080 \
        -e SIMIANARMY_CHAOS_LEASHED=false \
        -e SIMIANARMY_CHAOS_TERMINATEONDEMAND_ENABLED=true \
        ...
        mlafeldt/simianarmy
    $ chaosmonkey -endpoint http://$DOCKER_HOST_IP:8080 ...

  24. Chaos Monkey in Wonderland
    • Just another service running on Jimdo's PaaS
    • One monkey per environment (prod/stage)
    • Auth proxy to protect public API endpoint
    • /simianarmy/ as HTTP health check
    • Two replicas behind ELB for high availability
    • Currently on-demand termination only

  25. Tip: Slack notifications

  26. Live Chaos Demo
    Let's destroy something in
    production!

  27. Production Ready newsletter
    https://tinyletter.com/production-ready
    Published articles include:
    Chaos Engineering 101
    Chaos Engineering: A Shift in Mindset
    Chaos Monkey for Fun and Profit
    A Little Story about Amazon ECS, systemd, and Chaos Monkey

  28. Thank you.
    https://tinyletter.com/production-ready
    @mlafeldt