Chaos Monkey for Fun and Profit

The core idea of Chaos Engineering is to inject failures proactively in a controlled manner in order to gain confidence in our systems. Automating fault injection with tools like Chaos Monkey represents one of the advanced principles of Chaos Engineering. In this talk, Mathias is going to show you how to run your very own Chaos Monkey with Docker, and how to use it for both automated and manual fault injection.

(Talk given at the Chaos Engineering Hamburg meetup: http://www.meetup.com/Chaos-Engineering-Hamburg/events/231567152/)

Mathias Lafeldt

June 15, 2016

Transcript

  1. Chaos Monkey
    for Fun and Profit

  2. Hello
    I'm Mathias Lafeldt.
    @mlafeldt

  3. Chaos Engineering 101
    • Trigger failures before they happen in production
    • Gain confidence that our systems can withstand failures
    • Verify that things behave as we expect
    • Fix them if they don't
    • Netflix: http://principlesofchaos.org/

  4. I work at Jimdo
    We ❤ Chaos Engineering

  5. GameDays at Jimdo
    1. Gather the team in front of a big screen
    2. Think up failure modes and estimate expected impact
    3. Go through chaos experiments together
    4. Write down measured impact
    5. Create follow-up ticket for each flaw

  6. You don't need to
    automate chaos experiments
    to run continuously

  7. But:
    Automating experiments
    may further increase
    confidence in your systems

  8. Chaos Monkey
    • Most famous member of Netflix's Simian Army
    • Randomly terminates EC2 instances during business hours
    • Goal: survive terminations without customer impact
    • Change frequency, probability, and type of terminations
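The "during business hours" rule can be pictured as a simple time-window check. The following is only a sketch of that idea, not Chaos Monkey's actual scheduler: the function name `is_monkey_time` is hypothetical, and the 09:00 to 15:00 weekday window is an assumption about typical defaults that should be checked against your own configuration.

```shell
# Assumed window; override via environment if your setup differs.
OPEN_HOUR=${OPEN_HOUR:-9}
CLOSE_HOUR=${CLOSE_HOUR:-15}

# Hypothetical helper: succeeds when the given ISO weekday (1=Mon .. 7=Sun)
# and hour (0-23) fall inside the window in which terminations are allowed.
is_monkey_time() {
    [ "$1" -le 5 ] && [ "$2" -ge "$OPEN_HOUR" ] && [ "$2" -lt "$CLOSE_HOUR" ]
}

# Usage: check the current local time.
# is_monkey_time "$(date +%u)" "$(date +%H)" && echo "monkey may strike"
```

Restricting terminations to working hours means engineers are around to observe and respond when an instance disappears.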

  9. Quick start
    $ git clone https://github.com/Netflix/SimianArmy .
    $ ./gradlew build # Java 8
    $ vim src/main/resources/{client,simianarmy,chaos}.properties
    $ ./gradlew jettyRun

  10. Configuration!
    $ grep -c simianarmy src/main/resources/*.properties
    src/main/resources/chaos.properties:40
    src/main/resources/client.properties:18
    src/main/resources/conformity.properties:27
    src/main/resources/janitor.properties:55
    src/main/resources/log4j.properties:0
    src/main/resources/simianarmy.properties:13
    src/main/resources/volumeTagging.properties:4

  11. docker pull mlafeldt/simianarmy
    github.com/mlafeldt/docker-simianarmy

  12. Unleash the monkey!
    $ docker run -it --rm \
        -e SIMIANARMY_CLIENT_AWS_ACCOUNTKEY=$AWS_ACCESS_KEY_ID \
        -e SIMIANARMY_CLIENT_AWS_SECRETKEY=$AWS_SECRET_ACCESS_KEY \
        -e SIMIANARMY_CLIENT_AWS_REGION=$AWS_REGION \
        -e SIMIANARMY_CALENDAR_ISMONKEYTIME=true \
        -e SIMIANARMY_CHAOS_ASG_ENABLED=true \
        -e SIMIANARMY_CHAOS_LEASHED=false \
        mlafeldt/simianarmy

  13. Configuration via etcd
    $ export ETCDCTL_ENDPOINT=http://$YOUR_ETCD_IP:2379
    $ etcdctl set /simianarmy/client/aws/accountkey $AWS_ACCESS_KEY_ID
    $ etcdctl set /simianarmy/client/aws/secretkey $AWS_SECRET_ACCESS_KEY
    $ etcdctl set ...
    $ docker run -it --rm \
        -e CONFD_OPTS="-backend=etcd -node=$ETCDCTL_ENDPOINT" \
        mlafeldt/simianarmy
    # More confd backends: Consul, Vault, DynamoDB, etc.
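The etcd keys above appear to mirror the `SIMIANARMY_*` environment variables from the "Unleash the monkey!" slide: lowercase the name and turn underscores into slashes. That matches how confd's environment backend maps keys to variable names. A minimal sketch of the mapping follows; the helper name `env_to_key` is hypothetical and not part of the Docker image.

```shell
# Hypothetical helper: convert a SIMIANARMY_* variable name into the
# corresponding etcd key, e.g. SIMIANARMY_CHAOS_LEASHED -> /simianarmy/chaos/leashed
env_to_key() {
    printf '/%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr '_' '/'
}

# Usage (assumes etcdctl with the v2-style "set" command used on the slide):
# etcdctl set "$(env_to_key SIMIANARMY_CHAOS_LEASHED)" false
```

Because both forms follow one naming scheme, a setting that works as an environment variable should translate directly to a key in etcd or another confd backend.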

  14. Chaos Monkey
    REST API
    /simianarmy/api/v1/chaos
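For a feel of what talking to this endpoint directly might look like, here is a hedged sketch using curl. The JSON field names (`eventType`, `groupType`, `groupName`, `chaosType`) are recalled from the SimianArmy wiki and should be treated as assumptions to verify against your version; the helper names `chaos_payload` and `trigger_chaos` are hypothetical.

```shell
# Build the JSON body for an on-demand termination request.
# $1 = auto scaling group name, $2 = chaos strategy (both are inserted verbatim).
chaos_payload() {
    printf '{"eventType":"CHAOS_TERMINATION","groupType":"ASG","groupName":"%s","chaosType":"%s"}\n' "$1" "$2"
}

# POST the request to a running Chaos Monkey.
# $1 = base URL, $2 = group name, $3 = strategy.
trigger_chaos() {
    curl -sS -X POST -H 'Content-Type: application/json' \
        -d "$(chaos_payload "$2" "$3")" \
        "$1/simianarmy/api/v1/chaos"
}

# Usage:
# trigger_chaos http://example.com:8080 ExampleAutoScalingGroup ShutdownInstance
```

On-demand requests like this are only expected to work when the monkey is unleashed and on-demand termination is enabled in its configuration.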

  15. On-demand
    termination
    during GameDay events
    or whenever you feel like it!

  16. Go client
    github.com/mlafeldt/chaosmonkey

  17. Installation
    $ go get -u github.com/mlafeldt/chaosmonkey
    $ chaosmonkey --version
    chaosmonkey v0.3.0 darwin/amd64 go1.6.2

  18. Trigger a chaos event
    $ chaosmonkey -endpoint http://example.com:8080 \
        -group ExampleAutoScalingGroup \
        -strategy ShutdownInstance

  19. Get past chaos events
    $ chaosmonkey -endpoint http://example.com:8080
    InstanceID AutoScalingGroupName Region Strategy TriggeredAt
    i-741a78f8 ExampleAutoScalingGroup eu-west-1 DetachVolumes 2016-06-13T14:17:36Z
    i-c538184f ExampleAutoScalingGroup eu-west-1 ShutdownInstance 2016-06-13T14:11:18Z
    i-615272eb ExampleAutoScalingGroup eu-west-1 ShutdownInstance 2016-06-13T13:48:33Z

  20. List chaos strategies
    $ chaosmonkey -list-strategies
    ShutdownInstance
    BlockAllNetworkTraffic
    DetachVolumes
    BurnCpu
    BurnIo
    KillProcesses
    NullRoute
    FailEc2
    FailDns
    FailDynamoDb
    FailS3
    FillDisk
    NetworkCorruption
    NetworkLatency
    NetworkLoss

  21. AWS integration
    $ export AWS_ACCESS_KEY_ID=...
    $ export AWS_SECRET_ACCESS_KEY=...
    $ export AWS_REGION=...
    # List auto scaling groups
    $ chaosmonkey -list-groups
    # Delete all data from SimpleDB
    $ chaosmonkey -wipe-state SIMIAN_ARMY

  22. Use with Docker
    $ docker run -it --rm -p 8080:8080 \
        -e SIMIANARMY_CHAOS_LEASHED=false \
        -e SIMIANARMY_CHAOS_TERMINATEONDEMAND_ENABLED=true \
        ...
        mlafeldt/simianarmy
    $ chaosmonkey -endpoint http://$DOCKER_HOST_IP:8080 ...

  23. Chaos Monkey in Wonderland
    • Just another service running on Jimdo's PaaS
    • One monkey per environment (prod/stage)
    • Auth proxy to protect public API endpoint
    • /simianarmy/ as HTTP health check
    • Two replicas behind ELB for high availability
    • Currently on-demand termination only

  24. Tip: Slack notifications

  25. Live Chaos Demo
    Let's destroy something in
    production!

  26. Production Ready newsletter
    https://tinyletter.com/production-ready
    Published articles include:
    Chaos Engineering 101
    Chaos Engineering: A Shift in Mindset
    Chaos Monkey for Fun and Profit
    A Little Story about Amazon ECS, systemd, and Chaos Monkey

  27. Thank you.
    https://tinyletter.com/production-ready
    @mlafeldt