Slide 1

CONSTRUCTIVE DESTRUCTIVENESS FOR CONTAINERS
Michael Hausenblas | ContainerCamp, London | 2016-09-09
© 2016 Mesosphere, Inc. All Rights Reserved.

Slide 2

MOTIVATION

Slide 3

WHY WOULD I WANT TO DO THAT?

Slide 4

WHY I USUALLY DON'T DO IT …
• My code is bug free
• It costs time/money
• 'Uncool' work

Slide 5

TOWARDS A FRAMEWORK

Slide 6

TERMINOLOGY
• unit testing
• property based testing
• fault injection
• resilience testing
• smoke testing
• stress testing
• canary deployments
[Diagram labels: dev, prod, cost-benefit, language dependent]

Slide 7

SCOPES
• container
• host & intra-host
• service (app/business)

Slide 8

FALLACIES OF DISTRIBUTED COMPUTING
• The network is reliable.
• Latency is zero.
• Bandwidth is infinite.
• The network is secure.
• Topology doesn't change.
• There is one administrator.
• Transport cost is zero.
• The network is homogeneous.
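The first two fallacies ("the network is reliable", "latency is zero") can be exercised with a small fault-injection wrapper. The following is a minimal sketch, not taken from the talk: `FlakyNetwork` and `call_with_retries` are hypothetical names, and the failure rate, delay, and retry policy are assumed values chosen for illustration.

```python
import random
import time


class FlakyNetwork:
    """Hypothetical wrapper that injects failures and latency into a callable."""

    def __init__(self, func, failure_rate=0.3, max_delay=0.0, seed=None):
        self.func = func
        self.failure_rate = failure_rate  # probability that a call fails
        self.max_delay = max_delay        # upper bound on injected latency, in seconds
        self.rng = random.Random(seed)    # seedable, so a run can be reproduced
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        if self.max_delay > 0:
            # "Latency is zero" does not hold: wait a random amount first.
            time.sleep(self.rng.uniform(0, self.max_delay))
        if self.rng.random() < self.failure_rate:
            # "The network is reliable" does not hold: drop the call.
            raise ConnectionError("injected network failure")
        return self.func(*args, **kwargs)


def call_with_retries(func, attempts=5, backoff=0.0):
    """Call func, retrying on ConnectionError with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError as err:
            last_error = err
            time.sleep(backoff * (2 ** attempt))
    raise last_error
```

Wrapping a client call such as `FlakyNetwork(fetch, failure_rate=0.5, seed=42)` (where `fetch` stands for whatever function talks to the network) shows whether the caller copes with dropped requests or silently assumes they never happen.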

Slide 9

CHAOS ENGINEERING
Pioneered by Netflix
• Define normal behaviour
• Define expectations
• Test hypothesis
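The three steps above can be sketched as a tiny experiment runner. This is an illustrative sketch under stated assumptions, not Netflix's tooling: `Experiment`, `run_experiment`, and `replica_experiment` are hypothetical names, and the "system" is just a dictionary of replica states.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Experiment:
    name: str
    steady_state: Callable[[], bool]  # "normal behaviour" as a checkable predicate
    inject_fault: Callable[[], None]  # the disturbance to apply
    rollback: Callable[[], None]      # undo the disturbance afterwards


def run_experiment(exp: Experiment) -> str:
    """Check the hypothesis that the steady state survives the injected fault."""
    if not exp.steady_state():
        # Do not inject faults into a system that is already unhealthy.
        return "aborted"
    exp.inject_fault()
    try:
        held = exp.steady_state()
    finally:
        exp.rollback()
    return "held" if held else "refuted"


def replica_experiment(replicas: Dict[str, bool], victim: str, minimum: int) -> Experiment:
    """Toy experiment: kill one replica, expect at least `minimum` to stay up."""

    def steady() -> bool:
        return sum(replicas.values()) >= minimum

    def kill() -> None:
        replicas[victim] = False

    def restore() -> None:
        replicas[victim] = True

    return Experiment(f"kill {victim}", steady, kill, restore)
```

With three healthy replicas, `run_experiment(replica_experiment(replicas, "a", 2))` reports that the hypothesis held, while requiring all three to stay up reports it as refuted.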

Slide 10

FAILURE MODES
[Diagram: a cluster of master nodes, worker nodes, and containers]

Slide 11

ASPECTS
• automation (health checks)
• black box testing (what happens if you cut off the connection to environment?)
• dimensions
• disk I/O
• time (skew)
• network traffic
• requests per second served
• cascading failures
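One possible reading of "automation (health checks)" is a probe whose results decide automatically whether a container is still considered healthy. The sketch below assumes a simple policy (unhealthy after N consecutive failed probes); `HealthTracker` and its parameter name are hypothetical and not tied to any specific orchestrator.

```python
class HealthTracker:
    """Tracks probe results; unhealthy after too many consecutive failures."""

    def __init__(self, max_consecutive_failures: int = 3) -> None:
        if max_consecutive_failures < 1:
            raise ValueError("max_consecutive_failures must be at least 1")
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.max_consecutive_failures

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the current health verdict."""
        if probe_ok:
            # A single successful probe resets the failure streak.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.healthy
```

A resilience test can then feed in failed probes on purpose (for example while the network is cut off) and check that the verdict flips when expected and recovers once probes succeed again.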

Slide 12

STATEFUL SERVICES

Slide 13

APPLYING RESILIENCE TESTING

Slide 14

TOOLING
• Chaos Monkey
  http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
  https://medium.com/production-ready/chaos-monkey-for-fun-and-profit-87e2f343db31
• Wehkamp's Half-Life2/microservices attack tool
  https://www.wehkamplabs.com/blog/2016/06/02/docker-and-zombies/

Slide 15

TOOLING
• Pumba
  http://blog.terranillius.com/post/pumba_docker_chaos_testing/
• Mizaru
  https://github.com/giantswarm/mizaru
• Blockade
  https://github.com/dcm-oss/blockade
• Kubernetes (sig-testing)
  https://github.com/kubernetes/kubernetes/issues/4548
• DC/OS: DRAX
  https://github.com/dcos-labs/drax

Slide 16

JEPSEN.IO

Slide 17

RESOURCES

Slide 18

Resilience Engineering in Practice: A Guidebook

Slide 19

https://www.youtube.com/watch?v=4fFDFbi3toc

Slide 20

https://github.com/spacejam
Big kudos to Tyler Neely!

Slide 21

Q & A
• @mhausenblas
• mhausenblas.info
• [email protected]
https://dcos.io