Chaos Testing for Docker Containers

Who am I?
‣ Alexei Ledenev (@alexeiled)
‣ Chief of Research @codefresh.io
‣ Open Source Projects
  ‣ github.com/alexei-led/pumba
  ‣ github.com/codefresh-io/microci
‣ #docker #k8s #aws #gcloud

Complex Systems
"Sooner or later, any complex system will fail, and software systems are no exception. Failure can occur anytime and almost anywhere. So you should never get too comfortable."

Last Year's Outages
• IBM Cloud, January 26
• GitLab, January 31
• AWS, February 28
• Microsoft Azure, March 16
• ...
• Visit http://outage.report/

What can we do to achieve better quality? More testing? Better monitoring?
• Functional Testing
• Performance Testing
• Integration Testing
• Penetration Testing
• Acceptance Testing
• Log Analytics
• Monitoring
• Alerts
• Failure Predictions

Building distributed software today is easier than ever

CAP Theorem
“Of three properties of shared-data systems (Consistency, Availability and tolerance to network Partitions) only two can be achieved at any given moment in time.”
Eric Brewer

Chaos Engineering
• Embrace failure!
• An empirical approach to resilience testing of distributed software systems
• Chaos Experiment:
  - define a "normal/steady" state of the system (e.g. by monitoring a set of system and business metrics)
  - pseudo-randomly inject faults (e.g. by terminating VMs, killing containers or changing network behavior)
  - try to discover system weaknesses by looking for deviations from the expected, steady-state behavior
The harder it is to disrupt the steady state, the more confidence we have in the behavior of the system.
http://principlesofchaos.org/
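The three steps of a chaos experiment can be made concrete with a small bash sketch. It assumes the steady state is expressed as "at least THRESHOLD percent of N probe runs succeed"; the helper names `steady_state` and `chaos_experiment`, and the defaults of 10 runs and 95%, are hypothetical choices for illustration only.

```shell
#!/usr/bin/env bash
# steady_state ATTEMPTS THRESHOLD_PCT PROBE...
# Run the probe command ATTEMPTS times; succeed only if at least
# THRESHOLD_PCT percent of the runs exit with status 0.
steady_state() {
  local attempts=$1 threshold=$2 ok=0 i
  shift 2
  for ((i = 0; i < attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      ok=$((ok + 1))
    fi
  done
  ((ok * 100 >= threshold * attempts))
}

# chaos_experiment INJECT_CMD PROBE_CMD (both passed as strings)
# 1. check the steady state, 2. inject the fault, 3. check it again.
# Hypothetical defaults: 10 probe runs, 95% of them must succeed.
chaos_experiment() {
  local inject=$1 probe=$2
  if ! steady_state 10 95 bash -c "$probe"; then
    echo "no steady state to start from"
    return 2
  fi
  if ! bash -c "$inject"; then
    echo "fault injection failed"
    return 3
  fi
  if steady_state 10 95 bash -c "$probe"; then
    echo "steady state held"
  else
    echo "deviation detected"
    return 1
  fi
}
```

For example, `chaos_experiment "pumba kill --signal SIGKILL api" "curl -fsS http://localhost:8080/health"` would kill a hypothetical `api` container and then probe a hypothetical health endpoint. In a real system you would also give it time to react (or probe over a longer window) before the second check.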

https://github.com/Netflix/SimianArmy

Google: "Chaos Monkey for Docker"
Warthog

What is Pumba(a)?
1. Pumbaa is a well-known supporting character (a warthog) from Disney’s animated film The Lion King
2. In Swahili, pumbaa means “to be foolish, silly, weak-minded, careless, negligent”
3. It's also an open source Chaos Testing tool for Docker containers
   1. https://github.com/gaia-adm/pumba
   2. Linux, Windows, MacOS, Docker

What can Pumba do?
• Pumba disturbs the Docker runtime environment by injecting different failures
• The "victim" container can be specified by name(s) or by a regex
• Random selection is also supported (with the `--random` flag)
• A repeatable time interval and duration parameters give better control over the chaos
• Pumba can disturb a single Docker host, a Swarm cluster, or a Kubernetes cluster
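Victim selection can be pictured as a filter over container names. The bash sketch below reads names (for example from `docker ps --format '{{.Names}}'`) and keeps those that equal a given name or, for arguments with the `re2:` prefix used in the examples of this deck, match a regular expression. It is an illustration, not Pumba's code: bash `=~` uses POSIX extended regular expressions rather than Google RE2, and `select_victims` and `PICK_RANDOM` are hypothetical names.

```shell
#!/usr/bin/env bash
# select_victims FILTER...: read container names (one per line) on stdin
# and print the ones that match any FILTER.
#   - "re2:REGEX" matches by regular expression (bash ERE here, not RE2)
#   - any other FILTER must equal the container name exactly
# With PICK_RANDOM=1 only one randomly chosen match is printed.
# Returns 1 if nothing matches.
select_victims() {
  local name filter matches=()
  while IFS= read -r name; do
    for filter in "$@"; do
      if [[ $filter == re2:* ]]; then
        if [[ $name =~ ${filter#re2:} ]]; then
          matches+=("$name")
          break
        fi
      elif [[ $name == "$filter" ]]; then
        matches+=("$name")
        break
      fi
    done
  done
  ((${#matches[@]} > 0)) || return 1
  if [[ ${PICK_RANDOM:-0} == 1 ]]; then
    printf '%s\n' "${matches[RANDOM % ${#matches[@]}]}"
  else
    printf '%s\n' "${matches[@]}"
  fi
}
```

For instance, `docker ps --format '{{.Names}}' | PICK_RANDOM=1 select_victims worker1 worker2` picks one of the two workers, similar in spirit to `--random`.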

Pumba Docker Chaos Commands
1. stop a running Docker container
2. kill (send a termination or other signal to) the main process within a Docker container
3. remove "victim" containers, with their links and volumes
4. pause all processes within a "victim" Docker container for a specified time
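Each of these commands has a close counterpart in the plain Docker CLI, which helps when reasoning about what a victim container experiences. The sketch below approximates them with `docker stop`, `docker kill`, `docker rm` and `docker pause`/`docker unpause`; it is not how Pumba is implemented, it leaves out link removal, and `run`, `DRY_RUN` and the `chaos_*` names are hypothetical. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# run: print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. stop: Docker sends SIGTERM (or the image's STOPSIGNAL),
#    then SIGKILL after the grace period in seconds (default 10)
chaos_stop() { run docker stop --time "${2:-10}" "$1"; }

# 2. kill: send a signal (default SIGKILL) to the container's main process
chaos_kill() { run docker kill --signal "${2:-SIGKILL}" "$1"; }

# 3. rm: force-remove the container together with its anonymous volumes
chaos_rm() { run docker rm --force --volumes "$1"; }

# 4. pause: freeze all processes in the container for SECONDS, then resume
chaos_pause() {
  run docker pause "$1"
  run sleep "$2"
  run docker unpause "$1"
}
```

For example, `DRY_RUN=0 chaos_pause queue 15` would pause the "queue" container for 15 seconds and then resume it.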

demo time ...

Examples
# stop (SIGSTOP) a random container once every 10 minutes
$ pumba --random --interval 10m kill --signal SIGSTOP

# every 15 minutes kill the "mysql" container and
# every hour remove containers whose names start with "cf"
$ pumba --interval 15m kill --signal SIGTERM mysql &
$ pumba --interval 1h rm re2:^cf &

# every 5 minutes randomly kill the "worker1" or "worker2" container
# and every 3 minutes pause the "queue" container for 15s
$ pumba --random --interval 5m kill --signal SIGKILL worker1 worker2 &
$ pumba --interval 3m pause --duration 15s queue &
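The `--interval` flag repeats a chaos action on a schedule. A rough bash equivalent is sketched below, assuming single-unit durations like those in the examples (`15s`, `10m`, `1h`). `to_seconds` and `every` are hypothetical helpers; unlike the backgrounded `pumba --interval` commands above, which are meant to keep running, this sketch runs a fixed number of rounds and runs the first one immediately.

```shell
#!/usr/bin/env bash
# to_seconds DURATION: convert e.g. 15s, 10m or 1h to a number of seconds.
# Only a single integer followed by one unit (s, m or h) is supported.
to_seconds() {
  if [[ $1 =~ ^([0-9]+)([smh])$ ]]; then
    local n=$((10#${BASH_REMATCH[1]}))   # force base 10 (e.g. "08")
    case ${BASH_REMATCH[2]} in
      s) echo "$n" ;;
      m) echo $((n * 60)) ;;
      h) echo $((n * 3600)) ;;
    esac
  else
    echo "unsupported duration: $1" >&2
    return 1
  fi
}

# every INTERVAL ROUNDS COMMAND...: run COMMAND ROUNDS times,
# sleeping INTERVAL between consecutive rounds.
every() {
  local secs rounds=$2 i
  secs=$(to_seconds "$1") || return 1
  shift 2
  for ((i = 1; i <= rounds; i++)); do
    "$@"
    if ((i < rounds)); then
      sleep "$secs"
    fi
  done
}
```

For example, `every 15m 4 pumba kill --signal SIGTERM mysql` would mimic one hour of the `mysql` example.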

Pumba Network Chaos Commands
1. emulate network failures at the container level (traffic can also be filtered by IP)
2. delay egress traffic for the specified containers
3. add packet loss based on different probability loss models (2-, 3- and 4-state Markov, Gilbert, Simple Gilbert and Bernoulli)
4. rate-limit egress traffic for the specified containers
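The loss models differ in whether losses are independent. In the Bernoulli model each packet is dropped with the same probability, independently of the others; in the Simple Gilbert model a two-state Markov chain switches between a "good" state (no loss) and a "bad" state (every packet lost), so losses arrive in bursts. The bash sketch below only simulates these two models to show the effect; it is not netem's kernel implementation, probabilities are integer percentages, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Packet-loss patterns: one character per packet, "x" = lost, "." = delivered.
# RANDOM % 100 is only approximately uniform, which is fine for an illustration.

# bernoulli_loss COUNT LOSS_PCT: every packet is lost independently.
bernoulli_loss() {
  local count=$1 p=$2 i out=""
  for ((i = 0; i < count; i++)); do
    if ((RANDOM % 100 < p)); then out+="x"; else out+="."; fi
  done
  echo "$out"
}

# simple_gilbert_loss COUNT P_PCT R_PCT: two-state Markov chain.
# Good state: packet delivered; Bad state: packet lost.
# Before each packet the chain moves Good->Bad with probability P
# or Bad->Good with probability R.
simple_gilbert_loss() {
  local count=$1 p=$2 r=$3 i state=good out=""
  for ((i = 0; i < count; i++)); do
    if [[ $state == good ]]; then
      if ((RANDOM % 100 < p)); then state=bad; fi
    else
      if ((RANDOM % 100 < r)); then state=good; fi
    fi
    if [[ $state == bad ]]; then out+="x"; else out+="."; fi
  done
  echo "$out"
}
```

With P=5 and R=25 the chain spends on average P/(P+R) = 5/30 ≈ 17% of the time in the bad state, so `simple_gilbert_loss 60 5 25` and `bernoulli_loss 60 17` lose roughly the same share of packets; but in the Gilbert output a lost packet is followed by another loss with probability 1 - R = 75%, so the losses cluster in runs.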

# add a 3-second delay to all outgoing packets
# on the (default) network device of the "mydb" container, for 5 minutes
$ pumba netem --duration 5m delay --time 3000 mydb

# add a delay of 3000ms ± 30ms, with the next random element depending 20% on the last one,
# for all outgoing packets on device eth1 of all Docker containers
# whose names start with "hp", for 5 minutes
$ pumba netem --duration 5m --interface eth1 delay \
    --time 3000 --jitter 30 --correlation 20 re2:^hp

# add a delay of 3000ms ± 40ms, where the variation in delay
# follows a normal distribution, for all outgoing packets on the main network
# device of a randomly chosen Docker container from the specified list, for 5 minutes
$ pumba --random netem --duration 5m delay --time 3000 \
    --jitter 40 --distribution normal \
    container1 container2 container3
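The `--jitter` and `--correlation` parameters can be illustrated with a simplified model: each delay is drawn uniformly from TIME ± JITTER, but the underlying random number is blended with the previous one, weighted by CORRELATION percent ("the next random element depending 20% on the last one"). This is roughly the idea behind netem's correlated random generator, but the sketch below is not the kernel code, ignores `--distribution` tables, and `correlated_delays` is a hypothetical name.

```shell
#!/usr/bin/env bash
# correlated_delays TIME_MS JITTER_MS CORRELATION_PCT COUNT
# Print COUNT delays in ms. Each is uniform in [TIME-JITTER, TIME+JITTER)
# but uses a random value blended with the previous one:
#   r = ((100 - CORRELATION) * RANDOM + CORRELATION * last) / 100
correlated_delays() {
  local base=$1 jitter=$2 corr=$3 count=$4 last=$RANDOM r i
  for ((i = 0; i < count; i++)); do
    if ((jitter == 0)); then
      echo "$base"   # no jitter: constant delay
      continue
    fi
    r=$(( ((100 - corr) * RANDOM + corr * last) / 100 ))
    last=$r
    echo $((base - jitter + r % (2 * jitter)))
  done
}
```

`correlated_delays 3000 30 20 10` corresponds to the second example. With a correlation close to 100, consecutive values barely change; at exactly 100 this sketch repeats the same delay every time.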

Pumba Netem under the hood
• The Linux kernel offers a native framework for routing, bridging, firewalling, address translation and much else.
• Before a packet leaves the output interface, it passes through Linux Traffic Control (tc). This component is a powerful tool for scheduling, shaping, classifying and prioritizing traffic.
• The basic component of Linux Traffic Control is the queuing discipline (qdisc). The simplest qdisc implementation is first in, first out (FIFO); there are others too.
• The network emulation (netem) project adds queuing disciplines that emulate wide area network properties such as latency, jitter, loss, duplication, corruption and reordering.
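To see what this means for a single container, a netem qdisc can be added by hand with `tc`. The bash sketch below builds arguments following the tc-netem syntax `delay TIME [JITTER [CORRELATION]] [distribution DIST]` and applies them inside the container's network namespace via `nsenter`, deleting the qdisc when the duration is over. It is one possible manual approach, assuming root access and `nsenter`/`tc` on the host, not necessarily how Pumba applies netem; the helper names and the `DRY_RUN` switch are hypothetical, and by default it only prints the commands.

```shell
#!/usr/bin/env bash
# run: print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then echo "$*"; else "$@"; fi
}

# netem_delay_args TIME_MS [JITTER_MS [CORRELATION_PCT]] [DISTRIBUTION]
# Print netem arguments in tc-netem syntax; pass "" to skip a field.
netem_delay_args() {
  local args="delay ${1}ms"
  if [[ -n ${2:-} ]]; then args+=" ${2}ms"; fi
  if [[ -n ${2:-} && -n ${3:-} ]]; then args+=" ${3}%"; fi
  if [[ -n ${4:-} ]]; then args+=" distribution $4"; fi
  echo "$args"
}

# container_pid NAME: PID of the container's main process
# (the literal placeholder "PID" in dry-run mode).
container_pid() {
  if [[ ${DRY_RUN:-1} == 1 ]]; then
    echo "PID"
  else
    docker inspect --format '{{.State.Pid}}' "$1"
  fi
}

# netem_for CONTAINER IFACE SECONDS NETEM_ARGS...
# Add a root netem qdisc on IFACE inside the container's network
# namespace, keep it for SECONDS, then delete it again.
netem_for() {
  local container=$1 iface=$2 seconds=$3 pid
  shift 3
  pid=$(container_pid "$container") || return 1
  run nsenter --target "$pid" --net tc qdisc add dev "$iface" root netem "$@"
  run sleep "$seconds"
  run nsenter --target "$pid" --net tc qdisc del dev "$iface" root
}
```

`DRY_RUN=0 netem_for mydb eth0 300 $(netem_delay_args 3000)` roughly corresponds to the first netem example (a 3-second delay for 5 minutes), assuming the interface is `eth0`; the command substitution is left unquoted on purpose so it splits into separate arguments. If the script is interrupted, the qdisc stays in place, so a real tool should also clean up on exit (for example with `trap`).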

demo time ...
• pumba netem loss: https://asciinema.org/a/82430
• pumba netem delay: https://asciinema.org/a/82428


Alexei Ledenev
