FRANKFURT A.M. | FREIBURG I.BR. | GENF | HAMBURG | KOPENHAGEN | LAUSANNE | MANNHEIM | MÜNCHEN | STUTTGART | WIEN | ZÜRICH
Lothar
I am a solutions architect and digital disruptor. Since 2009, I have worked at the intersection of cloud and analytics. Digital disruption is reaching ever more sectors, and I want to understand its technological, societal and economic impacts. Before 2009, I managed large project budgets, later became an architect, built a digital radiology and migrated the Miles & More.
@lwieske news.trivadis.com/blog
applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.
maintenance update at the end of September 2014 in order to patch a security vulnerability in the Xen hypervisor that affected about 10% of their global fleet of cloud servers.
• Netflix has a long history of using its Simian Army (Chaos Monkey, Chaos Gorilla and Chaos Kong) to force reboots of its servers in order to see how the overall system reacts and what can be done to improve resilience. The problem this time was that the operation would affect some of its database servers, more precisely 218 Cassandra nodes. It is one thing to perform a live restart of a server that streams video, and quite another to do the same to a stateful database.
• Out of our 2700+ production Cassandra nodes, 218 were rebooted.
• 22 Cassandra nodes were on hardware that did not reboot successfully.
• They were detected and replaced with minimal human intervention.
• Netflix experienced zero downtime that weekend.
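The reboot drill described above can be illustrated with a small in-memory simulation. This is a hypothetical sketch, not Netflix's tooling: the `Node` class, the `rolling_reboot` function and the failure probability are assumptions made for illustration. It only shows the idea of rebooting selected nodes one at a time, detecting nodes whose hardware does not come back, replacing them automatically, and tracking how many nodes stay healthy throughout.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    healthy: bool = True


def rolling_reboot(nodes, to_reboot, fail_prob, rng):
    """Reboot the selected nodes one at a time (hypothetical in-memory simulation).

    Returns (replaced_ids, min_healthy): the IDs of nodes whose hardware did
    not come back and were swapped for a replacement, and the lowest number
    of healthy nodes observed during the drill.
    """
    replaced = []
    min_healthy = sum(n.healthy for n in nodes)
    for node in nodes:
        if node.node_id not in to_reboot:
            continue
        node.healthy = False  # the node is down while it reboots
        min_healthy = min(min_healthy, sum(n.healthy for n in nodes))
        if rng.random() < fail_prob:
            # Hardware did not come back: detect it and swap in a replacement.
            replaced.append(node.node_id)
        node.healthy = True  # the rebooted (or replacement) node rejoins
    return replaced, min_healthy


if __name__ == "__main__":
    rng = random.Random(2014)
    cluster = [Node(i) for i in range(2700)]
    targets = set(rng.sample(range(2700), 218))
    # Hypothetical failure probability, roughly the 22/218 ratio reported above.
    replaced, min_healthy = rolling_reboot(cluster, targets, 22 / 218, rng)
    print(f"rebooted={len(targets)} replaced={len(replaced)} min_healthy={min_healthy}")
```

Rebooting one node at a time limits the capacity loss to a single node at any moment. A real Cassandra cluster would additionally have to respect its replication factor and consistency level, which this sketch ignores.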
ideal application of Chaos Engineering, applied to the processes of experimentation described above. The degree to which these principles are pursued correlates strongly with the confidence we can have in a distributed system at scale.
• Build a Hypothesis around Steady State Behavior
• Vary Real-world Events
• Run Experiments in Production
• Automate Experiments to Run Continuously
• Minimize Blast Radius: experimenting in production has the potential to cause unnecessary customer pain. While there must be an allowance for some short-term negative impact, it is the responsibility and obligation of the Chaos Engineer to ensure that the fallout from experiments is minimized and contained.
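The principles above can be combined into a single experiment loop. The following is a minimal sketch under stated assumptions, not the API of any chaos tool: the steady-state metric is assumed to be the request success rate, `handle` is a hypothetical callback that serves a request with or without the injected fault, and the thresholds are made-up defaults. A small random share of traffic (the blast radius) forms the experiment group, the rest is the control group, and fault injection stops automatically if the experiment group's error rate gets too high.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Result:
    control_success: float
    experiment_success: float
    hypothesis_holds: bool
    aborted: bool


def run_chaos_experiment(
    requests: Sequence[int],
    handle: Callable[[int, bool], bool],  # hypothetical: (request, inject_fault) -> success
    blast_radius: float = 0.01,       # share of traffic exposed to the fault
    tolerance: float = 0.01,          # allowed drop in success rate vs. control
    abort_error_rate: float = 0.2,    # stop injecting above this error rate
    min_samples: int = 20,            # experiment samples before the abort check applies
    seed: int = 0,
) -> Result:
    """Compare a small fault-injected experiment group against a control group.

    Hypothesis: the experiment group's success rate stays within `tolerance`
    of the control group's steady-state success rate.
    """
    rng = random.Random(seed)
    ctrl_ok = ctrl_n = exp_ok = exp_n = 0
    aborted = False
    for req in requests:
        in_experiment = not aborted and rng.random() < blast_radius
        ok = handle(req, in_experiment)
        if in_experiment:
            exp_n += 1
            exp_ok += ok
            if exp_n >= min_samples and (exp_n - exp_ok) / exp_n > abort_error_rate:
                aborted = True  # safeguard: remaining traffic goes to control
        else:
            ctrl_n += 1
            ctrl_ok += ok
    ctrl_rate = ctrl_ok / ctrl_n if ctrl_n else 1.0
    exp_rate = exp_ok / exp_n if exp_n else 1.0
    holds = not aborted and exp_n > 0 and exp_rate >= ctrl_rate - tolerance
    return Result(ctrl_rate, exp_rate, holds, aborted)


if __name__ == "__main__":
    flaky = random.Random(1)

    # Hypothetical service: with the fault injected, 1% of calls fail.
    def handle(req: int, faulty: bool) -> bool:
        return not (faulty and flaky.random() < 0.01)

    print(run_chaos_experiment(range(100_000), handle))
```

Running the loop on a schedule (for example from a CI job) would cover "Automate Experiments to Run Continuously"; the random assignment keeps the experiment group small, and the abort check is what contains the fallout if the hypothesis turns out to be wrong.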
Your System.
Complexity Is Part Of Your System.
Testing In Production? Yes You Can!
You Should Chaos Engineer Everything
Cloud and Microservices – Among Others