Slide 1

Slide 1 text

The Future of Chaos Engineering: In Pursuit of the Unknown Unknowns
Crystal Hirschorn
VP Engineering, Global Strategy & Operations, Condé Nast
@cfhirschorn

Slide 2

Slide 2 text

"Complexity doesn't allow us to think in linear, unidirectional terms along which progress or regress could be plotted."

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

BRAZIL, AUSTRALIA, ARABIA, CHINA, FRANCE, GERMANY, INDIA, ITALY, JAPAN, KOREA, LATIN AMERICA, NETHERLANDS, POLAND, PORTUGAL, SOUTH AFRICA, SPAIN, TAIWAN, THAILAND, TURKEY, UK, UKRAINE, HUNGARY, BULGARIA, ICELAND, ROMANIA, CZECH REP, SLOVAKIA, MEXICO, RUSSIA

Slide 5

Slide 5 text

SIMPLE: Known Knowns (Best Practice)
COMPLICATED: Known Unknowns (Good Practice)
COMPLEX: Unknown Unknowns (Emergent Practice)
CHAOTIC: Unknowables (Novel Practice)
Disorder

Slide 6

Slide 6 text

Experimenting effectively
https://www.youtube.com/watch?v=cefJd2v037U

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Modern architecture evolution

Slide 9

Slide 9 text

Modern architecture evolution

Slide 10

Slide 10 text

Modern architectures: Microservices

Slide 11

Slide 11 text

Modern architectures: Service Mesh

Slide 12

Slide 12 text

Modern architectures: Serverless
Synchronous (push)
Asynchronous (event)
Streaming

Slide 13

Slide 13 text

Modern architectures: Applications

Slide 14

Slide 14 text

Modern architectures: Front-end

Slide 15

Slide 15 text

The Root Cause Fallacy: A Brief Story of a Web Platform Outage

Slide 16

Slide 16 text

The Root Cause Fallacy: A Brief Story of a Web Platform Outage

Slide 17

Slide 17 text

The Root Cause Fallacy: A Brief Story of a Web Platform Outage

Slide 18

Slide 18 text

The Root Cause Fallacy: A Brief Story of a Web Platform Outage

Slide 19

Slide 19 text

https://www.oreilly.com/library/view/distributed-systems-observability/9781492033431/ch04.html

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

“Progress depends on our changing the world to fit us. Not the other way around.”

Slide 26

Slide 26 text

Organisational Pressures and Constraints
Sources of influence: Outside influences, Internal (org) influences, Operator influences
Pressures and constraints: Regulators, Policies, Economics, Competition, Governance, Logistics, Management, Efficiency trade-offs, Automation, Time criticality, Esoteric knowledge, Mental models, Ergonomics, OpEx vs CapEx pressures, Lacking details, Culture norms, Geopolitical, Vendors, Societal culture, Workload, Cognitive switching

Slide 27

Slide 27 text

At what cost?
https://www.gremlin.com/ecommerce-cost-of-downtime/

Slide 28

Slide 28 text

An alternative approach to post mortems
http://www.safetydifferently.com/the-varieties-of-human-work/

Slide 29

Slide 29 text

Invite a diverse audience to your post-incident learning reviews

Slide 30

Slide 30 text

Actions. What can we turn into hypotheses / experiments?
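As one illustration of turning a review action into an experiment, the sketch below frames an experiment as a falsifiable hypothesis: check the steady state, inject a fault, check again, and always roll back. The `Experiment` class, the `run` function, and the toy `replicas` set are all hypothetical names invented for this sketch; they are not taken from the talk or from any chaos engineering tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """A chaos experiment framed as a falsifiable hypothesis (hypothetical structure)."""
    hypothesis: str
    steady_state: Callable[[], bool]  # returns True while the system looks healthy
    inject: Callable[[], None]        # introduces the fault
    rollback: Callable[[], None]      # undoes the fault


def run(exp: Experiment) -> str:
    # Do not experiment on a system that is already unhealthy.
    if not exp.steady_state():
        return "aborted"
    exp.inject()
    try:
        # The hypothesis holds only if the steady state survives the fault.
        return "held" if exp.steady_state() else "refuted"
    finally:
        # Roll back whether the hypothesis held or not.
        exp.rollback()


# Toy stand-in for a real system: the set of healthy replica names.
replicas = {"a", "b"}

exp = Experiment(
    hypothesis="Losing one replica does not take the service down",
    steady_state=lambda: len(replicas) >= 1,
    inject=lambda: replicas.discard("a"),
    rollback=lambda: replicas.add("a"),
)
result = run(exp)
print(exp.hypothesis, "->", result)
```

In a real setting the steady-state check would presumably read from monitoring and the injection would act on actual infrastructure; the in-memory set only keeps the sketch self-contained.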

Slide 31

Slide 31 text

Actions. Other sources for learning opportunities.

Action 1
Description: Gaps identified in architectural knowledge. Mary will do a two-week rotation to shadow and pair on team Orion.
Artefacts: Whiteboard diagrams from post-incident review
Owner: Orion

Action 2
Description: Incident Management process did not flow in the expected order. Escalations were delayed. Schedule more role playing and game days.
Artefacts: Game Day template; Incident Management Process
Owner: SRE

Action 3
Description: Too many graphs are displayed in a single dashboard. Many are not easily discernible by product engineering. Zenith to work with Orion and Hydra teams on a system metrics visualisation strategy.
Artefacts: DataDog dashboard (timestamped to match incident timings)
Owner: Zenith

Slide 32

Slide 32 text

CI/CD/CV (continuous integration, delivery, and verification) pipelines

Slide 33

Slide 33 text

Tooling and Toolchains

Slide 34

Slide 34 text

Tooling and Toolchains
https://medium.com/@adhorn/injecting-chaos-to-amazon-ec2-using-amazon-system-manager-ca95ee7878f5

Slide 35

Slide 35 text

Multi-vector attacks

Slide 36

Slide 36 text

It’s Stochastic, It’s Fantastic.

Slide 37

Slide 37 text

THANK YOU
Crystal Hirschorn
VP Engineering, Global Strategy & Operations, Condé Nast
@cfhirschorn