dojo.pdf
Rich Burroughs
April 18, 2019
Technology
Transcript
Learning Through Failure
Rich Burroughs, Community Manager, Gremlin, Inc.
@richburroughs
Complexity is constantly increasing
What's changed?
"Catastrophe is always just around the corner"
"Change introduces new forms of failure"
"All practitioner actions are gambles"
What are some ways we can learn more about systems?
Chaos Engineering
"The science of performing intentional experimentation on a system by injecting precise and measured amounts of harm to observe how the system responds for the purpose of improving the system’s resilience."
Prerequisites
- Observability
- Blameless Culture
Scientific Method
- Ask a question
- Research
- Form a hypothesis
- Experiment to test the hypothesis
- Analyze data and draw a conclusion
- Share the results
Types of attacks
- Shutdown
- CPU
- Memory
- I/O
- Network Latency
- Packet Loss
- DNS
- Blackhole
The goal is to experiment in Production
Example experiment
- Application: Front End
- Attack: CPU
- Hypothesis: Adding CPU load will cause additional hosts to spin up in our Autoscaling Group
- Abort condition: Latency increases by 20%
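The experiment above pairs a hypothesis with an abort condition. As a rough sketch of how that might be written down and checked, the Python below records the experiment and compares a current latency against a baseline. The `Experiment` class, the `should_abort` helper, and the latency samples are all hypothetical names and values made up for illustration; they are not part of Gremlin or any other tool.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """Hypothetical record of one chaos experiment (illustrative only)."""
    application: str
    attack: str
    hypothesis: str
    abort_increase: float  # 0.20 means "abort if the metric rises by 20%"


def should_abort(baseline: float, current: float, max_increase: float) -> bool:
    """Return True when `current` exceeds `baseline` by at least `max_increase`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline >= max_increase


experiment = Experiment(
    application="Front End",
    attack="CPU",
    hypothesis=("Adding CPU load will cause additional hosts "
                "to spin up in our Autoscaling Group"),
    abort_increase=0.20,
)

# Made-up latency samples in milliseconds, not measurements from a real system.
baseline_ms = 100.0
for current_ms in (105.0, 112.0, 125.0):
    if should_abort(baseline_ms, current_ms, experiment.abort_increase):
        print(f"abort: latency {current_ms} ms")
        break
    print(f"continue: latency {current_ms} ms")
```

In practice the baseline and current values would come from whatever monitoring is already in place, which is why observability is listed as a prerequisite.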
Example experiment #2
- Application: Front End
- Attack: Blackhole
- Hypothesis: Blackholing the hostname for the Twilio API will cause the SMS transmissions to time out
- Abort condition: Error rate increases by 20%
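The hypothesis in this second experiment depends on the client actually having a timeout. The sketch below is one way to see that behavior locally, under loud assumptions: it does not perform a real blackhole attack (which drops traffic at the network level) and does not talk to Twilio. It stands in with a local server that accepts connections but never replies, and the helper names `start_silent_server` and `request_times_out` are invented for this example.

```python
import socket
from typing import Tuple


def start_silent_server() -> Tuple[socket.socket, int]:
    """Listen on a free local port; connections are queued but never answered."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    return server, server.getsockname()[1]


def request_times_out(port: int, timeout_s: float) -> bool:
    """Send a request and report whether waiting for the reply timed out."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout_s) as client:
        client.sendall(b"ping")
        try:
            client.recv(1024)  # blocks until data arrives or the timeout fires
        except socket.timeout:
            return True
    return False


server, port = start_silent_server()
try:
    print("timed out:", request_times_out(port, timeout_s=0.2))
finally:
    server.close()
```

A client without a timeout would hang here instead of failing, which is the kind of surprise an experiment like this is meant to surface.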
Don't experiment on things you know are broken
Questions
- Were we able to measure the results?
- Did the system respond the way we expected?
- Are there things we need to fix?
Run experiments to simulate an incident you've had
What comes after Game Days?
Continuous Chaos
Maturity model
- Running manual experiments
- Running experiments using Chaos Engineering tools
- Regularly scheduled Game Days
- Experimenting in Production
- Continuous Chaos
Next steps:
- Join our Chaos Engineering Slack: gremlin.com/slack
- Read tutorials: gremlin.com/community
- Chaos Conf: chaosconf.io
- Gremlin Free: go.gremlin.com/richchaos
Thank you!
Twitter: @richburroughs
Email: [email protected]
Slides: https://github.com/richburroughs/dojo201904