dojo.pdf
Rich Burroughs
April 18, 2019
Transcript
Learning Through Failure. Rich Burroughs, Community Manager, Gremlin, Inc. (@richburroughs)
Complexity is constantly increasing
What's changed?
"Catastrophe is always just around the corner"
"Change introduces new forms of failure"
"All practitioner actions are gambles"
What are some ways we can learn more about systems?
Chaos Engineering
"The science of performing intentional experimentation on a system by injecting precise and measured amounts of harm to observe how the system responds for the purpose of improving the system’s resilience."
Prerequisites
- Observability
- Blameless Culture
Scientific Method
- Ask a question
- Research
- Form a hypothesis
- Experiment to test the hypothesis
- Analyze data and draw a conclusion
- Share the results
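As a rough illustration of the steps above, an experiment can be written down as a small record with a question, a hypothesis, a way to run it, and a way to check the result. This is only a sketch: the `Experiment` class, the `conduct` function, and the host counts are all hypothetical and are not taken from the talk or from any tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Metrics = Dict[str, float]


@dataclass
class Experiment:
    question: str
    hypothesis: str
    # Hypothetical hooks: `run` performs the experiment and returns
    # measurements; `holds` decides whether they support the hypothesis.
    run: Callable[[], Metrics]
    holds: Callable[[Metrics], bool]


def conduct(experiment: Experiment) -> str:
    """Run the experiment, analyze the data, and return a shareable summary."""
    metrics = experiment.run()
    verdict = "supported" if experiment.holds(metrics) else "not supported"
    return (
        f"Question: {experiment.question}\n"
        f"Hypothesis: {experiment.hypothesis}\n"
        f"Measurements: {metrics}\n"
        f"Conclusion: hypothesis {verdict}"
    )


# Made-up numbers standing in for real measurements.
demo = Experiment(
    question="Does the front end scale out under CPU load?",
    hypothesis="Adding CPU load causes additional hosts to spin up",
    run=lambda: {"hosts_before": 3, "hosts_after": 5},
    holds=lambda m: m["hosts_after"] > m["hosts_before"],
)
report = conduct(demo)
print(report)
```

Keeping the hypothesis as an explicit check makes it harder to reinterpret the results after the fact, and the returned summary is the piece that gets shared.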
Types of attacks
- Shutdown
- CPU
- Memory
- I/O
- Network Latency
- Packet Loss
- DNS
- Blackhole
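To make two of these attack types concrete, the sketch below imitates network latency and packet loss inside a single Python process by wrapping a function call. It is an assumption-laden toy, not how Gremlin or any other tool injects faults at the host or network level; `inject_faults` and `fetch_profile` are hypothetical names.

```python
import random
import time
from functools import wraps


def inject_faults(latency_s=0.0, loss_rate=0.0, rng=None, sleep=time.sleep):
    """Wrap a callable so each call is delayed and may be dropped."""
    rng = rng or random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            sleep(latency_s)  # latency attack: delay every call
            if rng.random() < loss_rate:  # packet-loss attack: drop some calls
                raise ConnectionError("simulated packet loss")
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Record the requested delays instead of really sleeping, so the demo is fast.
delays = []


@inject_faults(
    latency_s=0.2,
    loss_rate=0.5,
    rng=random.Random(42),
    sleep=delays.append,
)
def fetch_profile(user_id):
    return {"id": user_id}


outcomes = []
for i in range(10):
    try:
        fetch_profile(i)
        outcomes.append("ok")
    except ConnectionError:
        outcomes.append("dropped")

print(outcomes)
```

Passing in the random generator and the sleep function keeps the injected harm precise and repeatable, which is the property the definition of Chaos Engineering asks for.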
The goal is to experiment in Production
Example experiment
- Application: Front End
- Attack: CPU
- Hypothesis: Adding CPU load will cause additional hosts to spin up in our Autoscaling Group
- Abort condition: Latency increases by 20%
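The abort condition in this experiment can be expressed as a simple comparison against a baseline. The following sketch assumes latency is sampled in milliseconds during the attack and that reaching exactly 20% counts as a breach; both are assumptions, as are the function names and the sample values.

```python
def exceeds_abort_threshold(baseline, current, max_increase=0.20):
    """Return True when `current` is at least `max_increase` above `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline >= max_increase


def run_with_abort(baseline_latency_ms, latency_samples_ms):
    """Walk through readings taken during the attack; stop at the first breach."""
    for step, latency in enumerate(latency_samples_ms):
        if exceeds_abort_threshold(baseline_latency_ms, latency):
            return f"aborted at sample {step}: {latency} ms"
    return "completed"


# Made-up readings: the third sample is 21% above the 100 ms baseline.
print(run_with_abort(100, [105, 118, 121, 110]))
print(run_with_abort(100, [101, 104, 99]))
```

Deciding the abort condition before the attack starts means nobody has to judge, in the middle of an experiment, whether the impact has gone too far.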
Example experiment #2
- Application: Front End
- Attack: Blackhole
- Hypothesis: Blackholing the hostname for the Twilio API will cause the SMS transmissions to time out
- Abort condition: Error rate increases by 20%
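A blackhole experiment like this one can be rehearsed against an in-memory stand-in before touching a real network. In the sketch below, `FakeNetwork`, `send_sms`, and the hostname `sms-api.example.com` are hypothetical placeholders; nothing here calls Twilio or reflects its API.

```python
class FakeNetwork:
    """In-memory stand-in for a network; blackholed hosts never answer."""

    def __init__(self):
        self.blackholed = set()

    def request(self, host, payload):
        if host in self.blackholed:
            # A real blackhole drops traffic silently, so the caller
            # eventually gives up with a timeout.
            raise TimeoutError(f"no response from {host}")
        return "accepted"


SMS_HOST = "sms-api.example.com"  # placeholder, not a real endpoint


def send_sms(network, message):
    """Send one message and report how the attempt ended."""
    try:
        return network.request(SMS_HOST, message)
    except TimeoutError:
        return "timed out"


def error_rate(results):
    """Fraction of attempts that did not succeed."""
    return sum(r != "accepted" for r in results) / len(results)


network = FakeNetwork()
before = [send_sms(network, f"msg {i}") for i in range(5)]
network.blackholed.add(SMS_HOST)  # start the attack
during = [send_sms(network, f"msg {i}") for i in range(5)]

print(error_rate(before), error_rate(during))
```

Comparing the error rate before and during the attack gives both the evidence for the hypothesis (sends time out) and the number the abort condition is written against.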
Don't experiment on things you know are broken
Questions
- Were we able to measure the results?
- Did the system respond the way we expected?
- Are there things we need to fix?
Run experiments to simulate an incident you've had
What comes after Game Days?
Continuous Chaos
Maturity model
- Running manual experiments
- Running experiments using Chaos Engineering tools
- Regularly scheduled Game Days
- Experimenting in Production
- Continuous Chaos
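The last level, Continuous Chaos, implies experiments that run without a person starting each one. One possible shape for that is sketched below, assuming a fixed list of targets and attacks and hypothetical `is_healthy` and `launch` hooks supplied by the caller. It also skips targets that are already unhealthy, in line with the earlier advice not to experiment on things you know are broken.

```python
import random


def plan_runs(targets, attacks, rounds, rng):
    """Pick a random (target, attack) pair for each scheduled round."""
    return [(rng.choice(targets), rng.choice(attacks)) for _ in range(rounds)]


def run_continuously(plan, is_healthy, launch):
    """Launch each planned attack, skipping targets that are already unhealthy."""
    log = []
    for target, attack in plan:
        if not is_healthy(target):
            log.append((target, attack, "skipped"))
            continue
        launch(target, attack)
        log.append((target, attack, "launched"))
    return log


# Hypothetical services; the attack names come from the list in the talk.
targets = ["front-end", "checkout"]
attacks = ["CPU", "Memory", "Network Latency"]
launched = []

plan = plan_runs(targets, attacks, rounds=6, rng=random.Random(7))
log = run_continuously(
    plan,
    is_healthy=lambda target: target != "checkout",  # pretend checkout is degraded
    launch=lambda target, attack: launched.append((target, attack)),
)
for entry in log:
    print(entry)
```

In practice the loop would be driven by a scheduler and the hooks by real monitoring and a real Chaos Engineering tool; the sketch only shows the control flow.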
Next steps:
- Join our Chaos Engineering Slack: gremlin.com/slack
- Read tutorials: gremlin.com/community
- Chaos Conf: chaosconf.io
- Gremlin Free: go.gremlin.com/richchaos
Thank you!
- Twitter: @richburroughs
- Email: [email protected]
- Slides: https://github.com/richburroughs/dojo201904