Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
dojo.pdf
Search
Rich Burroughs
April 18, 2019
Technology
0
130
dojo.pdf
Rich Burroughs
April 18, 2019
Tweet
Share
More Decks by Rich Burroughs
See All by Rich Burroughs
Virtual_Kubernetes_Clusters__Tips_and_Tricks_-_Rejekts.pdf
richburroughs
0
1.1k
What On-Call Does to Us
richburroughs
1
110
Other Decks in Technology
See All in Technology
Cloudflareで実現する AIエージェント ワークフロー基盤
kmd09
0
280
Accessibility Inspectorを活用した アプリのアクセシビリティ向上方法
hinakko
0
170
データ基盤におけるIaCの重要性とその運用
mtpooh
2
250
JAWS-UG20250116_iOSアプリエンジニアがAWSreInventに行ってきた(真面目編)
totokit4
0
140
2025年の挑戦 コーポレートエンジニアの技術広報/techpr5
nishiuma
0
140
CDKのコードレビューを楽にするパッケージcdk-mentorを作ってみた/cdk-mentor
tomoki10
0
200
チームが毎日小さな変化と適応を続けたら1年間でスケール可能なアジャイルチームができた話 / Building a Scalable Agile Team
kakehashi
2
220
今年一年で頑張ること / What I will do my best this year
pauli
1
220
2025年に挑戦したいこと
molmolken
0
150
Git scrapingで始める継続的なデータ追跡 / Git Scraping
ohbarye
5
470
re:Invent 2024のふりかえり
beli68
0
110
.NET AspireでAzure Functionsやクラウドリソースを統合する
tsubakimoto_s
0
180
Featured
See All Featured
Art, The Web, and Tiny UX
lynnandtonic
298
20k
Unsuck your backbone
ammeep
669
57k
Code Reviewing Like a Champion
maltzj
521
39k
Producing Creativity
orderedlist
PRO
343
39k
Product Roadmaps are Hard
iamctodd
PRO
50
11k
It's Worth the Effort
3n
183
28k
How GitHub (no longer) Works
holman
312
140k
Why Our Code Smells
bkeepers
PRO
335
57k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
127
18k
Adopting Sorbet at Scale
ufuk
74
9.2k
Designing for Performance
lara
604
68k
Raft: Consensus for Rubyists
vanstee
137
6.7k
Transcript
Learning Through Failure Rich Burroughs Community Manager Gremlin, Inc. @richburroughs
None
None
Complexity is constantly increasing
None
None
None
What's changed?
None
None
"Catastrophe is always just around the corner"
"Change introduces new forms of failure"
"All practitioner actions are gambles"
None
None
What are some ways we can learn more about systems?
None
None
None
Chaos Engineering
"The science of performing intentional experimentation on a system by
injecting precise and measured amounts of harm to observe how the system responds for the purpose of improving the system’s resilience."
None
Prerequisites —Observability —Blameless Culture
Scientific Method —Ask a question —Research —Form a hypothesis —Experiment
to test the hypothesis —Analyze data and draw a conclusion —Share the results
Types of attacks —Shutdown —CPU —Memory —I/O —Network Latency —Packet
Loss —DNS —Blackhole
None
The goal is to experiment in Production
None
Example experiment —Application: Front End —Attack: CPU —Hypothesis: Adding CPU
load will cause additional hosts to spin up in our Autoscaling Group —Abort condition: Latency increases by 20%
Example experiment #2 —Application: Front End —Attack: Blackhole —Hypothesis: Blackholing
the hostname for the Twilio API will cause the SMS transmissions to time out —Abort condition: Error rate increases by 20%
Don't experiment on things you know are broken
None
Questions —Were we able to measure the results? —Did the
system respond the way we expected? —Are there things we need to fix?
Run experiments to simulate an incident you've had
What comes after Game Days?
Continuous Chaos
Maturity model —Running manual experiments —Running experiments using Chaos Engineering
tools —Regularly scheduled Game Days —Experimenting in Production —Continuous Chaos
Next steps: —Join our Chaos Engineering Slack: gremlin.com/ slack —Read
tutorials: gremlin.com/community —Chaos Conf: chaosconf.io —Gremlin Free: go.gremlin.com/richchaos
Thank you! Twitter: @richburroughs Email:
[email protected]
Slides: https://github.com/richburroughs/ dojo201904