Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
dojo.pdf
Search
Rich Burroughs
April 18, 2019
Technology
0
130
dojo.pdf
Rich Burroughs
April 18, 2019
Tweet
Share
More Decks by Rich Burroughs
See All by Rich Burroughs
Virtual_Kubernetes_Clusters__Tips_and_Tricks_-_Rejekts.pdf
richburroughs
0
1.2k
What On-Call Does to Us
richburroughs
1
120
Other Decks in Technology
See All in Technology
useEffectってなんで非推奨みたいなこと言われてるの?
maguroalternative
9
5.9k
Claude Code はじめてガイド -1時間で学べるAI駆動開発の基本と実践-
oikon48
37
18k
IaC を使いたくないけどポリシー管理をどうにかしたい
kazzpapa3
1
210
『星の世界の地図の話: Google Sky MapをAI Agentでよみがえらせる』 - Google Developers DevFest Tokyo 2025
taniiicom
0
450
タグ付きユニオン型を便利に使うテクニックとその注意点
uhyo
1
390
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
37k
ローカルVLM OCRモデル + Gemini 3.0 Proで日本語性能を試す
gotalab555
1
260
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
2.9k
オープンデータの内製化から分かったGISデータを巡る行政の課題
naokim84
2
1.3k
Data Hubグループ 紹介資料
sansan33
PRO
0
2.3k
生成AIシステムとAIエージェントに関する性能や安全性の評価
shibuiwilliam
2
300
セキュリティAIエージェントの現在と未来 / PSS #2 Takumi Session
flatt_security
2
410
Featured
See All Featured
Speed Design
sergeychernyshev
33
1.3k
The Illustrated Children's Guide to Kubernetes
chrisshort
51
51k
Rebuilding a faster, lazier Slack
samanthasiow
84
9.3k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.1k
Java REST API Framework Comparison - PWX 2021
mraible
34
9k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
35
2.3k
Build your cross-platform service in a week with App Engine
jlugia
234
18k
Being A Developer After 40
akosma
91
590k
Bash Introduction
62gerente
615
210k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
37
2.6k
The Cost Of JavaScript in 2023
addyosmani
55
9.3k
Testing 201, or: Great Expectations
jmmastey
46
7.8k
Transcript
Learning Through Failure Rich Burroughs Community Manager Gremlin, Inc. @richburroughs
None
None
Complexity is constantly increasing
None
None
None
What's changed?
None
None
"Catastrophe is always just around the corner"
"Change introduces new forms of failure"
"All practitioner actions are gambles"
None
None
What are some ways we can learn more about systems?
None
None
None
Chaos Engineering
"The science of performing intentional experimentation on a system by
injecting precise and measured amounts of harm to observe how the system responds for the purpose of improving the system’s resilience."
None
Prerequisites —Observability —Blameless Culture
Scientific Method —Ask a question —Research —Form a hypothesis —Experiment
to test the hypothesis —Analyze data and draw a conclusion —Share the results
Types of attacks —Shutdown —CPU —Memory —I/O —Network Latency —Packet
Loss —DNS —Blackhole
None
The goal is to experiment in Production
None
Example experiment —Application: Front End —Attack: CPU —Hypothesis: Adding CPU
load will cause additional hosts to spin up in our Autoscaling Group —Abort condition: Latency increases by 20%
Example experiment #2 —Application: Front End —Attack: Blackhole —Hypothesis: Blackholing
the hostname for the Twilio API will cause the SMS transmissions to time out —Abort condition: Error rate increases by 20%
Don't experiment on things you know are broken
None
Questions —Were we able to measure the results? —Did the
system respond the way we expected? —Are there things we need to fix?
Run experiments to simulate an incident you've had
What comes after Game Days?
Continuous Chaos
Maturity model —Running manual experiments —Running experiments using Chaos Engineering
tools —Regularly scheduled Game Days —Experimenting in Production —Continuous Chaos
Next steps: —Join our Chaos Engineering Slack: gremlin.com/ slack —Read
tutorials: gremlin.com/community —Chaos Conf: chaosconf.io —Gremlin Free: go.gremlin.com/richchaos
Thank you! Twitter: @richburroughs Email:
[email protected]
Slides: https://github.com/richburroughs/ dojo201904