Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Four years of breaking things in production, on...
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Eric Sigler
November 09, 2017
Technology
0
62
Four years of breaking things in production, on purpose.
Presented at Chaos Day Twin Cities, November 2017.
Eric Sigler
November 09, 2017
Tweet
Share
More Decks by Eric Sigler
See All by Eric Sigler
Instrumenting The Rest Of The Company: Hunting For Metrics
esigler
0
400
A Brief Introduction To DevOps
esigler
0
120
Humans are terrible compilers: A User's Guide
esigler
0
130
Do You Know If Your Service Is Working Properly? A Guide To Being Paranoid.
esigler
0
190
"Is there any strong objection?"
esigler
0
240
Fear, Uncertainty, and Continuous Deployment
esigler
1
130
3AM, a survey.
esigler
0
250
Strategies For Being On Call & Keeping Your Sanity At The Same Time
esigler
0
180
Engineering for Engineers
esigler
0
110
Other Decks in Technology
See All in Technology
日本の85%が使う公共SaaSは、どう育ったのか
taketakekaho
1
240
会社紹介資料 / Sansan Company Profile
sansan33
PRO
15
400k
生成AIと余白 〜開発スピードが向上した今、何に向き合う?〜
kakehashi
PRO
0
160
ECS障害を例に学ぶ、インシデント対応に備えたAIエージェントの育て方 / How to develop AI agents for incident response with ECS outage
iselegant
4
400
ClickHouseはどのように大規模データを活用したAIエージェントを全社展開しているのか
mikimatsumoto
0
270
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
10k
外部キー制約の知っておいて欲しいこと - RDBMSを正しく使うために必要なこと / FOREIGN KEY Night
soudai
PRO
12
5.6k
予期せぬコストの急増を障害のように扱う――「コスト版ポストモーテム」の導入とその後の改善
muziyoshiz
1
2.1k
SRE Enabling戦記 - 急成長する組織にSREを浸透させる戦いの歴史
markie1009
0
170
Embedded SREの終わりを設計する 「なんとなく」から計画的な自立支援へ
sansantech
PRO
3
2.6k
pool.ntp.orgに ⾃宅サーバーで 参加してみたら...
tanyorg
0
1.1k
30万人の同時アクセスに耐えたい!新サービスの盤石なリリースを支える負荷試験 / SRE Kaigi 2026
genda
4
1.4k
Featured
See All Featured
Why Our Code Smells
bkeepers
PRO
340
58k
Leveraging LLMs for student feedback in introductory data science courses - posit::conf(2025)
minecr
0
160
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
350
Ten Tips & Tricks for a 🌱 transition
stuffmc
0
71
A Modern Web Designer's Workflow
chriscoyier
698
190k
A brief & incomplete history of UX Design for the World Wide Web: 1989–2019
jct
1
300
Max Prin - Stacking Signals: How International SEO Comes Together (And Falls Apart)
techseoconnect
PRO
0
89
The SEO Collaboration Effect
kristinabergwall1
0
350
How to make the Groovebox
asonas
2
1.9k
Design in an AI World
tapps
0
150
Leveraging Curiosity to Care for An Aging Population
cassininazir
1
170
The Curse of the Amulet
leimatthew05
1
8.7k
Transcript
Eric Sigler, Head of DevOps, PagerDuty @esigler Four years of
breaking things in production, on purpose.
@esigler Obligatory disclaimer: This is what works for us. Take
away ideas, not dogmas.
@esigler
@esigler 2013: Every Friday, 1 hour. 2013 2014 2015 2016
2017
@esigler 2013 2014 2015 2016 2017
None
@esigler 2014: Expanding Scope 2013 2014 2015 2016 2017
@esigler 2013 2014 2015 2016 2017
@esigler 2015: Automation 2013 2014 2015 2016 2017
@esigler 2013 2014 2015 2016 2017
@esigler 2013 2014 2015 2016 2017
@esigler 2016: Adding In Randomness 2013 2014 2015 2016 2017
@esigler 2013 2014 2015 2016 2017
@esigler Also 2016: Putting It All Together 2013 2014 2015
2016 2017
@esigler 2013 2014 2015 2016 2017
@esigler 2017: Distributing Knowledge 2013 2014 2015 2016 2017
@esigler 2013 2014 2015 2016 2017
@esigler Failure Friday sessions: 133 Faults injected: 708 Fault injections
resulting in a public postmortem: 3
@esigler Simulated full AZ failures: 4 Simulated full Region failures:
3 Simulated partial Disaster Recovery: 2
@esigler Tickets created from Failure Friday: over 225 Distinct services
that had faults injected: 49
@esigler
@esigler Optimized for learning first, tooling second Built the toolchain
to enable other teams Distributed chaos engineering knowledge
@esigler