Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
GitHub Universe 2015 Talk - Your software is br...
Search
James Smith
October 02, 2015
Technology
120
1
Share
GitHub Universe 2015 Talk - Your software is broken — pay attention: Rethinking production monitoring
My talk from GitHub Universe 2015's "Deploy" track
James Smith
October 02, 2015
More Decks by James Smith
See All by James Smith
Why Are Android Apps So Crash-Prone?
loopj
0
190
RailsConf 2016 Talk - Your software is broken — pay attention: Rethinking production monitoring
loopj
1
410
Building A Popular Open-Source Android Library - Best practices and lessons learned
loopj
4
490
Building A Popular Open-Source javascript Library
loopj
0
97
JavaScript Stack Traces: The good, the bad, and the ugly
loopj
1
220
Other Decks in Technology
See All in Technology
サンプリングは「作る」のか「使う」のか? 分散トレースのコストと運用を両立する実践的戦略 / Why you need the tail sampling and why you don't want it
ymotongpoo
4
170
『生成AI時代のクレデンシャルとパーミッション設計 — Claude Code を起点に』の執筆企画
takuros
3
2.3k
Vision Banana: Image Generators are Generalist Vision Learners
kzykmyzw
0
360
[Scram Fest Niigata2026]Quality as Code〜AIにQAの思考を再現させる試み〜
masamiyajiri
1
310
100マイクロサービスのTerraform/Kubernetes管理地獄から抜け出すためのAI活用術
markie1009
0
140
ESP32 IoTを動かしながらメモリ使用量を観測してみた話
zozotech
PRO
0
110
セキュリティ対策、何からはじめる? CloudNative環境の脅威モデリングと リスク評価実践入門 #cloudnativekaigi
varu3
5
800
ワールドカフェ再び、そしてゴール・ルール・ロール・ツール / World Café Revisited, and the Goals-Rules-Roles-Tools
ks91
PRO
0
130
会社説明資料|株式会社ギークプラス ソフトウェア事業部
geekplus_tech
0
220
「強制アップデート」か「チームの自律」か?エンタープライズが辿り着いたプラットフォームのハイブリッド運用/cloudnative-kaigi-hybrid-platform-operations
mhrtech
0
180
サイボウズ、プラットフォームエンジニアリング始めるってよ ― プラットフォームチームの事業貢献と組織アラインメントの強化
ueokande
0
100
雑談は、センサーだった
bitkey
PRO
2
230
Featured
See All Featured
A Tale of Four Properties
chriscoyier
163
24k
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
2
1.5k
Accessibility Awareness
sabderemane
1
110
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
3
130
Breaking role norms: Why Content Design is so much more than writing copy - Taylor Woolridge
uxyall
0
280
Gemini Prompt Engineering: Practical Techniques for Tangible AI Outcomes
mfonobong
2
390
Crafting Experiences
bethany
1
140
The SEO identity crisis: Don't let AI make you average
varn
0
460
My Coaching Mixtape
mlcsv
0
120
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
38
2.8k
Building a A Zero-Code AI SEO Workflow
portentint
PRO
0
500
Rebuilding a faster, lazier Slack
samanthasiow
85
9.5k
Transcript
RETHINKING PRODUCTION MONITORING YOUR SOFTWARE IS BROKEN — PAY ATTENTION
JAMES SMITH loopj loopj
None
CODE TEST DEPLOY YOLO ¯\_(ϑ)_/¯
CODE TEST DEPLOY YOLO CODE TEST DEPLOY CONFIDENCE ¯\_(ϑ)_/¯ :)
STABILITY PERFORMANCE AVAILABILITY
DELIVERING AN AWESOME EXPERIENCE TO CUSTOMERS
WHY MONITORING MATTERS
YOUR APP WILL LIVE OR DIE BASED ON ITS QUALITY
— CUSTOMERS HAVE A CHOICE
84% OF USERS ABANDON AFTER TWO CRASHES
49% OF ENGINEERING TIME FINDING & FIXING BUGS
SINS OF PRODUCTION MONITORING WHAT AM I DOING WRONG?
1. PRETENDING NOTHING IS WRONG
“But I’ve written tests!” “The QA Team will check that!”
“Works great for me!”
2. WAITING FOR CUSTOMERS TO COMPLAIN
“Nobody complained so everything must be OK”
3. LACK OF VISIBILITY
“We’ll just check the logs” “Did you remember to add
a log statement?”
4. LACK OF OWNERSHIP
“Not my problem!” “I’ve got a feature to ship” “My
code works fine”
HOW CAN WE DO BETTER?
ACCEPT AUTOMATE AGGREGATE NOTIFY PRIORITIZE DIAGNOSE TEND CORE PRINCIPLES OF
PRODUCTION MONITORING
1. ACCEPT ACCEPT THAT YOUR SOFTWARE WILL BREAK AFTER SHIPPING
2. AUTOMATE ADD HOOKS TO DETECT CRASHES/ERRORS/ISSUES IN PRODUCTION
3. AGGREGATE DON'T JUST HAVE A STREAM OF EVENTS -
GROUP LIKE ISSUES TOGETHER
4. NOTIFY ALERT YOUR DEV TEAM WHERE THEY ALREADY COMMUNICATE
5. PRIORITIZE YOU CAN'T FIX EVERY ERROR - SO FOCUS
ON THE MOST HARMFUL ONES
6. DIAGNOSE KNOWING ABOUT ISSUES ISN'T ENOUGH - THEY MUST
BE ACTIONABLE
7. TEND MAKE AN ORGANIZATIONAL CHANGE - SOMEONE NEEDS TO
CARE ABOUT ERRORS
TAKING ACTION
TOOLS
USES “FAILURE” HOOKS
ASSESS IMPACT
ASSESS SEVERITY
CAPTURES DIAGNOSTIC DATA
WORKFLOW
USE TEAM CHAT
EMBRACE COLLABORATION
TRACK PROGRESS OF FIXES
TEAM STRUCTURES
EMBRACE RAPID ITERATION
CREATE A “BUG TEAM”
OR CREATE A “BUG ROTATION”
OR KNOW “WHO LAST TOUCHED THIS CODE”?
TL;DR
AVOID THE SINS
EMBRACE CORE PRINCIPLES
TAKE ACTION
THANK YOU!
QUESTIONS?
IS HIRING! bugsnag.com/jobs @bugsnag