James Smith
October 02, 2015
Technology
GitHub Universe 2015 Talk - Your software is broken — pay attention: Rethinking production monitoring
My talk from GitHub Universe 2015's "Deploy" track
Transcript
RETHINKING PRODUCTION MONITORING YOUR SOFTWARE IS BROKEN — PAY ATTENTION
JAMES SMITH (loopj)
CODE TEST DEPLOY YOLO ¯\_(ツ)_/¯
CODE TEST DEPLOY CONFIDENCE :)
STABILITY PERFORMANCE AVAILABILITY
DELIVERING AN AWESOME EXPERIENCE TO CUSTOMERS
WHY MONITORING MATTERS
YOUR APP WILL LIVE OR DIE BASED ON ITS QUALITY: CUSTOMERS HAVE A CHOICE
84% OF USERS ABANDON AFTER TWO CRASHES
49% OF ENGINEERING TIME IS SPENT FINDING & FIXING BUGS
SINS OF PRODUCTION MONITORING: WHAT AM I DOING WRONG?
1. PRETENDING NOTHING IS WRONG
“But I’ve written tests!” “The QA Team will check that!”
“Works great for me!”
2. WAITING FOR CUSTOMERS TO COMPLAIN
“Nobody complained so everything must be OK”
3. LACK OF VISIBILITY
“We’ll just check the logs” “Did you remember to add a log statement?”
4. LACK OF OWNERSHIP
“Not my problem!” “I’ve got a feature to ship” “My code works fine”
HOW CAN WE DO BETTER?
CORE PRINCIPLES OF PRODUCTION MONITORING: ACCEPT, AUTOMATE, AGGREGATE, NOTIFY, PRIORITIZE, DIAGNOSE, TEND
1. ACCEPT: ACCEPT THAT YOUR SOFTWARE WILL BREAK AFTER SHIPPING
2. AUTOMATE: ADD HOOKS TO DETECT CRASHES/ERRORS/ISSUES IN PRODUCTION
3. AGGREGATE: DON'T JUST HAVE A STREAM OF EVENTS - GROUP LIKE ISSUES TOGETHER
4. NOTIFY: ALERT YOUR DEV TEAM WHERE THEY ALREADY COMMUNICATE
5. PRIORITIZE: YOU CAN'T FIX EVERY ERROR - SO FOCUS ON THE MOST HARMFUL ONES
6. DIAGNOSE: KNOWING ABOUT ISSUES ISN'T ENOUGH - THEY MUST BE ACTIONABLE
7. TEND: MAKE AN ORGANIZATIONAL CHANGE - SOMEONE NEEDS TO CARE ABOUT ERRORS
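
The AUTOMATE and AGGREGATE principles above can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the names `error_counts`, `fingerprint`, `record_error`, and `install_hook` are hypothetical, and grouping by exception class plus raising function is one assumed grouping rule among many.

```python
import sys
import traceback
from collections import Counter

# Hypothetical in-memory store: fingerprint -> number of occurrences.
# A real setup would ship these events to a monitoring service instead.
error_counts = Counter()


def fingerprint(exc_type, tb):
    """Build a grouping key: exception class plus the file and function that raised it."""
    frames = traceback.extract_tb(tb)
    if not frames:
        return (exc_type.__name__, "<unknown>", "<unknown>")
    last = frames[-1]  # innermost Python frame, where the exception was raised
    return (exc_type.__name__, last.filename, last.name)


def record_error(exc_type, exc_value, tb):
    """Count one occurrence under its fingerprint and return the fingerprint."""
    key = fingerprint(exc_type, tb)
    error_counts[key] += 1
    return key


def install_hook():
    """Record every uncaught exception, then defer to the previously installed hook."""
    previous = sys.excepthook

    def hook(exc_type, exc_value, tb):
        record_error(exc_type, exc_value, tb)
        previous(exc_type, exc_value, tb)

    sys.excepthook = hook
```

Calling `install_hook()` once at startup covers uncaught exceptions; handled errors can be reported by passing `sys.exc_info()` to `record_error` inside an `except` block. Repeated failures from the same function then collapse into one counted group rather than a stream of separate events.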
TAKING ACTION
TOOLS
USES “FAILURE” HOOKS
ASSESS IMPACT
ASSESS SEVERITY
CAPTURES DIAGNOSTIC DATA
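
One way to read "assess impact" and "assess severity" is as inputs to a ranking. The sketch below is an assumption rather than the speaker's method: impact is taken to be the number of distinct affected users, severity is a hypothetical three-level scale with made-up weights, and `ErrorGroup`, `SEVERITY_WEIGHT`, and `rank` are illustrative names.

```python
from dataclasses import dataclass, field

# Assumed weights; tune these to whatever "harmful" means for your product.
SEVERITY_WEIGHT = {"info": 1, "warning": 3, "error": 10}


@dataclass
class ErrorGroup:
    name: str
    severity: str
    events: int = 0
    affected_users: set = field(default_factory=set)

    def record(self, user_id):
        """Register one occurrence seen by the given user."""
        self.events += 1
        self.affected_users.add(user_id)

    @property
    def priority(self):
        """Severity weight multiplied by the number of distinct users affected."""
        return SEVERITY_WEIGHT[self.severity] * len(self.affected_users)


def rank(groups):
    """Return the groups ordered from most to least harmful."""
    return sorted(groups, key=lambda g: g.priority, reverse=True)
```

With this scoring, a warning that fires fifty times for a single user ranks below a crash that hits three different users, which matches the idea of focusing on the most harmful errors rather than the noisiest ones.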
WORKFLOW
USE TEAM CHAT
EMBRACE COLLABORATION
TRACK PROGRESS OF FIXES
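
Alerting in team chat works best when it is not noisy. The following sketch assumes a policy the slides do not specify: notify on the first occurrence of an error group and again at a few occurrence thresholds. `NOTIFY_AT`, `format_message`, and `notify` are hypothetical names, and `send` stands in for whatever function posts a message to your chat tool.

```python
# Assumed thresholds: alert on the first occurrence, then at each power of ten.
NOTIFY_AT = (1, 10, 100, 1000)


def should_notify(occurrences):
    """Only alert at the configured occurrence counts to keep the channel quiet."""
    return occurrences in NOTIFY_AT


def format_message(error_name, occurrences, url):
    """Build a short chat message that links back to the error details."""
    if occurrences == 1:
        headline = f"New error: {error_name}"
    else:
        headline = f"{error_name} has now occurred {occurrences} times"
    return f"{headline}\n{url}"


def notify(send, error_name, occurrences, url):
    """Send a chat alert via `send` if the count hits a threshold; return whether it sent."""
    if not should_notify(occurrences):
        return False
    send(format_message(error_name, occurrences, url))
    return True
```

Passing `send` in as a parameter keeps the policy independent of any particular chat service, and makes it easy to exercise with a plain list's `append` method.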
TEAM STRUCTURES
EMBRACE RAPID ITERATION
CREATE A “BUG TEAM”
OR CREATE A “BUG ROTATION”
OR KNOW “WHO LAST TOUCHED THIS CODE?”
TL;DR
AVOID THE SINS
EMBRACE CORE PRINCIPLES
TAKE ACTION
THANK YOU!
QUESTIONS?
BUGSNAG IS HIRING! bugsnag.com/jobs @bugsnag