Berlin 2013 - Session - Jeff Weinstein
Monitorama
September 20, 2013
Transcript
How monitoring can improve the rest of the company
Monitorama EU 2013 @jeff_weinstein
I real-time and batch data analytics
Monitoring can wildly improve the whole company by
sharing data and sharing techniques.
Monitoring Folks, Developers, Business Analysts, Executives & Product, Data Scientists, Data
Apps & Services & Systems Users
Data Code & Config Monitoring
Some problems…
Data Processing Apps Systems Logs /
Events Metrics Graphs & Alerts Apps 3rd Party Reports & Queries ETL Analytic Systems Monitoring: Streaming BI: Batch
Data Needs Logs Metrics Logs Metrics
Streaming Batch Data Monitoring BI
Data Tools Stack
Monitoring
• Ad hoc
  – sed, grep, awk
  – ES, LogStash, Splunk, …
• Storage
  – Hosts, Ganglia, OTSDB
  – Central syslog server
• Visualization/Reporting
  – Graphite, RRDTool, 3rd party
  – Homegrown
• Alerting/Escalation
  – Nagios, Sensu, PagerDuty, …
Rest of company
• Ad hoc
  – Excel, SQL, Hive
  – MapReduce, …
• Storage
  – Lots o’ databases, Excel
  – Hadoop, RDBMS…
• Visualization/Reporting
  – Excel, R, Tableau ...
  – Dinosaur apps, …
• Alerting/Escalation
  – nada
Metrics
Views
• Unintelligible generated views
• Too granular for long term trends
• Lack of historical
• Intolerant to anomalies
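One common response to views that are "too granular for long term trends" is to roll fine-grained samples up into coarser time buckets. The slides do not say how the speaker handles this, so the following is only a minimal sketch under that assumption; the function name `rollup` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rollup(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Timestamps are assumed to be integer seconds. Each bucket is keyed by
    its start time, i.e. the timestamp rounded down to the bucket width.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical per-minute samples, rolled up to one point per hour.
samples = [(0, 10.0), (60, 20.0), (3600, 30.0), (3660, 50.0)]
hourly = rollup(samples, 3600)
```

Averaging is only one choice of aggregate; for latency-style metrics a maximum or a percentile per bucket may preserve anomalies better than a mean.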
Team and incentives
• What team?
• Change vs. reliability
• Planning
• Budget
• Churn
Good or bad?
• Specific Tools
• Decentralized
• Focus
• Ownership
• Lost context
• Siloed work
• Data dark
• Misunderstanding
Some fixes
End to End Data Pipeline
✓ Structured logs
✓ (Config)
✓ Measure once
✓ Automatic metrics
✓ API
✓ Graph tools
✓ Glossary
✓ Annotations and tags
✓ Pipeline
Structured events
• JSON (or whatever)
• (optional) config
• Tags per key
  – Type
  – Tag: latency, funnel, …
  – Description
  – Storage
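The structured-events idea above could look roughly like the sketch below: each event is one JSON log line, and an optional config describes every key with a type, a tag, a description, and a storage hint. The key names, the config layout, and the `make_event` helper are all hypothetical; the slides only name the four per-key attributes.

```python
import json
import time

# Hypothetical per-key config; the keys and values are illustrative only.
EVENT_CONFIG = {
    "checkout_latency_ms": {
        "type": "float",
        "tag": "latency",
        "description": "Time to complete the checkout request, in milliseconds",
        "storage": "long_term",
    },
    "signup_step": {
        "type": "str",
        "tag": "funnel",
        "description": "Name of the signup step the user reached",
        "storage": "short_term",
    },
}

# Map the config's type names onto Python types for a basic check.
PYTHON_TYPES = {"float": (int, float), "str": (str,)}


def make_event(fields, config=EVENT_CONFIG, now=None):
    """Return one JSON log line, rejecting unknown keys and wrong types."""
    for key, value in fields.items():
        if key not in config:
            raise KeyError(f"no config entry for key: {key}")
        if not isinstance(value, PYTHON_TYPES[config[key]["type"]]):
            raise TypeError(f"{key} should be {config[key]['type']}")
    event = {"ts": time.time() if now is None else now}
    event.update(fields)
    return json.dumps(event, sort_keys=True)


line = make_event({"checkout_latency_ms": 182.5, "signup_step": "payment"}, now=0)
```

Because the config travels with the event definitions, downstream consumers can read the same description of each key instead of guessing at its meaning.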
Auto: Graphs, Glossary, & Storage
• Graphs and dashboards
• * templates
• Views and stats
• Glossary
• Batch analytics
• Long term storage
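As a rough illustration of the "auto" step, a per-key config can drive the glossary, the dashboard grouping, and the storage decision from one source. This is a sketch under assumptions, not the speaker's implementation: the config layout and every function name below are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-key config with a tag, a description, and a storage hint.
CONFIG = {
    "checkout_latency_ms": {
        "tag": "latency",
        "description": "Checkout request time in ms",
        "storage": "long_term",
    },
    "search_latency_ms": {
        "tag": "latency",
        "description": "Search request time in ms",
        "storage": "short_term",
    },
    "signup_step": {
        "tag": "funnel",
        "description": "Signup step the user reached",
        "storage": "long_term",
    },
}


def build_glossary(config):
    """Return one 'key: description' line per key, sorted by key."""
    return [f"{key}: {meta['description']}" for key, meta in sorted(config.items())]


def dashboards_by_tag(config):
    """Group keys by tag so each tag can feed one dashboard template."""
    groups = defaultdict(list)
    for key, meta in sorted(config.items()):
        groups[meta["tag"]].append(key)
    return dict(groups)


def long_term_keys(config):
    """Return the keys whose config asks for long term storage."""
    return sorted(key for key, meta in config.items() if meta["storage"] == "long_term")


glossary = build_glossary(CONFIG)
dashboards = dashboards_by_tag(CONFIG)
archive = long_term_keys(CONFIG)
```

Adding a new key to the config then updates the glossary, the relevant dashboard group, and the archive list without any further manual step.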
build learn communicate inspire
Developers
• Logging toolkit
• Data pipeline
• Pain points
• Outage causes
• Deployment practices
• Escalation playbook
• Measurement as TDD
• Monitor staging env
Business Analysts
• Structured logs
• Config for ETL
• Metrics definitions
• Slices and visualizations
• Data size and cardinality
• Outages and delays
• Flexibility
• Visualization and tools
Data Scientists
• Access to (meta)data
• Query monitoring
• Statistics and models
• New data streams
• Context of data issues
• What’s in the logs
• Validate algorithms
• Teach stats and models!
Product & Executives
• Curated dashboards
• Graph/alert tools
• Learn the business
• Prioritize alerts by $
• Incident post mortems
• Metrics granularity
• Data driven decisions
• Recognize and celebrate
Monitoring can become the data platform and improve all
teams with its techniques.
Icons from The Noun Project: Dmitry Baranovskiy, Benjamin Orlovski, Luis Prado, MikaDo Nguyen, Yarden Gilboa, Javier Cabezas, Icons Pusher, Jeremy Bristol, Blake Thomas, Ritika Khasgiwale, Mayene de Leon, Yorlmar Campos, Sergey Shmid
@jeff_weinstein Thanks! hiring ;)