Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
The Hardest Problem in Data
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Ronnie Chen
August 24, 2017
Technology
0
230
The Hardest Problem in Data
Ronnie Chen
August 24, 2017
Tweet
Share
More Decks by Ronnie Chen
See All by Ronnie Chen
ChaosConf 2018
ronnieftw
4
1.8k
devopsdays MSP 2018: Staying Alive
ronnieftw
1
660
Luck Driven Development: Building for Serendipity in Slack's Data Platform
ronnieftw
1
510
Staying Alive: Patterns for Failure Management From the Bottom of the Ocean
ronnieftw
0
270
Scaling Data at Slack: A Series of Unfortunate Events
ronnieftw
0
1.7k
Other Decks in Technology
See All in Technology
OWASP Top 10:2025 リリースと 少しの日本語化にまつわる裏話
okdt
PRO
3
740
Oracle Cloud Observability and Management Platform - OCI 運用監視サービス概要 -
oracle4engineer
PRO
2
14k
MCPでつなぐElasticsearchとLLM - 深夜の障害対応を楽にしたい / Bridging Elasticsearch and LLMs with MCP
sashimimochi
0
170
SREじゃなかった僕らがenablingを通じて「SRE実践者」になるまでのリアル / SRE Kaigi 2026
aeonpeople
6
2.3k
What happened to RubyGems and what can we learn?
mikemcquaid
0
300
Frontier Agents (Kiro autonomous agent / AWS Security Agent / AWS DevOps Agent) の紹介
msysh
3
170
[CV勉強会@関東 World Model 読み会] Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (Mousakhan+, NeurIPS 2025)
abemii
0
130
GitHub Issue Templates + Coding Agentで簡単みんなでIaC/Easy IaC for Everyone with GitHub Issue Templates + Coding Agent
aeonpeople
1
220
AzureでのIaC - Bicep? Terraform? それ早く言ってよ会議
torumakabe
1
540
仕様書駆動AI開発の実践: Issue→Skill→PRテンプレで 再現性を作る
knishioka
2
650
AI駆動開発を事業のコアに置く
tasukuonizawa
1
180
量子クラウドサービスの裏側 〜Deep Dive into OQTOPUS〜
oqtopus
0
110
Featured
See All Featured
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
81
Jamie Indigo - Trashchat’s Guide to Black Boxes: Technical SEO Tactics for LLMs
techseoconnect
PRO
0
57
Visual Storytelling: How to be a Superhuman Communicator
reverentgeek
2
430
Measuring & Analyzing Core Web Vitals
bluesmoon
9
750
A Tale of Four Properties
chriscoyier
162
24k
Principles of Awesome APIs and How to Build Them.
keavy
128
17k
Public Speaking Without Barfing On Your Shoes - THAT 2023
reverentgeek
1
310
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.3k
YesSQL, Process and Tooling at Scale
rocio
174
15k
More Than Pixels: Becoming A User Experience Designer
marktimemedia
3
320
Code Review Best Practice
trishagee
74
20k
Designing Experiences People Love
moore
144
24k
Transcript
The Hardest Problem in Data Ronnie Chen @rondoftw Data Engineering
Slack 1 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
2 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
→ Machine learning → Predictive modeling → Neural networks →
Artificial intelligence 3 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
Counting ?! 4 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
5 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
A simple counting problem 6 — WriteSpeakCode 2017 | Ronnie
Chen @rondoftw
The Rules: 1. Only one number 2. Convince me it's
correct 7 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
How many friends do you have? 8 — WriteSpeakCode 2017
| Ronnie Chen @rondoftw
Will I get the same number if... !"#$ I ask
every person you know if they consider you their friend? 9 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
Will I get the same number if... ! " I
ask every person that knows you if they think you would consider them a friend? 10 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
Is this the number of people that you'd tell a
secret to? 11 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
But it depends!! 12 — WriteSpeakCode 2017 | Ronnie Chen
@rondoftw
How many users do we have? 13 — WriteSpeakCode 2017
| Ronnie Chen @rondoftw
SELECT COUNT(*) FROM prod.users 14 — WriteSpeakCode 2017 | Ronnie
Chen @rondoftw
user_id name email deleted 1 Alice alice@*** 2 Bob bob@***
true 3 Carol 15 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
SELECT COUNT(*) FROM prod.users WHERE deleted != true AND email
!= null 16 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
SELECT COUNT(*) FROM prod.users WHERE last_active > 2017-07-24 17 —
WriteSpeakCode 2017 | Ronnie Chen @rondoftw
user_id email 12334
[email protected]
38602
[email protected]
52981
[email protected]
67640
[email protected]
18 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
¯\_(ϑ)_/¯ 19 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
What are you not even aware of? 20 — WriteSpeakCode
2017 | Ronnie Chen @rondoftw
Okay, I get it. But what's the big deal? 21
— WriteSpeakCode 2017 | Ronnie Chen @rondoftw
26% of professional computing jobs were held by women in
2016 22 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
23 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
Numbers give you authority and the appearance of objectivity 24
— WriteSpeakCode 2017 | Ronnie Chen @rondoftw
Counting is power. 25 — WriteSpeakCode 2017 | Ronnie Chen
@rondoftw
26 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
Counts can determine funding, set agendas, and shift priorities 27
— WriteSpeakCode 2017 | Ronnie Chen @rondoftw
Machine learning is like money laundering for bias — Maciej
Cegłowski, founder of @Pinboard 28 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
29 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
30 — WriteSpeakCode 2017 | Ronnie Chen @rondoftw
What you count determines what is important. 31 — WriteSpeakCode
2017 | Ronnie Chen @rondoftw