Data Science London
July 03, 2012
Human Cloning: The Data Scientist Bottleneck Resolved
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Transcript
HUMAN CLONING The Data Scientist bottleneck resolved Dr Alex Farquhar
Friday, 24 February 2012
[Chart: data volume in exabytes by year, 2008 to 2017; y-axis 0 to 20,000 (IDC/EMC report 2008)]
By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...
WE’RE ALL DOOMED
DATA PEOPLE? © Drew Conway
MAYBE WE CAN JUST....
• 1 statistician + 1 developer ≈ 1 data scientist?
HOW ABOUT....
• 4 statisticians + 4 developers ≈ 4 data scientists?
WHAT CAN WE DO?
• Train more new data scientists (not fast enough)
• Cross-train people
• Cobble together different skills in teams (see above)
WHAT CAN WE DO?
• Do more work
DOING MORE
• simplify (fob the work off)
• automate (fob even more work off)
• choose/build the right tools
• parallelise
• iterate
SIMPLIFY & AUTOMATE
• Counting stuff is not much fun
SIMPLIFY & AUTOMATE
[Diagram labels: Hive, Hadoop, TSV files]
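The "counting stuff" that these slides hand off to Hive and Hadoop can be illustrated in miniature. The following is only a sketch in plain Python, not the speaker's actual pipeline: the `count_column` helper, the column names, and the sample rows are all hypothetical, and a real workload at the scale the talk describes would run as a Hive query over TSV files rather than in a single process.

```python
import csv
import io
from collections import Counter


def count_column(tsv_file, column):
    """Count how often each value appears in one column of a TSV stream."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row[column] for row in reader)


# Hypothetical sample data standing in for a TSV file on disk.
SAMPLE_TSV = "user\tpage\nann\thome\nbob\thome\nann\tpricing\n"

page_counts = count_column(io.StringIO(SAMPLE_TSV), "page")
print(page_counts.most_common())
```

Once the counting lives in one small function, it can be pointed at any file and any column, which is roughly what "simplify" and "automate" amount to here.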
AUTOMATE / PARALLELISE
[Diagram labels: Job, Hadoop magic]
AUTOMATE / PARALLELISE
Lots of jobs at once
[Diagram labels: Job 1, Job 2, Job 3, Job 4, Hadoop magic]
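Running "lots of jobs at once" can be sketched with Python's standard `concurrent.futures` module. This is an illustration under assumptions, not how the speaker submitted Hadoop jobs: `run_job` is a hypothetical placeholder for whatever submits one job and waits for it, and the four job ids simply mirror Job 1 to Job 4 on the slide.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id):
    """Hypothetical stand-in for submitting one job and waiting for it."""
    return f"job {job_id} done"


def run_all(job_ids, max_workers=4):
    """Run all jobs concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, job_ids))


results = run_all([1, 2, 3, 4])
print(results)
```

Threads suit this shape of work because each worker mostly waits on an external cluster; the heavy computation happens elsewhere.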
TOOLS
• something that allows fast iteration, i.e. not Java
• R, Ruby, Python
PARALLELISE
ITERATE
• try different things
• improve what works
• dump what doesn’t
• constant improvement & learning → get faster
WE’RE NOT ALL DOOMED