Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Human Cloning: The Data Scientist Bottleneck Re...
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Data Science London
July 03, 2012
Technology
140
2
Share
Human Cloning: The Data Scientist Bottleneck Resolved
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Data Science London
July 03, 2012
More Decks by Data Science London
See All by Data Science London
Semi-Supervised Anomaly Detection
datasciencelondon
0
1.1k
Hacking the Rail: Ingesting, analysing & visualising realtime streaming data
datasciencelondon
1
47k
Stateful Data-Parallel Processing
datasciencelondon
0
47k
Semantic web warmed up: Ontologies for the IoT
datasciencelondon
0
140
IoT data ingestion pipelines and Clojure transducers
datasciencelondon
0
290
TrendCalculus: A data science for trends
datasciencelondon
1
48k
Data Science in Mobile Health
datasciencelondon
1
8.4k
Large-scale Recommender Systems on Just a PC (with GraphChi)
datasciencelondon
1
17k
Taming Graph Dynamics at Scale
datasciencelondon
0
8.2k
Other Decks in Technology
See All in Technology
Choose your own adventure in agentic design patterns
glaforge
0
160
Hacobu Tech Deck
hacobu
PRO
0
130
AgentCore Managed Harness を使ってみよう
yakumo
2
250
データを"持てない"環境でのアノテーション基盤設計
sansantech
PRO
1
150
基盤を育てる 外部SaaS連携の運用
gamonges_dresscode
1
120
Revisiting [CLS] and Patch Token Interaction in Vision Transformers
yu4u
0
400
Scovilleモバイルエンジニア募集中.pdf
julienrudin
0
120
Oracle Cloud Infrastructure:2026年4月度サービス・アップデート
oracle4engineer
PRO
0
120
「SaaSの次の時代」に重要性を増すステークホルダーマネジメントの要諦 ~解像度を圧倒的に高めPdMの価値を最大化させる方法~
kakehashi
PRO
3
2.7k
AIコーディング時代における、ソフトウェアサプライチェーン攻撃に対する防衛術(簡易版)
soysoysoyb
0
160
【技術書典20】OpenFOAM(自宅で深める流体解析)流れと熱移動(2)
kamakiri1225
0
280
AI時代 に増える データ活用先
takahal
0
330
Featured
See All Featured
What does AI have to do with Human Rights?
axbom
PRO
1
2.1k
The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
inesmontani
PRO
3
3.4k
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.4k
4 Signs Your Business is Dying
shpigford
187
22k
JAMstack: Web Apps at Ludicrous Speed - All Things Open 2022
reverentgeek
1
430
Six Lessons from altMBA
skipperchong
29
4.2k
Have SEOs Ruined the Internet? - User Awareness of SEO in 2025
akashhashmi
0
330
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
16k
Neural Spatial Audio Processing for Sound Field Analysis and Control
skoyamalab
0
270
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.6k
Highjacked: Video Game Concept Design
rkendrick25
PRO
1
340
Transcript
HUMAN CLONING The Data Scientist bottleneck resolved Dr Alex Farquhar
Friday, 24 February 2012
0 5,000 10,000 15,000 20,000 2008 2009 2010 2011 2012
2013 2014 2015 2016 2017 exabytes data (IDC/EMC report 2008) Friday, 24 February 2012
By 2018, the United States alone could face a shortage
of 140,000 to 190,000 data people... Friday, 24 February 2012
WE’RE ALL DOOMED Friday, 24 February 2012
DATA PEOPLE? © Drew Conway Friday, 24 February 2012
MAYBE WE CAN JUST.... • 1 statistician + 1 developer
≈ 1 data scientist? Friday, 24 February 2012
HOW ABOUT.... • 4 statisticians + 4 developers ≈ 4
Data Scientists? Friday, 24 February 2012
Friday, 24 February 2012
Friday, 24 February 2012
WHAT CAN WE DO? • Train more new data scientists
(not fast enough) • Cross-train people • Cobble together different skills in teams (see above) Friday, 24 February 2012
WHAT CAN WE DO? • Do more work Friday, 24
February 2012
DOING MORE • simplify (fob the work off) • automate
(fob even more work off) • choose/build the right tools • parallelise • iterate Friday, 24 February 2012
SIMPLIFY & AUTOMATE • Counting stuff is not much fun
Friday, 24 February 2012
Hive Hadoop TSV files SIMPLIFY & AUTOMATE Friday, 24 February
2012
AUTOMATE / PARALLELISE Hadoop Job magic Friday, 24 February 2012
AUTOMATE / PARALLELISE Lots of jobs at once Job 1
Job 2 Job 3 Job 4 Hadoop magic Friday, 24 February 2012
TOOLS • something thats allows fast iteration i.e. not java
• R, ruby, python Friday, 24 February 2012
PARALLELISE Friday, 24 February 2012
ITERATE • try different things • improve what works •
dump what doesn’t • constant improvement & learning → get faster Friday, 24 February 2012
WE’RE NOT ALL DOOMED Friday, 24 February 2012