Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Human Cloning: The Data Scientist Bottleneck Re...
Search
Data Science London
July 03, 2012
Technology
2
140
Human Cloning: The Data Scientist Bottleneck Resolved
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Data Science London
July 03, 2012
Tweet
Share
More Decks by Data Science London
See All by Data Science London
Semi-Supervised Anomaly Detection
datasciencelondon
0
1k
Hacking the Rail: Ingesting, analysing & visualising realtime streaming data
datasciencelondon
1
47k
Stateful Data-Parallel Processing
datasciencelondon
0
47k
Semantic web warmed up: Ontologies for the IoT
datasciencelondon
0
130
IoT data ingestion pipelines and Clojure transducers
datasciencelondon
0
280
TrendCalculus: A data science for trends
datasciencelondon
1
48k
Data Science in Mobile Health
datasciencelondon
1
8.3k
Large-scale Recommender Systems on Just a PC (with GraphChi)
datasciencelondon
1
17k
Taming Graph Dynamics at Scale
datasciencelondon
0
8.1k
Other Decks in Technology
See All in Technology
Excelデータ分析で学ぶディメンショナルモデリング ~アジャイルデータモデリングへ向けて~ by @Kazaneya_PR / 20251126
kazaneya
PRO
3
790
AI 時代のデータ戦略
na0
8
3k
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
37k
名刺メーカーDevグループ 紹介資料
sansan33
PRO
0
970
Introduction to Bill One Development Engineer
sansan33
PRO
0
320
経営から紐解くデータマネジメント
pacocat
9
1.8k
レガシーシステム刷新における TypeSpec スキーマ駆動開発のすゝめ
tsukuha
4
890
"'TSのAPI型安全”の対価は誰が払う?不公平なスキーマ駆動に終止符を打つハイブリッド戦略
hal_spidernight
0
210
All About Sansan – for New Global Engineers
sansan33
PRO
1
1.3k
Eight Engineering Unit 紹介資料
sansan33
PRO
0
5.7k
Design System Documentation Tooling 2025
takanorip
1
850
"なるべくスケジューリングしない" を実現する "PreferNoSchedule" taint
superbrothers
0
130
Featured
See All Featured
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.8k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.7k
How to train your dragon (web standard)
notwaldorf
97
6.4k
It's Worth the Effort
3n
187
29k
Designing for humans not robots
tammielis
254
26k
Large-scale JavaScript Application Architecture
addyosmani
514
110k
Writing Fast Ruby
sferik
630
62k
Git: the NoSQL Database
bkeepers
PRO
432
66k
The World Runs on Bad Software
bkeepers
PRO
72
12k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
196
69k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
17k
Transcript
HUMAN CLONING The Data Scientist bottleneck resolved Dr Alex Farquhar
Friday, 24 February 2012
0 5,000 10,000 15,000 20,000 2008 2009 2010 2011 2012
2013 2014 2015 2016 2017 exabytes data (IDC/EMC report 2008) Friday, 24 February 2012
By 2018, the United States alone could face a shortage
of 140,000 to 190,000 data people... Friday, 24 February 2012
WE’RE ALL DOOMED Friday, 24 February 2012
DATA PEOPLE? © Drew Conway Friday, 24 February 2012
MAYBE WE CAN JUST.... • 1 statistician + 1 developer
≈ 1 data scientist? Friday, 24 February 2012
HOW ABOUT.... • 4 statisticians + 4 developers ≈ 4
Data Scientists? Friday, 24 February 2012
Friday, 24 February 2012
Friday, 24 February 2012
WHAT CAN WE DO? • Train more new data scientists
(not fast enough) • Cross-train people • Cobble together different skills in teams (see above) Friday, 24 February 2012
WHAT CAN WE DO? • Do more work Friday, 24
February 2012
DOING MORE • simplify (fob the work off) • automate
(fob even more work off) • choose/build the right tools • parallelise • iterate Friday, 24 February 2012
SIMPLIFY & AUTOMATE • Counting stuff is not much fun
Friday, 24 February 2012
Hive Hadoop TSV files SIMPLIFY & AUTOMATE Friday, 24 February
2012
AUTOMATE / PARALLELISE Hadoop Job magic Friday, 24 February 2012
AUTOMATE / PARALLELISE Lots of jobs at once Job 1
Job 2 Job 3 Job 4 Hadoop magic Friday, 24 February 2012
TOOLS • something thats allows fast iteration i.e. not java
• R, ruby, python Friday, 24 February 2012
PARALLELISE Friday, 24 February 2012
ITERATE • try different things • improve what works •
dump what doesn’t • constant improvement & learning → get faster Friday, 24 February 2012
WE’RE NOT ALL DOOMED Friday, 24 February 2012