Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Human Cloning: The Data Scientist Bottleneck Re...
Search
Data Science London
July 03, 2012
Technology
140
2
Share
Human Cloning: The Data Scientist Bottleneck Resolved
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Data Science London
July 03, 2012
More Decks by Data Science London
See All by Data Science London
Semi-Supervised Anomaly Detection
datasciencelondon
0
1.1k
Hacking the Rail: Ingesting, analysing & visualising realtime streaming data
datasciencelondon
1
47k
Stateful Data-Parallel Processing
datasciencelondon
0
47k
Semantic web warmed up: Ontologies for the IoT
datasciencelondon
0
130
IoT data ingestion pipelines and Clojure transducers
datasciencelondon
0
290
TrendCalculus: A data science for trends
datasciencelondon
1
48k
Data Science in Mobile Health
datasciencelondon
1
8.3k
Large-scale Recommender Systems on Just a PC (with GraphChi)
datasciencelondon
1
17k
Taming Graph Dynamics at Scale
datasciencelondon
0
8.2k
Other Decks in Technology
See All in Technology
ハーネスエンジニアリング×AI適応開発
aictokamiya
3
1.4k
Oracle Cloud Infrastructure:2026年3月度サービス・アップデート
oracle4engineer
PRO
0
340
Sansanの認証基盤を支えるアーキテクチャとその振り返り
sansantech
PRO
1
150
「できない」のアウトプット 同人誌『精神を壊してからの』シリーズ出版を 通して得られたこと
comi190327
3
550
40代からのアウトプット ― 経験は価値ある学びに変わる / 20260404 Naoki Takahashi
shift_evolve
PRO
5
800
AWS DevOps Agent or Kiro の使いどころを考える_20260402
masakiokuda
0
150
第26回FA設備技術勉強会 - Claude/Claude_codeでデータ分析 -
happysamurai294
0
360
The essence of decision-making lies in primary data
kaminashi
0
240
推し活エージェント
yuntan_t
1
720
Databricks Lakehouse Federationで 運用負荷ゼロのデータ連携
nek0128
0
110
Babylon.js Japan Activities (2026/4)
limes2018
0
160
「活動」は激変する。「ベース」は変わらない ~ 4つの軸で捉える_AI時代ソフトウェア開発マネジメント
sentokun
0
140
Featured
See All Featured
GraphQLとの向き合い方2022年版
quramy
50
14k
Bootstrapping a Software Product
garrettdimon
PRO
307
120k
Digital Ethics as a Driver of Design Innovation
axbom
PRO
1
250
Typedesign – Prime Four
hannesfritz
42
3k
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
1
1.2k
Amusing Abliteration
ianozsvald
1
150
Become a Pro
speakerdeck
PRO
31
5.9k
We Analyzed 250 Million AI Search Results: Here's What I Found
joshbly
1
1.1k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.7k
Future Trends and Review - Lecture 12 - Web Technologies (1019888BNR)
signer
PRO
0
3.4k
The World Runs on Bad Software
bkeepers
PRO
72
12k
Mozcon NYC 2025: Stop Losing SEO Traffic
samtorres
0
190
Transcript
HUMAN CLONING The Data Scientist bottleneck resolved Dr Alex Farquhar
Friday, 24 February 2012
0 5,000 10,000 15,000 20,000 2008 2009 2010 2011 2012
2013 2014 2015 2016 2017 exabytes data (IDC/EMC report 2008) Friday, 24 February 2012
By 2018, the United States alone could face a shortage
of 140,000 to 190,000 data people... Friday, 24 February 2012
WE’RE ALL DOOMED Friday, 24 February 2012
DATA PEOPLE? © Drew Conway Friday, 24 February 2012
MAYBE WE CAN JUST.... • 1 statistician + 1 developer
≈ 1 data scientist? Friday, 24 February 2012
HOW ABOUT.... • 4 statisticians + 4 developers ≈ 4
Data Scientists? Friday, 24 February 2012
Friday, 24 February 2012
Friday, 24 February 2012
WHAT CAN WE DO? • Train more new data scientists
(not fast enough) • Cross-train people • Cobble together different skills in teams (see above) Friday, 24 February 2012
WHAT CAN WE DO? • Do more work Friday, 24
February 2012
DOING MORE • simplify (fob the work off) • automate
(fob even more work off) • choose/build the right tools • parallelise • iterate Friday, 24 February 2012
SIMPLIFY & AUTOMATE • Counting stuff is not much fun
Friday, 24 February 2012
Hive Hadoop TSV files SIMPLIFY & AUTOMATE Friday, 24 February
2012
AUTOMATE / PARALLELISE Hadoop Job magic Friday, 24 February 2012
AUTOMATE / PARALLELISE Lots of jobs at once Job 1
Job 2 Job 3 Job 4 Hadoop magic Friday, 24 February 2012
TOOLS • something thats allows fast iteration i.e. not java
• R, ruby, python Friday, 24 February 2012
PARALLELISE Friday, 24 February 2012
ITERATE • try different things • improve what works •
dump what doesn’t • constant improvement & learning → get faster Friday, 24 February 2012
WE’RE NOT ALL DOOMED Friday, 24 February 2012