Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Human Cloning: The Data Scientist Bottleneck Re...
Search
Data Science London
July 03, 2012
Technology
2
140
Human Cloning: The Data Scientist Bottleneck Resolved
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Data Science London
July 03, 2012
Tweet
Share
More Decks by Data Science London
See All by Data Science London
Semi-Supervised Anomaly Detection
datasciencelondon
0
980
Hacking the Rail: Ingesting, analysing & visualising realtime streaming data
datasciencelondon
1
47k
Stateful Data-Parallel Processing
datasciencelondon
0
47k
Semantic web warmed up: Ontologies for the IoT
datasciencelondon
0
130
IoT data ingestion pipelines and Clojure transducers
datasciencelondon
0
270
TrendCalculus: A data science for trends
datasciencelondon
1
48k
Data Science in Mobile Health
datasciencelondon
1
8.3k
Large-scale Recommender Systems on Just a PC (with GraphChi)
datasciencelondon
1
17k
Taming Graph Dynamics at Scale
datasciencelondon
0
8.1k
Other Decks in Technology
See All in Technology
[TechNight #90-1] 本当に使える?ZDMの新機能を実践検証してみた
oracle4engineer
PRO
3
140
新卒3年目の後悔〜機械学習モデルジョブの運用を頑張った話〜
kameitomohiro
0
380
Prox Industries株式会社 会社紹介資料
proxindustries
0
200
Oracle Audit Vault and Database Firewall 20 概要
oracle4engineer
PRO
3
1.6k
25分で解説する「最小権限の原則」を実現するための AWS「ポリシー」大全
opelab
9
2.2k
In Praise of "Normal" Engineers (LDX3)
charity
3
1.2k
より良いプロダクトの開発を目指して - 情報を中心としたプロダクト開発 #phpcon #phpcon2025
bengo4com
0
340
Observability в PHP без боли. Олег Мифле, тимлид Altenar
lamodatech
0
270
Definition of Done
kawaguti
PRO
6
460
Oracle Cloud Infrastructure:2025年6月度サービス・アップデート
oracle4engineer
PRO
1
140
標準技術と独自システムで作る「つらくない」SaaS アカウント管理 / Effortless SaaS Account Management with Standard Technologies & Custom Systems
yuyatakeyama
2
1k
強化されたAmazon Location Serviceによる新機能と開発者体験
dayjournal
2
150
Featured
See All Featured
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
26k
Building an army of robots
kneath
306
45k
How to Think Like a Performance Engineer
csswizardry
24
1.7k
How to Ace a Technical Interview
jacobian
277
23k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
252
21k
Product Roadmaps are Hard
iamctodd
PRO
53
11k
Embracing the Ebb and Flow
colly
86
4.7k
Practical Orchestrator
shlominoach
188
11k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
How GitHub (no longer) Works
holman
314
140k
Transcript
HUMAN CLONING The Data Scientist bottleneck resolved Dr Alex Farquhar
Friday, 24 February 2012
0 5,000 10,000 15,000 20,000 2008 2009 2010 2011 2012
2013 2014 2015 2016 2017 exabytes data (IDC/EMC report 2008) Friday, 24 February 2012
By 2018, the United States alone could face a shortage
of 140,000 to 190,000 data people... Friday, 24 February 2012
WE’RE ALL DOOMED Friday, 24 February 2012
DATA PEOPLE? © Drew Conway Friday, 24 February 2012
MAYBE WE CAN JUST.... • 1 statistician + 1 developer
≈ 1 data scientist? Friday, 24 February 2012
HOW ABOUT.... • 4 statisticians + 4 developers ≈ 4
Data Scientists? Friday, 24 February 2012
Friday, 24 February 2012
Friday, 24 February 2012
WHAT CAN WE DO? • Train more new data scientists
(not fast enough) • Cross-train people • Cobble together different skills in teams (see above) Friday, 24 February 2012
WHAT CAN WE DO? • Do more work Friday, 24
February 2012
DOING MORE • simplify (fob the work off) • automate
(fob even more work off) • choose/build the right tools • parallelise • iterate Friday, 24 February 2012
SIMPLIFY & AUTOMATE • Counting stuff is not much fun
Friday, 24 February 2012
Hive Hadoop TSV files SIMPLIFY & AUTOMATE Friday, 24 February
2012
AUTOMATE / PARALLELISE Hadoop Job magic Friday, 24 February 2012
AUTOMATE / PARALLELISE Lots of jobs at once Job 1
Job 2 Job 3 Job 4 Hadoop magic Friday, 24 February 2012
TOOLS • something thats allows fast iteration i.e. not java
• R, ruby, python Friday, 24 February 2012
PARALLELISE Friday, 24 February 2012
ITERATE • try different things • improve what works •
dump what doesn’t • constant improvement & learning → get faster Friday, 24 February 2012
WE’RE NOT ALL DOOMED Friday, 24 February 2012