Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Human Cloning: The Data Scientist Bottleneck Re...
Search
Data Science London
July 03, 2012
Technology
2
130
Human Cloning: The Data Scientist Bottleneck Resolved
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Data Science London
July 03, 2012
Tweet
Share
More Decks by Data Science London
See All by Data Science London
Semi-Supervised Anomaly Detection
datasciencelondon
0
930
Hacking the Rail: Ingesting, analysing & visualising realtime streaming data
datasciencelondon
1
47k
Stateful Data-Parallel Processing
datasciencelondon
0
47k
Semantic web warmed up: Ontologies for the IoT
datasciencelondon
0
120
IoT data ingestion pipelines and Clojure transducers
datasciencelondon
0
260
TrendCalculus: A data science for trends
datasciencelondon
1
48k
Data Science in Mobile Health
datasciencelondon
1
8.3k
Large-scale Recommender Systems on Just a PC (with GraphChi)
datasciencelondon
1
17k
Taming Graph Dynamics at Scale
datasciencelondon
0
8.1k
Other Decks in Technology
See All in Technology
AGIについてChatGPTに聞いてみた
blueb
0
130
CysharpのOSS群から見るModern C#の現在地
neuecc
2
3.5k
10XにおけるData Contractの導入について: Data Contract事例共有会
10xinc
6
660
マルチモーダル / AI Agent / LLMOps 3つの技術トレンドで理解するLLMの今後の展望
hirosatogamo
37
12k
ExaDB-D dbaascli で出来ること
oracle4engineer
PRO
0
3.9k
適材適所の技術選定 〜GraphQL・REST API・tRPC〜 / Optimal Technology Selection
kakehashi
1
690
iOSチームとAndroidチームでブランチ運用が違ったので整理してます
sansantech
PRO
0
150
TypeScript、上達の瞬間
sadnessojisan
46
13k
AIチャットボット開発への生成AI活用
ryomrt
0
170
Application Development WG Intro at AppDeveloperCon
salaboy
0
190
rootlessコンテナのすゝめ - 研究室サーバーでもできる安全なコンテナ管理
kitsuya0828
3
390
EventHub Startup CTO of the year 2024 ピッチ資料
eventhub
0
120
Featured
See All Featured
Product Roadmaps are Hard
iamctodd
PRO
49
11k
Optimising Largest Contentful Paint
csswizardry
33
2.9k
Documentation Writing (for coders)
carmenintech
65
4.4k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
665
120k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
48k
A Tale of Four Properties
chriscoyier
156
23k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
506
140k
Into the Great Unknown - MozCon
thekraken
32
1.5k
Navigating Team Friction
lara
183
14k
Thoughts on Productivity
jonyablonski
67
4.3k
BBQ
matthewcrist
85
9.3k
Building Your Own Lightsaber
phodgson
103
6.1k
Transcript
HUMAN CLONING The Data Scientist bottleneck resolved Dr Alex Farquhar
Friday, 24 February 2012
0 5,000 10,000 15,000 20,000 2008 2009 2010 2011 2012
2013 2014 2015 2016 2017 exabytes data (IDC/EMC report 2008) Friday, 24 February 2012
By 2018, the United States alone could face a shortage
of 140,000 to 190,000 data people... Friday, 24 February 2012
WE’RE ALL DOOMED Friday, 24 February 2012
DATA PEOPLE? © Drew Conway Friday, 24 February 2012
MAYBE WE CAN JUST.... • 1 statistician + 1 developer
≈ 1 data scientist? Friday, 24 February 2012
HOW ABOUT.... • 4 statisticians + 4 developers ≈ 4
Data Scientists? Friday, 24 February 2012
Friday, 24 February 2012
Friday, 24 February 2012
WHAT CAN WE DO? • Train more new data scientists
(not fast enough) • Cross-train people • Cobble together different skills in teams (see above) Friday, 24 February 2012
WHAT CAN WE DO? • Do more work Friday, 24
February 2012
DOING MORE • simplify (fob the work off) • automate
(fob even more work off) • choose/build the right tools • parallelise • iterate Friday, 24 February 2012
SIMPLIFY & AUTOMATE • Counting stuff is not much fun
Friday, 24 February 2012
Hive Hadoop TSV files SIMPLIFY & AUTOMATE Friday, 24 February
2012
AUTOMATE / PARALLELISE Hadoop Job magic Friday, 24 February 2012
AUTOMATE / PARALLELISE Lots of jobs at once Job 1
Job 2 Job 3 Job 4 Hadoop magic Friday, 24 February 2012
TOOLS • something thats allows fast iteration i.e. not java
• R, ruby, python Friday, 24 February 2012
PARALLELISE Friday, 24 February 2012
ITERATE • try different things • improve what works •
dump what doesn’t • constant improvement & learning → get faster Friday, 24 February 2012
WE’RE NOT ALL DOOMED Friday, 24 February 2012