Data Science London
July 03, 2012
Human Cloning: The Data Scientist Bottleneck Resolved
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Transcript
HUMAN CLONING The Data Scientist bottleneck resolved Dr Alex Farquhar
Friday, 24 February 2012
Chart: data volume in exabytes by year, 2008 to 2017, axis from 0 to 20,000 (IDC/EMC report 2008)
By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...
WE’RE ALL DOOMED
DATA PEOPLE? © Drew Conway
MAYBE WE CAN JUST...
• 1 statistician + 1 developer ≈ 1 data scientist?
HOW ABOUT...
• 4 statisticians + 4 developers ≈ 4 data scientists?
WHAT CAN WE DO?
• Train more new data scientists (not fast enough)
• Cross-train people
• Cobble together different skills in teams (see above)
WHAT CAN WE DO?
• Do more work
DOING MORE
• simplify (fob the work off)
• automate (fob even more work off)
• choose/build the right tools
• parallelise
• iterate
SIMPLIFY & AUTOMATE
• Counting stuff is not much fun
SIMPLIFY & AUTOMATE
Diagram labels: Hive, Hadoop, TSV files
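The slides name Hive, Hadoop and TSV files but do not show the counting itself. As a rough illustration only, here is the kind of count that is tedious by hand, written in plain Python rather than Hive. The function name `count_column` and the sample data are made up for this sketch and do not come from the talk.

```python
import csv
import io
from collections import Counter


def count_column(tsv_text, column):
    """Count how often each value appears in one column of TSV text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in reader)


# Made-up sample data, not from the talk.
SAMPLE_TSV = "user\tpage\nann\thome\nbob\thome\nann\tpricing\n"

page_counts = count_column(SAMPLE_TSV, "page")
print(page_counts.most_common())
```

Once a count like this is wrapped in a function, it can be rerun on new files without further manual work, which is the point of the "simplify" and "automate" bullets.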
AUTOMATE / PARALLELISE
Diagram labels: Job, Hadoop, magic
AUTOMATE / PARALLELISE
Lots of jobs at once
Diagram labels: Job 1, Job 2, Job 3, Job 4, Hadoop, magic
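The "lots of jobs at once" idea can be sketched on a single machine with Python's standard `concurrent.futures` module. This is a local stand-in under stated assumptions, not the Hadoop setup from the talk: `run_job`, `run_all` and the job inputs are hypothetical names and data invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id, records):
    """Hypothetical job: count the records it was given."""
    return job_id, len(records)


# Made-up inputs for four independent jobs.
JOB_INPUTS = {
    "job 1": ["a", "b", "c"],
    "job 2": ["d"],
    "job 3": [],
    "job 4": ["e", "f"],
}


def run_all(job_inputs, max_workers=4):
    """Submit every job at once and collect the results by job id."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_job, job_id, records)
            for job_id, records in job_inputs.items()
        ]
        return dict(future.result() for future in futures)


results = run_all(JOB_INPUTS)
print(results)
```

The jobs do not depend on each other, so they can be submitted together and collected afterwards. A thread pool keeps the sketch small; it does not give the cluster-scale parallelism the slides attribute to Hadoop.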
TOOLS
• something that allows fast iteration, i.e. not Java
• R, Ruby, Python
PARALLELISE
ITERATE
• try different things
• improve what works
• dump what doesn’t
• constant improvement & learning → get faster
WE’RE NOT ALL DOOMED