Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Human Cloning: The Data Scientist Bottleneck Re...
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Data Science London
July 03, 2012
Technology
2
140
Human Cloning: The Data Scientist Bottleneck Resolved
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Data Science London
July 03, 2012
Tweet
Share
More Decks by Data Science London
See All by Data Science London
Semi-Supervised Anomaly Detection
datasciencelondon
0
1.1k
Hacking the Rail: Ingesting, analysing & visualising realtime streaming data
datasciencelondon
1
47k
Stateful Data-Parallel Processing
datasciencelondon
0
47k
Semantic web warmed up: Ontologies for the IoT
datasciencelondon
0
130
IoT data ingestion pipelines and Clojure transducers
datasciencelondon
0
290
TrendCalculus: A data science for trends
datasciencelondon
1
48k
Data Science in Mobile Health
datasciencelondon
1
8.3k
Large-scale Recommender Systems on Just a PC (with GraphChi)
datasciencelondon
1
17k
Taming Graph Dynamics at Scale
datasciencelondon
0
8.1k
Other Decks in Technology
See All in Technology
顧客との商談議事録をみんなで読んで顧客解像度を上げよう
shibayu36
0
260
ブロックテーマ、WordPress でウェブサイトをつくるということ / 2026.02.07 Gifu WordPress Meetup
torounit
0
190
プロダクト成長を支える開発基盤とスケールに伴う課題
yuu26
4
1.3k
Contract One Engineering Unit 紹介資料
sansan33
PRO
0
13k
Bill One急成長の舞台裏 開発組織が直面した失敗と教訓
sansantech
PRO
2
380
コスト削減から「セキュリティと利便性」を担うプラットフォームへ
sansantech
PRO
3
1.5k
[CV勉強会@関東 World Model 読み会] Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (Mousakhan+, NeurIPS 2025)
abemii
0
140
顧客の言葉を、そのまま信じない勇気
yamatai1212
1
360
Context Engineeringが企業で不可欠になる理由
hirosatogamo
PRO
3
620
Sansan Engineering Unit 紹介資料
sansan33
PRO
1
3.9k
ランサムウェア対策としてのpnpm導入のススメ
ishikawa_satoru
0
180
Webhook best practices for rock solid and resilient deployments
glaforge
2
300
Featured
See All Featured
How People are Using Generative and Agentic AI to Supercharge Their Products, Projects, Services and Value Streams Today
helenjbeal
1
130
How To Speak Unicorn (iThemes Webinar)
marktimemedia
1
380
Six Lessons from altMBA
skipperchong
29
4.2k
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
350
What's in a price? How to price your products and services
michaelherold
247
13k
Lessons Learnt from Crawling 1000+ Websites
charlesmeaden
PRO
1
1.1k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation
inesmontani
PRO
3
2k
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
1
1.3k
Optimizing for Happiness
mojombo
379
71k
[SF Ruby Conf 2025] Rails X
palkan
1
760
Google's AI Overviews - The New Search
badams
0
910
Transcript
HUMAN CLONING The Data Scientist bottleneck resolved Dr Alex Farquhar
Friday, 24 February 2012
0 5,000 10,000 15,000 20,000 2008 2009 2010 2011 2012
2013 2014 2015 2016 2017 exabytes data (IDC/EMC report 2008) Friday, 24 February 2012
By 2018, the United States alone could face a shortage
of 140,000 to 190,000 data people... Friday, 24 February 2012
WE’RE ALL DOOMED Friday, 24 February 2012
DATA PEOPLE? © Drew Conway Friday, 24 February 2012
MAYBE WE CAN JUST.... • 1 statistician + 1 developer
≈ 1 data scientist? Friday, 24 February 2012
HOW ABOUT.... • 4 statisticians + 4 developers ≈ 4
Data Scientists? Friday, 24 February 2012
Friday, 24 February 2012
Friday, 24 February 2012
WHAT CAN WE DO? • Train more new data scientists
(not fast enough) • Cross-train people • Cobble together different skills in teams (see above) Friday, 24 February 2012
WHAT CAN WE DO? • Do more work Friday, 24
February 2012
DOING MORE • simplify (fob the work off) • automate
(fob even more work off) • choose/build the right tools • parallelise • iterate Friday, 24 February 2012
SIMPLIFY & AUTOMATE • Counting stuff is not much fun
Friday, 24 February 2012
Hive Hadoop TSV files SIMPLIFY & AUTOMATE Friday, 24 February
2012
AUTOMATE / PARALLELISE Hadoop Job magic Friday, 24 February 2012
AUTOMATE / PARALLELISE Lots of jobs at once Job 1
Job 2 Job 3 Job 4 Hadoop magic Friday, 24 February 2012
TOOLS • something thats allows fast iteration i.e. not java
• R, ruby, python Friday, 24 February 2012
PARALLELISE Friday, 24 February 2012
ITERATE • try different things • improve what works •
dump what doesn’t • constant improvement & learning → get faster Friday, 24 February 2012
WE’RE NOT ALL DOOMED Friday, 24 February 2012