Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Human Cloning: The Data Scientist Bottleneck Re...
Search
Data Science London
July 03, 2012
Technology
2
140
Human Cloning: The Data Scientist Bottleneck Resolved
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Data Science London
July 03, 2012
Tweet
Share
More Decks by Data Science London
See All by Data Science London
Semi-Supervised Anomaly Detection
datasciencelondon
0
1k
Hacking the Rail: Ingesting, analysing & visualising realtime streaming data
datasciencelondon
1
47k
Stateful Data-Parallel Processing
datasciencelondon
0
47k
Semantic web warmed up: Ontologies for the IoT
datasciencelondon
0
130
IoT data ingestion pipelines and Clojure transducers
datasciencelondon
0
280
TrendCalculus: A data science for trends
datasciencelondon
1
48k
Data Science in Mobile Health
datasciencelondon
1
8.3k
Large-scale Recommender Systems on Just a PC (with GraphChi)
datasciencelondon
1
17k
Taming Graph Dynamics at Scale
datasciencelondon
0
8.1k
Other Decks in Technology
See All in Technology
ガチな登山用デバイスからこんにちは
halka
1
230
研究開発と製品開発、両利きのロボティクス
youtalk
1
510
Agile PBL at New Grads Trainings
kawaguti
PRO
1
380
20250910_障害注入から効率的復旧へ_カオスエンジニアリング_生成AIで考えるAWS障害対応.pdf
sh_fk2
3
190
未経験者・初心者に贈る!40分でわかるAndroidアプリ開発の今と大事なポイント
operando
3
260
DDD集約とサービスコンテキスト境界との関係性
pandayumi
2
280
La gouvernance territoriale des données grâce à la plateforme Terreze
bluehats
0
150
20250903_1つのAWSアカウントに複数システムがある環境におけるアクセス制御をABACで実現.pdf
yhana
3
530
機械学習を扱うプラットフォーム開発と運用事例
lycorptech_jp
PRO
0
220
Rustから学ぶ 非同期処理の仕組み
skanehira
1
130
生成AI時代のデータ基盤設計〜ペースレイヤリングで実現する高速開発と持続性〜 / Levtech Meetup_Session_2
sansan_randd
1
150
オブザーバビリティが広げる AIOps の世界 / The World of AIOps Expanded by Observability
aoto
PRO
0
330
Featured
See All Featured
Building Adaptive Systems
keathley
43
2.7k
RailsConf 2023
tenderlove
30
1.2k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
31
2.2k
Done Done
chrislema
185
16k
How STYLIGHT went responsive
nonsquared
100
5.8k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
15k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
187
55k
We Have a Design System, Now What?
morganepeng
53
7.8k
How to Think Like a Performance Engineer
csswizardry
26
1.9k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
3k
Testing 201, or: Great Expectations
jmmastey
45
7.6k
Making Projects Easy
brettharned
117
6.4k
Transcript
HUMAN CLONING The Data Scientist bottleneck resolved Dr Alex Farquhar
Friday, 24 February 2012
0 5,000 10,000 15,000 20,000 2008 2009 2010 2011 2012
2013 2014 2015 2016 2017 exabytes data (IDC/EMC report 2008) Friday, 24 February 2012
By 2018, the United States alone could face a shortage
of 140,000 to 190,000 data people... Friday, 24 February 2012
WE’RE ALL DOOMED Friday, 24 February 2012
DATA PEOPLE? © Drew Conway Friday, 24 February 2012
MAYBE WE CAN JUST.... • 1 statistician + 1 developer
≈ 1 data scientist? Friday, 24 February 2012
HOW ABOUT.... • 4 statisticians + 4 developers ≈ 4
Data Scientists? Friday, 24 February 2012
Friday, 24 February 2012
Friday, 24 February 2012
WHAT CAN WE DO? • Train more new data scientists
(not fast enough) • Cross-train people • Cobble together different skills in teams (see above) Friday, 24 February 2012
WHAT CAN WE DO? • Do more work Friday, 24
February 2012
DOING MORE • simplify (fob the work off) • automate
(fob even more work off) • choose/build the right tools • parallelise • iterate Friday, 24 February 2012
SIMPLIFY & AUTOMATE • Counting stuff is not much fun
Friday, 24 February 2012
Hive Hadoop TSV files SIMPLIFY & AUTOMATE Friday, 24 February
2012
AUTOMATE / PARALLELISE Hadoop Job magic Friday, 24 February 2012
AUTOMATE / PARALLELISE Lots of jobs at once Job 1
Job 2 Job 3 Job 4 Hadoop magic Friday, 24 February 2012
TOOLS • something thats allows fast iteration i.e. not java
• R, ruby, python Friday, 24 February 2012
PARALLELISE Friday, 24 February 2012
ITERATE • try different things • improve what works •
dump what doesn’t • constant improvement & learning → get faster Friday, 24 February 2012
WE’RE NOT ALL DOOMED Friday, 24 February 2012