Human Cloning: The Data Scientist Bottleneck Resolved
Data Science London
July 03, 2012
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Transcript
HUMAN CLONING The Data Scientist bottleneck resolved Dr Alex Farquhar
Friday, 24 February 2012
[Chart: exabytes of data by year, 2008 to 2017, axis 0 to 20,000 exabytes (IDC/EMC report, 2008)]
By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...
WE’RE ALL DOOMED
DATA PEOPLE? [Image: data science Venn diagram, © Drew Conway]
MAYBE WE CAN JUST...
• 1 statistician + 1 developer ≈ 1 data scientist?
HOW ABOUT...
• 4 statisticians + 4 developers ≈ 4 data scientists?
WHAT CAN WE DO?
• Train more new data scientists (not fast enough)
• Cross-train people
• Cobble together different skills in teams (see above)
WHAT CAN WE DO?
• Do more work
DOING MORE
• simplify (fob the work off)
• automate (fob even more work off)
• choose/build the right tools
• parallelise
• iterate
SIMPLIFY & AUTOMATE
• Counting stuff is not much fun
SIMPLIFY & AUTOMATE
[Diagram: TSV files, Hadoop, Hive]
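The slide only names the pieces: TSV files, Hadoop and Hive. The counting being automated can be sketched locally, in the spirit of a Hive `SELECT col, COUNT(*) ... GROUP BY col` query. This is a minimal sketch, not the speaker's pipeline. It assumes tab-separated records whose first column is the value to count. The function name and sample data are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def count_by_column(files: Iterable[TextIO], column: int = 0) -> Counter:
    """Count rows per value of one column across several TSV files.

    A local stand-in for a Hive GROUP BY count; the column index is an
    assumption for illustration, not taken from the talk.
    """
    counts: Counter = Counter()
    for f in files:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) > column:  # skip blank or too-short lines
                counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical in-memory "files" standing in for daily TSV exports.
    day1 = io.StringIO("click\tuser1\nview\tuser2\nclick\tuser3\n")
    day2 = io.StringIO("view\tuser1\nclick\tuser2\n")
    print(count_by_column([day1, day2]))
```

Once a count like this is a function, it can be rerun on every new batch of files without anyone doing it by hand. That is the point of the slide. On a cluster, the same query would be handed to Hive over files stored in Hadoop.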
AUTOMATE / PARALLELISE
[Diagram: Hadoop job magic]
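The "magic" inside a Hadoop job is the MapReduce pattern, which has three phases:

1. A map step emits key/value pairs.
2. The framework sorts and groups the pairs by key (the shuffle).
3. A reduce step combines each group.

The sketch below runs all three phases in a single process, purely to show the data flow. It assumes a word-count style job, which is hypothetical since the talk does not show its jobs. The names `mapper`, `reducer` and `run_job` are illustrative.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def mapper(line: str) -> Iterator[Pair]:
    """Map step: emit (word, 1) for every word on a line."""
    for word in line.split():
        yield word.lower(), 1


def reducer(key: str, values: Iterable[int]) -> Pair:
    """Reduce step: sum all counts emitted for one key."""
    return key, sum(values)


def run_job(
    lines: Iterable[str],
    map_fn: Callable[[str], Iterator[Pair]] = mapper,
    reduce_fn: Callable[[str, Iterable[int]], Pair] = reducer,
) -> List[Pair]:
    """Run map, shuffle (sort + group by key) and reduce sequentially.

    Hadoop spreads these same phases across a cluster; nothing here is
    distributed.
    """
    mapped = [pair for line in lines for pair in map_fn(line)]
    mapped.sort(key=itemgetter(0))  # shuffle: bring equal keys together
    return [
        reduce_fn(key, (value for _, value in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    ]


if __name__ == "__main__":
    print(run_job(["the cat sat", "The dog sat"]))
    # → [('cat', 1), ('dog', 1), ('sat', 2), ('the', 2)]
```

With Hadoop Streaming, the mapper and reducer would instead be separate scripts. Each reads lines from stdin and writes tab-separated key/value lines, and Hadoop performs the sort between them.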
AUTOMATE / PARALLELISE
• Lots of jobs at once
[Diagram: Job 1, Job 2, Job 3, Job 4, Hadoop magic]
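Running "lots of jobs at once" from a driver script can be sketched with Python's standard `concurrent.futures` module. Every job is submitted up front and results are collected as each one finishes. This is a minimal sketch under stated assumptions. The jobs are hypothetical local callables; a real one might launch a Hadoop job through `subprocess` and wait on it.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict


def run_jobs_concurrently(
    jobs: Dict[str, Callable[[], object]], max_workers: int = 4
) -> Dict[str, object]:
    """Submit every job at once and gather results keyed by job name.

    Threads suit jobs that mostly wait on an external system (such as a
    job running on a cluster); CPU-bound local work would call for
    ProcessPoolExecutor instead.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in jobs.items()}
        for future in as_completed(futures):
            # .result() re-raises any exception the job raised
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    # Hypothetical stand-in jobs: job1..job4 each return a square.
    jobs = {f"job{i}": (lambda i=i: i * i) for i in range(1, 5)}
    print(run_jobs_concurrently(jobs))
```

Results are keyed by job name rather than stored in a list, because `as_completed` yields futures in completion order, not submission order.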
TOOLS
• something that allows fast iteration, i.e. not Java
• R, Ruby, Python
PARALLELISE
ITERATE
• try different things
• improve what works
• dump what doesn’t
• constant improvement & learning → get faster
WE’RE NOT ALL DOOMED