Data Science London
July 03, 2012
Human Cloning: The Data Scientist Bottleneck Resolved
Presentation by Dr. Alex Farquhar, Data Scientist @ForwardTek at Data Science London 22/02/12
Transcript
HUMAN CLONING
The Data Scientist bottleneck resolved
Dr Alex Farquhar
Friday, 24 February 2012
[Chart: exabytes of data per year, 2008 to 2017 (IDC/EMC report 2008)]
By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...
WE’RE ALL DOOMED
DATA PEOPLE? © Drew Conway
MAYBE WE CAN JUST...
• 1 statistician + 1 developer ≈ 1 data scientist?
HOW ABOUT...
• 4 statisticians + 4 developers ≈ 4 data scientists?
WHAT CAN WE DO?
• Train more new data scientists (not fast enough)
• Cross-train people
• Cobble together different skills in teams (see above)
WHAT CAN WE DO?
• Do more work
DOING MORE
• simplify (fob the work off)
• automate (fob even more work off)
• choose/build the right tools
• parallelise
• iterate
SIMPLIFY & AUTOMATE
• Counting stuff is not much fun
SIMPLIFY & AUTOMATE
[Diagram: Hive, Hadoop, TSV files]
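Counting records in TSV files is the kind of job that maps naturally onto Hadoop (or a Hive `SELECT col, COUNT(*) ... GROUP BY col` query). As a minimal sketch only, assuming tab-separated input with the key to count in a single column, the same map/sort/reduce logic can be written in plain Python in the style of a Hadoop Streaming mapper and reducer. The function names and the `column` parameter are hypothetical illustrations, not the speaker's code:

```python
from itertools import groupby


def mapper(lines, column=0):
    """Emit (key, 1) for the chosen column of each tab-separated line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:  # skip lines too short to have that column
            yield fields[column], 1


def reducer(pairs):
    """Sum counts per key; assumes pairs arrive sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def count_column(lines, column=0):
    # sorted() stands in for Hadoop's shuffle/sort between map and reduce.
    return dict(reducer(sorted(mapper(lines, column))))


sample = ["uk\tclick", "us\tview", "uk\tview"]
print(count_column(sample))  # {'uk': 2, 'us': 1}
```

Keeping the mapper and reducer as separate functions mirrors how the work would be split across a cluster; locally, `sorted()` does the grouping that Hadoop would do between the two phases.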
AUTOMATE / PARALLELISE
Hadoop Job magic
AUTOMATE / PARALLELISE
• Lots of jobs at once
[Diagram: Job 1, Job 2, Job 3, Job 4, Hadoop magic]
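Running lots of independent jobs at once, rather than one after another, is straightforward from a scripting language. A minimal sketch using Python's standard `concurrent.futures`, assuming each job is independent and mostly waits on something external (such as a cluster), so threads are sufficient. `run_job` is a hypothetical placeholder for whatever actually launches a job:

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id):
    """Hypothetical placeholder for one independent job (e.g. one Hadoop run)."""
    # Real code might call out to a job launcher here and wait for it.
    return f"Job {job_id} done"


def run_all(job_ids, max_workers=4):
    # Run up to max_workers jobs concurrently; map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, job_ids))


print(run_all([1, 2, 3, 4]))
# ['Job 1 done', 'Job 2 done', 'Job 3 done', 'Job 4 done']
```

For CPU-bound work done locally, `ProcessPoolExecutor` is the usual drop-in alternative with the same `map` interface.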
TOOLS
• something that allows fast iteration, i.e. not Java
• R, Ruby, Python
PARALLELISE
ITERATE
• try different things
• improve what works
• dump what doesn’t
• constant improvement & learning → get faster
WE’RE NOT ALL DOOMED