Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
110
Platforms for scientific data analysis
mndoci
3
100
FGED Keynote
mndoci
3
93
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
260
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
180
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
110
Other Decks in Technology
See All in Technology
法人支出管理領域におけるソフトウェアアーキテクチャに基づいたテスト戦略の実践
ogugu9
1
110
あなたの知らないDateのひみつ / The Secret of "Date" You Haven't known #tqrk16
expajp
0
110
こがヘンだよ!Snowflake?サービス名称へのこだわり
tarotaro0129
0
110
名刺メーカーDevグループ 紹介資料
sansan33
PRO
0
980
オープンデータの内製化から分かったGISデータを巡る行政の課題
naokim84
2
1.3k
段階的に進める、 挫折しない自宅サーバ入門
yu_kod
5
2.2k
HIG学習用スライド
yuukiw00w
0
110
プラットフォームエンジニアリングとは何であり、なぜプラットフォームエンジニアリングなのか
doublemarket
1
550
“決まらない”NSM設計への処方箋 〜ビットキーにおける現実的な指標デザイン事例〜 / A Prescription for "Stuck" NSM Design: Bitkey’s Practical Case Study
bitkey
PRO
1
340
なぜ使われないのか?──定量×定性で見極める本当のボトルネック
kakehashi
PRO
1
760
Playwrightのソースコードに見る、自動テストを自動で書く技術
yusukeiwaki
2
720
All About Sansan – for New Global Engineers
sansan33
PRO
1
1.3k
Featured
See All Featured
Product Roadmaps are Hard
iamctodd
PRO
55
12k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
31
3k
Code Review Best Practice
trishagee
73
19k
Raft: Consensus for Rubyists
vanstee
140
7.2k
Build your cross-platform service in a week with App Engine
jlugia
234
18k
Speed Design
sergeychernyshev
33
1.4k
The Cult of Friendly URLs
andyhume
79
6.7k
Become a Pro
speakerdeck
PRO
30
5.7k
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.3k
Principles of Awesome APIs and How to Build Them.
keavy
127
17k
A designer walks into a library…
pauljervisheath
210
24k
Six Lessons from altMBA
skipperchong
29
4.1k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license