Platforms for Data Science
Deepak Singh
October 01, 2011
Technology
Talk given at the "Computing on the Brink" series
Transcript
There is no magic
There is only awesome
Deepak Singh
Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
image: Leo Reynolds
hard problem
really hard problem
so how do we get there?
information platforms
Image: Drew Conway
dataspaces
Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist", in Beautiful Data
the unreasonable effectiveness of data
Halevy et al., IEEE Intelligent Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people
Credit: Pieter Musterd, under a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
Credit: Angel Pizarro, U. Penn
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
30,472 cores
$1279/hr
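The two figures above are easier to compare with other options when expressed per core. A minimal sketch of that arithmetic, assuming the $1279/hr is the price of the whole 30,472-core cluster (the slides do not say so explicitly); the function names and the 8-hour job length are hypothetical, chosen only for illustration:

```python
# Figures taken from the slides; that the hourly price covers the
# whole cluster is an assumption, not something the slides state.
CORES = 30_472
CLUSTER_COST_PER_HOUR_USD = 1279.0


def cost_per_core_hour(cluster_cost_per_hour: float, cores: int) -> float:
    """Return the hourly price of a single core, in USD."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cluster_cost_per_hour / cores


def run_cost(cluster_cost_per_hour: float, hours: float) -> float:
    """Return the total price of keeping the cluster up for `hours`."""
    return cluster_cost_per_hour * hours


if __name__ == "__main__":
    per_core = cost_per_core_hour(CLUSTER_COST_PER_HOUR_USD, CORES)
    print(f"cost per core-hour: ${per_core:.4f}")
    # Hypothetical 8-hour job, for illustration only.
    print(f"8-hour run: ${run_cost(CLUSTER_COST_PER_HOUR_USD, 8):,.2f}")
```

Under that assumption the price works out to roughly four cents per core-hour, which is the sense in which compute is treated as a fungible commodity earlier in the talk.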
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi, under a CC-BY-NC-SA license