Platforms for Data Science
Deepak Singh
October 01, 2011
Talk given at the "Computing on the Brink" series
Transcript
There is no magic. There is only awesome.
Deepak Singh
Platforms for data science
bioinformatics
Image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
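To make the list above concrete, here is a minimal Python sketch of how versioning, provenance, filtering and aggregation might fit together. It is a hypothetical illustration, not something from the talk: the `Dataset` class, its methods and the sample read-quality records are all invented names and data. Each transformation returns a new, immutable version that keeps a link to its parent, so the provenance of a derived dataset can always be traced back to its source.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Dataset:
    """Hypothetical immutable dataset that carries its own provenance."""
    name: str
    rows: tuple
    version: int = 1
    parent: Optional["Dataset"] = None
    operation: str = "source"

    def derive(self, operation: str, rows) -> "Dataset":
        # Every transformation yields a new version that points back at its parent.
        return Dataset(self.name, tuple(rows), self.version + 1, self, operation)

    def filter(self, predicate: Callable[[dict], bool], label: str) -> "Dataset":
        # Keep only the rows for which the predicate is true.
        return self.derive(f"filter({label})", (r for r in self.rows if predicate(r)))

    def aggregate(self, key: str, value: str) -> "Dataset":
        # Sum `value` per distinct `key`, preserving first-seen key order.
        totals: dict = {}
        for r in self.rows:
            totals[r[key]] = totals.get(r[key], 0) + r[value]
        return self.derive(f"aggregate({key}, sum {value})",
                           ({key: k, value: v} for k, v in totals.items()))

    def lineage(self) -> list:
        # Walk parent links back to the source, then return oldest first.
        steps, node = [], self
        while node is not None:
            steps.append((node.version, node.operation))
            node = node.parent
        return list(reversed(steps))

# Hypothetical sample data: per-sample read counts with a quality score.
reads = Dataset("reads", (
    {"sample": "A", "count": 10, "quality": 35},
    {"sample": "A", "count": 5, "quality": 12},
    {"sample": "B", "count": 7, "quality": 30},
))
summary = reads.filter(lambda r: r["quality"] >= 20, "quality>=20").aggregate("sample", "count")
print(summary.rows)       # ({'sample': 'A', 'count': 10}, {'sample': 'B', 'count': 7})
print(summary.lineage())  # [(1, 'source'), (2, 'filter(quality>=20)'), (3, 'aggregate(sample, sum count)')]
```

Making each version immutable is a deliberate choice in this sketch: the original `reads` dataset is never modified, so earlier versions stay reproducible while derived views are built on top of them.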
image: Leo Reynolds
hard problem
really hard problem
so how do we get there?
information platforms
Image: Drew Conway
dataspaces
Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist," Beautiful Data
the unreasonable effectiveness of data
Halevy et al., IEEE Intelligent Systems 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people
Credit: Pieter Musterd, under a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
Credit: Angel Pizarro, U. Penn
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
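The projects linked above use MapReduce to parallelize genomics workloads. The sketch below is not code from Crossbow, Contrail or Myrna. It is a minimal, single-process Python illustration of the three MapReduce phases (map, shuffle, reduce), using k-mer counting over short reads as an assumed example workload. The function names and the k-mer length `K = 3` are illustrative choices. In a real cluster the framework would run the map and reduce functions on many machines and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

K = 3  # k-mer length; an arbitrary illustrative choice

def map_read(read: str, k: int = K) -> Iterator[tuple[str, int]]:
    # Map phase: emit (k-mer, 1) for every overlapping k-mer in one read.
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle phase: group intermediate values by key, as the framework would.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce phase: combine all intermediate values for one key.
    return key, sum(values)

def kmer_count(reads: Iterable[str], k: int = K) -> dict[str, int]:
    # Chain the three phases together for a small in-memory input.
    mapped = chain.from_iterable(map_read(r, k) for r in reads)
    return dict(reduce_counts(key, vals) for key, vals in shuffle(mapped).items())

if __name__ == "__main__":
    print(kmer_count(["ACGTA", "CGTAC"]))  # {'ACG': 1, 'CGT': 2, 'GTA': 2, 'TAC': 1}
```

Because each read is mapped independently and each k-mer is reduced independently, both phases can be split across as many workers as are available. That property is what makes the pattern a good fit for on-demand, distributed compute.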
Bioproximity
http://aws.amazon.com/solutions/case-studies/bioproximity/
30,472 cores
$1279/hr
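Dividing the two figures on this slide gives the implied cost per core-hour. This is simple arithmetic on the numbers shown and assumes no other pricing details:

```latex
\frac{\$1279 \text{ per hour}}{30{,}472 \text{ cores}} \approx \$0.042 \text{ per core-hour}
```

That is roughly 4.2 cents per core for each hour the cluster runs.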
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi, under a CC-BY-NC-SA license