Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
110
Platforms for scientific data analysis
mndoci
3
110
FGED Keynote
mndoci
3
97
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
260
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
190
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
120
Other Decks in Technology
See All in Technology
Embedded SREの終わりを設計する 「なんとなく」から計画的な自立支援へ
sansantech
PRO
2
550
MySQLのJSON機能の活用術
ikomachi226
0
120
3分でわかる!新機能 AWS Transform custom
sato4mi
1
270
学生・新卒・ジュニアから目指すSRE
hiroyaonoe
1
260
Digitization部 紹介資料
sansan33
PRO
1
6.7k
Regional_NAT_Gatewayについて_basicとの違い_試した内容スケールアウト_インについて_IPv6_dual_networkでの使い分けなど.pdf
cloudevcode
1
200
名刺メーカーDevグループ 紹介資料
sansan33
PRO
0
1k
漸進的過負荷の原則
sansantech
PRO
3
430
システムのアラート調査をサポートするAI Agentの紹介/Introduction to an AI Agent for System Alert Investigation
taddy_919
2
1.1k
開発メンバーが語るFindy Conferenceの裏側とこれから
sontixyou
2
420
フロントエンド開発者のための「厄払い」
optim
0
180
Data Hubグループ 紹介資料
sansan33
PRO
0
2.7k
Featured
See All Featured
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
1
1.3k
Bioeconomy Workshop: Dr. Julius Ecuru, Opportunities for a Bioeconomy in West Africa
akademiya2063
PRO
1
52
The Anti-SEO Checklist Checklist. Pubcon Cyber Week
ryanjones
0
53
GraphQLの誤解/rethinking-graphql
sonatard
74
11k
Lessons Learnt from Crawling 1000+ Websites
charlesmeaden
PRO
1
1.1k
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
210
The B2B funnel & how to create a winning content strategy
katarinadahlin
PRO
0
260
Utilizing Notion as your number one productivity tool
mfonobong
2
210
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
7.9k
Why Mistakes Are the Best Teachers: Turning Failure into a Pathway for Growth
auna
0
49
Leo the Paperboy
mayatellez
4
1.4k
A Tale of Four Properties
chriscoyier
162
24k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license