Platforms for Data Science
Deepak Singh
October 01, 2011
Talk given at the "Computing on the Brink" series
Transcript
There is no magic. There is only awesome.
Deepak Singh
Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
image: Leo Reynolds
hard problem
really hard problem
so how do we get there?
information platforms
Image: Drew Conway
dataspaces
Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist", in Beautiful Data
the unreasonable effectiveness of data
Halevy, et al. IEEE Intelligent Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people
Credit: Pieter Musterd, under a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
Credit: Angel Pizarro, U. Penn
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
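For readers new to the pattern, here is a minimal, single-process sketch of the map, shuffle, and reduce steps applied to a toy genomics task (counting k-mers in short reads). It is not how the tools linked above are implemented; the function names, the reads, and the choice of k are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_read(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-length window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k=3):
    """Run map, shuffle, and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_read(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


# Hypothetical toy reads, not real sequencing data.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads, k=3)
print(counts)
```

In a real cluster the map calls run in parallel across many machines and the framework performs the shuffle over the network; the per-record logic stays about this small.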
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
30,472 cores
$1279/hr
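As a rough worked figure, and assuming the two numbers above describe the same cluster (the slides do not say so explicitly), the implied price per core-hour can be computed directly. The variable names are illustrative only.

```python
# Figures taken from the two slides above; treating them as one cluster is an assumption.
cores = 30_472
cost_per_hour_usd = 1279

# Implied price of one core for one hour.
cost_per_core_hour = cost_per_hour_usd / cores
print(f"${cost_per_core_hour:.4f} per core-hour")
```

That works out to roughly four cents per core-hour under that assumption.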
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
deesingh@amazon.com
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi, under a CC-BY-NC-SA license