Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
190
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
95
Platforms for scientific data analysis
mndoci
3
85
FGED Keynote
mndoci
3
87
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
250
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
170
Talk at West Coast Association of Shared Directors meeting
mndoci
3
140
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
100
Other Decks in Technology
See All in Technology
実践! ソフトウェアエンジニアリングの価値の計測 ── Effort、Output、Outcome、Impact
nomuson
0
2.1k
自社 200 記事を元に整理した読みやすいテックブログを書くための Tips 集
masakihirose
2
330
0→1事業こそPMは営業すべし / pmconf #落選お披露目 / PM should do sales in zero to one
roki_n_
PRO
1
1.5k
データ基盤におけるIaCの重要性とその運用
mtpooh
4
530
Git scrapingで始める継続的なデータ追跡 / Git Scraping
ohbarye
5
500
KMP with Crashlytics
sansantech
PRO
0
240
iPadOS18でフローティングタブバーを解除してみた
sansantech
PRO
1
140
embedパッケージを深掘りする / Deep Dive into embed Package in Go
task4233
1
220
2025年の挑戦 コーポレートエンジニアの技術広報/techpr5
nishiuma
0
140
2024年活動報告会(人材育成推進WG・ビジネスサブWG) / 20250114-OIDF-J-EduWG-BizSWG
oidfj
0
230
ABWGのRe:Cap!
hm5ug
1
120
デジタルアイデンティティ人材育成推進ワーキンググループ 翻訳サブワーキンググループ 活動報告 / 20250114-OIDF-J-EduWG-TranslationSWG
oidfj
0
540
Featured
See All Featured
Large-scale JavaScript Application Architecture
addyosmani
510
110k
Building Applications with DynamoDB
mza
93
6.2k
Fantastic passwords and where to find them - at NoRuKo
philnash
50
2.9k
Practical Orchestrator
shlominoach
186
10k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
7
570
Why Our Code Smells
bkeepers
PRO
335
57k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
44
9.4k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
10
870
Making the Leap to Tech Lead
cromwellryan
133
9k
Gamification - CAS2011
davidbonilla
80
5.1k
Testing 201, or: Great Expectations
jmmastey
41
7.2k
Designing Experiences People Love
moore
139
23k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license