Platforms for Data Science
Deepak Singh
October 01, 2011
Talk given at the "Computing on the Brink" series
Transcript
There is no magic
There is only awesome
Deepak Singh
Platforms for data science
bioinformatics
image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
image: Leo Reynolds
hard problem
really hard problem
so how do we get there?
information platforms
Image: Drew Conway
dataspaces
Further reading: Jeff Hammerbacher, Information Platforms and the rise of the data scientist, Beautiful Data
the unreasonable effectiveness of data
Halevy, et al. IEEE Intelligent Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people
Credit: Pieter Musterd, under a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
Credit: Angel Pizarro, U. Penn
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
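For readers new to the pattern, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a toy genomics task (counting k-mers in short reads). It is not how Crossbow, Contrail, or Myrna are implemented; the function names and the two reads are made up for illustration, and a real job would distribute these phases across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_read(read, k=3):
    """Map phase: emit a (k-mer, 1) pair for every k-length window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k=3):
    """Run map, shuffle, and reduce in sequence on a list of reads."""
    mapped = chain.from_iterable(map_read(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


# Toy reads, invented for this example.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads, k=3)
print(counts)
```

Because each read is mapped independently and each key is reduced independently, both phases can be spread over as many workers as are available.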
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
30,472 cores
$1279/hr
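As a rough worked figure, the two numbers above imply a price per core-hour. The sketch below only divides one by the other; it assumes the $1279/hr figure covers the whole 30,472-core cluster, which the slides do not state explicitly.

```python
cores = 30_472                    # core count from the slide
cluster_cost_per_hour = 1279.0    # USD per hour from the slide (assumed to be for the whole cluster)

# Implied price of one core for one hour.
cost_per_core_hour = cluster_cost_per_hour / cores
print(f"{cost_per_core_hour:.3f} USD per core-hour")  # prints "0.042 USD per core-hour"
```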
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi, under a CC-BY-NC-SA license