Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
190
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
100
Platforms for scientific data analysis
mndoci
3
91
FGED Keynote
mndoci
3
89
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
250
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
170
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
110
Other Decks in Technology
See All in Technology
AWSのマルチアカウント管理 ベストプラクティス最新版 2025 / Multi-Account management on AWS best practice 2025
ohmura
4
310
Mastraに入門してみた ~AWS CDKを添えて~
tsukuboshi
0
280
ElixirがHW化され、最新CPU/GPU/NWを過去のものとする数万倍、高速+超省電力化されたWeb/動画配信/AIが動く日
piacerex
0
150
Cross Data Platforms Meetup LT 20250422
tarotaro0129
1
700
PagerDuty×ポストモーテムで築く障害対応文化/Building a culture of incident response with PagerDuty and postmortems
aeonpeople
2
350
AIでめっちゃ便利になったけど、結局みんなで学ぶよねっていう話
kakehashi
PRO
0
280
SDカードフォレンジック
su3158
1
630
Linuxのパッケージ管理とアップデート基礎知識
go_nishimoto
0
390
プロダクト開発におけるAI時代の開発生産性
shnjtk
2
240
Spring Bootで実装とインフラをこれでもかと分離するための試み
shintanimoto
7
860
意思決定を支える検索体験を目指してやってきたこと
hinatades
PRO
0
220
AI Agentを「期待通り」に動かすために:設計アプローチの模索と現在地
kworkdev
PRO
2
470
Featured
See All Featured
Build your cross-platform service in a week with App Engine
jlugia
229
18k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
13
1.4k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
30
2.3k
How GitHub (no longer) Works
holman
314
140k
Making the Leap to Tech Lead
cromwellryan
133
9.2k
Reflections from 52 weeks, 52 projects
jeffersonlam
349
20k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
670
Rails Girls Zürich Keynote
gr2m
94
13k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.7k
Building Adaptive Systems
keathley
41
2.5k
Practical Orchestrator
shlominoach
186
11k
Intergalactic Javascript Robots from Outer Space
tanoku
270
27k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license