Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
120
Platforms for scientific data analysis
mndoci
3
110
FGED Keynote
mndoci
3
99
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
260
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
190
Talk at West Coast Association of Shared Directors meeting
mndoci
3
160
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
120
Other Decks in Technology
See All in Technology
AIエージェント、 社内展開の前に知っておきたいこと
oracle4engineer
PRO
2
150
TypeScript 7.0の現在地と備え方
uhyo
7
1.7k
[JAWSDAYS2026]Who is responsible for IAM
mizukibbb
0
820
20260311 ビジネスSWG活動報告(デジタルアイデンティティ人材育成推進WG Ph2 活動報告会)
oidfj
0
340
1GB RAMのラズピッピで何ができるのか試してみよう / 20260319-rpijam-1gb-rpi-whats-possible
akkiesoft
0
180
JAWSDAYS2026_A-6_現場SEが語る 回せるセキュリティ運用~設計で可視化、AIで加速する「楽に回る」運用設計のコツ~
shoki_hata
0
3k
Keycloak を使った SSO で CockroachDB にログインする / CockroachDB SSO with Keycloak
kota2and3kan
0
160
AI実装による「レビューボトルネック」を解消する仕様駆動開発(SDD)/ ai-sdd-review-bottleneck
rakus_dev
0
150
NewSQL_ ストレージ分離と分散合意を用いたスケーラブルアーキテクチャ
hacomono
PRO
4
380
複数クラスタ運用と検索の高度化:ビズリーチにおけるElastic活用事例 / ElasticON Tokyo2026
visional_engineering_and_design
0
170
"作る"から"使われる"へ:Backstage 活用の現在地
sbtechnight
0
180
JAWSDAYS2026 [C02] 楽しく学ぼう!AWSとは?AWSの歴史 入門
hiragahh
0
170
Featured
See All Featured
Optimizing for Happiness
mojombo
378
71k
Why Mistakes Are the Best Teachers: Turning Failure into a Pathway for Growth
auna
0
84
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
1
1.4k
Large-scale JavaScript Application Architecture
addyosmani
515
110k
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
1
1.3k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.6k
How to train your dragon (web standard)
notwaldorf
97
6.6k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.4k
Intergalactic Javascript Robots from Outer Space
tanoku
273
27k
Leadership Guide Workshop - DevTernity 2021
reverentgeek
1
240
BBQ
matthewcrist
89
10k
WCS-LA-2024
lcolladotor
0
480
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license