Platforms for Data Science
Deepak Singh
October 01, 2011
Talk given at the "Computing on the Brink" series
Transcript
There is no magic. There is only awesome.
Deepak Singh
Platforms for data science
bioinformatics
Image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
image: Leo Reynolds
hard problem
really hard problem
so how do we get there?
information platforms
Image: Drew Conway
dataspaces
Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist," Beautiful Data
the unreasonable effectiveness of data
Halevy et al., "The Unreasonable Effectiveness of Data," IEEE Intelligent Systems 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people
Credit: Pieter Musterd under a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
Credit: Angel Pizarro, U. Penn
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
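The projects linked above apply the MapReduce model to sequencing data. As a rough illustration of that model only, here is a minimal single-process sketch that counts k-mers across a set of reads in three phases: map (emit key/value pairs per read), shuffle (group values by key), and reduce (combine each group). The task, the k-mer length `K`, and every function name are hypothetical choices for illustration; this is not what Crossbow, Contrail, or Myrna actually compute, and a real deployment would run the phases on a distributed framework rather than in one Python process.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

K = 4  # hypothetical k-mer length, chosen only for this illustration


def map_kmers(read: str, k: int = K) -> Iterator[tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every overlapping k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group emitted values by key, as a framework would between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: sum the partial counts emitted for one k-mer."""
    return key, sum(values)


def count_kmers(reads: Iterable[str], k: int = K) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(count_kmers(["ACGTACGT", "CGTACG"]))
```

Because each read is mapped independently and each key is reduced independently, both phases can be spread across many machines; only the shuffle needs data movement between them.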
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
30,472 cores
$1279/hr
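The two figures on this slide imply an average price per core-hour. The sketch below works that out, assuming the $1,279/hr bill is spread evenly over all 30,472 cores (the slide does not break the cost down). The function names and the 3-hour run length are hypothetical, for illustration only.

```python
# Figures from the slide; the even split across cores is an assumption.
CLUSTER_CORES = 30_472
CLUSTER_DOLLARS_PER_HOUR = 1_279.0


def cost_per_core_hour(dollars_per_hour: float, cores: int) -> float:
    """Average price of one core for one hour, assuming the hourly bill is spread evenly."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return dollars_per_hour / cores


def run_cost(dollars_per_hour: float, hours: float) -> float:
    """Total bill for keeping the whole cluster up for `hours` hours (no minimum billing assumed)."""
    return dollars_per_hour * hours


if __name__ == "__main__":
    per_core = cost_per_core_hour(CLUSTER_DOLLARS_PER_HOUR, CLUSTER_CORES)
    print(f"average: ${per_core:.4f} per core-hour")
    # Hypothetical 3-hour job on the full cluster:
    print(f"3-hour run: ${run_cost(CLUSTER_DOLLARS_PER_HOUR, 3):,.2f}")
```

Under that assumption the average works out to roughly $0.042 per core-hour, which is what makes on-demand consumption of a very large cluster practical for short jobs.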
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi under a CC-BY-NC-SA license