Platforms for Data Science
Deepak Singh
October 01, 2011
Talk given at the "Computing on the Brink" series
Transcript
There is no magic There is only awesome
Deepak Singh
Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
image: Leo Reynolds
hard problem
really hard problem
so how do we get there?
information platforms
Image: Drew Conway
dataspaces
Further reading: Jeff Hammerbacher, "Information Platforms and the Rise of the Data Scientist", in Beautiful Data
the unreasonable effectiveness of data
Halevy et al., IEEE Intelligent Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people
Credit: Pieter Musterd, under a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
Credit: Angel Pizarro, U. Penn
mapreduce for genomics
http://bowtie-bio.sourceforge.net/crossbow/index.shtml
http://contrail-bio.sourceforge.net
http://bowtie-bio.sourceforge.net/myrna/index.shtml
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter: @mndoci
http://slideshare.net/mndoci
http://mndoci.com
Inspiration and ideas from Matt Wood & Larry Lessig
Credit: Oberazzi, under a CC-BY-NC-SA license