Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
190
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
100
Platforms for scientific data analysis
mndoci
3
91
FGED Keynote
mndoci
3
89
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
250
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
170
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
110
Other Decks in Technology
See All in Technology
Dynamic Reteaming And Self Organization
miholovesq
3
300
ソフトウェア開発現代史: "LeanとDevOpsの科学"の「科学」とは何か? - DORA Report 10年の変遷を追って - #DevOpsDaysTokyo
takabow
0
370
20250413_湘南kaggler会_音声認識で使うのってメルス・・・なんだっけ?
sugupoko
1
460
All You Need Is Kusa 〜Slackデータで始めるデータドリブン〜
jonnojun
0
150
Amazon S3 Tables + Amazon Athena / Apache Iceberg
okaru
0
260
AIコーディングの最前線 〜活用のコツと課題〜
pharma_x_tech
1
300
フロントエンドも盛り上げたい!フロントエンドCBとAmplifyの軌跡
mkdev10
2
270
Classmethod AI Talks(CATs) #20 司会進行スライド(2025.04.10) / classmethod-ai-talks-aka-cats_moderator-slides_vol20_2025-04-10
shinyaa31
0
150
LLM とプロンプトエンジニアリング/チューターをビルドする / LLM, Prompt Engineering and Building Tutors
ks91
PRO
1
250
Micro Frontends: Necessity, Implementation, and Challenges
rainerhahnekamp
2
470
サーバレス、コンテナ、データベース特化型機能をご紹介。CloudWatch をもっと使いこなそう!
o11yfes2023
0
150
こんなデータマートは嫌だ。どんな? / waiwai-data-meetup-202504
shuntak
6
1.9k
Featured
See All Featured
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
30
2k
The Cost Of JavaScript in 2023
addyosmani
49
7.7k
Being A Developer After 40
akosma
91
590k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
32
2.2k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
Optimising Largest Contentful Paint
csswizardry
36
3.2k
4 Signs Your Business is Dying
shpigford
183
22k
[RailsConf 2023] Rails as a piece of cake
palkan
54
5.4k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
119
51k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
178
53k
The Cult of Friendly URLs
andyhume
78
6.3k
GraphQLの誤解/rethinking-graphql
sonatard
71
10k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
deesingh@amazon.com Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license