Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
110
Platforms for scientific data analysis
mndoci
3
110
FGED Keynote
mndoci
3
98
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
260
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
190
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
120
Other Decks in Technology
See All in Technology
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
3k
Kiro IDEのドキュメントを全部読んだので地味だけどちょっと嬉しい機能を紹介する
khmoryz
0
170
予期せぬコストの急増を障害のように扱う――「コスト版ポストモーテム」の導入とその後の改善
muziyoshiz
1
1.7k
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
6
68k
茨城の思い出を振り返る ~CDKのセキュリティを添えて~ / 20260201 Mitsutoshi Matsuo
shift_evolve
PRO
1
220
【5分でわかる】セーフィー エンジニア向け会社紹介
safie_recruit
0
42k
We Built for Predictability; The Workloads Didn’t Care
stahnma
0
140
Frontier Agents (Kiro autonomous agent / AWS Security Agent / AWS DevOps Agent) の紹介
msysh
3
160
会社紹介資料 / Sansan Company Profile
sansan33
PRO
15
400k
All About Sansan – for New Global Engineers
sansan33
PRO
1
1.3k
Context Engineeringの取り組み
nutslove
0
320
セキュリティについて学ぶ会 / 2026 01 25 Takamatsu WordPress Meetup
rocketmartue
1
290
Featured
See All Featured
Heart Work Chapter 1 - Part 1
lfama
PRO
5
35k
Primal Persuasion: How to Engage the Brain for Learning That Lasts
tmiket
0
250
The Curious Case for Waylosing
cassininazir
0
230
Navigating the Design Leadership Dip - Product Design Week Design Leaders+ Conference 2024
apolaine
0
170
The Organizational Zoo: Understanding Human Behavior Agility Through Metaphoric Constructive Conversations (based on the works of Arthur Shelley, Ph.D)
kimpetersen
PRO
0
240
Data-driven link building: lessons from a $708K investment (BrightonSEO talk)
szymonslowik
1
910
Stop Working from a Prison Cell
hatefulcrawdad
273
21k
How to Think Like a Performance Engineer
csswizardry
28
2.4k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
122
21k
Marketing Yourself as an Engineer | Alaka | Gurzu
gurzu
0
130
A Modern Web Designer's Workflow
chriscoyier
698
190k
Practical Orchestrator
shlominoach
191
11k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license