Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
200
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
120
Platforms for scientific data analysis
mndoci
3
110
FGED Keynote
mndoci
3
99
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
260
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
190
Talk at West Coast Association of Shared Directors meeting
mndoci
3
150
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
120
Other Decks in Technology
See All in Technology
オンプレとGoogle Cloudを安全に繋ぐための、セキュア通信の勘所
waiwai2111
3
1.1k
Kaggleの経験が実務にどう活きているか / kaggle_findy
sansan_randd
4
720
Digitization部 紹介資料
sansan33
PRO
1
7k
LLM活用の壁を超える:リクルートR&Dの戦略と打ち手
recruitengineers
PRO
1
240
クラウド時代における一時権限取得
krrrr38
1
160
三菱UFJ銀行におけるエンタープライズAI駆動開発のリアル / Enterprise AI_Driven Development at MUFG Bank: The Real Story
muit
11
21k
「使いにくい」も「運用疲れ」も卒業する UIデザイナーとエンジニアが創る持続可能な内製開発
nrinetcom
PRO
1
780
「ストレッチゾーンに挑戦し続ける」ことって難しくないですか? メンバーの持続的成長を支えるEMの環境設計
sansantech
PRO
1
310
入門DBSC
ynojima
0
130
Oracle Cloud Infrastructure:2026年2月度サービス・アップデート
oracle4engineer
PRO
0
220
「データとの対話」の現在地と未来
kobakou
0
1.3k
組織のSREを推進するためのPlatform EngineeringとEKS / Platform Engineering and EKS to drive SRE in your organization
chmikata
0
180
Featured
See All Featured
Future Trends and Review - Lecture 12 - Web Technologies (1019888BNR)
signer
PRO
0
3.3k
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
98
BBQ
matthewcrist
89
10k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
4k
Stop Working from a Prison Cell
hatefulcrawdad
274
21k
How to optimise 3,500 product descriptions for ecommerce in one day using ChatGPT
katarinadahlin
PRO
1
3.5k
What's in a price? How to price your products and services
michaelherold
247
13k
Music & Morning Musume
bryan
47
7.1k
Site-Speed That Sticks
csswizardry
13
1.1k
Reality Check: Gamification 10 Years Later
codingconduct
0
2k
SERP Conf. Vienna - Web Accessibility: Optimizing for Inclusivity and SEO
sarafernandez
1
1.3k
Believing is Seeing
oripsolob
1
70
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license