Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
3
190
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
Tweet
Share
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
94
Platforms for scientific data analysis
mndoci
3
83
FGED Keynote
mndoci
3
84
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
240
A Platform for Data Science
mndoci
6
14k
Intel Theater Presentation @ SC11
mndoci
6
170
Talk at West Coast Association of Shared Directors meeting
mndoci
3
140
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
100
Other Decks in Technology
See All in Technology
BLADE: An Attempt to Automate Penetration Testing Using Autonomous AI Agents
bbrbbq
0
310
なぜ今 AI Agent なのか _近藤憲児
kenjikondobai
4
1.4k
DynamoDB でスロットリングが発生したとき/when_throttling_occurs_in_dynamodb_short
emiki
0
130
AWS Lambda のトラブルシュートをしていて思うこと
kazzpapa3
2
180
エンジニア人生の拡張性を高める 「探索型キャリア設計」の提案
tenshoku_draft
1
130
Oracle Cloud Infrastructureデータベース・クラウド:各バージョンのサポート期間
oracle4engineer
PRO
28
13k
100 名超が参加した日経グループ横断の競技型 AWS 学習イベント「Nikkei Group AWS GameDay」の紹介/mediajaws202411
nikkei_engineer_recruiting
1
170
SSMRunbook作成の勘所_20241120
koichiotomo
2
150
信頼性に挑む中で拡張できる・得られる1人のスキルセットとは?
ken5scal
2
540
Flutterによる 効率的なAndroid・iOS・Webアプリケーション開発の事例
recruitengineers
PRO
0
100
安心してください、日本語使えますよ―Ubuntu日本語Remix提供休止に寄せて― 2024-11-17
nobutomurata
1
1k
プロダクト活用度で見えた真実 ホリゾンタルSaaSでの顧客解像度の高め方
tadaken3
0
110
Featured
See All Featured
Building Your Own Lightsaber
phodgson
103
6.1k
Put a Button on it: Removing Barriers to Going Fast.
kastner
59
3.5k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
26
2.1k
A Philosophy of Restraint
colly
203
16k
The MySQL Ecosystem @ GitHub 2015
samlambert
250
12k
Done Done
chrislema
181
16k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
25
1.8k
Facilitating Awesome Meetings
lara
50
6.1k
Ruby is Unlike a Banana
tanoku
97
11k
Docker and Python
trallard
40
3.1k
Code Review Best Practice
trishagee
64
17k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
159
15k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license