Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Deepak Singh
October 01, 2011
Technology
210
3
Share
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
120
Platforms for scientific data analysis
mndoci
3
120
FGED Keynote
mndoci
3
100
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
270
A Platform for Data Science
mndoci
6
15k
Intel Theater Presentation @ SC11
mndoci
6
200
Talk at West Coast Association of Shared Directors meeting
mndoci
3
160
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
130
Other Decks in Technology
See All in Technology
Building a Study Buddy AI Agent from Scratch: From Passive Chatbots to Autonomous Systems
itchimonji
0
140
Modernizing Your HCL Connections Experience: Visual Report to chain, Profile Enhancements, and AI Integration
wannesrams
0
290
AIエージェントの支払い基盤 AgentCore Payments概要
kmiya84377
1
140
CyberAgent YJC Connect
shimaf4979
1
170
世界の中心でApp Runnerを叫ぶ FINAL
tsukuboshi
0
250
MySQL 9.7がやってきた ~これまでのあらすじと基本情報~ @ 日本MySQLユーザ会会2026年04月 / mysql97-yattekita
sakaik
0
180
ハーネスエンジニアリング入門
knishioka
0
130
AI対話分析の夢と、汚いデータの現実 Looker / Dataplex / Dataform で実現する品質ファーストな基盤設計
waiwai2111
0
230
サービスの信頼性を高めるため、形骸化した「プロダクションミーティング」を立て直すまでの取り組み
stefafafan
1
250
需要創出(Chatwork)×供給(BPaaS) フライホイールとMoat 実行能力の最適配置とAI戦略
kubell_hr
0
2.1k
AI駆動開発で生産性を追いかけたら、行き着いたのは品質とシフトレフトだった
littlehands
0
450
freeeで運用しているAIQAについて
qatonchan
0
370
Featured
See All Featured
YesSQL, Process and Tooling at Scale
rocio
174
15k
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
140
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
1.1k
Game over? The fight for quality and originality in the time of robots
wayneb77
1
170
Statistics for Hackers
jakevdp
799
230k
How to Build an AI Search Optimization Roadmap - Criteria and Steps to Take #SEOIRL
aleyda
1
2k
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
220
Sam Torres - BigQuery for SEOs
techseoconnect
PRO
0
260
What's in a price? How to price your products and services
michaelherold
247
13k
The Limits of Empathy - UXLibs8
cassininazir
1
320
A Tale of Four Properties
chriscoyier
163
24k
The untapped power of vector embeddings
frankvandijk
2
1.7k
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license