Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Platforms for Data Science
Search
Deepak Singh
October 01, 2011
Technology
200
3
Share
Platforms for Data Science
Talk given at the "Computing on the Brink" series
Deepak Singh
October 01, 2011
More Decks by Deepak Singh
See All by Deepak Singh
Changing the Calculus of Containers (Datadog Dash)
mndoci
2
120
Platforms for scientific data analysis
mndoci
3
120
FGED Keynote
mndoci
3
100
Open Mic Science - May 7, 2012
mndoci
4
1.3k
Talk at "Genome Informatics Alliance 2012" meeting
mndoci
1
270
A Platform for Data Science
mndoci
6
15k
Intel Theater Presentation @ SC11
mndoci
6
200
Talk at West Coast Association of Shared Directors meeting
mndoci
3
160
A platform for data science - Systems Bioinformatics Workshop
mndoci
3
120
Other Decks in Technology
See All in Technology
Claude Teamプランの選定と、できること/できないこと
rfdnxbro
1
1.9k
Sansan Engineering Unit 紹介資料
sansan33
PRO
1
4.2k
会社紹介資料 / Sansan Company Profile
sansan33
PRO
16
410k
CC Workflow Studio
seiyakobayashi
0
270
Azure Lifecycle with Copilot CLI
torumakabe
0
140
建設的な現実逃避のしかた / How to practice constructive escapism
pauli
4
310
ASTのGitHub CopilotとCopilot CLIの現在地をお話しします/How AST Operates GitHub Copilot and Copilot CLI
aeonpeople
1
210
システムは「動く」だけでは 足りない - 非機能要件・分散システム・トレードオフの基礎
nwiizo
25
8.1k
ふりかえりがなかった職能横断チームにふりかえりを導入してみて学んだこと 〜チームのふりかえりを「みんなで未来を考える場」にするプロローグ設計〜
masahiro1214shimokawa
0
340
数案件を同時に進行するためのコンテキスト整理術
sutetotanuki
1
180
GitHub Copilotを極める会 - 開発者のための活用術
findy_eventslides
6
4k
DevOpsDays2026 Tokyo Cross-border practices to connect "safety" and "DX" in healthcare
hokkai7go
0
120
Featured
See All Featured
BBQ
matthewcrist
89
10k
Art, The Web, and Tiny UX
lynnandtonic
304
21k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Groundhog Day: Seeking Process in Gaming for Health
codingconduct
0
140
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
93
Visualization
eitanlees
150
17k
New Earth Scene 8
popppiees
3
2k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.6k
Making the Leap to Tech Lead
cromwellryan
135
9.8k
Applied NLP in the Age of Generative AI
inesmontani
PRO
4
2.2k
Jamie Indigo - Trashchat’s Guide to Black Boxes: Technical SEO Tactics for LLMs
techseoconnect
PRO
0
96
Ethics towards AI in product and experience design
skipperchong
2
250
Transcript
There is no magic There is only awesome D e
e p a k S i n g h Platforms for data science
bioinformatics image: Ethan Hein
3
collection
curation
analysis
what’s the big deal?
None
Source: http://www.nature.com/news/specials/bigdata/index.html
Image: Yael Fitzpatrick (AAAS)
Image: Yael Fitzpatrick (AAAS)
lots of data
lots of people
lots of places
constant change
we want to make our data more effective
versioning
provenance
filter
aggregate
extend
mashup
human interfaces
None
image: Leo Reynolds
hard problem
really hard problem
so how do get there?
information platforms
Image: Drew Conway
dataspaces Further reading: Jeff Hammerbacher, Information Platforms and the rise
of the data scientist, Beautiful Data
the unreasonable effectiveness of data Halevy, et al. IEEE Intelligent
Systems, 24, 8-12 (2009)
accept all data formats
evolve APIs
beyond databases and the data warehouse
data as a programmable resource
data is a royal garden
compute is a fungible commodity
optimizing the most valuable resource
compute, storage, workflows, memory, transmission, algorithms, cost, …
people Credit: Pieter Musterd a CC-BY-NC-ND license
Image: Chris Dagdigian
my bias
cloud services
distributed systems
scale
global
consumption models
on-demand
what is the value of your data?
None
None
Credit: Angel Pizzaro, U. Penn
mapreduce for genomics http://bowtie-bio.sourceforge.net/crossbow/index.shtml http://contrail-bio.sourceforge.net http://bowtie-bio.sourceforge.net/myrna/index.shtml
None
Bioproximity http://aws.amazon.com/solutions/case-studies/bioproximity/
None
None
30,472 cores
$1279/hr
http://cloudbiolinux.org/
http://usegalaxy.org/cloud
in summary
large scale data requires a rethink
data architecture
compute architecture
distributed, programmable infrastructure
cloud services
remove constraints
can we build data science platforms?
there is no magic there is only awesome
[email protected]
Twitter:@mndoci http://slideshare.net/mndoci http://mndoci.com Inspiration and ideas from Matt Wood&
Larry Lessig Credit” Oberazzi under a CC-BY-NC-SA license