Database Integration to Improve Accessibility to Public High-throughput Sequencing Data
A Presentation at National Institute of Genetics, Japan Retreat 2014
Tazro Inutano Ohta
July 04, 2014
Transcript
Database Integration to Improve Accessibility to High-Throughput Seq Data
TAZRO OHTA @inutano
What do you imagine when you hear the term “Database”?
Knowledge / Scientific data / Experimental data
Knowledge base / Database / Raw data repository
What kind of data? Next-generation is already out there…
We all need a raw data repository for NGS
We’ve already seen WHY WE NEED IT
Reproducibility is what makes science fair.
Two things are required for a data repository…
1: Reliability. Data should be archived correctly, with explicit metadata.
2: Accessibility. Data should be accessible to anyone, without special tricks.
1: Reliability needs curation. Data should be archived correctly, with explicit metadata.
2: Accessibility needs a good interface. Data should be accessible to anyone, without special tricks.
Current web interface for DRA: http://trace.ddbj.nig.ac.jp/DRASearch
Good: Simple, fast, and no bugs (!)
Challenge: Lack of metadata caused “NOT FOUND”
PROBLEM:
???
DRASearch can NOT find data without metadata… but the data definitely exist in the repo.
There are too many entries to ask submitters to fix them, so we implemented a system to make the metadata rich enough.
Two sources feed into DRA (DDBJ Read Archive)
Publications can give details of the sequencing process; sequence read quality can be a source of data quality information.
Diagram labels: DDBJ Read Archive, PubMed, PMC, Extracted Read Quality
And then: integration enables efficient data search
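The idea of enriching sparse archive metadata with publication text before searching can be sketched as follows. This is only an illustration under stated assumptions, not the DBCLS SRA implementation: the records, accession-like IDs, PMIDs, field names, and the `integrate` and `search` helpers are all hypothetical.

```python
# Hypothetical records for illustration; none of these values come from DRA or PubMed.
archive_entries = [
    {"accession": "EXAMPLE-1", "title": "", "pmid": "pmid-a"},          # no submitter metadata
    {"accession": "EXAMPLE-2", "title": "Soil metagenome", "pmid": None},
]
publications = {
    "pmid-a": "Whole-genome sequencing of rice cultivars",
}


def integrate(entries, pubs):
    """Attach publication text to each archive entry to enrich its searchable text."""
    merged = []
    for entry in entries:
        pub_text = pubs.get(entry["pmid"], "") if entry["pmid"] else ""
        searchable = " ".join(part for part in (entry["title"], pub_text) if part)
        merged.append({**entry, "searchable": searchable.lower()})
    return merged


def search(records, keyword):
    """Return accessions whose enriched text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r["accession"] for r in records if needle in r["searchable"]]


records = integrate(archive_entries, publications)
print(search(records, "rice"))
```

In this toy data, the first entry has an empty title, so a search over submitter metadata alone would miss it; it becomes findable only through the linked publication text.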
Available via DBCLS SRA http://sra.dbcls.jp/
Power of Integration: Metadata Search http://sra.dbcls.jp/search
83% of seq reads have an average quality over 30
0.03% of seq reads have over 50% N content
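The two statistics above (average quality over 30, N content over 50%) can be illustrated with a minimal sketch. The slides do not say how the values were computed, so this assumes per-read arithmetic means of Phred scores with the common Phred+33 FASTQ encoding; the function names, thresholds as parameters, and the sample reads are hypothetical.

```python
def mean_quality(quality_string, offset=33):
    """Arithmetic mean of Phred scores decoded from a FASTQ quality string (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def n_fraction(sequence):
    """Fraction of bases in the read that are the ambiguous call 'N'."""
    return sequence.upper().count("N") / len(sequence)


def summarize(reads, min_quality=30, max_n=0.5):
    """Return (share of reads with mean quality > min_quality, share with N fraction > max_n).

    `reads` is a non-empty list of (sequence, quality_string) pairs.
    """
    total = len(reads)
    high_quality = sum(1 for _, qual in reads if mean_quality(qual) > min_quality)
    many_n = sum(1 for seq, _ in reads if n_fraction(seq) > max_n)
    return high_quality / total, many_n / total


# Made-up reads: 'I' decodes to 40, '?' to 30, '5' to 20, '#' to 2.
reads = [
    ("ACGT", "IIII"),
    ("GGCC", "????"),
    ("ACGN", "5555"),
    ("NNNA", "####"),
]
print(summarize(reads))
```

Both thresholds are strict comparisons here, matching the "over 30" and "over 50%" wording; a read with a mean quality of exactly 30 is not counted.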
1: Reliability from paper/data quality. More description brings more proof.
2: Accessibility from text search. Search that includes publications brings flexibility.
PROBLEM: 2.20% of submitted projects (4429 / 201558) have at least one publication
NIH Data sharing Guideline http://www.niaid.nih.gov/LabsAndResources/resources/dmid/Pages/data.aspx
What is the next step?
1: Beyond raw data. The archive is going to handle alignment data.
2: Analysis reproducibility. A public repository for analysis pipelines is required.
Databases are for biologists, not for developers.
Thank you!
[email protected]
http://speakerdeck.com/inutano