Database Integration to Improve Accessibility to Public High-throughput Sequencing Data
A Presentation at National Institute of Genetics, Japan Retreat 2014
Tazro Inutano Ohta
July 04, 2014
Transcript
Database Integration to Improve Accessibility to High-Throughput Seq Data
TAZRO OHTA @inutano
What do you imagine when you hear the term “Database”?
Knowledge, Scientific data, Experimental data
Knowledge base, Database, Raw Data repository
What kind of data? Next-generation is already out there…
We all need a raw data repository for NGS
We’ve already seen WHY WE NEED IT
Reproducibility is what makes science fair.
2 things required for a data repository are…
1: Reliability. Data should be archived correctly, with explicit metadata.
2: Accessibility. Data should be accessible to anyone, without special tricks.
1: Reliability needs curation. Data should be archived correctly, with explicit metadata.
2: Accessibility needs a good interface. Data should be accessible to anyone, without special tricks.
Current Web-interface for DRA http://trace.ddbj.nig.ac.jp/DRASearch
Good: Simple, fast, and no bugs (!)
Challenge: Lack of metadata caused “NOT FOUND”
PROBLEM:
???
DRASearch can NOT find data without metadata …but the data definitely exists in the repo.
There are too many entries to ask submitters to fix them, so we implemented a system to make the metadata rich enough
2 sources into DRA (DDBJ Read Archive)
Publications can have details of the seq process; Seq Read Quality can be a source of data quality.
(DDBJ Read Archive, PubMed, PMC, Extracted Read Quality)
And then: integration enables Efficient Data Search
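The integration described above can be pictured as a join on the archive accession followed by a text search over the combined record. The following is only a minimal sketch under assumptions: the record fields, the sample accessions and text, and the `integrate` and `search` helpers are all hypothetical, and nothing here reflects how DBCLS SRA is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class IntegratedRecord:
    """One archive entry enriched with text from other sources (hypothetical schema)."""
    accession: str
    submitter_metadata: str = ""
    publication_text: str = ""
    quality_summary: str = ""

    def searchable_text(self) -> str:
        # Combine every source so a query can match any of them.
        return " ".join(
            [self.submitter_metadata, self.publication_text, self.quality_summary]
        ).lower()


def integrate(archive, publications, quality):
    """Join the three sources on accession; missing sources stay empty."""
    return [
        IntegratedRecord(
            accession=acc,
            submitter_metadata=meta,
            publication_text=publications.get(acc, ""),
            quality_summary=quality.get(acc, ""),
        )
        for acc, meta in archive.items()
    ]


def search(records, query):
    """Return accessions whose combined text contains every query term."""
    terms = query.lower().split()
    return [r.accession for r in records if all(t in r.searchable_text() for t in terms)]


# Made-up example data: the second entry has no submitter metadata at all.
archive = {"DRR000001": "Homo sapiens RNA-Seq", "DRR000002": ""}
publications = {"DRR000002": "ChIP-Seq of mouse liver, paired-end library"}
quality = {"DRR000001": "mean quality 34", "DRR000002": "mean quality 31"}

records = integrate(archive, publications, quality)
print(search(records, "mouse chip-seq"))
```

In this toy data the second entry would be invisible to a search over submitter metadata alone, but it is found once the publication text is part of the record.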
Available via DBCLS SRA http://sra.dbcls.jp/
Power of Integration: Metadata Search http://sra.dbcls.jp/search
83% of seq reads have an average quality over 30
0.03% of seq reads have over 50% N content
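The two statistics above can be computed per read from FASTQ data. This is a minimal sketch, assuming Phred+33 quality encoding and four-line FASTQ records with non-empty reads; the function names and the example reads are illustrative, and the slides do not say which tool produced the original numbers.

```python
def mean_phred(quality_line, offset=33):
    """Average Phred score of one read, assuming Phred+33 ASCII encoding."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def n_fraction(sequence):
    """Fraction of bases called as N (unknown)."""
    return sequence.upper().count("N") / len(sequence)


def summarize(fastq_text, min_quality=30, max_n=0.5):
    """Return (share of reads with mean quality over min_quality,
    share of reads with N content over max_n)."""
    lines = fastq_text.strip().splitlines()
    # Each record is 4 lines: header, sequence, separator, quality.
    reads = [(lines[i + 1], lines[i + 3]) for i in range(0, len(lines), 4)]
    good = sum(1 for _, qual in reads if mean_phred(qual) > min_quality)
    mostly_n = sum(1 for seq, _ in reads if n_fraction(seq) > max_n)
    return good / len(reads), mostly_n / len(reads)


# Two made-up reads: one high quality, one dominated by N calls.
example = """@read1
ACGTACGT
+
IIIIIIII
@read2
NNNNNNAC
+
########
"""

high_quality_share, mostly_n_share = summarize(example)
print(high_quality_share, mostly_n_share)
```

The thresholds default to the values quoted on the slide (quality over 30, N content over 50%) and can be changed through `min_quality` and `max_n`.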
1: Reliability from papers and data quality. More description brings more proof.
2: Accessibility from text search. Searching the included publications brings flexibility.
PROBLEM: Only 2.20% of submitted projects (4429 / 201558) have at least one publication
NIH Data sharing Guideline http://www.niaid.nih.gov/LabsAndResources/resources/dmid/Pages/data.aspx
What is the next step?
1: Beyond Raw Data. The archive is going to handle alignment data.
2: Analysis Reproducibility. A public repo for analysis pipelines is required.
Databases are for biologists, not for developers.
Thank you!
[email protected]
http://speakerdeck.com/inutano