Tazro Inutano Ohta
July 04, 2014
Database Integration to Improve Accessibility to Public High-throughput Sequencing Data
A Presentation at National Institute of Genetics, Japan Retreat 2014
Transcript
Database Integration to Improve Accessibility to High-Throughput Seq Data
TAZRO OHTA @inutano
What do you imagine when you hear the term “Database”?
Knowledge / Scientific data / Experimental data
Knowledge base / Database / Raw data repository
What kind of data? Next-generation sequencing (NGS) data is already out there…
We all need a raw data repository for NGS
We’ve already seen WHY WE NEED it
Reproducibility is what makes science fair.
Two things are required of a data repository…
1: Reliability. Data should be archived correctly, with explicit metadata.
2: Accessibility. Data should be accessible to anyone, without special tricks.
1: Reliability needs curation. Data should be archived correctly, with explicit metadata.
2: Accessibility needs a good interface. Data should be accessible to anyone, without special tricks.
Current web interface for DRA: http://trace.ddbj.nig.ac.jp/DRASearch
Good: simple, fast, and no bugs (!)
Challenge: lack of metadata causes “NOT FOUND”
PROBLEM:
???
DRASearch can NOT find data without metadata …but the data definitely exist in the repo.
There are too many entries to ask the submitters, so we implemented a system to make the metadata rich enough.
Two sources integrated into DRA (DDBJ Read Archive)
Publications can give details of the sequencing process; sequence read quality can be a source of data quality.
[Diagram labels: DDBJ Read Archive, PubMed, PMC, Extracted Read Quality]
And then: integration enables efficient data search
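A rough illustration of the integration idea on the preceding slides, not the DBCLS SRA implementation: sparse archive metadata is combined with text from a linked publication and a read quality summary, so a keyword search can match on either source. The record fields, accession strings, and function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One archive entry. Every field name here is a hypothetical stand-in."""
    accession: str
    metadata: str = ""                     # submitter description, often empty
    publication_text: str = ""             # text from a linked paper, if any
    mean_quality: Optional[float] = None   # summary from extracted read quality

    def searchable_text(self) -> str:
        # Integration step: metadata and publication text are searched together.
        return f"{self.metadata} {self.publication_text}".lower()


def search(entries, keyword, min_quality=None):
    """Return accessions whose integrated text contains the keyword."""
    keyword = keyword.lower()
    hits = []
    for entry in entries:
        if keyword not in entry.searchable_text():
            continue
        if min_quality is not None and (
            entry.mean_quality is None or entry.mean_quality < min_quality
        ):
            continue
        hits.append(entry.accession)
    return hits


# Made-up records: the first has no submitter metadata but has a linked paper.
entries = [
    Entry("EXAMPLE001", publication_text="Metagenome of soil samples",
          mean_quality=34.2),
    Entry("EXAMPLE002", metadata="RNA-seq of mouse liver", mean_quality=28.0),
    Entry("EXAMPLE003"),
]

# Searching submitter metadata alone misses EXAMPLE001 ("NOT FOUND").
metadata_only = [e.accession for e in entries
                 if "metagenome" in e.metadata.lower()]
# Searching the integrated text finds it.
integrated = search(entries, "metagenome")
```

In this toy data the metadata-only search returns nothing, while the integrated search returns the entry whose description exists only in its publication.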
Available via DBCLS SRA http://sra.dbcls.jp/
Power of Integration: Metadata Search http://sra.dbcls.jp/search
83% of seq reads have an average quality over 30
0.03% of seq reads have over 50% N content
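A minimal sketch of how two such statistics could be computed from a FASTQ file. It assumes Phred+33 quality encoding and four-line records, and it treats the thresholds as per-read values (the slide does not say whether they apply per read or per run). The function names and sample reads are hypothetical, and this is not the pipeline used for the talk.

```python
import io


def read_fastq(handle):
    """Yield (sequence, quality_string) pairs from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def n_fraction(seq):
    """Fraction of bases in the read that are 'N' (uncalled)."""
    return seq.upper().count("N") / len(seq)


def summarize(handle, min_quality=30, max_n=0.5):
    """Count reads above the quality threshold and above the N threshold."""
    total = high_quality = high_n = 0
    for seq, qual in read_fastq(handle):
        if not seq or not qual:
            continue  # skip empty or truncated records
        total += 1
        if mean_quality(qual) > min_quality:
            high_quality += 1
        if n_fraction(seq) > max_n:
            high_n += 1
    return {"reads": total, "high_quality": high_quality, "high_n": high_n}


# Two made-up reads: one with high quality, one that is mostly N.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nNNNNNACG\n+\n########\n"
)
stats = summarize(example)
```

Dividing `high_quality` and `high_n` by `reads` gives percentages of the kind quoted on the slide.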
1: Reliability from paper/data quality. More description brings more proof.
2: Accessibility from text search. Search that includes publications brings flexibility.
PROBLEM: only 2.20% of submitted projects have at least one publication (4429 / 201558)
NIH Data sharing Guideline http://www.niaid.nih.gov/LabsAndResources/resources/dmid/Pages/data.aspx
What is the next step?
1: Beyond Raw Data. The archive is going to handle alignment data.
2: Analysis Reproducibility. A public repo for analysis pipelines is required.
Databases are for biologists, not for developers.
Thank you!
[email protected]
http://speakerdeck.com/inutano