Tazro Inutano Ohta
August 30, 2013
Sequence Read Archive: Database for High-throughput sequencing best practice 2013
Integrated Database Workshop AJACS Toyama: "Using the next-generation sequencing database Sequence Read Archive"
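Transcript
Using the next-generation sequencing database Sequence Read Archive. AJACS#42 TOYAMA, Integrated Database Workshop AJACS Toyama.
Tazro Ohta, Tech Specialist, Database Center for Life Science. Effective SRA - public database for high-throughput sequencing
Preface
This is not a tutorial on NGS data analysis, although resources that support data analysis are introduced.
This is not a tutorial on submitting data to the databases, although the information needed at submission time is introduced.
This is a talk about public data that supports NGS research: how to use the public databases so that they pay off in day-to-day research.
Table of Contents
About the Sequence Read Archive (SRA): operating structure, policy, and the data that are published.
Efforts at DBCLS: integration with other DBs, development of search functions, and visualization of the current state of the DB through statistics.
Practical examples of searching and using NGS data: surveying past NGS studies with the services introduced here.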
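SRA: The public DB for primary NGS data
About the public NGS database Sequence Read Archive (SRA): operating structure, policy, and the data that are published.
About SRA
Operating structure: jointly operated by the INSDC, which consists of NCBI SRA, EBI ENA, and DDBJ DRA. Data can be submitted and accessed through any of the member sites.
Policy: accepts and publishes primary sequence data obtained from parallel sequencers. Personally identifiable data are submitted to separate databases (dbGaP, EGA).
Published data: sequence data, plus metadata describing details such as the experiment and the samples. Sequences can be downloaded compressed as gz or bz2, or in a dedicated format.
INSDC: International Nucleotide Sequence Database Collaboration http://www.insdc.org
The site carries information such as the standardization of genomic information, and the policy of the DB collaboration.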
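The INSDC Members
NCBI SRA: receives the largest amount of data. Developer of the dedicated compressed format, the SRA format.
EMBL-EBI ENA (SRA): developed and operated by the same section as the conventional sequence archive. Both submission and search are designed for GUI and CUI use.
DDBJ DRA: operated by DDBJ at the National Institute of Genetics in Mishima. Also publishes data in uncompressed form.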
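NCBI SRA http://www.ncbi.nlm.nih.gov/sra
Free-text search; advanced search; SRA BLAST (long reads only); software: SRA toolkit.
Search results for the keyword "human microbiome project": BioProject hits; number of hits, counted per Experiment; title, sequencer, amount of sequence and similar information; organisms among the hits.
Top hit of the search results (an SRX entry): title of the experiment; read information such as layout and adaptor; information and download link for each sequencing run.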
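EMBL-EBI ENA http://www.ebi.ac.uk/ena
Advanced search; free-text search; sequence search.
Search results for the keyword "human microbiome project": number of hits in each category, and the top hits of each category.
Top hit under "Experiment" (an SRX entry): information about the experiment; information and download links for the sequencing runs; bulk download, display in text format, and column selection. The information being displayed can be downloaded in text format.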
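DDBJ DRA http://trace.ddbj.ac.jp/dra
Site search; data search, data submission, and video manuals.
DRASearch http://trace.ddbj.ac.jp/DRASearch
Search by ID, faceted (narrowing) search, and keyword search; top rankings of organisms, experiment types, and submitting centers; number of entries per ID type.
Search results: total number of hits, with narrowing by metadata type and organism.
Narrow by "Experiment", then open the top hit (an SRX entry): links to related items and download links; detailed information on the experiment, such as library construction, sequencer, and base calling.
Navigation, then Run (an SRR entry): an example of data that cannot be downloaded.
DRASearch, then a DRR entry: an example of data that can be downloaded. Read information is shown, and ticking "quality" displays the phred score. Download links (FTP) are given for the Fastq format and the SRA Lite format.
Searching NCBI SRA for the SRR accession returns "Record is removed".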
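Hands-on
Try a search: search by organism, sequencer name, gene name, disease name and so on. Try the Advanced (detailed) search as well.
Look into the details of the data that come up: when there are too many or too few hits, try a different search. Look for information that lets you judge whether the data are interesting.
Find out how large the data are: how long is the download likely to take? Will it fit in the free space on the hard disk?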
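The two questions in the last hands-on item can be answered with a little arithmetic before starting a download. The following is a minimal sketch, not part of the original slides: the function names, the 20 GB file size, the 100 Mbit/s bandwidth, and the expansion factor of 4 for decompressed files are all illustrative assumptions.

```python
import shutil


def estimate_download_seconds(size_bytes: int, bandwidth_mbit_per_s: float) -> float:
    """Seconds needed to transfer size_bytes at a sustained bandwidth in megabits/s."""
    bytes_per_second = bandwidth_mbit_per_s * 1_000_000 / 8
    return size_bytes / bytes_per_second


def fits_on_disk(size_bytes: int, path: str = ".", expansion_factor: float = 1.0) -> bool:
    """True if size_bytes * expansion_factor fits in the free space of the disk holding path.

    expansion_factor is an assumed allowance for decompression; tune it to your data.
    """
    free_bytes = shutil.disk_usage(path).free
    return size_bytes * expansion_factor <= free_bytes


# Hypothetical example: a 20 GB compressed run over a 100 Mbit/s line.
size = 20 * 10**9
minutes = estimate_download_seconds(size, 100) / 60
print(f"estimated download time: {minutes:.1f} min")
print("fits on disk:", fits_on_disk(size, ".", expansion_factor=4))
```

The sustained bandwidth of a real transfer is usually lower than the nominal line speed, so the estimate is best read as a lower bound.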
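Search Tips
Each member develops its functions independently: the kinds of search that can be run and the way results are displayed differ. The IDs are shared, so choosing the right site for the task makes searching more convenient.
Some data cannot be downloaded: besides being withdrawn by the submitter for various reasons, data may not be found because they have only just been submitted and are not yet shared.
Information that is not written in the metadata cannot be searched: metadata are the annotation data for the sequence data, written by the submitter according to a specified format.
Metadata: Metadata Object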
Metadata Object Dependencies
Objects: Submission, Analysis, Study, Sample, Experiment, Run.
The metadata submitted together with the sequence data consist of these types of object, and the information recorded depends on the type of object.
Leaving aside Submission, which bundles metadata per submission, the basic relationship among a set of metadata is as shown in the diagram: one Study, two Samples, two Experiments, and four Runs.
Example IDs: DRA000001 (Submission), DRZ000001 (Analysis), DRP000001 (Study), DRS000001 (Sample), DRX000001 and DRX000002 (Experiment), DRR000001 to DRR000004 (Run).
Each object has its own ID. An ID consists of letters that indicate the DB that accepted the data and the type of object, followed by a number.
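The ID scheme on this slide can be read mechanically. The sketch below is an illustration rather than anything from the slides: `parse_accession` and the two lookup tables are hypothetical names. The object-type letters follow the example IDs shown on the slide (DRA, DRZ, DRP, DRS, DRX, DRR); the first letter (S, E, D for NCBI, EBI, DDBJ) is the usual INSDC convention and is stated here as an assumption.

```python
import re

# First letter: the database that accepted the data (assumed INSDC convention).
DATABASES = {"S": "NCBI SRA", "E": "EBI ENA", "D": "DDBJ DRA"}

# Third letter: the metadata object type, as in the example IDs on the slide.
OBJECT_TYPES = {
    "A": "Submission",
    "Z": "Analysis",
    "P": "Study",
    "S": "Sample",
    "X": "Experiment",
    "R": "Run",
}

ACCESSION = re.compile(r"^([SED])R([AZPSXR])(\d+)$")


def parse_accession(accession: str) -> tuple[str, str, int]:
    """Split an ID such as 'DRX000002' into (database, object type, serial number)."""
    match = ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"not an SRA-style accession: {accession!r}")
    db_letter, type_letter, digits = match.groups()
    return DATABASES[db_letter], OBJECT_TYPES[type_letter], int(digits)


print(parse_accession("DRX000002"))
print(parse_accession("SRP011204"))
```

IDs from other resources, such as GEO accessions, do not follow this pattern and raise `ValueError` in this sketch.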
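Metadata Tips
The relationships among objects and IDs are complex: in a large project there can be hundreds of Runs and Samples. Some entries also deviate from the rules, exceptionally, for various reasons.
Know which information is written where: this matters not only when submitting data but also when searching. For details see http://trace.ddbj.ac.jp/dra/metadata.html
Metadata descriptions vary between submitters: especially the items on library preparation. In some cases the metadata are not updated even after information such as the paper has been updated.
Summary #1
SRA is operated by the INSDC member institutes: the data are shared, so the content is the same whichever entrance is used, but the search functions and other features differ.
Metadata are essential for submitting and searching sequence data: each object is given an ID and is described by the submitter. The contents and relationships need to be understood especially at submission time.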
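Tech Dev at DBCLS - Search and Statistics
Efforts at DBCLS: integration with other DBs, development of search functions, and visualization of the current state of the DB through statistics.
DBCLS SRA
Integration with other DBs: not only metadata but also literature information such as papers and disease information, plus sequence quality computed for individual data sets.
Development of search functions: a search aimed at data users, integrating the functions of the member sites and adding original ones.
Grasping the current state of the DB through statistics: trends in the number of submissions, based on the metadata, are published. The DB as a whole is also analyzed on the basis of sequence information.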
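DBCLS SRA http://sra.dbcls.jp/
Submitted data are listed by metadata type; search by SRA ID, organism, sequencer and so on.
Rankings by experiment type, sequencer, and organism; graphs of trends by year and month.
Find data from literature information; find data from disease information.
DBCLS SRA Publication Search: integration of literature information
The paper describes the sequence data in more detail: the place of the sequencing within the study also matters, and Materials & Methods often holds the detailed information.
Sequence data are sometimes published before the paper: because of grant conditions, data-release instructions from journals and so on. Large projects sometimes set their own release policy.
Literature information is sometimes not recorded in the metadata: metadata are not updated once they have been submitted, so published data and papers need to be linked.
DBCLS SRA, then "Find from literature": filtered search by experiment type, sequencer, and organism; a table mapping SRA IDs to PubMed IDs, with information on each paper; click a column name to sort.
DBCLS SRA Diseases Search: integration of disease information
Clinical sequencing data are hard to search: whole-genome sequences, polymorphism information and the like are often not published in SRA.
The metadata description alone may be insufficient: the manner of description and the amount of information vary between submitters, which makes searching everything at once difficult.
Use the tags attached to literature information: a data search keyed on disease information was developed using the MeSH terms assigned to PubMed entries.
DBCLS SRA, then "Browse by disease", then by frequency: search from disease type; disease names with the number of submitted data sets; choice of the number of items displayed.
DBCLS SRA, then "Browse by disease", then by disease category: click to expand the tree; click a number to display the list.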
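DBCLS SRA Metadata Search: DBCLS's own search function
Provide a more user-oriented search: reduce as far as possible the knowledge users have to memorize, such as how data are organized by metadata and IDs.
Reflect more information in the search: a flexible search that takes in not only metadata but also information from the other integrated DBs and original information.
Support automation: repeating searches by hand is inefficient. Automation also makes it possible to build the search into analysis pipelines.
DBCLS SRA Metadata Search http://sra.dbcls.jp/search
Free-text search; filtered search; information for developers and a Twitter account for support.
Filtered search (example: Mus musculus, Transcriptome, Illumina MiSeq): the share of data matching each condition is shown. Display all data matching the conditions, or run a free-text search over them.
Search results: information on the hit data, where blue rows have paper information attached. Click a column name to sort; narrow the items by keyword.
Result of clicking an SRP entry: overview of the project; overview and abstract of the paper; links to PubMed and PMC.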
Result of clicking an SRP entry: click to expand Materials and Methods and Results.
Run and Sample information: the table can be displayed in tsv or json format, sorted and filtered, and comes with download links. Differences across the whole table are highlighted.
Quality information for a Run (an SRR entry): number of reads, read length, GC content and so on. Click the result of each module to enlarge it.
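DBCLS SRA Metadata Search
Visualize the search: the share of the whole shows why a search returns many or few results.
Keyword search that includes paper information: the search target is widened so that as many related data sets as possible are hit.
Check read information before downloading: downloads can take a long time, so information is provided for choosing only data that are certain to be usable.
Summary #2
DBCLS SRA is a functional extension of SRA: it does not accept submissions, but provides information for grasping the state of SRA and a search that makes data easier to find.
Metadata are essential for submitting and searching sequence data: each object is given an ID and is described by the submitter. The contents and relationships need to be understood especially at submission time.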
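Search published NGS data and project
Practical examples of searching and using NGS data: surveying past NGS studies with the services introduced here.
Use cases of Public data: uses of public data by stage
Sequencing that is still to be done: design and simulate the sequencing on the basis of similar projects and their data.
Sequencing that is in progress: evaluate sequencing quality against data from the same organism and sequencer.
Sequencing that is finished: add data from related species and similar projects to improve the accuracy of the analysis.
Practical search tips: what is needed to search and use data
Know the read spec that the experiment type demands: the read length and number of reads needed differ by type of experiment, such as genome resequencing, RNA-Seq, or ChIP-Seq.
Know the read spec of the sequencer: the spec of the reads obtained from each sequencer also changes with reagent updates, so care is needed.
Choose the sequencer according to organism and experiment type: to search public DBs for similar projects, read information matched to genome size and experiment type is important.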
Required read spec by application
application                   total bases   read length   read number (M)
human genome resequencing     90-150Gb      2x100         900-1500
targeted resequencing         <1Gb          2x100         10
exome sequence                5~7Gb         2x100         70
RNA-Seq                       5Gb           2x100         50
TSS-Seq                       1Gb           1x50          20
small RNA                     0.35Gb        1x35          >10
microbial genome              >150Mb        2x100         >1.5
eukaryotic genome             >4Gb          2x100         >40
Bisulfite-Seq                 90-150Gb      2x100         900-1500
ChIP-Seq                      >6Gb          1x100         60
Quoted from the Saibo Kogaku (Cell Technology) supplement on next-generation sequencers, "advanced methods by purpose". The figures can change with the genome size of the target and other factors, and the information may already be out of date.
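The columns of this table are tied together by simple arithmetic: total bases equal the length of one read times the number of reads. The sketch below checks two rows and derives an approximate coverage. It is an illustration only: the function names are hypothetical, the read number is assumed to count individual reads (both mates of a 2x100 pair), which is what makes the table's rows consistent, and the human genome size of about 3 Gb is a round figure supplied here, not taken from the slides.

```python
def total_bases(read_length: int, reads_in_millions: float) -> float:
    """Total sequenced bases for a given read length and read count (in millions)."""
    return read_length * reads_in_millions * 1_000_000


def mean_coverage(bases: float, genome_size: float) -> float:
    """Average depth: how many times each genome position is sequenced on average."""
    return bases / genome_size


# RNA-Seq row: 100-base reads, 50 M reads -> 5 Gb
rna_seq = total_bases(100, 50)
print(rna_seq / 1e9, "Gb")

# Human genome resequencing, lower bound: 100-base reads, 900 M reads -> 90 Gb
human = total_bases(100, 900)
HUMAN_GENOME_SIZE = 3_000_000_000  # assumed round figure, roughly 3 Gb
print(mean_coverage(human, HUMAN_GENOME_SIZE), "x coverage")
```

Under these assumptions the 90-150 Gb range for human genome resequencing corresponds to roughly 30x to 50x coverage.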
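Required read spec by application
Topics covered: preparation and introduction; human genome analysis; gene expression regulation analysis; de novo genome sequencing; epigenetics analysis; metagenome analysis; genome structure analysis; data analysis tools and storage; integrated analysis.
This is a plug, but nobody asked me for it and I earn nothing if it sells.
Read spec, still improving
Improvements in reagents and software frequently change the read length and number of reads, even on the same sequencer. Example: the illumina MiSeq.
Example survey: mouse brain transcriptome
Example: looking for studies of gene expression in the mouse brain.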
Example survey: mouse brain transcriptome
Specify the organism and the experiment type, leave the sequencer blank, and press "submit condition". http://sra.dbcls.jp/search
The matching projects are shown. Enter "brain" as the keyword and press "search". http://sra.dbcls.jp/search/filter?species=Mus%20musculus&type=Transcriptome&instrument=
The matching projects are shown. Enter "brain" in the input box under Study Title to narrow the list. http://sra.dbcls.jp/search/search?species=Mus%20musculus&type=Transcriptome&instrument=&search_query=brain
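The two URLs on these slides show that the search conditions travel as plain query parameters, which is what makes the automation mentioned earlier practical. The sketch below rebuilds both URLs from their parts. The parameter names (`species`, `type`, `instrument`, `search_query`) are copied from the slides; the function names are hypothetical, and whether the service still answers at these addresses is not something this sketch checks.

```python
from urllib.parse import quote, urlencode

BASE = "http://sra.dbcls.jp/search"


def filter_url(species: str, study_type: str, instrument: str = "") -> str:
    """URL of the filtered project list for an organism and experiment type."""
    params = {"species": species, "type": study_type, "instrument": instrument}
    # quote_via=quote encodes spaces as %20, matching the URLs on the slides.
    return f"{BASE}/filter?" + urlencode(params, quote_via=quote)


def search_url(species: str, study_type: str, keyword: str, instrument: str = "") -> str:
    """URL of the keyword search run on top of the same filter conditions."""
    params = {
        "species": species,
        "type": study_type,
        "instrument": instrument,
        "search_query": keyword,
    }
    return f"{BASE}/search?" + urlencode(params, quote_via=quote)


print(filter_url("Mus musculus", "Transcriptome"))
print(search_url("Mus musculus", "Transcriptome", "brain"))
```

Building the URL from a dictionary keeps the conditions readable and leaves the escaping to the standard library.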
Click Study ID to sort the projects from newest to oldest, then look at the data of this project. http://sra.dbcls.jp/search/search?species=Mus%20musculus&type=Transcriptome&instrument=&search_query=brain
Overview of the project, and overview of the sequencing done in the project. http://sra.dbcls.jp/search/view/SRP011204
Number of reads, read length, and total bases; number of Samples and Runs. Were the samples split, or are these replicates? http://sra.dbcls.jp/search/view/SRP011204
Search GEO with the GEO ID (a GSE accession) found in the title. http://www.ncbi.nlm.nih.gov/geo/
They were replicates. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232
In GEO the paper information has been updated, and detailed information on each sample is available. Click a GEO Sample ID to see the information on the Control. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232
Sample Characteristics, and the protocol used to process the Sample. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353
The SRA Experiment ID. Click the Biosample ID to see how the Samples relate. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353
The corresponding SRA Sample ID, and the download link for the sequence data in SRA format. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353
Go back to the GEO page and click the link to the paper. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232
While we are here, check the full text in PubReader. http://www.ncbi.nlm.nih.gov/pubmed/22563483
Click Navigation, then click Materials & Methods. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341364/?report=reader
About the data analysis, such as the tools used; about library preparation and sequencing. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341364/?report=reader
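Practical search tips
Get the read information of data that match the conditions: read length, number of reads, sample information and so on. Do the data suit the intended use, and is their quality sufficient?
Get information on library preparation and data analysis: it is not often written in SRA, but can sometimes be obtained by following external DBs such as the paper or GEO.
Download the data that match the conditions: for some data, downloading and unpacking the files takes a very long time. Download fastq from the DDBJ FTP site, or use the DDBJ pipeline.
Quality check and download: checking and downloading the data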
Read quality check
Check the quality at each position of the reads, and also GC content and so on. http://sra.dbcls.jp/search/view/SRR426841
Data download via FTP
Click "FTP"; choosing the DB and format opens the FTP site. http://sra.dbcls.jp/search/view/SRP011204
Click either format, FASTQ or SRA Lite. http://trace.ddbj.nig.ac.jp/DRASearch/run?acc=SRR426841
Log in to the FTP site as a guest to access the fastq files compressed in bz2 format.
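Once a bz2-compressed fastq file has been downloaded, the read count and read length can be checked without unpacking it to disk. The following is a minimal sketch using only the Python standard library. It assumes the common four-line FASTQ layout (header, sequence, separator, quality) with no wrapped sequence lines; the function name and the file name in the usage note are hypothetical.

```python
import bz2


def fastq_read_stats(path: str) -> tuple[int, int, float]:
    """Return (number of reads, total bases, mean read length) for a .fastq.bz2 file.

    Assumes four lines per record, so the sequence is the second line of each record.
    """
    n_reads = 0
    n_bases = 0
    # bz2.open in text mode decompresses on the fly while iterating over lines.
    with bz2.open(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:
                n_reads += 1
                n_bases += len(line.rstrip("\n"))
    mean_length = n_bases / n_reads if n_reads else 0.0
    return n_reads, n_bases, mean_length
```

For example, `fastq_read_stats("SRR426841.fastq.bz2")` would stream through a local copy of the run shown above, if such a file exists under that name, and the result can be compared with the read count and read length reported on the search page.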
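DDBJ Read Annotation Pipeline: using the pipeline
Log in at https://p.ddbj.nig.ac.jp/ and click "Import public DRA". Enter an SRA ID to add the data to the pipeline.
Summary #3
Use the literature and read information to get what you need: information on the reads, library preparation, analysis and so on is required. When the information simply cannot be found, knowing when to give up also matters.
Make good use of public analysis pipelines: huge data sets take time to download and eat into HDD capacity. Using the DDBJ pipeline can lower that cost.
Online Reference: finding the information you need online
http://github.com/inutano/sra_metadata_toolkit/wiki : references and links on SRA and NGS.
Q&A. Thank you for your attention. Questions can also be sent to tohta@dbcls.rois.ac.jp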