Tazro Inutano Ohta
August 30, 2013
Sequence Read Archive: Database for High-throughput sequencing best practice 2013
Integrated Database Training Course, AJACS Toyama: "Using the Next-Generation Sequencing Database Sequence Read Archive"
Transcript
Using the Next-Generation Sequencing Database Sequence Read Archive. AJACS#42 TOYAMA, Integrated Database Training Course (AJACS Toyama)
Tazro Ohta, Tech Specialist, Database Center for Life Science (DBCLS). Effective SRA - public database for high-throughput sequencing
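Preface. This is not a tutorial on NGS data analysis, and it is not a tutorial on submitting data to the databases. It is a talk about the public databases that support NGS research. I will introduce resources that support data analysis, explain the information you need when submitting data, and show how to use the public DBs in your own research.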
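Table of Contents
1. About the Sequence Read Archive (SRA): organization, policies, and the data it publishes
2. Work at DBCLS: integration with other DBs, development of search functions, and visualization of the state of the DB through statistics
3. Examples of searching and using NGS data: using the services introduced here to investigate past NGS studies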
About the public NGS database Sequence Read Archive (SRA): organization, policies, and published data. SRA: The public DB for primary NGS data
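About SRA. Organization: SRA is operated jointly by the INSDC, which consists of NCBI SRA, EBI ENA, and DDBJ DRA. Data can be submitted to and accessed from any of the three sites. Policy: SRA accepts and publishes primary sequence data from massively parallel sequencers. Personally identifiable data are submitted to separate DBs (dbGaP, EGA). Published data: sequence data plus metadata describing the experiments, samples, and so on. Sequences can be downloaded compressed with gzip/bzip2 or in a proprietary format.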
INSDC: International Nucleotide Sequence Database Collaboration http://www.insdc.org - information on the standardization of genome information and related topics, and the policies of the DB collaboration.
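The INSDC Members: NCBI SRA, EMBL-EBI ENA/SRA, and DDBJ DRA.
- NCBI SRA receives the most data and developed the proprietary compressed SRA format.
- DDBJ DRA is operated by DDBJ at the National Institute of Genetics in Mishima and publishes data in uncompressed form.
- At EMBL-EBI, the ENA is developed and operated by the same section as the conventional sequence archive. Both submission and search are designed for GUI and CUI (command-line) use.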
NCBI SRA http://www.ncbi.nlm.nih.gov/sra - free-word search, advanced search, SRA BLAST (long reads only), and software (SRA Toolkit).
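NCBI SRA - search results for the keyword "human microbiome project": hits in BioProject, number of hits (per Experiment), information such as title, sequencer, and amount of sequence, and the organisms hit.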
NCBI SRA - top hit in the search results (an SRX entry): experiment title, read information such as layout and adapter, and per-run information with download links.
EMBL-EBI ENA http://www.ebi.ac.uk/ena - advanced search, free-word search, and sequence search.
EMBL-EBI ENA - search results for the keyword "human microbiome project": number of hits in each category and the top hits in each category.
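EMBL-EBI ENA - top hit under "Experiment" (an SRX entry): information about the experiment, sequencing-run information with download links, bulk download, display in text format, column selection, and download of the displayed information as text.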
DDBJ DRA http://trace.ddbj.ac.jp/dra - site search, data search, data submission, and video manuals.
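DDBJ DRA http://trace.ddbj.ac.jp/DRASearch - search by ID, faceted (filtering) search, and keyword search. Also shows top rankings of organisms, experiment types, and data submitters, and the number of entries for each ID type.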
DDBJ DRA http://trace.ddbj.ac.jp/DRASearch - search results: total number of hits, with filtering by metadata type and organism.
DDBJ DRA - filtering by "Experiment" → top hit (an SRX entry): links to related items and download links, plus experiment details such as library preparation, sequencer, and base calling.
DDBJ DRA - Navigation → Run (an SRR entry): an example of a run that cannot be downloaded.
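DDBJ DRA - DRASearch → DRR entry: an example of a run that can be downloaded. It shows read information, and checking "quality" displays the phred score. Download links are provided for both the Fastq and SRA Lite formats (FTP).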
DDBJ DRA - searching NCBI SRA for that "SRR" accession returns "Record is removed".
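Hands-on: try searching. Search by organism, sequencer name, gene name, disease name, and so on, and also try the Advanced search. Then examine the details of the data that come up:
- How large are the data?
- How long will the download take?
- Will they fit in the free space on your hard disk?
If there are too many or too few hits, try a different search. Look for information that helps you judge whether the data look interesting.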
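The hands-on questions about size, download time, and disk space can be answered with simple arithmetic. This is a minimal Python sketch, not part of any SRA tool. It assumes decimal units (1 GB = 10^9 bytes), a link that stays fully used, and a hypothetical 2x safety margin to leave room for decompressed copies. Both function names are illustrative.

```python
import shutil


def estimate_download_hours(size_gb: float, bandwidth_mbps: float) -> float:
    """Hours to transfer size_gb gigabytes at bandwidth_mbps megabits per second.

    Assumes a sustained, fully used link; real transfers are usually slower.
    """
    size_megabits = size_gb * 1000 * 8  # GB -> MB -> Mbit
    return size_megabits / bandwidth_mbps / 3600


def fits_on_disk(size_gb: float, path: str = ".", margin: float = 2.0) -> bool:
    """True if the filesystem holding path has margin x size_gb of free space.

    The default 2x margin is an assumption, not an SRA recommendation.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= size_gb * margin * 1000**3


# A 100 GB data set over a 100 Mbps link takes a little over two hours
print(f"{estimate_download_hours(100, 100):.2f} h")  # 2.22 h
print(fits_on_disk(100))
```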
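Search Tips.
- Each site develops its own features. The kinds of search available and the way results are displayed differ between sites, but IDs are shared, so using the sites selectively makes data easier to find.
- Some data cannot be downloaded. Submitters withdraw some data for various reasons. Data that have only just been submitted may not yet be shared between the sites, so they cannot be found yet.
- Information not described in the metadata cannot be searched. Metadata are the annotation of the sequence data, written by the submitter in a prescribed format.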
Metadata: Metadata Object
Metadata Object Dependencies. [Diagram: Submission, Analysis, Study, two Samples, two Experiments, four Runs]
The metadata submitted together with sequence data consist of six types of objects. The information each object describes depends on its type.
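Metadata Object Dependencies. [Diagram: the same objects, with Submission set aside]
Submission only groups the metadata of one submission. Setting it aside, the basic set of metadata objects is related as shown: a Study, its Samples and Experiments, and the Runs of each Experiment.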
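The dependency structure can be pictured as plain data classes. This is only a sketch: the classes are hypothetical, and their fields are reduced to the links between objects. The toy study below uses the accessions DRS000002 and DRR000001 to DRR000004. Which Run belongs to which Experiment is assumed for illustration, not read off the diagram.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    accession: str


@dataclass
class Experiment:
    accession: str
    sample: str  # accession of the Sample this Experiment sequenced
    runs: List[Run] = field(default_factory=list)


@dataclass
class Study:
    accession: str
    experiments: List[Experiment] = field(default_factory=list)

    def runs_by_sample(self) -> Dict[str, List[str]]:
        """Group Run accessions under the Sample referenced by their Experiment."""
        grouped: Dict[str, List[str]] = {}
        for exp in self.experiments:
            grouped.setdefault(exp.sample, []).extend(r.accession for r in exp.runs)
        return grouped


# Toy study: the assignment of Runs to Experiments is hypothetical
study = Study("DRP000001", [
    Experiment("DRX000001", "DRS000001", [Run("DRR000001"), Run("DRR000002")]),
    Experiment("DRX000002", "DRS000002", [Run("DRR000003"), Run("DRR000004")]),
])
print(study.runs_by_sample())
# {'DRS000001': ['DRR000001', 'DRR000002'], 'DRS000002': ['DRR000003', 'DRR000004']}
```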
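Metadata Object Dependencies. [Diagram: DRA000001 (Submission), DRZ000001 (Analysis), DRP000001 (Study), DRS000001 (Sample), DRX000001 and DRX000002 (Experiments), DRR000001 to DRR000004 (Runs)]
Each object has its own ID. The ID starts with three letters that indicate the DB that received the data and the object type, followed by a number (six digits in these examples).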
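Because the prefix encodes both the receiving archive and the object type, an accession can be decoded mechanically. This is a minimal sketch with a hypothetical function name. The object letters follow the DRA examples on this slide: A = Submission, Z = Analysis, P = Study, S = Sample, X = Experiment, R = Run. The first letter follows the convention that S, E, and D mark NCBI, EBI, and DDBJ.

```python
import re
from typing import Tuple

ARCHIVES = {"D": "DDBJ", "E": "EBI", "S": "NCBI"}
OBJECT_TYPES = {
    "A": "Submission", "Z": "Analysis", "P": "Study",
    "S": "Sample", "X": "Experiment", "R": "Run",
}
# Archive letter, the fixed "R", the object letter, then the numeric part
ACCESSION_RE = re.compile(r"^([DES])R([AZPSXR])(\d+)$")


def parse_accession(acc: str) -> Tuple[str, str, int]:
    """Return (archive, object type, number) for an SRA-style accession."""
    m = ACCESSION_RE.match(acc)
    if m is None:
        raise ValueError(f"not an SRA-style accession: {acc!r}")
    db, obj, number = m.groups()
    return ARCHIVES[db], OBJECT_TYPES[obj], int(number)


print(parse_accession("DRX000002"))  # ('DDBJ', 'Experiment', 2)
print(parse_accession("SRP011204"))  # ('NCBI', 'Study', 11204)
```

Non-SRA identifiers such as the GEO accession GSE36232 are rejected with a `ValueError` rather than guessed at.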
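Metadata Tips.
- The relationships among objects and IDs are complex. A large project can have hundreds of Runs and Samples, and for various reasons some entries are exceptions to the rules.
- Know which information is described where. This matters not only when submitting data but also when searching.
- Metadata descriptions vary between submitters, especially for items such as library preparation. Metadata may also not be updated when related information, such as a paper, appears later.
Details: http://trace.ddbj.ac.jp/dra/metadata.html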
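Summary #1
- SRA is operated by the three INSDC member institutions. Because the data are shared, the content is the same whichever entry point you use, but the search features differ between sites.
- Metadata are essential for submitting and searching sequence data. Each object gets an ID and is described by the submitter. You need to understand their content and relationships, especially when submitting data.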
Work at DBCLS: integration with other DBs, development of search functions, and visualization of the state of the DB through statistics. Tech Dev at DBCLS - Search and Statistics
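DBCLS v SRA.
- Integration with other DBs: besides metadata, we bring in literature information such as papers and disease information, and we compute the sequence quality of individual data sets.
- Development of search functions: we integrate the functions of the three sites and add our own, to build search functions oriented toward data users.
- Understanding the state of the DB through statistics: we publish submission trends derived from the metadata and analyze the DB as a whole based on the sequence information.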
DBCLS SRA
DBCLS SRA http://sra.dbcls.jp/ - lists the submitted data by metadata type, with search by SRA ID, organism, sequencer, and so on.
DBCLS SRA http://sra.dbcls.jp/ - rankings by experiment type, sequencer, and organism, with graphs of their trends over time.
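Rankings and trend graphs like these come down to counting metadata fields. This sketch assumes each record carries an ISO-format submission date and an instrument name. The toy values are made up and are not real SRA statistics.

```python
from collections import Counter
from typing import Dict, List, Tuple


def submissions_per_year(dates: List[str]) -> Dict[str, int]:
    """Count submissions per year from ISO dates (YYYY-MM-DD), oldest year first."""
    counts = Counter(d[:4] for d in dates)
    return dict(sorted(counts.items()))


def ranking(values: List[str], top: int = 3) -> List[Tuple[str, int]]:
    """Most frequent values (e.g. instruments or organisms) first."""
    return Counter(values).most_common(top)


# Toy metadata, not real SRA entries
dates = ["2011-07-01", "2012-01-05", "2012-03-09"]
instruments = ["Illumina HiSeq 2000", "Illumina MiSeq", "Illumina HiSeq 2000"]
print(submissions_per_year(dates))  # {'2011': 1, '2012': 2}
print(ranking(instruments, top=1))  # [('Illumina HiSeq 2000', 2)]
```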
DBCLS SRA http://sra.dbcls.jp/ - find data from literature information, or find data from disease information.
Integrating literature information: DBCLS SRA Publication Search
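DBCLS SRA Publication Search.
- Papers describe sequence data in more detail. Where the sequencing fits within the study matters, and Materials & Methods often contain detailed information.
- Sequence data are often published before the paper. Reasons include grant rules, journals instructing authors to release data, and large projects that set their own release policies.
- Metadata sometimes lack literature information. Once submitted, metadata are often not updated, so published data and papers have to be linked.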
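DBCLS SRA Publication Search. DBCLS SRA → "Find from literature" offers filtered search by experiment type, sequencer, and organism. It shows a table linking SRA IDs to PubMed IDs, with literature information. Click a column name to sort.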
Integrating disease information: DBCLS SRA Diseases Search
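DBCLS SRA Diseases Search.
- Clinical sequencing is hard to search. Whole-genome sequences, polymorphism information, and the like are often not published in SRA.
- Metadata descriptions alone are often insufficient. Description style and amount of information vary between submitters, which makes it difficult to search them all at once.
- Use the tags assigned to the literature. We developed a data search keyed on disease information, using the MeSH terms assigned to PubMed entries.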
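Keying SRA data on disease terms amounts to a two-step join: from SRA study to PubMed ID, then from PubMed ID to MeSH terms. This minimal sketch uses made-up identifiers; STUDY1, PMID_A, and so on are placeholders, not real records. It sorts terms by the number of linked studies, as in a by-frequency view.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def studies_by_mesh(
    sra_to_pubmed: Dict[str, List[str]],
    pubmed_to_mesh: Dict[str, List[str]],
) -> List[Tuple[str, List[str]]]:
    """Map each MeSH term to the SRA studies linked to it through PubMed."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for study, pmids in sra_to_pubmed.items():
        for pmid in pmids:
            for term in pubmed_to_mesh.get(pmid, []):
                index[term].add(study)
    # Most frequently linked terms first; ties broken alphabetically
    return sorted(
        ((term, sorted(studies)) for term, studies in index.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


sra_to_pubmed = {"STUDY1": ["PMID_A"], "STUDY2": ["PMID_A", "PMID_B"], "STUDY3": ["PMID_B"]}
pubmed_to_mesh = {"PMID_A": ["Neoplasms"], "PMID_B": ["Neoplasms", "Diabetes Mellitus"]}
for term, studies in studies_by_mesh(sra_to_pubmed, pubmed_to_mesh):
    print(term, studies)
```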
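DBCLS SRA Diseases Search. DBCLS SRA → "Browse by disease" → by frequency: search from the disease type. The view shows disease names with the number of submitted data sets, and you can set how many entries to display.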
DBCLS SRA Diseases Search. DBCLS SRA → "Browse by disease" → by disease category: click to expand the tree, and click a number to show the list.
DBCLS's own search: DBCLS SRA Metadata Search
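DBCLS SRA Metadata Search.
- Offer a more user-oriented search. Reduce as far as possible what users need to know in advance, such as how metadata and IDs are organized.
- Reflect more information in the search. We developed a flexible search that draws on metadata, the integrated information from other DBs, and our own data.
- Support automation. Repeating searches by hand is inefficient, and automation makes it possible to build the search into analysis pipelines.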
DBCLS SRA Metadata Search http://sra.dbcls.jp/search - free-word search and filtered search, plus developer information and a Twitter account for support.
DBCLS SRA Metadata Search - free-word search over the data matching the conditions, a view of all matching data, and the proportion of data matching each condition.
Filtered search: Mus musculus / Transcriptome / Illumina MiSeq
DBCLS SRA Metadata Search - search results: information on the hit data (rows in blue have paper information). Click a column name to sort, or filter items by keyword.
DBCLS SRA Metadata Search - result of clicking an SRP ID: project overview, paper summary and abstract, and links to PubMed and PMC.
DBCLS SRA Metadata Search - result of clicking an SRP ID: click to expand Materials and Methods and Results.
DBCLS SRA Metadata Search - result of clicking an SRP ID: Run and Sample information. The table can be shown in TSV or JSON format, sorted, and filtered, and download links are provided.
DBCLS SRA Metadata Search - result of clicking an SRP ID: the same table, shown in TSV or JSON, with sorting and filtering. Values that differ across the table are highlighted.
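Highlighting what differs across a Run/Sample table amounts to finding the columns whose values are not constant. This sketch works on a TSV export and assumes the first line is a header. The column names in the example are hypothetical.

```python
import csv
import io
from typing import List


def varying_columns(tsv_text: str) -> List[str]:
    """Return header names whose values are not identical in every row."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return []
    # A column varies if its values form more than one distinct value
    return [col for col in rows[0] if len({row[col] for row in rows}) > 1]


tsv = "run\tsample\tinstrument\nR1\tS1\tMiSeq\nR2\tS2\tMiSeq\n"
print(varying_columns(tsv))  # ['run', 'sample']
```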
DBCLS SRA Metadata Search - Run quality information (an SRR entry): read length, number of reads, GC content, and more. Click the result of each module to enlarge it.
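The basic numbers in such a report can be computed directly from a FASTQ file. This minimal sketch assumes well-formed four-line records with no line wrapping. It is not the tool DBCLS SRA uses to build its reports.

```python
from typing import Dict, Iterable


def fastq_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Read count, mean read length, and GC percentage of FASTQ records."""
    n_reads = total_len = gc = 0
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip().upper()
        next(it)  # '+' separator line
        next(it)  # quality line
        n_reads += 1
        total_len += len(seq)
        gc += seq.count("G") + seq.count("C")
    return {
        "reads": n_reads,
        "mean_length": total_len / n_reads if n_reads else 0.0,
        "gc_percent": 100.0 * gc / total_len if total_len else 0.0,
    }


records = "@r1\nACGT\n+\nIIII\n@r2\nGGCCAA\n+\nIIIIII\n"
print(fastq_stats(records.splitlines()))
# {'reads': 2, 'mean_length': 5.0, 'gc_percent': 60.0}
```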
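DBCLS SRA Metadata Search.
- Visualize the search. The proportions within the whole DB show why a search returns many or few results.
- Keyword search that includes paper information. The search targets are extended so that as much related data as possible is hit.
- Check read information before downloading. Downloads take a long time, so we provide information for choosing only the data you can be sure to use.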
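Summary #2
- DBCLS SRA extends the functions of SRA. It does not accept data submissions, but it provides information for understanding the state of SRA and search functions that make data easier to find.
- Metadata are essential for submitting and searching sequence data. Each object gets an ID and is described by the submitter. You need to understand their content and relationships, especially when submitting data.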
Examples of searching and using NGS data: investigating past NGS studies with the services introduced here. Search published NGS data and project
Uses of public data at each stage: Use cases of Public data
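Use cases of Public data.
- Sequencing you are planning: design and simulate the sequencing based on similar projects and their data.
- Sequencing in progress: assess sequencing quality against data from the same organism and sequencer.
- Completed sequencing: add data from closely related species or similar projects to improve the accuracy of the analysis.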
What you need when searching and using data: Practical search tips
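Practical search tips.
- Know the read spec each experiment type requires. Genome resequencing, RNA-Seq, ChIP-Seq, and so on need different read lengths and read numbers.
- Know each sequencer's read spec. The read spec obtained from a sequencer changes with reagent updates, so check it carefully.
- Choose the sequencer according to organism and experiment type. To find similar projects in the public DBs, you need read information that matches the genome size and experiment type.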
Required read spec by application
Application                  Total bases   Read length   Read number (M)
Human genome resequencing    90-150 Gb     2x100         900-1500
Targeted resequencing        <1 Gb         2x100         10
Exome sequencing             5-7 Gb        2x100         70
RNA-Seq                      5 Gb          2x100         50
TSS-Seq                      1 Gb          1x50          20
small RNA                    0.35 Gb       1x35          >10
Microbial genome             >150 Mb       2x100         >1.5
Eukaryotic genome            >4 Gb         2x100         >40
Bisulfite-Seq                90-150 Gb     2x100         900-1500
ChIP-Seq                     >6 Gb         1x100         60
Quoted from the Saibō Kōgaku (Cell Technology) supplementary volume "Next-Generation Sequencers: Advanced Methods by Purpose", p. …. The figures can vary with the size of the target genome and other factors, and the information may already be out of date.
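The columns of the table are tied together: in every row, total bases = read number x length of one read. The read number appears to count each read of a pair separately; for example, 900 M reads x 100 bp = 90 Gb for human resequencing. Dividing the total by the genome size gives the expected coverage. This small sketch assumes a human genome of roughly 3 Gb, and its function names are illustrative.

```python
def total_gigabases(reads_millions: float, read_length_bp: int) -> float:
    """Total yield in Gb from read count (millions of reads) and read length."""
    return reads_millions * 1e6 * read_length_bp / 1e9


def coverage(total_gb: float, genome_size_gb: float) -> float:
    """Average sequencing depth: total bases divided by genome size."""
    return total_gb / genome_size_gb


# Rows from the table: (application, read number in M, length of one read in bp)
rows = [("Human genome resequencing", 900, 100), ("RNA-Seq", 50, 100),
        ("small RNA", 10, 35), ("ChIP-Seq", 60, 100)]
for name, reads, length in rows:
    print(f"{name}: {total_gigabases(reads, length):.2f} Gb")

# Assumed human genome size of about 3 Gb gives roughly 30x coverage
print(f"{coverage(total_gigabases(900, 100), 3.0):.0f}x")
```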
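Required read spec by application. [Slide image: the book's chapter list, covering preparation and setup, human genome analysis, gene expression regulation analysis, de novo genome sequencing, epigenetics analysis, metagenome analysis, genome structure analysis, data analysis tools and storage, and integrated analysis.] This is a plug, but nobody asked me to do it, and I get no money if it sells.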
Read spec, still improving. Improvements in reagents and software frequently change read length and read number, even on the same sequencer (for example, Illumina MiSeq).
Example: finding studies of mouse gene expression. Example survey: mouse brain transcriptome
http://sra.dbcls.jp/search - specify the organism and experiment type and press "submit condition", leaving the sequencer field empty.
http://sra.dbcls.jp/search/filter?species=Mus%20musculus&type=Transcriptome&instrument= - the number of matching projects is shown. Enter "brain" as the keyword and press "search".
http://sra.dbcls.jp/search/search?species=Mus%20musculus&type=Transcriptome&instrument=&search_query=brain - the matching projects are listed. Enter "brain" in the field under Study Title to filter them further.
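The search conditions are encoded as URL query parameters, so the same searches can be scripted, which is the kind of automation the Metadata Search aims to support. This sketch only rebuilds the two URL patterns shown on these 2013 slides; it does not fetch anything, and `search_url` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE = "http://sra.dbcls.jp/search"


def search_url(species: str, exp_type: str, instrument: str = "",
               query: Optional[str] = None) -> str:
    """Build a filter URL, or a keyword-search URL when query is given."""
    params = {"species": species, "type": exp_type, "instrument": instrument}
    path = "filter"
    if query is not None:
        path = "search"
        params["search_query"] = query
    # quote (not quote_plus) encodes the space as %20, as in the slide URLs
    return f"{BASE}/{path}?{urlencode(params, quote_via=quote)}"


print(search_url("Mus musculus", "Transcriptome"))
print(search_url("Mus musculus", "Transcriptome", query="brain"))
```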
http://sra.dbcls.jp/search/search?species=Mus%20musculus&type=Transcriptome&instrument=&search_query=brain - click Study ID to sort the projects newest first, then look at this project's data.
http://sra.dbcls.jp/search/view/SRP011204 - project overview and an overview of the sequencing performed in the project.
http://sra.dbcls.jp/search/view/SRP011204 - read length ~…, read number …, total ~… Gb; … Runs across … Samples, i.e. replicates.
http://www.ncbi.nlm.nih.gov/geo/ - search GEO with the GEO ID "GSE36232" found in the title.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232 - the Runs turn out to be … replicates.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232 - the paper information has been updated in GEO, and each sample has detailed information. Click a GEO Sample ID to see the Control information.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353 - Sample Characteristics and the sample processing protocol.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353 - the SRA Experiment ID. Click the BioSample ID to see how the sample relates to other records.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353 - the corresponding SRA Sample ID and a download link for the sequence data in SRA format.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232 - back on the GEO page, click the link to the paper.
http://www.ncbi.nlm.nih.gov/pubmed/22563483 - while we are here, read the full text in PubReader.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341364/?report=reader - click Navigation, then Materials & Methods.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341364/?report=reader - the sections on library preparation and sequencing, and on data analysis, including the tools used.
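Practical search tips.
- Get the read information for data that fit your conditions: read length, read number, sample information, and so on. Use it to judge whether the data suit your purpose and whether there is enough of it.
- Get information on library preparation and data analysis. SRA often has little of it, but you can find it by following the paper, GEO, and other external DBs.
- Download the data that fit your conditions. Depending on the data, downloading and unpacking files can take a very long time. Download fastq files from the DDBJ FTP site, or use the DDBJ pipeline.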
Checking and downloading data: Quality check and download
Read quality check http://sra.dbcls.jp/search/view/SRR426841 - check the quality at each position in the read, as well as GC content and other metrics.
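A per-position quality check averages the Phred scores at each read position. This minimal sketch assumes Phred+33 (Sanger-style) quality encoding; older Illumina data used an offset of 64. Reads of different lengths are allowed, and each position is averaged over the reads that reach it.

```python
from typing import Iterable, List


def mean_quality_per_position(qualities: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each position across quality strings of any length."""
    sums: List[int] = []
    counts: List[int] = []
    for qual in qualities:
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


print(mean_quality_per_position(["II5", "I5"]))  # [40.0, 30.0, 20.0]
```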
Data download via FTP http://sra.dbcls.jp/search/view/SRP011204 - click "FTP" and choose a DB to open its FTP site.
Data download via FTP http://trace.ddbj.nig.ac.jp/DRASearch/run?acc=SRR426841 - click either the FASTQ or the SRA Lite format.
Data download via FTP - log in to the FTP site as a guest to access fastq files compressed in bz2 format.
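A downloaded .bz2 fastq file can be inspected without first unpacking it to disk. This sketch uses Python's standard `bz2` module and assumes four-line FASTQ records. The demo uses an in-memory toy file instead of a real download.

```python
import bz2
import io
from typing import BinaryIO, Union


def count_reads_bz2(source: Union[str, BinaryIO]) -> int:
    """Count FASTQ records in a bzip2-compressed file, streaming line by line."""
    with bz2.open(source, "rt") as handle:
        n_lines = sum(1 for line in handle if line.strip())
    return n_lines // 4  # four lines per FASTQ record


# In-memory toy file; pass the path of a downloaded .fastq.bz2 instead
demo = io.BytesIO(bz2.compress(b"@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n"))
print(count_reads_bz2(demo))  # 2
```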
Using the pipeline: DDBJ Read Annotation Pipeline
DDBJ Read Annotation Pipeline https://p.ddbj.nig.ac.jp/ → after logging in, click "Import public DRA" and enter an SRA ID to add the data to the pipeline.
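Summary #3
- Use the literature and read information to get what you need. You need read information as well as details of library preparation and analysis. When the information simply cannot be found, knowing when to give up also matters.
- Make good use of public analysis pipelines. Downloading huge data sets takes time and fills up the HDD, and using the DDBJ pipeline lowers those costs.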
Finding information online: Online Reference
Online Reference: http://github.com/inutano/sra_metadata_toolkit/wiki - a collection of references and links on SRA and NGS.
Q&A. Thank you for your attention. Send further questions to the author at DBCLS (dbcls.rois.ac.jp).