Sequence Read Archive: Database for High-throughput sequencing best practice 2013
Tazro Inutano Ohta
August 30, 2013
Research
Integrated Database Workshop AJACS Toyama: "Using the next-generation sequencing database Sequence Read Archive"
Transcript
Using the next-generation sequencing database Sequence Read Archive
AJACS#42 TOYAMA: Integrated Database Workshop AJACS Toyama
Tazro Ohta, Tech Specialist (specially appointed technical staff), Database Center for Life Science (DBCLS)
Effective SRA - public database for high-throughput sequencing
Preface
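This is not a tutorial on NGS data analysis. Resources that support data analysis are introduced.
This is not a tutorial on submitting data to the DB. The information needed when submitting data is introduced.
This is a talk about public data that supports NGS research: how to use public DBs in day-to-day research.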
Today's contents: Table of Contents
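Table of Contents
About the Sequence Read Archive (SRA): organization, policy, and published data.
Efforts at DBCLS: integration with other DBs, development of search functions, and visualization of the current state of the DB through statistics.
Examples of searching and using NGS data: surveying past NGS studies with the services introduced.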
SRA: The public DB for primary NGS data
About the public NGS database Sequence Read Archive (SRA): organization, policy, and published data.
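About SRA
Organization: jointly operated by the INSDC, which consists of NCBI SRA, EBI ENA, and DDBJ DRA. Data can be submitted and accessed through any of the three archives.
Policy: accepts and publishes primary sequence data obtained from parallel sequencers. Personally identifiable data is submitted to separate DBs (dbGaP, EGA).
Published data: sequence data, plus metadata describing the details of experiments, samples, and so on. Sequences can be downloaded compressed as gz, bz2, or an archive-specific format.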
INSDC: International Nucleotide Sequence Database Collaboration http://www.insdc.org
The site carries information such as the standardization of genome information, and the policy of the DB collaboration.
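The INSDC Members
NCBI SRA: accepts the largest amount of data. Developer of the SRA format, its own compressed format.
EMBL-EBI ENA (SRA): developed and operated by the same section as the conventional sequence archive. Both submission and search are designed for GUI and CUI use.
DDBJ DRA: operated by DDBJ at the National Institute of Genetics in Mishima. Publishes data in an uncompressed format.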
NCBI SRA http://www.ncbi.nlm.nih.gov/sra
Free-word search; advanced search; SRA BLAST (long reads only); software: SRA Toolkit.
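NCBI SRA: search results for the keyword "human microbiome project"
Hits in BioProject; number of hits (per Experiment); title, sequencer, amount of sequence, and similar information; organisms that were hit.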
NCBI SRA: top hit of the search results (an SRX entry)
Experiment title; read information such as layout and adaptor; information and download links for each sequencing run.
EMBL-EBI ENA http://www.ebi.ac.uk/ena
Advanced search; free-word search; sequence search.
EMBL-EBI ENA: search results for the keyword "human microbiome project"
Number of hits in each category; top hits of each category.
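EMBL-EBI ENA: top hit under "Experiment" (an SRX entry)
Information about the experiment; sequencing run information and download links; bulk download, text-format display, and column selection; the displayed information can be downloaded as text.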
DDBJ DRA http://trace.ddbj.ac.jp/dra
Site search; data search, data submission, and video manuals.
DDBJ DRA http://trace.ddbj.ac.jp/DRASearch
Search by ID, faceted (narrowing) search, and keyword search; top rankings by organism, experiment type, and submitting center; number of entries per ID type.
DDBJ DRA http://trace.ddbj.ac.jp/DRASearch
Total number of hits; search results; narrowing by metadata type and organism.
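DDBJ DRA: narrow by "Experiment" → top hit (an SRX entry)
Links to related items and download links; detailed experiment information; information on library preparation, sequencer, base calling, and so on.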
DDBJ DRA: Navigation → Run (an SRR entry)
An example of data that cannot be downloaded.
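DDBJ DRA: DRASearch → a DRR entry
An example of data that can be downloaded. Read information; checking "quality" displays the phred score; download links (FTP) for the Fastq format and the SRA-Lite format.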
DDBJ DRA: searching NCBI SRA for the same SRR entry → "Record is removed"
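Hands-on
Try searching: search by organism, sequencer name, gene name, disease name, and so on. Try the Advanced search as well. If there are too many (or too few) hits, try a different search.
Examine the details of the data you find: look for information that lets you judge whether the data is interesting.
Check how large the data is: how long is the download likely to take? Will it fit in the free space on your hard disk?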
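The size check in this hands-on step can be sketched in Python. This is only an illustration and not part of the original slides: the function names, the 12 GB file size, the 100 Mbit/s bandwidth, and the expansion factor of 4 are all assumptions chosen for the example.

```python
import shutil


def estimate_download_seconds(size_bytes: int, bandwidth_bits_per_sec: float) -> float:
    """Rough lower bound on transfer time, ignoring protocol overhead."""
    return size_bytes * 8 / bandwidth_bits_per_sec


def fits_on_disk(size_bytes: int, free_bytes: int, expansion_factor: float = 1.0) -> bool:
    """True if the file, times an assumed decompression factor, fits in free space."""
    return size_bytes * expansion_factor <= free_bytes


if __name__ == "__main__":
    size = 12 * 10**9  # hypothetical 12 GB compressed file
    seconds = estimate_download_seconds(size, 100e6)  # assumed 100 Mbit/s link
    free = shutil.disk_usage(".").free  # free bytes on the current volume
    print(f"about {seconds / 60:.0f} min to download")
    print("fits" if fits_on_disk(size, free, expansion_factor=4.0) else "does not fit")
```

Real transfers are usually slower than this bound, so it is best read as a quick sanity check before starting a download.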
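Search Tips
Each archive develops its own features: the kinds of search available and the number of results shown differ. The IDs are shared, so using the sites for different purposes makes searching easier.
Some data cannot be downloaded: data may have been withdrawn by the submitter for various reasons, or may not be found because it was only just submitted and has not yet been shared.
Information not written in the metadata cannot be searched: metadata is the annotation data for the sequence data, written by the submitter in a prescribed format.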
Metadata: Metadata Object
Metadata Object Dependencies
Diagram objects: Submission, Analysis, Study, Sample (x2), Experiment (x2), Run (x4).
The metadata submitted together with sequence data consists of six types of objects, and the information recorded depends on the object type.
Metadata Object Dependencies
Leaving aside Submission, which bundles the metadata into a unit of submission, the relationships within a basic metadata set look like this diagram.
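Metadata Object Dependencies
Example IDs: DRA000001, DRZ000001, DRP000001, DRS000001, DRS000001, DRX000001, DRX000002, DRR000001, DRR000002, DRR000003, DRR000004.
Each object has its own ID. An ID consists of three letters, which indicate the DB that accepted the data and the object type, followed by a six-digit number.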
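The ID scheme on this slide can be sketched as a small parser. The letter tables below follow the usual INSDC accession prefixes (first letter for the archive, third letter for the object type) and match the example IDs on the slide; the names `parse_accession` and `Accession` are hypothetical, and accepting more than six digits is an assumption made to tolerate longer serial numbers.

```python
import re
from typing import NamedTuple

ARCHIVES = {"D": "DDBJ", "E": "EBI", "S": "NCBI"}
OBJECT_TYPES = {
    "A": "Submission",
    "P": "Study",
    "S": "Sample",
    "X": "Experiment",
    "R": "Run",
    "Z": "Analysis",
}

# Three letters followed by digits. The slide says six digits;
# "six or more" is an assumption, not something the slide states.
ACCESSION = re.compile(r"^([DES])R([APSXRZ])(\d{6,})$")


class Accession(NamedTuple):
    archive: str
    object_type: str
    serial: int


def parse_accession(acc: str) -> Accession:
    """Split an SRA-style accession into archive, object type, and serial number."""
    match = ACCESSION.match(acc)
    if match is None:
        raise ValueError(f"not an SRA-style accession: {acc!r}")
    return Accession(
        ARCHIVES[match.group(1)],
        OBJECT_TYPES[match.group(2)],
        int(match.group(3)),
    )


if __name__ == "__main__":
    for acc in ["DRA000001", "DRZ000001", "DRP000001", "DRX000002", "DRR000004"]:
        print(acc, parse_accession(acc))
```

A parser like this is handy when mixing IDs from the three archives, since the prefix alone tells you which object type an ID refers to.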
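Metadata Tips
The relationships between objects and IDs are complex: in large projects there can be hundreds of Runs and Samples, and for various reasons some entries are exceptions to the rules.
Know which information is written where: this matters not only when submitting data but also when searching. For details see http://trace.ddbj.ac.jp/dra/metadata.html
Metadata descriptions vary between submitters: especially for fields such as library preparation. Metadata may also go without an update even after information such as a publication has changed.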
Summary #1
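Summary #1
SRA is operated by the three INSDC member archives: data is shared, so the content is the same whichever entry point you use, but search functions and other features differ.
Metadata is essential for submitting and searching sequence data: each object is assigned an ID and is described by the submitter. Its content and relationships need to be understood, particularly when submitting data.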
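Tech Dev at DBCLS - Search and Statistics
Efforts at DBCLS: integration with other DBs, development of search functions, and visualization of the current state of the DB through statistics.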
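DBCLS SRA
Integration with other DBs: beyond metadata, integrates literature information such as papers and disease information, and computes sequence quality for individual data.
Development of search functions: integrates the features of the three archives while adding original ones, aiming at search that is more oriented to data users.
Grasping the state of the DB through statistics: publishes trends in the number of submissions based on metadata, and analyzes DB-wide information based on sequence information.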
DBCLS SRA
DBCLS SRA http://sra.dbcls.jp/
Lists submitted data by metadata type; search by SRA ID, organism, sequencer, and so on.
DBCLS SRA http://sra.dbcls.jp/
Rankings by experiment type, sequencer, and organism; graphs of year-by-year trends.
DBCLS SRA http://sra.dbcls.jp/
Find data from literature information; find data from disease information.
Integration of literature information: DBCLS SRA Publication Search
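DBCLS SRA Publication Search
Papers describe the sequence data in more detail: where the sequencing sits within the study also matters, and Materials & Methods often holds the detailed information.
Sequence data is sometimes released before the paper: because of grant constraints, journal instructions on data release, and so on. Large projects may also set their own release policy.
Literature information is sometimes not recorded in the metadata: metadata is often not updated once submitted, so released data has to be linked to papers afterwards.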
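DBCLS SRA Publication Search: DBCLS SRA → "Find from literature"
Narrowing search by experiment type, sequencer, and organism; a table mapping SRA IDs to PubMed IDs, with literature information; click a column name to sort.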
Integration of disease information: DBCLS SRA Diseases Search
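DBCLS SRA Diseases Search
Clinical sequencing data is hard to search: whole-genome sequences, polymorphism information, and the like are often not released in SRA.
Metadata descriptions alone can be insufficient: descriptions and the amount of information vary between submitters, which makes a single uniform search difficult.
Use the tags attached to literature information: a data search keyed on disease information was developed using the MeSH terms assigned to PubMed entries.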
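DBCLS SRA Diseases Search: DBCLS SRA → "Browse by disease" → by frequency
Search by disease type; disease names with the number of submitted data sets; choose how many entries to display.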
DBCLS SRA Diseases Search: DBCLS SRA → "Browse by disease" → by disease category
Click to expand the tree; click a number to show the list.
DBCLS's own search function: DBCLS SRA Metadata Search
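DBCLS SRA Metadata Search
Provide a more user-oriented search: reduce as far as possible what users must memorize, such as how metadata IDs are organized.
Reflect more information in the search: a flexible search that draws not only on metadata but also on integrated information from other DBs and on original information.
Support automation: repeating searches by hand is inefficient. Automation also makes it possible to embed the search in analysis pipelines.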
DBCLS SRA Metadata Search http://sra.dbcls.jp/search
Information for developers and a Twitter account for support; free-word search; narrowing search.
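DBCLS SRA Metadata Search: narrowing search (Mus musculus, Transcriptome, Illumina MiSeq)
Run a free-word search over the data matching the conditions; show all data matching the conditions; the share of data matching each condition is displayed.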
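DBCLS SRA Metadata Search: search results
Click a column name to sort; narrow items by keyword; information on the data that was hit. Blue rows have publication information attached.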
DBCLS SRA Metadata Search: result of clicking an SRP entry
Project overview; overview and abstract of the paper; links to PubMed and PMC.
DBCLS SRA Metadata Search: result of clicking an SRP entry
Click to expand Materials and Methods and Results.
DBCLS SRA Metadata Search: result of clicking an SRP entry
Show the table in tsv or json format; sorting and narrowing; download links; Run and Sample information.
DBCLS SRA Metadata Search: result of clicking an SRP entry
Show the table in tsv or json format; sorting and narrowing; differences across the whole table are highlighted.
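DBCLS SRA Metadata Search: quality information for a Run (an SRR entry)
Read length, number of reads, GC content, and similar information; click the result of each module to enlarge it.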
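DBCLS SRA Metadata Search
Visualize the search: the share of the whole lets you judge why a search returned many (or few) results.
Keyword search that includes publication information: the search targets are widened so that as much related data as possible is hit.
Check read information before downloading: downloads can take a long time, so information is provided for choosing only data that is sure to be usable.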
Summary #2
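Summary #2
DBCLS SRA is a functional extension of SRA: it does not accept data submissions; it provides information for grasping the state of SRA and a search function that makes data easier to find.
Metadata is essential for submitting and searching sequence data: each object is assigned an ID and is described by the submitter. Its content and relationships need to be understood, particularly when submitting data.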
Search published NGS data and project
Examples of searching and using NGS data: surveying past NGS studies with the services introduced.
Use cases of Public data, by stage
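Use cases of Public data
Sequencing you are about to do: design and simulate the sequencing based on information from similar projects and their data.
Sequencing currently in progress: evaluate sequencing quality against data from the same organism and sequencer.
Sequencing already completed: add data from closely related species or similar projects to improve the accuracy of the analysis.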
Practical search tips: what you need for searching and using data
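Practical search tips
Know the read spec each experiment type requires: the required read length and number of reads differ by experiment type, such as genome resequencing, RNA-Seq, and ChIP-Seq.
Know the read spec of each sequencer: the read spec obtained from a sequencer also changes with reagent updates, so take care.
Choose the sequencer according to organism and experiment type: to find similar projects in public DBs, read information that fits the genome size and experiment type is essential.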
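Required read spec by application
application | total bases | read length | read number (M)
Human genome resequencing | 90-150Gb | 2x100 | 900-1500
Targeted resequencing | <1Gb | 2x100 | 10
exome sequence | 5~7Gb | 2x100 | 70
RNA-Seq | 5Gb | 2x100 | 50
TSS-Seq | 1Gb | 1x50 | 20
small RNA | 0.35Gb | 1x35 | >10
Microbial genome | >150Mb | 2x100 | >1.5
Eukaryotic genome | >4Gb | 2x100 | >40
Bisulfite-Seq | 90-150Gb | 2x100 | 900-1500
ChIP-Seq | >6Gb | 1x100 | 60
Quoted from the Saibo Kogaku (Cell Technology) supplement "Next-Generation Sequencers: Advanced Methods by Purpose". The figures can change with the genome size of the target and other factors, and the information may already be out of date.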
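The relationship between the table's columns can be checked with a little arithmetic. This is a sketch rather than anything from the slides: the function name is hypothetical, and it assumes that the read number counts each single read (each mate of a 2x100 pair separately), which is consistent with most rows of the table.

```python
def reads_in_millions(total_bases: float, read_length: int) -> float:
    """Number of reads, in millions, needed to reach total_bases.

    read_length is the length of one read. For a paired-end layout such
    as 2x100, each mate is assumed to count as one read of length 100.
    """
    return total_bases / read_length / 1e6


if __name__ == "__main__":
    # (application, total bases, single-read length) taken from the table
    rows = [
        ("RNA-Seq", 5_000_000_000, 100),
        ("TSS-Seq", 1_000_000_000, 50),
        ("small RNA", 350_000_000, 35),
        ("ChIP-Seq", 6_000_000_000, 100),
    ]
    for name, bases, length in rows:
        print(f"{name}: {reads_in_millions(bases, length):g} M reads")
```

The same function works in reverse as a planning aid: given a sequencer's read length and a target yield, it gives the read count to look for when searching for comparable public runs.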
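Required read spec by application
Book contents: preparation and introduction; human genome analysis; gene expression regulation analysis; de novo genome sequencing; epigenetics analysis; metagenome analysis; genome structure analysis; data analysis tools and storage; integrated analysis.
This is a plug for the book, but nobody asked me to make it, and I earn nothing if it sells.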
Read spec, still improving
As reagents and software improve, read length and read number change frequently even on the same sequencer. Example: Illumina MiSeq.
Example survey: mouse brain transcriptome
Example: finding studies of gene expression in the mouse brain.
Example survey: mouse brain transcriptome
http://sra.dbcls.jp/search
Specify the organism and the experiment type, leave the sequencer blank, and press "submit condition".
http://sra.dbcls.jp/search/filter?species=Mus%20musculus&type=Transcriptome&instrument=
The number of matching projects is shown. Enter "brain" as the keyword and press "search".
http://sra.dbcls.jp/search/search?species=Mus%20musculus&type=Transcriptome&instrument=&search_query=brain
The number of matching projects is shown. Enter "brain" in the input box under Study Title to narrow the list.
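The two search URLs in these steps follow a simple query-string pattern, which can be reproduced with the standard library. This sketch only builds the URL strings as they appear on the slides; the function names are hypothetical, and whether the 2013 service still answers at these addresses is not something the code checks or assumes.

```python
from urllib.parse import quote, urlencode

BASE = "http://sra.dbcls.jp/search"  # base address as shown on the slides


def filter_url(species: str, study_type: str, instrument: str = "") -> str:
    """URL for the narrowing search by organism, experiment type, and sequencer."""
    params = {"species": species, "type": study_type, "instrument": instrument}
    # quote_via=quote encodes spaces as %20, matching the URLs on the slides
    return f"{BASE}/filter?" + urlencode(params, quote_via=quote)


def search_url(species: str, study_type: str, keyword: str, instrument: str = "") -> str:
    """URL for a keyword search within the narrowed result set."""
    params = {
        "species": species,
        "type": study_type,
        "instrument": instrument,
        "search_query": keyword,
    }
    return f"{BASE}/search?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(filter_url("Mus musculus", "Transcriptome"))
    print(search_url("Mus musculus", "Transcriptome", "brain"))
```

Building the URL programmatically is one way to act on the earlier point about automation: the same conditions can be varied in a loop instead of being re-entered by hand.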
http://sra.dbcls.jp/search/search?species=Mus%20musculus&type=Transcriptome&instrument=&search_query=brain
Click Study ID to sort the projects newest first. We will look at the data of this project.
http://sra.dbcls.jp/search/view/SRP011204
Project overview; overview of the sequencing performed in the project.
http://sra.dbcls.jp/search/view/SRP011204
Read number (M), read length (b), and total bases (Gb) are shown, along with the Samples and their Runs. Are the samples split, or are these replicates?
http://www.ncbi.nlm.nih.gov/geo/
Search with the GEO ID (a GSE accession) found in the title.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232
They were replicates.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232
In GEO the publication information has been updated. Detailed information on each sample is available. Click a GEO Sample ID to see the information for the Control.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353
Sample Characteristics; the protocol used to process the Sample.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353
The SRA Experiment ID. Click the Biosample ID to see how the Samples are related.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353
The corresponding SRA Sample ID; download link for the sequence data in SRA format.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232
Go back to the GEO page and click the link to the paper.
http://www.ncbi.nlm.nih.gov/pubmed/22563483
While we are here, check the full text in PubReader.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341364/?report=reader
Click Navigation, then click Materials & Methods.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341364/?report=reader
The paper describes the data analysis (tools used and so on) and the library preparation and sequencing.
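Practical search tips
Get read information for data that meets your conditions: read length, number of reads, sample information, and so on. Does it suit your purpose, and is there enough data?
Get information on library preparation and data analysis: this is not often recorded in SRA, but can sometimes be found by following external DBs such as the paper or GEO.
Download the data that meets your conditions: for some data, downloading and unpacking the files takes a very long time. Download fastq from the DDBJ FTP site, or use the DDBJ pipeline.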
Quality check and download: checking and downloading the data
Read quality check http://sra.dbcls.jp/search/view/SRR426841
Check the quality at each position of the read; also check GC content and similar measures.
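A per-position quality check of this kind can be sketched from FASTQ quality strings. The example assumes Phred+33 encoding (the Sanger convention) and uses made-up quality lines; the function names are hypothetical, and this is not the implementation behind the DBCLS SRA quality pages.

```python
from typing import List


def phred_scores(quality_line: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mean_quality_by_position(quality_lines: List[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for line in quality_lines:
        for i, q in enumerate(phred_scores(line, offset)):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    demo = ["IIII5", "II55", "I5"]  # made-up quality strings
    print(mean_quality_by_position(demo))
    print(error_probability(20))
```

A falling mean toward the end of the read is the typical pattern to look for before deciding whether a run is worth downloading.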
Data download via FTP http://sra.dbcls.jp/search/view/SRP011204
Click "FTP". Selecting the DB and format opens the FTP site.
Data download via FTP http://trace.ddbj.nig.ac.jp/DRASearch/run?acc=SRR426841
Click either format: FASTQ or SRA-Lite.
Data download via FTP
Log in to the FTP site as a guest. The fastq files, compressed in bz2 format, are then accessible.
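Once a bz2-compressed fastq file is on disk, it can be read without unpacking it first. The sketch below assumes the common four-lines-per-record FASTQ layout and writes a tiny made-up file so that it runs offline; the file name and function names are hypothetical.

```python
import bz2
import os
import tempfile
from typing import Tuple


def count_reads_and_bases(path: str) -> Tuple[int, int]:
    """Count records and bases in a bz2-compressed, 4-line-per-record FASTQ file."""
    reads = bases = 0
    with bz2.open(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # second line of each record is the sequence
                reads += 1
                bases += len(line.strip())
    return reads, bases


def write_demo_fastq(path: str) -> None:
    """Write a tiny made-up FASTQ file so the example needs no download."""
    records = "@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\nIIIIII\n"
    with bz2.open(path, "wt") as handle:
        handle.write(records)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.fastq.bz2")
        write_demo_fastq(demo)
        print(count_reads_and_bases(demo))
```

Streaming through the compressed file keeps disk use down, which matters given how large the unpacked data can be.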
Using the pipeline: DDBJ Read Annotation Pipeline
DDBJ Read Annotation Pipeline https://p.ddbj.nig.ac.jp/ → log in
After logging in, click "Import public DRA". Enter the SRA ID to add the data to the pipeline.
Summary #3
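Summary #3
Use literature and read information to get what you need: you need read information and details of library preparation and analysis. When the information simply cannot be found, knowing when to give up also matters.
Make good use of public analysis pipelines: huge data takes a long time to download and eats into HDD capacity. Using the DDBJ pipeline can lower these costs.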
Online Reference: finding the information you need online
Online Reference
http://github.com/inutano/sra_metadata_toolkit/wiki
A collection of references and links on SRA and NGS.
Q&A: Thank you for your attention
Further questions can be sent to tohta@dbcls.rois.ac.jp