Sequence Read Archive: Database for High-throughput sequencing best practice 2013
Integrated Database Workshop AJACS Toyama: "Using the next-generation sequencing database Sequence Read Archive"
Tazro Inutano Ohta
August 30, 2013
Transcript
Using the Next-Generation Sequencing Database Sequence Read Archive
AJACS#42 TOYAMA (Integrated Database Workshop, AJACS Toyama)
Tazro Ohta, Tech Specialist, Database Center for Life Science
Effective SRA - public database for high-throughput sequencing
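
Preface
This is not a tutorial on NGS data analysis; resources that support data analysis are introduced.
This is not a tutorial on submitting data to the DB; the information needed at submission time is introduced.
This is a talk about public data that support NGS research; it introduces how to use public DBs so that they pay off in day-to-day research.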
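
Table of Contents
About the Sequence Read Archive (SRA): organization, policy, and the data made public.
Efforts at DBCLS: integration with other DBs, development of search functions, and visualizing the current state of the DB through statistics.
Examples of searching and using NGS data: surveying past NGS studies with the services introduced.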

About the public NGS database Sequence Read Archive (SRA): organization, policy, and the data made public
SRA: The public DB for primary NGS data
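
About SRA
Organization: jointly operated by INSDC, which consists of NCBI SRA, EBI ENA, and DDBJ DRA. Data can be submitted and accessed through any of the member institutes.
Policy: accepts and publishes primary sequence data obtained from parallel sequencers. Personally identifiable data are submitted to separate DBs (dbGaP, EGA).
Published data: sequence data, plus metadata describing the details of experiments, samples, and so on. Sequences can be downloaded compressed as gz or bz2, or in a proprietary format.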

INSDC: International Nucleotide Sequence Database Collaboration http://www.insdc.org
The site carries information such as the standardization of genome information, and the policy of the DB collaboration.
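
The INSDC Members
NCBI SRA: accepts the largest amount of data; developer of the proprietary compressed SRA format.
EMBL-EBI ENA/SRA: developed and operated by the same section as the conventional sequence archive; both submission and search are designed for GUI and CUI use.
DDBJ DRA: operated by DDBJ at the National Institute of Genetics in Mishima; also publishes data in a non-compressed format.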
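
NCBI SRA http://www.ncbi.nlm.nih.gov/sra
Free-text search; advanced search; SRA BLAST (long reads only); software: SRA Toolkit.

NCBI SRA: search results for the keyword "human microbiome project"
BioProject hits; number of hits, counted per Experiment; title, sequencer, amount of sequence, and similar information; species that were hit.

NCBI SRA: top hit of the search results (an SRX entry)
Experiment title; read information such as layout and adaptor; information and download links for each sequencing run.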
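
EMBL-EBI ENA http://www.ebi.ac.uk/ena
Advanced search; free-text search; sequence search.

EMBL-EBI ENA: search results for the keyword "human microbiome project"
Number of hits in each category; top hits of each category.

EMBL-EBI ENA: top hit under "Experiment" (an SRX entry)
Information about the experiment; information on the sequencing runs and download links; bulk download, display as text, and column selection; download the displayed information as text.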
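
DDBJ DRA http://trace.ddbj.ac.jp/dra
Site search; data search, data submission, and video manuals.

DDBJ DRA http://trace.ddbj.ac.jp/DRASearch
Search by ID, faceted (narrowing) search, and keyword search; top rankings by species, experiment type, and submitting center; number of entries for each type of ID.

DDBJ DRA http://trace.ddbj.ac.jp/DRASearch
Search results: total number of hits; narrowing by metadata type and species.

DDBJ DRA: narrow down by "Experiment" → top hit (an SRX entry)
Links to related items and download links; detailed information on the experiment; information such as library construction, sequencer, and base calling.

DDBJ DRA: Navigation → Run (an SRR entry)
An example of data that cannot be downloaded.

DDBJ DRA: DRASearch → DRR
An example of data that can be downloaded. Read information; ticking "quality" displays the phred score; download links for the Fastq format and the SRA Lite format, via FTP.

DDBJ DRA: searching NCBI SRA for the same SRR entry → "Record is removed".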
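
Hands-on
Try searching: search by species, sequencer name, gene name, disease name, and so on. Also try the Advanced (detailed) search. If there are too many or too few hits, try a different search.
Examine the details of the data you find: look for information that lets you judge whether the data look interesting.
Find out how large the data are: how long is the download likely to take? Will it fit in the free space on your hard disk?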
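
The two sizing questions on the hands-on slide can be answered with simple arithmetic before starting a download. The following is a minimal sketch, not something taken from the slides: the function names, the 20 GB example size, the 100 Mbit/s link speed, and the expansion factor for decompressed files are all assumptions chosen for illustration.

```python
import shutil


def estimate_download_seconds(size_bytes: int, bandwidth_mbps: float) -> float:
    """Return a rough transfer time in seconds for a file of size_bytes.

    bandwidth_mbps is the sustained link speed in megabits per second.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    bits = size_bytes * 8
    return bits / (bandwidth_mbps * 1_000_000)


def fits_on_disk(size_bytes: int, path: str = ".", expansion_factor: float = 1.0) -> bool:
    """Check whether the data (times an expansion factor) fit in the free space at path.

    expansion_factor is a guess at how much larger the files become once
    decompressed; 1.0 means "compare the compressed size only".
    """
    free_bytes = shutil.disk_usage(path).free
    return size_bytes * expansion_factor <= free_bytes


if __name__ == "__main__":
    size = 20 * 10**9  # hypothetical 20 GB data set
    seconds = estimate_download_seconds(size, bandwidth_mbps=100)
    print(f"estimated download time: {seconds / 60:.0f} minutes")
    print("fits on disk:", fits_on_disk(size, expansion_factor=4.0))
```

Real transfers rarely sustain the nominal link speed, so the estimate is best read as a lower bound.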
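
Search Tips
Each archive develops its functions independently: the kinds of search available and the way results are displayed differ. The IDs are shared, so switching between the sites as needed makes searching more convenient.
Some data cannot be downloaded: besides being withdrawn by the submitter for various reasons, data may not be found because they were only just submitted and have not been shared yet.
Information that is not written in the metadata cannot be searched: metadata are the annotation data for the sequence data, written by the submitter in a specified format.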

Metadata Object

Metadata Object Dependencies
Objects in the diagram: Submission, Analysis, Study, Sample (x2), Experiment (x2), Run (x4).
The metadata submitted together with the sequence data consist of six types of objects, and the information described depends on the type of object.

Metadata Object Dependencies
Leaving aside Submission, which bundles the metadata of one submission, the relationships within the basic set of metadata are as shown in the diagram.
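
The dependency diagram itself did not survive in this transcript, so the following is only a sketch of one way to hold the objects in memory. It assumes the usual SRA arrangement, in which an Experiment refers to one Study and one Sample and owns its Runs; the class names, the helper `runs_for_study`, and in particular the assignment of runs to experiments are assumptions made for illustration, not details read off the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    accession: str


@dataclass
class Experiment:
    accession: str
    study: str   # accession of the Study this experiment belongs to
    sample: str  # accession of the Sample that was sequenced
    runs: List[Run] = field(default_factory=list)


def runs_for_study(experiments: List[Experiment], study_accession: str) -> List[str]:
    """Collect the run accessions of every experiment under one study."""
    return [
        run.accession
        for experiment in experiments
        if experiment.study == study_accession
        for run in experiment.runs
    ]


if __name__ == "__main__":
    # The grouping of runs under experiments below is assumed, not from the slides.
    experiments = [
        Experiment("DRX000001", study="DRP000001", sample="DRS000001",
                   runs=[Run("DRR000001"), Run("DRR000002")]),
        Experiment("DRX000002", study="DRP000001", sample="DRS000001",
                   runs=[Run("DRR000003"), Run("DRR000004")]),
    ]
    print(runs_for_study(experiments, "DRP000001"))
```

Walking from a Study down to its Runs in this way mirrors what the search sites do when a project page lists its downloadable runs.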
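
Metadata Object Dependencies
Example IDs in the diagram: DRA000001 (Submission), DRZ000001 (Analysis), DRP000001 (Study), DRS000001, DRS000001 (Sample), DRX000001, DRX000002 (Experiment), DRR000001, DRR000002, DRR000003, DRR000004 (Run).
Each object has its own ID. An ID consists of three letters, which indicate the DB that accepted the data and the type of object, followed by a six-digit number.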
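
The ID rule described on the slide can be sketched as a small parser. The letter tables below are assumptions based on the accessions that appear in this deck (DRA, DRZ, DRP, DRS, DRX, DRR, and the SRX, SRR, SRP entries) together with the ERx prefixes commonly used by EBI; the function name `parse_accession` is hypothetical. The pattern accepts six or more digits, which is a deliberate relaxation of the six-digit form stated on the slide.

```python
import re
from typing import NamedTuple

# First letter: the archive that accepted the data (assumed mapping).
ARCHIVES = {"S": "NCBI SRA", "E": "EBI ENA", "D": "DDBJ DRA"}

# Third letter: the type of metadata object (assumed mapping).
OBJECT_TYPES = {
    "A": "Submission",
    "P": "Study",
    "S": "Sample",
    "X": "Experiment",
    "R": "Run",
    "Z": "Analysis",
}

_ACCESSION = re.compile(r"^([SED])R([APSXRZ])(\d{6,})$")


class Accession(NamedTuple):
    archive: str
    object_type: str
    number: int


def parse_accession(accession: str) -> Accession:
    """Split an SRA-style accession into archive, object type, and number."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not an SRA-style accession: {accession!r}")
    archive_letter, type_letter, digits = match.groups()
    return Accession(ARCHIVES[archive_letter], OBJECT_TYPES[type_letter], int(digits))


if __name__ == "__main__":
    for accession in ("DRR000001", "SRP011204", "DRZ000001"):
        print(accession, parse_accession(accession))
```

A parser like this is mainly useful for routing: an ID typed by a user can be sent to the right kind of lookup (study page, run page) without asking what kind of object it is.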
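
Metadata Tips
The relationships among objects and IDs are complicated: in a large project the Runs and Samples can number in the hundreds. For various reasons, some entries also deviate from the rules as exceptions.
Know which information is described where: this matters not only when submitting data but also when searching. For details, see http://trace.ddbj.ac.jp/dra/metadata.html
The quality of metadata differs between submitters: especially for items such as library preparation. In some cases the metadata are not updated even after information such as the paper has been updated.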
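
Summary #1
SRA is operated by the INSDC member institutes: the data are shared, so the content is the same whichever entrance you use, but search functions and the like differ between them.
Metadata are essential for submitting and searching sequence data: each object is assigned an ID and is described by the submitter. Their contents and relationships need to be understood, especially when submitting data.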
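
Efforts at DBCLS: integration with other DBs, development of search functions, and visualizing the current state of the DB through statistics
Tech Dev at DBCLS - Search and Statistics

DBCLS SRA
Integration with other DBs: not only metadata but also literature information such as papers, disease information, and the computed sequence quality of individual data sets.
Development of search functions: search that integrates the functions of the member archives, adds original functions, and is oriented toward data users.
Understanding the current state of the DB through statistics: publishes trends in the number of submissions based on metadata, and analyzes the DB as a whole based on sequence information.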
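
DBCLS SRA http://sra.dbcls.jp/
Lists the registered data by type of metadata; search by SRA ID, species, sequencer, and so on.

DBCLS SRA http://sra.dbcls.jp/
Rankings by experiment type, sequencer, and species; graphs of how the numbers change over time.

DBCLS SRA http://sra.dbcls.jp/
Find data starting from literature information; find data starting from disease information.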
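
Integrating literature information
DBCLS SRA Publication Search

DBCLS SRA Publication Search
The paper is more detailed about the sequence data: where the sequencing sits within the study is important, and Materials & Methods often holds detailed information.
Sequence data are sometimes released before the paper: because of grant conditions, data release instructions from journals, and so on. Large projects may also set their own release policy.
Literature information is sometimes missing from the metadata: metadata are not updated after they have been submitted once, so released data have to be linked to papers.

DBCLS SRA Publication Search: DBCLS SRA → "Search from publications"
Narrowing search by experiment type, sequencer, and species; a table mapping SRA IDs to PubMed IDs, with information on each publication; click a column name to sort.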
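
Integrating disease information
DBCLS SRA Diseases Search

DBCLS SRA Diseases Search
Clinical sequencing data are hard to search: whole-genome sequences, polymorphism information, and the like are often not released in SRA.
The metadata description alone is sometimes insufficient: the style and amount of description differ between submitters, so searching everything at once is difficult.
Use the tags attached to literature information: a data search function keyed on disease information was developed using the MeSH terms assigned to PubMed entries.

DBCLS SRA Diseases Search: DBCLS SRA → "Browse by disease" → by frequency
Search by disease type; disease names and the number of registered data sets; choose how many items to display.

DBCLS SRA Diseases Search: DBCLS SRA → "Browse by disease" → by disease category
Click to expand the tree; click a number to show the list.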
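
The search function unique to DBCLS
DBCLS SRA Metadata Search

DBCLS SRA Metadata Search
Provide a more user-oriented search: reduce as far as possible what users have to memorize, such as how metadata are organized by ID.
Reflect more information in the search: a flexible search that draws not only on metadata but also on the integrated information from other DBs and on original information.
Support automation: repeating searches by hand is inefficient. Automation also makes it possible to build the search into an analysis pipeline.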
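
DBCLS SRA Metadata Search http://sra.dbcls.jp/search
Free-text search; narrowing search; information for developers and a Twitter account for support.

DBCLS SRA Metadata Search: narrowing search (Mus musculus, Transcriptome, Illumina MiSeq)
Free-text search within the data that match the conditions; show all data that match the conditions; the proportion of data matching each condition.

DBCLS SRA Metadata Search: search results
Information on the data that were hit; rows shown in blue have publication information. Click a column name to sort; narrow the items down by keyword.

DBCLS SRA Metadata Search: result of clicking an SRP entry
Overview of the project; overview and abstract of the paper; links to PubMed and PMC.

DBCLS SRA Metadata Search: result of clicking an SRP entry
Click to expand Materials and Methods and Results.

DBCLS SRA Metadata Search: result of clicking an SRP entry
Run and Sample information; display the table in tsv or json format; sorting and narrowing; download links.

DBCLS SRA Metadata Search: result of clicking an SRP entry
Display the table in tsv or json format; sorting and narrowing; differences across the whole table are highlighted.

DBCLS SRA Metadata Search: quality information for a Run (an SRR entry)
Information such as read length, number of reads, and GC content; click the result of each module to enlarge it.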
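
DBCLS SRA Metadata Search
Visualize the search: the proportion of the whole shows why a search returns many or few results.
Keyword search that includes publication information: the search target is widened so that as many related data sets as possible are hit.
Check read information before downloading: downloads can take a long time, so the site provides the information needed to pick only data that are certain to be usable.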
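
Summary #2
DBCLS SRA is a functional extension of SRA: it does not accept data submissions; it provides information for understanding the state of SRA and search functions that make data easier to find.
Metadata are essential for submitting and searching sequence data: each object is assigned an ID and is described by the submitter. Their contents and relationships need to be understood, especially when submitting data.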
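
Examples of searching and using NGS data: surveying past NGS studies with the services introduced
Search published NGS data and project

Use cases of public data, by stage
Use cases of Public data

Use cases of Public data
Sequencing that is about to start: design and simulate the sequencing based on information about similar projects and their data.
Sequencing that is in progress: evaluate sequencing quality against data from the same species and sequencer.
Sequencing that is finished: add data from related species and similar projects to improve the accuracy of the analysis.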
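
What is needed to search and use data
Practical search tips

Practical search tips
Know the read spec that the experiment type requires: the required read length and number of reads differ by type of experiment, such as genome resequencing, RNA-Seq, and ChIP-Seq.
Know the read spec of the sequencer: the spec of the reads obtained from each sequencer also changes with reagent updates, so care is needed.
Choose the sequencer according to the species and experiment type: to search public DBs for similar projects, read information that matches the genome size and experiment type is important.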

Required read spec by application

application                  total bases   read length   read number (M)
Human genome resequencing    90-150Gb      2x100         900-1500
Target resequencing          <1Gb          2x100         10
exome sequence               5~7Gb         2x100         70
RNA-Seq                      5Gb           2x100         50
TSS-Seq                      1Gb           1x50          20
small RNA                    0.35Gb        1x35          >10
Microbial genome             >150Mb        2x100         >1.5
Eukaryotic genome            >4Gb          2x100         >40
Bisulfite-Seq                90-150Gb      2x100         900-1500
ChIP-Seq                     >6Gb          1x100         60

Quoted from a supplementary volume of Saibo Kogaku (Cell Technology) on purpose-specific advanced methods for next-generation sequencers. The figures can change with the genome size of the target and other factors, and the information may already be out of date.
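
The three numeric columns of the table are tied together by one multiplication: total bases equals read length times number of reads. The sketch below checks a few rows under the assumption, inferred from the figures themselves, that the read number counts individual reads (both mates of a 2x100 pair counted separately). The function names are hypothetical.

```python
def total_gigabases(read_length: int, reads_millions: float) -> float:
    """Total yield in Gb for reads of read_length bases, reads_millions million reads."""
    return read_length * reads_millions * 1_000_000 / 1e9


def reads_needed_millions(target_gigabases: float, read_length: int) -> float:
    """Number of reads (in millions) needed to reach target_gigabases."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return target_gigabases * 1e9 / read_length / 1_000_000


if __name__ == "__main__":
    # Rows taken from the table: (application, read length, reads in millions)
    rows = [("RNA-Seq", 100, 50), ("TSS-Seq", 50, 20), ("small RNA", 35, 10)]
    for name, length, reads in rows:
        print(f"{name}: {total_gigabases(length, reads):.2f} Gb")
```

Running the same arithmetic on the metadata of a public run is a quick way to judge whether its yield is in the range the table suggests for that kind of experiment.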
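
Required read spec by application
Topics in the cited volume: preparation and introduction; human genome analysis; gene expression and regulation analysis; de novo genome sequencing; epigenetics analysis; metagenome analysis; genome structure analysis; data analysis tools and storage; integrated analysis.
This is a plug for the book, but nobody asked me to make it and I earn nothing when it sells.

Read spec, still improving
With improvements in reagents and software, read length and number of reads change frequently even on the same sequencer. Example: Illumina MiSeq.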

Example: finding studies of gene expression in the mouse brain
Example survey: mouse brain transcriptome

Example survey: mouse brain transcriptome http://sra.dbcls.jp/search
Specify the species and the experiment type, leave the sequencer blank, and press "submit condition".

Example survey: mouse brain transcriptome
http://sra.dbcls.jp/search/filter?species=Mus%20musculus&type=Transcriptome&instrument=
The matching projects are listed. Enter "brain" as the keyword and press "search".

Example survey: mouse brain transcriptome
http://sra.dbcls.jp/search/search?species=Mus%20musculus&type=Transcriptome&instrument=&search_query=brain
The matching projects are listed. Type "brain" into the input field under Study Title to narrow them down.
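
The two URLs in this walkthrough differ only in their path and in one extra query parameter, which suggests the search can be scripted. The following is a sketch that only builds the URL strings shown on the slides; the function names are hypothetical, and whether the service still answers at these addresses is not something this transcript can confirm.

```python
from urllib.parse import quote, urlencode

BASE = "http://sra.dbcls.jp/search"


def filter_url(species: str, study_type: str, instrument: str = "") -> str:
    """Build the narrowing-search URL for a species, experiment type, and sequencer."""
    params = {"species": species, "type": study_type, "instrument": instrument}
    # quote_via=quote encodes spaces as %20, matching the URLs on the slides.
    return f"{BASE}/filter?{urlencode(params, quote_via=quote)}"


def keyword_url(species: str, study_type: str, keyword: str, instrument: str = "") -> str:
    """Build the keyword-search URL within the same conditions."""
    params = {
        "species": species,
        "type": study_type,
        "instrument": instrument,
        "search_query": keyword,
    }
    return f"{BASE}/search?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(filter_url("Mus musculus", "Transcriptome"))
    print(keyword_url("Mus musculus", "Transcriptome", "brain"))
```

Building the query from a dictionary keeps the parameter order stable and avoids hand-escaping species names that contain spaces.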

Example survey: mouse brain transcriptome
http://sra.dbcls.jp/search/search?species=Mus%20musculus&type=Transcriptome&instrument=&search_query=brain
Click Study ID to sort the projects from newest to oldest. We will look at the data of this project.

Example survey: mouse brain transcriptome http://sra.dbcls.jp/search/view/SRP011204
Overview of the project; overview of the sequencing done in the project.

Example survey: mouse brain transcriptome http://sra.dbcls.jp/search/view/SRP011204
Approximate number of reads, read length, and total bases; the number of Samples and Runs; how the samples break down into replicates.

Example survey: mouse brain transcriptome http://www.ncbi.nlm.nih.gov/geo/
Search for the GEO ID (a GSE accession) that appeared in the title.

Example survey: mouse brain transcriptome http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232
The samples turn out to be replicates.

Example survey: mouse brain transcriptome http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232
In GEO the publication information has been updated; detailed information on each sample is available. Click a GEO Sample ID to see the information on the Control.

Example survey: mouse brain transcriptome http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353
Sample Characteristics; the protocol used to process the Sample.

Example survey: mouse brain transcriptome http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353
The SRA Experiment ID. Click the BioSample ID to see how the Samples relate.

Example survey: mouse brain transcriptome http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353
The corresponding SRA Sample ID; the download link for the sequence data in SRA format.

Example survey: mouse brain transcriptome http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232
Go back to the GEO page and click the link to the paper.

Example survey: mouse brain transcriptome http://www.ncbi.nlm.nih.gov/pubmed/22563483
While we are here, check the full text in PubReader.

Example survey: mouse brain transcriptome http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341364/?report=reader
Click Navigation, then click Materials & Methods.

Example survey: mouse brain transcriptome http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341364/?report=reader
About library preparation and sequencing; about the data analysis, including the tools used.
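
Practical search tips
Get read information for data that match your conditions: read length, number of reads, sample information, and so on. Do they suit your purpose, and are there enough data?
Get information on library preparation and data analysis: this is not often written in SRA. It can sometimes be obtained by following information in external sources such as the paper or GEO.
Download the data that match your conditions: depending on the data, downloading and unpacking the files can take a very long time. Download fastq from the DDBJ FTP site, or use the DDBJ pipeline.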

Checking and downloading data
Quality check and download

Read quality check http://sra.dbcls.jp/search/view/SRR426841
Check the quality at each position in the reads; also check GC content and similar measures.

Data download via FTP http://sra.dbcls.jp/search/view/SRP011204
Click "FTP". Selecting the DB format opens the FTP site.

Data download via FTP http://trace.ddbj.nig.ac.jp/DRASearch/run?acc=SRR426841
Click either format, FASTQ or SRA Lite.

Data download via FTP
Log in to the FTP site as a guest. The fastq files, compressed in bz2 format, can then be accessed.

Using the pipeline
DDBJ Read Annotation Pipeline

DDBJ Read Annotation Pipeline https://p.ddbj.nig.ac.jp/ → log in
After logging in, click "Import public DRA". Enter an SRA ID to add the data to the pipeline.
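
Summary #3
Use the literature and read information to get what you need: read information and information on library preparation and analysis are required. When the information simply cannot be found, knowing when to give up also matters.
Make good use of public analysis pipelines: huge data sets take a long time to download and eat into HDD capacity. Using the DDBJ pipeline can lower these costs.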

Finding the information you need online
Online Reference

Online Reference
http://github.com/inutano/sra_metadata_toolkit/wiki
A collection of references and links on SRA and NGS.

Q&A
Thank you for your attention
Questions can also be sent to tohta@dbcls.rois.ac.jp