Tazro Inutano Ohta
August 30, 2013
Sequence Read Archive: Database for High-throughput sequencing best practice 2013
Integrated Database Workshop AJACS Toyama: "Using the next-generation sequencing database Sequence Read Archive"
Transcript
Using the next-generation sequencing database Sequence Read Archive. AJACS#42 TOYAMA, Integrated Database Workshop AJACS Toyama.
Tazro Ohta, Tech Specialist, Database Center for Life Science (DBCLS). Effective SRA - public database for high-throughput sequencing
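Preface
This is not a tutorial on NGS data analysis; it introduces resources that support data analysis.
This is not a tutorial on submitting data to the DB; it introduces the information needed when submitting data.
This is a talk about public data that supports NGS research; it introduces how to put public DBs to use in day-to-day research.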
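Table of Contents
About the Sequence Read Archive (SRA): organization, policy, and the data it publishes.
Efforts at DBCLS: integration with other DBs, development of search functions, and visualizing the current state of the DB through statistics.
Examples of searching and using NGS data: surveying past NGS studies with the services introduced here.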
SRA: The public DB for primary NGS data
About the public NGS database Sequence Read Archive (SRA): organization, policy, and published data.
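About SRA
Organization: jointly operated by the INSDC, which consists of NCBI SRA, EBI ENA, and DDBJ DRA. Data can be submitted and accessed through any of the member sites.
Policy: accepts and publishes primary sequence data obtained from parallel sequencers. Personally identifiable data is submitted to separate DBs (dbGaP, EGA).
Published data: sequence data, plus metadata describing the details of experiments, samples, and so on. Sequences can be downloaded compressed with gz, bz2, or a proprietary format.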
INSDC: International Nucleotide Sequence Database Collaboration http://www.insdc.org (information such as the standardization of genome information; the policy of the DB collaboration)
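The INSDC Members
NCBI SRA: receives the most data. Developer of the proprietary compressed SRA format.
EMBL-EBI ENA-SRA: developed and operated by the same section as the conventional sequence archive. Built with both GUI and CUI use in mind, for submission as well as search.
DDBJ DRA: operated by DDBJ at the National Institute of Genetics in Mishima. Also publishes data in a non-compressed format.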
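NCBI SRA http://www.ncbi.nlm.nih.gov/sra : free-word search, advanced search, SRA BLAST (long reads only). Software: SRA toolkit.
NCBI SRA, search results for the keyword "human microbiome project": BioProject hits, the number of hits, results listed per Experiment (title, sequencer, amount of sequence, and similar information), and the organisms that were hit.
NCBI SRA, top hit of the search results (an SRX entry): the title of the experiment, read information such as layout and adaptor, and per-sequencing-run information with download links.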
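EMBL-EBI ENA http://www.ebi.ac.uk/ena : advanced search, free-word search, sequence search.
EMBL-EBI ENA, search results for the keyword "human microbiome project": the number of hits in each category and the top hits of each category.
EMBL-EBI ENA, top hit under "Experiment" (an SRX entry): information about the experiment, information about the sequencing runs with download links, bulk download, display as text, and column selection. The displayed information can be downloaded as text.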
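DDBJ DRA http://trace.ddbj.ac.jp/dra : site search, data search, data submission, and a video manual.
DDBJ DRA http://trace.ddbj.ac.jp/DRASearch : search by ID, faceted (filtering) search, and keyword search; top rankings by organism, experiment type, and submitting center; the number of entries for each ID type.
DDBJ DRA http://trace.ddbj.ac.jp/DRASearch : the total number of hits and the search results, which can be filtered by metadata type and organism.
DDBJ DRA, filtered by "Experiment" → top hit (an SRX entry): links to related items and download links, plus detailed information on the experiment such as library construction, sequencer, and base calling.
DDBJ DRA, Navigation → Run (an SRR entry): an example of data that cannot be downloaded.
DDBJ DRA, DRASearch → a DRR entry: an example of data that can be downloaded. Read information is shown; ticking "quality" displays the phred score. Download links are given for both the Fastq format and the SRA Lite format, over FTP.
DDBJ DRA: searching NCBI SRA for that SRR accession → "Record is removed".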
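Handson
Try a search: search by organism, sequencer name, gene name, disease name, and so on. Try the Advanced (detailed) search as well. If there are too many or too few hits, try a different search.
Examine the details of the data you found: look for information that lets you judge whether the data is interesting.
Check how large the data is: how long is the download likely to take? Will it fit in the free space on your hard disk?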
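The two size questions in the hands-on (download time and free disk space) come down to simple arithmetic. The following is a minimal sketch under stated assumptions: the function names, the 10 GB file size, and the 100 Mbit/s link speed are all hypothetical, and real throughput is usually lower than the nominal link speed.

```python
import shutil


def download_seconds(size_bytes, bandwidth_bits_per_sec):
    """Idealized transfer time: bytes are converted to bits, then divided by link speed."""
    return size_bytes * 8 / bandwidth_bits_per_sec


def fits_on_disk(size_bytes, path=".", expansion_factor=1.0):
    """True if the data, multiplied by an assumed decompression factor, fits in free space."""
    free_bytes = shutil.disk_usage(path).free
    return size_bytes * expansion_factor <= free_bytes


# Hypothetical example: a 10 GB archive over a 100 Mbit/s connection.
size = 10 * 10**9
print(download_seconds(size, 100 * 10**6) / 60, "minutes at best")
print("fits on disk:", fits_on_disk(size, expansion_factor=4.0))
```

The expansion factor is there because compressed archives grow when unpacked; the value 4.0 is only a placeholder, not a measured ratio.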
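Search Tips
Each archive develops its own functions: the kinds of search you can run and the way results are displayed differ. The IDs are shared, so using the sites for different purposes makes searching more convenient.
Some data cannot be downloaded: besides being withdrawn by the submitter for various reasons, data may not be found because it was only just submitted and has not yet been shared.
Information that is not described in the metadata cannot be searched: metadata is the annotation attached to sequence data, written by the submitter according to a specified format.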
Metadata Object
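Metadata Object Dependencies (diagram: Submission; Analysis; Study; Sample, Sample; Experiment, Experiment; Run, Run, Run, Run)
The metadata submitted together with sequence data is made up of several types of objects, and the information described depends on the type of object.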
Metadata Object Dependencies (same diagram)
Leaving aside Submission, which bundles the metadata into one submission unit, the relationships within a basic set of metadata look like this.
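Metadata Object Dependencies (same diagram with IDs: Submission DRA000001; Analysis DRZ000001; Study DRP000001; Sample DRS000001, DRS000001; Experiment DRX000001, DRX000002; Run DRR000004, DRR000003, DRR000002, DRR000001)
Each object has its own ID. An ID consists of letters that indicate the DB that accepted the data and the type of object, followed by a run of digits (three letters and six digits in these examples).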
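The ID scheme on this slide can be sketched as a small parser. This is an illustration, not part of the original deck: the function name `parse_accession` is hypothetical, and the mapping of the first letter to an archive (S for NCBI, E for EBI, D for DDBJ) is stated here as an assumption based on the usual INSDC convention. The third-letter mapping follows the IDs shown in the diagram.

```python
import re

# Assumption: first letter = the archive that accepted the data.
ARCHIVES = {"S": "NCBI SRA", "E": "EMBL-EBI ENA", "D": "DDBJ DRA"}

# Third letter = metadata object type, as in the diagram (DRA, DRZ, DRP, DRS, DRX, DRR).
OBJECT_TYPES = {
    "A": "Submission",
    "Z": "Analysis",
    "P": "Study",
    "S": "Sample",
    "X": "Experiment",
    "R": "Run",
}

# Three letters (archive, "R", object type) followed by digits.
ACCESSION = re.compile(r"^([SED])R([AZPSXR])(\d+)$")


def parse_accession(accession):
    """Split an ID such as DRX000002 into archive, object type, and serial number."""
    match = ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"not a recognized accession: {accession!r}")
    archive, kind, number = match.groups()
    return {
        "archive": ARCHIVES[archive],
        "object": OBJECT_TYPES[kind],
        "number": int(number),
    }


print(parse_accession("DRX000002"))
print(parse_accession("SRP011204"))
```

A check like this is mainly useful for sorting a mixed list of IDs by object type before looking them up; it says nothing about whether the entry actually exists or is downloadable.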
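Metadata Tips
The relationships among objects and IDs are complex: in large projects the Runs and Samples can number in the hundreds. For various reasons, some entries also deviate from the rules as exceptions.
Understand which information is described where: this matters not only when submitting data but also when searching. For details see http://trace.ddbj.ac.jp/dra/metadata.html
Metadata descriptions vary between submitters: especially items such as library preparation. In some cases the metadata is not updated even after information such as the publication changes.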
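Summary #1
SRA is operated by the INSDC member institutions: the data is shared, so it is the same whichever entry point you use, but search functions and other features differ between them.
Metadata is important for submitting and searching sequence data: each object is assigned an ID and is described by the submitter. The content and the relationships need to be understood, especially when submitting data.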
Tech Dev at DBCLS - Search and Statistics
Efforts at DBCLS: integration with other DBs, development of search functions, and visualizing the current state of the DB through statistics.
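DBCLS SRA
Integration with other DBs: not only metadata but also literature information such as papers and disease information, plus computed sequence quality for individual data sets.
Development of search functions: search aimed at data users, integrating the functions of the member archives and adding original ones.
Understanding the current state of the DB through statistics: trends in the number of submissions, based on metadata, are published. Information on the DB as a whole is also analyzed from the sequence data.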
DBCLS SRA
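DBCLS SRA http://sra.dbcls.jp/ : lists the registered data by metadata type; search by SRA ID, organism, sequencer, and so on.
DBCLS SRA http://sra.dbcls.jp/ : rankings by experiment type, sequencer, and organism; graphs of trends over time.
DBCLS SRA http://sra.dbcls.jp/ : find data from literature information; find data from disease information.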
DBCLS SRA Publication Search: integration of literature information
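DBCLS SRA Publication Search
The paper describes the sequence data in more detail: where the sequencing fits within the study also matters, and detailed information is often found in Materials & Methods.
Sequence data is sometimes published before the paper: because of grant constraints, data release instructions from journals, and so on. Large projects may also set their own release policy.
Literature information is sometimes missing from the metadata: the metadata is not updated after the initial submission, so published data has to be linked to its paper.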
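DBCLS SRA Publication Search: DBCLS SRA → "Search from literature". Filtered search by experiment type, sequencer, and organism; a table mapping SRA IDs to PubMed IDs together with information on the paper; click a column name to sort.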
DBCLS SRA Diseases Search: integration of disease information
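DBCLS SRA Diseases Search
Searching for clinical sequencing data is difficult: whole-genome sequences, polymorphism information, and similar data are often not published in SRA.
The metadata description alone is sometimes insufficient: the way of describing and the amount of information vary between submitters, which makes a single comprehensive search difficult.
Use the tags attached to literature information: a data search keyed on disease information was developed using the MeSH terms assigned to PubMed entries.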
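DBCLS SRA Diseases Search: DBCLS SRA → "Browse by disease" → by frequency. Search by disease type; disease names with the number of registered data sets; choice of how many entries to display.
DBCLS SRA Diseases Search: DBCLS SRA → "Browse by disease" → by disease category. Click to expand the tree; click a number to show the list.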
DBCLS SRA Metadata Search: the search function original to DBCLS
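DBCLS SRA Metadata Search
Provide a more user-oriented search: reduce as far as possible what users have to memorize, such as metadata and ID-based management.
Reflect more information in the search: a flexible search that draws not only on metadata but also on information from the integrated DBs and on original information.
Support automation: repeating searches by hand is inefficient. Automation also makes it possible to build the search into an analysis pipeline.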
DBCLS SRA Metadata Search http://sra.dbcls.jp/search : information for developers and a Twitter account for support; free-word search; filtered search.
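DBCLS SRA Metadata Search, filtered search (Mus musculus, Transcriptome, Illumina MiSeq): run a free-word search over the data matching the conditions, or display all data matching the conditions. The proportion of data matching each condition is shown.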
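DBCLS SRA Metadata Search, search results: information on the data that was hit; blue rows have paper information attached. Click a column name to sort; narrow down the items by keyword.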
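DBCLS SRA Metadata Search, result of clicking an SRP ID: an overview of the project, an overview and abstract of the paper, and links to PubMed and PMC.
DBCLS SRA Metadata Search, result of clicking an SRP ID: click to expand Materials and Methods and Results.
DBCLS SRA Metadata Search, result of clicking an SRP ID: Run and Sample information with download links; the table can be shown in tsv or json format, sorted, and filtered.
DBCLS SRA Metadata Search, result of clicking an SRP ID: the table can be shown in tsv or json format, sorted, and filtered; differences across the whole table are highlighted.
DBCLS SRA Metadata Search, quality information for a Run (an SRR entry): number of reads, read length, GC content, and similar information; click the result of each module to enlarge it.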
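DBCLS SRA Metadata Search
Visualize the search: you can judge why there are many or few results by looking at their proportion of the whole.
Keyword search that includes paper information: the search target is widened so that as much related data as possible is hit.
Check read information before downloading: downloads can take a long time, so information is provided for choosing only the data that will definitely be usable.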
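Summary #2
DBCLS SRA is a functional extension of SRA: it does not accept data submissions; it provides information for understanding the state of SRA and search functions that make data easier to find.
Metadata is important for submitting and searching sequence data: each object is assigned an ID and is described by the submitter. The content and the relationships need to be understood, especially when submitting data.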
Search published NGS data and project
Examples of searching and using NGS data: surveying past NGS studies with the services introduced here.
Use cases of Public data: examples of using public data at each stage
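Use cases of Public data
Sequencing you are about to do: design and simulate the sequencing based on information from similar projects and their data.
Sequencing in progress: evaluate the quality of the sequencing against data from the same organism and sequencer.
Sequencing already completed: add data from closely related species or similar projects to improve the accuracy of the analysis.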
Practical search tips: what you need in order to search and use data
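Practical search tips
Understand the read spec that the experiment type requires: the number of reads and the read length needed differ by experiment type, such as genome resequencing, RNA-Seq, and ChIP-Seq.
Understand the read spec of the sequencer: the spec of the reads obtained from each sequencer also changes with reagent updates, so take care.
Choose the sequencer according to organism and experiment type: to search public DBs for similar projects, read information appropriate to the genome size and experiment type is important.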
Required read spec by application

application                   total bases   read length   read number (M)
Human genome resequencing     90-150Gb      2x100         900-1500
Targeted resequencing         <1Gb          2x100         10
exome sequence                5~7Gb         2x100         70
RNA-Seq                       5Gb           2x100         50
TSS-Seq                       1Gb           1x50          20
small RNA                     0.35Gb        1x35          >10
Microbial genome              >150Mb        2x100         >1.5
Eukaryotic genome             >4Gb          2x100         >40
Bisulfite-Seq                 90-150Gb      2x100         900-1500
ChIP-Seq                      >6Gb          1x100         60

Quoted from the Saibo Kogaku (Cell Technology) supplementary volume on next-generation sequencers, "Advanced methods by purpose". The figures can change with the genome size of the target and other factors, and the information may already be out of date.
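The columns of this table are related by simple arithmetic. The sketch below assumes (this is an inference from the rows, not something the slide states) that the read number column counts individual reads, so that total bases is approximately read number times the per-read length; the function name `total_bases` is hypothetical.

```python
def total_bases(read_count, read_length):
    """Approximate sequencing yield in bases: number of reads times bases per read."""
    return read_count * read_length


# Rows taken from the table above: (application, reads in millions, read length).
rows = [
    ("RNA-Seq", 50, 100),
    ("TSS-Seq", 20, 50),
    ("small RNA", 10, 35),
    ("ChIP-Seq", 60, 100),
]

for application, reads_in_millions, read_length in rows:
    gigabases = total_bases(reads_in_millions * 10**6, read_length) / 10**9
    print(f"{application}: about {gigabases:g} Gb")
```

Under that assumption the results agree with the total bases column for these rows (5 Gb, 1 Gb, 0.35 Gb, 6 Gb), which makes the same calculation usable for judging whether a public data set is large enough for a given application.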
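Required read spec by application (contents of the source book): preparation and introduction; human genome analysis; gene expression regulation analysis; de novo genome sequencing; epigenetics analysis; metagenome analysis; genome structure analysis; data analysis tools and storage; integrated analysis.
This is a plug, but I was not asked to make it, and I earn nothing if the book sells.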
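Read spec, still improving: improvements in reagents and software frequently change the number of reads and the read length, even on the same sequencer. Example: the Illumina MiSeq.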
Example survey: mouse brain transcriptome (finding example studies of gene expression in the mouse brain)
Example survey: mouse brain transcriptome
At http://sra.dbcls.jp/search specify the organism and the experiment type, leave the sequencer blank, and press "submit condition".
The matching projects are listed at http://sra.dbcls.jp/search/filter?species=Mus%20musculus&type=Transcriptome&instrument= ; enter "brain" as the keyword and press "search".
The matching projects are listed at http://sra.dbcls.jp/search/search?species=Mus%20musculus&type=Transcriptome&instrument=&search_query=brain ; type "brain" in the input box under Study Title to narrow them down.
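Because the search conditions are carried in the query string, the two URLs shown in this example can be assembled programmatically, which is one way to approach the automation mentioned earlier. This is a sketch that only reproduces the URLs from the slides: the helper names are hypothetical, the parameter names (`species`, `type`, `instrument`, `search_query`) are copied from those URLs, and whether the service still responds at this address is not something this example checks.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://sra.dbcls.jp/search"


def filter_url(species, study_type, instrument=""):
    """URL for the filtered search; an empty instrument means the field was left blank."""
    params = {"species": species, "type": study_type, "instrument": instrument}
    # quote_via=quote encodes spaces as %20, matching the URLs on the slides.
    return f"{BASE_URL}/filter?{urlencode(params, quote_via=quote)}"


def search_url(species, study_type, keyword, instrument=""):
    """URL for the keyword search within the filtered set."""
    params = {
        "species": species,
        "type": study_type,
        "instrument": instrument,
        "search_query": keyword,
    }
    return f"{BASE_URL}/search?{urlencode(params, quote_via=quote)}"


print(filter_url("Mus musculus", "Transcriptome"))
print(search_url("Mus musculus", "Transcriptome", "brain"))
```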
On the results page (http://sra.dbcls.jp/search/search?species=Mus%20musculus&type=Transcriptome&instrument=&search_query=brain), click Study ID to sort the projects from newest to oldest. We will look at the data of this project.
http://sra.dbcls.jp/search/view/SRP011204 shows an overview of the project and an overview of the sequencing performed in it.
The same page gives the approximate number of reads (M), the read length (b), and the total amount of sequence (Gb). Given the number of Runs per Sample, the question is whether a sample was split or whether these are replicates.
Search http://www.ncbi.nlm.nih.gov/geo/ for the GEO ID (a GSE accession) that appeared in the title.
They were replicates: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232
In GEO the paper information has been updated, and detailed information on each sample is available. Click a GEO Sample ID to see the information on the Control.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM884353 shows the Sample Characteristics and the protocol used to process the Sample.
The same page shows the SRA Experiment ID. Click the Biosample ID to see how the Samples relate to each other.
It also shows the corresponding SRA Sample ID and a download link for the sequence data in SRA format.
Return to the GEO page (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE36232) and click the link to the paper.
Since we are here, check the full text in PubReader: http://www.ncbi.nlm.nih.gov/pubmed/22563483
At http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341364/?report=reader click Navigation, then click Materials & Methods.
That section describes the library preparation and sequencing, and the data analysis, including the tools used.
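Practical search tips
Get read information for data that matches your conditions: read length, number of reads, sample information, and so on. Does it suit your purpose, and is there enough data?
Get information on library preparation and data analysis: this is not often recorded in SRA. It can sometimes be obtained by following information in external DBs such as the paper or GEO.
Download the data that matches your conditions: depending on the data, downloading and unpacking the files can take a very long time. Download fastq from the DDBJ FTP site, or use the DDBJ pipeline.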
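Quality check and download: checking and downloading the data
Read quality check: at http://sra.dbcls.jp/search/view/SRR426841 check the quality at each position along the read, and also check GC content and similar measures.
Data download via FTP: at http://sra.dbcls.jp/search/view/SRP011204 click "FTP". Selecting the DB and format opens the FTP site.
Data download via FTP: at http://trace.ddbj.nig.ac.jp/DRASearch/run?acc=SRR426841 click either the FASTQ or the SRA Lite format.
Data download via FTP: log in to the FTP site as a guest. The fastq files, compressed in bz2 format, can then be accessed.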
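Once a bz2-compressed fastq file has been downloaded, the read count, mean read length, and GC content can be checked without unpacking it to disk. The following is a minimal sketch using Python's standard `bz2` module. It assumes plain four-line FASTQ records, the function name `fastq_stats` is hypothetical, and the two reads are made-up sample data compressed in memory so the example runs without any download.

```python
import bz2
import io


def fastq_stats(handle):
    """Count reads, mean read length, and GC percentage from four-line FASTQ records."""
    reads = bases = gc = 0
    for index, line in enumerate(handle):
        if index % 4 == 1:  # the second line of each record holds the sequence
            sequence = line.strip().upper()
            reads += 1
            bases += len(sequence)
            gc += sequence.count("G") + sequence.count("C")
    return {
        "reads": reads,
        "mean_length": bases / reads if reads else 0.0,
        "gc_percent": 100 * gc / bases if bases else 0.0,
    }


# Made-up sample data standing in for a downloaded .fastq.bz2 file.
sample = "@read1\nACGT\n+\nIIII\n@read2\nGGCCAT\n+\nIIIIII\n"
compressed = bz2.compress(sample.encode("ascii"))

# bz2.open also accepts a file path, e.g. bz2.open("reads.fastq.bz2", "rt").
with bz2.open(io.BytesIO(compressed), "rt") as handle:
    stats = fastq_stats(handle)

print(stats)
```

Streaming the file this way avoids the disk space needed for the uncompressed fastq, at the cost of decompressing it again on every pass.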
DDBJ Read Annotation Pipeline: using the pipeline
DDBJ Read Annotation Pipeline: log in at https://p.ddbj.nig.ac.jp/ , then click "Import public DRA". Enter an SRA ID to add the data to the pipeline.
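Summary #3
Use the literature and read information to get what you need: you need information on the reads as well as on library preparation and analysis. When the information simply cannot be found, knowing when to give up also matters.
Make good use of public analysis pipelines: huge data sets take a long time to download and eat into HDD capacity. Using the DDBJ pipeline can lower these costs.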
Online Reference: finding the information you need online
Online Reference: http://github.com/inutano/sra_metadata_toolkit/wiki (references and a collection of links on SRA and NGS)
Q&A. Thank you for your attention. Further questions can be sent to tohta@dbcls.rois.ac.jp