Tazro Inutano Ohta
November 20, 2012
Sequence Read Archive: Database for High-throughput sequencing best practice 2012
"Next-generation sequencing analysis and public databases: making the most of the Sequence Read Archive"
Transcript
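Next-generation sequencing analysis and public databases: making the most of the Sequence Read Archive. Nov., at CERI. Tazro Ohta
Today's contents
- About the public database SRA (Sequence Read Archive)
- SRA-related services provided by DBCLS
- Sequence Read Archive best practices
- Public databases and NGS: challenges and what comes next
Quick poll
- Next-generation sequencers?
- Sequence Read Archive?
- DBCLS SRA?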
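About the public database SRA (Sequence Read Archive)
- NCBI began collecting NGS data
- It has since been operated as the Sequence Read Archive by the INSDC
- INSDC: International Nucleotide Sequence Database Collaboration
- NCBI (USA), EBI (Europe), DDBJ (Japan)
- Each member provides submission, search, download, and related services
- Submitted data are exchanged among the members and can be accessed from any of them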
The same data can be accessed from any member
[Figure: an example record (Data ID: 000001, organism: mouse, cell: nervous cell, sequencer: 454, date: 2011 12 08) and its sequence (>Seq_Numero_1 ATGCATGCATGCATG...). Sequence data and metadata are submitted and then exchanged within the INSDC.]
www.ncbi.nlm.nih.gov/sra
www.ebi.ac.uk/ena
trace.ddbj.nig.ac.jp/dra
Trying it out
- Search for human breast cancer data
http://www.everystockphoto.com/photo.php?imageId=3972069
http://www.flickr.com/photos/mindaugasdanys/3766009204/ Hard to use
Complaints
- The database structure is too complex
- The data cannot be found
Mission
- Make public data easier to find and easier to use
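SRA-related services and tools provided by DBCLS
To make SRA data easier to use
- Search for data based on SRA statistics
- Search for data based on published papers
- Search for data by disease keyword
- View the sequence quality of submitted data
- Efficiently browse the metadata related to search results
- Use the APIs of the SRA-related services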
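Searching for data based on SRA statistics
- Aggregate the metadata (annotation data) submitted along with the sequence data
- Show rankings by sequencer type, sample organism, and experiment type
- Show the growth in the number of submissions as a graph
Benefits
- Efficient filtered search
- See at a glance what kinds of data exist and how much of each
- Visualize trends in the field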
sra.dbcls.jp
"Search by statistics"
Growth graph
Color-coded by experiment type
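Searching for data based on published papers
- Many SRA entries carry no link to a paper
- This is because much of the data is submitted before the paper comes out
- Extract mentions of public data from the literature and link them to the IDs
Benefits
- Sort and search by journal, publication date, paper title, and so on
- Filter by experiment type, organism, sequencer, and so on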
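The idea of extracting mentions of public data from the literature and linking them to IDs can be sketched as follows. This is a minimal illustration, not the DBCLS implementation: the regular expression covers only the commonly used SRA accession prefixes (S, E, or D for NCBI, EBI, or DDBJ, then R, then A, P, X, R, or S), and the function names and sample sentences are made up.

```python
import re

# Assumed accession shape: S/E/D + R + A/P/X/R/S (submission, study,
# experiment, run, sample) followed by six or more digits.
ACCESSION_RE = re.compile(r"\b([SED]R[APXRS]\d{6,})\b")


def extract_accessions(text):
    """Return the unique accession IDs mentioned in text, in order of appearance."""
    seen = []
    for match in ACCESSION_RE.finditer(text):
        accession = match.group(1)
        if accession not in seen:
            seen.append(accession)
    return seen


def link_papers_to_data(papers):
    """Map each accession ID to the list of paper IDs whose text mentions it."""
    index = {}
    for paper_id, text in papers.items():
        for accession in extract_accessions(text):
            index.setdefault(accession, []).append(paper_id)
    return index


# Made-up sentences, for illustration only.
papers = {
    "paper-1": "Reads were deposited in the DDBJ Sequence Read Archive under DRA000001.",
    "paper-2": "We reanalyzed SRP000123 and the runs SRR000456 and SRR000457.",
}
print(link_papers_to_data(papers))
```

A real pipeline would also need to handle accession ranges and IDs that appear only in supplementary material, which this sketch ignores.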
sra.dbcls.jp
"Search by publication"
Filtered search, sortable by each field
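Searching for data by disease keyword
- NGS is also widely used in medical research
- SRA is not divided up by field, which makes searching difficult
- Organize the data by disease using the MeSH terms attached to the literature
- Search by the most frequently submitted diseases or by disease category
Benefits
- High precision, because the search is based on MeSH keywords
- Research trends in each field also become visible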
sra.dbcls.jp
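"Browse by disease"
"By frequency"
Submitted data, with links to Gendoo, a disease-related gene database
Tick the boxes, then Search
Related MeSH terms, shown according to their relevance
"By disease category"
Search for data from the tree view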
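Viewing the sequence quality of submitted data
- SRA data are not always of good quality
- Failed sequencing runs are submitted too, and their data are in the archive
- When several datasets share the same conditions, we want to use the most accurate one
- Compute the quality of all SRA data with FastQC
- http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Benefits
- Avoid the tragedy of data that took a whole night to download turning out to be broken
- Simply comparing sequence quality is fun in itself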
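As a rough illustration of what "sequence quality" means here, the sketch below computes the mean Phred score of a FASTQ file and uses it to pick the better of two runs. It is not FastQC and reproduces none of its reports. It assumes four-line FASTQ records with Phred+33 quality encoding, and the function names and the two tiny inputs are made up.

```python
def phred_scores(quality_line, offset=33):
    """Decode one FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(fastq_text, offset=33):
    """Mean Phred score over all bases, assuming four-line FASTQ records."""
    lines = fastq_text.strip().splitlines()
    total = 0
    count = 0
    # Each record is: @id, sequence, '+', quality. The quality string is line 4.
    for i in range(3, len(lines), 4):
        scores = phred_scores(lines[i].strip(), offset)
        total += sum(scores)
        count += len(scores)
    return total / count if count else 0.0


# Two made-up single-read "runs" obtained under the same conditions.
runs = {
    "run_a": "@read1\nACGT\n+\nIIII\n",  # 'I' encodes Phred 40
    "run_b": "@read1\nACGT\n+\n!!##\n",  # '!' encodes 0, '#' encodes 2
}
best = max(runs, key=lambda name: mean_quality(runs[name]))
print(best, mean_quality(runs[best]))
```

A single mean hides per-position trends, so it only serves as a coarse first filter before looking at a full QC report.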
g86.dbcls.jp/sra
Enter an SRA ID
QC results from FastQC
Also accessible through an API
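Efficiently browsing the metadata related to search results
- How do you find the data you want among a large number of search results?
- Compare conditions such as sample and sequencer
- Give priority to entries that already have a published paper
- To compare conditions, the related information should be gathered in one place
Benefits
- No confusion over sample, sequence, project, and other IDs
- Seeing the paper information alongside allows a more accurate judgment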
g86.dbcls.jp/kusarinoko
Choose from All, human, mouse, or Arabidopsis, then enter a keyword and search
Paper information and the list of matching data
Study (project) Experiment Run (Sequence) / QC
Sample
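The Study, Experiment, Run, and Sample levels shown on this slide can be modelled as a small data structure. The sketch below does so and flattens a study into one row per run, so that the related IDs sit side by side. The class and field names, the `paper` and `mean_quality` attributes, and all IDs are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    sample_id: str
    organism: str


@dataclass
class Run:
    run_id: str
    mean_quality: Optional[float] = None  # hypothetical QC summary value


@dataclass
class Experiment:
    experiment_id: str
    platform: str
    sample: Sample
    runs: List[Run] = field(default_factory=list)


@dataclass
class Study:
    study_id: str
    paper: Optional[str] = None  # hypothetical link to a publication
    experiments: List[Experiment] = field(default_factory=list)


def flatten(study: Study) -> List[Dict[str, object]]:
    """Return one row per run with its study, experiment, and sample context."""
    rows = []
    for exp in study.experiments:
        for run in exp.runs:
            rows.append({
                "study": study.study_id,
                "paper": study.paper,
                "experiment": exp.experiment_id,
                "platform": exp.platform,
                "sample": exp.sample.sample_id,
                "organism": exp.sample.organism,
                "run": run.run_id,
                "mean_quality": run.mean_quality,
            })
    return rows


# All IDs and values below are made up.
sample = Sample("DRS000001", "Mus musculus")
study = Study(
    "DRP000001",
    paper=None,
    experiments=[
        Experiment("DRX000001", "454", sample,
                   [Run("DRR000001", 31.5), Run("DRR000002", 28.0)]),
    ],
)
for row in flatten(study):
    print(row)
```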
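Using the APIs of the SRA-related services
- I want to access the services repeatedly from a program
- I want to run the same search at regular intervals
- I want to look up information on a large amount of data
Benefits
- Day-to-day effort drops substantially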
g86.dbcls.jp/sra
Sequence Quality
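SRA ID conversion, metadata retrieval
Only partly public so far
- Under active development, with the aim of substantially reducing day-to-day effort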
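The use case of running the same search at regular intervals reduces to remembering which IDs have already been seen and reporting only the new ones. The sketch below shows that logic with a stand-in search function. It does not call any DBCLS endpoint: `search`, `new_hits`, and `make_fake_search` are hypothetical names, and the returned IDs are made up.

```python
def new_hits(search, query, seen_ids):
    """Run search(query) and return the IDs not seen before.

    seen_ids is a set that is updated in place, so that repeated calls
    (for example, once a week) only report newly added entries.
    """
    current = list(search(query))
    fresh = [item for item in current if item not in seen_ids]
    seen_ids.update(fresh)
    return fresh


def make_fake_search(batches):
    """Stand-in for a real API client: each call returns the next batch of IDs."""
    iterator = iter(batches)

    def search(query):
        return next(iterator)

    return search


# Two simulated runs of the same query; the second run finds one new entry.
search = make_fake_search([["DRR000001"], ["DRR000001", "DRR000002"]])
seen = set()
print(new_hits(search, "human breast cancer", seen))
print(new_hits(search, "human breast cancer", seen))
```

In practice `seen_ids` would be persisted between runs, for instance in a small file, and `search` would wrap whichever API the service actually exposes.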
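Sequence Read Archive best practices
Steps for using public sequence data: Search, Browse, Download, Check, Analyze
- Search: by an ID given in a paper, or by keyword
- Browse: look through the results one by one to find the data you want
- Download: fetch the data via FTP, Aspera, and so on
- Check: verify the downloaded data
- Analyze: use the data in your analysis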
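Make each step more efficient and cut costs: Search, Browse, Download, Check, Analyze
Check in advance to lower the cost of downloading: Search, Browse and Check, Download, Analyze
- Search: by statistics, paper, disease, or keyword
- Browse and Check: find data in the search results and confirm its quality as well
- Download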
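Checking quality before downloading can be pictured as a filter over candidate runs. The sketch below keeps only the runs above a quality threshold and reports how much download volume is avoided. The record fields (`run_id`, `mean_quality`, `size_gb`), the threshold, and all values are assumptions made for illustration.

```python
def select_for_download(candidates, min_quality):
    """Keep only the candidate runs whose mean quality meets the threshold."""
    return [c for c in candidates if c["mean_quality"] >= min_quality]


def total_size_gb(records):
    """Sum of the download sizes, in gigabytes."""
    return sum(r["size_gb"] for r in records)


# Made-up search results with precomputed quality and file size.
candidates = [
    {"run_id": "DRR000001", "mean_quality": 36.0, "size_gb": 12.0},
    {"run_id": "DRR000002", "mean_quality": 11.0, "size_gb": 20.0},
    {"run_id": "DRR000003", "mean_quality": 33.0, "size_gb": 8.0},
]

selected = select_for_download(candidates, min_quality=30.0)
saved = total_size_gb(candidates) - total_size_gb(selected)
print([c["run_id"] for c in selected], saved)
```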
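Examples of secondary use of public NGS data
- Use data obtained under the same conditions as your own to increase N
- Use data related to your own for comparative analysis
  - Different treatment groups, closely related species, Genome / Transcriptome / Epigenome, and so on
- Use the data to evaluate the performance of analysis tools
  - Comparing multiple tools, or as demo data when developing a new tool
- Collect large amounts of data for meta-analysis
  - Cross-cutting analyses across species or cell types, for example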
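SRA has become easier to use
- Now public data can be used for more and more research
http://www.flickr.com/photos/mindaugasdanys/3766009204/ Still hard to use
Complaints
- Search is not good enough
- The tools are scattered and hard to use together
http://www.flickr.com/photos/66986780@N00/137720685/ Improvements in progress
Please bear with us
- Development is ongoing, and feedback is welcome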
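Public databases and NGS: challenges and what comes next
Why make data publicly available?
- To ensure reproducibility
  - Provide the means for re-analysis and demonstrate that the results are valid
- To promote secondary use
  - Share resources so that other researchers can use them
Reality
- People release data because they are told to
- Journals ask for a public database ID at submission
- In some cases grant rules require all data to be made public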
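http://www.flickr.com/photos/74521133@N00/232362142/ Have to release it
No benefit for the submitter
- Releasing data has a cost
  - Tidying the data so that others can use it easily is hard work
  - NGS data are large, so even uploading is a struggle
- Respect for the people who release data
  - A third party analyzes data released before publication and turns it into a paper?
  - At present, releasing data alone does not count as a research achievement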
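What should an archive be?
- Lower the barrier to releasing data
  - Standardize the way data are released
  - Minimum information standards and the like
  - A mechanism for submitting data more easily
- Promote the use of data
  - Organize public data so that it is easier to use
  - Methods for making effective use of public data
- Give something back to the people who release data
  - Assign DOIs to public data so that it can be cited
What now?
- What should we do?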
http://www.flickr.com/photos/mindaugasdanys/3766009204/ We will do our best
Thank you for your attention