Tazro Inutano Ohta
November 20, 2012
Science
Sequence Read Archive: Database for High-throughput sequencing best practice 2012
"Next-generation sequencing analysis and public databases: making full use of the Sequence Read Archive"
Transcript
Next-generation sequencing analysis and public databases: making full use of the Sequence Read Archive
Nov, at CERI
Tazro Ohta
Today's agenda
- The public database SRA (Sequence Read Archive)
- SRA-related services provided by DBCLS
- Sequence Read Archive best practices
- Public databases and NGS: challenges and what comes next
Quick poll
- Next-generation sequencers?
- Sequence Read Archive?
- DBCLS SRA?
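The public database SRA (Sequence Read Archive)
- NCBI began collecting NGS data
- It has since been operated by INSDC as the Sequence Read Archive
  - INSDC: International Nucleotide Sequence Database Collaboration
  - NCBI (United States), EBI (Europe), DDBJ (Japan)
- Each member provides its own submission, search, and download services
- Submitted data are exchanged, so they can be accessed from any of the three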
The same data can be accessed from anywhere
[Figure: an example record (Data ID: 000001, organism: mouse, cell: nervous cell, sequencer: 454, date: 2011 12 08, followed by a FASTA-style sequence ">Seq_Numero_1 ATGCATGC...") is submitted as sequence data plus metadata and shared through INSDC data exchange.]
www.ncbi.nlm.nih.gov/sra
www.ebi.ac.uk/ena
trace.ddbj.nig.ac.jp/dra
Trying it out
- Search for human breast cancer data
http://www.everystockphoto.com/photo.php?imageId=3972069
http://www.flickr.com/photos/mindaugasdanys/3766009204/ Hard to use
Complaints
- The database structure is too complex
The data cannot be found
Mission
- Make public data easier to find and easier to use
SRA-related services and tools provided by DBCLS
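Making SRA data easier to use
- Search for data based on SRA statistics
- Search for data based on published papers
- Search for data using diseases as keywords
- View the sequence quality of submitted data
- Browse the metadata related to search results efficiently
- Use the APIs of the SRA-related services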
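Search for data based on SRA statistics
- Aggregate the metadata (annotation data) submitted together with the sequence data
- Show rankings by sequencer type, sample organism, and experiment type
- Show the growth in the number of submissions as a graph
Benefits
- Efficient narrowing-down search
- See at a glance what kinds of data exist and how much of each
- Visualize trends in the field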
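The aggregation and narrowing-down described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the DBCLS service is built: the records, the field names (`platform`, `organism`, `study_type`), and the helpers `rank` and `narrow` are all made up for the example.

```python
from collections import Counter

# Hypothetical metadata records; field names and values are illustrative only.
records = [
    {"platform": "Illumina", "organism": "Homo sapiens", "study_type": "Transcriptome Analysis"},
    {"platform": "Illumina", "organism": "Mus musculus", "study_type": "Whole Genome Sequencing"},
    {"platform": "454", "organism": "Homo sapiens", "study_type": "Metagenomics"},
    {"platform": "Illumina", "organism": "Homo sapiens", "study_type": "Transcriptome Analysis"},
]


def rank(records, field):
    """Return (value, count) pairs for one metadata field, most frequent first."""
    return Counter(r[field] for r in records).most_common()


def narrow(records, **conditions):
    """Keep only the records whose fields match every given condition."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]


platform_ranking = rank(records, "platform")
human_illumina = narrow(records, platform="Illumina", organism="Homo sapiens")
print(platform_ranking)
print(len(human_illumina), "records match")
```

Ranking answers "what is there, and how much", and the same counts can then drive the narrowing-down step.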
sra.dbcls.jp
"Search by statistics"
Growth graph
Color-coded by experiment type
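Search for data based on published papers
- Many SRA entries carry no link to a paper
- This is because much of the data is submitted before the paper comes out
- Mentions of public data are extracted from the literature and tied to the IDs
Benefits
- Search with sorting by journal, publication date, paper title, and so on
- Narrow down by experiment type, organism, sequencer, and so on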
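Extracting mentions of public data from the literature could look roughly like the sketch below. It assumes the usual INSDC accession shape (a first letter S, E, or D for NCBI, EBI, or DDBJ, then `R`, a type letter, and six or more digits). The function name `find_accessions` and the sample sentence are hypothetical, and the real service may work quite differently.

```python
import re

# Accession-like strings: S/E/D (NCBI/EBI/DDBJ) + R + type letter + digits.
# Type letters: A submission, P study, X experiment, R run, S sample.
ACCESSION = re.compile(r"\b[SED]R[APXRS]\d{6,}\b")


def find_accessions(text):
    """Return the unique accession-like strings in text, in order of first appearance."""
    seen = []
    for match in ACCESSION.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Made-up sentence for illustration; the IDs are placeholders, not citations.
sentence = (
    "Reads were deposited in the DRA under DRP000001 "
    "(runs DRR000001 and DRR000002); see also DRP000001."
)
print(find_accessions(sentence))
```

Each ID found this way can then be joined to the paper's journal, date, and title for sorting and filtering.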
sra.dbcls.jp
"Search by publication"
Narrowing-down search; sort by each field
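Search for data using diseases as keywords
- NGS is widely used in medical research as well
- SRA is not divided by field, so searching is difficult
- Data are organized by disease, based on the MeSH terms attached to the literature
- Searchable by the diseases with the most submitted data, or by disease category
Benefits
- High precision, because it is based on MeSH keywords
- Research trends in each field can also be seen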
sra.dbcls.jp
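"Browse by disease"
"By frequency"
Submitted data; links to Gendoo, a database of disease-related genes
Tick the boxes and Search
Related MeSH terms, shown according to their relevance
"By disease category"
Search for data from the tree view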
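View the sequence quality of submitted data
- SRA data are not always of good quality
- Failed sequencing runs are submitted too, and end up in the archive
- Among data produced under the same conditions, you want the most accurate
- The quality of all SRA data is computed with FastQC
  - http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Benefits
- Avoid the tragedy of data that took all night to download turning out to be broken
- Comparing sequence quality is fun in its own right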
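To give a feel for the kind of number such a quality check produces, here is a much simplified sketch. It is not FastQC and does not reproduce its reports. It assumes four-line FASTQ records with Phred+33 quality encoding, and the names `mean_phred` and `read_mean_qualities` are invented for the example.

```python
def mean_phred(quality_line, offset=33):
    """Average Phred score of one FASTQ quality string (Phred+33 assumed)."""
    if not quality_line:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def read_mean_qualities(fastq_text, offset=33):
    """Map each read ID to its mean quality; assumes unwrapped 4-line records."""
    lines = fastq_text.strip().splitlines()
    # Each record is: @id, sequence, +, quality
    return {
        lines[i][1:]: mean_phred(lines[i + 3], offset)
        for i in range(0, len(lines), 4)
    }


# Tiny made-up FASTQ: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
fastq = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n"
qualities = read_mean_qualities(fastq)
print(qualities)
```

Comparing such per-read or per-run summaries before downloading is what makes it possible to skip broken data.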
g86.dbcls.jp/sra
Enter an SRA ID
QC results from FastQC
Also accessible through the API
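Browse the metadata related to search results efficiently
- How do you pick out the data you want from a large number of search results?
- Compare conditions such as the sample and the sequencer
- Give priority to data that already have a paper
- To compare conditions, you want the related information gathered in one place
Benefits
- No more confusion among sample, sequence, project, and other IDs
- Seeing the publication information alongside allows an accurate judgment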
g86.dbcls.jp/kusarinoko
Choose from All, human, mouse, or Arabidopsis; enter a keyword and search
Publication information; list of matching data
Study (project) Experiment Run (Sequence) / QC
Sample
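One way to picture "related information gathered in one place" is a small nested record, sketched below. It follows the Study, Experiment, Run, and Sample labels shown above, but the class layout, the field names, and the accessions (`STUDY1`, `EXP1`, and so on) are assumptions for illustration, not the service's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    accession: str
    qc_summary: Optional[str] = None  # e.g. a pointer to a quality report


@dataclass
class Experiment:
    accession: str
    sample_accession: str
    platform: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class Study:
    accession: str
    title: str
    publication: Optional[str] = None
    experiments: List[Experiment] = field(default_factory=list)


def summarize(study):
    """Flatten one study into rows that put every related ID side by side."""
    return [
        (study.accession, exp.accession, exp.sample_accession, run.accession, exp.platform)
        for exp in study.experiments
        for run in exp.runs
    ]


# Placeholder identifiers, not real accessions.
study = Study(
    accession="STUDY1",
    title="Example study",
    publication="Example et al.",
    experiments=[
        Experiment("EXP1", "SAMPLE1", "Illumina", runs=[Run("RUN1"), Run("RUN2")]),
        Experiment("EXP2", "SAMPLE2", "454", runs=[Run("RUN3")]),
    ],
)
rows = summarize(study)
for row in rows:
    print(row)
```

With every row carrying the study, experiment, sample, and run IDs together, conditions can be compared without hopping between record types.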
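Use the APIs of the SRA-related services
- You want to access the services repeatedly from a program
- You want to run the same search at regular intervals
- You want to look up information on a large amount of data
Benefits
- Everyday chores are greatly reduced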
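Running the same search at regular intervals might be scripted along these lines. The sketch deliberately takes the search call as a parameter (`fetch`), because the actual API endpoints and response formats are not given here. `fake_fetch`, the IDs, and the function names are stand-ins for illustration.

```python
import time
from typing import Callable, Iterable, List, Set


def new_hits(fetch: Callable[[str], Iterable[str]], query: str, seen: Set[str]) -> List[str]:
    """Run one search and return the IDs not seen in earlier rounds."""
    fresh = []
    for hit in fetch(query):
        if hit not in seen:
            seen.add(hit)
            fresh.append(hit)
    return fresh


def watch(fetch, query, rounds, interval_seconds=0.0):
    """Repeat the same search `rounds` times and collect what is new each time."""
    seen: Set[str] = set()
    history = []
    for _ in range(rounds):
        history.append(new_hits(fetch, query, seen))
        time.sleep(interval_seconds)
    return history


# Stand-in for a real API call: returns canned results, one list per call.
canned = iter([["ID1", "ID2"], ["ID1", "ID2", "ID3"]])


def fake_fetch(query):
    return next(canned)


history = watch(fake_fetch, "human breast cancer", rounds=2)
print(history)
```

In real use `fetch` would wrap whichever HTTP call the service offers, and `interval_seconds` would be set to something polite, such as once a day.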
g86.dbcls.jp/sra
Sequence Quality
SRA ID conversion; metadata retrieval
Only partly public so far
- Under active development, with the aim of greatly reducing everyday chores
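Sequence Read Archive best practices
Steps for using public sequence data
- Search: search by the IDs given in papers, or by keyword
- Browse: go through the results one by one to find the data you want
- Download: fetch the data via FTP, Aspera, and so on
- Check: verify the downloaded data
- Analyze: use the data in your analysis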
Making each step more efficient means lower cost
Search → Browse → Download → Check → Analyze
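Check in advance, and lower the cost of downloading too
Search → Browse and check → Download → Analyze
- Search: by statistics, publication, disease, or keyword
- Browse and check: find data in the search results and confirm their quality as well
- Download
- Analyze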
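The reordered workflow amounts to filtering on quality before anything is transferred. A minimal sketch, assuming each candidate run already comes with a precomputed quality figure and a file size: the keys `mean_quality` and `size_gb`, the thresholds, and the run names are hypothetical.

```python
def select_for_download(candidates, min_quality=20.0, max_size_gb=None):
    """Pick the runs worth downloading, best quality first."""
    chosen = [
        c for c in candidates
        if c["mean_quality"] >= min_quality
        and (max_size_gb is None or c["size_gb"] <= max_size_gb)
    ]
    return sorted(chosen, key=lambda c: c["mean_quality"], reverse=True)


# Made-up candidates from a search result.
candidates = [
    {"run": "RUN_A", "mean_quality": 34.2, "size_gb": 1.5},
    {"run": "RUN_B", "mean_quality": 12.8, "size_gb": 0.9},
    {"run": "RUN_C", "mean_quality": 36.0, "size_gb": 40.0},
]

to_fetch = select_for_download(candidates)
small_only = select_for_download(candidates, max_size_gb=10)
print([c["run"] for c in to_fetch])
print([c["run"] for c in small_only])
```

Only the runs that survive this step go on to the FTP or Aspera transfer, which is where the time saving comes from.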
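Examples of secondary use of public NGS data
- Use data produced under the same conditions as your own to increase N
- Use data related to your own for comparative analysis
  - Different treatments, closely related species, Genome/Transcriptome/Epigenome, and so on
- Use the data to evaluate the performance of analysis tools
  - Comparing several tools, or as demo data when developing a new tool
- Collect data in bulk and run a meta-analysis
  - Analyses that cut across species and cell types, and so on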
SRA has become much easier to use
- Now research using public data can move ahead
http://www.flickr.com/photos/mindaugasdanys/3766009204/ Still hard to use
Complaints
- Search is slow
- The tools are scattered and hard to use together
http://www.flickr.com/photos/66986780@N00/137720685/ Improvements in progress
Please bear with us
- Development is ongoing, and feedback is welcome
Public databases and NGS: challenges and what comes next
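Why make data public?
- To ensure reproducibility
  - Provide the means for reanalysis and demonstrate validity
- To promote secondary use
  - Share resources so that other researchers can use them too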
The reality
- People submit because they are told to
  - Journals ask for public database IDs at submission
  - In some cases, grant terms require all data to be made public
http://www.flickr.com/photos/74521133@N00/232362142/ We have to submit
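There is no benefit
- Making data public has a cost
  - Tidying data so that others can use it easily is hard work
  - NGS data are large, so even uploading them is a chore
- Respect for the people who release data
  - Could a third party analyze pre-publication data and turn it into a paper?
  - At present, releasing data alone does not count as a research achievement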
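What should the archive be?
- Lower the barrier to publishing data
  - Standardize the way data are published
  - Minimum information standards and the like
  - Mechanisms that make submitting data simpler
- Promote the use of data
  - Organize public data so that they are easier to use
  - Methods for putting public data to good use
- Give something back to the people who release data
  - Assign DOIs to public data so that they can be cited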
What do we do?
- How should we go about it?
http://www.flickr.com/photos/mindaugasdanys/3766009204/ We will do our best
Thank you for your attention