Sequence Read Archive: Database for High-throughput sequencing best practice 2012
"Next-generation sequencing analysis and public databases: making the most of the Sequence Read Archive"
Tazro Inutano Ohta
November 20, 2012
Transcript
Next-generation sequencing analysis and public databases: making the most of the Sequence Read Archive
Nov., at CERI
Tazro Ohta
Today's contents
○ About the public database SRA (Sequence Read Archive)
○ SRA-related services provided by DBCLS
○ Sequence Read Archive best practice
○ Public databases and NGS: challenges and what comes next
Quick poll
○ Next-generation sequencers?
○ Sequence Read Archive?
○ DBCLS SRA?
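About the public database SRA (Sequence Read Archive)
○ NCBI began collecting NGS data (the year is not preserved in the transcript)
○ Later operated by INSDC as the Sequence Read Archive (start year likewise not preserved)
  ○ INSDC: International Nucleotide Sequence Database Collaboration
  ○ NCBI (United States), EBI (Europe), DDBJ (Japan)
○ Each member provides its own submission, search, and download services
○ Submitted data is exchanged among the members, so it can be accessed from any of them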
The same data is accessible from anywhere
[Figure: an example record is registered as metadata (Data ID: 000001, organism: mouse, cell: nervous cell, sequencer: 454, date: 2011 12 08) together with sequence data (>Seq_Numero_1 ATGCATGCATGCATG ...), then exchanged among the INSDC members.]
www.ncbi.nlm.nih.gov/sra
www.ebi.ac.uk/ena
trace.ddbj.nig.ac.jp/dra
Try it out
○ Search for human breast cancer data
http://www.everystockphoto.com/photo.php?imageId=3972069
http://www.flickr.com/photos/mindaugasdanys/3766009204/ Hard to use
Complaints
○ The database structure is too complicated
○ Data cannot be found
Mission
○ Make public data easier to find and easier to use
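SRA-related services and tools provided by DBCLS

To make SRA data easier to use
○ Search data based on SRA statistics
○ Search data based on published papers
○ Search data using diseases as keywords
○ View the sequence quality of submitted data
○ Browse the metadata related to search results efficiently
○ Use the APIs of the SRA-related services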
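Search data based on SRA statistics
○ Aggregates the metadata (annotation data) submitted along with the sequence data
○ Shows rankings by sequencer type, sample organism, and experiment type
○ Shows the growth in the number of submissions as a graph
Benefits
○ Efficient narrowing-down search
○ See at a glance what kinds of data exist and how much of each
○ Visualizes trends in the field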
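As a rough illustration of ranking and narrowing down by metadata, the sketch below counts values of one field and filters records on several fields. The records, field names, and values are hypothetical and do not reflect the actual schema behind sra.dbcls.jp.

```python
from collections import Counter

# Hypothetical metadata records; field names and values are illustrative only.
records = [
    {"platform": "Illumina", "organism": "Homo sapiens", "study_type": "Transcriptome Analysis"},
    {"platform": "Illumina", "organism": "Mus musculus", "study_type": "Whole Genome Sequencing"},
    {"platform": "LS454", "organism": "Homo sapiens", "study_type": "Metagenomics"},
    {"platform": "Illumina", "organism": "Homo sapiens", "study_type": "Transcriptome Analysis"},
]


def rank_by(records, field):
    """Return (value, count) pairs for one metadata field, most frequent first."""
    return Counter(r[field] for r in records).most_common()


def narrow(records, **conditions):
    """Keep only the records that match every field=value condition."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]


platform_ranking = rank_by(records, "platform")
human_illumina = narrow(records, platform="Illumina", organism="Homo sapiens")
```

Counting first and filtering second mirrors the idea on the slide: the ranking shows what exists, and the filter narrows the list to the combination of interest.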
sra.dbcls.jp
"Search from statistics"
Growth graph
Color-coded by experiment type
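Search data based on published papers
○ Many SRA entries have no link to a paper attached
  ○ Because much of the data is submitted before the paper comes out
○ Extracts mentions of public data from the literature and ties them to IDs
Benefits
○ Search with sorting by journal, publication date, paper title, and so on
○ Can be narrowed down by experiment type, organism, sequencer, and so on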
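One simple way to tie papers to data is to scan the text for accession-like strings. The sketch below is an assumption about how such extraction could work, not the method used by the DBCLS service; the pattern covers the common INSDC read-archive prefixes (S/E/D, then R, then A/P/X/R/S, then digits), and the sentence and IDs are made up.

```python
import re

# Prefix: S/E/D (NCBI/EBI/DDBJ) + R + A/P/X/R/S (submission/study/experiment/run/sample), then digits.
ACCESSION = re.compile(r"\b[SED]R[APXRS]\d{6,}\b")


def extract_accessions(text):
    """Return the distinct accession-like IDs in text, in order of first appearance."""
    seen = []
    for match in ACCESSION.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical sentence from a paper; the IDs are invented for illustration.
sentence = (
    "Reads were deposited in the DDBJ Sequence Read Archive under DRA000001 "
    "(runs DRR000001 and DRR000002); see also SRP000002."
)
ids = extract_accessions(sentence)
```

A pattern match only finds candidates; a real pipeline would still need to confirm that each ID exists in the archive.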
sra.dbcls.jp
"Search from the literature"
Narrowing-down search; sorting by each field
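Search data using diseases as keywords
○ NGS is also widely used in the medical sciences
○ SRA is not organized by disease, so searching is difficult
○ Organizes the data by disease, based on the MeSH terms attached to the literature
○ Searchable by the diseases with the most submissions and by disease category
Benefits
○ High precision, because it is based on MeSH keywords
○ Also lets you follow research trends in each area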
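The grouping by disease can be pictured as inverting a study-to-MeSH-term mapping. This is a minimal sketch under stated assumptions: the study IDs and links are hypothetical, and the set of disease terms is a hand-picked stand-in for the MeSH disease category.

```python
from collections import defaultdict

# Hypothetical links: study ID -> MeSH terms of the paper that cites it.
study_mesh = {
    "SRP000001": ["Breast Neoplasms", "Gene Expression Profiling"],
    "SRP000002": ["Breast Neoplasms"],
    "DRP000003": ["Diabetes Mellitus, Type 2"],
}

# Stand-in for the MeSH disease category; a real system would load it from MeSH itself.
disease_terms = {"Breast Neoplasms", "Diabetes Mellitus, Type 2"}


def group_by_disease(study_mesh, disease_terms):
    """Build a disease term -> list of study IDs index, ignoring non-disease terms."""
    index = defaultdict(list)
    for study, terms in study_mesh.items():
        for term in terms:
            if term in disease_terms:
                index[term].append(study)
    return dict(index)


by_disease = group_by_disease(study_mesh, disease_terms)
# Diseases ordered by how many studies are attached to them.
ranking = sorted(by_disease, key=lambda term: len(by_disease[term]), reverse=True)
```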
sra.dbcls.jp
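"Browse by disease"
"By frequency"
Submitted data; link to Gendoo, a database of disease-related genes
Tick the boxes, then Search
Related MeSH terms, shown according to relevance
"By disease category"
Search data from the tree view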
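View the sequence quality of submitted data
○ SRA data is not always good
  ○ Failed sequencing runs are submitted too, and that data is in the archive
○ Given data produced under the same conditions, you want the more accurate one
○ The quality of all SRA data is calculated with FastQC
  ○ http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Benefits
○ Avoids the tragedy of data that took all night to download turning out to be broken
○ Just comparing sequence quality is fun in itself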
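To compare runs before downloading, one could count the failed modules in each FastQC summary. The sketch assumes the tab-separated layout of FastQC's summary.txt (status, module name, file name per line); the two summaries and run names are hypothetical, and nothing here reflects how the DBCLS service stores its results.

```python
def parse_fastqc_summary(text):
    """Map each FastQC module name to its status (PASS, WARN or FAIL)."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        status, module, _filename = line.split("\t")
        result[module] = status
    return result


def count_failures(summary):
    """Number of modules flagged FAIL in one parsed summary."""
    return sum(1 for status in summary.values() if status == "FAIL")


# Hypothetical summaries for two runs sequenced under the same conditions.
run_a = (
    "PASS\tBasic Statistics\trun_a.fastq\n"
    "PASS\tPer base sequence quality\trun_a.fastq\n"
    "WARN\tPer sequence GC content\trun_a.fastq\n"
)
run_b = (
    "PASS\tBasic Statistics\trun_b.fastq\n"
    "FAIL\tPer base sequence quality\trun_b.fastq\n"
    "FAIL\tPer sequence GC content\trun_b.fastq\n"
)

summaries = {"run_a": parse_fastqc_summary(run_a), "run_b": parse_fastqc_summary(run_b)}
# Prefer the run with the fewest failed modules.
best_run = min(summaries, key=lambda name: count_failures(summaries[name]))
```

Counting FAIL flags is a crude ranking; which modules matter depends on the experiment type.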
g86.dbcls.jp/sra
Enter an SRA ID
QC results from FastQC
Also accessible through the API
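Browse the metadata related to search results efficiently
○ How do you pick out the data you want from a large number of search results?
  ○ Compare conditions such as sample and sequencer
  ○ Give priority to data that has a published paper
○ To compare conditions, you want the related information gathered in one place
Benefits
○ No confusion among sample, sequence, project, and other IDs
○ Seeing the paper information alongside lets you make an accurate judgment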
g86.dbcls.jp/kusarinoko
Choose from All, human, mouse, Arabidopsis; enter a keyword and search
Paper information; list of matching data
Study (project) Experiment Run (Sequence) / QC
Sample
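The Study, Experiment, Run, and Sample levels can be modeled as nested records and flattened into one row per run, so that conditions sit side by side. The class and field names below are hypothetical and are not the schema of the service shown above; the accessions are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    accession: str
    qc_status: Optional[str] = None  # quality-check label, if one is available


@dataclass
class Experiment:
    accession: str
    platform: str
    sample_accession: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class Study:
    accession: str
    title: str
    pubmed_id: Optional[str] = None  # set when a paper is linked to the study
    experiments: List[Experiment] = field(default_factory=list)


def flatten(study):
    """Return one dict per run, carrying the study, experiment, and sample context."""
    rows = []
    for exp in study.experiments:
        for run in exp.runs:
            rows.append({
                "study": study.accession,
                "has_paper": study.pubmed_id is not None,
                "experiment": exp.accession,
                "sample": exp.sample_accession,
                "platform": exp.platform,
                "run": run.accession,
                "qc": run.qc_status,
            })
    return rows


# Hypothetical study with one experiment and two runs.
study = Study(
    accession="SRP000001",
    title="Hypothetical study",
    experiments=[
        Experiment("SRX000001", "Illumina", "SRS000001",
                   [Run("SRR000001", "PASS"), Run("SRR000002")]),
    ],
)
rows = flatten(study)
```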
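Use the APIs of the SRA-related services
○ You want to access the services repeatedly from a program
○ You want to run the same search at regular intervals
○ You want to look up information on a large amount of data
Benefits
○ Greatly reduces routine work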
g86.dbcls.jp/sra
Sequence Quality
SRA ID conversion; metadata retrieval
Only partly released so far
○ Under active development, with the aim of greatly reducing routine work
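Running the same search at intervals comes down to remembering what was already seen and reporting only what is new. In the sketch below the search function is a stand-in that returns canned results; no real endpoint, parameter, or response format of the DBCLS API is assumed.

```python
def new_hits(search, query, seen):
    """Run search(query) and return the hits not seen before, updating seen in place."""
    hits = search(query)
    fresh = [hit for hit in hits if hit not in seen]
    seen.update(fresh)
    return fresh


def make_fake_search():
    """Stand-in for a real API call: yields canned results for two consecutive runs."""
    responses = iter([["SRP000001"], ["SRP000001", "SRP000002"]])

    def search(query):
        return next(responses)

    return search


search = make_fake_search()
seen = set()
first = new_hits(search, "human breast cancer", seen)   # first scheduled run
second = new_hits(search, "human breast cancer", seen)  # later run, same conditions
```

In practice the call to `new_hits` would be triggered by a scheduler such as cron, and `seen` would be stored between runs.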
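Sequence Read Archive best practice

Steps for using public sequence data
Search → Browse → Download → Check → Analyze
○ Search: search by the IDs and keywords given in papers
○ Browse: look through the results one by one to find the data you want
○ Download: download the data via FTP, Aspera, and so on
○ Check: verify the downloaded data
○ Analyze: use it in your analysis

Make each step more efficient and cut its cost
Search → Browse → Download → Check → Analyze

Check beforehand to lower the cost of downloading
Search → Browse and check → Download → Analyze
○ Search: by statistics, papers, diseases, and keywords
○ Browse and check: find data in the search results and confirm its quality as well
○ Download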
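Examples of secondary use of public NGS data
○ Use data produced under the same conditions as your own to increase N
○ Use data related to your own for comparative analysis
  ○ Different treatments, closely related species, Genome/Transcriptome/Epigenome, and so on
○ Use it to evaluate the performance of analysis tools
  ○ Comparing multiple tools, or as demo data when developing a new tool
○ Collect data in bulk for meta-analysis
  ○ For example, analyses that cut across organisms and cell types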
SRA has become easier to use
○ Now research can make more and more use of public data
http://www.flickr.com/photos/mindaugasdanys/3766009204/ Still hard to use
Complaints
○ Search still falls short
○ The tools are scattered and hard to use together
http://www.flickr.com/photos/66986780@N00/137720685/ Being improved
Please bear with us
○ Development is ongoing, and feedback is welcome
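Public databases and NGS: challenges and what comes next

Why make data public?
○ To ensure reproducibility
  ○ Provide the means for reanalysis and so demonstrate validity
○ To promote secondary use
  ○ Share resources and have other researchers use them

Reality
○ People release data because they are told to
  ○ Journals ask for a public database ID at submission
  ○ Some grants require all data to be made public under their terms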
http://www.flickr.com/photos/74521133@N00/232362142/ Have to release it
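No benefit to the submitter
○ Releasing data costs effort
  ○ Tidying it up so that others can use it easily is hard work
  ○ NGS data is large, so even uploading it is a chore
○ Respect for the people who release data
  ○ A third party analyzes data released before publication and writes the paper?
  ○ At present, releasing data alone does not count as an achievement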
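What should the archive be?
○ Lower the barrier to releasing data
  ○ Standardize how data is released
    ○ minimum information and the like
  ○ A mechanism for submitting data more easily
○ Promote the use of data
  ○ Organize public data so that it is easier to use
  ○ Methods for making effective use of public data
○ Give something back to the people who release data
  ○ Assign DOIs to public data so that it can be cited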
What do we do?
○ What should we do?
http://www.flickr.com/photos/mindaugasdanys/3766009204/ We will keep working on it
Thank you for your attention.