Sequence Read Archive: Database for High-throughput sequencing best practice 2012
"Next-generation sequencing analysis and public databases: making the most of the Sequence Read Archive"
Tazro Inutano Ohta
November 20, 2012
Transcript
Next-generation sequencing analysis and public databases: making the most of the Sequence Read Archive
Nov, at CERI
Tazro Ohta
Today's agenda
- The public database SRA (Sequence Read Archive)
- SRA-related services provided by DBCLS
- Sequence Read Archive best practices
- Public databases and NGS: challenges and the road ahead
Quick poll
- Next-generation sequencers?
- Sequence Read Archive?
- DBCLS SRA?
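The public database SRA (Sequence Read Archive)
- NCBI began collecting NGS data
- It was subsequently operated by the INSDC as the Sequence Read Archive
- INSDC: International Nucleotide Sequence Database Collaboration
- NCBI (United States), EBI (Europe), DDBJ (Japan)
- Each member provides its own submission, search, and download services
- Registered data is exchanged among the members, so it can be accessed from any of them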
The same data is accessible from anywhere
[Figure: an example record (Data ID: 000001; organism: mouse; cell: nervous cell; sequencer: 454; date: 2011 12 08; sequence >Seq_Numero_1, an ATGC repeat) is registered as sequence data plus metadata and shared through INSDC data exchange]
www.ncbi.nlm.nih.gov/sra
www.ebi.ac.uk/ena
trace.ddbj.nig.ac.jp/dra
Try it out
- Search for human breast cancer data
http://www.everystockphoto.com/photo.php?imageId=3972069
http://www.flickr.com/photos/mindaugasdanys/3766009204/ Hard to use
Complaints
- The database structure is too complex
- Data cannot be found
Mission
- Make public data easier to find and easier to use
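SRA-related services and tools provided by DBCLS
To make SRA data easier to use
- Search data based on SRA statistics
- Search data based on published papers
- Search data by disease keyword
- View the sequence quality of registered data
- Efficiently browse the metadata related to search results
- Use the APIs of the SRA-related services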
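Search data based on SRA statistics
- Aggregates the metadata (annotation data) registered together with sequence data
- Ranks sequencer types, sample organisms, and experiment types
- Graphs the growth in data submissions
Benefits
- Efficient narrow-down search
- See at a glance what kinds of data exist and how much of each
- Visualizes trends in the field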
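The ranking and narrow-down search described on this slide can be sketched in a few lines. This is a minimal illustration, not the implementation behind sra.dbcls.jp: the records, the field names (`platform`, `organism`, `study_type`), and the helper functions `rank` and `narrow` are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical metadata records; field names are assumptions for illustration.
records = [
    {"platform": "Illumina", "organism": "Homo sapiens", "study_type": "Transcriptome"},
    {"platform": "Illumina", "organism": "Mus musculus", "study_type": "Genome"},
    {"platform": "454", "organism": "Homo sapiens", "study_type": "Genome"},
    {"platform": "Illumina", "organism": "Homo sapiens", "study_type": "Genome"},
]


def rank(records, field):
    """Return (value, count) pairs for one metadata field, most frequent first."""
    return Counter(r[field] for r in records).most_common()


def narrow(records, **conditions):
    """Keep only the records whose fields match every given condition."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]


platform_ranking = rank(records, "platform")
human_illumina = narrow(records, platform="Illumina", organism="Homo sapiens")
print(platform_ranking)
print(len(human_illumina), "records match")
```

Counting first and filtering second mirrors the slide: the ranking shows what exists, and each narrowing step cuts the candidate list down.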
sra.dbcls.jp
“Search by statistics”
Growth graph
Color-coded by experiment type
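Search data based on published papers
- Many SRA entries carry no link to a paper
- This is because data is often registered before the paper comes out
- Mentions of public data are extracted from the literature and linked to the entries by ID
Benefits
- Sort and search by journal, publication date, paper title, and so on
- Filter by experiment type, organism, sequencer, and so on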
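Extracting data mentions from paper text and linking them by ID can be sketched with a regular expression. This is a rough illustration under stated assumptions, not the method used by the DBCLS service: the pattern covers the common INSDC read-archive accession shape (S, E, or D, then R, a type letter, and six or more digits), and the paper IDs, texts, and accession numbers are made-up placeholders.

```python
import re

# Accession shape: S/E/D (NCBI/EBI/DDBJ), "R", a type letter, then 6+ digits.
ACCESSION = re.compile(r"\b[SED]R[APXRSZ]\d{6,}\b")


def extract_accessions(text):
    """Return the unique accessions in a text, in order of first appearance."""
    seen = []
    for match in ACCESSION.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def link_papers(papers):
    """Map each accession to the IDs of the papers that mention it."""
    links = {}
    for paper_id, text in papers.items():
        for accession in extract_accessions(text):
            links.setdefault(accession, []).append(paper_id)
    return links


# Placeholder paper texts; the accession numbers are not real references.
papers = {
    "paper-1": "Reads were deposited in the DDBJ Sequence Read Archive under DRA000001.",
    "paper-2": "We reanalyzed SRP000002 and DRA000001; run SRR000003 was excluded.",
}
links = link_papers(papers)
print(links)
```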
sra.dbcls.jp
“Search by publication”
Filtered search, sortable by each field
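Search data by disease keyword
- NGS is also widely used in medical research
- SRA entries are not grouped by topic, so such searches are difficult
- Data is organized by disease using the MeSH terms attached to the literature
- Search by the diseases with the most registered data, or by disease category
Benefits
- High precision, because the search is based on MeSH keywords
- Lets you see research trends in each area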
sra.dbcls.jp
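“Browse by disease”
“By frequency”
Number of registered datasets, with links to Gendoo, a disease-related gene database
Check the boxes, then Search
Related MeSH terms, shown according to relevance
“By disease category”
Search data from the tree view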
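View the sequence quality of registered data
- SRA data is not always good
- Failed sequencing runs are registered too, so flawed data is included
- Among datasets produced under the same conditions, you want the more accurate one
- Quality is computed for all SRA data with FastQC
- http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Benefits
- Avoid the tragedy of data that took all night to download turning out to be broken
- Comparing sequence quality is fun in itself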
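To give a feel for what a quality check looks at, here is a minimal sketch that averages per-base Phred scores in a FASTQ string. It is not FastQC and does not reproduce its reports. It assumes Phred+33 encoding and strict four-line records, and the two reads are invented for the example.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(fastq_text):
    """Mean Phred score over every base in a FASTQ string of four-line records."""
    lines = fastq_text.strip().splitlines()
    scores = []
    for i in range(3, len(lines), 4):  # the quality string is every fourth line
        scores.extend(phred_scores(lines[i]))
    return sum(scores) / len(scores)


# Two invented reads: 'I' encodes Phred 40 and '5' encodes Phred 20.
fastq = """@read1
ACGT
+
IIII
@read2
ACGT
+
5555
"""
average = mean_quality(fastq)
print(average)
```

A run whose average falls well below its peers under the same conditions is a candidate to skip before spending a night on the download.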
g86.dbcls.jp/sra
Enter an SRA ID
QC results from FastQC
Also accessible through an API
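Efficiently browse the metadata related to search results
- How do you pick out the data you want from a large number of search results?
- Compare conditions such as sample and sequencer
- Give priority to data that already has a paper
- To compare conditions, you want the related information gathered in one place
Benefits
- No more confusion among sample, sequence, and project IDs
- Seeing the paper information alongside lets you make an accurate judgment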
g86.dbcls.jp/kusarinoko
Choose from All, human, mouse, Arabidopsis, then enter a keyword and search
Paper information, list of matching data
Study (project) Experiment Run (Sequence) / QC
Sample
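Gathering the Study, Experiment, Run, and Sample IDs into one view can be sketched as a simple nesting step. This is only an illustration of the idea, not how the kusarinoko service is built: the row layout, field names, and accession numbers are placeholders.

```python
from collections import defaultdict

# Hypothetical run-level rows; IDs and field names are placeholders.
rows = [
    {"study": "SRP000001", "experiment": "SRX000001", "run": "SRR000001", "sample": "SRS000001"},
    {"study": "SRP000001", "experiment": "SRX000001", "run": "SRR000002", "sample": "SRS000001"},
    {"study": "SRP000001", "experiment": "SRX000002", "run": "SRR000003", "sample": "SRS000002"},
]


def group_by_study(rows):
    """Nest runs under experiments under studies, keeping each experiment's sample."""
    tree = defaultdict(dict)
    for row in rows:
        experiment = tree[row["study"]].setdefault(
            row["experiment"], {"sample": row["sample"], "runs": []}
        )
        experiment["runs"].append(row["run"])
    return dict(tree)


tree = group_by_study(rows)
for study, experiments in tree.items():
    print(study)
    for experiment_id, info in experiments.items():
        print("  ", experiment_id, info["sample"], info["runs"])
```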
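Use the APIs of the SRA-related services
- You want to access the services repeatedly from a program
- You want to rerun the same search at regular intervals
- You want to look up information on a large number of datasets
Benefits
- Everyday effort drops substantially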
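Rerunning the same search at regular intervals mostly comes down to comparing the new result with the previous one. The sketch below shows only that comparison step. The slides do not describe the API itself, so the search call is injected as a plain function, and `fake_search`, the query string, and the accession numbers are hypothetical stand-ins.

```python
def new_hits(previous_ids, current_ids):
    """Return IDs present in the current result but not in the previous one."""
    known = set(previous_ids)
    return [i for i in current_ids if i not in known]


def run_search(search, query, previous_ids):
    """Run one search through the injected `search` callable and report what is new."""
    current = search(query)
    return current, new_hits(previous_ids, current)


def fake_search(query):
    """Stand-in for a real API client; returns canned results for the demo."""
    return ["SRP000001", "SRP000002", "SRP000003"]


last_week = ["SRP000001", "SRP000002"]
current, added = run_search(fake_search, "human breast cancer", last_week)
print("new since last run:", added)
```

Passing the search function in as an argument keeps the comparison logic testable without network access, and a real client can be swapped in later.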
g86.dbcls.jp/sra
Sequence Quality
SRA ID conversion, metadata retrieval
Only partly public for now
- Under active development to greatly reduce everyday effort
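Sequence Read Archive best practices
Steps for using public sequence data
- Search: search by IDs listed in papers or by keyword
- Browse: look through the search results one by one to find the data you want
- Download: download the data via FTP, Aspera, and so on
- Check: verify the downloaded data
- Analyze: use the data in your analysis
Make each step more efficient to cut costs
- Search, Browse, Download, Check, Analyze
Check beforehand to lower the cost of downloading
- Search: search by statistics, papers, diseases, or keywords
- Browse and check: find data in the search results and confirm its quality
- Download
- Analyze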
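The reordered workflow, where the quality check happens before the download, can be sketched as a filter over candidate runs. This is a toy illustration: the candidate list, the `mean_quality` and `size_gb` fields, and the threshold of 30 are assumptions, not values from the talk.

```python
def select_for_download(candidates, min_quality=30.0):
    """Keep candidates whose precomputed quality meets the threshold, best first."""
    good = [c for c in candidates if c["mean_quality"] >= min_quality]
    return sorted(good, key=lambda c: c["mean_quality"], reverse=True)


def total_size_gb(entries):
    """Sum the download size of the given entries in gigabytes."""
    return sum(e["size_gb"] for e in entries)


# Hypothetical search results with precomputed quality and file size.
candidates = [
    {"run": "SRR000001", "mean_quality": 35.2, "size_gb": 12.0},
    {"run": "SRR000002", "mean_quality": 18.7, "size_gb": 20.0},
    {"run": "SRR000003", "mean_quality": 31.0, "size_gb": 8.0},
]
selected = select_for_download(candidates)
saved = total_size_gb(candidates) - total_size_gb(selected)
print([c["run"] for c in selected], "skipping", saved, "GB")
```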
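Examples of secondary use of public NGS data
- Increase your N with data produced under the same conditions as your own
- Run comparative analyses with data related to your own
- Different treatments, closely related species, genome / transcriptome / epigenome, and so on
- Evaluate the performance of analysis tools
- Comparing multiple tools, or as demo data when developing a new tool
- Collect data in bulk and run a meta-analysis
- Cross-cutting analyses across organisms and cell types, and so on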
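SRA has become easier to use
- Now you can do more and more research with public data
http://www.flickr.com/photos/mindaugasdanys/3766009204/ Still hard to use
Complaints
- Search is not good enough
- The tools are scattered and hard to use together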
http://www.flickr.com/photos/66986780@N00/137720685/ Being improved
Please bear with us a little longer
- Development is ongoing, and feedback is welcome
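Public databases and NGS: challenges and the road ahead
Why make data public?
- To ensure reproducibility
- Provide a means of re-analysis and demonstrate that the results are valid
- To promote secondary use
- Share resources so that other researchers can use them
Reality
- People release data because they are told to
- Journals ask for public database IDs at submission
- Some grant terms require all data to be made public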
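http://www.flickr.com/photos/74521133@N00/232362142/ We have to release it
No payoff
- Releasing data has a cost
- Tidying data so that others can use it easily is hard work
- NGS data is large, so even uploading it is a chore
- Respect for the people who release data
- What if a third party analyzes the data and publishes before the original paper is out?
- At present, releasing data alone does not count as an academic achievement
What should an archive be?
- Lower the barrier to releasing data
- Standardize the ways of releasing data
- Minimum information standards and so on
- Mechanisms for registering data more easily
- Promote data use
- Organize public data so that it is easier to use
- Methods for making effective use of public data
- Give something back to the people who release data
- Assign DOIs to public datasets so that they can be cited
What do we do?
- What should we do?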
http://www.flickr.com/photos/mindaugasdanys/3766009204/ We will do our best
Thank you for your attention