Sequence Read Archive: Database for High-throughput sequencing best practice 2013

Integrated Database Workshop AJACS Toyama: "Using the Next-Generation Sequencing Database Sequence Read Archive"

Tazro Inutano Ohta

August 30, 2013

Transcript

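  1. About SRA

    Operating structure: jointly operated by the INSDC, which consists of NCBI SRA, EBI ENA, and DDBJ DRA. Data can be submitted and accessed through any of the three.

    Published data: sequence data, together with metadata describing the details of the experiments, samples, and so on. Sequences can be downloaded compressed as gz or bz2, or in a proprietary format.

    Policy: accepts and publishes primary sequence data obtained from massively parallel sequencers. Personally identifiable data is submitted to separate databases (dbGaP, EGA).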
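Since downloads arrive as gz or bz2, a small helper can open either one transparently. This is a minimal sketch using only the Python standard library; the function names and the choice to dispatch on the file extension are assumptions made for illustration, and the proprietary format is not handled here.

```python
import bz2
import gzip
from pathlib import Path


def open_compressed_text(path):
    """Open a gz- or bz2-compressed file as text, chosen by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".gz":
        return gzip.open(path, "rt")
    if suffix == ".bz2":
        return bz2.open(path, "rt")
    # Anything else (including the proprietary format) is out of scope here.
    raise ValueError(f"unsupported compression: {suffix!r}")


def count_lines(path):
    """Count lines by streaming, without loading the whole file into memory."""
    with open_compressed_text(path) as handle:
        return sum(1 for _ in handle)
```

Streaming line by line matters here because sequence files are often far larger than available memory.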
  2. Metadata Object Dependencies

    [Diagram: Submission; Analysis; Study; Sample (x2); Experiment (x2); Run (x4)]

    The metadata registered together with the sequence data is made up of six types of objects, and the information recorded depends on the object type.
  3. Metadata Object Dependencies

    [Diagram: Submission; Analysis; Study; Sample (x2); Experiment (x2); Run (x4)]

    Setting aside Submission, which bundles metadata into one submission unit, the basic metadata objects relate to each other as shown here.
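The arrows of the diagram do not survive in this transcript, so the sketch below assumes the usual SRA arrangement: each Run belongs to one Experiment, and each Experiment refers to one Study and one Sample. The class, the function, and all identifiers in it are hypothetical placeholders, not real records.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    accession: str
    study: str   # Study the experiment belongs to (assumed one per experiment)
    sample: str  # Sample that was sequenced (assumed one per experiment)
    runs: list = field(default_factory=list)  # Runs produced by the experiment


def runs_by_sample(experiments):
    """Group run identifiers by the sample they were sequenced from."""
    grouped = {}
    for exp in experiments:
        grouped.setdefault(exp.sample, []).extend(exp.runs)
    return grouped


# Placeholder records shaped like the diagram: 1 study, 2 samples,
# 2 experiments, 4 runs. How runs split across experiments is an assumption.
experiments = [
    Experiment("EXP_A", study="STUDY_1", sample="SAMPLE_1", runs=["RUN_1", "RUN_2"]),
    Experiment("EXP_B", study="STUDY_1", sample="SAMPLE_2", runs=["RUN_3", "RUN_4"]),
]
```

Walking from Sample to Run always passes through Experiment, which is why a search for "all reads from this sample" needs the Experiment records as well.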
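  4. Metadata Object Dependencies

    [Diagram, same layout with example IDs: DRA000001 (Submission); DRZ000001 (Analysis); DRP000001 (Study); DRS000001, DRS000001 (Sample); DRX000001, DRX000002 (Experiment); DRR000004, DRR000003, DRR000002, DRR000001 (Run)]

    Each object has its own ID. An ID is written as three letters, which indicate the database that accepted the data and the object type, followed by six digits.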
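The ID scheme can be sketched as a small parser. The third-letter mapping follows the example IDs on the slide (A, Z, P, S, X, R). The first-letter mapping (D for DDBJ, E for EBI, S for NCBI) is the convention commonly used by the three archives and is an assumption here, not something the slide states. Accepting more than six digits is also an assumption, made so that longer IDs do not fail.

```python
import re

# First letter: archive that accepted the data (assumed convention).
ARCHIVES = {"D": "DDBJ", "E": "EBI", "S": "NCBI"}

# Third letter: object type, matching the example IDs on the slide.
OBJECT_TYPES = {
    "A": "Submission",
    "Z": "Analysis",
    "P": "Study",
    "S": "Sample",
    "X": "Experiment",
    "R": "Run",
}

# Three letters (archive, "R", object type) followed by six or more digits.
ACCESSION = re.compile(r"^([DES])R([AZPSXR])(\d{6,})$")


def parse_accession(accession):
    """Split an ID such as 'DRR000001' into (archive, object type, number)."""
    match = ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"not an SRA-style ID: {accession!r}")
    archive, kind, number = match.groups()
    return ARCHIVES[archive], OBJECT_TYPES[kind], int(number)
```

For example, `parse_accession("DRX000002")` yields `("DDBJ", "Experiment", 2)`.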
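  5. Metadata Tips

    The relationships among objects and IDs are complex: in a large project, Runs and Samples can number in the hundreds. For various reasons, some entries also deviate from the rules as exceptions.

    Understand which information is recorded where: this matters not only when submitting data but also when searching. For details, see http://trace.ddbj.ac.jp/dra/metadata.html

    Metadata descriptions vary from submitter to submitter, especially in fields such as library preparation. In some cases the metadata is not updated even after related information, such as a publication, has changed.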
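  6. DBCLS SRA Metadata Search

    Provide more user-oriented search: reduce as far as possible what users have to memorize, such as how data is managed by metadata IDs.

    Reflect more information in search: a flexible search function has been developed that draws not only on the metadata but also on information from other integrated databases and on original information.

    Support automation: repeating searches by hand is inefficient. Automation also makes it possible to build the search into analysis pipelines.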
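  7. DBCLS SRA Metadata Search

    Visualize the search: users can judge why a search returns many or few results by looking at the proportion they make up of the whole.

    Keyword search that includes publication information: the search scope is widened so that as much related data as possible is hit.

    Check read information before downloading: downloads can take a long time, so the service provides the information needed to select only data that will certainly be usable.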
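  8. Use cases of Public data

    Sequencing that is about to start: design and simulate the sequencing based on information about similar projects and their data.

    Sequencing in progress: evaluate sequencing quality against data from the same species and the same sequencer.

    Sequencing that is complete: add data from closely related species or similar projects to improve the accuracy of the analysis.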
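  9. Practical search tips

    Understand the read spec each experiment type demands: the required read length and number of reads differ by experiment type, such as genome resequencing, RNA-Seq, or ChIP-Seq.

    Understand the read spec of the sequencer: the reads obtained from each sequencer also change with reagent updates and the like, so some care is needed.

    Choose the sequencer according to the species and experiment type: read information matched to the genome size and experiment type is also important when searching public databases for similar projects.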
  10. Required read spec by application

    Application                  Total bases   Read length   Reads (M)
    Human genome resequencing    90-150 Gb     2x100         900-1500
    Target resequencing          <1 Gb         2x100         10
    Exome sequence               5-7 Gb        2x100         70
    RNA-Seq                      5 Gb          2x100         50
    TSS-Seq                      1 Gb          1x50          20
    small RNA                    0.35 Gb       1x35          >10
    Microbial genome             >150 Mb       2x100         >1.5
    Eukaryotic genome            >4 Gb         2x100         >40
    Bisulfite-Seq                90-150 Gb     2x100         900-1500
    ChIP-Seq                     >6 Gb         1x100         60

    Quoted from the 細胞工学別冊 (Cell Technology supplement) 次世代シーケンサー 目的別アドバンストメソッド (Next-Generation Sequencers: Advanced Methods by Purpose).

    Note: the figures can change with the genome size of the target and similar factors. The information may also already be out of date.
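The three numeric columns appear to be tied together by simple arithmetic: reads (in millions) equals total bases divided by the length of one read. The sketch below checks that against a few rows. The function name is hypothetical, and the reading that paired-end rows (2x100) count each mate as a separate 100-base read is an inference from the table's numbers, not something the source states.

```python
def reads_in_millions(total_bases, read_length):
    """Number of individual reads, in millions, needed to reach total_bases."""
    return total_bases / read_length / 1e6


# Rows taken from the table above.
rna_seq = reads_in_millions(5e9, 100)      # RNA-Seq: 5 Gb at 100 bases per read
tss_seq = reads_in_millions(1e9, 50)       # TSS-Seq: 1 Gb at 50 bases per read
small_rna = reads_in_millions(0.35e9, 35)  # small RNA: 0.35 Gb at 35 bases per read

print(rna_seq, tss_seq, small_rna)  # 50.0 20.0 10.0
```

The same relation run in reverse gives a quick way to judge, from the read length and read count listed for a public dataset, whether it carries enough bases for a given application.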
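  11. Required read spec by application

    [Contents of the book cited on the previous slide: preparation and introduction; human genome analysis; gene expression regulation analysis; de novo genome sequencing; epigenetics analysis; metagenome analysis; genome structure analysis; data analysis tools and storage; integrated analysis]

    This is a plug, but nobody asked me to make it, and I earn nothing if the book sells.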