The Current State of Cloud-Based Genome Information Analysis

Information Processing Society of Japan (IPSJ) Seminar Series 2016, Session 2: Cloud http://www.ipsj.or.jp/event/seminar/2016/program02.html


Tazro Inutano Ohta

July 22, 2016

Transcript

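  1. The Current State of Cloud-Based Genome Information Analysis. 22 July 2016 | Information Processing Society of Japan (IPSJ) Seminar Series 2016, Session 2: Cloud

    Tazro Ohta, Project Researcher, Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems (ROIS). t.ohta@dbcls.rois.ac.jp Licensed under CC-BY 4.0 ©2016 Tazro Ohta (DBCLS)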
  2. Agenda

    1. What is happening in the life sciences today?
    2. What kind of computers are needed today?
    3. Using the cloud to solve the problems
  3. 1. What is happening in the life sciences today?

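  4. What is happening in the life sciences today?

    - Advances in laboratory instruments are increasing both the size and the volume of data
      - In genomics, "next-generation DNA sequencers" have arrived
    - The accumulation of data is making computational biology flourish
      - Protein 3D structure data, image data
    - Making data processing and analysis more efficient is still an urgent task
      - There is no time to wait for algorithms to improve
      - In some cases the problem is solved with hardware performance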
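  5. Example of protein 3D structure analysis. MEGADOCK: a project by Assistant Professor Ohue and colleagues at the Akiyama Laboratory, Tokyo Institute of Technology http://www.nii.ac.jp/csi/openforum2016/track/pdf/20160526AM_TOUKOUDAI_akiyama2.pdf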

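  6. What is happening in genome science?

    - A rough analogy for the progress of laboratory instruments:
      - Sea = genome, fish = gene
      - "Characterize the sea by finding out what kinds of fish live in it"
    - Technological progress has improved the performance of the tools
      - The fishing rod has become a bottom trawl net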
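  7. Photo (left): Tsuriba Camera @kazzwatabe https://tsuriba.camera/posts/XQeP3qmIp6A Photo (right): photo by atramos https://www.flickr.com/photos/atramos/5508960637 As laboratory instruments advance, interpreting the results becomes costly

    The output of earlier DNA sequencers could be checked by eye. The output of today's DNA sequencers cannot be handled at all without a computer.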
  8. https://flxlexblog.wordpress.com/2014/06/11/developments-in-next-generation-sequencing-june-2014-edition/ (← fishing rod, ← bottom trawl net) Performance comparison of DNA sequencer models

  9. None
  10. http://www.ncbi.nlm.nih.gov/Traces/sra/ Growth in the data size of a public data repository

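  11. Data obtained from DNA sequencers

    - "Decoding a genome" is easy to say, but it involves several steps:
      - Extract DNA from a biological sample
      - Fragment the extracted DNA into short molecules
      - Analyze the fragments with a DNA sequencer
      - The output is a list of short, fragmented nucleotide sequences
    - The original DNA is reconstructed from the information in a large number of DNA fragments
      - De novo assembly
      - Reference alignment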
  12. https://speakerdeck.com/michaelbarton/ranking-genome-assemblers-with-docker-containers-dockercon-eu-2014 The data output by a DNA sequencer is fragmented

  13. https://speakerdeck.com/michaelbarton/ranking-genome-assemblers-with-docker-containers-dockercon-eu-2014 If we compare a DNA sequencer to a shredder...

  14. https://speakerdeck.com/michaelbarton/ranking-genome-assemblers-with-docker-containers-dockercon-eu-2014 Genome assembly = restoring the book

  15. https://speakerdeck.com/michaelbarton/ranking-genome-assemblers-with-docker-containers-dockercon-eu-2014 Genome assembly = restoring the book
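The "restoring a shredded book" picture of de novo assembly can be sketched in a few lines. The code below is a toy greedy overlap merge written for illustration only; the function names, the `min_len` threshold, and the example reads are assumptions, and it is not the method of any particular assembler mentioned in the talk.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads "shredded" from the sequence AGCTTAGGCATCCGA.
reads = ["CATCCGA", "AGCTTAG", "TAGGCAT"]
contigs = greedy_assemble(reads)
print(contigs)
```

Real data contains repeats and sequencing errors, which is where a greedy merge like this breaks down and where the choice of assembly tool starts to matter.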

  16. http://www.historyofnimr.org.uk/mill-hill-essays/essays-yearly-volumes/2010-2/bringing-it-all-back-home-next-generation-sequencing-technology-and-you/ Reference alignment = reconstructing by lining the fragments up along a model (the reference)
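Reference alignment can be illustrated the same way: each read is slid along a known reference and placed where it fits best. This is a naive sketch under stated assumptions (exhaustive scan, Hamming distance, a hypothetical `max_mismatches` parameter, made-up sequences); production aligners use indexed data structures instead of scanning every position.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def align_read(reference: str, read: str, max_mismatches: int = 1) -> Optional[int]:
    """Return the leftmost best-matching position, or None if no position
    has at most `max_mismatches` mismatches."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(reference[pos:pos + len(read)], read)
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos


# Hypothetical reference and reads.
reference = "AGCTTAGGCATCCGA"
for read in ["TAGGCAT", "TAGGAAT", "GGGGGGG"]:
    print(read, align_read(reference, read))
```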

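  17. Data analysis software (analysis tools)

    - Many analysis tools are published as open source
      - The best tool depends on the nature of the data being analyzed
    - Data analysts (biologists) carry out the data analysis
      - Tool developers (implementers) and tool users are not the same people
      - Users do not necessarily understand a tool's behavior completely
    - Analysts are not analyzing data all the time
      - Many researchers do analysis on the side, between wet-lab experiments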
  18. Analysis using implementations published as open source https://omictools.com/de-novo-genome-sequencing-category

  19. Analysis using implementations published as open source https://omictools.com/whole-genome-resequencing-category

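  20. What is happening in the life sciences today?

    Summary
    - The volume and number of datasets are growing rapidly and will keep growing
    - Different tools and algorithms are used depending on the purpose
    - Data analysts and tool developers (implementers) are often different people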
  21. 2. What kind of computers are needed today?

  22. What kind of computers are being used today?

    - PCs
    - PC clusters
    - Supercomputers at research hubs
      - National Institute of Genetics supercomputer system
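  23. From the "Next-Generation Sequencer DRY Analysis Textbook" (a supplement to the journal Cell Technology): the textbook tells readers to buy a Mac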

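  24. What kind of computers are needed?

    - As the target data gets larger and more numerous, an ordinary PC struggles
      - Analysis data piles up quickly
        - Very large storage with fast reads and writes
      - Tools crash with "Out of memory"
        - Large-memory machines
      - Batch processing is run over a large number of samples
        - A job scheduling system for distributed execution
    - Demand for large shared computers is rising
      - The National Institute of Genetics supercomputer was introduced (2012~) => still not enough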
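The "same batch job over many samples" pattern can be sketched on a single machine. This is a stand-in only: a thread pool plays the role of the distributed job scheduler, and the per-sample task (GC content of reads), the sample names, and the worker count are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def gc_content(reads: List[str]) -> float:
    """Fraction of G and C bases across all reads of one sample."""
    total = sum(len(r) for r in reads)
    if total == 0:
        return 0.0
    gc = sum(r.count("G") + r.count("C") for r in reads)
    return gc / total


def run_batch(samples: Dict[str, List[str]], max_workers: int = 4) -> Dict[str, float]:
    """Submit one job per sample and collect the results by sample name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(gc_content, reads)
                   for name, reads in samples.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical samples; a real run would read sequence files from storage.
samples = {"sample1": ["GGCC", "AATT"], "sample2": ["GGGA"]}
print(run_batch(samples))
```

On a shared supercomputer the submit step would go through the site's scheduler rather than a local pool, which is why a congested queue becomes a bottleneck for this kind of workload.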
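  25. National Institute of Genetics, Research Organization of Information and Systems (Inter-University Research Institute Corporation): SuperComputer Facilities of National Institute of Genetics

    photo from http://sc.ddbj.nig.ac.jp/index.php/ja-gallery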
  26. None
  27. The number of users keeps growing. From presentation materials by Ogasawara of the DDBJ Center, National Institute of Genetics

  28. Disk space is running short https://sc.ddbj.nig.ac.jp/index.php/ja-nig-statistics

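  29. What is the bottleneck in practice? From interviews at supercomputer user meetings and elsewhere

    - Concerns of users who are not used to computers
      - They cannot tell what each machine can and cannot do
      - They need large-scale computers but cannot use a command-line interface (CUI)
    - Concerns of users who are proficient with computers
      - The machines are congested and jobs cannot be submitted
      - They cannot put enough budget into data analysis and storage
      - Setting up the environment is costly
      - They do not want to look after the machines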
  30. It is widely believed that "wet-lab experiments cost money, but data analysis does not cost much" http://trattoriainutano.tumblr.com/post/132214903857/

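  31. What kind of computers are needed?

    Summary
    - Requirements differ depending on the target data and the purpose
    - In genomics, storage and memory are the serious pain points
    - Users' computing literacy varies widely
    - The layer a user wants depends on that user's skill level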
  32. 3. Using the cloud to solve the problems

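  33. Using the cloud to solve the problems

    - Problems the cloud can solve
      - Up-front installation cost
      - Congested nodes
      - Maintenance cost
    - Challenges in using the cloud
      - Storage cost
      - Paying from research funds
      - Handling unpublished data and data that contains personal information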
  34. Cloud use case (SaaS): Google Genomics https://cloud.google.com/genomics/v1/analyze-variants

  35. Cloud use case (IaaS): 1000 Genomes data on AWS https://aws.amazon.com/jp/1000genomes/

  36. The NIH Commons: in the United States, the funding side is promoting cloud use

    “The Commons is a shared virtual space where scientists can work with the digital objects of biomedical research, i.e. it is a system that will allow investigators to find, manage, share, use and reuse data, software, metadata and workflows.” - https://datascience.nih.gov/commons
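  37. Cloud use case (PaaS/SaaS)

    - Genome analysis pipelines on the academic intercloud
    - JST CREST: research on application-centric overlay cloud technology that makes use of the intercloud (principal investigator: Prof. Aida, NII)
    - The academic intercloud initiative
      - Link the National Institute of Genetics supercomputer with the NII cloud and other academic clouds in Japan
      - Make applications portable by Dockerizing each tool used in analysis
      - Build workflows from pre-combined tools and provide a GUI
      - Launch machines with the most suitable resources allocated for each analysis dataset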
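Two of the ideas above, Dockerized tools and per-dataset resource allocation, can be sketched together. This is a minimal illustration and not the project's actual implementation: the image name, the paths, and the size thresholds in `pick_resources` are hypothetical, while `--rm`, `--cpus`, `--memory`, and `-v` are standard `docker run` options.

```python
from dataclasses import dataclass
from typing import List, Tuple

GIB = 1024 ** 3


def pick_resources(input_bytes: int) -> Tuple[int, int]:
    """Return (cpus, memory_gb) for an input size.
    The thresholds are made up for illustration."""
    if input_bytes < 1 * GIB:
        return 2, 4
    if input_bytes < 50 * GIB:
        return 8, 32
    return 16, 128


@dataclass
class ToolJob:
    image: str          # container image holding the analysis tool
    command: List[str]  # command to run inside the container
    data_dir: str       # host directory mounted at /data
    cpus: int
    memory_gb: int


def docker_run_args(job: ToolJob) -> List[str]:
    """Build the argument list for `docker run` without executing it."""
    return [
        "docker", "run", "--rm",
        "--cpus", str(job.cpus),
        "--memory", f"{job.memory_gb}g",
        "-v", f"{job.data_dir}:/data",
        job.image, *job.command,
    ]


# Hypothetical tool image and input: a 10 GiB dataset.
cpus, memory_gb = pick_resources(10 * GIB)
job = ToolJob("example/aligner:latest", ["align", "/data/sample.fastq"],
              "/work/sample1", cpus, memory_gb)
print(" ".join(docker_run_args(job)))
```

Because the tool and its dependencies travel inside the image, the same argument list can be handed to any machine that runs Docker, which is the portability the slide refers to.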
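  38. Using the cloud to solve the problems

    Summary: challenges in using the cloud
    - Storage cost
      - Computation requires fast I/O
      - Archiving calls for low-cost storage
    - (For commercial clouds) paying from research funds
    - Handling data that contains personal information
      - Establishing safety by accumulating a track record of use
      - Drawing up guidelines and similar rules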
  39. Genomic information keeps accumulating at universities and hospitals. Genomics is being promoted by agencies such as AMED

  40. Secure cloud computing for genomic data

    Datta, Somalee, Keith Bettinger, and Michael Snyder. "Secure cloud computing for genomic data." Nature Biotechnology 34.6 (2016): 588-591.
    The security required to use the cloud for genomic data analysis has to be achieved through cooperation between research institutions and cloud providers.
  41. Secure cloud computing for genomic data

    Datta, Somalee, Keith Bettinger, and Michael Snyder. "Secure cloud computing for genomic data." Nature Biotechnology 34.6 (2016): 588-591.
    Security requirements
    - The data privacy agreement: an agreement with the research institution on how data is handled
    - Physical and logical security
    - Encryption: encrypting data at rest and in transit
    - Authentication: user authentication
    - Principle of Least Privilege
    - Firewalls
    - Logging and monitoring
    - Training: training on security and authentication
    - Security and privacy: protecting personal information
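  42. The relationship between handling personal information and research use. From the Nikkei (Nihon Keizai Shimbun) article "Balancing medical research and personal information" http://www.nikkei.com/article/DGXKZO05121060S6A720C1EA1000/ Research data that contains personal information is extremely important for uncovering the causes of disease and for treatment. With a secure environment, it becomes a powerful asset for advancing research.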

  43. Summary

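  44. Summary

    - What is happening in the life sciences today?
      ◦ The accumulation of large-scale data is driving up demand for computing
    - What kind of computers are needed today?
      ◦ In genomics, storage and memory are the priorities
      ◦ Requirements differ in detail from user to user
    - We want to keep solving problems by using the cloud
      ◦ Make the cloud even more convenient to use
      ◦ Increasing the number of use cases is important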
  45. Reference materials

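  46. About the Database Center for Life Science (DBCLS)

    - Responsible for technology development that contributes to database integration in the life sciences
    - Core technology development
      - Develops technology for federated data integration using Semantic Web technologies and natural language processing, and works on establishing international standards
    - Collaboration with DDBJ
      - Develops technology for making use of data, starting with large-scale genome data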
  47. About the Database Center for Life Science (DBCLS)

    - Runs database projects jointly with NBDC, a center of JST
    - Collaborates with DDBJ, which belongs to the same organization (ROIS, as does NII) http://dbcls.rois.ac.jp/about