Introduction of next-generation sequencing applications, related public databases and resources

Integrated Database Workshop: AJACS Higo
Research use cases for next-generation sequencers, and the public tools and databases that support them

Tazro Inutano Ohta

January 23, 2014

Transcript

  1. Research use cases for next-generation sequencers, and the public tools and databases that support them. Jan. @ Kaketsuken (化血研). Database Center for Life Science, Tazro Ohta (大田達郎). Introduction of next-generation sequencing applications, related public databases and resources
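  2. Goals of this workshop • Learn what NGS can and cannot do • Become able to search for NGS research examples and public data • Learn how to carry out NGS data analysis

  3. What you can and cannot do with NGS. High-throughput sequencing: could it be the silver bullet?

  4. What you can and cannot do with NGS • What is NGS in the first place? • Sequencing principles of NGS • About the machines • Applications to research fields • The general flow of research using NGS • About SRA, the public NGS database

  5. What is NGS in the first place? (next)^n-generation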

  6. What is NGS in the first place? • Next-generation Sequencing • “The high demand for low-cost sequencing has driven the development of high-throughput sequencing (or next-generation sequencing) technologies that parallelize the sequencing process, producing thousands or millions of sequences concurrently” from http://en.wikipedia.org/wiki/Next-generation_sequencing#Next-generation_methods • Also called High-throughput Sequencing, Massively Parallel Sequencing, etc. • The name is a contrast with conventional methods and does not refer to a single sequencing principle • Some people call it "n-th generation"

  7. About NGS sequencing principles. How it works

  8. List of NGS principles / lineup of each sequencer vendor • Roche 454 • Illumina HiSeq/MiSeq • LifeTech SOLiD • LifeTech IonTorrent/IonProton • PacBio RS • Others • Oxford Nanopore MinION/GridION • GnuBio
  9. Roche 454 (www.454.com)

  10. Roche 454 (www.454.com) • PyroSequencing • http://454.com/products/technology.asp • End of support in 2016 has been announced (2013/10) • http://www.genomeweb.com/sequencing/roche-shutting-down-454-sequencing-business • http://www.bio-itworld.com/BioIT_Article.aspx?id=131053

  11. Illumina HiSeq/MiSeq (www.illumina.com)

  12. Illumina HiSeq/MiSeq (www.illumina.com) • Sequence By Synthesis (SBS) • http://res.illumina.com/documents/products/techspotlights/techspotlight_sequencing.pdf • Update: the $1000 genome was achieved with HiSeq X Ten (announced 15 Jan. 2014) • HiSeq X • http://nextgenseek.com/2014/01/how-does-a-single-hiseq-x-compares-with-hiseq-2500/ • NextSeq 500 • http://nextgenseek.com/2014/01/how-does-nextseq-500-compare-with-miseq-and-hiseq/

  13. The price: https://twitter.com/dritoshi/status/426178011080048641

  14. LifeTech | Applied Biosystems SOLiD (http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing.html)

  15. LifeTech | Applied Biosystems SOLiD • Sequence By Ligation • SOLiD: Sequencing by Oligonucleotide Ligation and Detection • http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing/next-generation-systems/solid-sequencing-chemistry.html • Being replaced by IonTorrent/IonProton

  16. LifeTech IonTorrent/IonProton (https://www.lifetechnologies.com/us/en/home/life-science/sequencing/next-generation-sequencing.html)

  17. LifeTech IonTorrent/IonProton • Semiconductor Sequencing Technology • http://www.lifetechnologies.com/us/en/home/life-science/sequencing/next-generation-sequencing/ion-torrent-next-generation-sequencing-technology.html • Improvements in semiconductor chip performance translate directly into improvements in sequencer performance

  18. Pacific Biosciences PacBio RS II (www.pacificbiosciences.com)

  19. Pacific Biosciences PacBio RS II • SMRT Technology • http://www.pacificbiosciences.com/products/smrt-technology/ • The first NGS platform that can read long reads of several thousand bases • Lower accuracy than other sequencers • Can be improved by using error-correction tools • Reads can also be corrected by combining them with a high-accuracy sequencer such as Illumina

  20. Upcoming Sequencers: Oxford Nanopore MinION/GridION (www.nanoporetech.com)

  21. Upcoming Sequencers: Oxford Nanopore MinION/GridION • nanopore sensing technology • https://www.nanoporetech.com/technology/introduction-to-nanopore-sensing/introduction-to-nanopore-sensing • MinION: a small sequencer that connects to a computer over USB • GridION: a massively parallel nanopore sequencer

  22. Upcoming Sequencers: gnubio (gnubio.com)

  23. Upcoming Sequencers: gnubio (gnubio.com) • Beta Systems • Clinical Sequencing • Sample-to-Answer instrument
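  24. Sequencers by read spec (rough sketch). [Figure: read count vs. read length for Illumina HiSeq, Illumina MiSeq, Roche 454, PacBio RS, IonTorrent/IonProton; groups: short-read massively parallel type, benchtop type, long-read parallel type]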
  25. About the sequencing read layout • Single-end or Paired-end (Mate-Pair)? • Choose by weighing mapping/assembly accuracy against price. [Figure: paired-end sequencing, from res.illumina.com]

  26. Applications to research fields. Various sequencing applications

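  27. Applications to research fields: Sequencing Application. Note that StudyType and LibraryStrategy are easily confused • Representative StudyTypes • Whole Genome Sequencing • Exome • Population Genomics • Transcriptome • Epigenetics (Gene regulation study) • Metagenomics • Other

  28. Applications to research fields: Sequencing Application. Differences between applications • Data processing after sequencing differs depending on whether a reference genome exists • Mapping type (Splice Alignment type) • The short base sequences (reads) obtained from the sequencer are placed onto the reference genome based on sequence similarity (mapping) • de novo Assemble type • Short reads are joined to each other based on sequence similarity and assembled into longer sequences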
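  29. Whole Genome Sequencing • de novo genome sequencing • “Sequencing of a single organism” • New genomes • The genome of an organism that has not yet been sequenced is built from scratch with NGS • Assemble type • Resequencing • “Sequencing of a sample with respect to a reference” • Polymorphism analysis or comparative genomics on an organism whose genome has already been sequenced • Mapping type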
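The contrast between the mapping type and the de novo assemble type can be sketched with a toy example. This is only an illustration under strong assumptions: `map_read` uses exact substring search (real mappers tolerate mismatches and use indexes), `merge_overlap` joins two reads on their longest exact suffix/prefix overlap, and the reference and reads are made-up sequences.

```python
from typing import Optional


def map_read(reference: str, read: str) -> int:
    """Mapping type: return the 0-based position of an exact match, or -1."""
    return reference.find(read)


def merge_overlap(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Assemble type: join two reads on their longest exact overlap."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None  # no overlap of at least min_overlap bases


reference = "ACGTTAGCCGATAGGCTA"  # hypothetical reference genome
reads = ["TAGCCGAT", "CGATAGGC"]  # hypothetical short reads

# With a reference: place each read on it.
positions = [map_read(reference, read) for read in reads]

# Without a reference: join the reads into a longer contig.
contig = merge_overlap(reads[0], reads[1])
```

In this toy case the assembled contig happens to be a substring of the reference, which is what resequencing and de novo assembly of the same organism would be expected to agree on.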
  30. Exome Sequencing • “The study investigates the exons of the genome” • Requires less sequencing per individual • Sequencing accuracy matters because single-nucleotide polymorphism information is often what counts • Reagents such as Illumina TruSeq and Agilent SureSelect are representative • Update: Illumina TruSeq has been replaced by Nextera Rapid Capture Exome • http://www.illuminakk.co.jp/products/truseq_exome_enrichment_kit.ilmn • http://www.illuminakk.co.jp/products/nextera-rapid-capture-exome-kits.ilmn • Often used to search for disease-causing genes

  31. Population Genomics • “Study of populations and evolution through genomics” • Population genetics, statistical epidemiology, etc. • The target is basically human; mapping-type analysis • As with exome, sequencing accuracy is important • International projects such as the 1000 Genomes Project are representative • www.1000genomes.org

  32. 1000 Genomes Project (1000genomes.org)

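  33. Transcriptome • “Sequencing and characterization of transcription elements” • RNA-Seq, miRNA-Seq, meta-transcriptome, etc. • Reads are either mapped to a reference or assembled de novo • To quantify expression levels: map a large number of reads • To analyze expression in an organism without a reference genome: assemble • An application often compared with microarrays • Advantages: highly quantitative, wide dynamic range, high resolution • Disadvantages: the machine is operated and the data analyzed differently from microarrays • A boom in low-input methods has arrived • Single-cell RNA-Seq with methods such as Quartz-Seq and Smart-Seq

  34. Low-input RNA-Seq: Quartz-Seq @ Nikaido lab, RIKEN (bit.accc.riken.jp/protocols)

  35. CapR w/ CLIP-seq: revealing not only sequence but also structure from large-scale RNA data (genomebiology.com)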

  36. Epigenetics (Gene regulation study) • “Cellular differentiation study/Study of gene expression regulation” • ChIP-Seq • “Direct sequencing of chromatin immunoprecipitates” • Bisulfite-Seq • “Sequencing following treatment of DNA with bisulfite to convert cytosine residues to uracil depending on methylation status” • DNase-Seq • “Sequencing of hypersensitive sites, or segments of open chromatin that are more readily cleaved by DNaseI.” • FAIRE-Seq • “Formaldehyde-Assisted Isolation of Regulatory Elements” • etc. • All of these are quantified by mapping to a reference genome
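  37. Metagenomics • “Sequencing of a community” • Human commensal bacteria (oral, gut, etc.) • Environmental metagenomes • Air • Ocean • Soil • Food • In most cases de novo assembly is performed • After joining the reads, annotation is attempted with BLAST • MicrobeDB.jp has detailed coverage of past research examples • microbedb.jp

  38. MicrobeDB.JP (microbedb.jp)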

  39. Other • Pooled Clone Sequencing • “The study is sequencing clone pools (BACs, fosmids, other constructs)” • Synthetic genomics • “Sequencing of modified, synthetic, or transplanted genomes”

  40. Barcode sequencing, from Yeast Colloquium (http://yeastcolloquium.wordpress.com)

  41. Barcode sequencing, from Yeast Colloquium (http://yeastcolloquium.wordpress.com)

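  42. Read specs required for each application
    application | total bases | read length | read number (M)
    human genome resequencing | 90-150 Gb | 2x100 | 900-1500
    target resequencing | <1 Gb | 2x100 | 10
    exome sequence | 5~7 Gb | 2x100 | 70
    RNA-Seq | 5 Gb | 2x100 | 50
    TSS-Seq | 1 Gb | 1x50 | 20
    small RNA | 0.35 Gb | 1x35 | >10
    microbial genome | >150 Mb | 2x100 | >1.5
    eukaryotic genome | >4 Gb | 2x100 | >40
    Bisulfite-Seq | 90-150 Gb | 2x100 | 900-1500
    ChIP-Seq | >6 Gb | 1x100 | 60
    Quoted from the Saibo Kogaku (細胞工学) supplement "Next-Generation Sequencers: Advanced Methods by Purpose". Note: the numbers can change with the genome size of the target and other factors, and the information may already be out of date.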
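The read-number column appears to follow from the other two columns: total bases divided by the length of a single read, with both ends of a 2x100 pair counted as separate reads. A minimal sketch of that arithmetic follows; the function name is an illustrative assumption, and the inputs are rows quoted on the slide.

```python
def reads_needed_millions(total_bases: float, read_length: int) -> float:
    """Reads (in millions) needed to reach total_bases at a given read length."""
    return total_bases / read_length / 1e6


GB = 1e9  # one gigabase

rna_seq = reads_needed_millions(5 * GB, 100)      # RNA-Seq: 5 Gb at 2x100
small_rna = reads_needed_millions(0.35 * GB, 35)  # small RNA: 0.35 Gb at 1x35
human_low = reads_needed_millions(90 * GB, 100)   # human resequencing, lower bound
```

Under this reading the three values come out at about 50, 10 and 900 million reads, matching the corresponding rows.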
  43. Sequencers by read spec (rough sketch). [Figure repeated from slide 24: read count vs. read length; short-read massively parallel type, benchtop type, long-read parallel type]
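  44. Applications to research fields: Sequencing Application • Methods are advancing constantly • Reading the latest reviews is the best approach • Nature Reviews Genetics: Application of next-generation sequencing • www.nature.com/nrg/series/nextgeneration/ • Information in books is gradually starting to appear • It may already be out of date • Getting hold of someone in the research community who actually does this is also recommended • NGS Genba no Kai (NGS現場の会)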
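  45. "Next-Generation Sequencers: Advanced Methods by Purpose" (次世代シークエンサー 目的別アドバンストメソッド): preparation and introduction, human genome analysis, gene expression regulation analysis, de novo genome sequencing, epigenetics analysis, metagenome analysis, genome structure analysis, data analysis tools & storage, integrated analysis. This is a plug, but nobody asked me to make it and I earn nothing if the book sells.

  46. Something you can do at home: next-generation sequencing data analysis on a MacBook (e-book)

  47. NGS Genba no Kai (NGS現場の会): www.ngs-field.org

  48. NGS Genba no Kai (NGS現場の会): twitter.com/ngsfield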

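  49. The general flow of research using NGS. NGS practical workflow

  50. The general flow of research using NGS: what is hard about research that uses NGS? Sampling, library prep, sequencing, data analysis • The common image • The machine is expensive • A lot of data comes out • Data analysis is hard to understand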
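  51. The general flow of research using NGS: what is hard about research that uses NGS? • The reality • The celebration drinks are a long way off. Workflow: experimental design, preliminary experiment, sampling, DNA preparation, library construction, sequencing, QC, filtering, mapping/assemble, QC, purpose-specific analysis, validation experiment, paper writing, data release, paper submission, revision/re-analysis, accept, celebration drinks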
  52. The general flow of research using NGS: what is hard about research that uses NGS? • There are points of no return • If the sequencing result is bad, data analysis cannot rescue it • If re-analysis takes time, you may miss the revision deadline

  53. The general flow of research using NGS: what is hard about research that uses NGS? • A design that includes preliminary and validation experiments is extremely important • Wet-lab skills, such as preparing high-purity DNA, are also required • Preparation is also needed, such as designing validation experiments free of PCR bias in advance

  54. The general flow of research using NGS: what is hard about research that uses NGS? • Many journals require the data to be released before the paper is submitted • Releasing NGS data is harder than you might expect • Where should it be released in the first place? • There is a public database for NGS
  55. About SRA, the public NGS database: formerly Short Read Archive, currently Sequence Read Archive

  56. Data release is spelled out in journal guidelines: http://www.plosone.org/static/publication#data%20report

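  57. Sequence Read Archive (SRA) • A data repository for NGS • Operated jointly by three organizations: NCBI, EBI and DDBJ • Raw sequence data obtained from the sequencer (fastq files with no preprocessing) is registered • Anything commonly called NGS data is supposed to be registered in this DB • Regardless of sequencer type (Illumina, Roche, LifeTech, etc.) • Regardless of application type (DNA-Seq, RNA-Seq, ChIP-Seq, etc.) • Regardless of species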
  58. About data formats • Data from each sequencer is generally converted to the fastq format • http://en.wikipedia.org/wiki/FASTQ_format • Mapped data goes to .sam/.bam, assembled data to .fasta • After purpose-specific analysis, data is converted to the format for that purpose and used for visualization and so on. Formats along the workflow: fastq (after sequencing), sam/bam (after mapping), fasta (after assembly), gff/vcf/wig/etc. (after purpose-specific analysis)
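As a rough illustration of what a fastq file contains, the sketch below reads the common four-line record layout (header starting with `@`, sequence, separator starting with `+`, quality string). It assumes unwrapped sequence lines and plain-text input; `FastqRecord` and `read_fastq` are illustrative names, not part of any library, and the two records are made up.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text; stops at end of input or a blank line."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("not a four-line FASTQ record: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up records for demonstration.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!#I\n")
records = list(read_fastq(example))
```

For real analyses an established parser is usually the safer choice, since multi-line records and compressed input are not handled here.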
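  59. DBCLS Large-Scale Data Technology Development Division, NGS special unit • Started in order to get ahead of the growth in NGS data • Huge datasets of unprecedented size are now being registered in public databases in large numbers • Conducts research and technology development to solve problems specific to large-scale data • Supports SRA activities on the technical side in cooperation with DDBJ

  60. DBCLS SRA (http://sra.dbcls.jp) • Collects and provides various statistics about the database • What kind of data is in there in the first place? • Builds a search system for finding data • Integration with other DBs (literature information such as PubMed and PMC, taxonomy, disease information, etc.) • Surveys of trends in sequencing technology based on individual sequence records, etc.

  61. DBCLS SRA (http://sra.dbcls.jp)

  62. Matching papers with public data (http://sra.dbcls.jp/cgi-bin/publication.cgi)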

  63. It is not the case that only data used in papers gets registered. [Charts: total vs. with publication] • #submission: total 115440, publication 3059 (26.5%) • #sample: total 194338, publication 31787 (16.4%) • #run: total 376904, publication 51202 (13.6%)
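The percentage labels can be checked against the counts, assuming each pair on the slide is (entries linked to a publication, total entries). That pairing is an assumption read off the order of the numbers in the transcript.

```python
# (entries with a publication, total entries), as read from the slide
counts = {
    "submission": (3059, 115440),
    "sample": (31787, 194338),
    "run": (51202, 376904),
}

percent_with_publication = {
    level: round(100 * with_publication / total, 1)
    for level, (with_publication, total) in counts.items()
}
```

Under this reading the sample and run levels reproduce the 16.4% and 13.6% labels, while the submission level comes out near 2.6%. The 26.5% label may therefore be a misplaced decimal point, though that is an inference from the arithmetic rather than something the slide states.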
  64. Which sequencer should you use to get a paper? [Bar charts: total vs. publication, by instrument model]
    rank | total | publication
    1 | 148946 Illumina HiSeq 2000 | 16481 Illumina Genome Analyzer II
    2 | 65158 Illumina Genome Analyzer II | 10944 Illumina Genome Analyzer
    3 | 33042 454 GS FLX Titanium | 5314 454 GS FLX Titanium
    4 | 22010 Illumina Genome Analyzer | 5307 454 GS FLX
    5 | 18290 Illumina Genome Analyzer IIx | 4659 Illumina Genome Analyzer IIx
    6 | 16361 454 GS FLX | 3973 Illumina HiSeq 2000
    7 | 5495 AB SOLiD System 2.0 | 1388 PacBio RS
    8 | 4726 unspecified | 575 AB SOLiD System 3.0
    9 | 4300 PacBio RS | 561 Illumina HiSeq 1000
    10 | 3911 Illumina MiSeq | 340 Helicos HeliScope
  65. Is it true that "if you sequence a genome, you get a paper"? [Bar charts: total vs. publication, by library source]
    source | total | publication
    GENOMIC | 267185 (80.6%) | 33825 (66.9%)
    TRANSCRIPTOMIC | 38804 (11.7%) | 12892 (25.5%)
    METAGENOMIC | 16731 (5.0%) | 1913 (3.8%)
    OTHER | 4412 (1.3%) | 1481 (2.9%)
    SYNTHETIC | 2912 (0.9%) | 119
    VIRAL RNA | 941 | 295
    METATRANSCRIPTOMIC | 290 | 48
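One way to read these numbers against the slide's question is to compute, per library source, the fraction of entries that are linked to a publication. The sketch below assumes the total and publication counts use the same unit and can be divided directly, which the slide does not confirm.

```python
# Counts as read from the slide: library source -> number of entries
total = {"GENOMIC": 267185, "TRANSCRIPTOMIC": 38804, "METAGENOMIC": 16731}
publication = {"GENOMIC": 33825, "TRANSCRIPTOMIC": 12892, "METAGENOMIC": 1913}

publication_rate = {
    source: round(100 * publication[source] / total[source], 1)
    for source in total
}
```

On these figures roughly a third of TRANSCRIPTOMIC entries are linked to a paper, against roughly one in eight for GENOMIC, which is consistent with the slide's skeptical framing.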
  66. Registered data gets reused. [Chart: #SRAID total 3059; #PMID > 1: 204]
    id | count | title
    SRA008679 | 50 | HapMap project
    SRA030426 | 48 | Human Prostate Cancer using Next Generation RNA Sequencing (human)
    SRA024198 | 8 | Metagenomic analysis of marine microbes isolated during the Global Ocean Sampling Expedition
    SRA008091 | 8 | Human 1000 genomes
    SRA000271 | 7 | HapMap project
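  67. Searching efficiently for public data used in papers: DBCLS SRA Metadata Search (http://sra.dbcls.jp/search)

  68. Searching efficiently for public data used in papers: DBCLS SRA Metadata Search (http://sra.dbcls.jp/search) • "Data quality cannot be rescued by analysis" applies here too • Judging data quality requires rich metadata such as experimental conditions • The data you need has to be found efficiently among a large amount of data • Visualize how much of the data you are after has been registered • Large datasets take time to download and unpack • You do not want to draw a dud • Added read information (read count, read length, error rate, etc.) • Checking quality in advance lets you skip the QC step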
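  69. Let's try searching for something: search query "virus" on http://sra.dbcls.jp/search

  70. Let's try searching for something: found an interesting-looking project that has a paper

  71. Let's try searching for something: follow the link to the paper of the project we found

  72. Let's try searching for something: look for the description of sequencing in Materials & Methods

  73. Let's try searching for something: found the original paper

  74. Let's try searching for something: look for the description of the sequencing

  75. Let's try searching for something: look for the description of the data processing

  76. Let's try searching for something: search Google for the name of the tool

  77. Let's try searching for something: found open-source software at the Broad Institute

  78. Let's try searching for something: go back to the project page and look at the data

  79. Let's try searching for something: click the Run ID to see the read information (FastQC results)

  80. Let's try searching for something: if there are no problems, click the download link, choose a format and download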

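  81. Supplementary information • About the Broad Institute • One of the world's largest sequencing centers, located in Boston • Other sequencing centers include the Sanger Institute @ UK and BGI @ China • About FastQC • One of the most widely used QC programs for NGS data • About the SRA/SRA Lite format • Compressed NGS data • Can be unpacked to .fastq or to the sequencer's own raw data format • Compression/decompression is done with the SRA Toolkit • http://www.ncbi.nlm.nih.gov/Traces/sra/?view=software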
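QC reports such as those from FastQC are built on the per-base quality characters stored in fastq files. The arithmetic behind them can be sketched as follows, assuming the Phred+33 encoding (Sanger and Illumina 1.8+ style; older Illumina pipelines used an offset of 64). The function names are illustrative and this is not how FastQC itself is implemented.

```python
def phred_to_error_prob(quality_char: str, offset: int = 33) -> float:
    """Convert one quality character to an error probability: p = 10^(-Q/10)."""
    score = ord(quality_char) - offset
    return 10 ** (-score / 10)


def mean_error_rate(quality: str, offset: int = 33) -> float:
    """Average per-base error probability of one read."""
    return sum(phred_to_error_prob(c, offset) for c in quality) / len(quality)


q40 = phred_to_error_prob("I")    # 'I' is ASCII 73, so Q = 40
q20 = phred_to_error_prob("5")    # '5' is ASCII 53, so Q = 20
read_rate = mean_error_rate("II5")  # two Q40 bases and one Q20 base
```

A Q40 base corresponds to an expected error of about 1 in 10,000 and a Q20 base to about 1 in 100, so a single low-quality base dominates the mean of a short read.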
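  82. Following the flow of actual research using public data. Standing on the shoulders of giants

  83. The general flow of research using NGS: when you do the sequencing yourself. Workflow: experimental design, preliminary experiment, sampling, DNA preparation, library construction, sequencing, QC, filtering, mapping/assemble, QC, purpose-specific analysis, validation experiment, paper writing, data release, paper submission, revision/re-analysis, accept, celebration drinks • The celebration drinks are still a long way off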
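  84. The general flow when using public data. Workflow: define the search target, search, collect the necessary metadata, download, QC, filtering, mapping/assemble, QC, purpose-specific analysis, validation experiment, paper writing, data release, paper submission, revision/re-analysis, accept, celebration drinks (alongside: searching for papers and other related information) • Purposes of using public data • As supplementary information for your own experiments • As a survey for a newly launched research project, going as far as data analysis • To build data analysis tools/pipelines • The celebration drinks get a little closer (maybe)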
  85. Searching for public data • Search from the website of each SRA broker • NCBI: http://www.ncbi.nlm.nih.gov/sra • EBI: http://www.ebi.ac.uk/ena/ • DDBJ: http://trace.ddbj.nig.ac.jp/DRASearch/ • Search from DBCLS SRA • http://sra.dbcls.jp/ • http://sra.dbcls.jp/search • Search from GEO, ArrayExpress • GEO: http://www.ncbi.nlm.nih.gov/geo/ • ArrayExpress: http://www.ebi.ac.uk/arrayexpress/ • Search from PubMed • Via links such as SRA/GEO in the sidebar
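  86. Collecting the necessary metadata (1) • How were sampling and preprocessing done? • How was the library prepared? • If possible, the reagents and their versions too • Which sequencer was used? • If possible, its version too • How many reads? • single? paired? • If paired, what is the insert length? • What is the read length?

  87. Collecting the necessary metadata (2) • Tags? • multiplex? • Barcoding? • How are multiple Runs handled? • replicates? • Was the library split and sequenced separately? • What data processing was done? • mapping? assemble? • Which software/tools were used? • Which version of the reference genome was used? • What data analysis was done? • Which software, tools and pipelines were used?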
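  88. Downloading the data • The data is the same whether you download from NCBI, EBI or DDBJ • Choose according to the speed of your internet connection • File sizes range from a few hundred megabytes to several terabytes • Check the file size beforehand • Various tricks can make downloading more convenient • Use download software for PC • Use Aspera Connect, provided by NCBI and DDBJ • Use commands such as lftp on Linux/Unix (including the Mac Terminal)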
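Checking the file size beforehand matters because transfer time scales directly with it. A back-of-the-envelope sketch follows, assuming a sustained throughput equal to the nominal line speed, which real transfers rarely reach; the function name and the example sizes and speeds are illustrative.

```python
def download_hours(size_bytes: float, megabits_per_second: float) -> float:
    """Estimated transfer time in hours at a sustained line speed."""
    bits = size_bytes * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 3600


one_terabyte = download_hours(1e12, 100)        # 1 TB over a 100 Mbps line
five_hundred_mb = download_hours(500e6, 100)    # 500 MB over the same line
```

On these assumptions a terabyte-scale file takes roughly a day at 100 Mbps while a few hundred megabytes takes under a minute, which is why the choice of mirror and transfer tool is worth thinking about.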
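  89. Data processing / data analysis • Data processing (mapping/assemble) • Requires enormous computational resources • De novo assembly of a genome several gigabytes in size can use as much as 2 TB of memory • Data processing can take days to weeks • Data analysis • Open-source tools driven from the command line (Linux, Mac) • PC software • Online applications that run in the browser • Services and software that handle both data processing and analysis • DDBJ Read Annotation Pipeline (http://p.ddbj.nig.ac.jp) • CLC Bio Genomics Workbench (http://www.clcbio.co.jp)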
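  90. DDBJ Read Annotation Pipeline (p.ddbj.nig.ac.jp)

  91. DDBJ Read Annotation Pipeline (p.ddbj.nig.ac.jp)

  92. DDBJ Read Annotation Pipeline (p.ddbj.nig.ac.jp)

  93. If you are familiar with "microarray analysis with R", look here: Prof. Kadota, University of Tokyo. Search for "(Rで)塩基配列解析" ("sequence analysis (with R)")

  94. If you love R so much you cannot help yourself, look here: Nikaido-san, RIKEN (cat.hackingisbelieving.org/lecture)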

  95. When you are stuck • Search the internet • If you still cannot find it after 90 minutes of trying, ask someone

  96. When you are stuck • Search the internet • Google • SEQanswers • seqanswers.com • BioStars • www.biostars.org • NGS Surfer’s Wiki • cell-innovation.nig.ac.jp/wiki/ • Sequence Read Archive User Reference • github.com/inutano/sra_metadata_toolkit/wiki
  97. When you are stuck, search here first: SEQanswers (http://seqanswers.com)

  98. Video tutorial: how to use CLC Genomic Workbench (togotv.dbcls.jp)

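  99. When you are stuck • If you still cannot find it after 90 minutes of trying, ask someone • NGS Genba no Kai (NGS現場の会) mailing list • BioStars • Life Science QA • http://qa.lifesciencedb.jp • twitter

  100. For Q&A in Japanese, go here: Life Science QA (http://qa.lifesciencedb.jp)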

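  101. Summary. Thank you for your hard work

  102. “Sequencing is FREE, so let's sequence everything” • Research using NGS is not straightforward • But it lets you observe phenomena that could not be captured before • Design matters: should you really use NGS, and what will you look at with it? • Anything you do not understand can be resolved by searching the internet or asking someone • Solve problems before you charge past a point of no return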