
Organellar genomes of white spruce (Picea glauca): Assembly and annotation

Conifer Genome Summit 2014

Shaun Jackman

June 17, 2014

Transcript

  1. Organellar genomes of white spruce (Picea glauca): Assembly and annotation

    Conifer Genome Summit 2014 Shaun Jackman @sjackman 2014-06-17
    Shaun D Jackman1, Anthony Raymond1, Ben Vandervalk1, Hamid Mohamadi1, René Warren1, Stephen Pleasance1, Robin Coope1, Macaire MS Yuen2, Christopher Keeling2, Carol Ritland2, Jean Bousquet3, Alvin Yanchuk4, Kermit Ritland2, John MacKay3, Steven JM Jones1, Jörg C Bohlmann2 and İnanç Birol1
    (1) BC Cancer Agency, Genome Sciences Centre, Vancouver, BC, Canada, (2) University of British Columbia, Vancouver, BC, Canada, (3) Université Laval, Quebec, QC, Canada, (4) British Columbia Ministry of Forests, Victoria, BC, Canada
    Photo credit: Joseph O'Brien, USDA Forest Service, bugwood.org
  2. Organellar genomes of white spruce (Picea glauca): Assembly and annotation

    Conifer Genome Summit 2014 Shaun Jackman @sjackman 2014-06-17
  3. Organellar Sequence in the Genome Assembly

    Courtesy of Tony Raymond @tgjraymond
    ~6 Mbp
  4. Plastid Genome

    Photo credit: Kristian Peters

  5. The plastid genome

    Assembled one lane of MiSeq data using ABySS. Six scaffolds with depth of coverage >70x and length >5 kbp reconstruct the plastid.
    Figure label: Six plastid sequences
  6. Plastid Genome Finishing

    • 1/140 or 0.7% of reads are plastidial, giving 80x coverage
    • 99.2% identity to the Norway spruce plastid
    • Genome finishing using additional Illumina and PacBio data by René Warren, Daniel Paulino and Greg Taylor
    Figure labels: Six contigs in one circular scaffold; One circular contig after finishing
  7. Annotated with MAKER using Norway spruce genes for evidence

  8. Mitochondrial Genome

    Illustration courtesy of Gary Carlson, http://gcarlson.com/

  9. k-mer coverage vs GC content

    Assembled one lane of HiSeq data using ABySS
    Figure label: Putative mitochondrion
  10. Classifying the sequences using k-means clustering
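
The classification step named on this slide can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) in Python. The scaffold values below are made up, and the choice of features (GC fraction and log10 depth of coverage), the unscaled features, k = 2 and the deterministic seeding are all assumptions; the slides do not say which implementation or parameters were actually used.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Cluster equal-length numeric tuples with Lloyd's algorithm.

    Centroids are seeded with k points spread evenly through the input so the
    sketch is reproducible (an assumption; random restarts are more usual).
    Returns (labels, centroids).
    """
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical scaffolds: (GC fraction, depth of coverage).
scaffolds = [
    (0.37, 2.0), (0.38, 3.0), (0.39, 2.5),     # low depth
    (0.44, 30.0), (0.45, 28.0), (0.44, 33.0),  # high depth
]
features = [(gc, math.log10(depth)) for gc, depth in scaffolds]
labels, centroids = kmeans(features, k=2)
```

With these made-up values the three low-depth scaffolds fall into one cluster and the three high-depth scaffolds into the other. On real data the features would normally be standardized first, since GC fraction spans a much narrower range than log depth.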

  11. Mitochondrial Genome Assembly

    • Assembled one lane of HiSeq data using ABySS
    • 8.4 Mbp in 1001 scaffolds larger than 2 kbp with a 29 kbp N50
    • Separated putative mitochondrial sequence by length, depth of coverage and GC content
    • 6.0 Mbp in 223 scaffolds larger than 2 kbp with a 39 kbp N50
    • Scaffolded using one lane of mate-pair HiSeq reads
    • 5.9 Mbp in 61 scaffolds larger than 2 kbp with a 287 kbp N50
    • The largest scaffold is 598 kbp
    • 1/350 or 0.3% of reads are mitochondrial, giving 30x coverage
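
The N50 values quoted on this slide follow the usual definition: the scaffold length at which the longest scaffolds together hold at least half of the assembled bases. A minimal sketch in Python is shown here; the function name, the `min_length` parameter (standing in for the "larger than 2 kbp" filter) and the example lengths are illustrative assumptions, not the tool used for the talk.

```python
def n50(lengths, min_length=0):
    """Return the N50 of the scaffold lengths strictly longer than min_length.

    Scaffolds are taken from longest to shortest; the N50 is the length of the
    scaffold at which the running total first reaches half of the total bases.
    """
    kept = sorted((x for x in lengths if x > min_length), reverse=True)
    if not kept:
        raise ValueError("no scaffolds longer than min_length")
    total = sum(kept)
    running = 0
    for length in kept:
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in kbp (not from the assembly).
example = [80, 70, 50, 40, 30, 20, 1]
print(n50(example, min_length=2))  # 80 + 70 = 150 >= 290 / 2, so N50 is 70
```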
  12. Annotated using MAKER, with all complete Viridiplantae mitochondrial genomes in NCBI GenBank as evidence

    Note: the mitochondrion is assembled in 61 scaffolds and has been artificially circularized for this figure.
  13. Mitochondrial Genome Annotation

    • 54 coding genes composing 50 kbp (<1%) of the genome
    • 23 transfer RNA genes and 4 ribosomal RNA genes
    • Repeats composing 400 kbp (~7%) of the genome
    Figure: sizes of annotated features; axis: Size (kb), 0 to 50. Legend entries: ATP synthase, Cytochrome c maturation, Complex III (ubiquinol cytochrome c reductase), Complex IV (cytochrome c oxidase), DNA polymerase, Maturase, Membrane targeting and translocation, Complex I (NADH dehydrogenase), DNA-dependent RNA polymerase, Ribosomal proteins (LSU), Ribosomal proteins (SSU), Complex II (succinate dehydrogenase), Transfer RNAs, Unknown, UNK, Simple repeat, rRNA, NHF, LTR, Low complexity, LINE, Copia, Gypsy, Jockey, R1
  14. Summary of Results

                           Plastid         Mitochondrion
    Number of scaffolds    Finished        61 scaffolds
    Scaffold N50                           287 kbp
    Genome size            123 kbp         5.92 Mbp
    Coding genes           74              54
    Transfer RNA genes     36              23
    Ribosomal RNA genes    4               4
    Coding gene content    64 kbp (52%)    50 kbp (<1%)
    Repeat content         -               400 kbp (7%)
  15. Organellar genomes of white spruce (Picea glauca): Assembly and annotation

    Conifer Genome Summit 2014 Shaun Jackman @sjackman 2014-06-17
    Shaun D Jackman1, Anthony Raymond1, Ben Vandervalk1, Hamid Mohamadi1, René Warren1, Stephen Pleasance1, Robin Coope1, Macaire MS Yuen2, Christopher Keeling2, Carol Ritland2, Jean Bousquet3, Alvin Yanchuk4, Kermit Ritland2, John MacKay3, Steven JM Jones1, Jörg C Bohlmann2 and İnanç Birol1
    (1) BC Cancer Agency, Genome Sciences Centre, Vancouver, BC, Canada, (2) University of British Columbia, Vancouver, BC, Canada, (3) Université Laval, Quebec, QC, Canada, (4) British Columbia Ministry of Forests, Victoria, BC, Canada
    Photo credit: Joseph O'Brien, USDA Forest Service, bugwood.org
  16. Fin

  17. Further Work

    • Improve the mitochondrial assembly by scaffolding and closing gaps
    • Investigate how the mitochondrial genome grew to such a large size
    • Look for evidence of transfer of DNA between the nuclear and mitochondrial genomes
  18. 500-bp MiSeq reads

    Courtesy of Robin Coope @robincoope
    Figure labels: Cartridge splitter; MiSeq-XL cartridge base; MiSeq-XL reagent tray & lid; Screws for reagent tray lid; Splash guard
  19. Merge overlapping reads

    FastQC plot of base quality
    Courtesy of Tony Raymond @tgjraymond
  20. Connecting Paired-end Reads

    Figure labels: 2x250, 2x150, 2x300; 400 bp, 500 bp, 600 bp; Exists?; Bloom Filter
    Courtesy of İnanç Birol
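
The idea on this slide, asking a Bloom filter whether a k-mer exists while walking from one read of a pair to its mate, can be sketched in Python as follows. This is a simplified illustration and not how ABySS-connectpairs is implemented: the class, the function names, the SHA-256 based hashing, the filter size and the greedy single-path walk are all assumptions, the example fragment is made up, and both reads are assumed to be on the same strand already.

```python
import hashlib


class BloomFilter:
    """A fixed-size bit array queried through several hash functions."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def connect(read1, read2, bloom, k, max_steps=1000):
    """Walk from the last k-mer of read1 to the first k-mer of read2.

    Returns the joined sequence, or None on a dead end, a branch,
    or when max_steps is exhausted.
    """
    target = read2[:k]
    path = read1
    for _ in range(max_steps):
        current = path[-k:]
        if current == target:
            return path + read2[k:]
        successors = [
            current[1:] + base for base in "ACGT" if current[1:] + base in bloom
        ]
        if len(successors) != 1:
            return None
        path += successors[0][-1]
    return None


# Hypothetical 25 bp fragment; in practice the filter holds k-mers of all reads.
fragment = "ACGTTGCATGCCAGTACGGATCCTA"
k = 5
bloom = BloomFilter()
for kmer in kmers(fragment, k):
    bloom.add(kmer)
joined = connect(fragment[:8], fragment[-8:], bloom, k)
```

Here `joined` reconstructs the whole fragment from its two 8 bp ends. A Bloom filter keeps memory use low at the cost of occasional false positives, which in a real graph show up as spurious branches that the search has to tolerate.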
  21. Plastid Genome Sequence

    • 9.4 million MiSeq reads of 300 bp
    • Merged the overlapping paired reads
    • 3.0 million merged reads of 492 bp median
    • Assembled these reads using ABySS
    • Separated six plastidial sequences by length and depth of coverage
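
Merging an overlapping read pair, as in the second bullet of this slide, amounts to reverse-complementing the second read and joining the two at their longest suffix-prefix overlap. The sketch below is a minimal Python illustration under strong assumptions: it requires an exact match, ignores base qualities, and uses hypothetical function names and a made-up fragment. The slides do not name the merging tool that was used.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge a forward read with its reverse-strand mate.

    Tries the longest overlap first and returns the merged sequence,
    or None if no exact overlap of at least min_overlap bases exists.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# Hypothetical 25 bp fragment sequenced from both ends with 16 bp reads.
fragment = "ACGTTGCATGCCAGTACGGATCCTA"
read1 = fragment[:16]
read2 = reverse_complement(fragment[9:])
merged = merge_pair(read1, read2, min_overlap=5)
```

In this example the two 16 bp reads overlap by 7 bases, so `merged` equals the original 25 bp fragment. Practical read mergers also tolerate mismatches and weigh base qualities in the overlap.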
  22. Mitochondrial Genome Sequence

    • 267 million HiSeq reads of 150 bp
    • Filled the gap between the paired-end reads using a Bloom filter de Bruijn graph (ABySS-connectpairs)
    • 1.4 million merged reads of 465 bp median
    • Assembled these reads using ABySS
    • 377 thousand merged reads (1/350 or 0.3%) map to the assembled mitochondrion
    • 30-fold coverage of the mitochondrion
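
The 30-fold figure can be checked with the usual estimate of depth of coverage: mapped bases divided by genome size. The short Python sketch below assumes that the 465 bp median merged read length is a fair stand-in for the mean, and takes the 5.92 Mbp genome size from the summary table on slide 14; the function name is illustrative.

```python
def depth_of_coverage(num_reads, read_length_bp, genome_size_bp):
    """Approximate depth as total sequenced bases over genome size."""
    return num_reads * read_length_bp / genome_size_bp


# 377 thousand mapped merged reads of about 465 bp over a 5.92 Mbp genome.
mito_depth = depth_of_coverage(377_000, 465, 5_920_000)
print(f"{mito_depth:.1f}x")  # prints 29.6x, consistent with the quoted 30-fold
```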
  23. Mitochondrial Genome Comparison

    • The white spruce putative mitochondrial sequence is 6.0 Mbp in 78 scaffolds larger than 2 kbp with a 157 kbp N50
    • The Norway spruce putative mitochondrial sequence is 5.5 Mbp in 294 scaffolds larger than 4 kbp with a 28 kbp N50
    • 3.3 Mbp of these two assemblies align to each other with BWA
    • 98.3% identity and 60% coverage of the Norway spruce putative mitochondrial sequence
  24. Summary of Results

    • One lane of MiSeq data assembles the 124 kbp plastid genome of white spruce
    • One lane of HiSeq data assembles the estimated 6 Mbp mitochondrial genome of white spruce
    • Aligned to the complete plastid genome (NC_021456) and putative mitochondrial sequences of Norway spruce

    Alignment        Identity    Coverage
    Plastid          99.2%       99.2%
    Mitochondrion    98.3%       60%