
Organellar genomes of white spruce (Picea glauca): Assembly and annotation


Conifer Genome Summit 2014

Shaun Jackman

June 17, 2014

Transcript

  1. Organellar genomes of white spruce (Picea glauca): Assembly and annotation
    Conifer Genome Summit 2014

    Shaun Jackman @sjackman

    2014-06-17
    Shaun D Jackman (1), Anthony Raymond (1), Ben Vandervalk (1), Hamid Mohamadi (1), René Warren (1), Stephen Pleasance (1),
    Robin Coope (1), Macaire MS Yuen (2), Christopher Keeling (2), Carol Ritland (2), Jean Bousquet (3), Alvin Yanchuk (4),
    Kermit Ritland (2), John MacKay (3), Steven JM Jones (1), Jörg C Bohlmann (2) and İnanç Birol (1)
    (1) BC Cancer Agency, Genome Sciences Centre, Vancouver, BC, Canada; (2) University of British Columbia, Vancouver, BC, Canada;
    (3) Université Laval, Quebec, QC, Canada; (4) British Columbia Ministry of Forests, Victoria, BC, Canada
    Photo credit: Joseph O'Brien, USDA Forest Service, bugwood.org


  2. Organellar genomes of white spruce (Picea glauca): Assembly and annotation
    Conifer Genome Summit 2014

    Shaun Jackman @sjackman

    2014-06-17


  3. Organellar Sequence in the Genome Assembly
    ~6 Mbp
    Courtesy of Tony Raymond @tgjraymond


  4. Plastid Genome
    Photo credit: Kristian Peters


  5. The plastid genome
    Assembled one lane of MiSeq data using ABySS.
    Six scaffolds with depth of coverage >70x and length >5 kbp reconstruct the plastid.
    Six plastid sequences
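The selection rule on this slide is a simple two-threshold filter over the assembled scaffolds. A minimal sketch, assuming each scaffold is described by a hypothetical `Scaffold(name, length, depth)` record; the default thresholds are the >70x and >5 kbp values from the slide, and the function name is illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Scaffold:
    """Hypothetical record for one assembled scaffold."""

    name: str
    length: int   # scaffold length in bp
    depth: float  # mean depth of coverage (x)


def select_plastid_candidates(
    scaffolds: Iterable[Scaffold],
    min_depth: float = 70.0,
    min_length: int = 5_000,
) -> List[Scaffold]:
    """Keep scaffolds with depth > min_depth and length > min_length."""
    # The high-copy plastid genome is sequenced far deeper than the nuclear
    # genome, so depth separates the two; the length cutoff drops short
    # fragments, whose depth estimates tend to be noisy.
    return [s for s in scaffolds if s.depth > min_depth and s.length > min_length]


if __name__ == "__main__":
    demo = [
        Scaffold("s1", 40_000, 85.0),  # long and deep: kept
        Scaffold("s2", 3_000, 90.0),   # deep but too short: dropped
        Scaffold("s3", 20_000, 12.0),  # long but shallow: dropped
    ]
    print([s.name for s in select_plastid_candidates(demo)])  # ['s1']
```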


  6. Plastid Genome Finishing
    • 1/140 or 0.7% of reads are plastidial, giving 80x coverage
    • 99.2% identity to the Norway spruce plastid
    • Genome finishing using additional Illumina and PacBio data by René Warren, Daniel Paulino and Greg Taylor
    Six contigs in one circular scaffold
    One circular contig after finishing
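A circular genome assembled as a linear contig often carries the same sequence at both ends, because the assembler runs past the start of the circle. The slide does not describe how circularity was established (finishing used additional Illumina and PacBio data), so the following is only a generic sketch of one common check: find the longest suffix of the contig that equals its prefix and trim it. The function names and the `min_overlap` parameter are hypothetical.

```python
from typing import Optional


def find_end_overlap(seq: str, min_overlap: int = 20) -> Optional[int]:
    """Length of the longest proper suffix of `seq` that equals its prefix.

    Overlaps shorter than `min_overlap` are ignored; returns None if none.
    Quadratic in the worst case, which is fine for illustration.
    """
    lower = max(min_overlap, 1)
    for k in range(len(seq) - 1, lower - 1, -1):
        if seq[-k:] == seq[:k]:
            return k
    return None


def circularize(seq: str, min_overlap: int = 20) -> Optional[str]:
    """Trim the duplicated end of a linear contig from a circular genome."""
    k = find_end_overlap(seq, min_overlap)
    return None if k is None else seq[:-k]


if __name__ == "__main__":
    circle = "TACGGACGCAGGCACGAGCAAGCG"
    linear = circle + circle[:10]  # assembler ran 10 bp past the start
    print(find_end_overlap(linear, min_overlap=5))       # 10
    print(circularize(linear, min_overlap=5) == circle)  # True
```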


  7. Annotated with MAKER using Norway spruce genes as evidence


  8. Mitochondrial Genome
    Illustration courtesy of Gary Carlson, http://gcarlson.com/


  9. k-mer coverage vs GC content
    Assembled one lane of HiSeq data using ABySS.
    Putative mitochondrion
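Each point in a coverage-versus-GC plot pairs one assembled sequence's GC content with its k-mer coverage. A minimal sketch, assuming the per-sequence k-mer coverage is already available (for example as reported by the assembler) and passed in as a hypothetical dictionary keyed by sequence name; only unambiguous bases count toward GC content.

```python
from typing import Dict, List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0  # all-N sequence: report 0


def gc_coverage_points(
    contigs: Dict[str, str], kmer_coverage: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """(name, GC fraction, k-mer coverage) for each contig, ready to plot."""
    return [(name, gc_content(seq), kmer_coverage[name]) for name, seq in contigs.items()]


if __name__ == "__main__":
    contigs = {"c1": "GGCCAATT", "c2": "GCGCGCAT"}
    coverage = {"c1": 12.0, "c2": 35.0}  # hypothetical values
    print(gc_coverage_points(contigs, coverage))  # [('c1', 0.5, 12.0), ('c2', 0.75, 35.0)]
```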


  10. Classifying the sequences using k-means clustering
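k-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points, until the assignments stop changing. A minimal sketch of Lloyd's algorithm, assuming each sequence is a 2-D point of (GC fraction, log10 k-mer coverage); that feature choice is ours, not necessarily the slide's. Initial centroids are passed in explicitly to keep the example deterministic; a real analysis might instead use k-means++ seeding, for example via scikit-learn's `KMeans`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (GC fraction, log10 k-mer coverage), assumed features


def kmeans(
    points: Sequence[Point], initial: Sequence[Point], max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; returns (centroids, cluster label per point)."""
    centroids: List[Point] = list(initial)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.38, 1.0), (0.37, 1.1), (0.45, 2.2), (0.46, 2.3)]
    centroids, labels = kmeans(pts, initial=[pts[0], pts[2]])
    print(labels)  # [0, 0, 1, 1]
```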


  11. Mitochondrial Genome Assembly
    • Assembled one lane of HiSeq data using ABySS
    • 8.4 Mbp in 1001 scaffolds larger than 2 kbp with a 29 kbp N50
    • Separated putative mitochondrial sequence by length, depth of coverage and GC content
    • 6.0 Mbp in 223 scaffolds larger than 2 kbp with a 39 kbp N50
    • Scaffolded using one lane of mate-pair HiSeq reads
    • 5.9 Mbp in 61 scaffolds larger than 2 kbp with a 287 kbp N50
    • The largest scaffold is 598 kbp
    • 1/350 or 0.3% of reads are mitochondrial, giving 30x coverage
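The N50 figures above summarize contiguity: N50 is the largest length L such that sequences of length L or longer together hold at least half of the assembled bases. A minimal sketch; the `larger_than` parameter (a name chosen here) mirrors the slide's "scaffolds larger than 2 kbp" by discarding shorter sequences before the calculation, and conventions for that cutoff vary between tools.

```python
from typing import Iterable


def n50(lengths: Iterable[int], larger_than: int = 0) -> int:
    """N50 of sequence lengths, counting only sequences longer than `larger_than`."""
    kept = sorted((n for n in lengths if n > larger_than), reverse=True)
    half = sum(kept) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for n in kept:
        running += n
        if running >= half:
            return n
    return 0  # no sequences passed the length filter


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))                           # 5
    print(n50([1_500, 3_000, 5_000], larger_than=2_000))  # 5000
```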


  12. Annotated using MAKER, with all complete Viridiplantae mitochondrial genomes in NCBI GenBank as evidence
    Note: the mitochondrion is assembled in 61 scaffolds and has been artificially circularized for this figure


  13. Mitochondrial Genome Annotation
    • 54 coding genes composing 50 kbp (<1%) of the genome
    • 23 transfer RNA genes and 4 ribosomal RNA genes
    • Repeats composing 400 kbp (~7%) of the genome
    Figure: size (kb, axis 0 to 50) of each annotated category
    Genes: ATP synthase; cytochrome c maturation; Complex I (NADH dehydrogenase); Complex II (succinate dehydrogenase); Complex III (ubiquinol cytochrome c reductase); Complex IV (cytochrome c oxidase); DNA polymerase; DNA-dependent RNA polymerase; maturase; membrane targeting and translocation; ribosomal proteins (LSU); ribosomal proteins (SSU); transfer RNAs; unknown
    Repeats: UNK; simple repeat; rRNA; NHF; LTR; low complexity; LINE; Copia; Gypsy; Jockey; R1
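The percentages on this slide and in the summary table are feature length divided by genome length. A short check of that arithmetic, using the 5.92 Mbp mitochondrial and 123 kbp plastid sizes from the summary table; the helper name is illustrative.

```python
def percent_of_genome(feature_bp: float, genome_bp: float) -> float:
    """Share of the genome, in percent, occupied by one feature class."""
    return 100.0 * feature_bp / genome_bp


MITO_BP = 5.92e6     # mitochondrial assembly size (summary table)
PLASTID_BP = 123e3   # plastid genome size (summary table)

print(f"mito coding genes: {percent_of_genome(50e3, MITO_BP):.2f}%")      # mito coding genes: 0.84%
print(f"mito repeats: {percent_of_genome(400e3, MITO_BP):.2f}%")          # mito repeats: 6.76%
print(f"plastid coding genes: {percent_of_genome(64e3, PLASTID_BP):.0f}%")  # plastid coding genes: 52%
```

The results agree with the "<1%", "~7%" and "52%" figures reported.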


  14. Summary of Results
                                Plastid          Mitochondrion
    Number of scaffolds         Finished         61 scaffolds
    Scaffold N50                -                287 kbp
    Genome size                 123 kbp          5.92 Mbp
    Coding genes                74               54
    Transfer RNA genes          36               23
    Ribosomal RNA genes         4                4
    Coding gene content         64 kbp (52%)     50 kbp (<1%)
    Repeat content              -                400 kbp (7%)


  15. Organellar genomes of white spruce (Picea glauca): Assembly and annotation

  16. Further Work
    • Improve the mitochondrial assembly by scaffolding and closing gaps
    • Investigate how the mitochondrial genome grew to such a large size
    • Look for evidence of transfer of DNA between the nuclear and mitochondrial genomes


  17. 500-bp MiSeq reads
    Courtesy of Robin Coope @robincoope
    Photo labels: cartridge splitter; MiSeq-XL cartridge base; MiSeq-XL reagent tray & lid; screws for reagent tray lid; splash guard


  18. Merge overlapping reads
    FastQC plot of base quality
    Courtesy of Tony Raymond @tgjraymond
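When a sequenced fragment is shorter than the two reads combined, the end of read 1 overlaps the reverse complement of read 2, and the pair can be merged into one longer read. The slide does not name the merging tool, so this is a generic overlap-based sketch with hypothetical names and parameters. It ignores base qualities (real mergers use them to resolve disagreements) and fragments shorter than a single read.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(
    read1: str,
    read2: str,
    min_overlap: int = 10,
    max_mismatch_rate: float = 0.05,
) -> Optional[str]:
    """Merge a read pair whose fragment is shorter than the two reads combined.

    read2 comes from the opposite strand, so it is reverse-complemented first.
    The longest overlap between the end of read1 and the start of the
    reverse-complemented read2 with an acceptable mismatch rate wins; where
    the reads disagree, this sketch simply keeps read1's base. Returns None
    if no overlap of at least `min_overlap` bp is found.
    """
    r2 = reverse_complement(read2)
    for olen in range(min(len(read1), len(r2)), max(min_overlap, 1) - 1, -1):
        mismatches = sum(a != b for a, b in zip(read1[-olen:], r2[:olen]))
        if mismatches <= max_mismatch_rate * olen:
            return read1 + r2[olen:]
    return None


if __name__ == "__main__":
    fragment = "ACGGACGCAGTCCAGGCAGCATTGCAGTCA"  # 30 bp fragment
    read1 = fragment[:20]                        # forward read
    read2 = reverse_complement(fragment[10:])    # reverse read, 10 bp overlap
    print(merge_pair(read1, read2, min_overlap=5, max_mismatch_rate=0.0) == fragment)  # True
```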


  19. Connecting Paired-end Reads
    Diagram labels: 2x250, 2x150, 2x300 (paired read lengths); 400 bp, 500 bp, 600 bp; "Exists?"; Bloom Filter
    Courtesy of İnanç Birol
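The "Exists?" question in the diagram is a membership query: is this k-mer present in the reads? A Bloom filter answers it compactly with a bit array and several hash functions, at the cost of occasional false positives but no false negatives. A minimal sketch of the data structure itself, assuming SHA-256-derived bit positions (chosen for simplicity, not speed, and not how ABySS hashes k-mers); it also ignores strand, which real k-mer tools typically handle.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit array plus several hash functions; answers "Exists?" for k-mers.

    A query can return a false positive but never a false negative.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive each hash by prefixing the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping substrings of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i : i + k]


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1 << 16, num_hashes=3)
    for km in kmers("ACGTACGGTCA", 5):
        bf.add(km)
    print("GTACG" in bf)  # True: inserted k-mers are never reported missing
```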


  20. Plastid Genome Sequence
    • 9.4 million MiSeq reads of 300 bp
    • Merged the overlapping paired reads
    • 3.0 million merged reads of 492 bp median length
    • Assembled these reads using ABySS
    • Separated six plastidial sequences by length and depth of coverage


  21. Mitochondrial Genome Sequence
    • 267 million HiSeq reads of 150 bp
    • Filled the gap between the paired-end reads using a Bloom filter de Bruijn graph (ABySS-connectpairs)
    • 1.4 million merged reads of 465 bp median length
    • Assembled these reads using ABySS
    • 377 thousand merged reads (1/350 or 0.3%) map to the assembled mitochondrion
    • 30-fold coverage of the mitochondrion
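The 30-fold figure follows from expected depth = number of reads × read length / genome size. A quick check, using the 377 thousand mapped merged reads, the 465 bp median merged-read length (standing in for the mean, so this is an approximation) and the 5.92 Mbp mitochondrial assembly size from the summary table; the function name is illustrative.

```python
def depth_of_coverage(num_reads: int, read_length: float, genome_size: float) -> float:
    """Expected depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


# 377 thousand merged reads x 465 bp over a 5.92 Mbp mitochondrion.
print(round(depth_of_coverage(377_000, 465, 5.92e6)))  # 30
```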


  22. Mitochondrial Genome Comparison
    • The white spruce putative mitochondrial sequence is 6.0 Mbp in 78 scaffolds larger than 2 kbp with a 157 kbp N50
    • The Norway spruce putative mitochondrial sequence is 5.5 Mbp in 294 scaffolds larger than 4 kbp with a 28 kbp N50
    • 3.3 Mbp of these two assemblies align to each other with BWA
    • 98.3% identity and 60% coverage of the Norway spruce putative mitochondrial sequence
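Identity and coverage answer different questions: identity is the fraction of aligned columns that match, while coverage is the fraction of the reference spanned by at least one alignment (3.3 Mbp of 5.5 Mbp gives the 60% above). BWA reports alignments rather than these summaries, so this sketch assumes match counts and aligned reference intervals have already been extracted from the alignment records. Definitions of identity differ between tools, for example in whether gap columns count as aligned.

```python
from typing import Iterable, Tuple


def percent_identity(matches: int, aligned_columns: int) -> float:
    """Matching columns as a percentage of aligned columns."""
    return 100.0 * matches / aligned_columns


def percent_covered(intervals: Iterable[Tuple[int, int]], reference_length: int) -> float:
    """Percentage of reference bases covered by at least one alignment.

    Intervals are half-open (start, end) reference coordinates; overlapping
    intervals are merged so that no base is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current run
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / reference_length


if __name__ == "__main__":
    print(percent_covered([(0, 3_300_000)], 5_500_000))         # 60.0
    print(percent_covered([(0, 10), (5, 20), (30, 40)], 100))   # 30.0
```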


  23. Summary of Results
    • One lane of MiSeq data assembles the 124 kbp plastid genome of white spruce
    • One lane of HiSeq data assembles the estimated 6 Mbp mitochondrial genome of white spruce
    • Aligned to the complete plastid genome (NC_021456) and putative mitochondrial sequences of Norway spruce

    Alignment         Identity    Coverage
    Plastid           99.2%       99.2%
    Mitochondrion     98.3%       60%
