Kamil S. Jaron polyploidy webinar


Kamil S Jaroň

April 20, 2020

Transcript

  1. Characterising features of polyploid genomes directly from sequencing reads. Kamil S. Jaron, @KamilSJaron, April 20, 2020 1
  2. Genome assembly is difficult! 2

  3. Assembly errors: a diploid genome with a duplication and a heterozygous region, after sequencing and genome assembly, can end up with separately assembled alleles and collapsed paralogs. 3
  4. Genome assembly is even more difficult for polyploids! The more prior information, the better! 4
  5. Non-sequencing techniques: genome size estimate via flow cytometry (Gutekunst et al. 2018); ploidy via karyotype staining (Schwander & Crespi 2009) 5
  6. Features from raw sequencing reads

    An individual genome: genome size, repetitiveness, transposable element loads, heterozygosity, ploidy
    Population genomics: divergence of two individuals, GWAS
    (references in supplementary slides) 6
  7. Kmer spectra analysis: decomposition of reads to all consecutive kmers

    read: ATCTAAACGATCGATCGATCGA
    kmers (k = 7): ATCTAAA, TCTAAAC, CTAAACG, TAAACGA, ... 7
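
The decomposition on this slide can be sketched in a few lines of Python. This is only an illustration of the idea; the function name `kmers` is made up for this example and real kmer counters are far more efficient.

```python
def kmers(read: str, k: int) -> list[str]:
    """Return all consecutive kmers of length k in a read, left to right."""
    # A read of length L contains L - k + 1 kmers; shorter reads give none.
    return [read[i:i + k] for i in range(len(read) - k + 1)]


read = "ATCTAAACGATCGATCGATCGA"  # the read shown on the slide
for kmer in kmers(read, 7)[:4]:
    print(kmer)
```

The first four kmers printed are the ones listed on the slide (ATCTAAA, TCTAAAC, CTAAACG, TAAACGA).
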
  8. Reads to kmers 8

  9. Kmer coverage

    kmer              coverage
    AAAAAAAAAACAACGT  28
    AAAAATAACACAACGT  31
    AAAAATAACACAACGG  1
    AAAAATAACGCAACGT  47
    AAAATTTACGCAACGT  2
    AAAATTTACGCAACGA  17
    ...               ...
    9
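
A minimal sketch of how a kmer coverage table and the kmer spectrum (how many distinct kmers have each coverage) could be computed. The function names are hypothetical, and collapsing each kmer with its reverse complement into a "canonical" form is an assumption of this sketch; dedicated counters such as KMC should be used on real data.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a kmer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_coverage(reads: list[str], k: int) -> Counter:
    """Count how many times each canonical kmer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts


def kmer_spectrum(coverage: Counter) -> Counter:
    """Histogram: coverage value -> number of distinct kmers with that coverage."""
    return Counter(coverage.values())


# Toy reads, invented for illustration only.
coverage = kmer_coverage(["ACGTA", "ACGTA", "TACGT"], k=4)
print(coverage)
print(kmer_spectrum(coverage))
```

The spectrum, not the table itself, is what the following slides plot.
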
  10. kmer spectrum of a homozygous genome 10

  11. kmer spectrum of a heterozygous genome 11

  12. GenomeScope: given kmer spectra, what is... the 1n coverage, the genome length, heterozygosity, repetitiveness (% of the genome that is unique) (Vurture et al. 2017) 12
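
To give a feel for how genome length relates to a kmer spectrum, here is a back-of-the-envelope sketch. It is not GenomeScope's model fit: it simply divides the number of non-error kmer occurrences by the 1n coverage and the ploidy. The function name, the error cutoff `min_coverage` and the toy spectrum are all assumptions made for this example.

```python
def estimate_haploid_length(spectrum: dict[int, int],
                            haploid_coverage: float,
                            ploidy: int = 2,
                            min_coverage: int = 1) -> float:
    """Rough haploid genome length (in kmers) from a kmer spectrum.

    spectrum maps coverage -> number of distinct kmers with that coverage.
    Kmers below min_coverage are treated as sequencing errors and ignored.
    """
    total_kmer_occurrences = sum(
        cov * n for cov, n in spectrum.items() if cov >= min_coverage
    )
    # Every haplotype contributes about haploid_coverage occurrences per position.
    return total_kmer_occurrences / (ploidy * haploid_coverage)


# Invented diploid spectrum: an error peak at 1, a 1n peak at 25, a 2n peak at 50.
toy_spectrum = {1: 5000, 25: 40, 50: 980}
print(estimate_haploid_length(toy_spectrum, haploid_coverage=25, ploidy=2, min_coverage=10))
```

GenomeScope instead fits a mixture model to the whole spectrum, which is what also yields the heterozygosity and repetitiveness estimates.
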
  13. GenomeScope 2.0: given ploidy, for up to hexaploids ... Rhyker Ranallo-Benavidez 13
  14. kmer spectrum of a triploid genome

    Homozygous: 3n
    Heterozygous: 1n + 2n, or 3 x 1n
    Axis: genomic copies (1n, 2n, 3n) 14
  15. Model fit to the triploid genome 15

  16. kmer spectrum of a polyploid genome 16

  17. Is there more information in the kmer spectra to utilise? Every heterozygous kmer has its sister kmer sequenced as well. 17
  18. 18

  19. Search for kmer pairs! kmers that differ in a single SNP; kmers that form a unique pair 19
  20. Kmer pairs

    kmer   coverage
    AAAAA  128
    AAAAT  41
    AAAAC  57
    AAATT  31
    AAATA  38
    AAACG  67
    ...    ...
    20
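
A naive sketch of the pair search described on the previous two slides: keep kmers that differ in exactly one nucleotide and that have no other such neighbour. This is an illustration only and should not be read as Smudgeplot's actual implementation; the function names and the toy coverage table are invented, and reverse complements are ignored for simplicity.

```python
def one_snp_neighbours(kmer: str, kmers: dict[str, int]) -> list[str]:
    """Return all kmers in the table that differ from kmer in exactly one base."""
    found = []
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                candidate = kmer[:i] + alt + kmer[i + 1:]
                if candidate in kmers:
                    found.append(candidate)
    return found


def unique_kmer_pairs(coverage: dict[str, int]) -> list[tuple[str, str]]:
    """Return pairs where each kmer is the other's only one-SNP neighbour."""
    neighbours = {kmer: one_snp_neighbours(kmer, coverage) for kmer in coverage}
    pairs = []
    for kmer, found in neighbours.items():
        if len(found) == 1:
            partner = found[0]
            # kmer < partner reports each pair once.
            if len(neighbours[partner]) == 1 and kmer < partner:
                pairs.append((kmer, partner))
    return sorted(pairs)


# Invented toy table: ACGTA/ACGTC form a unique pair, the TTGC* kmers do not.
toy_coverage = {"ACGTA": 41, "ACGTC": 38, "TTGCA": 50, "TTGCC": 47, "TTGCG": 52, "GGATC": 90}
print(unique_kmer_pairs(toy_coverage))
```

Applied to the six kmers listed on the slide, this sketch returns no pairs, because every kmer there has either several one-SNP neighbours or none; the uniqueness requirement is what filters out such ambiguous groups.
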
  21. The idea: kmer pairs are mostly two alleles of the same locus. 21
  22. Two informative transformations:

    covA + covB ≈ number of genomic copies (scaled by the 1n coverage)
    covB / (covA + covB) ≈ proportion of the kmers of the less represented allele 22
  23. 1n coverage = 100

    structure  A + B  B / (A + B)
    AB         200    0.5
    AAB        300    0.33
    AABB       400    0.5
    AAAB       400    0.25
    AAAAB      500    0.2
    ...        ...    ...
    23
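
The two transformations and the table above can be reproduced with a short sketch. The function names are hypothetical, and rounding coverages to whole multiples of the 1n coverage is an assumption of this example; real data are noisy, which is why the result is a smudge on a 2D histogram rather than a single point.

```python
def smudge_coordinates(cov_a: float, cov_b: float) -> tuple[float, float]:
    """Return (covA + covB, covB / (covA + covB)), with B the less covered kmer."""
    minor, major = sorted((cov_a, cov_b))
    total = minor + major
    return total, minor / total


def guess_structure(cov_a: float, cov_b: float, haploid_coverage: float) -> str:
    """Guess a structure such as 'AAB' by rounding coverages to 1n multiples."""
    minor, major = sorted((cov_a, cov_b))
    major_copies = round(major / haploid_coverage)
    minor_copies = round(minor / haploid_coverage)
    return "A" * major_copies + "B" * minor_copies


# Kmer pairs with made-up coverages, using 1n coverage = 100 as on the slide.
for cov_a, cov_b in [(100, 100), (200, 100), (205, 195), (300, 100), (400, 100)]:
    total, ratio = smudge_coordinates(cov_a, cov_b)
    print(guess_structure(cov_a, cov_b, 100), total, round(ratio, 2))
```

Note that AABB and AAAB share the same sum (about four genomic copies) and are separated only by the ratio, which is why both axes are needed.
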
  24. 24

  25. Tetraploid root-knot nematode smudgeplot 25

  26. Combined GenomeScope/Smudgeplot interpretation AB AABB AAAB AAB 26

  27. Applications 27

  28. The mystery of Meloidogyne floridensis: a diploid among polyploids 28

  29. The mystery of Meloidogyne floridensis 29

  30. 30

  31. 31

  32. What about theoretically homozygous genomes? 32

  33. Heterozygosity after gamete duplication? 33

  34. Heterozygosity after gamete duplication? 34

  35. No heterozygosity ⇒ we plot the paralog structure 35

  36. 36

  37. Octoploid strawberry smudgeplot! 37

  38. Main pitfall: duplications >> heterozygosity. A diploid strawberry estimated as tetraploid. 38
  39. Considerations 39

  40. GenomeScope assumptions:

    uniform distribution of heterozygous loci and duplications
    the coverage estimate is right
    heterozygosity is "reasonable" (< 12%)

    GenomeScope strengths:

    negligible computational time, scales to any dataset
    works for any coverage that allows a fit of the model
    unbelievably accurate genome size estimate*

    *There are conditions to it, check the resources in the supplement 40
  42. Smudgeplot assumptions:

    heterozygosity >> paralogy
    only a single genome with decent coverage
    computationally it's still rather demanding (biggest so far: 3.5 Gbp triploid; sorry plant people :-/)

    Smudgeplot strengths:

    strong descriptive visualization technique, especially paired up with GenomeScope 2.0
    decent 1n estimate 41
  44. Genome assembly is still useful, but it's good to have the big picture before attempting anything as complicated as genome assembly. Do you have a weird genome to share? https://kamilsjaron.github.io/peculiar-genomic-observations/ 42
  46. A big shout-out to my collaborators! Rhyker Ranallo-Benavidez, Michael Schatz 43
  47. Resources

    Kmer methods
    GenomeScope 2.0: https://github.com/tbenavi1/genomescope2.0
    Smudgeplot: https://github.com/KamilSJaron/smudgeplot
    KMC, a really fast kmer counter: https://github.com/refresh-bio/KMC
    KAT, versatile kmer analysis tools, especially useful to make sense of kmer spectra and genome assembly: https://github.com/TGAC/KAT
    kmer GWAS papers: https://doi.org/10.1038/s41588-020-0612-7, https://doi.org/10.1101/2020.04.14.040675

    Other kmer resources
    Zen and the art of k-mers, a good kmer introduction: https://dib-lab.github.io/zen-khmer/index.html
    Peculiar genomics cases, a collection of strange observations in genomics: https://kamilsjaron.github.io/peculiar-genomic-observations/
    Genome size estimate considerations: benchmarking of flow cytometry and kmer based methods using the Rooibos genome by Mgwatyu et al. 2020, https://doi.org/10.3390/plants9020270; a peculiar case of marbled crayfish genome size, https://kamilsjaron.github.io/peculiar-genomic-observations/biological/2020/01/crayfish.html
    More details about the strawberry case: https://github.com/KamilSJaron/smudgeplot/wiki/strawberry-tutorial
    44
  48. References

    Vurture, G. W., et al. (2017). GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics, 33(14), 2202-2204.
    Ranallo-Benavidez, T. R., Jaron, K. S., & Schatz, M. C. (2020). GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nature Communications, 11, 1432. https://doi.org/10.1038/s41467-020-14998-3
    Karyotype from Schwander, T., & Crespi, B. J. (2009). Multiple direct transitions from sexual reproduction to apomictic parthenogenesis in Timema stick insects. Evolution: International Journal of Organic Evolution, 63(1), 84-103.
    Flow cytometry picture from Gutekunst, J., Andriantsoa, R., Falckenhayn, C., Hanna, K., Stein, W., Rasamy, J., & Lyko, F. (2018). Clonal genome evolution and rapid invasive spread of the marbled crayfish. Nature Ecology & Evolution, 2(3), 567.
    Sequencing errors in kmers picture from https://dib-lab.github.io/zen-khmer/manipulating-histograms.html
    Root-knot phylogeny from Lunt, D. H., Kumar, S., Koutsovoulos, G., & Blaxter, M. L. (2014). The complex hybrid origins of the root knot nematodes revealed through comparative genomics. PeerJ, 2, e356.
    45
  49. Thank you for your attention! 46