
Kamil S. Jaron polyploidy webinar


Kamil S Jaroň

April 20, 2020

Transcript

  1. Non-sequencing techniques

    Genome size estimate via flow cytometry (Gutekunst et al. 2018); ploidy via karyotype staining (Schwander & Crespi 2009).
  2. Features from raw sequencing reads

    An individual genome: genome size, repetitiveness, transposable element loads, heterozygosity, ploidy. Population genomics: divergence of two individuals, GWAS. References in supplementary slides.
  3. Kmer spectra analysis: decomposition of reads to all consecutive kmers

    read:           ATCTAAACGATCGATCGATCGA
    kmers (k = 7):  ATCTAAA, TCTAAAC, CTAAACG, TAAACGA, ...
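
The decomposition on this slide can be sketched in a few lines of Python. This is only an illustration of the idea; the function name `kmers` is made up here and is not part of any of the tools mentioned in the talk.

```python
def kmers(read: str, k: int) -> list[str]:
    """Return all consecutive kmers of a read, in order of position."""
    # A read of length L contains L - k + 1 kmers.
    return [read[i:i + k] for i in range(len(read) - k + 1)]


read = "ATCTAAACGATCGATCGATCGA"  # the read shown on the slide
print(kmers(read, 7)[:4])  # ['ATCTAAA', 'TCTAAAC', 'CTAAACG', 'TAAACGA']
```
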
  4. Kmer coverage

    kmer              coverage
    AAAAAAAAAACAACGT  28
    AAAAATAACACAACGT  31
    AAAAATAACACAACGG  1
    AAAAATAACGCAACGT  47
    AAAATTTACGCAACGT  2
    AAAATTTACGCAACGA  17
    ...               ...
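
A toy sketch of how such a coverage table, and the kmer spectrum derived from it, could be built. The reads and the function names are invented for illustration. The sketch ignores reverse complements and is not how a dedicated counter such as KMC works internally.

```python
from collections import Counter


def count_kmers(reads, k):
    """Coverage table: kmer -> number of times it occurs in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Kmer spectrum: coverage -> number of distinct kmers with that coverage."""
    return Counter(counts.values())


reads = ["AAACG", "AACGT", "AAACG"]  # hypothetical toy reads
counts = count_kmers(reads, 3)
spectrum = kmer_spectrum(counts)
print(dict(counts))    # {'AAA': 2, 'AAC': 3, 'ACG': 3, 'CGT': 1}
print(dict(spectrum))  # {2: 1, 3: 2, 1: 1}
```
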
  5. GenomeScope: given kmer spectra, what is...

    1n coverage; the genome length; heterozygosity; repetitiveness (% of the genome that is unique). Vurture et al. 2017
  6. Is there more information in the kmer spectra to utilise?

    Every heterozygous kmer has its sister kmer sequenced as well.

  8. Search for kmer pairs!

    Kmers that differ by a single SNP; kmers that form a unique pair.
  9. Kmer pairs

    kmer   coverage
    AAAAA  128
    AAAAT  41
    AAAAC  57
    AAATT  31
    AAATA  38
    AAACG  67
    ...    ...
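
A naive sketch of the pair search described above: find kmers that differ in exactly one nucleotide and keep only pairs where each kmer has no other such neighbour. The coverage table and function names are hypothetical, and this brute-force approach is an assumption for illustration, not Smudgeplot's actual implementation.

```python
def one_mismatch_neighbours(kmer, alphabet="ACGT"):
    """Yield every sequence that differs from kmer at exactly one position."""
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def unique_kmer_pairs(coverage):
    """Return sorted pairs of kmers that are each other's only neighbour."""
    neighbours = {
        kmer: [n for n in one_mismatch_neighbours(kmer) if n in coverage]
        for kmer in coverage
    }
    pairs = set()
    for kmer, found in neighbours.items():
        # Unique pair: exactly one neighbour, and that neighbour points only back.
        if len(found) == 1 and neighbours[found[0]] == [kmer]:
            pairs.add(tuple(sorted((kmer, found[0]))))
    return sorted(pairs)


# Hypothetical coverage table (not the one from the slide).
coverage = {
    "ACGTA": 52, "ACGTC": 48,              # a unique pair
    "GGATC": 101,                          # no neighbour at all
    "TTTTT": 30, "TTTTA": 28, "TTTTC": 33  # three mutual neighbours, not unique
}
print(unique_kmer_pairs(coverage))  # [('ACGTA', 'ACGTC')]
```
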
  10. Two informative transformations

    For a kmer pair with coverages covA ≥ covB:
    covA + covB ≈ number of genomic copies × 1n coverage
    covB / (covA + covB) ≈ proportion of the kmers of the less represented allele
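
As a worked example, the two transformations can be computed for a single kmer pair as follows. The function name and the coverage values are invented for illustration.

```python
def pair_transformations(cov_a, cov_b):
    """Return (total coverage, fraction carried by the less covered kmer)."""
    minor, major = sorted((cov_a, cov_b))  # make sure the minor kmer is "B"
    total = minor + major
    return total, minor / total


total, minor_fraction = pair_transformations(52, 48)  # hypothetical pair
print(total, minor_fraction)  # 100 0.48
```

With a 1n coverage of about 50, a total of 100 would correspond to two genomic copies, and a minor fraction close to 0.5 to an AB structure.
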
  11. 1n coverage = 100

    structure  A + B  B / (A + B)
    AB         200    0.5
    AAB        300    0.33
    AABB       400    0.5
    AAAB       400    0.25
    AAAAB      500    0.2
    ...        ...    ...
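
The table can be reproduced from the genome structure alone, assuming each genomic copy contributes one 1n coverage. The helper name `expected_smudge` is made up for this sketch.

```python
def expected_smudge(structure, cov_1n):
    """Expected (A + B, B / (A + B)) for a structure string such as 'AAB'."""
    n_a = structure.count("A")
    n_b = structure.count("B")
    copies = n_a + n_b
    return copies * cov_1n, n_b / copies


for structure in ["AB", "AAB", "AABB", "AAAB", "AAAAB"]:
    total, minor_fraction = expected_smudge(structure, cov_1n=100)
    print(structure, total, round(minor_fraction, 2))
# AB 200 0.5
# AAB 300 0.33
# AABB 400 0.5
# AAAB 400 0.25
# AAAAB 500 0.2
```
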

  16. GenomeScope

    Assumptions: uniform distribution of heterozygous loci and duplications; the coverage estimate is right; heterozygosity is "reasonable" (< 12%).
    Strengths: negligible computational time, scales to any dataset; works for any coverage that allows a fit of the model; unbelievably accurate genome size estimate*.
    *There are conditions to it; check the resources in the supplement.
  18. Smudgeplot

    Assumptions: heterozygosity >> paralogy; only a single genome with decent coverage; computationally it's still rather demanding (biggest so far: 3.5 Gbp triploid; sorry plant people :-/).
    Strengths: strong descriptive visualization technique, especially paired up with GenomeScope 2.0; decent 1n estimate.
  20. Genome assembly is still useful, but it's good to have a big-scale picture before anything as complicated as genome assembly.

    Do you have a weird genome to share? https://kamilsjaron.github.io/peculiar-genomic-observations/
  22. Resources

    Kmer methods
    GenomeScope 2.0: https://github.com/tbenavi1/genomescope2.0
    Smudgeplot: https://github.com/KamilSJaron/smudgeplot
    KMC, a really fast kmer counter: https://github.com/refresh-bio/KMC
    KAT, versatile kmer analysis tools, especially useful to make sense of kmer spectra and genome assembly: https://github.com/TGAC/KAT
    kmer GWAS papers: https://doi.org/10.1038/s41588-020-0612-7, https://doi.org/10.1101/2020.04.14.040675

    Other kmer resources
    Zen and the art of k-mers, a good kmer introduction: https://dib-lab.github.io/zen-khmer/index.html
    Peculiar genomics cases, a collection of strange observations in genomics: https://kamilsjaron.github.io/peculiar-genomic-observations/

    Genome size estimate considerations
    Benchmarking of flow cytometry and kmer based methods using Rooibos genome by Mgwatyu et al. 2020: https://doi.org/10.3390/plants9020270
    A peculiar case of marbled crayfish genome size: https://kamilsjaron.github.io/peculiar-genomic-observations/biological/2020/01/crayfish.html

    More details about the strawberry case: https://github.com/KamilSJaron/smudgeplot/wiki/strawberry-tutorial
  23. References

    Vurture, G. W., et al. (2017). GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics, 33(14), 2202-2204.
    Ranallo-Benavidez, T. R., Jaron, K. S., & Schatz, M. C. (2020). GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nature Communications, 11, 1432. https://doi.org/10.1038/s41467-020-14998-3
    Karyotype from: Schwander, T., & Crespi, B. J. (2009). Multiple direct transitions from sexual reproduction to apomictic parthenogenesis in Timema stick insects. Evolution, 63(1), 84-103.
    Flow cytometry picture from: Gutekunst, J., Andriantsoa, R., Falckenhayn, C., Hanna, K., Stein, W., Rasamy, J., & Lyko, F. (2018). Clonal genome evolution and rapid invasive spread of the marbled crayfish. Nature Ecology & Evolution, 2(3), 567.
    Sequencing errors in kmers picture from: https://dib-lab.github.io/zen-khmer/manipulating-histograms.html
    Root knot phylogeny from: Lunt, D. H., Kumar, S., Koutsovoulos, G., & Blaxter, M. L. (2014). The complex hybrid origins of the root knot nematodes revealed through comparative genomics. PeerJ, 2, e356.