Haplotype resolved human genomes

Arang Rhie
January 16, 2019

@PacBio Developer's Session

Transcript

  1. Haplotype resolved human genomes. Arang Rhie (@ArangRhie), Adam Phillippy’s Group, Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI
  2. The diploid genome assembly problem. De novo: from scratch, without looking at the original picture (reference).
    [Figure labels: Diploid genome; sequencing; Sequenced reads; assembling; Smashed assembly; phasing ?; Phased (haploid) assembly; Pseudo-haplotype + alts]
  3. Under-represented variations in GRCh38: Asian-specific insertions and their frequency, found from AK1
    Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
  4. Identify haplotype differences (panels A and B; Chr. 22)
    • CYP2D6 is involved in metabolizing >50% of available drugs
    • Genetic variation and copy number affect drug efficacy
    Alleles shown: CYP2D6*10 (intermediate ~ poor metabolizer); CYP2D6*2 (extensive metabolizer)
    Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
  5. Can we phase across the whole chromosomes?
    Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
  6. The diploid genome assembly problem. De novo: from scratch, without looking at the original picture (reference).
    [Figure labels: Diploid genome; sequencing; Sequenced reads; assembling; Smashed assembly; phasing ?; Phased (haploid) assembly; Complete haplotypes]
  7. The diploid genome assembly problem. De novo: from scratch, without looking at the original picture (reference).
    [Figure labels: Diploid genome; sequencing; Phased reads (two sets); assembling (twice); Paternal assembly; Maternal assembly; ?]
  8. Trio binning with parental k-mers
    • K-mer profiling of each parent (Illumina, 60x)
    • K-mer profiling of the child (PacBio, 120x)
    • Child’s read binning and assembling (canu)
    [Figure labels: Paternal k-mers; Maternal k-mers; Child; Paternal reads; Maternal reads; Paternal haplotigs; Maternal haplotigs]
    Read-bin statistics in slide order (under the labels Child, Paternal, Maternal): 49.6% (67.3x), 10.9 kb; 49.3% (66.9x), 11.7 kb; 1.1% (1.4x), avg 1.3 kb
    Koren and Rhie et al., De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
  9. Robust for a wide range of heterozygosity
    Heterozygosity values shown (*heterozygosity level estimated with GenomeScope): 0.8%, 1.2%, 1.6%, 0.9%, 1.5%, 0.12%, 0.20%, 0.29%, 0.17%
    Sample; Platform; Haplotype (Cov.) with NG50 (Mb):
    • NA12878 (CEU) F; PacBio (WashU); Maternal (32+9x) 1.2, Paternal (31+9x) 1.2
    • HG00733 (PUR) F; PacBio 60kb (20kb); Maternal (44.6x) 19.1, Paternal (43.6x) 23.9
    • NA19240 (YRI) F; PacBio (WashU); Maternal (37x) 9.0, Paternal (31x) 3.0
    • HG002 (Ashkenazi) M; PacBio 15kb CCS; Maternal (11+8x) 17.0, Paternal (11+8x) 14.9
  10. The HG002 CCS Assembly: HG002 (Ashkenazi) M
    • Platform: PacBio 15kb CCS
    • Haplotype (Cov.): Maternal (11+8x), Paternal (11+8x)
    • NG50 (Mb): 17.0 (maternal), 14.9 (paternal)
    • Size (Gb): 3.04 (maternal), 2.96 (paternal)
  11. A nearly perfect diploid genome: 125x PacBio coverage (~60x per haplotype), TrioCanu haplotig NG50 ~70 Mbp, BUSCOs 94%
    [Figure: chromosome plots for Esperanza, Maternal (yak) 1 to 29 and X, Paternal (highland) 1 to 29 and X; GRCh38 1 to 22 and X]
  12. Human Pan-Genome Project: initiative to collect diverse, high-quality haplotypes with trio binning
    • Illumina WGS for the parents, PacBio and Nanopore for the child
    • Pilot 10 trios selected to maximize non-ref haplotype AF
    Populations: 2 PUR, 1 KHV, 3 ACB, 1 MSL, 1 PJL, 1 GWD, 1 CLM (5 African, 3 American, 1 East Asian, 1 South Asian)
    Population: http://www.internationalgenome.org/
  13. What can you see from a phased assembly?
    Koren and Rhie et al., De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
  14. Phasing the MHC region
    Koren and Rhie et al., De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
  15. No more short-read polishing!
    • Adding coverage hurts quality!
    • 4QV drop
    • Quality estimate for haplotypes >99.99% (QV45)
    • Indels are haplotype mixing, not tech
    • Need a trio
    • Illumina-polishing not necessary (and can hurt)
    • Current diploid Arrow not enough (3QV drop)
    [Figure: Frameshift corrected protein-coding genes in bovids. Technology labels as extracted (grouping unclear): Illumina 454 + PacBio + Illumina 454 + Illumina SOLiD PacBio + Illumina PacBio Trio; PacBio ?; years 2013, 2014, 2015, 2016, 2018]
    Annotation data courtesy of NCBI (Francoise Thibaud-Nissen)
  16. Summary
    • Diploid assembly is solved by trios
        Trio binning is current best practice
        All levels of assembly quality improved
        Complete haplotypes will become the new norm
    • A human pan-genome reference
        A collection of diverse, high-quality haplotypes
        Including complex heterozygous SVs
  17. FALCON-Phase as an alternative?
    • Investigating ways to improve for less het. genomes
    [Figure labels: FALCON-Phase; Trio-binning; HG002 (0.17); Angus x Brahman (0.93); bTaeGut2 (1.2)]
    Kronenberg et al., FALCON-Phase: Integrating PacBio and Hi-C data for phased diploid genomes, BioRxiv (2018)
  18. VGP GenomeArk: 1st data release https://vgp.github.io/genomeark
    Photo caption: Jennifer Vashon of Maine Department of Inland Fisheries and Wildlife, left, and UMass lynx team coordinator, Tanya Lama, with an adult male lynx from northern Maine whose DNA was used to create first-ever whole genome for the species. The lynx has since been released to the wild. (MassWildlife photo / Bill Byrne)
  19. Acknowledgements
    • genomeinformatics.github.io: Adam Phillippy, Sergey Koren, Brian Walenz, Alexander Dilthey, Brian Ondov, Jay Ghurye
    • Korean (AK1): Jeong-Sun Seo, Changhoon Kim, Junsoo Kim, Sangjin Lee
    • Cattle/pigs: Tim Smith, John Williams
    • Pan-Genome: Karen Miga, Benedict Paten
    • NIH NHGRI NISC
    • VGP Assembly Working Group: Erich Jarvis, Richard Durbin, Gene Myers, Kerstin Howe, Harris Lewin, Olivier Fedrigo, Shane McCarthy, Martin Pippel, Will Chow, Joana Damas
    • PacBio CCS: Michael Hunkapiller, Paul Peluso, David Rank
    Trio binning is available in https://github.com/marbl/canu
  20. [Figure labels: Assembly Graph; Smashed haplotypes; Pseudo-haplotype + alts; Complete haplotypes]
    Koren and Rhie et al., De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
  21. Trio-binning outperforms FALCON-Unzip
    • Primary = longest path in the graph (pseudo-hap)
    • Alternate haplotigs = alternate path in the bubble
    • Haplotigs = contigs in each assembly agree with parental haplotypes (phased)
    [Figure: TrioCanu and FALCON-Unzip panels; axes: Angus specific k-mer counts, Brahman specific k-mer counts]
    Koren and Rhie et al., De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
  22. Phasing NA12878
    [Figure labels: TrioCanu; FALCON-Unzip; Supernova]
    Koren and Rhie et al., De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
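
The read-binning idea from slide 8 (profile k-mers in each parent, then sort the child's long reads by which parent's k-mers they carry) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Canu's implementation: the function names, the `min_count` error filter, the short k used in the example, and the handling of ties are all hypothetical choices made for this sketch, and scoring a read by the fraction of each parent's marker set it hits is a simplification.

```python
from collections import Counter

_DNA = frozenset("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer of seq in canonical form (the lexicographically
    smaller of the k-mer and its reverse complement), skipping k-mers
    that contain characters other than A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if not _DNA.issuperset(kmer):
            continue
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        yield min(kmer, revcomp)


def kmer_set(reads, k, min_count=1):
    """Collect canonical k-mers seen at least min_count times.
    min_count is a crude, assumed stand-in for sequencing-error filtering."""
    counts = Counter()
    for read in reads:
        counts.update(canonical_kmers(read, k))
    return {kmer for kmer, n in counts.items() if n >= min_count}


def parent_specific_kmers(paternal_reads, maternal_reads, k, min_count=1):
    """Return (paternal-only, maternal-only) k-mer sets."""
    pat = kmer_set(paternal_reads, k, min_count)
    mat = kmer_set(maternal_reads, k, min_count)
    return pat - mat, mat - pat


def bin_read(read, pat_only, mat_only, k):
    """Assign a child read to 'paternal', 'maternal', or 'unclassified'.
    Scores are marker hits scaled by marker-set size (an assumption of
    this sketch); equal scores, including zero hits, stay unclassified."""
    kmers = set(canonical_kmers(read, k))
    pat_score = len(kmers & pat_only) / len(pat_only) if pat_only else 0.0
    mat_score = len(kmers & mat_only) / len(mat_only) if mat_only else 0.0
    if pat_score > mat_score:
        return "paternal"
    if mat_score > pat_score:
        return "maternal"
    return "unclassified"


if __name__ == "__main__":
    # Toy parents differing by a single base; k=5 only for readability.
    pat_only, mat_only = parent_specific_kmers(
        ["ACGTTGCAAT"], ["ACGTTGGAAT"], k=5
    )
    for read in ["CGTTGCAA", "CGTTGGAA", "ACGTTG"]:
        print(read, bin_read(read, pat_only, mat_only, k=5))
```

With the toy input, the first read carries only paternal markers, the second only maternal markers, and the third covers sequence shared by both parents, so it stays unclassified. Each bin would then be assembled separately, which is the step the slides attribute to canu.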