Trio binning generates nearly complete zebra finch genomes and reveals complex haplotypes and sex chromosome differences

Arang Rhie
January 15, 2019

The zebra finch is a songbird that has been studied especially for its ability to communicate through learned vocalizations, like humans. Male zebra finches develop a specialized brain pathway for vocal learning and learn how to sing, whereas in females this pathway atrophies and they don’t learn to sing as adults. Thus, the trait in this species is sex-linked. Despite its importance for understanding vocal learning, the current Sanger-based zebra finch reference genome is from a male (the homogametic sex), lacks the W chromosome, and suffers from low contiguity. Here, we generated a nearly complete diploid genome assembly of a female zebra finch, using a scaffolding modification of our recently developed trio binning approach. This approach uses short sequence reads of the parents to partition long, single-molecule reads of the offspring into individual haplotypes. Using this approach, we successfully assembled >95% of the expected haploid genome size for each haplotype. The contigs were brought together using the 10K Vertebrate Genomes Project (VGP) pipeline with 10X Genomics linked reads, longer-range Bionano optical maps, and chromosome-range Arima Genomics Hi-C interactions to generate independently assembled chromosomal-scale scaffolds of each haplotype, including both the Z and W sex chromosomes. Comparison of this assembly with a non-trio VGP approach applied to the same female data and the same reference male revealed that the trio approach resulted in a much more complete and accurate chromosome assembly of both haplotypes. Direct comparison of the haplotypes revealed highly heterozygous regions, including genes important for vocal learning brain circuits. This is the first haplotype-resolved assembly of an avian genome, which will be key to understanding the role of allele-specific expression in vocal learning. More generally, we demonstrate that it is now possible to assemble and compare fully resolved haplotype chromosomes for complex vertebrate genomes.
@Plant and Animal Genome 2019, Avian Genomics - Going Wild!
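
The read-partitioning idea described in the abstract (parental short reads are used to sort the offspring's long reads by haplotype) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tiny k, and the scoring rule (hits scaled by the size of each parent-specific k-mer set) are assumptions made for illustration. Real pipelines use dedicated k-mer counters, much larger k, and filter low-count k-mers that are likely sequencing errors.

```python
from typing import Iterable, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Return the canonical k-mers of seq (the smaller of each k-mer and its reverse complement)."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, revcomp))
    return kmers


def parent_specific_kmers(
    paternal_reads: Iterable[str], maternal_reads: Iterable[str], k: int
) -> Tuple[Set[str], Set[str]]:
    """Return (k-mers seen only in the father, k-mers seen only in the mother)."""
    pat = set().union(*(canonical_kmers(r, k) for r in paternal_reads))
    mat = set().union(*(canonical_kmers(r, k) for r in maternal_reads))
    return pat - mat, mat - pat


def bin_read(read: str, pat_only: Set[str], mat_only: Set[str], k: int) -> str:
    """Assign a child read to 'paternal', 'maternal', or 'unassigned'."""
    kmers = canonical_kmers(read, k)
    # Hypothetical scoring: scale hit counts by the size of each parent-specific set,
    # so a parent with more specific k-mers does not win by default.
    pat_score = len(kmers & pat_only) / len(pat_only) if pat_only else 0.0
    mat_score = len(kmers & mat_only) / len(mat_only) if mat_only else 0.0
    if pat_score > mat_score:
        return "paternal"
    if mat_score > pat_score:
        return "maternal"
    return "unassigned"


# Toy example with k = 5 (real data would use far longer reads and larger k).
pat_only, mat_only = parent_specific_kmers(["ACGTTGCAAC"], ["ACGTTGCTTC"], k=5)
for child_read in ["ACGTTGCAAC", "GAAGCAACGT", "AACGT"]:
    print(child_read, bin_read(child_read, pat_only, mat_only, k=5))
```

Reads that carry no parent-specific k-mers (for example, reads from homozygous regions) stay unassigned in this sketch; how such reads are handled downstream is a design choice that this example does not settle.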

Transcript

  1. Arang Rhie, NHGRI, NIH, Bethesda, MD, USA. Jan. 16th 2019

    Trio binning generates nearly complete zebra finch genomes and reveals complex haplotypes and sex chromosome differences
  2. The zebra finch, Taeniopygia guttata

    Only male finches learn how to sing (vocal learners). Sexually dimorphic! ♂ bTaeGut1, ♀ bTaeGut2
  3. The 1st zebra finch genome (Warren et al., Nature 2010)

    First draft (2010); current ver: 3.2.4. Contig N50: 38.6 Kb; scaffold N50: 8.2 Mb. Plasmid, cosmid, and BAC-end Sanger sequencing (~6x). Genetic map: Chr1 ~ 28, 4 linkage groups. 3 fissions vs. chicken: Chr1, Chr1A, Chr1B; Chr4, Chr4A. Sample: bTaeGut1
  4. Long reads correct misassembled genes (Korlach and Jarvis et al., GigaScience 2017)

    [Figure: DUSP1 in the Sanger reference, PacBio primary, and PacBio alt. assemblies of bTaeGut1; repeat labels R1a, R2a, R1b, R2b]
    • Repeat in two haplotypes resolved
    • Truncated exon properly positioned
  5. The Vertebrate Genomes Project Pipeline (Rhie and VGP Assembly Working Group, in preparation)

    [Figure: pipeline diagram. Labels: PacBio contigging; 10XG scaffolding; BioNano scaffolding; Hi-C scaffolding; gap-filling & polishing; curation; final assembly; primary and alternate; exon 1, exon 2, exon 3]
    VGP Workshop: Wed. 9:30 am, Sunrise Rm
  6. The VGP finch genomes

    [Figure: Sanger ref. vs. VGP primary asm. for bTaeGut1 and bTaeGut2; each box = one chromosome; Chr Z, Chr 2, and Chr 1 + Chr1B labeled]
    Callouts: "I have both Z and W"; "I am the same bTaeGut1"
    Reported stats: Contig N50 = 12.0 Mb, Scaffold N50 = 58.4 Mb; Contig N50 = 4.0 Mb, Scaffold N50 = 67.4 Mb
  7. Trio binning with parental k-mers (Koren and Rhie et al., De novo assembly of haplotype-resolved genomes with trio binning, Nat. Biotech 2018)

    • K-mer profiling of each parent (Illumina, 60x)
    • Child's read binning and assembling
    • K-mer profiling of the child (PacBio, 97x)
    [Figure: paternal and maternal k-mers sort the child's reads into paternal reads and maternal reads, which are assembled into paternal haplotigs and maternal haplotigs]
  8. 2 genomes in 1 genome; contigs

    [Figure: panels "VGP non-trio contigs" and "Trio-binning contigs", each with Paternal and Maternal axes]
  9. Heterozygosity causes allelic duplication

    [Figure: assembled size / expected genome size (primary and alternates) and heterozygosity (%) for Tire track eel, Eastern happy, Canada Lynx, Greater Horseshoe Bat, Kakapo, Channel bull blenny, Platypus, Goode's Thornscrub Tortoise, Flier Cichlid, Two-lined caecilian, Blunt-snouted clingfish, Climbing perch, Anna's hummingbird, Pale spear-nosed Bat, Zebra Finch (male), Thorny Skate, Zebra Finch (female)]

    Assembly          Gene Dup. (%)
    FALCON-Primary    5.0
    Sanger ref.       2.5
    Trio binning      1.4

    (Roach et al., BMC Bioinformatics, 2018)
  10. 2 genomes in 1 genome; scaffolds

    [Figure: panels "Non-trio VGP scaffolds" and "Trio-binning VGP scaffolds", each with Paternal and Maternal axes; Z and W chromosomes labeled]
  11. RNA/Iso-Seq confirms allele-specific expression

    [Figure: Chr. W: 382 – 461 k and Chr. Z: 382 – 461 k; genes TXNL1, ST8SIA3, WDR7; bTaeGut2 W and bTaeGut2 Z tracks for Brain and Ovary; bTaeGut2 Brain IsoSeq and bTaeGut1 Brain IsoSeq tracks; coverage labels ~16x, ~25x, ~100x, ~200x]
  12. No more short-read polishing!

    • Adding coverage hurts quality!
    • 4QV drop
    • Quality estimate for haplotypes >99.99% (QV45)
    • Indels are haplotype mixing, not tech
    • Need a trio
    • Illumina-polishing not necessary (and can hurt)
    • Current diploid Arrow not enough (3QV drop)
    [Figure: Frameshift corrected protein-coding genes in bovids; axis 0 to 5,000; assemblies: Bos mutus, Ovis aries musimon, Bison bison, Bos indicus, Ovis aries, Capra hircus, Bos taurus, Bubalus bubalis, Bos indicus x Bos taurus; technology labels: Illumina; 454 + PacBio + Illumina; 454 + Illumina; SOLiD; PacBio + Illumina; PacBio; Trio; years: 2013, 2014, 2015, 2016, 2018; "PacBio ?"]
    Annotation data courtesy of NCBI (Francoise Thibaud-Nissen)
  13. Summary

    • Trio binning resolves diploid genomes
    • Get 2 haplotypes from 1 genome
    • Works on low-heterozygosity genomes, even human
    • Compatible with existing tools
    • Linear haplotypes, no graphs
    • No Illumina polishing required
    • Complete haplotypes as the unit of analysis
    • Compare haplotypes vs. haplotypes
  14. The Assembly Working Group

    Eric D. Jarvis, Olivier Fedrigo, Sadye Paez, Adam M. Phillippy, Arang Rhie, Sergey Koren, Zemin Ning, Kerstin Howe, William Chow, Harris Lewin, Joana Damas, Richard Durbin, Shane McCarthy, Gene Myers, Martin Pippel, Marcela U-Silva, Jonas Korlach, Ivan Sovic, Christopher Dunn, Sarah Kingan, Maria Simbirsky, Brett Hannigan, Siddarth Selvaraj, Guojie Zhang, Yang Zhou, Chai Fungtammasan
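
The slides quote two kinds of assembly metrics: contig and scaffold N50 (slides 3 and 6) and Phred-scaled quality values, QV (slide 12). As a reading aid, here is a minimal sketch of how these are conventionally defined. The function names are illustrative and are not taken from any tool used in the talk.

```python
import math
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the length L such that sequences of length >= L cover at least half of the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy: 1 - 10^(-QV/10)."""
    return 1.0 - 10 ** (-qv / 10)


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (below 1.0) to a Phred-scaled quality value."""
    return -10 * math.log10(1.0 - accuracy)


# Toy contig lengths, not data from the talk.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
print(qv_to_accuracy(45))
```

Under this definition, QV40 corresponds to 99.99% accuracy and QV45 to about 99.997%, which is consistent with the ">99.99% (QV45)" figure on slide 12. A drop of 3 to 4 QV, as mentioned there, roughly doubles the per-base error rate or more.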