Slide 1

Slide 1 text

Trio binning generates nearly complete zebra finch genomes and reveals complex haplotypes and sex chromosome differences
Arang Rhie
NHGRI, NIH, Bethesda, MD, USA
Jan. 16th, 2019

Slide 2

Slide 2 text

The zebra finch, Taeniopygia guttata
• Only male finches learn how to sing (vocal learners)
• Sexually dimorphic!
• ♂ bTaeGut1, ♀ bTaeGut2

Slide 3

Slide 3 text

The 1st zebra finch genome
Warren et al., Nature 2010
• First draft (2010); current version: 3.2.4
• Contig N50: 38.6 Kb
• Scaffold N50: 8.2 Mb
• Plasmid, cosmid, and BAC-end Sanger sequencing (~6x)
• Genetic map: Chr1 ~ 28, 4 linkage groups
• 3 fissions vs. chicken: Chr1, Chr1A, Chr1B; Chr4, Chr4A
• bTaeGut1

Slide 4

Slide 4 text

Long reads correct misassembled genes
Korlach and Jarvis et al., GigaScience (2017)
• Repeat in two haplotypes resolved
• Truncated exon properly positioned
[Figure: DUSP1 in bTaeGut1; tracks: Sanger reference, PacBio primary, PacBio alt.; repeats labeled R1a, R2a, R1b, R2b]

Slide 5

Slide 5 text

The Vertebrate Genomes Project Pipeline
Rhie and VGP Assembly Working Group, in preparation
• Contigging: PacBio
• Scaffolding: 10XG
• Scaffolding: BioNano
• Scaffolding: Hi-C
• Gap-filling & polishing
• Curation
• Final assembly: Primary, Alternate
[Figure: pipeline diagram; labels include exon 1, exon 2, exon 3]
VGP Workshop: Wed. 9:30 am, Sunrise Rm

Slide 6

Slide 6 text

The VGP finch genomes
[Figure: Sanger ref. vs. VGP primary asm. for bTaeGut1 and bTaeGut2; each box = one chromosome; labeled Chr Z, Chr 2, Chr 1 + Chr1B]
Callouts: "I have both Z and W"; "I am the same"
Assembly statistics shown, in slide order:
• Contig N50 = 12.0 Mb, Scaffold N50 = 58.4 Mb
• Contig N50 = 4.0 Mb, Scaffold N50 = 67.4 Mb
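The contig and scaffold N50 values quoted on these slides can be read as: the length L such that sequences of length L or longer cover at least half of the assembled bases. The following is a minimal sketch of that calculation, not the tool used for the talk; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (made-up lengths, not values from the assemblies above).
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, it says nothing about correctness: a misjoined scaffold raises N50 just as a correct one does.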

Slide 7

Slide 7 text

Diploid genome assembly problem

Slide 8

Slide 8 text

The genomes assembly problem
• Esperanza
• Molly, yak dam
• Duke, Highland sire
• ~1% heterozygosity

Slide 9

Slide 9 text

Smashed haplotype

Slide 10

Slide 10 text

Pseudo-haplotype

Slide 11

Slide 11 text

Complete haplotypes

Slide 12

Slide 12 text

The haplotype-resolved zebra finch
[Figure labels: 1, 2]

Slide 13

Slide 13 text

Trio binning with parental k-mers
Koren and Rhie et al., "De novo assembly of haplotype-resolved genomes with trio binning", Nat. Biotech (2018)
• K-mer profiling of each parent (Illumina, 60x): Paternal, Maternal
• K-mer profiling of the child (PacBio, 97x)
• Child's read binning and assembling: Paternal reads, Maternal reads
• Output: Paternal haplotigs, Maternal haplotigs
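The binning step can be illustrated with a toy sketch: collect k-mers seen in only one parent, then assign each child read to the parent whose private k-mers it shares more of. This is a simplified illustration, not the published implementation. It assumes error-free reads, ignores reverse complements, and skips the normalization and low-count k-mer filtering a real tool would need; all function names and sequences below are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(paternal_reads, maternal_reads, k):
    """Return (paternal-only, maternal-only) k-mer sets."""
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    return pat - mat, mat - pat


def bin_read(read, pat_only, mat_only, k):
    """Assign a child read by counting parent-specific k-mers it contains."""
    found = kmers(read, k)
    pat_hits = len(found & pat_only)
    mat_hits = len(found & mat_only)
    if pat_hits > mat_hits:
        return "paternal"
    if mat_hits > pat_hits:
        return "maternal"
    return "unassigned"  # no informative k-mers, or a tie


# Toy data (made up): the parents differ at a single base.
K = 4
pat_only, mat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], K)
for child_read in ["CGTACGT", "CGTTCGT", "ACGT"]:
    print(child_read, bin_read(child_read, pat_only, mat_only, K))
```

Each bin is then assembled on its own, which is why the result is two linear haplotype assemblies rather than one mixed one. Reads without informative k-mers stay unassigned in this sketch.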

Slide 14

Slide 14 text

2 genomes in 1 genome; contigs
[Figure: VGP non-trio contigs vs. trio-binning contigs, each shown for Paternal and Maternal]

Slide 15

Slide 15 text

Heterozygosity causes allelic duplication
[Figure: Assembled size / Exp. Genome Size (Primary and Alternates) and Heterozygosity (%) per species: Tire track eel, Eastern happy, Canada Lynx, Greater Horseshoe Bat, Kakapo, Channel bull blenny, Platypus, Goode's Thornscrub Tortoise, Flier Cichlid, Two-lined caecilian, Blunt-snouted clingfish, Climbing perch, Anna's hummingbird, Pale spear-nosed Bat, Zebra Finch (male), Thorny Skate, Zebra Finch (female)]

Assembly          Gene Dup. (%)
FALCON-Primary    5.0
Sanger ref.       2.5
Trio binning      1.4

Roach et al., BMC Bioinformatics, 2018

Slide 16

Slide 16 text

2 genomes in 1 genome; scaffolds
[Figure: non-trio VGP scaffolds vs. trio-binning VGP scaffolds, Paternal and Maternal; sex chromosomes Z and W labeled]

Slide 17

Slide 17 text

Nearly complete chromosomes
[Figure: VGP curated asm vs. Paternal and Maternal; panels labeled Chr 2, Z, and W]

Slide 18

Slide 18 text

SVs between 2 haplotypes
[Figure: Paternal vs. Maternal; CR1 labeled]

Slide 19

Slide 19 text

RNA/Iso-Seq confirms allele-specific expression
• Chr. W: 382 – 461 k; Chr. Z: 382 – 461 k
• Genes: TXNL1, ST8SIA3, WDR7
• Tracks: bTaeGut2 W and bTaeGut2 Z, each with Brain and Ovary; bTaeGut1 Brain IsoSeq; bTaeGut2 Brain IsoSeq
• Labels shown: ~16x, ~25x, ~100x, ~200x

Slide 20

Slide 20 text

No more short-read polishing!
• Adding coverage hurts quality!
• 4QV drop
• Quality estimate for haplotypes >99.99% (QV45)
• Indels are haplotype mixing, not tech
• Need a trio
• Illumina polishing not necessary (and can hurt)
• Current diploid Arrow not enough (3QV drop)
[Figure: Frameshift-corrected protein-coding genes in bovids; axis 0 to 5,000; species: Bos mutus, Ovis aries musimon, Bison bison, Bos indicus, Ovis aries, Capra hircus, Bos taurus, Bubalus bubalis, Bos indicus x Bos taurus; technologies: Illumina, 454 + PacBio + Illumina, 454 + Illumina, SOLiD, PacBio + Illumina, PacBio Trio; years: 2013, 2014, 2015, 2016, 2018; "PacBio ?"]
Annotation data courtesy of NCBI (Francoise Thibaud-Nissen)
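The QV figures on this slide are easier to compare once converted to per-base accuracy. A minimal sketch, assuming QV is the usual Phred-scaled quality (QV = -10 log10 of the per-base error rate); the function names are illustrative and not taken from any assembly tool.

```python
import math


def qv_to_accuracy(qv):
    """Per-base accuracy implied by a Phred-scaled quality value."""
    return 1.0 - 10 ** (-qv / 10)


def accuracy_to_qv(accuracy):
    """Phred-scaled quality value implied by a per-base accuracy."""
    return -10 * math.log10(1.0 - accuracy)


def error_fold_change(qv_drop):
    """Factor by which the error rate grows when QV falls by qv_drop."""
    return 10 ** (qv_drop / 10)


print(f"QV45 accuracy: {qv_to_accuracy(45):.6%}")
print(f"Error increase for a 4 QV drop: {error_fold_change(4):.2f}x")
print(f"Error increase for a 3 QV drop: {error_fold_change(3):.2f}x")
```

Under this definition QV40 is exactly 99.99% accuracy, so QV45 sits above the >99.99% threshold quoted on the slide. A 4 QV drop corresponds to roughly 2.5 times as many errors, and a 3 QV drop to roughly twice as many.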

Slide 21

Slide 21 text

Summary
• Trio binning resolves diploid genomes
• Get 2 haplotypes from 1 genome
• Works on low-heterozygosity genomes, even human
• Compatible with existing tools
• Linear haplotypes, no graphs
• No Illumina polishing required
• Complete haplotypes as the unit of analysis
• Compare haplotypes vs. haplotypes

Slide 22

Slide 22 text

The Assembly Working Group
Eric D. Jarvis, Olivier Fedrigo, Sadye Paez, Adam M. Phillippy, Arang Rhie, Sergey Koren, Zemin Ning, Kerstin Howe, William Chow, Harris Lewin, Joana Damas, Richard Durbin, Shane McCarthy, Gene Myers, Martin Pippel, Marcela U-Silva, Jonas Korlach, Ivan Sovic, Christopher Dunn, Sarah Kingan, Maria Simbirsky, Brett Hannigan, Siddarth Selvaraj, Guojie Zhang, Yang Zhou, Chai Fungtammasan

Slide 23

Slide 23 text

Q & A
How was my song?