Overcoming remaining challenges for completing Phase 1 VGP

GenomeArk
January 16, 2019

Erich D. Jarvis, Chair, G10K
The Rockefeller University
Howard Hughes Medical Institute

Transcript

  1. Overcoming remaining challenges for completing Phase 1 VGP. Erich D. Jarvis, Chair, G10K; The Rockefeller University; Howard Hughes Medical Institute. G10K-VGP Workshop, PAG meeting, San Diego, CA, January 16th, 2019
  2. Goal of the VGP. The goal of the Vertebrate Genomes Project (VGP) is to generate at least one high-quality, error-free, near gapless, chromosome-level, haplotype phased, and annotated reference genome assembly for all extant vertebrate species, and to utilize those genomes to address fundamental questions in biology, disease, and conservation.
  3. Vertebrate Genomes Project (VGP), https://vertebrategenomesproject.org
    Phase 1: Proof of principle & learning
    Phase 2: EBP family milestone
    Phase 3: Original G10K milestone
    Phase 4: B10K, Bat1K, EBP milestones
    Redefining species concept
    Follow on twitter (Chair): @erichjarvis, @Genome10K, @GenomeArk
  4. Selected > 50 million year ago divergence time for Phase 1 (Jarvis et al 2014 Science; Meredith et al 2013 Science). Figure label: Vocal learners
  5. Issues that need resolution
    1. Need to collect the remaining species specimens (85 of 260 species)
    2. Add and define a metric for haplotype phasing
    3. Non-trio approach for complete phasing of haplotypes, at all steps
    4. Missing genomic sequence from VGP assemblies
    5. Assembly and identification of mitochondrial genomes
    6. Assembly and identification of sex chromosomes
    7. Defining “assembled chromosome” and “chromosomal-level assembly”
    8. Scaling up production of genome assemblies to 6 per week
    9. Improve multilayered and reference-free alignments
    10. Planning publications with VGP Phase 1 genomes
  6. Phase 1 VGP progress
    [Bar chart: Proportion of Phase 1 ordinal VGP species genomes. Axis: Percent species, 0% to 100%. Groups: Total (262), Mammals (58), Birds (52), Reptiles (33), Amphibians (30), Ray-finned fishes (71), Shark fishes (14), Lobe-finned fishes (2), Jawless fishes (2). Legend: Funding Jan 2019; Samples 2019; Sequencing in progress or completed Jan 2019.]
    September 2018: 17 genomes, 16 species
    February 2019 plan: +50 genomes, +50 species
    August 2019 plan: ~350 genomes, 262 vertebrate orders
  7. Adding haplotype phasing metric
    Notation x.y.z.QV.p%: N50 contig, N50 scaffold, chromosome, base accuracy, phasing
    Example: 3.4.2.QV40.90%
    Per haplotype (labels: maternal, paternal): h1: 3.4.2.QV40.90%; h2: 3.4.2.QV40.h2.90%
    G10K-VGP Assembly working group (Durbin, Lewin, Jarvis, Myers et al)
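
The x.y.z.QV.p% notation on this slide can be split into its labeled fields mechanically. The following is a minimal sketch, not the working group's tooling: the names `AssemblyMetric` and `parse_metric` are hypothetical, the optional `h1`/`h2` tag is handled because it appears in some strings in the deck, and the conversion of QV to an error rate assumes QV is Phred-scaled, which the slide does not state. The meaning of the x, y, and z digits beyond the slide's own labels is deliberately left uninterpreted.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches strings such as "3.4.2.QV40.90%" or "3.4.2.QV50.h1.98%".
_METRIC = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)\.QV(\d+)(?:\.(h\d+))?\.(\d+(?:\.\d+)?)%$"
)


@dataclass
class AssemblyMetric:
    """Fields named after the slide's labels (hypothetical container)."""
    contig_n50: int           # x: "N50 contig" field
    scaffold_n50: int         # y: "N50 scaffold" field
    chromosome: int           # z: "chromosome" field
    qv: int                   # "base accuracy" field
    haplotype: Optional[str]  # "h1"/"h2" tag when present, else None
    phasing_pct: float        # p%: "phasing" field

    def error_rate(self) -> float:
        # Assumption: QV is Phred-scaled, so QV40 means 1 error in 10,000 bases.
        return 10 ** (-self.qv / 10)


def parse_metric(text: str) -> AssemblyMetric:
    """Parse an x.y.z.QV.p% string into its labeled fields."""
    match = _METRIC.match(text.strip())
    if match is None:
        raise ValueError(f"not in x.y.z.QV.p% form: {text!r}")
    x, y, z, qv, hap, pct = match.groups()
    return AssemblyMetric(int(x), int(y), int(z), int(qv), hap, float(pct))


if __name__ == "__main__":
    print(parse_metric("3.4.2.QV40.90%"))
    print(parse_metric("3.4.2.QV50.h1.98%"))
```

Keeping the haplotype tag optional lets the same parser read both the combined metric and the per-haplotype strings without treating either form as canonical.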
  8. Trio approach proves theoretically possible to assemble higher quality
    F1: each haplotype contig N50 > 4 Mb
    Maternal: 3.4.2.QV50.h1.98%; paternal: 3.4.2.QV50.h1.98%
    G10K Assembly working group (Koren et al 2018 Nature Biotech; Rhie et al in preparation)
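
For readers unfamiliar with the "contig N50 > 4 Mb" figure, N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases. A minimal sketch of that calculation follows; the function name `n50` and the contig lengths in the usage lines are hypothetical illustrations, not VGP data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical contig lengths in bases (illustration only).
    contigs = [8_000_000, 6_000_000, 4_000_000, 1_000_000, 1_000_000]
    value = n50(contigs)
    print(value, value > 4_000_000)
```

Comparing `running * 2` against the total keeps the arithmetic in integers, which avoids rounding issues when the total length is odd.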
  9. Missing genes not missing before
    Breaks at higher GC-rich regions, repetitive regions, gene biased to smaller reads; hummingbird
    G10K Assembly working group (Korlach, Mello et al analyses)
  10. Mitochondria missing in v1.5 assembly. G10K Assembly working group (Formenti et al analyses)
  11. Sex chromosomes
    • Differential coverage of sex chromosomes
    • Short read on each sex (Guojie Zhang)
    • Trio approach (Arang Rhie)
    • Bionano or HiC mapping?
    G10K Assembly working group (planned analyses led by Kateryna Makova and Paul Medvedev)
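
The "differential coverage" idea in the first bullet can be illustrated with a small sketch: scaffolds on a sex chromosome are expected to show a different read depth in one sex than in the other, while autosomal scaffolds should show roughly equal depth. This is only an illustration under stated assumptions, not the planned analysis of the working group. The function name `flag_sex_linked`, the threshold, the pseudocount, and the coverage values are all hypothetical, and the input depths are assumed to be already normalized so that autosomes sit near 1.0 in both sexes.

```python
import math
from typing import Dict, Tuple


def flag_sex_linked(
    coverage: Dict[str, Tuple[float, float]],
    min_abs_log2: float = 0.5,   # hypothetical cutoff on |log2(male/female)|
    pseudocount: float = 0.01,   # avoids division by zero for absent scaffolds
) -> Dict[str, float]:
    """Return scaffolds whose male/female depth ratio departs from 1.

    `coverage` maps scaffold name to (male_depth, female_depth), both
    assumed to be normalized so that autosomal scaffolds are near 1.0.
    """
    flagged = {}
    for name, (male, female) in coverage.items():
        log_ratio = math.log2((male + pseudocount) / (female + pseudocount))
        if abs(log_ratio) >= min_abs_log2:
            flagged[name] = log_ratio
    return flagged


if __name__ == "__main__":
    # Hypothetical depths (illustration only).
    example = {
        "scaffold_1": (1.0, 1.0),  # equal depth in both sexes
        "scaffold_2": (1.0, 0.5),  # about twice the depth in the male
        "scaffold_3": (0.0, 0.5),  # reads only from the female
    }
    for name, value in flag_sex_linked(example).items():
        print(name, round(value, 2))
```

Which sign of the ratio corresponds to which chromosome depends on the sex-determination system of the species, so the sketch reports the ratio and leaves that interpretation to the user.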
  12. Defining “chromosomal-level assembly”
    [Figure: zebra finch chromosome labels paired between Warren et al 2010 and VGP assembly 2018: Chromosome 1 + 1B / Chromosome 1; Chromosomes 1A, 2 to 28 (including 4A) and Z under the same names in both; LGE22 / Chromosome 29; Chromosome 30. Panels labeled male, female, FISH mapping, HiC mapping. Inset: zebra finch chromosome 1, with segments 1 and 1B.]
    G10K Assembly working group (Lewin, Damas, Howe, Wood et al)
  13. Scaling up to 6 genomes/week
    • Agarose plug method timing is a bottleneck
    • PacBio Sequel II and 8M SMRT cell will enable this goal
    • Need to separate production from R&D
    • Need dedicated effort to continuously obtain samples
    • Need more rapid production of RNASeq/IsoSeq
    • Need VGP pipeline on DNAnexus to be functional
    • Need more systematic upload of data to annotation archives
  14. Reference-free Cactus alignment has fewer gaps than MultiZ
    [Bar chart: Percentage Gap Free, Cactus vs MultiZ; axis from 79.5% to 81.5%.]
    B10K working group (Armstrong, Paton, Cahill, et al analyses)
  15. Planned Phase 1 VGP publications
    VGP submissions for 2019
    1. VGP assembly paper, improving genome quality and biology, and setting standards
    2. DNA sample preparation method comparisons for high quality genomes
    3. Use of high quality genomes to inform and reverse the current 6th mass extinction
    VGP submissions for 2020
    1. Genome-scale family tree of vertebrates
    2. Comparative genomics of specialized traits
    3. Genomics of vocal learning and spoken language
    4. A universal vertebrate gene orthology and nomenclature
    5. Deciphering vertebrate chromosomal genome evolution
    6. Reconstructing ancestral genomes of vertebrates and vertebrate clades
    7. Evolution of bases and chromosomes of the human genome
    8. Why are some lineages more resistant to diseases relative to others
    9. Conservation genomics of endangered species
    10. The genomes of all remaining Kakapo parrots on the planet
    11. Genetic signatures of domestication across vertebrates
    12. Sex determination and sex chromosome evolution among vertebrates
    13. Brain cell-type evolution and homologies across vertebrates
  16. Give proper and fair credit