- They are accurate
  - Long reads with ≥Q20 (99%) single-molecule accuracy
- They have single-molecule resolution
  - Sequence DNA or RNA
- They are unbiased
  - No DNA amplification, least GC content and sequence complexity bias
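The pairing of Q20 with 99% follows from the standard Phred definition, Q = -10 · log10(P_error). The sketch below only illustrates that conversion. The function names are hypothetical and it is not part of any PacBio tool.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality score to expected per-base accuracy."""
    # Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10).
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert expected per-base accuracy (0 <= accuracy < 1) to a Phred score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # Q20 corresponds to roughly 99% accuracy, Q30 to roughly 99.9%.
    for q in (20, 30):
        print(f"Q{q}: {phred_to_accuracy(q):.4%}")
```

Each additional 10 points on the Phred scale corresponds to a tenfold drop in the expected error rate.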
days 6 days
- Collected ~80 g of needles and froze them
- Extracted 56 µg of DNA with Circulomics kit
- Prepped 2 HiFi libraries at ~25 kb
- Ran 31 SMRT Cells 8M across 9 instruments in 7 days
- Streamed data for immediate CCS analysis and conversion to HiFi reads
- Used hifiasm for quick, haplotype-aware assembly
17 DAYS
short reads in all basic stats

California Redwood Genome Assembly Results

Methodology                                     PacBio HiFi    ONT + short reads [1]
Genome Coverage                                 22-fold        23-fold + 122-fold
Assembly Size (Gb)                              47.7           26.5
Contig N50 (Mb)                                 1.92           0.11
BUSCO Complete                                  59%            56%
Mapped transcripts with frameshift errors [2]   0.12%          1.97%

1. Sequencing and assembling mega-genomes of mega-trees: the giant sequoia and coast redwood genomes
2. Transcript set of Abies alba from Neale, D. et al. Varying number of transcripts aligned to each genome (4,958 mapped to PacBio HiFi redwood, 4,760 mapped to ONT redwood)
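Contig N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly. The sketch below shows the usual calculation. The function name and the toy contig lengths are illustrative assumptions, not data from the redwood assemblies.

```python
from typing import Iterable


def contig_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig where the cumulative length
        # reaches half of the total assembly size.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths
    print(contig_n50(toy_contigs))
```

Because N50 weights long contigs heavily, it is normally read alongside assembly size and a completeness measure such as BUSCO.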
genomes have very large introns that make BUSCO an inefficient measure of completeness, since it maxes out at ~70%
- 1.92 Mb contig N50
- No gaps
- >5X the haploid genome size (resolving most of the hexaploidy)
- 59% of BUSCO genes complete*
- Only 0.12% of mapped transcripts with frameshift errors
rose genome with HiFi

Watch the full presentation: The impact of highly accurate PacBio sequence data on the assembly of a tetraploid rose

“We managed to assemble a heterozygous, polyploid genome, without the need for ultra high molecular weight DNA, which is required for a lot of other long-read sequencing”
genome with HiFi compared to Long Reads

Metrics          Cannabis Long Reads           Cannabis HiFi
                 Primary    Alt. Haplotype     Primary    Alt. Haplotype
Assembly size    999 Mb     184.7 Mb           991 Mb     290 Mb
Contig N50       3.5 Mb     0.2 Mb             8.6 Mb     0.69 Mb
BUSCO Complete   97.4%      24.4%              98.3%      40.1%
CPU Hours        8,326      -                  248        -

HiFi sequencing assembled more of the alternative haplotype, captured more complete genes, and took >33 times fewer CPU hours.

All Cannabis genomes made public by Kevin McKernan at Medicinal Genomics: https://www.medicinalgenomics.com/jamaican-lion-data-release/
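The ">33 times" figure can be checked directly from the CPU Hours row, and the same arithmetic applies to the alternate-haplotype assembly sizes. The sketch below uses only the numbers reported in the table. The function and variable names are illustrative.

```python
def fold_change(baseline: float, comparison: float) -> float:
    """Return how many times larger `baseline` is than `comparison`."""
    if comparison <= 0:
        raise ValueError("comparison must be positive")
    return baseline / comparison


# Values as reported in the Cannabis comparison table.
LONG_READ_CPU_HOURS = 8326
HIFI_CPU_HOURS = 248
LONG_READ_ALT_MB = 184.7
HIFI_ALT_MB = 290.0

cpu_fold = fold_change(LONG_READ_CPU_HOURS, HIFI_CPU_HOURS)
alt_fold = fold_change(HIFI_ALT_MB, LONG_READ_ALT_MB)

if __name__ == "__main__":
    print(f"CPU hours, long reads vs HiFi: {cpu_fold:.1f}x")
    print(f"Alt. haplotype size, HiFi vs long reads: {alt_fold:.2f}x")
```

The CPU-hour ratio comes out between 33 and 34, consistent with the slide, and the HiFi alternate haplotype is roughly 1.6 times the size of the long-read one.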
reads with minimum accuracy of Q20 (99%)
- High contiguity and base quality genome assemblies
- Small file sizes and fast analysis time
- Assemble up to a 2.5 Gb genome in a single SMRT Cell 8M for ~$1,300
- Run up to 200 samples (2.5 Gb) per year, per Sequel II System