Reference-free comparative transcriptomics

Single-cell RNA-seq is a powerful technology for identifying novel and known cell types; however, its power is limited to organisms with well-annotated genomes. We present a reference-free method to compare single cells both within and across species. In this method, k-mers from each cell’s RNA-seq profile are randomly subsampled into a compressed representation called a “sketch” using the document-comparison algorithms MinHash or HyperLogLog. For within-species comparison, the RNA sketches are sufficient. Because protein sequence is more conserved than nucleotide sequence across species, for cross-species comparison we translate the RNA k-mers into protein k-mers with six-frame translation, discarding all protein k-mers that contain stop codons. We show that this method can “lift over” single-cell RNA-seq annotations from mouse to human, and we compare it to using only genes with 1:1 orthologues. Thus, k-mer sketches are an efficient way to find shared and unique cell types both within and across species without the need for a reference genome or transcriptome.
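The six-frame translation step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the code from the talk: the function names (`six_frame_translations`, `protein_kmers`) are hypothetical, the standard genetic code is assumed, and the sketch translates a whole read before cutting protein k-mers rather than translating each RNA k-mer separately.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Yield the three forward-strand and three reverse-strand translations."""
    seq = seq.upper().replace("U", "T")
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            yield translate(strand[offset:])


def protein_kmers(seq, ksize):
    """Collect protein k-mers from all six frames, dropping any with a stop codon."""
    kmers = set()
    for protein in six_frame_translations(seq):
        for i in range(len(protein) - ksize + 1):
            kmer = protein[i:i + ksize]
            if "*" not in kmer and "X" not in kmer:
                kmers.add(kmer)
    return kmers


print(sorted(protein_kmers("ATGGCCAAA", 3)))  # ['FGH', 'MAK']
```

Only one of the six frames is the true reading frame for a given transcript, so most translated k-mers are spurious; dropping k-mers with stop codons removes some of them, and the rest are left for the downstream comparison to tolerate.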

Olga Botvinnik

May 18, 2019

Transcript

  1. 2.

    CELLS ARE AN INTERMEDIATE BETWEEN DNA AND PHENOTYPE

    Outline: Overview, Introduction, Methods, Results, Conclusions

    Figure labels: DNA; Cell, Tissue, Organ, Organ System, Organism; Phenotype. Image credits: Technoscience.global2.vic.edu.au; background vector created by freepik.
  2. 3.

    FROM SMOOTHIE (BULK RNA-SEQ) TO FRUIT SALAD (SINGLE-CELL RNA-SEQ)

    Figure labels: Bulk RNA-Seq; Single cell RNA-seq.
  3. 4.

    COMPARATIVE SINGLE-CELL TRANSCRIPTOMICS TO BUILD A PHYLOGENETIC TREE OF CELL TYPES

    Figure labels: Cell type 1 through Cell type 10.

    • How can cells from organisms without reference genomes be compared to annotated cell types?
    • When in evolutionary time could a cell type have originally appeared?
    • Can a new species be defined by the introduction of a new cell type or cell state?
  4. 5.

    NOT ALL GENES HAVE A 1:1 EXACT ORTHOLOGUE MATCH BETWEEN SPECIES

    <50% of human genes have a 1:1 orthologue with mouse or zebrafish.

    Figure labels: Alternative first exons; Single first exon; Alternative first exons.

    Altschmied, J., et al. (2002). Subfunctionalization of duplicate mitf genes associated with differential degeneration of alternative exons in fish. Genetics, 161(1), 259–267.
  5. 6.

    NOT ALL GENES HAVE A 1:1 EXACT ORTHOLOGUE MATCH BETWEEN SPECIES

    <50% of human genes have a 1:1 orthologue with mouse or zebrafish.

    Solution: Use protein k-mers created by six-frame translation of RNA k-mers.

    Figure labels: Alternative first exons; Single first exon; Alternative first exons.

    Altschmied, J., et al. (2002). Subfunctionalization of duplicate mitf genes associated with differential degeneration of alternative exons in fish. Genetics, 161(1), 259–267.
  6. 7.

    A “SKETCH” OF SEQUENCES IS A COMPRESSED REPRESENTATION OF THE ENTIRE DATASET

    Image: Kathe Kollwitz, "Self Portrait", charcoal on brown laid Ingres paper, 1933. Source: Sketch, Wikipedia (2019).
  7. 8.

    COMPRESS A CELL’S CDNA CONTENT TO A “SKETCH” OF PROTEIN K-MERS
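One way to picture the compression step is a bottom-k MinHash: hash every k-mer, keep only the smallest hash values, and estimate the Jaccard similarity of two cells from those retained hashes. The block below is a rough sketch under stated assumptions, not the talk's implementation (the acknowledgements credit Sourmash, whose API is not used here). The function names and the SHA-1 based hash are illustrative choices, and the default of 4096 hashes mirrors the "4096 k-mers per cell" figure quoted on a later slide.

```python
import hashlib


def hash_kmer(kmer):
    """Map a k-mer string to a 64-bit integer (SHA-1 prefix; an arbitrary choice)."""
    digest = hashlib.sha1(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def bottom_k_sketch(kmers, sketch_size=4096):
    """Keep the sketch_size smallest distinct hash values as the cell's sketch."""
    return sorted({hash_kmer(kmer) for kmer in kmers})[:sketch_size]


def jaccard_estimate(sketch_a, sketch_b, sketch_size=4096):
    """Estimate Jaccard similarity from two bottom-k sketches.

    The smallest sketch_size hashes of the union form a sample of the union;
    the fraction of that sample present in both sketches estimates the overlap.
    """
    union_sample = sorted(set(sketch_a) | set(sketch_b))[:sketch_size]
    if not union_sample:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in union_sample if h in shared) / len(union_sample)


cell_a = bottom_k_sketch({"MAK", "FGH", "LLP"})
cell_b = bottom_k_sketch({"FGH", "LLP", "WPQ"})
print(jaccard_estimate(cell_a, cell_b))  # 0.5 (2 shared k-mers out of 4 total)
```

When a cell has fewer distinct k-mers than `sketch_size`, as in the toy example, the estimate equals the exact Jaccard similarity; the subsampling only matters for larger inputs.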
  8. 9.

    K-MERS SEPARATE CELL TYPES AND K-MER ABUNDANCE ONLY ADDS NOISE

    Data: Mouse Bladder, SmartSeq2 single-cell RNA-seq
    Sketch parameters: observe 1/1000 k-mers; ksize: 27; molecule: cDNA
    Nearest neighbor graphs, n_neighbors=5
    Panels: k-mer presence/absence; k-mer abundance; Gene expression; Binarized gene expression
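The slide's parameters (observe 1/1000 k-mers, presence/absence only, nearest neighbor graphs with n_neighbors=5) can be illustrated as follows. This is a hedged sketch, not the pipeline behind the figure: it assumes that "observe 1/1000 k-mers" means keeping k-mers whose hash falls in the lowest 1/1000 of the hash range, uses exact Jaccard similarity on the retained hashes, and all function names are hypothetical.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space used below


def hash_kmer(kmer):
    """Map a k-mer string to a 64-bit integer (SHA-1 prefix; an arbitrary choice)."""
    digest = hashlib.sha1(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def scaled_sketch(kmers, scaled=1000):
    """Keep roughly 1/scaled of the distinct k-mers, as a set (presence/absence)."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers) if h < threshold}


def jaccard(sketch_a, sketch_b):
    """Exact Jaccard similarity between two sets of hashes."""
    union = sketch_a | sketch_b
    return len(sketch_a & sketch_b) / len(union) if union else 0.0


def nearest_neighbors(sketches, n_neighbors=5):
    """For each cell, list the n_neighbors most similar other cells.

    Ties are broken by cell name so that the output is deterministic.
    """
    graph = {}
    for cell, sketch in sketches.items():
        scored = [
            (jaccard(sketch, other), name)
            for name, other in sketches.items()
            if name != cell
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[cell] = [name for _, name in scored[:n_neighbors]]
    return graph
```

Because every cell uses the same hash threshold, a k-mer is either retained in all cells that contain it or in none, which is what keeps the subsampled sets comparable. Abundance is dropped simply by storing hashes in a set.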
  9. 10.

    SPECIES SIGNAL CURRENTLY OUTWEIGHS CELL TYPE SIGNAL

    k=7 amino acids; observe 4096 k-mers per cell
    Nearest neighbor graph with n_neighbors=5
    Hematopoiesis/Kidney - SmartSeq2/QUARTZ-seq of single cells in:
    - Mouse Kidney
    - Zebrafish Kidney Marrow (primary site of hematopoiesis)
    - Human Bone Marrow (primary site of hematopoiesis)

    Next Steps
    • Remove species-specific k-mers with term frequency inverse document frequency (TF-IDF)-like method
    • Model “cell type” and “species” as latent spaces using machine (deep?) learning methods
    • Use 3-frame translation for stranded RNA-seq data
    • Compare protein k-mer nearest neighbor graphs to graphs built on gene counts of 1:1 orthologues
  10. 11.

    CONCLUSIONS + NEXT STEPS

    Conclusions
    • A few thousand k-mers are sufficient to group similar cells within species
    • Abundance of k-mers only adds noise
    • Protein k-mers loosely identify cell types across closely related species

    Next Steps
    • Remove species-specific k-mers with term frequency inverse document frequency (TF-IDF)-like method
    • Model “cell type” and “species” as latent spaces using machine (deep?) learning methods
    • Use 3-frame translation for stranded RNA-seq data
    • Compare protein k-mer nearest neighbor graphs to graphs built on gene counts of 1:1 orthologues

    Orthogonal validation
    • Compare cell type enriched protein k-mers to cell type specific peptides from bottom-up proteomics

    Want to check it out? Contributions welcome! https://github.com/czbiohub/kmer-hashing
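The first next step, a TF-IDF-like filter for species-specific k-mers, is only proposed in the talk, so the following is one possible reading rather than the author's method. It treats each species as a "document", computes an inverse document frequency per k-mer, and drops k-mers seen in fewer than a chosen number of species. The function names and the `min_species` parameter are assumptions made for illustration.

```python
import math
from collections import Counter


def species_document_frequency(kmers_by_species):
    """Count, for each k-mer, the number of species in which it appears."""
    counts = Counter()
    for kmers in kmers_by_species.values():
        counts.update(set(kmers))
    return counts


def idf_weights(kmers_by_species):
    """Inverse document frequency: 0 for k-mers seen in every species,
    larger for k-mers restricted to fewer species."""
    n_species = len(kmers_by_species)
    doc_freq = species_document_frequency(kmers_by_species)
    return {kmer: math.log(n_species / count) for kmer, count in doc_freq.items()}


def drop_species_specific(kmers, kmers_by_species, min_species=2):
    """Keep only the k-mers observed in at least min_species species."""
    doc_freq = species_document_frequency(kmers_by_species)
    return {kmer for kmer in kmers if doc_freq[kmer] >= min_species}


kmers_by_species = {
    "mouse": {"MAKFGHL", "WPQRSTV"},
    "human": {"MAKFGHL", "LLPQHHR"},
}
print(drop_species_specific({"MAKFGHL", "WPQRSTV"}, kmers_by_species))
# {'MAKFGHL'}
```

Note that this inverts the usual TF-IDF use: in document retrieval, high-IDF terms are the informative ones, whereas here a high IDF marks a k-mer as species-specific and therefore a candidate for removal.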
  11. 12.

    ACKNOWLEDGEMENTS

    - Angela Pisco
    - James Webber
    - Josh Batson
    - Ashley Maynard
    - Lincoln Harris
    - Spyros Darmanis
    - Paolo Carnevali (CZI)
    - Giana Cirolia
    - Phoenix Logan
    - Shayan Hosseinzadeh
    - Kalani Ratnasiri
    - Aaron McGeever
    - Greg Huber

    Outside of Biohub (@github)
    - Sourmash (https://github.com/dib-lab/sourmash/): C. Titus Brown (@ctb), Luiz Irber (@luizirber), Camille Scott (@camillescott)
    - Nextflow (https://github.com/nextflow-io/nextflow/): Paolo Di Tommaso (@pditommaso), @KochTobi, Rad Suchecki (@rsuchecki)
    - Bamnostic (https://github.com/betteridiot/bamnostic/): Marcus D Sherman (@betteridiot)

    Data Sciences