Successful scRNA-seq analysis

A brief introduction to how single-cell technologies work, how to plan a successful experiment (from an analyst's point of view), the steps in a standard scRNA-seq analysis, and a look at some more advanced topics. Presented at the ILC Summer School 2022.

Luke Zappia

June 30, 2022


  1. Successful scRNA-seq analysis ILC Summer School 2022

  2. Luke Zappia
    - Postdoctoral researcher (Theis Lab, Helmholtz Munich)
    - Chemistry, Informatics, Bioinformatics
    - scRNA-seq
      - Methods development
      - Software development
      - Benchmarking
      - Data analysis
    - @_lazappi_ @lazappi lazappi.id.au
  3. Theis Lab
    - Apply machine learning to biological data
    - scRNA-seq
      - Integration and perturbations
      - Modelling of transitions
      - Multimodal analysis
    - @fabian_theis @ICBmunich www.comp.bio
  4. 1. What is scRNA-seq?
     2. Designing an scRNA-seq experiment
     3. Standard scRNA-seq analysis
     4. Advanced analysis topics
  5. 1. What is scRNA-seq?

  6. single-cell RNA sequencing

  7. Why single-cell?

  8. Single-cell capture
    - Droplet-based
      - More cells
      - Easier
      - UMI
    - Plate/well-based
      - Fewer cells
      - Custom setup
      - Full length, higher depth
      - More flexible
  9. mccarrolllab.com/dropseq/ Macosko et al. DOI: 10.1016/j.cell.2015.05.002

  10. UMI vs full-length
    - Unique Molecular Identifiers
      - 5’ AAAA (PCR){BARCODE}[UMI]TTTT
      - Better quantification
      - Less sequencing
      - No gene-length bias
    - Full-length
      - Full coverage
      - More sequencing
      - Affected by gene length
  11. Extensions
    - Protein expression (CITE-seq, feature barcoding)
    - Chromatin accessibility (scATAC-seq, 10x Multiome)
    - Spatial location (10x Visium, MERFISH)
    - Immune receptors (TCR/BCR profiling)
    - Methylation, CRISPR screens, electrophysiology, ...
    - Pre-sorting (FACS to enrich target cells)
  12. CITE-seq
    - Simultaneous measurement of RNA and protein expression
      - Protein ≠ RNA
    - Uses nucleotide-tagged antibodies
    - Targets need to be carefully selected
    - Particularly useful for PBMCs
  13. Multiplexing
    - Genetic multiplexing
      - Easier but requires genetic diversity and reference panels
    - Cell hashing
      - More complex but can be more flexible
    - More samples, fewer batch effects
  14. Comparison to bulk
    - Gives insight into cellular variability
    - Avoids the composition problem
    - Much more complex analysis
    - Much noisier
    - Much sparser
      - But UMI data isn’t zero inflated!
  15. 2. Experimental design

  16. Who should be involved?
    - Experimentalists
    - Bioinformaticians
    - PIs
    - Collaborators

  17. What is the question?
    - What do you want to answer with this experiment?
      - Not necessarily a hypothesis
    - Experimentalists should have a clear idea that is refined with input from analysts
      - Discuss everything that is relevant
    - PIs and external collaborators need to be on board
  18. Things to consider
    - Cells are not replicates!
      - Proper analysis requires multiple samples from each condition
    - Avoid confounding batches and conditions
      - How will the samples be multiplexed?
    - What are your controls?
    - How rare are the cells you are interested in?
    - Are you using the right assay?
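
The point about confounding batches and conditions can be checked on the planned sample sheet before any cells are captured. The following is a minimal sketch, assuming a hypothetical sample sheet (the sample names, batch labels and condition labels are invented for illustration). It applies a deliberately strict rule: a design is flagged if any batch does not contain every condition.

```python
from collections import defaultdict


def conditions_per_batch(samples):
    """Map each batch to the set of conditions processed in it."""
    table = defaultdict(set)
    for sample in samples:
        table[sample["batch"]].add(sample["condition"])
    return dict(table)


def is_confounded(samples):
    """Return True if any batch is missing at least one condition.

    This is a strict, illustrative rule: in a flagged design, batch and
    condition effects cannot be fully separated during analysis.
    """
    all_conditions = {sample["condition"] for sample in samples}
    per_batch = conditions_per_batch(samples)
    return any(conds != all_conditions for conds in per_batch.values())


# Hypothetical designs: each batch is one processing day.
confounded = [
    {"sample": "s1", "batch": "day1", "condition": "control"},
    {"sample": "s2", "batch": "day1", "condition": "control"},
    {"sample": "s3", "batch": "day2", "condition": "treated"},
    {"sample": "s4", "batch": "day2", "condition": "treated"},
]
balanced = [
    {"sample": "s1", "batch": "day1", "condition": "control"},
    {"sample": "s2", "batch": "day1", "condition": "treated"},
    {"sample": "s3", "batch": "day2", "condition": "control"},
    {"sample": "s4", "batch": "day2", "condition": "treated"},
]

print(is_confounded(confounded))  # True: every day holds a single condition
print(is_confounded(balanced))    # False: both conditions appear on both days
```

Both designs also have two samples per condition, in line with the point that cells are not replicates.
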
  19. Example designs
    - Exploratory
    - Case/control
    - Multiple conditions
    - Time series
    - Cohort study
    - Many others…
  20. How long will it take?
    - Experiments take time, so does analysis
      - Often getting results takes longer than generating data
    - Simpler experiments with clearer questions are quicker and easier to analyse
    - You will likely be competing with other projects, so good relationships are key!
  21. Make a plan
    - What is the question?
    - What is the design?
    - Who is involved?
    - What is everyone’s role (authorship)?
    - What if somebody leaves?
    - What is the timeline?
    - How is it funded?
    - Write it down!
  22. Tips for good collaborations
    - Involve everyone in the process
      - Give everyone ownership over the project
    - Good, clear communication
      - Keep everyone in the loop
    - Share all the (relevant) data
      - If you did FACS, share the measurements
    - Keep good records
      - Complete, consistent, machine-readable metadata
  23. 3. Standard analysis

  24. @SEQ_ID
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       1       4
    C     9       6       0       0
    D     7       0       4       0

    ?
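
The read on this slide is a four-line FASTQ record: identifier, sequence, separator and per-base quality string. As a small illustration of what the raw data contains, the sketch below parses that record and decodes the quality characters, assuming the common Phred+33 encoding. The function name is made up for this example.

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record and decode Phred+33 quality scores."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a valid FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return {
        "id": header[1:],
        "sequence": sequence,
        # Phred+33 (assumed encoding): quality score = ASCII code - 33
        "quality": [ord(char) - 33 for char in quality],
    }


# The record shown on the slide
record = parse_fastq_record([
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
])

print(record["id"])             # SEQ_ID
print(len(record["sequence"]))  # 60
print(record["quality"][:3])    # [0, 6, 6]
```
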
  25. Alignment and quantification
    1. Align to reference genome
    2. Compare to gene annotation
    3. Deduplication
    4. Quantification

    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       1       4
    C     9       6       0       0
    D     7       0       4       0
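
The deduplication and quantification steps can be sketched for UMI data as follows. This is a simplified illustration, not how any particular pipeline implements it: it assumes reads have already been aligned and assigned to a gene, and it ignores sequencing errors in barcodes and UMIs, which real tools typically try to correct. The barcodes, UMIs and gene names are invented.

```python
from collections import defaultdict


def count_umis(assigned_reads):
    """Count unique UMIs per (gene, cell).

    assigned_reads: iterable of (cell_barcode, umi, gene) tuples.
    Reads sharing cell, gene and UMI are treated as PCR duplicates
    of one original molecule and counted once.
    """
    umis = defaultdict(set)
    for cell, umi, gene in assigned_reads:
        umis[(gene, cell)].add(umi)
    return {key: len(values) for key, values in umis.items()}


# Hypothetical reads after alignment and gene assignment
reads = [
    ("CELL1", "AAAC", "A"),
    ("CELL1", "AAAC", "A"),  # duplicate of the read above
    ("CELL1", "GGTT", "A"),
    ("CELL2", "AAAC", "A"),
    ("CELL2", "CCGA", "B"),
    ("CELL2", "CCGA", "B"),  # duplicate of the read above
]

counts = count_umis(reads)

# Arrange the counts as a genes x cells matrix like the one on the slide
genes = sorted({gene for gene, _ in counts})
cells = sorted({cell for _, cell in counts})
matrix = [[counts.get((gene, cell), 0) for cell in cells] for gene in genes]

print(cells)   # ['CELL1', 'CELL2']
for gene, row in zip(genes, matrix):
    print(gene, row)
# A [2, 1]
# B [0, 1]
```

Six reads collapse to four molecules, which is why UMI counts are less affected by amplification than raw read counts.
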
  26. Over 1300 scRNA-seq tools www.scRNA-tools.org

  27. Ecosystems scverse

  28. Which ecosystem?
    - They all have strengths and weaknesses
    - Possible to convert between them
    - Use whatever is best for the task
    - For simple tasks use whichever is easiest
  29. Which tool?
    - Independent benchmarks are the best measure of performance
    - Try commonly used tools first
    - Look for good documentation/maintenance
    - Prefer tools that can be installed from major repositories
    - Read more than just the introductory tutorial
      - Paper, package documentation
  30. Quality control
    - Not every droplet contains a cell
    - Not every cell is in good condition
    - Not every cell is informative
    - Not every cell is a single cell
    - Sometimes whole samples can be low-quality
  31. Quality control
    - Cell selection
    - Cell filtering
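
Cell filtering is commonly based on a few per-cell summary metrics. The sketch below computes three of them (total counts, number of detected genes, fraction of mitochondrial counts) and applies fixed thresholds. The gene names, counts and thresholds are all invented for a toy example; in practice thresholds are chosen per dataset after plotting the metrics. The `MT-` prefix assumes human gene symbols.

```python
def qc_metrics(cell_counts, mito_prefix="MT-"):
    """Compute simple QC metrics for one cell from a gene -> count dict."""
    total = sum(cell_counts.values())
    detected = sum(1 for count in cell_counts.values() if count > 0)
    mito = sum(
        count for gene, count in cell_counts.items()
        if gene.startswith(mito_prefix)
    )
    return {
        "total_counts": total,
        "genes_detected": detected,
        "mito_fraction": mito / total if total else 0.0,
    }


def passes_qc(metrics, min_counts, min_genes, max_mito_fraction):
    """Apply fixed thresholds (illustrative values only)."""
    return (
        metrics["total_counts"] >= min_counts
        and metrics["genes_detected"] >= min_genes
        and metrics["mito_fraction"] <= max_mito_fraction
    )


# Hypothetical barcodes: a healthy cell, an empty droplet and a damaged cell
barcodes = {
    "cell_ok": {"GeneA": 12, "GeneB": 9, "GeneC": 7, "MT-CO1": 2},
    "empty_droplet": {"GeneA": 1, "GeneB": 0, "GeneC": 0, "MT-CO1": 0},
    "damaged_cell": {"GeneA": 3, "GeneB": 2, "GeneC": 1, "MT-CO1": 14},
}

kept = [
    name for name, cell_counts in barcodes.items()
    if passes_qc(qc_metrics(cell_counts), min_counts=10, min_genes=3,
                 max_mito_fraction=0.2)
]
print(kept)  # ['cell_ok']
```

This sketch does not address doublets or low-quality whole samples, which need separate checks.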

  32. Normalisation
    - Correct for technical differences between cells (number of counts)
    - Most commonly used is simple (log) depth normalisation
    - scran can compute more sophisticated size factors
    - Seurat provides a regression-based method called sctransform
    - Other options…
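
Simple (log) depth normalisation can be written in a few lines: divide each count by the cell's total, multiply by a scale factor, then take log(1 + x). The sketch below is a plain-Python illustration rather than any package's implementation; the scale factor of 10,000 is an assumption (a commonly used value) and the two cells are invented.

```python
import math


def log_depth_normalise(cell_counts, scale_factor=10_000):
    """Scale one cell's counts to a common depth, then apply log(1 + x)."""
    total = sum(cell_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in cell_counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in cell_counts.items()
    }


# Two hypothetical cells with the same composition but 10x different depth
shallow = {"A": 6, "B": 0, "C": 4}
deep = {"A": 60, "B": 0, "C": 40}

norm_shallow = log_depth_normalise(shallow)
norm_deep = log_depth_normalise(deep)

for gene in shallow:
    print(gene, round(norm_shallow[gene], 3), round(norm_deep[gene], 3))
```

After normalisation the two cells have matching values for every gene, so the difference in sequencing depth no longer looks like a difference in expression.
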
  33. Integration
    - Remove technical effects between batches*
    - *Deciding what a “batch” is can be difficult
  34. Integration
    - Top performer in benchmarks
    - Well-documented, maintained, easy-to-use package
    - Able to map new samples
    - Models for different modalities
    - *Personal opinion, other packages can also produce good results
  35. Clustering
    - Group cells based on similar expression profiles
    - Graph-based algorithms are most common
    - Selecting a clustering resolution is difficult
    - Sub-clustering often required
    - No clustering is perfect
  36. Visualisation
    - 2D embeddings are the most common visualisation
      - t-SNE, UMAP etc.
    - Can be useful BUT:
      - Easy to overinterpret
      - Hides lots of complexity
      - Potentially misleading
  37. Marker genes Genes that are specifically expressed in a cluster
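
A very simple way to look for genes that are specifically expressed in a cluster is to compare mean expression inside the cluster with mean expression in all other cells. The sketch below ranks genes by a log2 fold change with a pseudocount. It is only an illustration: real marker detection usually relies on statistical tests, and the cluster labels here are invented. The expression values reuse the toy count matrix from the alignment and quantification slide.

```python
import math


def marker_scores(expression, clusters, target, pseudocount=1.0):
    """Rank genes by log2 fold change of mean expression, target cluster vs rest.

    expression: dict gene -> list of values, one per cell
    clusters:   list of cluster labels, one per cell (same order)
    """
    inside = [i for i, label in enumerate(clusters) if label == target]
    outside = [i for i, label in enumerate(clusters) if label != target]
    if not inside or not outside:
        raise ValueError("need cells both inside and outside the target cluster")

    scores = {}
    for gene, values in expression.items():
        mean_in = sum(values[i] for i in inside) / len(inside)
        mean_out = sum(values[i] for i in outside) / len(outside)
        scores[gene] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy matrix (genes A-D, cells 1-4) with hypothetical cluster labels
expression = {
    "A": [12, 10, 9, 0],
    "B": [0, 0, 1, 4],
    "C": [9, 6, 0, 0],
    "D": [7, 0, 4, 0],
}
clusters = ["x", "x", "y", "y"]

ranked = marker_scores(expression, clusters, target="x")
for gene, score in ranked:
    print(gene, round(score, 2))
```

Gene C ranks first for cluster `x` because it is only detected in those two cells, while gene B ranks last.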

  38. Annotation
    - Maybe the most difficult part of the process
    - Usually relies on interpreting marker genes (and iteratively clustering)
    - Prior knowledge can help:
      - Automatic classification
      - Label transfer
      - Gene sets (maybe)
  39. Explore the data
    - Always look at the output of each step
      - Make sure you understand what it has done
      - Every method will produce an output, but that doesn’t mean it makes any sense
    - Make lots of plots!
      - Use these to make decisions
  40. 4. Advanced analysis

  41. Differential expression
    - Differences in expression between conditions
    - Multiple benchmarks show that “pseudobulk” analysis performs best
      - Models sample-level variation
      - Arbitrarily complex models
      - Benefit from 10+ years of development
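
The aggregation step behind a pseudobulk analysis is to sum the counts of all cells from the same sample (usually within one cell type), giving one profile per sample that can then be passed to an established bulk RNA-seq method. A minimal sketch of that aggregation, using invented samples, cell types and counts:

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts over cells sharing a (sample, cell_type) pair.

    cells: iterable of dicts with keys "sample", "cell_type" and
           "counts" (a gene -> count dict of raw counts).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    return {key: dict(gene_totals) for key, gene_totals in totals.items()}


# Hypothetical annotated cells
cells = [
    {"sample": "s1", "cell_type": "T cell", "counts": {"A": 2, "B": 0}},
    {"sample": "s1", "cell_type": "T cell", "counts": {"A": 1, "B": 3}},
    {"sample": "s1", "cell_type": "B cell", "counts": {"A": 0, "B": 5}},
    {"sample": "s2", "cell_type": "T cell", "counts": {"A": 4, "B": 1}},
]

profiles = pseudobulk(cells)
for key, profile in sorted(profiles.items()):
    print(key, profile)
# ('s1', 'B cell') {'A': 0, 'B': 5}
# ('s1', 'T cell') {'A': 3, 'B': 3}
# ('s2', 'T cell') {'A': 4, 'B': 1}
```

Because each profile corresponds to a sample rather than a cell, the downstream test sees the real number of replicates.
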
  42. Differential abundance
    - Differences in cell type proportions between conditions
    - Figure: Condition 1 vs Condition 2
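
The starting point for a differential abundance analysis is the proportion of each cell type in each sample. The sketch below only computes those proportions from invented labels; deciding whether a difference between conditions is meaningful requires replicate samples and a suitable statistical model, which is not shown here.

```python
from collections import Counter, defaultdict


def cell_type_proportions(cells):
    """Compute per-sample cell type proportions.

    cells: iterable of (sample, cell_type) pairs, one per cell.
    """
    counts = defaultdict(Counter)
    for sample, cell_type in cells:
        counts[sample][cell_type] += 1

    proportions = {}
    for sample, type_counts in counts.items():
        total = sum(type_counts.values())
        proportions[sample] = {
            cell_type: n / total for cell_type, n in type_counts.items()
        }
    return proportions


# Hypothetical annotated cells from two samples
cells = [
    ("s1", "T"), ("s1", "T"), ("s1", "B"), ("s1", "T"),
    ("s2", "T"), ("s2", "B"),
]

proportions = cell_type_proportions(cells)
print(proportions["s1"])  # {'T': 0.75, 'B': 0.25}
print(proportions["s2"])  # {'T': 0.5, 'B': 0.5}
```
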
  43. Trajectories
    - Analysis of continuous processes
    - Pseudotime
    - RNA velocity

  44. Multimodal analysis
    - Analysis of multiple different measurements
    - Can provide more context and insight…
    - …but methods are still developing
    - Depends on the modalities and the question
    - Unclear whether combined modelling is useful or whether it’s better to analyse each modality and combine the results
  45. Questions?

  46. Resources
    - Current best practices in single-cell RNA-seq analysis: a tutorial. Malte Lücken, Fabian Theis. DOI: 10.15252/msb.20188746
    - Extended best practices, Theis Lab (and the community): https://github.com/theislab/extended-single-cell-best-practices
    - Orchestrating Single-Cell Analysis with Bioconductor: https://bioconductor.org/books/release/OSCA/
    - Seurat documentation: https://satijalab.org/seurat/
    - Scanpy documentation: https://scanpy.readthedocs.io/en/stable/
    - scverse community: https://scverse.org/
    - scRNA-tools: https://scRNA-tools.org/
    - Open Problems in Single-Cell Analysis: https://openproblems.bio/
  47. Acknowledgements
    - Theis lab
    - Twitter
    - Everyone who has written documentation, tutorials etc.
    - Everyone who has developed tools and made their code available