Successful scRNA-seq analysis

A brief introduction to how single-cell technologies work, how to plan a successful experiment (from an analyst's point of view), the steps in a standard scRNA-seq analysis, and a look at some more advanced topics. Presented at the ILC Summer School 2022.

Luke Zappia

June 30, 2022


  1. Postdoctoral researcher (Theis Lab, Helmholtz Munich) Chemistry, Informatics, Bioinformatics scRNA-seq

    - Methods development - Software development - Benchmarking - Data analysis @_lazappi_ @lazappi lazappi.id.au Luke Zappia
  2. Apply machine learning to biological data scRNA-seq - Integration and

    perturbations - Modelling of transitions - Multimodal analysis Theis Lab @fabian_theis @ICBmunich www.comp.bio
  3. 1. What is scRNA-seq? 2. Designing an scRNA-seq experiment 3.

    Standard scRNA-seq analysis 4. Advanced analysis topics
  4. Single-cell capture

    - Droplet-based: more cells, easier, UMI
    - Plate/well-based: fewer cells, custom setup, full length, higher depth, more flexible
  5. UMI vs full-length

    - Unique Molecular Identifiers (5’ AAAA (PCR){BARCODE}[UMI]TTTT): better quantification, less sequencing, no gene-length bias
    - Full-length: full coverage, more sequencing, affected by gene length
  6. Extensions Protein expression (CITE-seq, feature barcoding) Chromatin accessibility (scATAC-seq, 10x

    Multiome) Spatial location (10x Visium, MERFISH) Immune receptors (TCR/BCR profiling) Methylation, CRISPR screens, electrophysiology,... Pre-sorting (FACS to enrich target cells)
  7. CITE-seq Simultaneous measurement of RNA and protein expression - Protein

    ≠ RNA Uses nucleotide-tagged antibodies Targets need to be carefully selected Particularly useful for PBMCs
  8. Multiplexing Genetic multiplexing Easier but requires genetic diversity and reference

    panels Cell hashing More complex but can be more flexible More samples, fewer batch effects
  9. Comparison to bulk Gives insight into cellular variability Avoids the

    composition problem Much more complex analysis Much noisier Much sparser - But UMI data isn’t zero inflated!
  10. What is the question? What do you want to answer

    with this experiment? - Not necessarily a hypothesis Experimentalists should have a clear idea that is refined with input from analysts - Discuss everything that is relevant PIs and external collaborators need to be on board
  11. Things to consider Cells are not replicates! - Proper analysis

    requires multiple samples from each condition Avoid confounding batches and conditions - How will the samples be multiplexed? What are your controls? How rare are the cells you are interested in? Are you using the right assay?
  12. How long will it take? Experiments take time, so does

    analysis - Often getting results takes longer than generating data Simpler experiments with clearer questions are quicker and easier to analyse You will likely be competing with other projects, good relationships are key!
  13. Make a plan What is the question? What is the

    design? Who is involved? What is everyone’s role (authorship)? What if somebody leaves? What is the timeline? How is it funded? Write it down!
  14. Tips for good collaborations Involve everyone in the process -

    Give everyone ownership over the project Good, clear communication - Keep everyone in the loop Share all the (relevant) data - If you did FACS, share the measurements Keep good records - Complete, consistent, machine-readable metadata
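  15. Alignment and quantification 1. Align to reference genome 2. Compare

    to gene annotation 3. Deduplication 4. Quantification

    | Gene | Cell 1 | Cell 2 | Cell 3 | Cell 4 |
    |------|--------|--------|--------|--------|
    | A    | 12     | 10     | 9      | 0      |
    | B    | 0      | 0      | 1      | 4      |
    | C    | 9      | 6      | 0      | 0      |
    | D    | 7      | 0      | 4      | 0      |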
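The deduplication and quantification steps on this slide can be illustrated with a minimal sketch. It assumes reads have already been aligned and assigned to a gene, so each read is a hypothetical `(cell, gene, umi)` tuple, and it only collapses exact UMI matches. Real pipelines typically also correct sequencing errors in barcodes and UMIs, which is not attempted here.

```python
from collections import defaultdict


def count_umis(reads):
    """Count unique molecules per gene and cell from (cell, gene, umi) tuples."""
    # Reads sharing cell, gene and UMI are treated as PCR duplicates of one molecule.
    molecules = set(reads)
    counts = defaultdict(lambda: defaultdict(int))
    for cell, gene, _umi in molecules:
        counts[gene][cell] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {gene: dict(cells) for gene, cells in counts.items()}


# Toy input (illustrative values only, not from the slides).
reads = [
    ("cell1", "A", "AACG"),
    ("cell1", "A", "AACG"),  # duplicate of the read above
    ("cell1", "A", "TTGC"),
    ("cell2", "A", "AACG"),  # same UMI but a different cell, so a different molecule
    ("cell2", "B", "GGTA"),
    ("cell2", "B", "GGTA"),  # duplicate
]

matrix = count_umis(reads)
print(matrix)
```

The result is a gene-by-cell count structure of the same kind as the matrix shown on the slide.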
  16. Which ecosystem? They all have strengths and weaknesses Possible to

    convert between them Use whatever is best for the task For simple tasks use whichever is easiest
  17. Which tool? Independent benchmarks are the best measure of performance

    Try commonly used tools first Look for good documentation/maintenance Prefer tools that can be installed from major repositories Read more than just the introductory tutorial - Paper, package documentation
  18. Quality control Not every droplet contains a cell Not every

    cell is in good condition Not every cell is informative Not every cell is a single cell Sometimes whole samples can be low-quality
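As a rough sketch of how such filtering might look, the example below keeps barcodes that pass three per-cell checks. The metrics (total counts, number of detected genes, fraction of mitochondrial counts) and all thresholds are assumptions for illustration, not values from the slides. Suitable cutoffs depend on the dataset, and doublet detection needs a dedicated method that is not shown.

```python
def qc_filter(cells, min_counts=500, min_genes=200, max_mito_frac=0.2):
    """Return barcodes passing simple QC thresholds (all thresholds are illustrative)."""
    kept = []
    for cell in cells:
        total = cell["total_counts"]
        # Treat barcodes with no counts as failing the mitochondrial check.
        mito_frac = cell["mito_counts"] / total if total > 0 else 1.0
        passes = (
            total >= min_counts
            and cell["n_genes"] >= min_genes
            and mito_frac <= max_mito_frac
        )
        if passes:
            kept.append(cell["barcode"])
    return kept


# Hypothetical per-barcode summaries.
cells = [
    {"barcode": "AAAC", "total_counts": 4200, "n_genes": 1500, "mito_counts": 210},
    {"barcode": "CCGT", "total_counts": 120, "n_genes": 80, "mito_counts": 6},      # probably an empty droplet
    {"barcode": "GGTA", "total_counts": 3000, "n_genes": 900, "mito_counts": 1500},  # high mitochondrial fraction
]

kept = qc_filter(cells)
print(kept)
```

Plotting the distribution of each metric before choosing thresholds is usually more informative than applying fixed defaults.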
  19. Normalisation Correct for technical differences between cells (number of counts)

    Most commonly used is simple (log) depth normalisation scran can compute more sophisticated size factors Seurat provides a regression-based method called sctransform Other options…
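The simple (log) depth normalisation mentioned here can be sketched as follows: scale each cell so its counts sum to a common total, then apply log(1 + x). The example reuses the count matrix from the quantification slide. The function name and the `target_sum` of 10,000 are assumptions made for illustration.

```python
import math


def log_depth_normalise(counts, target_sum=10_000):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    normalised = {}
    for cell, gene_counts in counts.items():
        depth = sum(gene_counts.values())
        # Cells with zero depth stay at zero rather than causing a division error.
        scale = target_sum / depth if depth > 0 else 0.0
        normalised[cell] = {
            gene: math.log1p(count * scale) for gene, count in gene_counts.items()
        }
    return normalised


# Counts per cell, taken from the matrix on the quantification slide.
counts = {
    "Cell 1": {"A": 12, "B": 0, "C": 9, "D": 7},
    "Cell 2": {"A": 10, "B": 0, "C": 6, "D": 0},
    "Cell 3": {"A": 9, "B": 1, "C": 0, "D": 4},
    "Cell 4": {"A": 0, "B": 4, "C": 0, "D": 0},
}

normalised = log_depth_normalise(counts)
for cell, values in normalised.items():
    print(cell, {gene: round(value, 2) for gene, value in values.items()})
```

Zero counts remain zero after this transformation, so the sparsity of the matrix is preserved.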
  20. Integration Top performer in benchmarks Well-documented, maintained, easy-to-use package Able

    to map new samples Models for different modalities *Personal opinion, other packages can also produce good results
  21. Clustering Group cells based on similar expression profiles Graph-based algorithms

    are most common Selecting a clustering resolution is difficult Sub-clustering often required No clustering is perfect
  22. Visualisation 2D embeddings are the most common visualisation - t-SNE,

    UMAP etc. Can be useful BUT: - Easy to overinterpret - Hides lots of complexity - Potentially misleading
  23. Annotation Maybe the most difficult part of the process Usually

    relies on interpreting marker genes (and iteratively clustering) Prior knowledge can help: - Automatic classification - Label transfer - Gene sets (maybe)
  24. Explore the data Always look at the output of each

    step - Make sure you understand what it has done - Every method will produce an output, that doesn’t mean it makes any sense Make lots of plots! - Use these to make decisions
  25. Differential expression Differences in expression between conditions Multiple benchmarks show

    that “pseudobulk” analysis performs best Models sample level variation Arbitrarily complex models Benefit from 10+ years of development vs
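The aggregation behind a pseudobulk analysis can be sketched in a few lines: counts are summed over all cells from the same sample, so that samples rather than cells become the unit of replication. Grouping additionally by cell type, and the field names `sample`, `cell_type` and `counts`, are assumptions for this illustration. The statistical testing itself would then be done with a bulk RNA-seq method and is not shown.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts over cells sharing the same (sample, cell type)."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}


# Hypothetical cells with raw (unnormalised) counts.
cells = [
    {"sample": "ctrl_1", "cell_type": "T cell", "counts": {"A": 3, "B": 0}},
    {"sample": "ctrl_1", "cell_type": "T cell", "counts": {"A": 1, "B": 2}},
    {"sample": "ctrl_1", "cell_type": "B cell", "counts": {"A": 0, "B": 5}},
    {"sample": "treat_1", "cell_type": "T cell", "counts": {"A": 7, "B": 1}},
]

bulk = pseudobulk(cells)
for key, genes in bulk.items():
    print(key, genes)
```

Note that a comparison between conditions still needs multiple samples per condition; a single pseudobulk profile per condition gives no estimate of sample-level variation.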
  26. Multimodal analysis Analysis of multiple different measurements Can provide more

    context and insight… …but methods are still developing Depends on the modalities and the question Unclear whether combined modelling is useful or it’s better to analyse each modality and combine the results
  27. Resources

    - Current best practices in single-cell RNA-seq analysis: a tutorial. Malte Lücken, Fabian Theis. DOI: 10.15252/msb.20188746
    - Extended best practices, Theis Lab (and the community): https://github.com/theislab/extended-single-cell-best-practices
    - Orchestrating Single-Cell Analysis with Bioconductor: https://bioconductor.org/books/release/OSCA/
    - Seurat documentation: https://satijalab.org/seurat/
    - Scanpy documentation: https://scanpy.readthedocs.io/en/stable/
    - scverse community: https://scverse.org/
    - scRNA-tools: https://scRNA-tools.org/
    - Open Problems in Single-Cell Analysis: https://openproblems.bio/
  28. Acknowledgements Theis lab Twitter Everyone who has written documentation, tutorials

    etc. Everyone who has developed tools and made their code available