Tools and techniques for single-cell RNA sequencing data

Luke Zappia
March 22, 2019

RNA sequencing of individual cells allows us to take a snapshot of the dynamic processes within a cell and explore the differences between cell types. As this technology has developed over the last few years, it has been rapidly adopted by researchers in areas such as developmental biology. The development of protocols for producing this data has been accompanied by a burst in the development of computational methods for analysing it. My thesis explores the computational tools and techniques for analysing single-cell RNA-sequencing data. I will present a database that charts the release of analysis software (https://scrna-tools.org); Splatter, a software package for easily simulating single-cell datasets from multiple models (http://bioconductor.org/packages/splatter/); and clustering trees, a visualisation approach for inspecting clustering results at multiple resolutions (https://CRAN.R-project.org/package=clustree). In the final part of my thesis, I use an analysis of a kidney organoid dataset to demonstrate and compare some of the current analysis methods.

Transcript

  1. TOOLS AND TECHNIQUES
    FOR SINGLE-CELL
    RNA SEQUENCING DATA
    LUKE ZAPPIA

  2. 1 Introduction
    2 Tools
    3 Simulations
    4 Clustering trees
    5 Analysis
    6 Conclusion

  3. 1 Introduction
    Matthew Daniels via The Cell Image Library
    http://www.cellimagelibrary.org/images/38912

  4. RNA sequencing
    Reads:
    ACTGACTCCA
    TCAGTACTGA
    CGTGTCATAG
    GATTGACCTA
    Gene  Sample 1
    A     43
    B     3
    C     17
    D     24

  5. [Diagram: RNA-seq analysis workflow]
    Inputs: Raw reads, Reference genome, Gene annotation, Reference transcriptome, Gene sets
    Quantification: Alignment, Aligned reads, Counting, Alignment-free quantification
    Result: Expression matrix
    Downstream steps: Quality control, Normalisation, Differential expression testing, Gene set testing, Visualisation, Interpretation

  6. Svensson et al. DOI: 10.1038/nprot.2017.149

  7. mccarrolllab.com/dropseq/
    Droplet cell capture

  8. scRNA-seq
    Reads (the same block is shown for each of the four cells):
    ACTGACTCCA
    TCAGTACTGA
    CGTGTCATAG
    GATTGACCTA
    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0
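
The toy count matrix on this slide can be rebuilt in R to show two properties that come up later in the talk: library size and the proportion of zeros. This is a minimal sketch in base R; the variable names are illustrative and do not come from the slides.

```r
# Count matrix from the slide: rows are genes, columns are cells
counts <- matrix(
  c(12, 10, 9, 0,
     0,  0, 0, 1,
     9,  6, 0, 0,
     7,  0, 4, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"),
                  c("Cell 1", "Cell 2", "Cell 3", "Cell 4"))
)

# Total counts per cell (library size)
lib_sizes <- colSums(counts)

# Proportion of zero counts in each cell and across the whole matrix
zeros_per_cell <- colMeans(counts == 0)
zeros_overall <- mean(counts == 0)
```

Even in this small example half of the entries are zero, which is the kind of sparsity that single-cell analysis methods have to handle.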

  9. Kidney development
    Images from OpenStax College, CC BY 3.0 via Wikimedia Commons

  10. Kidney organoids
    [Timeline figure: iPSCs to organoid]
    Days: 0, 4, 7, 10, 18, 25
    Stage labels (some appear more than once): CHIR, FGF9, Form pellets, No GF

  11. Aims
    1. Understand the computational tools used to analyse scRNA-seq data
    2. Contribute to tool development
    3. Apply tools to a kidney organoid dataset

  12. 2 Tools
    Andres J Garcia and Ankur Singh via The Cell Image Library
    http://www.cellimagelibrary.org/images/44701

  13. Number of tools Publication status

  14. Software licenses Platforms

  15. Analysis categories

  16. Users over time
    By country Top 10 By continent

  17. “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database”
    PLoS Computational Biology (2018) DOI: 10.1371/journal.pcbi.1006245

  18. 3 Simulations
    S. Schuller via The Cell Image Library
    http://www.cellimagelibrary.org/images/38903

  19. Simulations
    Provide a truth to test against
    But
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data

  20. Bioconductor package
    Consistent, easy-to-use interface
    Multiple simulation models

  21. Splat model
    Negative binomial
    Expression outliers
    Defined library sizes
    Mean-variance trend
    Dropout
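
As a rough illustration of how the components listed above could fit together, the sketch below simulates counts in base R. It is not the Splatter implementation: all parameter values are hypothetical, the outlier, mean-variance and dropout steps are simplified, and the variable names are made up for this example.

```r
set.seed(1)
n_genes <- 100
n_cells <- 20

# Gene means drawn from a gamma distribution (hypothetical shape and rate)
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Expression outliers: a few genes get a mean far above the median
is_outlier <- rbinom(n_genes, 1, prob = 0.05) == 1
gene_means[is_outlier] <- median(gene_means) *
  rlnorm(sum(is_outlier), meanlog = 4, sdlog = 0.5)

# Defined library sizes: log-normal expected total counts per cell
lib_sizes <- rlnorm(n_cells, meanlog = 8, sdlog = 0.2)

# Cell-specific means: gene proportions scaled to each library size
cell_means <- outer(gene_means / sum(gene_means), lib_sizes)

# Mean-variance trend: variability (BCV) is larger for low means
bcv <- 0.1 + 1 / sqrt(cell_means)
noisy_means <- matrix(
  rgamma(n_genes * n_cells, shape = 1 / bcv^2, scale = cell_means * bcv^2),
  nrow = n_genes
)

# Poisson sampling around gamma-distributed means gives
# negative binomial counts
counts <- matrix(rpois(n_genes * n_cells, lambda = noisy_means),
                 nrow = n_genes)

# Dropout: zero out counts with a probability that falls as the mean
# rises (logistic curve with hypothetical midpoint and steepness)
midpoint <- 0
steepness <- 1
drop_prob <- 1 / (1 + exp(steepness * (log(cell_means) - midpoint)))
keep <- rbinom(n_genes * n_cells, 1, prob = 1 - drop_prob)
counts_dropout <- counts * keep
```

Keeping `counts` and `counts_dropout` separate makes it easy to see how much of the sparsity comes from the dropout step and how much from low expression alone.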

  22. Other models
    Simple - Negative binomial
    Lun - NB with cell factors
    DOI: 10.1186/s13059-016-0947-7
    Lun2 - Sampled NB with batch effects
    DOI: 10.1093/biostatistics/kxw055
    scDD - NB with bimodality
    DOI: 10.1186/s13059-016-1077-y
    BASiCS - NB with spike-ins
    DOI: 10.1371/journal.pcbi.1004333
    mfa - Bifurcating pseudotime trajectory
    DOI: 10.12688/wellcomeopenres.11087.1
    PhenoPath - Pseudotime with gene types
    DOI: 10.1038/s41467-018-04696-6
    ZINB-WaVE - Sophisticated ZINB
    DOI: 10.1186/s13059-018-1406-4
    SparseDC - Clusters across two conditions
    DOI: 10.1093/nar/gkx1113

  23. Real data -> (Estimation) -> Parameters -> (Simulation) -> Dataset
    library(splatter)
    # 1. Estimate parameters from the real data
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)
    # 2. Simulate datasets from the estimated parameters
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)
    # 3. Compare the simulations with the real data
    datasets <- list(Real = real.data,
                     Splat = sim1,
                     Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")

  24. [Figure: simulations compared with the real data on mean expression]
    Panel: Distribution of mean expression (axis: Mean log2(CPM + 1))
    Panel: Difference in mean expression (axis: Rank Difference Mean log2(CPM + 1))
    Datasets: Real (first panel only), Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE
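
For reference, the quantity on these axes, mean log2(CPM + 1), could be computed per gene as in this base R sketch. The small count matrix is made up for illustration and the variable names are assumptions.

```r
# Toy count matrix: rows are genes, columns are cells
counts <- matrix(
  c(12, 10, 9, 0,
     0,  0, 0, 1,
     9,  6, 0, 0,
     7,  0, 4, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"),
                  c("Cell 1", "Cell 2", "Cell 3", "Cell 4"))
)

# Counts per million: scale each cell (column) to a total of one million
cpm <- t(t(counts) / colSums(counts)) * 1e6

# Log-transform with an offset of 1 so that zeros stay at zero
log_cpm <- log2(cpm + 1)

# Mean expression of each gene across cells
mean_log_cpm <- rowMeans(log_cpm)
```

Comparing the distribution of `mean_log_cpm` between a real dataset and each simulation is the kind of comparison the two panels summarise.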

  25. [Figure: Rank of MAD from real data]
    Simulations: Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE
    Properties: Mean, Variance, Mean-Variance, Library size, % Zeros (Cell), % Zeros (Gene), Mean-Zeros

  26. Complex simulations
    Groups Batches Paths

  27. “Splatter: simulation of single-cell RNA sequencing data”
    Genome Biology (2017) DOI: 10.1186/s13059-017-1305-0

  28. 4 Clustering trees
    M Uhlen et al. via The Cell Image Library
    http://www.cellimagelibrary.org/images/40483

  29. How many clusters?
    Low resolution (fewer clusters)
    High resolution (more clusters)

  30. A tree of clusters?

  31. Weighting edges
    In proportion = (Number of cells on edge) / (Number of cells in higher res cluster)
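
The in-proportion defined above can be sketched in base R from two vectors of cluster labels. The labels here are hypothetical and this is not the clustree implementation, just the ratio from the slide applied to a toy example.

```r
# Hypothetical cluster labels for ten cells at two resolutions
low_res  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
high_res <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)

# Number of cells on each edge, from a lower-resolution cluster (rows)
# to a higher-resolution cluster (columns)
edge_counts <- table(low = low_res, high = high_res)

# In-proportion: cells on the edge divided by the total number of cells
# in the higher-resolution cluster the edge points to
in_prop <- prop.table(edge_counts, margin = 2)
```

In this example, higher-resolution cluster 2 receives two of its three cells from low-resolution cluster 1 and one from cluster 2, so its incoming edges have in-proportions of 2/3 and 1/3. A cluster with several incoming edges of similar weight is a sign that cells are moving between branches of the tree.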

  32. Gene expression

  33. Cell cycle SC3 stability
    Number of genes

  34. [Figure: two t-SNE plots, axes t-SNE 1 and t-SNE 2]

  35. “Clustering trees: a visualisation for evaluating clusterings at multiple resolutions”
    GigaScience (2018) DOI: 10.1093/gigascience/giy083

  36. 5 Analysis
    Natalie Prigozhina via The Cell Image Library
    http://www.cellimagelibrary.org/images/48101

  37. GATA3
    ECAD
    LTL
    WT1
    CD +
    DT +
    PT +
    Glo

  38. Dataset
    4 Organoids
    10x Chromium
    2 Batches (3 + 1)
    7937 cells (6649 + 1288)
    Identify cell types
    Analysis steps
    Step             Tool
    Alignment        CellRanger
    Quantification   CellRanger
    Quality control  scater
    Integration      Seurat
    Clustering       Seurat
    Gene detection   Seurat

  39. Stroma Endothelium
    Cell cycle Podocyte
    Epithelium

  40. Glial
    Neural progenitor
    Muscle progenitor

  41. Human dataset
    16 week fetal kidney
    3178 cells
    10x Chromium
    Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor
    Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney”
    J Am Soc Nephrol (2018) DOI:10.1681/ASN.2017080890

  42. [Figure: number of cells per cluster, by dataset]
    Datasets: Fetal kidney, Organoid
    Cluster labels: Stroma, Endothelium, Cell cycle, Nephron progenitor, Podocyte, Nephron, Glial, Immune, Blood, Neural progenitor
    Axis: Number of cells (0 to 1250)

  43. “Single-cell analysis reveals congruence between kidney organoids and human fetal kidney”
    Genome Medicine (2019) DOI: 10.1186/s13073-019-0615-0

  44. What if we did things differently?

  45. Droplet selection
    ~ 1 million droplets

  46. Quality control
    Manual thresholds PCA based

  47. Gene selection
    Seurat

  48. Gene selection
    M3Drop

  49. Gene selection
    Overlap
    Seurat only
    M3Drop only
    Both

  50. Marker genes

  51. Partition-based graph abstraction
    Cell graph PAGA cluster graph

  52. Cell velocity
    AAAAAAA
    Unspliced RNA
    Mature mRNA

  53. Podocyte
    Epithelial
    Endothelial
    Stroma
    Neural
    Muscle
    Immune?

  54. Summary
    New droplet selection methods can give many more cells
    Seurat clustering is robust to gene selection
    Possible immune-like population
    Alternative methods can help interpretation

  55. 6 Conclusion
    Natalie Prigozhina via The Cell Image Library
    http://www.cellimagelibrary.org/images/48108

  56. What did I do?
    - Build a database of scRNA-seq analysis tools and a website to interact with it
    - Develop a software package for simulating scRNA-seq data and a flexible simulation model
    - Design an algorithm for visualising clustering at multiple resolutions and a software package that implements it
    - Perform an analysis of kidney organoid data to profile the cell types present and demonstrate the effect of different tools and decisions

  57. What next?
    Bigger datasets, more computation
    Convergence on methods that work
    Continued development of software
    Reference datasets
    Integration of data types
    Spatial transcriptomics

  58. Acknowledgments
    Supervisors
    Alicia Oshlack
    Melissa Little
    Committee
    Andrew Pask
    Christine Wells
    Edmund Crampin
    Everyone that makes their tools and data available
    MCRI Bioinformatics
    Belinda Phipson
    Breon Schmidt
    MCRI KDDR
    Alex Combes
    COMBINE
    Friends and family
    Developers
    dplyr, ggplot2, scater, scran, Seurat, workflowr, rmarkdown, knitr, tidygraph, ggraph, edgeR...

  59. install.packages("clustree")
    Paper: doi.org/10.1093/gigascience/giy083
    biocLite("splatter")
    Paper: doi.org/10.1186/s13059-017-1305-0
    www.scRNA-tools.org
    Paper: doi.org/10.1371/journal.pcbi.1006245
    oshlacklab.com/combes-organoid-paper
    Paper: doi.org/10.1186/s13073-019-0615-0
    @_lazappi_
    oshlacklab.com
    github.com/lazappi
