Tools and techniques for single-cell RNA sequencing data

Luke Zappia
March 22, 2019

RNA sequencing of individual cells allows us to take a snapshot of the dynamic processes within a cell and explore the differences between cell types. As the technology has developed over the last few years, it has been rapidly adopted by researchers in areas such as developmental biology. The development of protocols for producing this data has been accompanied by a burst of computational methods for analysing it. My thesis explores the computational tools and techniques for analysing single-cell RNA sequencing data. I will present a database that charts the release of analysis software (https://scrna-tools.org); Splatter, a software package for easily simulating single-cell datasets from multiple models (http://bioconductor.org/packages/splatter/); and clustering trees, a visualisation approach for inspecting clustering results at multiple resolutions (https://CRAN.R-project.org/package=clustree). In the final part of my thesis, I use an analysis of a kidney organoid dataset to demonstrate and compare some of the current analysis methods.

Transcript

  1. TOOLS AND TECHNIQUES
    FOR SINGLE-CELL
    RNA SEQUENCING DATA
    LUKE ZAPPIA

  2. 1 Introduction
    2 Tools
    3 Simulations
    4 Clustering trees
    5 Analysis
    6 Conclusion

  3. 1 Introduction
    Image: Matthew Daniels via The Cell Image Library
    http://www.cellimagelibrary.org/images/38912

  5. RNA sequencing
    Reads:
    ACTGACTCCA
    TCAGTACTGA
    CGTGTCATAG
    GATTGACCTA
    Gene  Sample 1
    A     43
    B     3
    C     17
    D     24

  6. RNA-seq analysis workflow
    Raw reads
    Alignment (reference genome) -> Aligned reads
    Counting (gene annotation) -> Expression matrix
    Alignment-free quantification (reference transcriptome) -> Expression matrix
    Quality control
    Normalisation
    Differential expression testing
    Gene set testing (gene sets)
    Visualisation
    Interpretation

  7. Svensson et al. DOI: 10.1038/nprot.2017.149

  8. mccarrolllab.com/dropseq/
    Droplet cell capture

  9. scRNA-seq
    Reads (one set per cell):
    ACTGACTCCA
    TCAGTACTGA
    CGTGTCATAG
    GATTGACCTA
    Gene  Cell 1  Cell 2  Cell 3  Cell 4
    A     12      10      9       0
    B     0       0       0       1
    C     9       6       0       0
    D     7       0       4       0
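
The count table on this slide has many zero entries, which is typical of scRNA-seq data. As a rough illustration only, the sketch below copies the slide's numbers into a plain Python dictionary and computes the fraction of zeros per gene, per cell and overall. The variable and function names are made up for this example and are not taken from any package mentioned in the talk.

```python
# Count matrix copied from the slide: rows are genes, columns are cells.
counts = {
    "A": [12, 10, 9, 0],
    "B": [0, 0, 0, 1],
    "C": [9, 6, 0, 0],
    "D": [7, 0, 4, 0],
}


def zero_fraction(values):
    """Return the fraction of entries that are exactly zero."""
    values = list(values)
    return sum(1 for v in values if v == 0) / len(values)


# Per-gene zeros: how often each gene is undetected across cells.
zeros_per_gene = {gene: zero_fraction(row) for gene, row in counts.items()}

# Per-cell zeros: transpose the matrix, then apply the same function.
cells = list(zip(*counts.values()))
zeros_per_cell = [zero_fraction(cell) for cell in cells]

# Overall sparsity of the whole matrix.
overall = zero_fraction(v for row in counts.values() for v in row)

print(zeros_per_gene)
print(zeros_per_cell)
print(overall)
```

Half of the entries in this toy matrix are zero, which is the kind of sparsity the later "% Zeros (Cell)" and "% Zeros (Gene)" comparisons summarise.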

  12. Kidney development
    Images from OpenStax College, CC BY 3.0 via Wikimedia Commons

  13. Kidney organoids
    Differentiation timeline: iPSCs to organoid
    Days: 0, 4, 7, 10, 18, 25
    Timeline labels: CHIR, FGF9, form pellets, no GF

  14. Aims
    1. Understand the computational tools used to analyse scRNA-seq data
    2. Contribute to tool development
    3. Apply tools to a kidney organoid dataset

  15. 2 Tools
    Image: Andres J Garcia and Ankur Singh via The Cell Image Library
    http://www.cellimagelibrary.org/images/44701

  17. www.scRNA-tools.org

  19. Number of tools Publication status

  20. Software licenses Platforms

  21. Analysis categories

  22. Users over time
    By country Top 10 By continent

  23. “Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database”
    PLoS Computational Biology (2018) DOI: 10.1371/journal.pcbi.1006245

  24. 3 Simulations
    Image: S. Schuller via The Cell Image Library
    http://www.cellimagelibrary.org/images/38903

  25. Simulations
    Provide a truth to test against
    But
    - Often poorly documented and explained
    - Not easily reproducible or reusable
    - Don’t demonstrate similarity to real data

  26. Bioconductor package
    Consistent, easy-to-use interface
    Multiple simulation models

  27. Splat model
    Negative binomial
    Expression outliers
    Defined library sizes
    Mean-variance trend
    Dropout
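
To make the ingredients listed on this slide more concrete, here is a deliberately simplified sketch of a negative binomial simulation with dropout. It is not the Splatter implementation: it leaves out expression outliers, library sizes and the mean-variance trend, and every parameter value below is a hypothetical choice for illustration. The negative binomial is drawn as a gamma-Poisson mixture, and dropout uses a logistic function of the log mean, which is one common way to model it.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson sample using Knuth's method (suitable for small means only)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n_genes, n_cells, seed=1,
                    mean_shape=0.6, mean_scale=3.0,        # hypothetical gamma parameters for gene means
                    dispersion=0.1,                        # hypothetical constant dispersion (no trend)
                    dropout_mid=0.0, dropout_shape=-1.0):  # hypothetical logistic dropout parameters
    """Simulate a genes x cells count matrix from a gamma-Poisson model with dropout."""
    rng = random.Random(seed)
    gene_means = [rng.gammavariate(mean_shape, mean_scale) for _ in range(n_genes)]
    matrix = []
    for mean in gene_means:
        # With a negative shape, dropout probability falls as the log mean rises.
        log_mean = math.log(max(mean, 1e-12))
        p_drop = 1.0 / (1.0 + math.exp(-dropout_shape * (log_mean - dropout_mid)))
        row = []
        for _ in range(n_cells):
            # Gamma-distributed cell mean followed by Poisson sampling gives
            # a negative binomial count with expectation `mean`.
            cell_mean = rng.gammavariate(1.0 / dispersion, mean * dispersion)
            count = poisson(rng, cell_mean)
            # Dropout: replace the count with zero with probability p_drop.
            if rng.random() < p_drop:
                count = 0
            row.append(count)
        matrix.append(row)
    return gene_means, matrix


gene_means, matrix = simulate_counts(n_genes=5, n_cells=8)
for mean, row in zip(gene_means, matrix):
    print(round(mean, 2), row)
```

Fixing the seed makes the simulated matrix reproducible, which speaks to the reproducibility concern raised earlier in this section.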

  28. Other models
    Simple - Negative binomial
    Lun - NB with cell factors
    DOI: 10.1186/s13059-016-0947-7
    Lun2 - Sampled NB with batch effects
    DOI: 10.1093/biostatistics/kxw055
    scDD - NB with bimodality
    DOI: 10.1186/s13059-016-1077-y
    BASiCS - NB with spike-ins
    DOI: 10.1371/journal.pcbi.1004333
    mfa - Bifurcating pseudotime trajectory
    DOI: 10.12688/wellcomeopenres.11087.1
    PhenoPath - Pseudotime with gene types
    DOI: 10.1038/s41467-018-04696-6
    ZINB-WaVE - Sophisticated ZINB
    DOI: 10.1186/s13059-018-1406-4
    SparseDC - Clusters across two conditions
    DOI: 10.1093/nar/gkx1113

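  29. Real data -> (Estimation) -> Parameters -> (Simulation) -> Dataset
    library(splatter)
    # 1. Estimate parameters from a real dataset
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)
    # 2. Simulate new datasets from the estimated parameters
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)
    # 3. Compare the simulations with each other and with the real data
    # (function names as shown on the slide; newer splatter releases
    # appear to call these compareSCEs() and diffSCEs())
    datasets <- list(Real = real.data,
                     Splat = sim1,
                     Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")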

  30. Figure panels: Distribution of mean expression; Difference in mean expression
    Axes: Mean log2(CPM + 1); Rank difference mean log2(CPM + 1)
    Datasets: Real, Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE
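
The quantity on these axes, mean log2(CPM + 1), can be sketched as follows. This is only an illustration, assuming the usual definition of counts per million (each count divided by its cell's library size, times one million) and a per-gene mean taken across cells. The count values are the toy scRNA-seq matrix from the introduction, and the function names are invented for this example.

```python
import math

# Toy count matrix from the scRNA-seq slide: rows are genes, columns are cells.
counts = {
    "A": [12, 10, 9, 0],
    "B": [0, 0, 0, 1],
    "C": [9, 6, 0, 0],
    "D": [7, 0, 4, 0],
}

# Library size = total counts in each cell (column sums).
library_sizes = [sum(column) for column in zip(*counts.values())]


def mean_log_cpm(row, sizes):
    """Mean across cells of log2(CPM + 1) for one gene."""
    values = [math.log2(count / size * 1e6 + 1) for count, size in zip(row, sizes)]
    return sum(values) / len(values)


mean_expression = {gene: mean_log_cpm(row, library_sizes) for gene, row in counts.items()}

for gene, value in mean_expression.items():
    print(gene, round(value, 2))
```

Adding 1 before taking the logarithm keeps zero counts defined (they map to 0), which matters for data this sparse.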

  31. Figure: Rank of MAD from real data
    Simulations: Splat, Splat (Drop), Simple, Lun, Lun2, Lun2 (ZINB), scDD, BASiCS, mfa, PhenoPath, SparseDC, ZINB-WaVE
    Properties: Mean, Variance, Mean-Variance, Library size, % Zeros (Cell), % Zeros (Gene), Mean-Zeros
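
One way to read "rank of MAD from real data" is sketched below. The assumption here, which may differ from what the talk actually computed, is that values from each simulation are sorted and matched by rank against the sorted real values, the median absolute difference is taken, and simulations are then ranked from closest to furthest. The data and the names `sim_a`, `sim_b` and `sim_c` are made up for illustration and do not correspond to any model on the slide.

```python
from statistics import median


def mad_from_real(real, sim):
    """Median absolute difference between rank-matched (sorted) values."""
    if len(real) != len(sim):
        raise ValueError("datasets must contain the same number of values")
    return median(abs(s - r) for r, s in zip(sorted(real), sorted(sim)))


def rank_simulations(real, sims):
    """Return {name: rank}, where rank 1 has the smallest MAD from the real data."""
    mads = {name: mad_from_real(real, values) for name, values in sims.items()}
    ordered = sorted(mads, key=mads.get)
    return {name: position + 1 for position, name in enumerate(ordered)}


# Made-up per-gene mean expression values, for illustration only.
real = [0.5, 1.0, 2.0, 4.0, 8.0]
sims = {
    "sim_a": [0.6, 1.1, 2.1, 3.9, 8.2],
    "sim_b": [1.5, 2.0, 3.5, 6.0, 9.0],
    "sim_c": [0.5, 1.0, 2.5, 4.5, 8.5],
}

ranks = rank_simulations(real, sims)
print(ranks)
```

Repeating this for each property (mean, variance, library size and so on) would give one column of ranks per property, which is the layout the figure summarises.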

  32. Complex simulations
    Groups Batches Paths

  33. “Splatter: simulation of single-cell RNA sequencing data”
    Genome Biology (2017) DOI: 10.1186/s13059-017-1305-0

  34. 4 Clustering trees
    Image: M Uhlen et al. via The Cell Image Library
    http://www.cellimagelibrary.org/images/40483

  35. How many clusters?
    Low resolution
    (fewer clusters)
    High resolution
    (more clusters)

  36. A tree of clusters?

  39. Weighting edges
    In-proportion = (number of cells on edge) / (number of cells in higher-resolution cluster)
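
The edge weight defined on this slide can be computed directly from two sets of cluster labels. The sketch below is a minimal stand-alone illustration and is not the clustree implementation. The function name and the example labels are hypothetical.

```python
from collections import Counter


def in_proportions(low_res, high_res):
    """Map each (low cluster, high cluster) edge to its in-proportion.

    low_res and high_res hold each cell's cluster label at the lower and
    higher clustering resolution, in the same cell order.
    """
    if len(low_res) != len(high_res):
        raise ValueError("both label lists must cover the same cells")
    edge_sizes = Counter(zip(low_res, high_res))  # number of cells on each edge
    high_sizes = Counter(high_res)                # number of cells in each higher-res cluster
    return {edge: n / high_sizes[edge[1]] for edge, n in edge_sizes.items()}


# Hypothetical labels for eight cells at two resolutions.
low_res = ["A", "A", "A", "A", "B", "B", "B", "B"]
high_res = ["X", "X", "X", "Y", "Y", "Y", "Z", "Z"]

props = in_proportions(low_res, high_res)
for (low, high), value in sorted(props.items()):
    print(f"{low} -> {high}: {value:.2f}")
```

In this example cluster Y receives cells from both A and B, so its two incoming edges have in-proportions below 1 that sum to 1, while X and Z each have a single incoming edge with in-proportion 1.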

  42. Gene expression

  43. Cell cycle SC3 stability
    Number of genes

  45. Figure axes: t-SNE 1, t-SNE 2

  46. “Clustering trees: a visualisation for evaluating clusterings at multiple resolutions”
    GigaScience (2018) DOI: 10.1093/gigascience/giy083

  47. 5 Analysis
    Image: Natalie Prigozhina via The Cell Image Library
    http://www.cellimagelibrary.org/images/48101

  48. GATA3
    ECAD
    LTL
    WT1
    CD +
    DT +
    PT +
    Glo

  49. Dataset
    4 Organoids
    10x Chromium
    2 Batches (3 + 1)
    7937 cells (6649 + 1288)
    Identify cell types
    Analysis steps
    Step             Tool
    Alignment        CellRanger
    Quantification   CellRanger
    Quality control  scater
    Integration      Seurat
    Clustering       Seurat
    Gene detection   Seurat

  50. Stroma Endothelium
    Cell cycle Podocyte
    Epithelium

  51. Glial
    Neural progenitor
    Muscle progenitor

  52. Human dataset
    16 week fetal kidney
    3178 cells
    10x Chromium
    Lindström et al. “Conserved and Divergent Features of Mesenchymal Progenitor Cell Types within the Cortical Nephrogenic Niche of the Human and Mouse Kidney”
    J Am Soc Nephrol (2018) DOI: 10.1681/ASN.2017080890

  53. Figure panels: Fetal kidney; Organoid
    Axis: Number of cells (0 to 1250)
    Cluster labels: Stroma, Endothelium, Cell cycle, Nephron progenitor, Podocyte, Nephron, Glial, Immune, Blood, Neural progenitor

  54. “Single-cell analysis reveals congruence between kidney organoids and human fetal kidney”
    Genome Medicine (2019) DOI: 10.1186/s13073-019-0615-0

  55. What if we did things differently?

  56. Droplet selection
    ~ 1 million droplets

  58. Quality control
    Manual thresholds PCA based

  59. Gene selection
    Seurat

  60. Gene selection
    M3Drop

  61. Gene selection
    Overlap
    Seurat only
    M3Drop only
    Both
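
The three categories on this slide (Seurat only, M3Drop only, both) amount to a set comparison of two selected gene lists. A minimal sketch follows, using placeholder gene names rather than the genes actually selected in the analysis. The function name and the added Jaccard index are illustrative choices, not something shown in the talk.

```python
def selection_overlap(seurat_genes, m3drop_genes):
    """Split two gene selections into method-specific and shared sets."""
    seurat, m3drop = set(seurat_genes), set(m3drop_genes)
    shared = seurat & m3drop
    union = seurat | m3drop
    return {
        "seurat_only": seurat - m3drop,
        "m3drop_only": m3drop - seurat,
        "both": shared,
        # Jaccard index: shared genes as a fraction of all selected genes.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Placeholder gene names, for illustration only.
seurat_selected = ["gene1", "gene2", "gene3", "gene4"]
m3drop_selected = ["gene3", "gene4", "gene5"]

overlap = selection_overlap(seurat_selected, m3drop_selected)
print(sorted(overlap["seurat_only"]))
print(sorted(overlap["m3drop_only"]))
print(sorted(overlap["both"]))
print(overlap["jaccard"])
```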

  62. Comparison

  63. Marker genes

  64. Partition-based graph abstraction
    Cell graph PAGA cluster graph

  65. Cell velocity
    AAAAAAA
    Unspliced RNA
    Mature mRNA

  66. Podocyte
    Epithelial
    Endothelial
    Stroma
    Neural
    Muscle
    Immune?

  67. Summary
    New droplet selection methods can give many more cells
    Seurat clustering is robust to gene selection
    Possible immune-like population
    Alternative methods can help interpretation

  68. 6 Conclusion
    Image: Natalie Prigozhina via The Cell Image Library
    http://www.cellimagelibrary.org/images/48108

  69. What did I do?
    Build a database of scRNA-seq analysis tools and a website to interact with it
    Develop a software package for simulating scRNA-seq data and a flexible simulation model
    Design an algorithm for visualising clustering at multiple resolutions and a software package that implements it
    Perform an analysis of kidney organoid data to profile the cell types present and demonstrate the effect of different tools and decisions

  70. What next?
    Bigger datasets, more computation
    Convergence on methods that work
    Continued development of software
    Reference datasets
    Integration of data types
    Spatial transcriptomics

  71. Acknowledgments
    Supervisors
    Alicia Oshlack
    Melissa Little
    Committee
    Andrew Pask
    Christine Wells
    Edmund Crampin
    Everyone that makes their tools and data available
    MCRI Bioinformatics
    Belinda Phipson
    Breon Schmidt
    MCRI KDDR
    Alex Combes
    COMBINE
    Friends and family
    Developers
    dplyr, ggplot2, scater, scran, Seurat, workflowr, rmarkdown, knitr, tidygraph, ggraph, edgeR...

  72. install.packages("clustree")
    Paper: doi.org/10.1093/gigascience/giy083
    BiocManager::install("splatter")
    Paper: doi.org/10.1186/s13059-017-1305-0
    www.scRNA-tools.org
    Paper: doi.org/10.1371/journal.pcbi.1006245
    oshlacklab.com/combes-organoid-paper
    Paper: doi.org/10.1186/s13073-019-0615-0
    @_lazappi_
    oshlacklab.com
    github.com/lazappi
