Slide 1

Slide 1 text

Simulation and analysis tools for single-cell RNA sequencing data
Luke Zappia
@_lazappi_ #gi2017

Slide 2

Slide 2 text

Single-cell RNA-seq
[Figure: sequencing reads (ACTGACTCCA, TCAGTACTGA, CGTGTCATAG, GATTGACCTA) and a gene-by-cell matrix with genes A, B, C, D and four cell columns]

Slide 3

Slide 3 text

Svensson et al. arXiv . ,

Slide 4

Slide 4 text

[Figure: gene-by-cell matrix with genes A, B, C, D and four cell columns]
- Bad cell?
- Low expression?
- Cell type specific?
- Cell cycle?
- Dropout?

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

www. .org

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Simulations
Provide a truth to test against
BUT
- Often poorly documented and explained
- Not easily reproducible or reusable
- Don’t demonstrate similarity to real data

Slide 12

Slide 12 text

“Splatter: simulation of single-cell RNA sequencing data.” Genome Biology ( ) DOI: . /s - - -

Slide 13

Slide 13 text

Splat
- Negative binomial
- Expression outliers
- Defined library sizes
- Mean-variance trend
- Dropout
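The components listed on this slide can be sketched as a gamma-Poisson (negative binomial) sampler. This is a minimal illustration in Python using only the standard library, not Splatter's implementation: the function names, the distribution parameters, and the exact forms of the mean-variance trend and dropout curve are all assumptions chosen for readability.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(n_genes=200, n_cells=50, seed=1):
    """Return counts[gene][cell] from a toy Splat-like model (all values assumed)."""
    rng = random.Random(seed)
    # Gene means: gamma distributed (illustrative shape and scale).
    means = [rng.gammavariate(0.6, 5.0) for _ in range(n_genes)]
    # Expression outliers: inflate a small fraction of the gene means.
    means = [m * 10.0 if rng.random() < 0.05 else m for m in means]
    total = sum(means)
    # Defined library sizes: one log-normal expected total count per cell.
    lib_sizes = [rng.lognormvariate(math.log(3.0 * n_genes), 0.2)
                 for _ in range(n_cells)]
    midpoint, steepness = -1.0, 1.0  # assumed dropout curve parameters
    counts = []
    for m in means:
        row = []
        for lib in lib_sizes:
            # Scale the gene's share of expression to this cell's library size.
            mu = max(m / total * lib, 1e-8)
            # Mean-variance trend: variability (BCV) shrinks as the mean grows.
            bcv = 0.1 + 1.0 / math.sqrt(mu)
            shape = 1.0 / bcv ** 2
            cell_mean = rng.gammavariate(shape, mu / shape)
            # Dropout: logistic probability of a forced zero, higher at low means.
            p_drop = 1.0 / (1.0 + math.exp(steepness * (math.log(mu) - midpoint)))
            dropped = rng.random() < p_drop
            # Gamma-distributed mean plus Poisson sampling gives a negative binomial.
            row.append(0 if dropped else poisson(rng, cell_mean))
        counts.append(row)
    return counts
```

Calling `simulate_counts(n_genes=20, n_cells=10, seed=0)` returns a 20 x 10 nested list of non-negative integers, and the same seed reproduces the same matrix.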

Slide 14

Slide 14 text

Simulations
- Simple: negative binomial
- Lun: NB with cell factors. Lun ATL, Bach K, Marioni JC. Genome Biology ( ). DOI: . /s - - - .
- Lun 2: sampled NB with batch effects. Lun ATL, Marioni JC. Biostatistics ( ). DOI: . /biostatistics/kxw .
- scDD: NB with bimodality. Korthauer KD, et al. Genome Biology ( ). DOI: . /s - - -y.
- BASiCS: NB with spike-ins. Vallejos CA, Marioni JC, Richardson S. PLoS Comp. Bio. ( ). DOI: . /journal.pcbi. .

Slide 15

Slide 15 text

Using Splatter

1. Estimate
    params1 <- splatEstimate(real.data)
    params2 <- simpleEstimate(real.data)

2. Simulate
    sim1 <- splatSimulate(params1, ...)
    sim2 <- simpleSimulate(params2, ...)

3. Compare
    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)
    diff <- diffSCESets(datasets, ref = "Real")
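The compare step can be illustrated with a small standalone sketch in Python. It does not reproduce what compareSCESets or diffSCESets compute internally. It only shows one plausible way to summarise each dataset (log mean expression per gene, proportion of zeros per cell) and take rank-matched differences against a reference. All function names here are hypothetical, and counts are assumed to be nested lists indexed as counts[gene][cell].

```python
import math


def gene_means(counts):
    """log2(mean + 1) expression for each gene (row)."""
    return [math.log2(sum(row) / len(row) + 1) for row in counts]


def zeros_per_cell(counts):
    """Proportion of zero counts in each cell (column)."""
    n_genes, n_cells = len(counts), len(counts[0])
    return [sum(1 for g in range(n_genes) if counts[g][c] == 0) / n_genes
            for c in range(n_cells)]


def rank_diff(values, ref_values):
    """Sort both vectors and subtract the reference rank by rank."""
    if len(values) != len(ref_values):
        raise ValueError("datasets must have matching dimensions")
    return [v - r for v, r in zip(sorted(values), sorted(ref_values))]


def diff_to_reference(datasets, ref):
    """Compare every non-reference dataset to datasets[ref]."""
    ref_counts = datasets[ref]
    result = {}
    for name, counts in datasets.items():
        if name == ref:
            continue
        result[name] = {
            "mean_diff": rank_diff(gene_means(counts), gene_means(ref_counts)),
            "zeros_diff": rank_diff(zeros_per_cell(counts),
                                    zeros_per_cell(ref_counts)),
        }
    return result


# Toy 3-gene x 3-cell matrices, made up purely for illustration.
toy = {
    "Real": [[0, 2, 4], [1, 1, 1], [0, 0, 3]],
    "Sim": [[1, 2, 3], [0, 0, 0], [2, 2, 2]],
}
toy_diff = diff_to_reference(toy, ref="Real")
```

Differences near zero at every rank suggest the simulated distribution resembles the reference for that property. Sorting before subtracting is used because simulated genes and cells have no one-to-one correspondence with real ones.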

Slide 16

Slide 16 text

Real data
Tung et al. iPSCs, C capture
- HapMap individuals
- plates each
- random cells
Tung P-Y et al. Sci. Rep. ( ) DOI: . /srep
[Figure: panels labelled A A A A B B B B C C C C]

Slide 17

Slide 17 text

Means Difference in Means

Slide 18

Slide 18 text

Zeros per cell Difference in zeros

Slide 19

Slide 19 text

Mean-zeros Difference

Slide 20

Slide 20 text

Rank

Slide 21

Slide 21 text

Rank Full-length Full-length

Slide 22

Slide 22 text

Complex simulations
- Groups
- Batches
- Paths
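One way to picture these three simulation types is as transformations of a shared vector of gene means. The Python sketch below is an assumption-laden illustration, not Splatter's code: groups receive multiplicative factors on a subset of genes, batches reuse the same idea with a factor on every gene, and paths interpolate linearly between a start and an end state. The function names and default parameters are hypothetical.

```python
import random


def de_factors(rng, n_genes, de_prob=0.1, loc=0.1, scale=0.4):
    """Multiplicative factors: log-normal for selected genes, 1.0 otherwise."""
    factors = []
    for _ in range(n_genes):
        if rng.random() < de_prob:
            factor = rng.lognormvariate(loc, scale)
            # Half of the selected genes are down-regulated instead.
            if rng.random() < 0.5:
                factor = 1.0 / factor
            factors.append(factor)
        else:
            factors.append(1.0)
    return factors


def apply_factors(base_means, factors):
    """Gene means for one group or batch."""
    return [m * f for m, f in zip(base_means, factors)]


def path_means(start, end, n_steps):
    """Linearly interpolate gene means from start to end over n_steps."""
    steps = []
    for s in range(n_steps):
        t = s / (n_steps - 1) if n_steps > 1 else 0.0
        steps.append([a + t * (b - a) for a, b in zip(start, end)])
    return steps


rng = random.Random(42)
base = [1.0, 2.0, 4.0]
# Group: only some genes change. Batch: every gene gets a factor.
group_a = apply_factors(base, de_factors(rng, len(base), de_prob=0.5))
batch_1 = apply_factors(base, de_factors(rng, len(base), de_prob=1.0))
# Path: cells are placed at steps between two states.
trajectory = path_means(base, [3.0, 2.0, 0.0], n_steps=3)
```

Each resulting mean vector would then be passed to a count sampler to produce cells for that group, batch, or path step.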

Slide 23

Slide 23 text

New in Splatter . .
Bioconductor 3.6
- SingleCellExperiment
- Batch effects
- Simulations
  - BASiCS
  - mfa
  - PhenoPath
  - ZINB-WaVE

Slide 24

Slide 24 text

Summary
- Many tools for scRNA-seq analysis
- Catalogued in the scRNA-tools database
- Can be tested using synthetic datasets
- Splatter is our package for simulating scRNA-seq data

Slide 25

Slide 25 text

@_lazappi_
oshlacklab.com

Supervisors: Alicia Oshlack, Melissa Little
MCRI Bioinformatics
Belinda Phipson
Breon Schmidt
Everyone that makes tools and data available

www.scRNA-tools.org
@scRNAtools

“Splatter: simulation of single-cell RNA sequencing data.” Genome Biology ( ) DOI: . /s - - -
“Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database” bioRxiv ( ) DOI: . /
bioconductor.org/packages/splatter