Slide 1

Slide 1 text

Simplifying simulation of single-cell RNA-seq
Luke Zappia
@_lazappi_

Slide 2

Slide 2 text

What is single-cell?
Image: Matthew Daniels via The Cell Image Library, http://www.cellimagelibrary.org/images/38912

Slide 3

Slide 3 text

[Figure: example sequencing reads (ACTGACTCCA, TCAGTACTGA, CGTGTCATAG, GATTGACCTA) repeated across the diagram, with the labels BULK, SINGLE-CELL and BIOLOGY]

Slide 4

Slide 4 text

BULK

Gene  Sample 1
A     43
B     3
C     17
D     24

SINGLE-CELL

Gene  Cell 1  Cell 2  Cell 3  Cell 4
A     12      10      9       0
B     0       0       1       4
C     9       6       0       0
D     7       0       4       0

Slide 5

Slide 5 text

Analysis
Focus on clustering, lineage tracing
Currently > 75 available methods
goo.gl/4wcVwn
github.com/lazappi/single-cell-software

Slide 6

Slide 6 text

A new analysis method should...
1. Show that it can do what it claims
2. Show that it produces insight

Slide 7

Slide 7 text

Simulations
Image: M Uhlen et al. via The Cell Image Library, http://www.cellimagelibrary.org/images/40483

Slide 8

Slide 8 text

Gene  Cell 1  Cell 2  Cell 3  Cell 4
A
B
C
D

Slide 9

Slide 9 text

Simulations
Provide a known truth
Allow us to test…
● Effectiveness
● Assumptions
● Relative performance
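To make "known truth" concrete, here is a minimal base R sketch, not taken from the talk. It simulates two groups of cells whose labels are known, clusters them, and compares the clusters with the truth. Every parameter value (gene and cell numbers, fold change, dispersion) and the choice of k-means are assumptions picked for illustration.

```r
set.seed(1)

n_genes <- 200
n_cells <- 60

# Known truth: the first 30 cells belong to group 1, the rest to group 2
truth <- rep(c(1, 2), each = n_cells / 2)

# Baseline mean expression per gene (hypothetical gamma parameters)
base_means <- rgamma(n_genes, shape = 2, rate = 0.2)

# The first 20 genes are 4-fold higher in group 2; all other genes are unchanged
fold <- rep(1, n_genes)
fold[1:20] <- 4

# Draw negative binomial counts gene by gene (rows = genes, columns = cells)
counts <- t(sapply(seq_len(n_genes), function(g) {
  mu <- base_means[g] * ifelse(truth == 2, fold[g], 1)
  rnbinom(n_cells, mu = mu, size = 5)
}))

# Cluster cells on log-transformed counts and compare with the known labels
clusters <- kmeans(t(log1p(counts)), centers = 2, nstart = 10)$cluster
confusion <- table(truth = truth, cluster = clusters)
print(confusion)

# Cluster numbering is arbitrary, so score the better of the two label matchings
matched <- sum(diag(confusion))
accuracy <- max(matched, sum(confusion) - matched) / n_cells
```

Because the group labels are part of the simulation, the confusion table is a direct measure of how well the clustering method recovered them, which is not possible with real data.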

Slide 10

Slide 10 text

Current simulations
Often poorly documented and explained
Not easily reproducible or reusable
Don’t demonstrate similarity to real data

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Splatter
R package
Collection of simulation methods
Consistent, easy-to-use interface
Functions for comparison
github.com/Oshlack/splatter

Slide 13

Slide 13 text

The Splat simulation
Negative binomial
Expression outliers
Defined library sizes
Mean-variance trend
Dropout
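The components listed on this slide can be combined roughly as in the base R sketch below. This is not Splatter's implementation: the distributions, every parameter value and all variable names are assumptions chosen for illustration. The negative binomial is built as a gamma-Poisson mixture.

```r
set.seed(42)

n_genes <- 500
n_cells <- 50

# Gene means drawn from a gamma distribution (hypothetical parameters)
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Expression outliers: a small fraction of genes get a much larger mean
is_outlier <- rbinom(n_genes, 1, prob = 0.05) == 1
outlier_factor <- rlnorm(n_genes, meanlog = 4, sdlog = 0.5)
gene_means[is_outlier] <- median(gene_means) * outlier_factor[is_outlier]

# Defined library sizes: rescale so each cell's expected total is its library size
lib_sizes <- rlnorm(n_cells, meanlog = log(20000), sdlog = 0.3)
cell_means <- outer(gene_means / sum(gene_means), lib_sizes)  # genes x cells

# Mean-variance trend: coefficient of variation shrinks as the mean grows
bcv <- 0.1 + 1 / sqrt(cell_means)
shape <- 1 / bcv^2
noisy_means <- matrix(
  rgamma(n_genes * n_cells, shape = shape, scale = cell_means / shape),
  nrow = n_genes
)

# Poisson sampling around gamma-distributed means gives negative binomial counts
true_counts <- matrix(rpois(n_genes * n_cells, lambda = noisy_means), nrow = n_genes)

# Dropout: probability of an extra zero is a logistic function of the log mean,
# so low-expression entries are dropped more often (midpoint 0, slope -1 assumed)
drop_prob <- plogis(-1 * (log(cell_means) - 0))
keep <- rbinom(n_genes * n_cells, 1, prob = 1 - drop_prob)
counts <- true_counts * keep
```

Keeping `true_counts` alongside `counts` means the effect of dropout can be inspected directly, for example with `mean(counts == 0) - mean(true_counts == 0)`.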

Slide 14

Slide 14 text

Other simulations
Simple - Negative binomial
Lun - NB with cell factors
  Lun ATL, Bach K, Marioni JC. Genome Biology (2016). DOI: 10.1186/s13059-016-0947-7.
Lun 2 - Sampled NB with batch effects
  Lun ATL, Marioni JC. bioRxiv (2016). DOI: 10.1101/073973.
scDD - NB with bimodality
  Korthauer KD, et al. bioRxiv (2015). DOI: 10.1101/035501.

Slide 15

Slide 15 text

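Using Splatter

    library(splatter)

    # Estimate Splat parameters from the real data, then simulate
    params1 <- splatEstimate(real.data)
    sim1 <- splatSimulate(params1, ...)

    # Do the same with the Simple simulation
    params2 <- simpleEstimate(real.data)
    sim2 <- simpleSimulate(params2, ...)

    # Compare the real and simulated datasets
    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)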

Slide 16

Slide 16 text

The data
Image: Jan Schmoranzer via The Cell Image Library, http://www.cellimagelibrary.org/images/41467

Slide 17

Slide 17 text

Real data
Subset of data from study looking at design and batch effects by Tung et al.
Tung P-Y, et al. bioRxiv (2016). DOI: 10.1101/062919.
Single HapMap stem cell line
- 3 batches
- 221 cells, 13 058 genes

Slide 18

Slide 18 text

Means

Slide 19

Slide 19 text

Variance

Slide 20

Slide 20 text

Mean-variance

Slide 21

Slide 21 text

Library size

Slide 22

Slide 22 text

Zeros
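The quantities compared on these slides (means, variance, the mean-variance relationship, library size and zeros) can be computed from any count matrix. The base R sketch below is an illustration, not the comparison code in Splatter. The two matrices are random stand-ins for a real and a simulated dataset, and the function name `summarise_counts` is hypothetical.

```r
set.seed(7)

# Stand-in count matrices (genes x cells); in practice these would be the
# real and simulated datasets being compared
real_like <- matrix(rnbinom(100 * 40, mu = 5, size = 0.5), nrow = 100)
sim_like  <- matrix(rnbinom(100 * 40, mu = 5, size = 2), nrow = 100)

summarise_counts <- function(counts) {
  # Normalise by library size to counts per million, then log-transform
  lib_size <- colSums(counts)
  log_cpm <- log2(t(t(counts) / lib_size) * 1e6 + 1)
  list(
    gene_mean      = rowMeans(log_cpm),        # Means
    gene_var       = apply(log_cpm, 1, var),   # Variance
    lib_size       = lib_size,                 # Library size
    zeros_per_gene = rowMeans(counts == 0),    # Zeros, by gene
    zeros_per_cell = colMeans(counts == 0)     # Zeros, by cell
  )
}

stats <- lapply(list(Real = real_like, Simulated = sim_like), summarise_counts)

# One-number summaries per dataset; plotting gene_var against gene_mean
# would show the mean-variance relationship
comparison <- sapply(stats, function(s) c(
  median_gene_mean = median(s$gene_mean),
  median_gene_var  = median(s$gene_var),
  median_lib_size  = median(s$lib_size),
  mean_prop_zeros  = mean(s$zeros_per_gene)
))
print(round(comparison, 2))
```

Summaries like these only show whether the marginal distributions look alike. Full distributions (box plots or scatter plots per dataset) are more informative than the medians printed here.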

Slide 23

Slide 23 text

What else?
Image: Andres J Garcia and Ankur Singh via The Cell Image Library, http://www.cellimagelibrary.org/images/44701

Slide 24

Slide 24 text

Groups
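One common way to simulate groups is to give each group its own multiplicative differential expression factors on a shared set of gene means. The base R sketch below illustrates that idea. It is an assumption about the general approach rather than Splatter's code, and all parameter values and names are hypothetical.

```r
set.seed(10)

n_genes <- 300
cells_per_group <- c(Group1 = 25, Group2 = 25)

# Baseline gene means shared by all groups
base_means <- rgamma(n_genes, shape = 2, rate = 0.5)

# For each group, pick a random subset of genes and give them a log-normal
# factor; roughly half of the selected genes are down-regulated instead
de_factors <- sapply(names(cells_per_group), function(g) {
  is_de <- rbinom(n_genes, 1, prob = 0.1) == 1
  factors <- rep(1, n_genes)
  factors[is_de] <- rlnorm(sum(is_de), meanlog = 1, sdlog = 0.4)
  down <- is_de & (rbinom(n_genes, 1, prob = 0.5) == 1)
  factors[down] <- 1 / factors[down]
  factors
})

# Group-specific means (genes x groups)
group_means <- base_means * de_factors

# Assign cells to groups and draw negative binomial counts for each cell
group <- rep(names(cells_per_group), times = cells_per_group)
counts <- sapply(group, function(g) {
  rnbinom(n_genes, mu = group_means[, g], size = 5)
})
```

Genes with a factor of exactly 1 in every group are the known non-DE genes, so `de_factors` doubles as the ground truth for evaluating a differential expression or clustering method.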

Slide 25

Slide 25 text

Paths
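A path can be thought of as a gradual change from one expression state to another, with each cell sitting somewhere along it. The base R sketch below uses linear interpolation between a start and an end state. The linear form, the parameter values and the variable names are assumptions for illustration and are not taken from Splatter.

```r
set.seed(3)

n_genes <- 300
n_cells <- 40
n_steps <- 100

# Start state, and an end state that differs for a subset of genes
start_means <- rgamma(n_genes, shape = 2, rate = 0.5)
is_de <- rbinom(n_genes, 1, prob = 0.2) == 1
factors <- rep(1, n_genes)
factors[is_de] <- rlnorm(sum(is_de), meanlog = 1, sdlog = 0.4)
end_means <- start_means * factors

# Linear interpolation: column s holds the gene means at step s of the path
progress <- seq(0, 1, length.out = n_steps)
path_means <- outer(start_means, 1 - progress) + outer(end_means, progress)

# Place each cell at a random step and draw its counts from that step's means
cell_step <- sample(n_steps, n_cells, replace = TRUE)
counts <- sapply(cell_step, function(s) {
  rnbinom(n_genes, mu = path_means[, s], size = 5)
})
```

Here `cell_step` is the known ordering of cells along the path, which is the ground truth a lineage tracing or pseudotime method would be asked to recover.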

Slide 26

Slide 26 text

Summary
Single-cell RNA-seq is an exciting new technology
- Lots of analysis methods
Simulations can be used to evaluate methods
- But often hard to reuse
Splatter
- R package for simulation and comparison
Splat
- Simulation method for groups or paths

Slide 27

Slide 27 text

Acknowledgements
Alicia Oshlack
Melissa Little
Belinda Phipson
MCRI Bioinformatics
oshlacklab.com

Slide 28

Slide 28 text

github.com/Oshlack/splatter
github.com/lazappi/single-cell-software
@_lazappi_
oshlacklab.com

Slide 29

Slide 29 text

Solution?
Image: Wellcome Images via The Cell Image Library, http://www.cellimagelibrary.org/images/38804

Slide 30

Slide 30 text

Library size

Slide 31

Slide 31 text

Means

Slide 32

Slide 32 text

Negative binomial