Simplifying simulation of single-cell RNA-seq

Luke Zappia (@_lazappi_)

What is single-cell?
Image: Matthew Daniels via The Cell Image Library, http://www.cellimagelibrary.org/images/38912

[Figure: repeated sequence reads (ACTGACTCCA TCAGTACTGA CGTGTCATAG GATTGACCTA). Labels: BULK, SINGLE-CELL, BIOLOGY]

BULK

Gene  Sample 1
A     43
B     3
C     17
D     24

SINGLE-CELL

Gene  Cell 1  Cell 2  Cell 3  Cell 4
A     12      10      9       0
B     0       0       1       4
C     9       6       0       0
D     7       0       4       0
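As a small illustration of how the single-cell table differs from the bulk column, the sketch below enters the slide's single-cell counts in base R and measures how many entries are zero. The variable names are chosen here for illustration and are not from the talk.

```r
# Counts copied from the single-cell table on the slide (genes x cells).
counts <- matrix(
  c(12, 10, 9, 0,
     0,  0, 1, 4,
     9,  6, 0, 0,
     7,  0, 4, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), paste("Cell", 1:4))
)

# Proportion of zero counts for each gene and across the whole matrix.
zeros_per_gene <- rowMeans(counts == 0)
zeros_overall <- mean(counts == 0)

# Summing across cells gives one value per gene, the same shape as a bulk
# sample. These totals are not expected to equal the bulk column on the slide.
pseudo_bulk <- rowSums(counts)

print(zeros_per_gene)
print(zeros_overall)
print(pseudo_bulk)
```

In this toy table 7 of the 16 entries are zero, which is the kind of sparsity a per-cell count matrix shows and a single bulk column hides.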

Analysis
Focus on clustering, lineage tracing
Currently > 75 available methods
goo.gl/4wcVwn
github.com/lazappi/single-cell-software

A new analysis method should...
1. Show that it can do what it claims
2. Show that it produces insight

Simulations
Image: M Uhlen et al. via The Cell Image Library, http://www.cellimagelibrary.org/images/40483

Gene  Cell 1  Cell 2  Cell 3  Cell 4
A
B
C
D

Simulations
Provide a known truth
Allow us to test…
• Effectiveness
• Assumptions
• Relative performance
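To make "known truth" concrete, here is a minimal base R sketch, not taken from the talk: two groups of cells are simulated with a known set of differentially expressed genes, a simple test is run, and the calls are scored against that truth. All parameter values, the 4-fold change, and the choice of a Wilcoxon test are assumptions made for illustration.

```r
set.seed(1)
n_genes <- 200
n_cells <- 50                      # cells per group (hypothetical)

# Known truth: the first 20 genes differ between groups by a 4-fold change.
is_de <- seq_len(n_genes) <= 20
base_mean <- rgamma(n_genes, shape = 2, rate = 0.2)
fold <- ifelse(is_de, 4, 1)

# Negative binomial counts; 'mu' is recycled so each row keeps its gene mean.
simulate_group <- function(mu) {
  matrix(rnbinom(n_genes * n_cells, mu = mu, size = 2), nrow = n_genes)
}
group1 <- simulate_group(base_mean)
group2 <- simulate_group(base_mean * fold)

# Method under evaluation: a per-gene Wilcoxon rank-sum test with BH correction.
p_values <- vapply(seq_len(n_genes), function(g) {
  wilcox.test(group1[g, ], group2[g, ], exact = FALSE)$p.value
}, numeric(1))
called <- !is.na(p_values) & p.adjust(p_values, method = "BH") < 0.05

# Because the truth is known, the calls can be scored directly.
tpr <- mean(called[is_de])         # share of true DE genes recovered
fpr <- mean(called[!is_de])        # share of non-DE genes called by mistake
print(c(tpr = tpr, fpr = fpr))
```

The same scoring could be repeated for several methods on the same simulated data to compare relative performance.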

Current simulations
Often poorly documented and explained
Not easily reproducible or reusable
Don’t demonstrate similarity to real data

Splatter
R package
Collection of simulation methods
Consistent, easy-to-use interface
Functions for comparison
github.com/Oshlack/splatter

The Splat simulation
Negative binomial
Expression outliers
Defined library sizes
Mean-variance trend
Dropout
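The components listed above can be strung together in a few lines of base R. The sketch below is only an illustration of how such pieces might fit together. It is not Splatter's implementation, and every distribution choice and parameter value in it is an assumption.

```r
set.seed(42)
n_genes <- 100
n_cells <- 20
n <- n_genes * n_cells

# Gene means drawn from a gamma distribution (hypothetical shape and rate).
gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Expression outliers: a few genes get the median mean times a large factor.
is_outlier <- rbinom(n_genes, 1, prob = 0.05) == 1
outlier_factor <- rlnorm(n_genes, meanlog = 4, sdlog = 0.5)
gene_means[is_outlier] <- median(gene_means) * outlier_factor[is_outlier]

# Defined library sizes: each cell's expected total equals its library size.
lib_sizes <- rlnorm(n_cells, meanlog = 11, sdlog = 0.2)
cell_means <- outer(gene_means / sum(gene_means), lib_sizes)

# Mean-variance trend: the coefficient of variation shrinks as the mean grows.
bcv <- 0.1 + 1 / sqrt(cell_means)

# Negative binomial counts, written as a gamma-Poisson mixture.
lambda <- rgamma(n, shape = 1 / bcv^2, scale = cell_means * bcv^2)
true_counts <- matrix(rpois(n, lambda), nrow = n_genes)

# Dropout: lower means are more likely to be zeroed (logistic curve).
drop_prob <- plogis(-(log(cell_means) - 0))
keep <- matrix(rbinom(n, 1, prob = 1 - drop_prob), nrow = n_genes)
counts <- true_counts * keep

print(dim(counts))
print(mean(counts == 0))
```

Keeping `true_counts` alongside `counts` preserves the pre-dropout values, so the effect of the dropout step can be inspected separately.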

Other simulations
Simple - Negative binomial
Lun - NB with cell factors
Lun ATL, Bach K, Marioni JC. Genome Biology (2016). DOI: 10.1186/s13059-016-0947-7.
Lun 2 - Sampled NB with batch effects
Lun ATL, Marioni JC. bioRxiv (2016). DOI: 10.1101/073973.
scDD - NB with bimodality
Korthauer KD, et al. bioRxiv (2015). DOI: 10.1101/035501.
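The difference between a plain negative binomial simulation and one with cell factors can be shown in a short base R sketch. This is a loose illustration of the two ideas, not the code from the cited papers or from Splatter, and the parameter values are assumptions.

```r
set.seed(7)
n_genes <- 50
n_cells <- 10
gene_means <- rgamma(n_genes, shape = 2, rate = 0.5)

# Plain negative binomial: every cell shares the same mean for a given gene.
simple_counts <- matrix(
  rnbinom(n_genes * n_cells, mu = gene_means, size = 1),
  nrow = n_genes
)

# With cell factors: each cell scales all gene means by its own factor.
cell_factors <- 2^rnorm(n_cells, sd = 0.5)
mu <- outer(gene_means, cell_factors)      # genes x cells matrix of means
factor_counts <- matrix(
  rnbinom(n_genes * n_cells, mu = mu, size = 1),
  nrow = n_genes
)

# Cell totals usually vary more once cell factors are included.
print(colSums(simple_counts))
print(colSums(factor_counts))
```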

Using Splatter

    params1 <- splatEstimate(real.data)
    sim1 <- splatSimulate(params1, ...)

    params2 <- simpleEstimate(real.data)
    sim2 <- simpleSimulate(params2, ...)

    datasets <- list(Real = real.data, Splat = sim1, Simple = sim2)
    comp <- compareSCESets(datasets)
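The comparison step can be pictured without the package. The base R sketch below summarises each dataset in a named list and lines the summaries up side by side. It is a rough analogue written for illustration and does not reproduce what `compareSCESets` does. The helper `summarise_counts` and the two stand-in matrices are hypothetical.

```r
# Per-cell and per-gene summaries for a genes x cells count matrix.
summarise_counts <- function(counts) {
  list(
    lib_size  = colSums(counts),
    gene_mean = rowMeans(counts),
    gene_var  = apply(counts, 1, var),
    prop_zero = rowMeans(counts == 0)
  )
}

set.seed(3)
# Stand-ins for a real dataset and a simulation (both simulated here).
datasets <- list(
  A = matrix(rnbinom(500, mu = 5, size = 0.5), nrow = 50),
  B = matrix(rnbinom(500, mu = 5, size = 5), nrow = 50)
)

summaries <- lapply(datasets, summarise_counts)

# One row per dataset, one column per summary statistic.
comparison <- t(sapply(summaries, function(s) {
  c(median_lib_size = median(s$lib_size),
    mean_gene_var   = mean(s$gene_var),
    mean_prop_zero  = mean(s$prop_zero))
}))
print(comparison)
```

A simulation that resembles the real data should produce rows that look alike. Here the two stand-ins share a mean but differ in dispersion, so their variance and zero proportions diverge.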

The data
Image: Jan Schmoranzer via The Cell Image Library, http://www.cellimagelibrary.org/images/41467

Real data
Subset of data from a study of design and batch effects by Tung et al.
Tung P-Y, et al. bioRxiv (2016). DOI: 10.1101/062919.
- Single HapMap stem cell line
- 3 batches
- 221 cells, 13 058 genes

Variance

Mean-variance

Library size

What else?
Image: Andres J Garcia and Ankur Singh via The Cell Image Library, http://www.cellimagelibrary.org/images/44701

Groups

Summary
Single-cell RNA-seq is an exciting new technology
- Lots of analysis methods
Simulations can be used to evaluate methods
- But often hard to reuse
Splatter - R package for simulation and comparison
Splat - Simulation method for groups or paths

Acknowledgements
Alicia Oshlack
Melissa Little
Belinda Phipson
MCRI Bioinformatics
oshalacklab.com

github.com/Oshlack/splatter
github.com/lazappi/single-cell-software
@_lazappi_
oshalacklab.com

Solution?
Image: Wellcome Images via The Cell Image Library, http://www.cellimagelibrary.org/images/38804

Library size

Negative binomial
