Slide 1

Slide 1 text

Annotation-agnostic differential expression analysis
Leonardo Collado-Torres (@fellgernon)

Slide 2

Slide 2 text

Motivating problem: identify and validate regions of the genome that change expression during brain development

Slide 3

Slide 3 text

Measuring gene expression: RNA-seq
Figure: genome (DNA) → RNA transcripts (many possible variants) → RNA-seq reads. Adapted from @jtleek

Slide 4

Slide 4 text

Challenges in counting: http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
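A minimal sketch of the "union" overlap mode described in the linked HTSeq documentation, to show why counting is not trivial: a read touching exons of two genes is ambiguous and is not counted for either. The function name and the position-to-gene lookup are hypothetical; this is not HTSeq's API, and only the read's aligned positions are considered.

```python
from typing import Dict, List, Set


def union_mode_assign(read_positions: List[int],
                      features_at: Dict[int, Set[str]]) -> str:
    """Assign one read to a gene following htseq-count's "union" mode logic.

    read_positions: genomic positions covered by the read's aligned bases.
    features_at: hypothetical lookup from a position to the gene IDs whose
    exons cover it (positions without features may be missing).
    """
    # Union of the feature sets seen at every base of the read.
    union: Set[str] = set()
    for pos in read_positions:
        union |= features_at.get(pos, set())
    if not union:
        return "__no_feature"   # read falls outside every feature
    if len(union) > 1:
        return "__ambiguous"    # read touches more than one gene
    return next(iter(union))    # exactly one gene: count it


# Example: geneA's exon covers positions 0-9, geneB's exon covers 8-15.
features: Dict[int, Set[str]] = {}
for p in range(0, 10):
    features.setdefault(p, set()).add("geneA")
for p in range(8, 16):
    features.setdefault(p, set()).add("geneB")

print(union_mode_assign(list(range(2, 7)), features))    # geneA
print(union_mode_assign(list(range(7, 12)), features))   # __ambiguous
print(union_mode_assign(list(range(20, 26)), features))  # __no_feature
```

A read that overlaps the region shared by two genes is lost from both counts, which is one reason an annotation-agnostic approach is attractive.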

Slide 5

Slide 5 text

Annotation variation Frazee et al, Biostatistics, 2014

Slide 6

Slide 6 text

DER finder approach
• Find contiguous base pairs with differential expression (DE) signal → DE regions, or DERs
• Find the nearest annotated feature
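Both steps can be sketched in Python, assuming a per-base statistic has already been computed and that annotated features are given as half-open (start, end, name) intervals. The function names are hypothetical and this is not the derfinder API; strand and ties between equally close features are ignored.

```python
from typing import List, Optional, Tuple

Region = Tuple[int, int]          # half-open (start, end)
Feature = Tuple[int, int, str]    # half-open (start, end, name)


def find_regions(stat: List[float], cutoff: float) -> List[Region]:
    """Contiguous runs of bases whose statistic is above the cutoff."""
    regions: List[Region] = []
    start: Optional[int] = None
    for i, value in enumerate(stat):
        if value > cutoff and start is None:
            start = i                        # a run begins
        elif value <= cutoff and start is not None:
            regions.append((start, i))       # the run ended at the previous base
            start = None
    if start is not None:                    # run reaching the last base
        regions.append((start, len(stat)))
    return regions


def gap(region: Region, feature: Feature) -> int:
    """Number of bases between a region and a feature (0 if they overlap)."""
    (rs, re_), (fs, fe, _) = region, feature
    if fe <= rs:
        return rs - fe
    if fs >= re_:
        return fs - re_
    return 0


def nearest_feature(region: Region, features: List[Feature]) -> str:
    """Name of the annotated feature closest to the region."""
    return min(features, key=lambda f: gap(region, f))[2]


stat = [1, 5, 7, 2, 0, 9, 9, 9, 1]
ders = find_regions(stat, cutoff=4)
print(ders)                                           # [(1, 3), (5, 8)]
features = [(0, 2, "exonA"), (10, 20, "exonB")]
print([nearest_feature(d, features) for d in ders])   # ['exonA', 'exonB']
```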

Slide 7

Slide 7 text

Read coverage
Figure: reads on the genome (DNA) are summarized as a coverage vector, e.g. 2 6 0 11 6. Adapted from @jtleek
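A coverage vector can be computed from aligned reads with a difference array. This minimal sketch assumes ungapped reads given as 0-based, half-open (start, end) intervals; a spliced alignment would contribute one interval per aligned block.

```python
from typing import List, Tuple


def coverage_vector(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Number of reads covering each base of a region of the given length.

    reads: ungapped alignments as 0-based, half-open (start, end) intervals.
    """
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1   # coverage goes up where a read starts
        diff[end] -= 1     # and down right after it ends
    coverage, running = [], 0
    for i in range(length):
        running += diff[i]
        coverage.append(running)
    return coverage


print(coverage_vector([(0, 3), (1, 4), (1, 2)], length=5))  # [1, 3, 2, 1, 0]
```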

Slide 8

Slide 8 text

Jaffe et al, Nat. Neuroscience, 2015

Slide 9

Slide 9 text

Single-base F-statistics
• Null model
• Alternative model
• F-statistic
Indices: i = base pair, j = sample. Collado-Torres et al, bioRxiv, 2015
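The slide's formulas did not survive extraction. As a hedged reconstruction, assume the standard F-statistic comparing nested linear models fit separately at each base i across n samples j. Here y_ij is the (transformed) coverage, the null design X^(0) has p_0 columns, the alternative design X^(1) has p_1 > p_0 columns and contains the null columns, and ŷ^(m)_ij is the fitted value under model m:

```latex
\begin{aligned}
y_{ij} &= \sum_{k=1}^{p_0} X^{(0)}_{jk}\,\beta_{ik} + \varepsilon_{ij} && \text{(null model)} \\
y_{ij} &= \sum_{k=1}^{p_1} X^{(1)}_{jk}\,\gamma_{ik} + \varepsilon_{ij} && \text{(alternative model)} \\
\mathrm{RSS}^{(m)}_{i} &= \sum_{j=1}^{n} \bigl(y_{ij} - \hat{y}^{(m)}_{ij}\bigr)^2, \qquad m \in \{0, 1\} \\
F_i &= \frac{\bigl(\mathrm{RSS}^{(0)}_{i} - \mathrm{RSS}^{(1)}_{i}\bigr) / (p_1 - p_0)}{\mathrm{RSS}^{(1)}_{i} / (n - p_1)}
\end{aligned}
```

Large F_i means the extra terms of the alternative model explain much more of the variation at base i than the null model does.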

Slide 10

Slide 10 text

Single-base F-statistics Collado-Torres et al, bioRxiv, 2015 BrainSpan data
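A vectorized sketch of single-base F-statistics with NumPy, fitting the null and alternative models to every base at once by least squares. The input is assumed to be already transformed (for example a log2 transform with an offset; the exact transform is not given on these slides), and the function name is hypothetical.

```python
import numpy as np


def single_base_fstats(y: np.ndarray, x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """F-statistic for every base (row of y).

    y:  (bases, samples) matrix of transformed coverage.
    x0: (samples, p0) null design matrix.
    x1: (samples, p1) alternative design matrix containing the null columns.
    """
    n = y.shape[1]
    p0, p1 = x0.shape[1], x1.shape[1]

    def rss(design: np.ndarray) -> np.ndarray:
        # Least-squares fit of all bases at once (one column of y.T per base).
        coef, *_ = np.linalg.lstsq(design, y.T, rcond=None)
        resid = y.T - design @ coef
        return (resid ** 2).sum(axis=0)

    rss0, rss1 = rss(x0), rss(x1)
    return ((rss0 - rss1) / (p1 - p0)) / (rss1 / (n - p1))


# Two groups of three samples; base 0 differs between groups, base 1 does not.
group = np.array([0, 0, 0, 1, 1, 1])
x0 = np.ones((6, 1))                          # intercept only
x1 = np.column_stack([np.ones(6), group])     # intercept + group effect
y = np.array([[1.0, 2, 3, 7, 8, 9],
              [1.0, 2, 3, 1, 2, 3]])
print(single_base_fstats(y, x0, x1))          # approximately [54, 0]
```

Solving all bases in one `lstsq` call reuses the same design matrix, which matters when there are hundreds of millions of bases.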

Slide 11

Slide 11 text

Compare DERs vs annotation Collado-Torres et al, bioRxiv, 2015 BrainSpan data

Slide 12

Slide 12 text

Input data: coverage matrix with n samples (columns) and one row per base. Keep rows where at least 1 sample has coverage > 5 → ~348 million nt (11.24% of the genome). Adapted from @jtleek
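The row filter can be sketched with NumPy, assuming the coverage matrix has one row per base and one column per sample. The cutoff of 5 is from the slide; the function name is hypothetical.

```python
import numpy as np


def filter_coverage(cov: np.ndarray, cutoff: float = 5.0):
    """Keep bases (rows) where at least one sample has coverage > cutoff.

    cov: (bases, samples) coverage matrix.
    Returns the filtered matrix and the boolean mask of kept rows.
    """
    keep = (cov > cutoff).any(axis=1)
    return cov[keep], keep


cov = np.array([[0, 0, 1],
                [6, 0, 0],
                [2, 9, 3],
                [5, 5, 5]])   # 5 is not > 5, so this row is dropped
filtered, keep = filter_coverage(cov)
print(keep, keep.mean())      # [False  True  True False] 0.5
```

`keep.mean()` is the fraction of bases retained, the quantity reported on the slide as 11.24%.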

Slide 13

Slide 13 text

Finding DERs via expressed regions

Slide 14

Slide 14 text

Simulation: similar in power, yet allows new discoveries

Slide 15

Slide 15 text

Identifying brain development DERs
Discovery data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Models: Null, Alt
Cutoff: 20.509 (corresponds to p-value 10^-8)
Details:
• Rank DERs by area
• 1000 permutations
• Control FWER (≤ 5%) by max area per permutation
Results: 63,135 DERs
Jaffe et al, Nat. Neuroscience, 2015
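A minimal sketch of the area-based permutation step, assuming per-base statistics have already been recomputed for each permutation (for example by shuffling the sample labels and recomputing the F-statistics). Each DER's FWER-adjusted p-value is the fraction of permutations whose largest region area is at least the DER's area. The cutoff 20.509 is from the slide; the function names and the toy data are hypothetical, and this is not the derfinder implementation.

```python
import numpy as np
from typing import Iterable, Tuple


def region_areas(stat: np.ndarray, cutoff: float) -> np.ndarray:
    """Area (sum of the statistic) of each run of bases with stat > cutoff."""
    areas, current, in_run = [], 0.0, False
    for value in stat:
        if value > cutoff:
            current += float(value)
            in_run = True
        elif in_run:
            areas.append(current)
            current, in_run = 0.0, False
    if in_run:
        areas.append(current)
    return np.array(areas)


def fwer_pvalues(observed_stat: np.ndarray,
                 null_stats: Iterable[np.ndarray],
                 cutoff: float) -> Tuple[np.ndarray, np.ndarray]:
    """Observed region areas and their FWER-adjusted p-values.

    Only the largest region area of each permutation is kept (0 if none).
    """
    observed = region_areas(observed_stat, cutoff)
    max_null = np.array([region_areas(s, cutoff).max(initial=0.0) for s in null_stats])
    pvals = np.array([(max_null >= area).mean() for area in observed])
    return observed, pvals


cutoff = 20.509
observed_stat = np.array([0, 30, 30, 0, 25, 0])
null_stats = [np.array([0, 22, 0, 0, 0, 0]),     # 3 toy permutations
              np.zeros(6),
              np.array([21, 21, 0, 0, 0, 0])]
areas, pvals = fwer_pvalues(observed_stat, null_stats, cutoff)
significant = areas[pvals <= 0.05]               # FWER <= 5%
print(areas, pvals, significant)
```

Comparing each observed area against the per-permutation maximum, rather than against all null regions, is what makes the adjustment control the family-wise error rate.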

Slide 16

Slide 16 text

Replicating DERs
Replication data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Models: Null, Alt
Details: per sample and per DER, calculate average expression
Cutoff: single F-statistic per DER, p-value < 0.05
Results: 50,650 DERs replicated
Jaffe et al, Nat. Neuroscience, 2015
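A sketch of the replication step, assuming a base-by-sample coverage matrix and DERs given as half-open (start, end) intervals: average each DER per sample, then compute one F-statistic and p-value per DER using SciPy's F distribution. The function names, the toy data, and the absence of any transformation are assumptions.

```python
import numpy as np
from scipy.stats import f as f_dist


def der_means(cov: np.ndarray, ders) -> np.ndarray:
    """Average coverage per DER (rows) and sample (columns).

    cov: (bases, samples) coverage; ders: half-open (start, end) intervals.
    """
    return np.array([cov[start:end].mean(axis=0) for start, end in ders])


def der_pvalues(y: np.ndarray, x0: np.ndarray, x1: np.ndarray) -> np.ndarray:
    """p-value of one F-statistic per DER (row of y)."""
    n = y.shape[1]
    df1, df2 = x1.shape[1] - x0.shape[1], n - x1.shape[1]

    def rss(design: np.ndarray) -> np.ndarray:
        coef, *_ = np.linalg.lstsq(design, y.T, rcond=None)
        return ((y.T - design @ coef) ** 2).sum(axis=0)

    rss0, rss1 = rss(x0), rss(x1)
    fstat = ((rss0 - rss1) / df1) / (rss1 / df2)
    return f_dist.sf(fstat, df1, df2)


# Toy replication data: 6 samples in 2 groups, 2 DERs of 2 bases each.
cov = np.array([[1.0, 2, 3, 7, 8, 9],
                [1.0, 2, 3, 7, 8, 9],
                [1.0, 2, 3, 1, 2, 3],
                [1.0, 2, 3, 1, 2, 3]])
ders = [(0, 2), (2, 4)]
group = np.array([0, 0, 0, 1, 1, 1])
x0 = np.ones((6, 1))
x1 = np.column_stack([np.ones(6), group])
pvals = der_pvalues(der_means(cov, ders), x0, x1)
replicated = pvals < 0.05
print(pvals, replicated.sum())   # first DER replicates, second does not
```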

Slide 17

Slide 17 text

Jaffe et al, Nat. Neuroscience, 2015

Slide 18

Slide 18 text

Widespread differential expression of novel transcriptional activity Jaffe et al, Nat. Neuroscience, 2015

Slide 19

Slide 19 text

DERs validate: Cytosolic vs total mRNA fractions Jaffe et al, Nat. Neuroscience, 2015

Slide 20

Slide 20 text

BrainSpan data
Coverage data from BrainSpan: http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/
Samples per brain region: CBC: 28, MD: 24, STR: 28, AMY: 31, HIP: 32, DFC: 34, VFC: 30, MFC: 32, OFC: 30, M1C: 25, S1C: 26, IPC: 33, A1C: 30, STC: 35, ITC: 33, V1C: 33
Total N samples: 487

Slide 21

Slide 21 text

Age-associated DERs lack regional specificity in the human brain BrainSpan data Jaffe et al, Nat. Neuroscience, 2015

Slide 22

Slide 22 text

Expression changes across development may represent a changing neuronal phenotype
Figure: proportion of cells. Jaffe et al, Nat. Neuroscience, 2015. Estimation method: Houseman et al, BMC Bioinformatics, 2012
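The slide cites Houseman et al. for estimating cell proportions. As a simplified stand-in, not Houseman's constrained projection (which solves a quadratic program with non-negativity and sum-to-at-most-one constraints), non-negative least squares followed by rescaling to sum to 1 illustrates the idea of estimating a bulk sample's composition from reference cell-type profiles. The function name and toy reference are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_proportions(reference: np.ndarray, sample: np.ndarray) -> np.ndarray:
    """Rough cell-type proportions for one sample.

    reference: (features, cell_types) mean expression of each cell type.
    sample: (features,) expression of one bulk sample.
    Simplified stand-in for Houseman et al.: non-negative least squares,
    then rescaled so the proportions sum to 1.
    """
    weights, _residual_norm = nnls(reference, sample)
    total = weights.sum()
    return weights / total if total > 0 else weights


reference = np.array([[1.0, 0.0],     # feature only expressed in cell type 1
                      [0.0, 1.0],     # feature only expressed in cell type 2
                      [1.0, 1.0]])    # feature expressed in both
bulk = 0.3 * reference[:, 0] + 0.7 * reference[:, 1]
print(estimate_proportions(reference, bulk))   # approximately [0.3, 0.7]
```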

Slide 23

Slide 23 text

LIBD Human DLPFC Development •  UCSC “Track Hub” Jaffe et al, Nat. Neuroscience, 2015

Slide 24

Slide 24 text

GTEx: expressed regions
• Data: 3 tissues, 12 samples each
• Align with
• Identify expressed regions with derfinder
  – Adjust coverage to 40 million reads
  – Find expressed regions (cutoff 5)
  – Discard ERs < 9 bp
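The expressed-region steps can be sketched with NumPy: scale each sample's coverage to a 40 million read library, average across samples, keep runs of bases above the cutoff of 5, and drop runs shorter than 9 bp. This is an assumption-laden sketch, not derfinder's implementation; the function name, the use of the mean across samples, and the strict > comparison are assumptions.

```python
import numpy as np


def expressed_regions(cov: np.ndarray, mapped_reads, target: float = 40e6,
                      cutoff: float = 5.0, min_width: int = 9):
    """Expressed regions (ERs) as half-open (start, end) base intervals.

    cov: (bases, samples) raw coverage for one chromosome.
    mapped_reads: total mapped reads per sample, used to scale coverage
    as if every library had `target` reads.
    """
    scaled = cov * (target / np.asarray(mapped_reads, dtype=float))
    mean_cov = scaled.mean(axis=1)              # mean across samples per base
    above = mean_cov > cutoff
    # Run boundaries: +1 where a run starts, -1 just after it ends.
    padded = np.concatenate(([0], above.astype(int), [0]))
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_width]


cov = np.zeros((20, 2))
cov[2:12] = [10, 20]    # 10 bp region
cov[15:18] = [10, 20]   # 3 bp region, shorter than 9 bp
# Sample 2 has twice the reads of sample 1, so both scale to 10.
print(expressed_regions(cov, mapped_reads=[40e6, 80e6]))   # [(2, 12)]
```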

Slide 25

Slide 25 text

Presence of intronic ERs
• 221,246 ERs
  – 160,817 strictly exonic (73%)
  – 26,740 exonic + intronic (12%)
  – 22,375 strictly intronic (10%)
• Can strictly intronic ERs differentiate tissues?
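A sketch of how ERs could be classified against an annotation, assuming exons and introns are given as half-open intervals on one chromosome and that a base in both an exon and an intron counts as exonic. The three categories follow the slide; the "other" label (for example intergenic or partly intergenic ERs) and all names are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Interval = Tuple[int, int]   # half-open (start, end)


def to_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Set of positions covered by any of the intervals."""
    return {p for start, end in intervals for p in range(start, end)}


def classify_er(er: Interval, exon_bases: Set[int], intron_bases: Set[int]) -> str:
    """Label an ER by how its bases overlap exons and introns."""
    bases = range(er[0], er[1])
    n_exon = sum(p in exon_bases for p in bases)
    # Bases in both an exon and an intron are counted as exonic.
    n_intron = sum(p in intron_bases and p not in exon_bases for p in bases)
    if n_exon == len(bases):
        return "strictly exonic"
    if n_intron == len(bases):
        return "strictly intronic"
    if n_exon > 0 and n_intron > 0:
        return "exonic + intronic"
    return "other"


exon_bases = to_bases([(0, 10), (20, 30)])
intron_bases = to_bases([(10, 20)])
ers: List[Interval] = [(2, 8), (12, 18), (8, 12), (35, 40)]
labels = [classify_er(er, exon_bases, intron_bases) for er in ers]
counts = Counter(labels)
for label, count in counts.items():
    print(f"{label}: {count} ({100 * count / len(ers):.0f}%)")
```

Base-level sets keep the sketch short; for genome-scale data an interval tree or sorted-interval overlap would be the practical choice.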

Slide 26

Slide 26 text

PCs differentiate tissues

Slide 27

Slide 27 text

PCs differentiate tissues
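A minimal PCA sketch with NumPy, assuming an ER-by-sample coverage matrix (for example, only the strictly intronic ERs) and a log2(coverage + 1) transform chosen here for illustration. Samples from different tissues can then be compared on the leading PC scores. The function name and toy data are hypothetical.

```python
import numpy as np


def pca_scores(coverage: np.ndarray, n_components: int = 2):
    """PC scores per sample and fraction of variance explained.

    coverage: (regions, samples) matrix, e.g. coverage of strictly intronic ERs.
    """
    x = np.log2(coverage + 1).T            # samples x regions, log2(coverage + 1)
    x = x - x.mean(axis=0)                 # center each region
    u, s, _vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Toy data: samples 0-2 (tissue 1) express region 0, samples 3-5 (tissue 2) region 1.
coverage = np.array([[15, 15, 15, 0, 0, 0],
                     [0, 0, 0, 15, 15, 15],
                     [3, 3, 3, 3, 3, 3]], dtype=float)
scores, explained = pca_scores(coverage)
print(np.round(scores[:, 0], 2), np.round(explained, 2))   # PC1 splits the tissues
```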

Slide 28

Slide 28 text

Differential intronic ERs adjusting for exonic ERs
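One way to test whether an intronic ER differs between tissues beyond what exonic expression explains is a nested F-test: the null model has an intercept and an exonic covariate, and the alternative adds tissue indicators. This sketch assumes a hypothetical pairing of each intronic ER with a per-sample exonic summary (for example, exonic ERs of the same gene). It illustrates the idea and is not necessarily the analysis behind these slides.

```python
import numpy as np
from scipy.stats import f as f_dist


def adjusted_tissue_test(intronic: np.ndarray, exonic: np.ndarray, tissue: np.ndarray):
    """F-test for tissue differences in one intronic ER, adjusting for exonic signal.

    intronic: (samples,) log2 coverage of the intronic ER.
    exonic:   (samples,) hypothetical paired exonic summary for the same samples.
    tissue:   (samples,) tissue labels.
    """
    n = intronic.shape[0]
    levels = np.unique(tissue)
    # One indicator column per tissue except the first (reference) level.
    dummies = np.column_stack([(tissue == lv).astype(float) for lv in levels[1:]])
    x0 = np.column_stack([np.ones(n), exonic])   # null: intercept + exonic
    x1 = np.column_stack([x0, dummies])          # alternative: + tissue

    def rss(design: np.ndarray) -> float:
        coef, *_ = np.linalg.lstsq(design, intronic, rcond=None)
        return float(((intronic - design @ coef) ** 2).sum())

    df1, df2 = x1.shape[1] - x0.shape[1], n - x1.shape[1]
    rss0, rss1 = rss(x0), rss(x1)
    fstat = ((rss0 - rss1) / df1) / (rss1 / df2)
    return fstat, float(f_dist.sf(fstat, df1, df2))


tissue = np.repeat([0, 1, 2], 4)
exonic = np.tile([1.0, 2.0, 3.0, 4.0], 3)
noise = np.tile([0.1, -0.1, 0.05, -0.05], 3)
shifted = exonic + np.repeat([0.0, 3.0, 6.0], 4) + noise   # tissue effect beyond exonic
explained_only = 2 * exonic + noise                        # no extra tissue effect
print(adjusted_tissue_test(shifted, exonic, tissue))
print(adjusted_tissue_test(explained_only, exonic, tissue))
```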

Slide 29

Slide 29 text

Differential intronic ERs | exonic ERs

Slide 30

Slide 30 text

Differential intronic ERs | exonic ERs

Slide 31

Slide 31 text

regionReport: Collado-Torres et al, F1000Research, 2015

Slide 32

Slide 32 text

Motivating problem: identify and validate regions of the genome that change expression during brain development
1. derfinder permits discovery of novel expressed regions
2. We identified & validated gene expression changes in the developing brain
3. We have developed tools for reproducible/shareable reporting

Slide 33

Slide 33 text

Acknowledgements
Hopkins: Jeffrey Leek, Alyssa Frazee, Abhinav Nellore, Ben Langmead
LIBD: Andrew Jaffe, Jooheon Shin, Nikolay Ivanov, Amy Deep, Ran Tao, Yankai Jia, Thomas Hyde, Joel Kleinman, Daniel Weinberger
Harvard: Rafael Irizarry, Michael Love
Funding: NIH, LIBD, CONACyT México

Slide 34

Slide 34 text

References + software + code
• Collado-Torres L, et al. bioRxiv (2015) doi:10.1101/015370
  – http://bioconductor.org/packages/derfinder
• Collado-Torres L, et al. F1000Research (2015) doi:10.12688/f1000research.6379.1
  – http://www.bioconductor.org/packages/regionReport
  – http://lcolladotor.github.io/regionReportSupp/
• Nellore A, et al. bioRxiv (2015) doi:10.1101/019067
  – rail.bio
• Jaffe AE, et al. Nat. Neurosci. (2015) doi:10.1038/nn.3898
  – https://github.com/lcolladotor/libd_n36
  – https://github.com/lcolladotor/enrichedRanges
• Frazee AC, et al. Biostatistics (2014) doi:10.1093/biostatistics/kxt053
  – https://github.com/leekgroup/derfinder