Slide 1

Slide 1 text

Developmental regulation of human cortex transcription at base-pair resolution
Leonardo Collado-Torres
blog: bit.ly/FellBit
tweet: @fellgernon #IS3B2014

Slide 2

Slide 2 text

Goal of my work
Goal: What are the real transcriptional differences between biological conditions or over time?
Problems
•  Annotation may be incomplete
•  Assembly with short reads is challenging
•  Counting is harder than it looks

Slide 3

Slide 3 text

Example biological questions
What are the differences in transcription:
1.  between cocaine and alcohol addicts in the human hippocampus?
2.  in blood in a natural timecourse for a single individual?
3.  at multiple developmental stages?
4.  in the dorsolateral prefrontal cortex over the lifespan?
Jaffe*, Shin, Collado-Torres, Leek et al, In review, 2014; Zhou et al, PNAS, 2011; Chen et al, Cell, 2012; Xie et al, Cell, 2013

Slide 4

Slide 4 text

Transcription Alberts et al. Molecular Biology of the Cell, 4th edition 2002 Fig 6-21

Slide 5

Slide 5 text

Mapping reads from mRNAs
Trapnell et al. Nat. Biotech 2009

Slide 6

Slide 6 text

Summarize into features or assemble
Trapnell et al. Nat. Biotech 2009

Slide 7

Slide 7 text

Annotation variation
Frazee et al. Biostatistics 2014

Slide 8

Slide 8 text

Challenges in counting
http://www-huber.embl.de/users/anders/HTSeq/doc/count.html

Slide 9

Slide 9 text

Data
n samples → ~348 million nt (11.24%): rows with at least 1 sample with coverage > 5
Adapted from @jtleek
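
The row filter on this slide can be sketched in a few lines. This is a minimal illustration, assuming the coverage is held as one row per base and one column per sample; the function name `filter_rows` and the toy matrix are hypothetical and not taken from derfinder.

```python
from typing import List, Sequence


def filter_rows(coverage: Sequence[Sequence[float]], cutoff: float = 5.0) -> List[int]:
    """Return the indices of rows (bases) where at least one sample has coverage > cutoff."""
    return [i for i, row in enumerate(coverage) if any(v > cutoff for v in row)]


# Toy coverage matrix (hypothetical values): 4 bases x 3 samples.
coverage = [
    [0, 0, 1],   # dropped: no sample above 5
    [6, 2, 0],   # kept: first sample has coverage 6
    [5, 5, 5],   # dropped: the filter is strict (> 5, not >= 5)
    [0, 9, 12],  # kept
]

kept = filter_rows(coverage)
fraction_kept = len(kept) / len(coverage)
print(kept, fraction_kept)
```

Filtering first keeps only the bases with some evidence of expression, which is what shrinks the problem to the ~348 million nt quoted above.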

Slide 10

Slide 10 text

derfinder: annotation-agnostic
RNA-seq data from Xie et al, Cell, 2013

Slide 11

Slide 11 text

Compare differentially expressed regions (DERs) vs annotation
RNA-seq data from Xie et al, Cell, 2013

Slide 12

Slide 12 text

Statistical model
•  Null model
•  Alternative model
•  F-statistic
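
The model formulas did not survive in the slide text, so the following is only a sketch of how a nested-model F-statistic is computed at a single base. It assumes the simplest case: an intercept-only null model and an alternative model that adds a group factor (for example, age group). The real models may include further covariates; `f_statistic` and the toy values are hypothetical.

```python
from typing import Hashable, Sequence


def f_statistic(values: Sequence[float], groups: Sequence[Hashable]) -> float:
    """F-statistic comparing an intercept-only null model to a group-means alternative.

    F = ((RSS0 - RSS1) / (df0 - df1)) / (RSS1 / df1),
    with df0 = n - 1 and df1 = n - k for k groups.
    Raises ZeroDivisionError if the alternative model fits perfectly (RSS1 == 0).
    """
    n = len(values)
    grand_mean = sum(values) / n
    rss0 = sum((v - grand_mean) ** 2 for v in values)  # null: one overall mean

    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)

    rss1 = 0.0  # alternative: one mean per group
    for vals in by_group.values():
        group_mean = sum(vals) / len(vals)
        rss1 += sum((v - group_mean) ** 2 for v in vals)

    df0 = n - 1
    df1 = n - len(by_group)
    return ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)


# Toy coverage at one base for 6 samples in 2 groups (hypothetical values).
f = f_statistic([1, 2, 3, 4, 5, 6], ["fetal"] * 3 + ["adult"] * 3)
print(f)
```

Repeating this at every filtered base gives one F-statistic per base, which is the vector the region-finding step works on.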

Slide 13

Slide 13 text

DER p-values
Permute the model matrices and find null regions for all chromosomes.
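
Both the observed and the permuted analyses need a step that turns per-base statistics into regions. A minimal sketch follows, assuming a region is a contiguous run of bases whose statistic exceeds a cutoff and that its area is the sum of the statistics inside it. The slides rank DERs by area but do not define it, so that definition and the name `find_regions` are assumptions.

```python
from typing import List, Sequence, Tuple


def find_regions(stats: Sequence[float], cutoff: float) -> List[Tuple[int, int, float]]:
    """Return (start, end, area) for each contiguous run of stats above cutoff.

    start and end are 0-based inclusive positions; area is the sum of the
    statistics within the run (an assumed definition).
    """
    regions = []
    start = None
    for i, s in enumerate(stats):
        if s > cutoff:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            regions.append((start, i - 1, sum(stats[start:i])))
            start = None
    if start is not None:  # close a run that reaches the end of the vector
        regions.append((start, len(stats) - 1, sum(stats[start:])))
    return regions


# Toy per-base F-statistics (hypothetical), using the cutoff 20.509 quoted later in the talk.
stats = [1, 25, 30, 2, 22, 21, 40]
regions = find_regions(stats, cutoff=20.509)
print(regions)
```

Applying the same function to statistics recomputed after permuting the model matrices yields the null regions whose areas are compared against the observed ones.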

Slide 14

Slide 14 text

derfinder: fast BAM to results
•  Avoid Input/Output
•  Work by chromosome
•  Reduce memory
•  Use F-stats
•  Use multiple cores

Step                Hrs     Mem GB
Raw coverage        1.1     10.3
Merge               0.5     39.7
Filtered coverage   1.2     9.5
Models              0.2     34.2
Analysis            41.2    140.3
Merge results       0.1     8.6
Report              2.7     42

36 samples, 1000 permutations → 47 hrs
Mean: 102 million mapped reads (Sd: 53.5, Min: 20.9, Max: 284.9)
Data from Jaffe et al, In review, 2014

Slide 15

Slide 15 text

Counting & region finding are complementary
RNA-seq data from Zhou et al, PNAS, 2011

Slide 16

Slide 16 text

Lessons learned
•  Balancing memory + speed + disk usage is challenging
•  Base-pair resolution DE analysis is doable
   –  487 samples: 61 GB & 9 days with 1000 permutations and 20 cores, for 15,880,729,865 F-stats (6.37% of chr1 * 1001)
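
The F-statistic count can be checked with simple arithmetic. The sketch below assumes that 1001 means one observed analysis plus 1000 permutations, and that chr1 has the hg19 length of 249,250,621 bp; the slide states neither, so both are labeled assumptions.

```python
TOTAL_F_STATS = 15_880_729_865  # from the slide
ANALYSES = 1001                 # assumption: 1 observed + 1000 permutations
CHR1_LENGTH = 249_250_621       # assumption: hg19 chr1 length in bp

# Number of chr1 bases that passed filtering, if every analysis uses the same bases.
bases, remainder = divmod(TOTAL_F_STATS, ANALYSES)

# Share of chr1 covered by those bases, as a percentage.
percent_of_chr1 = 100 * bases / CHR1_LENGTH

print(bases, remainder, round(percent_of_chr1, 2))
```

Under these assumptions the total divides evenly by 1001, and the per-analysis base count is consistent with the 6.37% of chr1 quoted on the slide.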

Slide 17

Slide 17 text

What’s next for derfinder?
1.  Convert to parametric tests
2.  Build base-resolution models for artifacts:
    –  RNA quality, cell composition, batch effects
3.  Improve annotation of DERs
4.  Make available via Bioconductor
https://blogs.warwick.ac.uk/nichols/entry/spm5_gem_6
Nichols and Holmes, Human Brain Mapping, 2001

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

LIBD Postmortem Brain Collection
•  Clinically characterized postmortem human brains from >1300 individuals from DC/VA/MD Medical Examiners Offices
•  Non-psychiatric controls from across the lifespan (fetal through aged) and individuals with brain disorders (schizophrenia, bipolar, major depression)
•  Generating genomic data from brain regions of interest: genotypes, gene expression microarrays, RNA-seq, DNA methylation, etc.

Slide 20

Slide 20 text

Background
•  Human brain transcriptome changes dramatically across development and aging (Colantuoni 2011, Kang 2011)
•  Previous approaches relied on microarray technologies → pre-defined probe sequences that capture only a limited proportion of transcriptome diversity

Slide 21

Slide 21 text

Background
•  Existing published RNA-seq-based characterizations of brain development have used gene- and/or exon-level count-based summarizations (www.brainspan.org)
•  Feature-based read counts cannot reliably identify novel transcriptional activity
•  Transcript assembly from short reads and counting are both hard

Slide 22

Slide 22 text

Data
Discovery data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
•  Gender balanced
•  Similar for other covariates such as RNA Integrity Number (RIN)
Jaffe et al, In review, 2014

Slide 23

Slide 23 text

Data
Discovery data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Replication data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36). Independent samples!
Jaffe et al, In review, 2014

Slide 24

Slide 24 text

Data
Discovery data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Replication data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Validation data:
–  Total mRNA: Fetal, Adult (3 / group, N = 6)
–  Cytosolic fraction: Fetal, Adult (3 / group, N = 6)
N individuals sequenced: 36 + 36 + 6 = 78
N samples: 36 + 36 + 6 * 2 = 84
Jaffe et al, In review, 2014

Slide 25

Slide 25 text

Identifying DERs
Discovery data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Models: Null, Alt
Cutoff: 20.509 (corresponds to p-value 10^-8)
Details
•  Rank DERs by area
•  1000 permutations
•  Control FWER (≤ 5%) by max area per permutation
Results: 63,135 DERs
Jaffe et al, In review, 2014
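
Controlling the FWER by the maximum area per permutation can be sketched as follows. This is a minimal illustration, assuming each permutation contributes the single largest null-region area and that a DER's adjusted p-value is the fraction of permutations whose maximum is at least as large as the DER's area. The function name and toy numbers are hypothetical, and some implementations add 1 to numerator and denominator.

```python
from typing import List, Sequence


def fwer_adjusted_pvalues(
    observed_areas: Sequence[float], null_max_areas: Sequence[float]
) -> List[float]:
    """For each observed area, the share of permutations whose maximum null area is >= it."""
    n_perm = len(null_max_areas)
    return [
        sum(1 for m in null_max_areas if m >= area) / n_perm
        for area in observed_areas
    ]


# Toy inputs (hypothetical): 3 observed DER areas and the max area from 10 permutations.
observed_areas = [500, 120, 40]
null_max_areas = [30, 60, 45, 130, 20, 55, 35, 25, 50, 41]

adjusted = fwer_adjusted_pvalues(observed_areas, null_max_areas)
significant = [a for a, p in zip(observed_areas, adjusted) if p <= 0.05]
print(adjusted, significant)
```

Comparing against the per-permutation maximum, rather than against all null areas, is what makes the threshold a family-wise one: a DER passes only if it is rarely beaten by the best null region anywhere in the genome.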

Slide 26

Slide 26 text

Replicating DERs
Replication data: Fetal, Infant, Child, Teen, Adult, 50+ (6 / group, N = 36)
Models: Null, Alt
Cutoff: single F-statistic per DER, p-value < 0.05
Details
•  Per sample and per DER, calculate average expression
•  Use the 36 numbers to calculate the F-statistic
Results: 50,650 DERs replicated
Jaffe et al, In review, 2014
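
The first replication step, averaging expression per sample within a DER, might look like the sketch below. It assumes per-base coverage vectors keyed by sample and 0-based inclusive region coordinates; `mean_coverage` and the sample names are hypothetical.

```python
from typing import Dict, Sequence


def mean_coverage(
    coverage_by_sample: Dict[str, Sequence[float]], start: int, end: int
) -> Dict[str, float]:
    """Average per-base coverage within [start, end] (0-based, inclusive) for each sample."""
    width = end - start + 1
    return {
        sample: sum(cov[start:end + 1]) / width
        for sample, cov in coverage_by_sample.items()
    }


# Toy coverage for two replication samples over 5 bases (hypothetical values).
coverage_by_sample = {
    "fetal_1": [0, 4, 6, 8, 0],
    "adult_1": [0, 1, 1, 1, 0],
}

# One DER spanning bases 1 to 3.
means = mean_coverage(coverage_by_sample, start=1, end=3)
print(means)
```

With 36 replication samples this gives 36 numbers per DER, which feed a single F-statistic for that DER instead of one per base.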

Slide 27

Slide 27 text

Non-replicated DERs characteristics
•  Narrower:
   –  83.0 bp vs 170.3 bp, p < 10^-100
•  Smaller areas:
   –  2633.9 vs 7034.9, p < 10^-100 (mean expected diff: 1790)
   –  therefore lower ranks
•  Lower coverage:
   –  6.6 reads vs 108.7 reads, p < 10^-100
Jaffe et al, In review, 2014

Slide 28

Slide 28 text

Identification of extensive transcriptional changes across brain development
•  Majority of the DERs have the highest expression levels in fetal life (81.7%)
•  Overlap genes enriched for neurogenesis, signaling, development; genes involved in brain development, e.g. SOX11, DCX, GAT1, NRGN, CAMK2A, CNTNAP1
Jaffe et al, In review, 2014

Slide 29

Slide 29 text

Jaffe et al, In review, 2014

Slide 30

Slide 30 text

Widespread differential expression of novel transcriptional activity Jaffe et al, In review, 2014

Slide 31

Slide 31 text

DERs validate: Cytosolic vs total mRNA fractions
Developmental regulation of potentially unspliced mRNA in the cytosolic fraction of the human frontal cortex
Jaffe et al, In review, 2014

Slide 32

Slide 32 text

Confirmation in BrainSpan Data
•  ~40 individuals across the lifespan in many brain regions (www.brainspan.org)
   –  Gene/exon counts
   –  Coverage-level data
•  Downloaded and processed RNA-seq data from ~500 samples in 16 brain regions (11 from neocortex), extracting coverage levels within the DERs
Coverage data from BrainSpan: http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/

Slide 33

Slide 33 text

Age-associated DERs lack regional specificity in the human brain Jaffe et al, In review, 2014

Slide 34

Slide 34 text

Age-associated DERs lack regional specificity in the human brain Jaffe et al, In review, 2014

Slide 35

Slide 35 text

Age-associated DERs are conserved in the developing mouse cortex
Mouse cerebral cortex, comparing E17 (N=4) to adult (N=3) C57BL/6 mice
Data from Dillman 2013
Jaffe et al, In review, 2014

Slide 36

Slide 36 text

Age-associated DERs are expressed in other cell and tissue types
•  Downloaded and reprocessed RNA-seq data from stem cell and somatic tissue (Illumina BodyMap data)
•  Majority of the DERs had on average > 5 reads in at least one stem cell (86.4%) or tissue (84.0%) type
•  53.3% of all DERs, and 26.5% of non-exonic DERs, were expressed in all five stem cell conditions
Jaffe et al, In review, 2014

Slide 37

Slide 37 text

Age-associated DERs are expressed in other cell and tissue types
Figure panels: Postnatal Brain, Fetal Brain, Stem Cell, Tissues
Jaffe et al, In review, 2014

Slide 38

Slide 38 text

Age-associated DERs are expressed in other cell and tissue types Jaffe et al, In review, 2014

Slide 39

Slide 39 text

Expression changes across development represent a changing neuronal phenotype
•  Utilized DNA methylation data from:
   –  flow-sorted cortex (Guintivano 2013)
   –  and stem cell developmental system (Kim 2014)
•  Performed composition estimation using recently published approaches for Illumina 450k (Houseman et al 2012, Jaffe and Irizarry 2014)
Jaffe et al, In review, 2014

Slide 40

Slide 40 text

Expression changes across development represent a changing neuronal phenotype
Figure axis: Proportion of Cells
Jaffe et al, In review, 2014

Slide 41

Slide 41 text

LIBD Human DLPFC Development
•  UCSC “Track Hub”
Jaffe et al, In review, 2014

Slide 42

Slide 42 text

Analysis summary
•  Found DERs associated with age and development
•  Identified DERs that replicate
•  Validated DERs (cytosol vs total mRNA)
•  Confirmed with BrainSpan data
•  Identified DERs conserved in mouse
•  DERs expressed in other tissues (BodyMap data)
•  Estimated cell composition from DNA methylation

Slide 43

Slide 43 text

Discussion
•  Highlights conserved molecular signatures of transcriptional dynamics across brain development
•  Annotation of the human brain transcriptome is incomplete
•  Differences in expression occurring across birth may be driven principally by changing neuronal phenotypes, rather than by the rise of non-neuronal cell types

Slide 44

Slide 44 text

Discussion
•  Future biological experiments may better characterize the functional roles of these DERs, particularly intronic and intergenic regions
•  Data will soon be publicly available

Slide 45

Slide 45 text

Acknowledgements
Leek Group: Jeffrey Leek, Alyssa Frazee
Hopkins: Sarven Sabunciyan, Ben Langmead
LIBD: Andrew Jaffe, Jooheon Shin, Nikolay Ivanov, Amy Deep, Ran Tao, Yankai Jia, Thomas Hyde, Joel Kleinman, Daniel Weinberger
Harvard: Rafael Irizarry
Funding: NIH, LIBD, CONACyT México