is3b2014

Leonardo Collado-Torres

August 04, 2014

Transcript

  1. Developmental regulation of human cortex transcription at base-pair resolution

    Leonardo Collado-Torres blog: bit.ly/FellBit tweet: @fellgernon #IS3B2014
  2. Goal of my work

    Goal: What are the real transcriptional difference(s) between biological conditions or over time?
    Problems:
    •  Annotation may be incomplete
    •  Assembly with short reads is challenging
    •  Counting is harder than it looks
  3. Example biological questions

    What are the differences in transcription:
    1.  between cocaine and alcohol addicts in the human hippocampus?
    2.  in blood in a natural timecourse for a single individual?
    3.  at multiple developmental stages?
    4.  in the dorsolateral prefrontal cortex over lifespan?
    Jaffe*, Shin, Collado-Torres, Leek et al, In review, 2014; Zhou et al, PNAS, 2011; Chen et al, Cell, 2012; Xie et al, Cell, 2013
  4. Data

    n samples → ~348 million nt. 11.24%: rows with at least 1 sample with coverage > 5. Adapted from @jtleek
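The filter named on this slide (keep a row if at least one sample has coverage > 5) can be sketched as follows. This is a minimal illustration in plain Python, not derfinder code (derfinder is an R package); the toy matrix, the `filter_rows` helper and the variable names are hypothetical.

```python
# Hypothetical toy coverage matrix: one row per base, one column per sample.
coverage = [
    [0, 0, 1],
    [6, 2, 0],
    [3, 5, 5],
    [0, 12, 7],
]
CUTOFF = 5  # from the slide: keep rows where some sample has coverage > 5


def filter_rows(matrix, cutoff):
    """Return (row_index, row) pairs where at least one sample exceeds cutoff."""
    return [(i, row) for i, row in enumerate(matrix) if max(row) > cutoff]


kept = filter_rows(coverage, CUTOFF)
kept_indices = [i for i, _ in kept]
fraction_kept = len(kept) / len(coverage)
print(kept_indices, fraction_kept)
```

Note that the comparison is strict, so the third toy row (maximum coverage exactly 5) is dropped.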
  5. derfinder: fast BAM to results

    •  Avoid Input/Output
    •  Work by chromosome
    •  Reduce memory
    •  Use F-stats
    •  Use multiple cores

    Step                 Hrs    Mem GB
    Raw coverage         1.1     10.3
    Merge                0.5     39.7
    Filtered coverage    1.2      9.5
    Models               0.2     34.2
    Analysis            41.2    140.3
    Merge results        0.1      8.6
    Report               2.7     42

    36 samples, 1000 permutations → 47 hrs
    Mean: 102 million mapped reads; Sd: 53.5, Min: 20.9, Max: 284.9
    Data from Jaffe et al, In review, 2014
  6. Lessons learned

    •  Balancing memory + speed + disk usage is challenging
    •  Base-pair resolution DE analysis doable
       –  487 samples: 61 GB & 9 days with 1000 permutations, 20 cores for 15,880,729,865 F-stats (6.37% chr1 * 1001)
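The F-statistic count on this slide can be checked with a little arithmetic: one observed run plus 1000 permutations gives 1001 passes over the filtered bases. The sketch below assumes the hg19 length of chr1 (249,250,621 bp); the slide does not name the genome build, so that constant is an assumption.

```python
TOTAL_F_STATS = 15_880_729_865  # from the slide
N_PERMUTATIONS = 1000           # from the slide
# Assumption: hg19 chr1 length; the slide does not state the genome build.
CHR1_LENGTH_HG19 = 249_250_621

runs = N_PERMUTATIONS + 1  # observed data + permutations
bases_per_run, remainder = divmod(TOTAL_F_STATS, runs)
percent_of_chr1 = 100 * bases_per_run / CHR1_LENGTH_HG19

print(bases_per_run, remainder, round(percent_of_chr1, 2))
```

Under that assumption the total divides evenly by 1001, and the per-run base count matches the 6.37% of chr1 quoted on the slide.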
  7. What’s next for derfinder?

    1.  Convert to parametric tests
    2.  Build base-resolution models for artifacts:
        –  RNA quality, cell composition, batch effects
    3.  Improve annotation of DERs
    4.  Make available via Bioconductor
    https://blogs.warwick.ac.uk/nichols/entry/spm5_gem_6
    Nichols and Holmes, Human Brain Mapping, 2001
  8. LIBD Postmortem Brain Collection

    •  Clinically characterized postmortem human brains from >1300 individuals from DC/VA/MD Medical Examiners Offices
    •  Non-psychiatric controls from across the lifespan (fetal through aged) and individuals with brain disorders (schizophrenia, bipolar, major depression)
    •  Generating genomic data from brain regions of interest: genotypes, gene expression microarrays, RNA-seq, DNA methylation, etc
  9. Background

    •  Human brain transcriptome changes dramatically across development and aging (Colantuoni 2011, Kang 2011)
    •  Previous approaches relied on microarray technologies → pre-defined probe sequences that capture only a limited proportion of transcriptome diversity
  10. Background

    •  Existing published RNA-seq-based characterizations of brain development have utilized gene- and/or exon-level count-based summarizations (www.brainspan.org)
    •  Feature-based read counts lack the ability to reliably identify novel transcriptional activity
    •  Transcript assembly using short reads and counting are both hard
  11. Data

    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36
    •  Gender balanced
    •  Similar on other covariates, such as RNA Integrity Number (RIN)
    Jaffe et al, In review, 2014
  12. Data

    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36
    Replication data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36. Independent samples!
    Jaffe et al, In review, 2014
  13. Data

    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36
    Replication data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36
    Validation data:
    –  Total mRNA: Fetal, Adult; 3 / group, N = 6
    –  Cytosolic fraction: Fetal, Adult; 3 / group, N = 6
    N individuals sequenced: 36 + 36 + 6 = 78
    N samples: 36 + 36 + 6 * 2 = 84
    Jaffe et al, In review, 2014
  14. Identifying DERs

    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36
    Models: Null: Alt:
    Cutoff: 20.509, corresponds to p-value 10^-8
    Details:
    •  Rank DERs by area
    •  1000 permutations
    •  Control FWER (≤ 5%) by max area per permutation
    Results: 63,135 DERs
    Jaffe et al, In review, 2014
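The steps on this slide (threshold the base-level F-statistics, rank candidate DERs by area, and control the FWER with the maximum area from each permutation) can be sketched in plain Python. This is an illustration under stated assumptions, not derfinder's implementation: the function names and toy tracks are hypothetical, the permuted F-statistic tracks are taken as given, and "area" is assumed to be the sum of the F-statistics inside a region, which the slide does not define.

```python
CUTOFF = 20.509  # F-statistic cutoff from the slide


def find_regions(f_stats, cutoff):
    """Return (start, end, area) for each run of consecutive bases with F > cutoff.

    `end` is exclusive. Assumption: area = sum of the F-statistics in the run.
    """
    regions, start = [], None
    for pos, f in enumerate(f_stats):
        if f > cutoff:
            if start is None:
                start = pos
        elif start is not None:
            regions.append((start, pos, sum(f_stats[start:pos])))
            start = None
    if start is not None:
        regions.append((start, len(f_stats), sum(f_stats[start:])))
    return regions


def max_area(f_stats, cutoff):
    """Largest region area in one track (0.0 if nothing passes the cutoff)."""
    return max((area for _, _, area in find_regions(f_stats, cutoff)), default=0.0)


def fwer_adjusted(observed, null_tracks, cutoff):
    """Rank observed regions by area; attach the fraction of permutations
    whose maximum area is at least as large (an FWER-adjusted p-value)."""
    null_max = [max_area(track, cutoff) for track in null_tracks]
    ranked = sorted(find_regions(observed, cutoff), key=lambda r: r[2], reverse=True)
    return [(s, e, a, sum(m >= a for m in null_max) / len(null_max))
            for s, e, a in ranked]


# Hypothetical toy tracks: one observed and four "permuted" F-statistic vectors.
observed = [1.0, 25.0, 30.0, 2.0, 22.0, 3.0]
null_tracks = [
    [1.0, 21.0, 2.0, 3.0, 4.0, 5.0],
    [0.0, 0.0, 24.0, 23.0, 0.0, 0.0],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [30.0, 1.0, 1.0, 1.0, 1.0, 1.0],
]
result = fwer_adjusted(observed, null_tracks, CUTOFF)
print(result)
```

Comparing every observed region against the per-permutation maximum, rather than against all null regions, is what makes the resulting p-value a family-wise one.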
  15. Replicating DERs

    Replication data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36
    Models: Null: Alt:
    Cutoff: single F-statistic per DER, p-value < 0.05
    Details:
    •  Per sample and per DER, calculate average expression
    •  Use the 36 numbers to calculate an F-statistic
    Results: 50,650 DERs replicated
    Jaffe et al, In review, 2014
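The replication step above (average the coverage inside each DER per sample, then compute one F-statistic from those per-sample numbers) might look like the following sketch. The model formulas are not in the transcript, so the sketch assumes a one-way comparison: the alternative model fits one mean per age group and the null model fits a single overall mean. Function names and the toy data (6 samples, 2 groups) are hypothetical; the slide uses 36 samples.

```python
def mean(xs):
    return sum(xs) / len(xs)


def region_mean_coverage(sample_coverage, start, end):
    """Average coverage of one sample inside a DER (end exclusive)."""
    return mean(sample_coverage[start:end])


def one_way_f(values, groups):
    """F-statistic comparing a per-group-mean model against a single-mean model."""
    grand = mean(values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values())
    ss_within = sum((v - mean(vs)) ** 2 for vs in by_group.values() for v in vs)
    df_between = len(by_group) - 1
    df_within = len(values) - len(by_group)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical toy data: per-base coverage for 6 samples; one DER at bases [1, 3).
samples = [
    [0, 10, 12, 0], [0, 9, 11, 0], [0, 11, 9, 0],  # fetal
    [0, 2, 4, 0], [0, 1, 3, 0], [0, 3, 5, 0],      # adult
]
groups = ["fetal"] * 3 + ["adult"] * 3
der_start, der_end = 1, 3

values = [region_mean_coverage(s, der_start, der_end) for s in samples]
f_stat = one_way_f(values, groups)
print(values, f_stat)
```

Turning the F-statistic into the p-value < 0.05 decision on the slide would additionally need the F distribution with the matching degrees of freedom, which is left out here to keep the sketch dependency-free.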
  16. Non-replicated DERs characteristics

    •  Narrower:
       –  83.0 bp vs 170.3 bp, p < 10^-100
    •  Smaller areas:
       –  2633.9 vs 7034.9, p < 10^-100; mean expected diff: 1790
       –  therefore lower ranks
    •  Lower coverage:
       –  6.6 reads vs 108.7 reads, p < 10^-100
    Jaffe et al, In review, 2014
  17. Identification of extensive transcriptional changes across brain development

    •  Majority of the DERs have the highest expression levels in fetal life (81.7%)
    •  Overlap genes enriched for neurogenesis, signaling, development; genes involved in brain development, e.g. SOX11, DCX, GAT1, NRGN, CAMK2A, CNTNAP1
    Jaffe et al, In review, 2014
  18. DERs validate: Cytosolic vs total mRNA fractions

    Developmental regulation of potentially unspliced mRNA in the cytosolic fraction of the human frontal cortex
    Jaffe et al, In review, 2014
  19. Confirmation in BrainSpan Data

    •  ~40 individuals across the lifespan in many brain regions (www.brainspan.org)
       –  Gene/exon counts
       –  Coverage-level data
    •  Downloaded and processed RNA-seq data from ~500 samples in 16 brain regions (11 from neocortex), extracting coverage levels within the DERs
    Coverage data from BrainSpan: http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/
  20. Age-associated DERs are conserved in the developing mouse cortex

    Mouse cerebral cortex, comparing E17 (N=4) to adult (N=3) C57BL/6 mice
    Data from Dillman 2013
    Jaffe et al, In review, 2014
  21. Age-associated DERs are expressed in other cell and tissue types

    •  Downloaded and reprocessed RNA-seq data from stem cell and somatic tissue
    •  Majority of the DERs had on average > 5 reads in at least one stem cell (86.4%) or tissue (84.0%) type
    •  53.3% of all DERs, and 26.5% of non-exonic DERs were expressed in all five stem cell conditions
    Illumina BodyMap data
    Jaffe et al, In review, 2014
  22. Age-associated DERs are expressed in other cell and tissue types

    Figure panels: Postnatal Brain, Fetal Brain, Stem Cell, Tissues
    Jaffe et al, In review, 2014
  23. Expression changes across development represent a changing neuronal phenotype

    •  Utilized DNA methylation data from:
       –  flow-sorted cortex (Guintivano 2013)
       –  and stem cell developmental system (Kim 2014)
    •  Performed composition estimation using recently published approaches for Illumina 450k (Houseman et al 2012, Jaffe and Irizarry 2014)
    Jaffe et al, In review, 2014
  24. Expression changes across development represent a changing neuronal phenotype

    Figure: Proportion of Cells
    Jaffe et al, In review, 2014
  25. Analysis summary

    •  Found DERs associated with age and development
    •  Identified DERs that replicate
    •  Validated DERs (cytosol vs total mRNA)
    •  Confirmed with BrainSpan data
    •  Identified DERs conserved in mouse
    •  DERs expressed in other tissues (BodyMap data)
    •  Estimated cell composition from DNA methylation
  26. Discussion

    •  Highlights conserved molecular signatures of transcriptional dynamics across brain development
    •  Incomplete annotation of the human brain transcriptome
    •  Differences in expression occurring across birth may be driven principally by changing neuronal phenotypes, rather than the rise of non-neuronal cell types
  27. Discussion

    •  Future biological experiments may better characterize the functional roles of these DERs, particularly intronic and intergenic regions
    •  Data will soon be publicly available
  28. Acknowledgements

    Leek Group: Jeffrey Leek, Alyssa Frazee
    Hopkins: Sarven Sabunciyan, Ben Langmead
    LIBD: Andrew Jaffe, Jooheon Shin, Nikolay Ivanov, Amy Deep, Ran Tao, Yankai Jia, Thomas Hyde, Joel Kleinman, Daniel Weinberger
    Harvard: Rafael Irizarry
    Funding: NIH, LIBD, CONACyT México