
is3b2014

Leonardo Collado-Torres

August 04, 2014

Transcript

  1. Developmental regulation of human cortex transcription at base-pair resolution

    Leonardo Collado-Torres. Blog: bit.ly/FellBit. Twitter: @fellgernon #IS3B2014
  2. Goal of my work

    Goal: What are the real transcriptional difference(s) between biological conditions or over time? Problems: •  Annotation may be incomplete •  Assembly with short reads is challenging •  Counting is harder than it looks
  3. Example biological questions

    What are the differences in transcription: 1.  between cocaine and alcohol addicts in the human hippocampus? 2.  in blood in a natural timecourse for a single individual? 3.  at multiple developmental stages? 4.  in the dorsolateral prefrontal cortex over lifespan? References: Jaffe*, Shin, Collado-Torres, Leek et al, In review, 2014; Zhou et al, PNAS, 2011; Chen et al, Cell, 2012; Xie et al, Cell, 2013
  4. Transcription Alberts et al. Molecular Biology of the Cell, 4th

    edition 2002 Fig 6-21
  5. Mapping reads from mRNAs. Trapnell et al. Nat. Biotech 2009

  6. Summarize into features or assemble. Trapnell et al. Nat. Biotech 2009

  7. Annotation variation. Frazee et al. Biostatistics 2014

  8. Challenges in counting. http://www-huber.embl.de/users/anders/HTSeq/doc/count.html

  9. Data

    n samples → ~348 million nt. 11.24%: rows with at least 1 sample with coverage > 5. Adapted from @jtleek
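
    The filter named on this slide (keep a base only if at least one sample has coverage > 5) can be sketched in a few lines. This is a minimal illustration with made-up toy numbers, not derfinder's implementation; the function name `filter_bases` and the dict-of-lists layout are assumptions.

```python
def filter_bases(coverage, cutoff=5):
    """Keep base positions where at least one sample has coverage > cutoff.

    coverage: dict mapping base position -> list of per-sample coverage values
    (hypothetical layout chosen for this sketch).
    """
    return {pos: cov for pos, cov in coverage.items() if max(cov) > cutoff}


# Toy data: 4 base positions x 3 samples (made-up numbers).
toy = {100: [0, 1, 2], 101: [3, 6, 1], 102: [5, 5, 5], 103: [0, 0, 9]}
kept = filter_bases(toy)
fraction_kept = len(kept) / len(toy)
print(sorted(kept), fraction_kept)
```

    Position 102 is dropped because the comparison is strict: a maximum of exactly 5 does not pass.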
  10. derfinder: annotation-agnostic. RNA-seq data from Xie et al, Cell, 2013

  11. Compare DERs vs annotation. RNA-seq data from Xie et al, Cell, 2013
  12. Statistical model •  Null model •  Alternative Model •  F-statistic
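
    A minimal sketch of how a per-base F-statistic compares a null and an alternative model, assuming ordinary least squares is fit separately at each base $l$. All symbols are illustrative notation, not taken from the slides: $y_i(l)$ is the (possibly transformed) coverage of sample $i$ at base $l$, $X_{ij}$ are the covariates of interest (for example age group), $Z_{ik}$ are adjustment covariates, $p_0$ and $p_1$ are the numbers of parameters in the null and alternative models, and $n$ is the number of samples.

```latex
\begin{align*}
\text{Null:}\quad & y_i(l) = \alpha(l) + \sum_{k} \gamma_k(l)\, Z_{ik} + \varepsilon_i(l) \\
\text{Alternative:}\quad & y_i(l) = \alpha(l) + \sum_{j} \beta_j(l)\, X_{ij} + \sum_{k} \gamma_k(l)\, Z_{ik} + \varepsilon_i(l) \\
& F(l) = \frac{\bigl(\mathrm{RSS}_0(l) - \mathrm{RSS}_1(l)\bigr) / (p_1 - p_0)}{\mathrm{RSS}_1(l) / (n - p_1)}
\end{align*}
```

    Here $\mathrm{RSS}_0(l)$ and $\mathrm{RSS}_1(l)$ are the residual sums of squares of the null and alternative fits at base $l$; a large $F(l)$ means the covariates of interest explain coverage at that base beyond the adjustment covariates.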

  13. DERs p-values Permute model matrices and find null regions for

    all chromosomes.
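
    One way to turn null regions into p-values is sketched below: find runs of consecutive bases whose F-statistic exceeds a cutoff, summarize each run by its area (sum of F-statistics), and compare each observed area with the pooled null areas. The permutation of model matrices itself is not shown; the null F-statistic vectors are made-up stand-ins, and the function names are assumptions for this sketch.

```python
def find_regions(fstats, cutoff):
    """Return (start, end, area) for each run of consecutive bases with F > cutoff.

    start/end are 0-based indices (end inclusive); area is the sum of the
    F-statistics inside the run.
    """
    regions, start, area = [], None, 0.0
    for i, f in enumerate(fstats):
        if f > cutoff:
            if start is None:
                start, area = i, 0.0
            area += f
        elif start is not None:
            regions.append((start, i - 1, area))
            start = None
    if start is not None:  # a run that reaches the last base
        regions.append((start, len(fstats) - 1, area))
    return regions


def empirical_pvalues(observed_areas, null_areas):
    """Fraction of pooled null region areas at least as large as each observed area."""
    n = len(null_areas)
    return [sum(a >= obs for a in null_areas) / n for obs in observed_areas]


# Made-up F-statistics along 7 bases; cutoff chosen arbitrarily for the toy data.
observed = [1, 12, 15, 3, 2, 11, 1]
null_runs = [[11, 1, 1, 12, 13, 1, 1], [1, 1, 30, 1, 1, 1, 1]]  # stand-ins for permuted fits

obs_regions = find_regions(observed, cutoff=10)
null_areas = [area for run in null_runs for _, _, area in find_regions(run, cutoff=10)]
pvals = empirical_pvalues([area for _, _, area in obs_regions], null_areas)
print(obs_regions, null_areas, pvals)
```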
  14. derfinder: fast BAM to results

    •  Avoid Input/Output •  Work by chromosome •  Reduce memory •  Use F-stats •  Use multiple cores
    Step: Raw coverage | merge | Filtered coverage | Models | Analysis | Merge results | Report
    Hrs: 1.1 | 0.5 | 1.2 | 0.2 | 41.2 | 0.1 | 2.7
    Mem GB: 10.3 | 39.7 | 9.5 | 34.2 | 140.3 | 8.6 | 42
    36 samples, 1000 permutations → 47 hrs. Mapped reads (millions): mean 102, SD 53.5, min 20.9, max 284.9. Data from Jaffe et al, In review, 2014
  15. Counting & region finding are complementary. RNA-seq data from Zhou et al, PNAS, 2011
  16. Lessons learned •  Balancing memory + speed + disk usage

    is challenging •  Base-pair resolution DE analysis doable –  487 samples: 61 GB & 9 days with 1000 permutations, 20 cores for 15,880,729,865 F-stats (6.37% chr1 * 1001)
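
    The F-statistic count on this slide can be checked with a little arithmetic: 1001 is read here as 1 observed analysis plus 1000 permutations, so dividing the total by 1001 gives the number of chr1 bases tested. The chr1 length below is an assumption (the hg19/GRCh37 value), since the slide does not name the genome build.

```python
total_fstats = 15_880_729_865   # from the slide
fits_per_base = 1 + 1000        # observed data + 1000 permutations
chr1_length = 249_250_621       # ASSUMPTION: hg19/GRCh37 chr1 length, build not stated on the slide

bases_tested, remainder = divmod(total_fstats, fits_per_base)
percent_of_chr1 = 100 * bases_tested / chr1_length
print(bases_tested, remainder, round(percent_of_chr1, 2))
```

    Under that assumption the total divides evenly by 1001 and the tested bases come to roughly 6.37% of chr1, in line with the figure quoted on the slide.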
  17. What’s next for derfinder? 1.  Convert to parametric tests 2. 

    Build base-resolution models for artifacts: –  RNA quality, cell composition, batch effects 3.  Improve annotation of DERs 4.  Make available via Bioconductor https://blogs.warwick.ac.uk/nichols/entry/spm5_gem_6 Nichols and Holmes, Human Brain Mapping, 2001
  18. None
  19. LIBD Postmortem Brain Collection •  Clinically characterized postmortem human brains

    from >1300 individuals from DC/VA/MD Medical Examiners Offices •  Non-psychiatric controls from across the lifespan (fetal through aged) and individuals with brain disorders (schizophrenia, bipolar, major depression) •  Generating genomic data from brain regions of interest: genotypes, gene expression microarrays, RNA-seq, DNA methylation, etc
  20. Background

    •  Human brain transcriptome changes dramatically across development and aging (Colantuoni 2011, Kang 2011) •  Previous approaches relied on microarray technologies → pre-defined probe sequences that capture only a limited proportion of transcriptome diversity
  21. Background

    •  Existing published RNA-seq-based characterizations of brain development have utilized gene- and/or exon-level count-based summarizations (www.brainspan.org) •  Feature-based read counts lack the ability to reliably identify novel transcriptional activity •  Transcript assembly using short reads and counting are both hard
  22. Data

    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36 •  Gender balanced •  Similar other covariates like RNA Integrity Number (RIN) Jaffe et al, In review, 2014
  23. Data

    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36. Replication data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36. Independent samples! Jaffe et al, In review, 2014
  24. Data

    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36. Replication data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36. Validation data: Total mRNA (Fetal, Adult; 3 / group, N = 6) and Cytosolic fraction (Fetal, Adult; 3 / group, N = 6). N individuals sequenced: 36 + 36 + 6 = 78. N samples: 36 + 36 + 6 * 2 = 84. Jaffe et al, In review, 2014
  25. Identifying DERs

    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36. Models: Null, Alt. Cutoff: 20.509, corresponds to p-value 10^-8. Details: •  Rank DERs by area •  1000 permutations •  Control FWER (≤ 5%) by max area per permutation. Results: 63,135 DERs. Jaffe et al, In review, 2014
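
    Controlling the FWER by the maximum area per permutation can be sketched as follows: for each observed DER, the adjusted p-value is the share of permutations whose largest null region area is at least as large as the observed area. This is a hedged illustration with made-up areas; the function name and data layout are assumptions, not derfinder's API.

```python
def fwer_adjusted_pvalues(observed_areas, null_areas_by_permutation):
    """FWER-adjusted p-values from the largest null region area of each permutation.

    null_areas_by_permutation: one list of null region areas per permutation;
    a permutation that produced no regions contributes a maximum of 0.
    """
    max_null = [max(areas) if areas else 0.0 for areas in null_areas_by_permutation]
    n = len(max_null)
    return [sum(m >= obs for m in max_null) / n for obs in observed_areas]


# Made-up areas: three observed DERs and four permutations.
observed = [50.0, 20.0, 5.0]
null_by_perm = [[3.0, 8.0], [22.0], [], [6.0, 4.0]]

adjusted = fwer_adjusted_pvalues(observed, null_by_perm)
ranked = sorted(observed, reverse=True)  # rank DERs by area, largest first
significant = [obs for obs, p in zip(observed, adjusted) if p <= 0.05]
print(adjusted, significant)
```

    Using only the maximum per permutation makes the threshold conservative: a DER passes only if few permutations produce any region that large anywhere.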
  26. Replicating DERs

    Replication data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36. Models: Null, Alt. Cutoff: single F-statistic per DER, p-value < 0.05. Details: •  Per sample and per DER calculate average expression •  Use the 36 numbers to calculate F-statistic. Results: 50,650 DERs replicated. Jaffe et al, In review, 2014
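
    The replication step (average the coverage inside a DER for each sample, then compute one F-statistic from those per-sample numbers) can be sketched as below. The slide does not show the models, so a one-way ANOVA across groups is an assumption here, and the toy values are made up.

```python
from statistics import mean


def region_mean_coverage(coverage, start, end):
    """Average coverage of one sample over bases start..end (0-based, inclusive)."""
    return mean(coverage[start:end + 1])


def one_way_f(groups):
    """One-way ANOVA F-statistic for a list of groups of numbers.

    ASSUMPTION: group membership (e.g. age group) is the only covariate.
    """
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# One sample's coverage along 5 bases; the DER spans bases 1..3.
sample_mean = region_mean_coverage([0, 4, 6, 8, 0], 1, 3)

# Made-up per-sample DER averages for three groups of three samples each.
f_stat = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(sample_mean, f_stat)
```

    With 6 groups of 6 samples, as on the slide, the same function would take six lists of six per-sample averages.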
  27. Non-replicated DERs characteristics

    •  Narrower: – 83.0 bp vs 170.3 bp, p < 10^-100 •  Smaller areas: – 2633.9 vs 7034.9, p < 10^-100; mean expected diff: 1790 – therefore lower ranks •  Lower coverage: – 6.6 reads vs 108.7 reads, p < 10^-100 Jaffe et al, In review, 2014
  28. Identification of extensive transcriptional changes across brain development •  Majority

    of the DERs have the highest expression levels in fetal life (81.7%) •  Overlap genes enriched for neurogenesis, signaling, development; genes involved in brain development, e.g. SOX11, DCX, GAT1, NRGN, CAMK2A, CNTNAP1 Jaffe et al, In review, 2014
  29. Jaffe et al, In review, 2014

  30. Widespread differential expression of novel transcriptional activity Jaffe et al,

    In review, 2014
  31. DERs validate: Cytosolic vs total mRNA fractions

    Developmental regulation of potentially unspliced mRNA in the cytosolic fraction of the human frontal cortex. Jaffe et al, In review, 2014
  32. Confirmation in BrainSpan Data •  ~40 individuals across the lifespan

    in many brain regions (www.brainspan.org) – Gene/exon counts – Coverage-level data •  Downloaded and processed RNA-seq data from ~500 samples in 16 brain regions (11 from neocortex), extracting coverage levels within the DERs. Coverage data from BrainSpan: http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/
  33. Age-associated DERs lack regional specificity in the human brain Jaffe

    et al, In review, 2014
  34. Age-associated DERs lack regional specificity in the human brain Jaffe

    et al, In review, 2014
  35. Age-associated DERs are conserved in the developing mouse cortex Mouse

    cerebral cortex, comparing E17 (N=4) to adult (N=3) C57BL/6 mice. Data from Dillman 2013. Jaffe et al, In review, 2014
  36. Age-associated DERs are expressed in other cell and tissue types

    •  Downloaded and reprocessed RNA-seq data from stem cell and somatic tissue (Illumina BodyMap data) •  Majority of the DERs had on average > 5 reads in at least one stem cell (86.4%) or tissue (84.0%) type •  53.3% of all DERs, and 26.5% of non-exonic DERs, were expressed in all five stem cell conditions. Jaffe et al, In review, 2014
  37. Age-associated DERs are expressed in other cell and tissue types

    Postnatal Brain, Fetal Brain, Stem Cell, Tissues. Jaffe et al, In review, 2014
  38. Age-associated DERs are expressed in other cell and tissue types

    Jaffe et al, In review, 2014
  39. Expression changes across development represent a changing neuronal phenotype

    •  Utilized DNA methylation data from: – flow-sorted cortex (Guintivano 2013) – and stem cell developmental system (Kim 2014) •  Performed composition estimation using recently published approaches for Illumina 450k (Houseman et al 2012, Jaffe and Irizarry 2014) Jaffe et al, In review, 2014
  40. Expression changes across development represent a changing neuronal phenotype

    Proportion of Cells. Jaffe et al, In review, 2014
  41. LIBD Human DLPFC Development •  UCSC “Track Hub” Jaffe et

    al, In review, 2014
  42. Analysis summary •  Found DERs associated with age and development

    •  Identified DERs that replicate •  Validated DERs (cytosol vs total mRNA) •  Confirmed with BrainSpan data •  Identified DERs conserved in mouse •  DERs expressed in other tissues (BodyMap data) •  Estimated cell composition from DNA methylation
  43. Discussion

    •  Highlights conserved molecular signatures of transcriptional dynamics across brain development •  Incomplete annotation of the human brain transcriptome •  Differences in expression occurring across birth may be driven principally by changing neuronal phenotypes, rather than the rise of non-neuronal cell types
  44. Discussion

    •  Future biological experiments may better characterize the functional roles of these DERs, particularly intronic and intergenic regions •  Data will soon be publicly available
  45. Acknowledgements

    Leek Group: Jeffrey Leek, Alyssa Frazee. Hopkins: Sarven Sabunciyan, Ben Langmead. LIBD: Andrew Jaffe, Jooheon Shin, Nikolay Ivanov, Amy Deep, Ran Tao, Yankai Jia, Thomas Hyde, Joel Kleinman, Daniel Weinberger. Harvard: Rafael Irizarry. Funding: NIH, LIBD, CONACyT México