is3b2014

Leonardo Collado-Torres

August 04, 2014

Transcript

  1. Developmental regulation of human cortex
    transcription at base-pair resolution
    Leonardo Collado-Torres
    blog: bit.ly/FellBit
    tweet: @fellgernon
    #IS3B2014

  2. Goal of my work
    Goal
    What are the real transcriptional difference(s)
    between biological conditions or over time?
    Problems
    •  Annotation may be incomplete
    •  Assembly with short reads is challenging
    •  Counting is harder than it looks

  3. Example biological questions
    What are the differences in transcription:
    1.  between cocaine and alcohol addicts in
    the human hippocampus?
    2.  in blood in a natural timecourse for a
    single individual?
    3.  at multiple developmental stages?
    4.  in the dorsolateral prefrontal cortex
    over the lifespan?
    Jaffe*, Shin, Collado-Torres, Leek et al, In review, 2014
    Zhou et al, PNAS, 2011
    Chen et al, Cell, 2012
    Xie et al, Cell, 2013

  4. Transcription
    Alberts et al. Molecular Biology of the Cell,
    4th edition 2002 Fig 6-21

  5. Mapping reads from mRNAs
    Trapnell et al. Nat. Biotech 2009

  6. Summarize into features or assemble
    Trapnell et al. Nat. Biotech 2009

  7. Annotation variation
    Frazee et al. Biostatistics 2014

  8. Challenges in counting
    http://www-huber.embl.de/users/anders/HTSeq/doc/count.html

  9. Data
    n samples →
    ~348 million nt (11.24%)
    Rows with at least 1 sample with coverage > 5
    Adapted from @jtleek
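
The row filter on this slide can be sketched as follows. This is a minimal illustration, not derfinder's implementation: the function name `filter_bases`, the toy coverage values, and the (position, per-sample coverage) layout are all assumptions made for the example.

```python
def filter_bases(coverage_rows, cutoff=5):
    """Keep rows (bases) where at least one sample has coverage > cutoff.

    coverage_rows: list of (position, [coverage per sample]) tuples.
    The layout and the name are hypothetical, chosen for illustration.
    """
    return [(pos, cov) for pos, cov in coverage_rows if max(cov) > cutoff]


# Toy coverage matrix: one row per base, one column per sample (made-up values).
toy = [
    (100, [0, 1, 0]),
    (101, [6, 2, 0]),
    (102, [5, 5, 5]),   # max is exactly 5, so it is not kept (strictly > 5)
    (103, [0, 0, 12]),
]
kept_positions = [pos for pos, _ in filter_bases(toy)]
print(kept_positions)
```

Filtering first shrinks the number of bases that need a statistical test, which is where most of the memory and time savings would come from.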

  10. derfinder: annotation-agnostic
    RNA-seq data from Xie et al, Cell, 2013

  11. Compare DERs vs annotation
    RNA-seq data from Xie et al, Cell, 2013

  12. Statistical model
    •  Null model
    •  Alternative Model
    •  F-statistic
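
The model equations on this slide were not captured in the transcript. As a hedged illustration of the general idea, the sketch below computes an F-statistic for the simplest nested pair: an intercept-only null model versus an alternative with one mean per group. The actual models in the talk may include further covariates; the function name and data are assumptions.

```python
def f_statistic(values, groups):
    """F-statistic comparing an intercept-only null model with a
    one-mean-per-group alternative model (a simple nested-model example)."""
    n = len(values)

    # Null model: a single overall mean.
    grand_mean = sum(values) / n
    rss_null = sum((v - grand_mean) ** 2 for v in values)

    # Alternative model: a separate mean for each group.
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    rss_alt = 0.0
    for vals in by_group.values():
        group_mean = sum(vals) / len(vals)
        rss_alt += sum((v - group_mean) ** 2 for v in vals)

    p_null, p_alt = 1, len(by_group)  # number of fitted parameters
    return ((rss_null - rss_alt) / (p_alt - p_null)) / (rss_alt / (n - p_alt))


# Toy coverage at one base for six samples in two groups (made-up values).
f = f_statistic([1, 2, 3, 5, 6, 7], ["a", "a", "a", "b", "b", "b"])
print(f)
```

At base-pair resolution this calculation would be repeated once per filtered base, giving one F-statistic per position.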

  13. DERs p-values
    Permute model matrices and find null
    regions for all chromosomes.
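
Finding regions, whether from observed or permuted statistics, can be sketched as scanning for runs of consecutive bases above a cutoff. The code below is an illustrative assumption, not derfinder's code: `find_regions` is a hypothetical name, and the area is taken here as the sum of the F-statistics within a run.

```python
def find_regions(fstats, cutoff):
    """Return (start, end, area) for each run of consecutive bases with F > cutoff.

    start and end are 0-based inclusive indices; area is the sum of the
    F-statistics inside the run (an assumed definition for this sketch).
    """
    regions = []
    start = None
    area = 0.0
    for i, f in enumerate(fstats):
        if f > cutoff:
            if start is None:       # a new run begins here
                start, area = i, 0.0
            area += f
        elif start is not None:     # the current run just ended
            regions.append((start, i - 1, area))
            start = None
    if start is not None:           # a run that reaches the last base
        regions.append((start, len(fstats) - 1, area))
    return regions


# Toy per-base F-statistics (made-up values).
observed = [1.0, 25.0, 30.0, 2.0, 21.0, 3.0]
regions = find_regions(observed, cutoff=20.0)
print(regions)
```

Running the same function on F-statistics recomputed after permuting the sample labels would yield the null regions used to assess significance.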

  14. derfinder: fast BAM to results
    •  Avoid Input/Output
    •  Work by chromosome
    •  Reduce memory
    •  Use F-stats
    •  Use multiple cores
    Step                Hrs     Mem GB
    Raw coverage        1.1     10.3
    Merge               0.5     39.7
    Filtered coverage   1.2     9.5
    Models              0.2     34.2
    Analysis            41.2    140.3
    Merge results       0.1     8.6
    Report              2.7     42
    36 samples, 1000 permutations → 47 hrs
    Mean: 102 million mapped reads
    Sd: 53.5, Min: 20.9, Max: 284.9
    Data from Jaffe et al, In review, 2014

  15. Counting & region finding are complementary
    RNA-seq data from Zhou et al, PNAS, 2011

  16. Lessons learned
    •  Balancing memory + speed + disk usage is
    challenging
    •  Base-pair resolution DE analysis doable
    –  487 samples: 61 GB & 9 days with 1000 permutations, 20 cores
    for 15,880,729,865 F-stats (6.37% chr1 * 1001)
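
The F-statistic count above can be checked with a little arithmetic: 1001 corresponds to 1 observed analysis plus 1000 permutations. The chr1 length used below (249,250,621 bp, the hg19 value) is an assumption, since the slide does not name the genome build.

```python
n_fstats = 15_880_729_865
n_analyses = 1001                 # 1 observed + 1000 permutations

# Number of chr1 bases tested in each analysis.
bases_tested = n_fstats // n_analyses
remainder = n_fstats % n_analyses

# Assumption: hg19 chr1 length; the slide does not state the build.
CHR1_LENGTH_HG19 = 249_250_621
percent_of_chr1 = 100 * bases_tested / CHR1_LENGTH_HG19

print(bases_tested, remainder, round(percent_of_chr1, 2))
```

Under that assumption, the count divides evenly by 1001 and the tested bases come to about 6.37% of chr1, in line with the slide.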

  17. What’s next for derfinder?
    1.  Convert to parametric tests
    2.  Build base-resolution models for
    artifacts:
    –  RNA quality, cell composition, batch effects
    3.  Improve annotation of DERs
    4.  Make available via Bioconductor
    https://blogs.warwick.ac.uk/nichols/entry/spm5_gem_6
    Nichols and Holmes, Human Brain Mapping, 2001

  18. View Slide

  19. LIBD Postmortem Brain Collection
    •  Clinically characterized postmortem human
    brains from >1300 individuals from DC/VA/MD
    Medical Examiners Offices
    •  Non-psychiatric controls from across the
    lifespan (fetal through aged) and individuals
    with brain disorders (schizophrenia, bipolar,
    major depression)
    •  Generating genomic data from brain regions of
    interest: genotypes, gene expression
    microarrays, RNA-seq, DNA methylation, etc

  20. Background
    •  Human brain transcriptome changes
    dramatically across development and
    aging (Colantuoni 2011, Kang 2011)
    •  Previous approaches relied on microarray
    technologies → pre-defined probe
    sequences that capture only a limited
    proportion of transcriptome diversity

  21. Background
    •  Existing published RNAseq-based
    characterizations of brain development have
    utilized gene- and/or exon-level count-
    based summarizations (www.brainspan.org)
    •  Feature-based read counts lack the ability
    to reliably identify novel transcriptional
    activity
    •  Transcript assembly using short reads and
    counting are both hard

  22. Data
    Discovery data
    Age groups: Fetal, Infant, Child, Teen, Adult, 50+
    6 / group, N = 36
    •  Gender balanced
    •  Similar on other covariates,
    such as RNA Integrity Number (RIN)
    Jaffe et al, In review, 2014

  23. Data
    Discovery data
    Age groups: Fetal, Infant, Child, Teen, Adult, 50+
    6 / group, N = 36
    Replication data
    Age groups: Fetal, Infant, Child, Teen, Adult, 50+
    6 / group, N = 36
    Independent samples!
    Jaffe et al, In review, 2014

  24. Data
    Discovery data
    Age groups: Fetal, Infant, Child, Teen, Adult, 50+
    6 / group, N = 36
    Replication data
    Age groups: Fetal, Infant, Child, Teen, Adult, 50+
    6 / group, N = 36
    Validation data
    Total mRNA: Fetal, Adult; 3 / group, N = 6
    Cytosolic fraction: Fetal, Adult; 3 / group, N = 6
    N individuals sequenced: 36 + 36 + 6 = 78
    N samples: 36 + 36 + 6 * 2 = 84
    Jaffe et al, In review, 2014

  25. Identifying DERs
    Discovery data
    Age groups: Fetal, Infant, Child, Teen, Adult, 50+
    6 / group, N = 36
    Models
    Null:
    Alt:
    Cutoff
    20.509
    Corresponds to p-value 10^-8
    Details
    •  Rank DERs by area
    •  1000 permutations
    •  Control FWER (≤ 5%) by max area
    per permutation
    Results
    63,135 DERs
    Jaffe et al, In review, 2014
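
Controlling the FWER by the maximum area per permutation can be sketched with the standard max-statistic approach: each observed area is compared against the distribution of the largest null area from each permutation. This is a generic illustration with made-up numbers and a hypothetical function name; the exact estimator used in the analysis is not shown on the slide.

```python
def fwer_adjusted_pvalues(observed_areas, null_max_areas):
    """FWER-adjusted p-value for each observed area: the fraction of
    permutations whose maximum null-region area is at least as large.

    Some implementations add 1 to numerator and denominator; this sketch
    uses the plain fraction.
    """
    n_perm = len(null_max_areas)
    return [
        sum(null_max >= area for null_max in null_max_areas) / n_perm
        for area in observed_areas
    ]


# Toy values: three observed region areas and the maximum null area
# from each of four permutations (the talk used 1000 permutations).
observed_areas = [55.0, 21.0, 8.0]
null_max_areas = [10.0, 30.0, 7.0, 22.0]

adjusted = fwer_adjusted_pvalues(observed_areas, null_max_areas)
significant = [a for a, p in zip(observed_areas, adjusted) if p <= 0.05]
print(adjusted, significant)
```

Because only the largest null area per permutation is used, a region passes only if it would rarely be beaten by the best region arising anywhere under the null.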

  26. Replicating DERs
    Replication data
    Age groups: Fetal, Infant, Child, Teen, Adult, 50+
    6 / group, N = 36
    Models
    Null:
    Alt:
    Cutoff
    Single F-statistic per DER
    p-value < 0.05
    Details
    •  Per sample and per DER calculate
    average expression
    •  Use the 36 numbers to calculate
    F-statistic
    Results
    50,650 DERs replicated
    Jaffe et al, In review, 2014
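
The first replication step, one average expression value per sample for each DER, can be sketched as below. The function name, the 0-based inclusive coordinates, and the toy coverage vectors are assumptions for illustration.

```python
def region_means(coverage_by_sample, start, end):
    """Mean coverage over bases start..end (0-based, inclusive) for each sample."""
    width = end - start + 1
    return [sum(cov[start:end + 1]) / width for cov in coverage_by_sample]


# Toy per-base coverage for two samples (made-up values); the replication
# set in the talk has 36 samples, giving 36 numbers per DER.
toy_coverage = [
    [0, 4, 6, 2],    # sample 1
    [1, 10, 20, 3],  # sample 2
]
means = region_means(toy_coverage, start=1, end=2)
print(means)
```

The resulting per-sample values for a DER would then be compared across age groups with a single F-test.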

  27. Non-replicated DERs characteristics
    •  Narrower:
    – 83.0 bp vs 170.3 bp, p < 10^-100
    •  Smaller areas:
    – 2633.9 vs 7034.9, p < 10^-100
    – mean expected diff: 1790
    – therefore lower ranks
    •  Lower coverage:
    – 6.6 reads vs 108.7 reads, p < 10^-100
    Jaffe et al, In review, 2014

  28. Identification of extensive
    transcriptional changes across brain
    development
    •  Majority of the DERs have the highest
    expression levels in fetal life (81.7%)
    •  Overlap genes enriched for neurogenesis,
    signaling, development; genes involved in
    brain development, e.g. SOX11, DCX,
    GAT1, NRGN, CAMK2A, CNTNAP1
    Jaffe et al, In review, 2014

  29. Jaffe et al, In review, 2014

  30. Widespread differential expression of novel
    transcriptional activity
    Jaffe et al, In review, 2014

  31. DERs validate: Cytosolic vs total mRNA
    fractions
    Jaffe et al, In review, 2014
    Developmental regulation of potentially unspliced mRNA in the
    cytosolic fraction of the human frontal cortex

  32. Confirmation in BrainSpan Data
    •  ~40 individuals across the lifespan in
    many brain regions (www.brainspan.org)
    – Gene/exon counts
    – Coverage-level data
    •  Downloaded and processed RNA-seq data
    from ~500 samples in 16 brain regions (11
    from neocortex), extracting coverage
    levels within the DERs
    Coverage Data from BrainSpan:
    http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/

  33. Age-associated DERs lack regional
    specificity in the human brain
    Jaffe et al, In review, 2014

  34. Age-associated DERs lack regional
    specificity in the human brain
    Jaffe et al, In review, 2014

  35. Age-associated DERs are conserved in the
    developing mouse cortex
    Mouse cerebral cortex, comparing E17 (N=4) to
    adult (N=3) C57BL/6 mice
    Data from Dillman 2013
    Jaffe et al, In review, 2014

  36. Age-associated DERs are expressed in other
    cell and tissue types
    •  Downloaded and reprocessed RNA-seq
    data from stem cell and somatic tissue
    •  Majority of the DERs had on average > 5
    reads in at least one stem cell (86.4%) or
    tissue (84.0%) type
    •  53.3% of all DERs, and 26.5% of non-
    exonic DERs were expressed in all five
    stem cell conditions
    Jaffe et al, In review, 2014
    Illumina BodyMap data

  37. Age-associated DERs are expressed in other
    cell and tissue types
    Panels: Postnatal Brain, Fetal Brain, Stem Cell, Tissues
    Jaffe et al, In review, 2014

  38. Age-associated DERs are expressed in other
    cell and tissue types
    Jaffe et al, In review, 2014

  39. Expression changes across development
    represent a changing neuronal phenotype
    •  Utilized DNA methylation data from:
    – flow-sorted cortex (Guintivano 2013)
    – and stem cell developmental system (Kim 2014)
    •  Performed composition estimation using
    recently published approaches for
    Illumina 450k (Houseman et al 2012, Jaffe and Irizarry 2014)
    Jaffe et al, In review, 2014

  40. Expression changes across development
    represent a changing neuronal phenotype
    Proportion of Cells
    Jaffe et al, In review, 2014

  41. LIBD Human DLPFC Development
    •  UCSC “Track Hub”
    Jaffe et al, In review, 2014

  42. Analysis summary
    •  Found DERs associated with age and
    development
    •  Identified DERs that replicate
    •  Validated DERs (cytosol vs total mRNA)
    •  Confirmed with BrainSpan data
    •  Identified DERs conserved in mouse
    •  DERs expressed in other tissues (BodyMap
    data)
    •  Estimated cell composition from DNA
    methylation

  43. Discussion
    •  Highlights conserved molecular signatures
    of transcriptional dynamics across brain
    development
    •  Incomplete annotation of the human
    brain transcriptome
    •  Differences in expression occurring across
    birth may be driven principally by
    changing neuronal phenotypes, rather
    than the rise of non-neuronal cell types

  44. Discussion
    •  Future biological experiments may better
    characterize the functional roles of these
    DERs, particularly intronic and intergenic
    regions
    •  Data will soon be publicly available

  45. Acknowledgements
    Leek Group
    Jeffrey Leek
    Alyssa Frazee
    Hopkins
    Sarven Sabunciyan
    Ben Langmead
    LIBD
    Andrew Jaffe
    Jooheon Shin
    Nikolay Ivanov
    Amy Deep
    Ran Tao
    Yankai Jia
    Thomas Hyde
    Joel Kleinman
    Daniel Weinberger
    Harvard
    Rafael Irizarry
    Funding
    NIH
    LIBD
    CONACyT México
