recount workflow: Accessing over 70,000 human RNA-seq samples with Bioconductor

Leonardo Collado-Torres

April 17, 2018

Transcript

  1. recount workflow: Accessing over 70,000 human RNA-seq samples with Bioconductor
    Leonardo Collado-Torres
    @fellgernon
    sites.google.com/view/lalresearchgroup/our-weekly-webinar
    speakerdeck.com/lcolladotor/recount-webinar
    April 17, 2018

  2. Reference genome
    Reads
    f1000research.com/articles/6-1558/v1

  3. f1000research.com/articles/6-1558/v1

  4. GTEx TCGA
    slide adapted from Shannon Ellis

  5. SRA

  6. http://rail.bio/
    Slide adapted from Ben Langmead

  7. http://blogs.citrix.com/2012/10/17/announcing-general-availability-of-sharefile-with-storagezones/

  8. https://jhubiostatistics.shinyapps.io/recount/

  9. Figure: reads and coverage across a gene with isoform 1, isoform 2 and potential isoform 3
    Features shown: exons 1 to 4, junctions (jx) 1 to 6, and expressed region 1 (potential exon 5)
    f1000research.com/articles/6-1558/v1

  10. f1000research.com/articles/6-1558/v1

  11. exon 1, exon 2, exon 3
    f1000research.com/articles/6-1558/v1

  12. disjoint exon 1, disjoint exon 2, disjoint exon 3
    f1000research.com/articles/6-1558/v1

  13. f1000research.com/articles/6-1558/v1

  14. f1000research.com/articles/6-1558/v1

  15. > library('recount')
    > download_study('ERP001942', type='rse-gene')
    > load(file.path('ERP001942', 'rse_gene.Rdata'))
    > rse
    github.com/leekgroup/recount-analyses/

  16. > library('recount')
    > download_study('SRP029880', type='rse-gene')
    > download_study('SRP059039', type='rse-gene')
    > load(file.path('SRP029880', 'rse_gene.Rdata'))
    > load(file.path('SRP059039', 'rse_gene.Rdata'))
    > mdat
    github.com/leekgroup/recount-analyses/

  17. Collado-Torres et al. Nat. Biotech 2017

  18. slide adapted from Kai Kammers
    Can combine with genotype data to identify eQTLs

  19. biorxiv.org/content/early/2018/01/12/247346

  20. http://www.rna-seqblog.com/
    Can we use expression data to predict tissue?
    slide adapted from Shannon Ellis

  21. Tissue prediction is accurate across data sets
    Number of Regions: 589 | 589 | 589 | 589
    Number of Samples (N): 4,769 | 4,769 | 7,193 | 8,951
    Percentages shown: 97.3% | 96.5% | 71.9% | 50.6%
    doi.org/10.1093/nar/gky102

  22. Prediction is more accurate in healthy tissue
    Number of Regions: 589 | 589 | 589 | 589 | 589
    Number of Samples (N): 4,769 | 4,769 | 613 | 6,579 | 8,951
    Percentages shown: 97.3% | 96.5% | 91.0% | 70.2% | 50.6%
    doi.org/10.1093/nar/gky102

  23. > library('recount')
    > download_study('ERP001942', type='rse-gene')
    > load(file.path('ERP001942', 'rse_gene.Rdata'))
    > rse
    > rse_with_pred
    github.com/leekgroup/recount-analyses/

  24. Ashkaun Razmara, in prep.

  25. expression data for ~70,000 human samples
    samples: GTEx N=9,962; TCGA N=11,284; SRA N=49,848
    expression estimates: gene, exon, junctions, ERs
    phenotypes: ?
    sex, tissue: M, Blood | F, Heart | F, Liver
    Answer meaningful questions about human biology and expression
    slide adapted from Shannon Ellis

  26. Code Example:
    research.libd.org/recount-brain/example_PMI/example_PMI.html
    research.libd.org/recount-brain/example_PMI/example_PMI.Rmd
    Replicates part of the GTEx PMI paper by Ferreira et al.
    doi.org/10.1038/s41467-017-02772-x
    Ashkaun Razmara, in prep.

  27. slide adapted from Jeff Leek

  28. The recount2 team
    Hopkins: Kai Kammers, Shannon Ellis, Margaret Taub, Kasper Hansen, Jeff Leek, Ben Langmead
    OHSU: Abhinav Nellore
    LIBD: Leonardo Collado-Torres, Andrew Jaffe
    recount-brain: Ashkaun Razmara
    Funding and hosting: NIH R01 GM105705, NIH 1R21MH109956, CONACyT 351535, AWS in Education, Seven Bridges, IDIES SciServer

  29. expression data for ~70,000 human samples
    (Multiple) Postdoc positions available to
    - develop methods to process and analyze data from recount2
    - use recount2 to address specific biological questions
    This project involves the Hansen, Leek, Langmead and Battle labs at JHU
    Contact: Kasper D. Hansen (www.hansenlab.org)
    Code for making a 5 min video on recount2:
    github.com/lcolladotor/biopeerprize2018

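The code on slides 15 and 23 stops after `rse` and `rse_with_pred` in this transcript. The following is a minimal sketch of how those steps could be completed. It assumes the slides called `scale_counts()` and `add_predictions()`; both functions exist in the recount package, but their use here is an assumption and was not recovered from the slides. Running it requires an internet connection to download the study.

```r
# Sketch only: scale_counts() and add_predictions() are assumed to be the
# calls behind the truncated `rse` and `rse_with_pred` lines on the slides.
library('recount')

# Download the gene-level RangedSummarizedExperiment for study ERP001942.
# By default the file is saved in a directory named after the study.
download_study('ERP001942', type = 'rse-gene')

# Loading the file creates an object called rse_gene.
load(file.path('ERP001942', 'rse_gene.Rdata'))

# Scale the coverage-based counts to a common target library size
# (the package default is 40 million reads).
rse <- scale_counts(rse_gene)

# Append predicted phenotypes (for example sex and tissue) to the
# sample metadata stored in colData.
rse_with_pred <- add_predictions(rse)

# List the sample metadata columns, including the predicted ones.
colnames(colData(rse_with_pred))
```

The same pattern would apply to the two studies on slide 16, with one `download_study()` and `load()` call per study. Note that each `load()` creates an object named `rse_gene`, so the first would need to be copied to another name before loading the second.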