psb-recount2

EARLY CAREER CLINICAL RESEARCH SYMBIONT AWARD presentation at #PSB19 for #recount2

Leonardo Collado-Torres

January 06, 2019

Transcript

  1. Reproducible RNA-seq analysis with recount2
    Leonardo Collado-Torres
    @fellgernon #PSB19
    Data Science I with Andrew Jaffe
    Slides: speakerdeck.com/lcolladotor/psb-recount2

  2. https://jhubiostatistics.shinyapps.io/recount/
    #PSB19

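  3. > library('recount')
    > # Download the gene-level RangedSummarizedExperiment for study ERP001942
    > download_study('ERP001942', type = 'rse-gene')
    > # Load the downloaded file, which provides the object rse_gene
    > load(file.path('ERP001942', 'rse_gene.Rdata'))
    > # Scale the coverage counts to a common library size
    > rse <- scale_counts(rse_gene)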
    github.com/leekgroup/recount-analyses/
    #PSB19
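
The scale_counts() step above is what turns recount2's base-pair coverage counts into values comparable across samples. The following is only a rough base R sketch of that idea, assuming the AUC-based scaling to a target library size of 40 million reads. The toy matrix, the `auc` values and the helper `scale_by_auc()` are hypothetical and are not the package's implementation.

```r
# Toy coverage counts: 3 genes (rows) x 2 samples (columns); made-up numbers
coverage_counts <- matrix(
    c(100, 300, 600,
      200, 600, 1200),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), c("sampleA", "sampleB"))
)

# Hypothetical per-sample area under coverage (AUC);
# sampleB was sequenced twice as deeply as sampleA
auc <- c(sampleA = 1000, sampleB = 2000)

# Hypothetical helper: divide each sample (column) by its AUC, then
# multiply by the target library size
scale_by_auc <- function(counts, auc, target_size = 4e7) {
    sweep(counts, 2, auc, "/") * target_size
}

scaled <- scale_by_auc(coverage_counts, auc)
print(scaled)
```

In this toy case the two samples differ only in sequencing depth, so their scaled values come out identical.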

  4. doi.org/10.12688/f1000research.12223.1
    Powered by
    @bioconductor
    #rstats
    #PSB19

  5. [Figure: read coverage across a gene with exons 1 to 4, exon-exon junctions jx 1 to jx 6, isoforms 1 and 2, potential isoform 3, and expressed region 1 (potential exon 5)]
    doi.org/10.12688/f1000research.12223.1
    #PSB19

  6. slide adapted from Jeff Leek
    #PSB19

  7. related projects
    • Bioconductor recountWorkflow: use and documentation
    doi.org/10.12688/f1000research.12223.1
    • Snaptron by Christopher Wilks & Langmead: exon-exon junctions
    doi.org/10.1093/bioinformatics/btx547
    • Shannon Ellis & Leek: phenotype prediction
    doi.org/10.1093/nar/gky102
    • Jack Fu & Taub: transcript estimations
    biorxiv.org/content/early/2018/05/25/247346
    • Madugundu & Pandey (JHU): proteomics
    • Luidy-Imada & Marchionni (JHU): cancer
    • Kuri-Magaña & Martínez-Barnetche (INSP Mexico): immune expression
    doi.org/10.3389/fimmu.2018.02679
    • D. Zhang & S. Guelfi with Ryten (UCL): improving annotation
    biorxiv.org/content/early/2018/12/19/499103
    #PSB19

  8. expression data for ~70,000 human samples
    Sources: GTEx (N=9,662), TCGA (N=11,284), SRA (N=49,848)
    Expression estimates per sample: gene, exon, junctions, ERs
    Phenotypes per sample: ?
    Answer meaningful questions about human biology and expression
    slide adapted from Shannon Ellis
    #PSB19 doi.org/10.1093/nar/gky102

  9. Category Frequency
    F 95
    female 2036
    Female 51
    M 77
    male 1240
    Male 141
    Total 3640
    Even when information is provided, it’s not always clear…
    sra_meta$Sex
    “1 Male, 2 Female”, “2 Male, 1 Female”,
    “3 Female”, “DK”, “male and female”
    “Male (note: ….)”, “missing”, “mixed”,
    “mixture”, “N/A”, “Not available”, “not
    applicable”, “not collected”, “not
    determined”, “pooled male and female”,
    “U”, “unknown”, “Unknown”
    slide adapted from Shannon Ellis
    #PSB19 doi.org/10.1093/nar/gky102
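
One way to see the problem above is to try collapsing the reported labels by hand. This is a minimal base R sketch, assuming a simple lowercase-and-match rule. The helper `harmonize_sex()` is hypothetical, and the frequencies are copied from the table on this slide.

```r
# Reported sex labels and their frequencies, copied from the slide's table
reported <- c(F = 95, female = 2036, Female = 51,
              M = 77, male = 1240, Male = 141)

# Hypothetical helper: map the unambiguous spellings to two categories
# and return NA for anything else
harmonize_sex <- function(label) {
    clean <- tolower(trimws(label))
    out <- rep(NA_character_, length(clean))
    out[clean %in% c("f", "female")] <- "female"
    out[clean %in% c("m", "male")] <- "male"
    out
}

# Collapse the six spellings into two categories and add up the frequencies
harmonized <- harmonize_sex(names(reported))
collapsed <- tapply(reported, harmonized, sum)
print(collapsed)

# Ambiguous entries stay NA instead of being guessed
print(harmonize_sex(c("mixed", "pooled male and female", "DK")))
```

Simple string matching handles the six spellings in the table, but entries such as “mixed” or “DK” cannot be resolved this way, which is part of the motivation for predicting phenotypes from the expression data.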

  10. Goal: to accurately predict critical phenotype information for all samples in recount
    gene, exon, exon-exon junction and expressed region RNA-Seq data
    SRA (Sequence Read Archive): N=49,848
    GTEx (Genotype Tissue Expression Project): N=9,662
    TCGA (The Cancer Genome Atlas): N=11,284
    • divide samples (training set, test set)
    • build and optimize phenotype predictor
    • test accuracy of predictor
    • predict phenotypes across SRA samples
    • predict phenotypes across samples in TCGA
    slide adapted from Shannon Ellis
    #PSB19 doi.org/10.1093/nar/gky102
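
The divide, build, test pattern on this slide can be sketched in base R. This is only an illustration under loud assumptions: the data are simulated (not recount2 data), there is a single made-up expression feature, and the predictor is a plain logistic regression, which is a stand-in and not the predictor described in the cited paper.

```r
set.seed(20190106)

# Simulated stand-in data (NOT recount2 data): 200 samples with a binary
# phenotype and one expression feature that is shifted between the groups
n <- 200
sex <- factor(rep(c("female", "male"), each = n / 2))
expr <- rnorm(n, mean = ifelse(sex == "male", 2, 0))
samples <- data.frame(sex = sex, expr = expr)

# Divide samples into a training set and a test set
train_idx <- sample(n, size = n / 2)
training <- samples[train_idx, ]
test <- samples[-train_idx, ]

# Build the predictor on the training set (logistic regression as a stand-in)
fit <- glm(sex ~ expr, data = training, family = binomial)

# Test accuracy of the predictor on the held-out test set;
# the fitted probability refers to the second factor level, "male"
prob_male <- predict(fit, newdata = test, type = "response")
predicted <- ifelse(prob_male > 0.5, "male", "female")
accuracy <- mean(predicted == test$sex)
print(accuracy)
```

Only after the held-out accuracy looks acceptable would such a predictor be applied to samples whose phenotype is missing, which corresponds to the SRA and TCGA steps listed above.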

  11. Sex: Female, Male
    Age/Development: Fetus, Child, Adolescent, Adult
    Race/Ethnicity: Asian, Black, Hispanic, White
    Tissue Site 1: Cerebral cortex, Hippocampus, Brainstem, Cerebellum
    Tissue Site 2: Frontal lobe, Temporal lobe, Midbrain, Basal ganglia
    Tissue Site 3: Dorsolateral prefrontal cortex, Superior temporal gyrus, Substantia nigra, Caudate
    Hemisphere: Left, Right
    Brodmann Area: 1-52
    Disease Status: Disease, Neurological control
    Disease: Brain tumor, Alzheimer’s disease, Parkinson’s disease, Bipolar disorder
    Tumor Type: Glioblastoma, Astrocytoma, Oligodendroglioma, Ependymoma
    Clinical Stage 1: Grade I, Grade II, Grade III, Grade IV
    Clinical Stage 2: Primary, Secondary, Recurrent
    Viability: Postmortem, Biopsy
    Preparation: Frozen, Thawed
    Ashkaun Razmara, in prep.
    github.com/LieberInstitute/recount-brain
    #PSB19

  12. The recount2 team
    Hopkins
    Kai Kammers
    Shannon E. Ellis
    Margaret Taub
    Kasper Hansen
    Jeff T. Leek
    Ben Langmead
    OHSU
    Abhinav Nellore
    LIBD
    Leonardo Collado-Torres
    Andrew E. Jaffe
    recount-brain
    Ashkaun Razmara
    Dustin J. Sokolowski
    Michael D. Wilson
    Sean Davis
    Funding
    NIH R01 GM105705
    NIH 1R21MH109956
    CONACyT 351535
    AWS in Education
    Seven Bridges
    IDIES SciServer
    Hosting recount2
    #PSB19

  13. Research Symbiont awards:
    http://researchsymbionts.org/
    Apply!!!
    #PSB19

  14. expression data for ~70,000 human samples
    (Multiple) Postdoc positions available to
    - develop methods to process and analyze data from recount2
    - use recount2 to address specific biological questions
    This project involves the Hansen, Leek, Langmead and Battle labs at JHU
    Contact: Kasper D. Hansen ([email protected] | www.hansenlab.org)
    #PSB19
    Or ask me @fellgernon and I’ll put you in touch with the #recount2 PIs
