CDSBMexico

From learning to using to teaching to developing R

Leonardo Collado Torres
@fellgernon #rstats #teaching #CDSBMexico

Leonardo Collado-Torres

July 30, 2018

Transcript

  1. TIB.2018 (R para todos) & Latin American R/Bioconductor Developers Workshop

    From learning to using to teaching to developing R
    Leonardo Collado Torres @fellgernon #rstats #teaching #CDSBMexico
    https://speakerdeck.com/lcolladotor/CDSBMexico
  2. @lcgunam

  3. @cendrinou https://www.stat.berkeley.edu/users/sandrine/ Sandrine Dudoit Fall 2007

  4. Who knows about ? Sandrine Dudoit: She’s one of the @Bioconductor project founders! @cendrinou
  5. http://www.wholebiome.com/team.html#james-bullard James Bullard January 2008 1 week intense course

  6. @AlexielMedyna http://liigh.unam.mx/profile/dra-alejandra-medina-rivera/ Alejandra Medina Rivera

    BioC2008: Developer’s day + 2 conference days. Supported by @lcgunam
  7. @Bioconductor https://bioconductor.org/help/course-materials/2008/BioC2008/

  8. #lattice Deepayan Sarkar https://github.com/deepayan

  9. #ShortRead @mt_morgan @Bioconductor http://bioconductor.org/packages/ShortRead

  10. @fellgernon & Osam http://lcolladotor.github.io/courses/Courses/R/ Fall 2008

  11. @areyesq http://alejandroreyes.org/ Alejandro Reyes (first BioC: 2009)

    BioC2009 + BioC2010 + BioC2011: Developer’s day + 2 conference days
    + Europe Bioc 2010 http://www-huber.embl.de/biocdeveleurope2010/
    With support from: @Bioconductor, @lcgunam, @WINTERGENOMICS
  12. @fellgernon #rstats #teaching #educollab http://lcolladotor.github.io/ 2008-2011

  13. @fellgernon #rstats #teaching #educollab http://lcolladotor.github.io/courses/Courses/B/ (has videos of me teaching :P, it was a pilot for OpenCourseware)

    TAs:
    • Alejandro Reyes @areyesq
    • José Víctor Moreno Mayar https://geogenetics.ku.dk/staff/?pure=en/persons/475726
    • José Reyes http://sysbiophd.harvard.edu/people/student-profiles/jose-reyes
  14. None
  15. None
  16. None
  17. Always ask for support!

    • Support for traveling or registration or lodging
    • Support for teaching: Robert Gentleman gave me free copies of books he had in his office (authors normally get several free copies of books)
    • Support for community building: almost had Bioconductor’s support in 2010ish for 1 visit, we didn’t give up! #CDSBMexico
    • Feel free to ask for help! We all started somewhere!!
    Check your spam box and filters:
    • Almost lost a scholarship for useR!2013 that way :P
    Check the dates for applying for support!
    Ask for emails and keep in touch
    • I asked Davis McCarthy @davisjmcc for PhD application and career advice in 2010
    • That’s how I got into my PhD
    Socialize! Take advantage of opportunities offered to you!
  18. None
  19. BioC2010: First time presenting a poster about an R package (BacterialTranscription): Transcription initiation mapping and transcription unit identification in E. coli

    Rafael Irizarry https://rafalab.github.io/ @rafalab
    Ingo Ruczinski http://www.biostat.jhsph.edu/~iruczins/
    Them: Have you heard about Johns Hopkins?
    Me: Johns???? No idea
    Them: Come join us at @jhubiostat !!
  20. 11 Reproducible RNA-seq analysis with Leonardo Collado-Torres @fellgernon #CDSBMexico and

  21. Reference genome Reads

  22. None
  23. GTEx TCGA slide adapted from Shannon Ellis @Shannon_E_Ellis

  24. SRA

  25. Slide adapted from Ben Langmead @BenLangmead

  26. http://rail.bio/ Slide adapted from Ben Langmead @AbhiNellore @BenLangmead

  27. http://blogs.citrix.com/2012/10/17/announcing-general-availability-of-sharefile-with-storagezones/

  28. https://jhubiostatistics.shinyapps.io/recount/

  29. [Figure: Reads and Coverage along a Gene with Isoform 1, Isoform 2 and Potential isoform 3; exons 1 to 4; junctions jx 1 to jx 6; Expressed region 1: potential exon 5]
  30. Uses the #SummarizedExperiment @Bioconductor package

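  31. Example R session with the recount package:

    > library('recount')
    > # Download the gene-level data for study ERP001942
    > download_study('ERP001942', type = 'rse-gene')
    > # Load the downloaded file; it provides the rse_gene object
    > load(file.path('ERP001942', 'rse_gene.Rdata'))
    > # Scale the counts before analysis
    > rse <- scale_counts(rse_gene)
    https://github.com/leekgroup/recount-analyses/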
  32. slide adapted from Jeff Leek @jtleek

  33. Collado-Torres et al, NAR, 2017

  34. Postmortem Human Brain Samples. Jaffe et al, Nat. Neuroscience, 2015 @andrewejaffe

    Discovery data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36
    Replication data: Fetal, Infant, Child, Teen, Adult, 50+; 6 / group, N = 36
  35. Jaffe et al, Nat. Neuroscience, 2015 @andrewejaffe

  36. BrainSpan data. Jaffe et al, Nat. Neuroscience, 2015. Method implemented in the #derfinder @Bioconductor package
  37. expression data for ~70,000 human samples

    samples: GTEx N=9,662, TCGA N=11,284, SRA N=49,848
    expression estimates: gene, exon, junctions, ERs
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  38. expression data for ~70,000 human samples

    Answer meaningful questions about human biology and expression
    samples: GTEx N=9,662, TCGA N=11,284, SRA N=49,848
    expression estimates: gene, exon, junctions, ERs
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  39. expression data for ~70,000 human samples

    Answer meaningful questions about human biology and expression
    samples: GTEx N=9,662, TCGA N=11,284, SRA N=49,848
    expression estimates: gene, exon, junctions, ERs
    phenotypes (per sample): ?
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  40. Even when information is provided, it’s not always clear…

    sra_meta$Sex

    Category   Frequency
    F                 95
    female          2036
    Female            51
    M                 77
    male            1240
    Male             141
    Total           3640

    “1 Male, 2 Female”, “2 Male, 1 Female”, “3 Female”, “DK”, “male and female”, “Male (note: ….)”, “missing”, “mixed”, “mixture”, “N/A”, “Not available”, “not applicable”, “not collected”, “not determined”, “pooled male and female”, “U”, “unknown”, “Unknown”
    slide adapted from Shannon Ellis @Shannon_E_Ellis
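The label problem on this slide can be illustrated with a small base R sketch. It only uses the six frequencies shown in the table; `normalize_sex()` is a hypothetical helper written for this illustration, not a function from recount or any other package, and it is not the approach the slides go on to describe (which predicts sex from expression data instead of cleaning labels).

```r
# Frequencies copied from the slide's table of raw sra_meta$Sex labels.
raw_counts <- c(F = 95, female = 2036, Female = 51,
                M = 77, male = 1240, Male = 141)

# Hypothetical helper: map a raw label to "female", "male" or NA.
normalize_sex <- function(x) {
    cleaned <- tolower(trimws(x))
    out <- rep(NA_character_, length(cleaned))
    out[cleaned %in% c("f", "female")] <- "female"
    out[cleaned %in% c("m", "male")] <- "male"
    out
}

# Collapse the six spellings into two categories and add up their counts.
normalized <- normalize_sex(names(raw_counts))
collapsed <- tapply(raw_counts, normalized, sum)
print(collapsed)

# Ambiguous entries cannot be resolved by relabeling and stay NA.
print(normalize_sex(c("Male", "mixed", "pooled male and female", "U")))
```

The six spellings collapse to 2,182 female and 1,458 male samples, which add up to the slide's total of 3,640. Entries such as “mixed” or “U” remain missing, which is one reason relabeling alone is not enough.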
  41. SRA phenotype information is far from complete

          SubjectID  Sex     Tissue  Race  Age
    6620  NA         female  liver   NA    NA
    6621  NA         female  liver   NA    NA
    6622  NA         female  liver   NA    NA
    6623  NA         female  liver   NA    NA
    6624  NA         female  liver   NA    NA
    6625  NA         male    liver   NA    NA
    6626  NA         male    liver   NA    NA
    6627  NA         male    liver   NA    NA
    6628  NA         male    liver   NA    NA
    ...
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  42. Goal: to accurately predict critical phenotype information for all samples in recount

    gene, exon, exon-exon junction and expressed region RNA-Seq data
    • SRA (Sequence Read Archive), N=49,848
    • TCGA (The Cancer Genome Atlas), N=11,284
    • GTEx (Genotype Tissue Expression Project), N=9,662
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  43. Goal: to accurately predict critical phenotype information for all samples in recount

    gene, exon, exon-exon junction and expressed region RNA-Seq data
    • SRA (Sequence Read Archive), N=49,848
    • GTEx (Genotype Tissue Expression Project), N=9,662: divide samples
        training set: build and optimize phenotype predictor
        test set: test accuracy of predictor
    • TCGA (The Cancer Genome Atlas), N=11,284
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  44. Goal: to accurately predict critical phenotype information for all samples in recount

    gene, exon, exon-exon junction and expressed region RNA-Seq data
    • SRA (Sequence Read Archive), N=49,848
    • GTEx (Genotype Tissue Expression Project), N=9,662: divide samples
        training set: build and optimize phenotype predictor
        test set: test accuracy of predictor
    • TCGA (The Cancer Genome Atlas), N=11,284: predict phenotypes across samples in TCGA
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  45. Goal: to accurately predict critical phenotype information for all samples in recount

    gene, exon, exon-exon junction and expressed region RNA-Seq data
    • SRA (Sequence Read Archive), N=49,848: predict phenotypes across SRA samples
    • GTEx (Genotype Tissue Expression Project), N=9,662: divide samples
        training set: build and optimize phenotype predictor
        test set: test accuracy of predictor
    • TCGA (The Cancer Genome Atlas), N=11,284: predict phenotypes across samples in TCGA
    slide adapted from Shannon Ellis @Shannon_E_Ellis
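The divide / build / test steps on these slides can be sketched in base R on simulated data. Everything here is an assumption made for illustration: the data are random numbers rather than recount2 coverage, there is a single made-up "region", and the midpoint cutoff is a toy classifier, not the predictor the slides refer to.

```r
set.seed(20180730)

# Simulated stand-in for labeled samples: one expression value per sample,
# shifted between the two groups. These are not recount2 data.
n <- 200
sex <- rep(c("female", "male"), each = n / 2)
expression <- rnorm(n, mean = ifelse(sex == "male", 5, 0), sd = 1)

# Divide samples: half for training, half held out for testing.
train_idx <- sample(seq_len(n), size = n / 2)
test_idx <- setdiff(seq_len(n), train_idx)

# Build a toy predictor: the midpoint between the training group means.
group_means <- tapply(expression[train_idx], sex[train_idx], mean)
cutoff <- mean(group_means)
predict_sex <- function(x) ifelse(x > cutoff, "male", "female")

# Test the accuracy of the predictor on the held-out samples only.
accuracy <- mean(predict_sex(expression[test_idx]) == sex[test_idx])
cat(sprintf("Test-set accuracy: %.1f%%\n", 100 * accuracy))
```

Keeping the test samples out of the cutoff estimation is what makes the reported accuracy an honest estimate of how the predictor would behave on other data sets such as TCGA or SRA.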
  46. select_regions() Output: Coverage matrix (data.frame), Region information (GRanges). slide adapted from Shannon Ellis @Shannon_E_Ellis
  47. Sex prediction is accurate across data sets

    Number of Regions:      20      20      20      20
    Number of Samples (N):  4,769   4,769   11,245  3,640
    Accuracy:               99.8 %  99.6 %  99.4 %  88.5 %
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  48. Sex prediction is accurate across data sets

    Number of Regions:      20      20      20      20
    Number of Samples (N):  4,769   4,769   11,245  3,640
    Accuracy:               99.8 %  99.6 %  99.4 %  88.5 %
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  49. http://www.rna-seqblog.com/ Can we use expression data to predict tissue? slide adapted from Shannon Ellis @Shannon_E_Ellis
  50. Tissue prediction is accurate across data sets

    Number of Regions:      589     589     589     589
    Number of Samples (N):  4,769   4,769   7,193   8,951
    Accuracy:               97.3 %  96.5 %  71.9 %  50.6 %
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  51. Prediction is more accurate in healthy tissue

    Number of Regions:      589     589     589     589     589
    Number of Samples (N):  4,769   4,769   613     6,579   8,951
    Accuracy:               97.3 %  96.5 %  91.0 %  70.2 %  50.6 %
    slide adapted from Shannon Ellis @Shannon_E_Ellis
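  52. Example R session with the recount package:

    > library('recount')
    > # Download the gene-level data for study ERP001942
    > download_study('ERP001942', type = 'rse-gene')
    > # Load the downloaded file; it provides the rse_gene object
    > load(file.path('ERP001942', 'rse_gene.Rdata'))
    > # Scale the counts before analysis
    > rse <- scale_counts(rse_gene)
    > # Add the predicted phenotypes to the sample information
    > rse_with_pred <- add_predictions(rse_gene)
    https://github.com/leekgroup/recount-analyses/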
  53. expression data for ~70,000 human samples

    Answer meaningful questions about human biology and expression
    samples: GTEx N=9,662, TCGA N=11,284, SRA N=49,848
    expression estimates: gene, exon, junctions, ERs
    phenotypes (per sample):
        sex  tissue
        M    Blood
        F    Heart
        F    Liver
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  54. None
  55. Can combine with genotype data to identify eQTLs. slide adapted from Kai Kammers @KaiKammers
  56. biorxiv.org/content/early/2018/01/12/247346 @JFuBiostats @biorxivpreprint

  57. expression data for ~70,000 human samples

    Answer meaningful questions about human biology and expression
    samples: GTEx N=9,662, TCGA N=11,284, SRA N=49,848
    expression estimates: gene, exon, junctions, ERs
    phenotypes (per sample):
        sex  tissue
        M    Blood
        F    Heart
        F    Liver
    slide adapted from Shannon Ellis @Shannon_E_Ellis
  58. Sex: Female, Male

    Age/Development: Fetus, Child, Adolescent, Adult
    Race/Ethnicity: Asian, Black, Hispanic, White
    Tissue Site 1: Cerebral cortex, Hippocampus, Brainstem, Cerebellum
    Tissue Site 2: Frontal lobe, Temporal lobe, Midbrain, Basal ganglia
    Tissue Site 3: Dorsolateral prefrontal cortex, Superior temporal gyrus, Substantia nigra, Caudate
    Hemisphere: Left, Right
    Brodmann Area: 1-52
    Disease Status: Disease, Neurological control
    Disease: Brain tumor, Alzheimer’s disease, Parkinson’s disease, Bipolar disorder
    Tumor Type: Glioblastoma, Astrocytoma, Oligodendroglioma, Ependymoma
    Clinical Stage 1: Grade I, Grade II, Grade III, Grade IV
    Clinical Stage 2: Primary, Secondary, Recurrent
    Viability: Postmortem, Biopsy
    Preparation: Frozen, Thawed
  59. Ashkaun Razmara, in prep. @ashkaun_razmara

  60. None
  61. None
  62. Code Example:

    research.libd.org/recount-brain/example_PMI/example_PMI.html
    research.libd.org/recount-brain/example_PMI/example_PMI.Rmd
    Replicates part of the GTEx PMI paper by Ferreira et al. doi.org/10.1038/s41467-017-02772-x
    Ashkaun Razmara, in prep. http://research.libd.org/recount-brain/ @ashkaun_razmara
  63. The recount2 team

    Hopkins: Kai Kammers, Shannon Ellis, Margaret Taub, Kasper Hansen, Jeff Leek, Ben Langmead
    OHSU: Abhinav Nellore
    LIBD: Leonardo Collado-Torres, Andrew Jaffe
    recount-brain: Ashkaun Razmara
    Funding and hosting: NIH R01 GM105705, NIH 1R21MH109956, CONACyT 351535, AWS in Education, Seven Bridges, IDIES SciServer
  64. There are many communities you can join! Ask for help / support / #rstats love ^_^
  65. #Rladies @RLadiesGlobal

  66. Check #runconf18 @rOpenSci

  67. This is where it starts for you and us: #CDSBMexico @CDSBMexico

    It’s your home now! Help us build it and maintain it! Submit your blog posts too!
  68. expression data for ~70,000 human samples

    (Multiple) Postdoc positions available to
    - develop methods to process and analyze data from recount2
    - use recount2 to address specific biological questions
    This project involves the Hansen, Leek, Langmead and Battle labs at JHU
    Contact: Kasper D. Hansen (khansen@jhsph.edu | www.hansenlab.org)
    @KasperDHansen @jtleek @BenLangmead @alexisjbattle
  69. None
  70. https://speakerdeck.com/lcolladotor/CDSBMexico