COS Talk: Enabling Reproducible Bioinformatics Research with Service and Training

Slides for talk at Center for Open Science

Stephen Turner

July 22, 2014

Transcript

  1. Enabling Reproducible Bioinformatics Research with Service and Training

    Stephen D. Turner, Ph.D.
    Bioinformatics Core Director
    @genetics_blog
    April 22, 2014
    Slides at stephenturner.us/slides
  2. 3

  3. Subdisciplines

    • Sequence alignment
    • Genome assembly
    • Metagenomics
    • Genome annotation
    • Evolutionary biology / comparative genomics
    • Analysis of gene expression
    • Analysis of gene regulation
    • Genotype-phenotype association
    • Mutation analysis
    • Structural biology
    • Biomarker identification
    • Pathway analysis / "systems biology"
    • Literature analysis / text-mining
  4. UVA Bioinformatics Core

    • Founded October 2011
    • Mission: build and maintain a centralized resource for expert bioinformatics consulting & data analysis and to help collaborators fund & publish their work
      - 1. Service
      - 2. Training
      - 3. Infrastructure building
  5. Recent work

    • Microbiome 2:22 (2014). doi: 10.1186/2049-2618-2-22
    • Rhinovirus challenge
    • Isolate/sequence DNA from nasal lavage fluid samples
    • Analyze phylogenetic content
  6. Recent work

    • Nature comm. 5:3273 (2014). doi: 10.1038/ncomms427
    • Deleted a gene in mouse.
    • Gene expression profiling reveals B-cell gene program and constrains differentiation.
  7. Recent work

    • Cell metab. 19:667 (2014). doi: 10.1016/j.cmet.2014.03.005
    • Mouse model of T2DM
    • Gene expression profiling reveals new mechanism of insulin secretion suppression.
  8. Bioinformatics Challenges

    • Data integration: how to best integrate multiple disparate data types?
      - See “data integration” talk at stephenturner.us/slides
    • New technologies: how to best support new and emerging technologies?
      - See “new technologies” talk at stephenturner.us/slides
    • Transparency & reproducibility!
    • Training
  9. Reproducibility barriers

    • Data: not all available, difficult to access.
    • Tools: inaccessible, poor version control.
    • Publication: results, data, methods separate.
    • Incentives:
      - Scarce funding
      - Reward for being “first”
      - Career incentives not obvious
    • Training: scientists aren’t taught these skills!
  10. Enabling Reproducibility

    • Version control (git/GitHub)
    • Dynamic documents
      - R, RStudio, knitr: Markdown + embedded R code » HTML/PDF report
      - IPython notebook
    • Galaxy
      - Web-based bioinformatics toolkit
      - Tracks history, versions, parameters, data
    • Wiki
      - Version-controlled place to keep code, scripts, data, and results used for client projects
    • Training
  11. Training

    • Software Carpentry (software-carpentry.org)
      - Volunteer organization to teach basic computing skills to scientists
      - Core curriculum:
        ‣ Basic programming
        ‣ Version control
        ‣ Automation
        ‣ Testing
      - Two-day bootcamps
      - Coming soon: train-the-trainer program
    • Workshops (bioconnector.github.io/workshops)
      - All course material on GitHub
      - All R-related materials compiled as RMarkdown dynamic document
      - Courses:
        ‣ Introduction to R for life scientists
        ‣ RNA-seq data analysis (coming soon)
        ‣ Data visualization with R and ggplot2 (coming soon)
        ‣ Data manipulation with data.table and dplyr (coming soon)
  12. The end? / Open questions

    1. What does reproducibility mean?
    2. How to incentivize open science + reproducibility for “traditional” scientists? How to change culture at the senior faculty level?
    3. What are the technical barriers (if any) to open science and reproducibility? How can they be solved?
    4. Training: how to make sustainable & scalable?

    twitter: @genetics_blog
    web: stephenturner.us
    core: bioinformatics.virginia.edu
    email: [email protected]
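
The dynamic-document idea from slide 10 (prose with embedded code that is re-run whenever the report is built) can be illustrated with a minimal sketch. It is written in Python rather than R, and the `{{ ... }}` marker, the `render` function, and the sample read counts are all hypothetical, chosen for illustration only; this is not knitr's syntax or API.

```python
import re

# Hypothetical inline-code marker: {{ expression }}. This is an assumption
# made for this sketch, not the syntax of knitr or the IPython notebook.
INLINE = re.compile(r"\{\{(.+?)\}\}")


def render(template: str, data: dict) -> str:
    """Replace each {{ expression }} with its value computed from `data`."""
    # Expose only a few safe helpers plus the data itself to the expressions.
    env = {"__builtins__": {}, "len": len, "sum": sum}
    env.update(data)

    def evaluate(match: re.Match) -> str:
        # eval is tolerable for a trusted toy template; never use it on
        # untrusted input.
        return str(eval(match.group(1).strip(), env))

    return INLINE.sub(evaluate, template)


# Made-up read counts, used only to show that the numbers in the text
# are computed from the data rather than typed by hand.
samples = {"reads": [120, 98, 143]}
report = render(
    "Analyzed {{ len(reads) }} samples; total reads: {{ sum(reads) }}.",
    samples,
)
print(report)
```

Because the numbers in the sentence come from the data at render time, changing `samples` and re-running regenerates a consistent report, which is the property that makes such documents useful for reproducibility.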