Seminar at the JMU Biology Department, October 11, 2013. Title: "Bioinformatics-as-a-Service: Applications, Opportunities, and Challenges with Large-Scale -Omics Data." Audience: Undergrads, faculty, grads.
consortium of biorepositories linked to electronic medical records data for conducting genomic studies. • Many phenotypes - no “ascertainment” necessary. • Multiple sites = replication, joint analysis. • One of the goals: assessment. Will this work? October 10, 2013 bioinformatics.virginia.edu
SNP (say “snip”) = Single Nucleotide Polymorphism. A common variant (mutation) in the population. Millions exist throughout the human genome.
Ancestry: PCA
Relatedness: used linear mixed effects model
$y = \mu + \sum_j \beta_j c_j + G + e$
$G \sim \mathrm{MVN}(0, \sigma^2 \Phi)$
$\hat{e}_i = y_i - (\hat{\mu} + \sum_j \hat{\beta}_j c_{ji} + \hat{G}_i)$
$\hat{e}_i = \mu + \beta g_i + e_i$
• Marker call rate • Sample call rate • Mendelian errors • Discordant calls • Minor allele frequency • Hardy-Weinberg equilibrium • …
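Several of the QC metrics listed above can be computed directly from a genotype matrix. The following is a minimal sketch, not the pipeline used in the talk. It assumes genotypes are coded as 0/1/2 copies of one allele with `None` for a missing call, and all function names are hypothetical. Mendelian errors and discordant calls are left out because they need pedigree or duplicate-sample information.

```python
from math import erfc, sqrt

def call_rate(calls):
    """Fraction of non-missing genotype calls (works per marker or per sample)."""
    return sum(g is not None for g in calls) / len(calls)

def minor_allele_frequency(marker):
    """Frequency of the less common allele among called genotypes (0/1/2 coding)."""
    called = [g for g in marker if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)

def hwe_p_value(marker):
    """P-value from a 1-df chi-square test of Hardy-Weinberg equilibrium."""
    called = [g for g in marker if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    p = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:
        return 1.0  # monomorphic marker: the test is not informative
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return erfc(sqrt(chi2 / 2))

# Toy data (illustrative only): one marker in HWE, one with only heterozygotes.
in_hwe = [0] * 25 + [1] * 50 + [2] * 25
all_het = [1] * 100
print(call_rate([0, None, 1, 2]))
print(minor_allele_frequency(in_hwe), hwe_p_value(in_hwe))
print(hwe_p_value(all_het))
```

In practice each metric is compared against a study-specific threshold. The slide gives no cutoffs, so none are assumed here.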
interactions. – B. Maher 2008 Nature – T. Manolio, F. Collins, et al. Nature 2009 – T. Manolio 2010 NEJM – Mouse, E. coli, S. cerevisiae • GxG hard to test: – GWAS: 1.25×10^11 tests – Computationally difficult – Multiple testing • Limit GxG tests to: – SNPs with large main effect – GxG with biological relevance
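The scale of the GxG testing problem can be checked with a few lines of arithmetic. This sketch assumes a panel of 500,000 SNPs (the slide does not state the panel size), which gives a pairwise test count of about 1.25×10^11. The 1,000-SNP filtered subset and the Bonferroni correction at alpha = 0.05 are also illustrative assumptions.

```python
from math import comb

def pairwise_tests(n_snps):
    """Number of distinct SNP-SNP pairs in an exhaustive GxG scan."""
    return comb(n_snps, 2)

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

n_snps = 500_000                       # assumed GWAS panel size
n_tests = pairwise_tests(n_snps)
print(f"{n_tests:.3e} pairwise tests")                     # 1.250e+11 pairwise tests
print(f"threshold: {bonferroni_threshold(n_tests):.1e}")   # threshold: 4.0e-13

# Restricting to a hypothetical 1,000 SNPs with large main effects
# shrinks the search space by more than five orders of magnitude.
filtered_tests = pairwise_tests(1_000)
print(filtered_tests)                                      # 499500
```

This is why the slide proposes limiting GxG tests to a filtered subset: both the compute cost and the multiple-testing penalty scale with the number of pairs.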
explains fat distribution better with biomarkers (vs anthro alone). • Important biomarkers varied by trait. • RF >>> linear regression. PLoS ONE Aug 2012 7(8):e43502
variant analysis in IGF1 • GWAS in MEC, pathway analysis • Other statistical analysis, pathway analysis, etc. • All of the above, and others at UHCC and USC: host genetics x microbiome U01 proposal (funded Aug 2012)
Published in 1965 • Pioneered algorithm development for – Comparison of protein sequences – Derivation of evolutionary histories from alignments “In this paper we shall describe a completed computer program for the IBM 7090, which to our knowledge is the first successful attempt at aiding the analysis of the amino acid chain structure of protein.”
regarding evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it.” M. Dayhoff, February 27, 1967
Rush”. Genome Biol 2013. • What if microscopes got 10x more powerful every year… – Could do the same experiment every few months with the same slide. – Make new discoveries! Publish interesting findings! • Not too different from genomics… – Sequence a Human Genome (HGP 2001) – Sequence 1000 human genomes (1000genomes.org) – Sequence 2000 human genomes (1000genomes.org) – Sequence Human Microbiomes (hmpdacc.org) – Sequence Earth (earthmicrobiome.org)
from 2012 ISMB at stephenturner.us/slides) • Training: how to make it scalable and sustainable? • New technologies: how best to support new and emerging technologies?
and timely bioinformatics consulting and data analysis. • Main goals: help collaborators publish and get funding. – 1. Service – 2. Training – 3. Infrastructure
Analysis • Pathway analysis • DNA Methylation • DNA Binding / ChIP-Seq • DNA Variation • Metagenomics • Grant / Manuscript support • Custom development
• Adaptation to the environment • Bundled analysis – easy answers • Collaboration • Automation vs. innovation • Downstream analysis • New tech: no pre-built pipelines • Training & Infrastructure: help collaborators help themselves!
Sciences Library – Div. Clinical Informatics • Mission: Get researchers connected to the tools and people they need. • Tools: – Galaxy server – VIVO (collaboration) – Wiki (documentation) – CDR – Awesome space
• Integrate microarray analysis with clinical data using machine learning • Made the cover of Transplantation (2012)
How does gene knockout result in leukemia? – What are the downstream molecular effects? • Gene Expression Microarray: QC, differential gene expression, pathway analysis • Results: – KO de-represses a B cell-specific gene program – Increased cell-cycle progression • Now: looking for mutations in the human gene
cancer • Bioinformatics: – How does “tumoroid” grown in 3D compare to tumor tissue? – Gene expression profiling of multiple samples comparing tumoroids to tumors, and cells isolated from tumor margins. Research Vignette #3: Deb Lannigan
interpretation of evidence for attribution of an act of bioterrorism, biocrime, hoax, or inadvertent release of a toxin or biological threat agent. Figures from Turner et al 2013 Report to DOD, “Harnessing Next-Generation Sequencing Capabilities for Microbial Forensics.”