
UNC Professional Orientation Seminar

Stephen Turner
November 15, 2013


Transcript

  1. Bioinformatics-as-a-Service: Applications, Opportunities, and Challenges with Large-Scale -Omics Data

    Stephen D. Turner, Ph.D., Bioinformatics Core Director
    bioinformatics@virginia.edu | bioinformatics.virginia.edu
    Slides available at: stephenturner.us/slides
  2. Today's Talk

    1. My Background: Genetics, Statistics, & Bioinformatics
    2. Bioinformatics: Origins & Contemporary Applications
    3. Bioinformatics Core: Staying Relevant; Research Vignettes

    November 15, 2013 bioinformatics.virginia.edu
  3. Grad School: 5 years in 5 minutes

    • Ph.D. Human Genetics, M.S. Applied Statistics
      – Research: Genetic Epidemiology
      – Working Hypothesis: common disease, common variant
    • Lipids:
      – Risk factors for CVD
        • 1 mg/dL ↑ LDL = 1% ↑ risk for CV event.
        • 1 mg/dL ↓ HDL = 6% ↑ risk for CV event.
      – Therapeutic targets
      – Easy to phenotype
      – Heritable (HDL ~70% heritable!)
      – Finding genetic factors for HDL = new biology = new treatments.
  4. eMERGE

    • The electronic Medical Records and GEnomics Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies.
    • Many phenotypes; no "ascertainment" necessary.
    • Multiple sites = replication, joint analysis.
    • One of the goals: assessment. Will this work?
  5. Genome-Wide Association Study (figure: Manolio TA. N Engl J Med 2010;363:166-176.)

    SNP (say "snip") = Single Nucleotide Polymorphism: a common variant (mutation) in the population. Millions exist throughout the human genome.
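A GWAS tests each of those SNPs one at a time. The Python below is a minimal sketch, not the pipeline used in these studies. It regresses a quantitative trait such as HDL on genotype dosage (0, 1, or 2 copies of the minor allele) by ordinary least squares, which is the usual additive model, and returns the per-allele effect and its t statistic. The function name `additive_snp_test` and the data are hypothetical. Covariates and relatedness are deliberately ignored.

```python
import math
from typing import Sequence, Tuple


def additive_snp_test(dosage: Sequence[float], trait: Sequence[float]) -> Tuple[float, float]:
    """OLS regression of a trait on minor-allele dosage; returns (beta, t statistic)."""
    n = len(dosage)
    if n != len(trait) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("monomorphic SNP: no genotype variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    beta = sxy / sxx                      # change in trait per additional minor allele
    alpha = mean_y - beta * mean_x        # intercept
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosage, trait))
    sigma2 = rss / (n - 2)                # residual variance on n - 2 degrees of freedom
    se = math.sqrt(sigma2 / sxx)          # standard error of beta
    t = beta / se if se > 0 else math.inf
    return beta, t


# Hypothetical genotypes (0/1/2) and HDL values in mg/dL.
dosage = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
hdl = [48.0, 51.0, 54.0, 55.0, 53.0, 60.0, 58.0, 50.0, 56.0, 61.0]
beta, t = additive_snp_test(dosage, hdl)
print(f"beta = {beta:.2f} mg/dL per allele, t = {t:.2f}")
```

In a real GWAS this test runs millions of times, so the per-SNP p-values are judged against a genome-wide significance threshold rather than 0.05.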
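  6. Quality Control

    • Marker call rate
    • Sample call rate
    • Mendelian errors
    • Discordant calls
    • Minor allele frequency
    • Hardy-Weinberg equilibrium
    • …

    Ancestry: PCA
    Relatedness: linear mixed-effects model, fitted in two steps:

    $$y_i = \mu + \sum_j \beta_j c_{ji} + G_i + e_i, \qquad G \sim \mathrm{MVN}(0, \sigma_G^2 \Phi)$$
    $$\hat{e}_i = y_i - \Big(\hat{\mu} + \sum_j \hat{\beta}_j c_{ji} + \hat{G}_i\Big)$$
    $$\hat{e}_i = \mu + \beta g_i + e_i$$

    Here y_i is the trait for individual i, c_{ji} are covariates, G_i is a polygenic random effect with relationship matrix Φ, and g_i is the genotype being tested.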
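Three of the per-marker QC items listed above can be computed directly from genotype calls: call rate, minor allele frequency, and Hardy-Weinberg equilibrium. The following is a minimal Python sketch under stated assumptions:

- Genotypes are coded as allele counts, with `None` for a failed call.
- HWE is assessed with a 1-df Pearson chi-square test. Exact tests are often preferred for rare variants.
- The function name `marker_qc` and the example data are hypothetical.
- Filtering thresholds are study-specific and are not shown.

```python
import math
from typing import Dict, Optional, Sequence


def marker_qc(genotypes: Sequence[Optional[int]]) -> Dict[str, float]:
    """Per-marker QC summaries.

    genotypes: one entry per sample, the count (0, 1, 2) of one allele,
    or None when the genotype call failed.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no successful genotype calls")
    n = len(called)
    call_rate = n / len(genotypes)
    observed = [called.count(0), called.count(1), called.count(2)]
    q = (observed[1] + 2 * observed[2]) / (2 * n)   # frequency of the counted allele
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]  # Hardy-Weinberg proportions
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2))          # chi-square survival function, 1 df
    return {"call_rate": call_rate, "maf": min(p, q), "hwe_chi2": chi2, "hwe_p": hwe_p}


# Hypothetical marker: 50 / 40 / 10 samples with 0 / 1 / 2 copies, plus 2 failed calls.
genotypes = [0] * 50 + [1] * 40 + [2] * 10 + [None] * 2
qc = marker_qc(genotypes)
print(qc)
```

This example marker has a MAF of 0.3 and genotype counts close to HWE expectations (49 / 42 / 9), so it would pass an HWE filter.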
  7. GWAS: HDL-Cholesterol

    [Figure: HDL metabolism from peripheral-cell lipid source to hepatobiliary elimination, with genes ABCA1 (FC, CE), LCAT, CETP (TG, CE), LIPC (TG→FFA), LIPG (PL→FFA), and LPL (TG→FFA)]
  8. [Image-only slide]
  9. Grammatical Evolution of Neural Networks for Epistasis Detection

    • Turner SD, Ritchie MD, Bush WS. Conquering the Needle-in-a-Haystack: How Correlated Input Variables Beneficially Alter the Fitness Landscape for Neural Networks. Lec Notes Comp Sci. 5483:80-91 (2009).
    • Turner SD, Dudek SM, Ritchie MD. Grammatical Evolution of Neural Networks for Discovering Epistasis among Quantitative Trait Loci. Lec Notes Comp Sci. 6023:86-97 (2010).
    • Holzinger ER, Buchanan C, Turner SD, Dudek SM, Torstenson ES, Ritchie MD. Initialization Parameter Sweep in ATHENA: Optimizing Neural Networks for Detecting Gene-Gene Interactions in the Presence of Small Main Effects. Genetic and Evolutionary Computation Conference (GECCO) 2010: 203-210. ACM Press (2010).
    • Turner SD, Dudek SM, Ritchie MD. Incorporating Domain Knowledge into Evolutionary Computing for Discovering Gene-Gene Interaction. 11th Int'l Conference on Parallel Problem Solving From Nature (PPSN). Lec Notes Comp Sci. 6238(I):394-403 (2010).
    • Turner SD, Dudek SM, Ritchie MD. ATHENA: A Knowledge-Based Hybrid Backpropagation-Grammatical Evolution Neural Network Algorithm for Discovering Epistasis among Quantitative Trait Loci. BioData Mining. 3:5 (2010).

    [Figure: neural network diagram with inputs x, summation nodes Σ, and output y]
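Epistasis is a "needle in a haystack" because an interaction can produce a disease signal even when neither SNP shows a main effect on its own. The Python below illustrates that problem; it is not the ATHENA method. It uses a hypothetical, XOR-like two-locus penetrance table and assumes both SNPs have minor allele frequency 0.5 and are in Hardy-Weinberg equilibrium. It then averages over the other locus to get each SNP's marginal penetrance.

```python
from typing import List, Sequence, Tuple

# Hypothetical two-locus penetrance table: P(disease | genotype at SNP A, genotype at SNP B).
# Rows index SNP A genotype (0, 1, 2 minor alleles); columns index SNP B. Illustrative only.
PENETRANCE = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]


def hwe_genotype_freqs(maf: float) -> List[float]:
    """Genotype frequencies for 0, 1, 2 minor alleles under Hardy-Weinberg equilibrium."""
    p, q = 1 - maf, maf
    return [p * p, 2 * p * q, q * q]


def marginal_penetrance(table: Sequence[Sequence[float]],
                        freqs_a: Sequence[float],
                        freqs_b: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Average the two-locus table over the other locus to get single-SNP penetrances."""
    by_a = [sum(table[i][j] * freqs_b[j] for j in range(3)) for i in range(3)]
    by_b = [sum(table[i][j] * freqs_a[i] for i in range(3)) for j in range(3)]
    return by_a, by_b


freqs = hwe_genotype_freqs(0.5)
by_a, by_b = marginal_penetrance(PENETRANCE, freqs, freqs)
print("marginal penetrance by SNP A genotype:", by_a)
print("marginal penetrance by SNP B genotype:", by_b)
```

Every marginal penetrance comes out equal. A one-SNP-at-a-time scan like the GWAS above would therefore see nothing, even though disease risk depends entirely on the pair of genotypes. Methods such as neural networks are aimed at exactly this kind of signal.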
  10. Postdoc: Obesity Epidemiology

    [Figure: maps of obesity prevalence for 1990, 1999, and 2009; legend: No Data, <10%, 10%–14%, 15%–19%, 20%–24%, 25%–29%, ≥30%]
  11. Obesity

    [Figure: the same obesity prevalence maps for 1990, 1999, and 2009]
    • Central obesity
    • Liver fat
  12. Obesity: central obesity and liver fat, measured by DXA ($$) and MRI ($$$)

  13. Obesity: central obesity and liver fat

    • DXA ($$)
    • MRI ($$$)
    • Biomarkers ($): BMI, WHR, lipidomics, adipokines, cytokines
  14. Study Design

    • MEC (Multiethnic Cohort): >215,000 adults
    • 5 ethnic groups: AA (African American), JA (Japanese American), H (Hispanic), NHW (non-Hispanic white), NH (Native Hawaiian)
    • 30 JA and 30 NHW postmenopausal women
    • Anthropometric data + 60 biomarkers for adipocytokines, inflammation, insulin resistance & lipid profile
    • Random Forest to predict:
      – Total body fat (DXA)
      – Trunk:periphery fat ratio (DXA)
      – Hepatic adiposity (MRI)
  15. Random Forest: Results

    • Automatic variable selection using RF: the model explains fat distribution better with biomarkers than with anthropometrics alone.
    • Important biomarkers varied by trait.
    • RF substantially outperformed linear regression.

    PLoS ONE Aug 2012 7(8):e43502
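Random forest "automatic variable selection" usually means ranking predictors by their importance scores. This is a sketch, not the published analysis. It uses scikit-learn's `RandomForestRegressor` on simulated data. There are 60 subjects, mirroring 30 + 30 women, and 10 hypothetical biomarkers. Only biomarker 0 (nonlinearly) and biomarker 1 affect the outcome, and the noise level and forest size are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_subjects, n_biomarkers = 60, 10
X = rng.normal(size=(n_subjects, n_biomarkers))   # simulated biomarker panel
# Simulated outcome (e.g. a fat measure): quadratic in biomarker 0, linear in biomarker 1.
y = 2.0 * X[:, 0] ** 2 + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_subjects)

forest = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, y)

# Impurity-based importances; larger means the feature drove more of the tree splits.
ranking = np.argsort(forest.feature_importances_)[::-1]
print("out-of-bag R^2:", round(forest.oob_score_, 2))
print("top biomarkers by importance:", ranking[:3])
```

The quadratic term is one reason a forest can beat linear regression. A straight-line fit on biomarker 0 would capture almost none of its effect, while the trees split on it freely.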
  16. Other UHCC projects

    • Obesity biomarkers / RF
    • Rare variant analysis in IGF1
    • GWAS in MEC, pathway analysis
    • Other statistical analysis, pathway analysis, etc.
    • All of the above, and others at UHCC and USC: host genetics × microbiome U01 proposal (funded Aug 2012)
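Rare variants are usually too rare to test one at a time, so they are often collapsed into a single per-subject carrier indicator and tested together. The Python below is a minimal sketch of such a CAST-style burden test. The slide does not say which method was used for IGF1. The 1% MAF threshold, the function names, and the case/control counts are all hypothetical. The two-sided Fisher exact test is written out with `math.comb` to keep it dependency-free.

```python
from math import comb
from typing import List, Sequence


def carrier_status(genotypes: Sequence[Sequence[int]], mafs: Sequence[float],
                   threshold: float = 0.01) -> List[int]:
    """1 if a subject carries any minor allele at a variant with MAF below threshold, else 0."""
    rare = [j for j, f in enumerate(mafs) if f < threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Collapsing: the common variant (MAF 0.3) is ignored; only the two rare ones count.
print(carrier_status([[0, 1, 0], [0, 0, 0], [2, 0, 0]], [0.3, 0.005, 0.002]))

# Hypothetical counts: 6 of 20 cases vs 1 of 20 controls carry a rare variant.
p_burden = fisher_exact_two_sided(6, 14, 1, 19)
print(f"burden test p = {p_burden:.3f}")
```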
  17. Meanwhile…

    • Democratization of next-gen sequencing
    • UHCC role: Bioinformatics-as-a-service
    • Interest in consulting / contract research

    What does bioinformatics mean in 2013?
  18. Bioinformatics Origins

    • Rooted in sequence analysis
    • Driven by the need to:
      – Collect
      – Annotate
      – Analyze
  19. Margaret Dayhoff 1925-1983

    • Collected all known protein sequences
    • Published in 1965 (Atlas of Protein Sequence and Structure)
    • Pioneered algorithm development for:
      – Comparison of protein sequences
      – Derivation of evolutionary histories from alignments

    "In this paper we shall describe a completed computer program for the IBM 7090, which to our knowledge is the first successful attempt at aiding the analysis of the amino acid chain structure of protein."
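Comparing two sequences means finding their best alignment. The Python below sketches that idea with the Needleman-Wunsch global alignment algorithm (1970). This is a standard dynamic-programming method swapped in for illustration, not Dayhoff's own program. It uses simple match/mismatch/gap scores, which are hypothetical defaults. Real protein comparisons would use a substitution matrix, such as Dayhoff's PAM matrices, instead.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch: optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score aligning the prefixes a[:i] and b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                       # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    return score[-1][-1]


print(global_alignment_score("GCATGCG", "GATTACA", match=1, mismatch=-1, gap=-1))
print(global_alignment_score("HEAGAWGHEE", "HEAGAWGHEE"))
```

A traceback through the same table recovers the alignment itself. Only the score is returned here to keep the sketch short.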
  20. IBM 7090

  21. Margaret Dayhoff 1925-1983

    "There is a tremendous amount of information regarding evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it." (M. Dayhoff, February 27, 1967)
  22. What is bioinformatics? Modified from @drewconway

  23. What is bioinformatics? [Figure: timeline, 1960 to 2010]
  24. What is bioinformatics? [Figure: timeline, 1960 to 2010]
  25. [Image-only slide]
  26. Between April and October 2012, the cost of a human genome rose by $717 (+12%).

    Source: genome.gov/sequencingcosts
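An absolute change and a percent change together pin down the starting value: a $717 rise that is 12% of the original implies a baseline of roughly 717 / 0.12 ≈ $5,975. The Python below does that arithmetic. The result is only approximate, because "12%" is rounded on the slide. The genome.gov figures themselves are not checked here, and the function name is hypothetical.

```python
def implied_baseline(abs_change: float, fractional_change: float) -> float:
    """Starting value implied by an absolute change and a fractional change (0.12 = 12%)."""
    if fractional_change == 0:
        raise ValueError("fractional change must be nonzero")
    return abs_change / fractional_change


before = implied_baseline(717, 0.12)
after = before + 717
print(f"approx. before: ${before:,.0f}, after: ${after:,.0f}")
```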
  27. After the Gold Rush…

    • Hall, N. "After the Gold Rush". Genome Biol 2013.
    • What if microscopes got 10x more powerful every year…
      – Could do the same experiment every few months with the same slide.
      – Make new discoveries! Publish interesting findings!
    • Not too different from genomics…
      – Sequence a Human Genome (HGP 2001)
      – Sequence 1000 human genomes (1000genomes.org)
      – Sequence 2000 human genomes (1000genomes.org)
      – Sequence Human Microbiomes (hmpdacc.org)
      – Sequence Earth (earthmicrobiome.org)
  28. After the Gold Rush…

    • What's possible next year will be the same as what's possible now.
    • Fresh ideas needed!
    • Stability will be good for us in the end.
  29. Challenges in Bioinformatics

    • Data integration (see the data integration talk from ISMB 2012 at stephenturner.us/slides)
    • Training: how to make it scalable and sustainable?
    • New technologies: how to best support new and emerging technologies?
  30. UVA Bioinformatics Core

    • A centralized resource for providing expert and timely bioinformatics consulting and data analysis.
    • Main goals: help collaborators publish and get funding.
      1. Service
      2. Training
      3. Infrastructure
  31. [Figure: workflow from sample prep, sequencing, and raw data to differential expression, gene identification, novel genes, discoveries, …etc.]

    This is the "stuff" we do in the bioinformatics core! Find out what this "stuff" is at bioinformatics.virginia.edu
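Differential expression asks, gene by gene, whether expression differs between two groups. The Python below is a minimal sketch assuming values are already normalized and on a log2 scale. Under that assumption, the log2 fold change is a difference of group means. Significance here is a plain Welch t statistic. Production analyses typically use moderated statistics (limma) or count-based models (e.g. DESeq2, edgeR) instead. Gene names and values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Tuple


def welch_t(x: List[float], y: List[float]) -> float:
    """Welch's t statistic for two groups with possibly unequal variances."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def differential_expression(
    expr: Dict[str, Tuple[List[float], List[float]]]
) -> Dict[str, Tuple[float, float]]:
    """For each gene, (log2 fold change, Welch t) of group B versus group A.

    Assumes log2-scale input, so the fold change is mean(B) - mean(A).
    """
    results = {}
    for gene, (group_a, group_b) in expr.items():
        log2_fc = mean(group_b) - mean(group_a)
        results[gene] = (log2_fc, welch_t(group_b, group_a))
    return results


# Hypothetical log2 expression: (control replicates, treated replicates).
expr = {
    "GENE1": ([5.0, 5.2, 4.9], [7.1, 7.0, 7.3]),
    "GENE2": ([8.0, 8.1, 7.9], [8.0, 7.9, 8.2]),
}
results = differential_expression(expr)
for gene, (fc, t) in results.items():
    print(f"{gene}: log2FC = {fc:+.2f}, t = {t:.1f}")
```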
  32. bioinformatics.virginia.edu/services

    • Gene expression: Microarray analysis
    • Gene expression: RNA-seq analysis
    • Pathway analysis
    • DNA methylation
    • DNA binding / ChIP-seq
    • DNA variation
    • Metagenomics
    • Grant / manuscript support
    • Custom development
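One common form of pathway analysis is an over-representation test. It asks whether a gene list, such as the differentially expressed genes, contains more members of a pathway than chance would give. Chance is modeled with the hypergeometric distribution, which is equivalent to a one-sided Fisher exact test. The slide does not say which pathway method the core uses. The Python below is a minimal sketch with hypothetical gene counts.

```python
from math import comb


def overrepresentation_p(k: int, n_selected: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes total, K in the pathway, n_selected drawn).

    k: pathway genes found in the selected list.
    """
    upper = min(n_selected, K)
    hits = sum(comb(K, x) * comb(N - K, n_selected - x) for x in range(k, upper + 1))
    return hits / comb(N, n_selected)   # exact integer arithmetic, one final division


# Hypothetical: 20,000 genes, 100 in the pathway, 500 DE genes, 10 of them in the pathway.
# About 2.5 overlapping genes would be expected by chance.
p_pathway = overrepresentation_p(10, 500, 100, 20000)
print(f"over-representation p = {p_pathway:.2e}")
```

When many pathways are tested, these p-values need multiple-testing correction, just as per-gene results do.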
  33. Research Vignettes

  34. Research Vignette #1: Valeria Mas

    • Kidney transplant health (GFR)
    • Integrate microarray analysis with clinical data using machine learning
    • Made the cover of Transplantation (2012)
  35. Research Vignette #2: Gomez/Belyea

    • Mouse model of leukemia
  36. Research Vignette #2: Gomez/Belyea

    • Mouse model of leukemia
      – How does gene knockout result in leukemia?
      – What are the downstream molecular effects?
    • Gene expression microarray: QC, differential gene expression, pathway analysis
    • Results:
      – KO de-represses a B cell-specific gene program
      – Increased cell-cycle progression
    • Now: looking for mutations in the human gene
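A microarray experiment tests thousands of genes at once, so differential-expression p-values are normally corrected for multiple testing before a gene program is called. The Python below sketches one standard choice, the Benjamini-Hochberg false discovery rate adjustment. The slide does not state which correction was used in this project, and the p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from the largest p-value down
        i = order[rank - 1]
        # p * m / rank, kept monotone so a smaller p never gets a larger adjusted value
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(q_values)
```

Genes whose adjusted value falls below the chosen FDR (often 0.05 or 0.1) are then reported as differentially expressed.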
  37. Research Vignette #3: Deb Lannigan

    • Personalizing breast cancer chemotherapy
    • Current state of the art: test survival in 2D culture

    [Figure: excise cancer cells; survival assays in 2D culture]
  38. Research Vignette #3: Deb Lannigan

    • Goal: Develop a 3D culture system to mimic in situ cancer
    • Bioinformatics:
      – How does a "tumoroid" grown in 3D compare to tumor tissue?
      – Gene expression profiling of multiple samples comparing tumoroids to tumors, and cells isolated from tumor margins.
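A simple first answer to "how does a tumoroid compare to the tumor?" is the correlation of their expression profiles across genes. Spearman's rank correlation is robust to scale differences between platforms or batches. The Python below is a minimal sketch with hypothetical expression values, not necessarily what this study did. It computes Spearman's correlation as the Pearson correlation of average-tied ranks.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = avg
        start = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical log2 expression of five genes in a tumor and its matched tumoroid.
tumor = [2.0, 5.5, 3.1, 8.0, 6.2]
tumoroid = [2.4, 5.0, 3.0, 7.5, 6.9]
rho = spearman(tumor, tumoroid)
print(f"Spearman rho = {rho:.2f}")
```

With thousands of genes and several samples, the same function can fill a sample-by-sample correlation matrix. That matrix shows whether tumoroids cluster with their tumors or with the margin cells.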
  39. Research Vignette #4: U.S. Government

    • Microbial forensics: analysis and interpretation of evidence for attribution of an act of bioterrorism, biocrime, hoax, or inadvertent release of a toxin or biological threat agent.

    Figures from Turner et al. 2013, Report to Army Research Office, "Harnessing Next-Generation Sequencing Capabilities for Microbial Forensics."
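Attribution from sequence data often starts by asking which reference genome a sample most resembles. One alignment-free way to ask that is the Jaccard similarity of k-mer sets. Tools such as Mash estimate this with MinHash sketches. The Python below is a minimal exact version, not the method of the cited report. The sequences and k = 4 are hypothetical, and real tools would also merge each k-mer with its reverse complement.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of a DNA sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample read and two candidate reference fragments.
sample = "ATGCGTACGTTAGG"
reference_a = "ATGCGTACGTTAGC"
reference_b = "TTTTGGGGCCCCAAAA"
k = 4
j_a = jaccard(kmers(sample, k), kmers(reference_a, k))
j_b = jaccard(kmers(sample, k), kmers(reference_b, k))
print(f"similarity to A: {j_a:.2f}, to B: {j_b:.2f}")
```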
  40. Other Current Projects

    • Metagenomics & microbial forensics
    • Microarray analysis
    • RNA-seq
    • MeDIP-seq
    • ChIP-seq
    • GWAS
    • Predictive analysis & machine learning for biomarker discovery
    • Acquisition and analysis of public data (GEO, SRA, dbGaP, etc.)
    • Grant preparation
    • Literature & database searching for gene expression signatures
    • Pathway analysis: gettinggeneticsdone.blogspot.com/2012/03/pathway-analysis-for-high-throughput.html
    • Gene ID conversion: gettinggeneticsdone.blogspot.com/2012/03/video-tip-convert-gene-ids-with-biomart.html
    • Array annotation: gettinggeneticsdone.blogspot.com/2012/01/annotating-limma-results-with-gene.html
  41. Thank you

    Web: bioinformatics.virginia.edu
    E-mail: bioinformatics@virginia.edu
    Blog: www.GettingGeneticsDone.com
    Twitter: @genetics_blog
    Slides available at: stephenturner.us/slides