CoDAWork 2017 Talk

These are the slides from my talk at CoDAWork 2017

Justin Silverman

June 09, 2017

Transcript

  1. TIME-SERIES MODELS FOR MICROBIOME DATA

    Justin D. Silverman, Medical Scientist Training Program, Computational Biology and Bioinformatics, Duke University
  2. SAMPLE COLLECTION AND PROCESSING: TECHNICAL NOISE CAN ARISE FROM VARIOUS SOURCES

    Adapted from Hamady et al., Nature Methods, 2008.
    Sample Collection and Storage → DNA Extraction → PCR Amplification → Sequencing → Assign Sequences to Samples → Denoise Reads or Cluster into OTUs → Make Count Table
    Example count table (reads per species per sample):
    Sample     Species 1  Species 2  Species 3  Species 4  Species 5  Species 6  Species 7  Species 8  Species 9  Species 10
    Sample 1   23         53         2          44         10         88         94         66         73         67
    Sample 2   69         64         70         47         8          97         47         6          64         19
    Sample 3   33         100        68         78         59         87         71         31         67         24
    Sample 4   5          63         57         27         86         81         83         92         46         62
    Sample 5   76         80         46         70         92         92         6          46         37         68
    Sample 6   58         7          37         45         25         62         78         44         89         30
    Sample 7   10         87         32         80         9          91         59         90         67         77
    Sample 8   21         89         73         39         44         80         97         83         80         4
    Sample 9   85         77         82         72         15         19         44         4          83         76
    Sample 10  67         87         68         58         73         29         87         4          48         79
    Sample 11  90         5          28         49         39         20         78         92         12         23
    Sample 12  98         93         55         12         54         75         27         95         83         98
    Sample 13  31         97         52         9          93         84         45         97         81         27
    Sample 14  12         77         22         17         71         12         56         86         18         0
    Sample 15  40         30         71         71         54         13         77         96         75         11
  3. BUILDING A FRAMEWORK: MULTINOMIAL-LOGISTIC NORMAL (OR NORMAL ON THE SIMPLEX)

    $Y \sim \text{Multinomial}(\pi), \quad \pi \sim \text{Logistic Normal}(\rho, \Xi)$
    $Y \sim \text{Multinomial}(\pi), \quad \pi = \text{ILR}^{-1}(\eta), \quad \eta \sim \text{Multivariate Normal}(\mu, \Sigma)$
  4. BUILDING A FRAMEWORK: MODELING TIME-EVOLUTION

    $Y_t \sim \text{Multinomial}(\pi_t)$
    $\pi_t = \text{ILR}^{-1}(\eta_t)$
    $\eta_t = F_t' \theta_t + \nu_t, \quad \nu_t \sim N(0, V_t)$
    $\theta_t = G_t \theta_{t-1} + \omega_t, \quad \omega_t \sim N(0, W_t)$
    [Figure: graphical model. θ_0 → θ_1 → θ_2 → … → θ_T: true state with biological noise (W_1, …, W_T). θ_t → η_t: addition of technical noise (V_1, …, V_T). η_t → Y_t via ILR: observed counts.]
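  5. SIMULATED DATA AND RESULTS: DYNAMIC REGRESSION MODEL AND IMPUTATION OF ZEROES

    $Y_t \sim \text{Multinomial}(\pi_t)$
    $\pi_t = \text{ILR}^{-1}(\eta_t)$
    $\eta_t = \mu_t + \beta_1 x_t + \beta_2 x_{t-1} + \nu_t, \quad \nu_t \sim N(0, V)$
    $\mu_t = \mu_{t-1} + \alpha_{t-1}$
    $\alpha_t = \alpha_{t-1} + \omega_t, \quad \omega_t \sim N(0, W)$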
  6. IMPROVING MODEL INFERENCE: OVERVIEW

    4 Time Series of 30 Taxa, Each with 144 Samples (As Measured by Effective Sample Size of Sampler)
    ▸ Metropolis within Gibbs → Hamiltonian MCMC (Weeks → Days)
    ▸ Marginalize State Space using Kalman Filter and Smoother (Days → ½ Day)
    ▸ Model Assumptions and Simplifications (½ Day → 1 Hour)
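  7. EXPERIMENTAL DESIGN AND MODELING: COMBINED LONGITUDINAL AND CROSS-SECTIONAL MODEL

    Design (4x): 28 daily samples, 120 hourly samples, an attempted perturbation, and 20 replicate samples.
    Longitudinal model:
    $Y_{tj} \sim \text{Multinomial}(\pi_{tj})$
    $\pi_{tj} = \text{ILR}^{-1}(\eta_{tj})$
    $\eta_{tj} = \mu_{tj} + \nu_{tj}, \quad \nu_{tj} \sim N(0, V)$
    $\mu_{tj} = \mu_{t-1,j} + \omega_{tj}, \quad \omega_{tj} \sim N(0, W)$
    Cross-sectional model:
    $Y_{ij} \sim \text{Multinomial}(\pi_{ij})$
    $\pi_{ij} = \text{ILR}^{-1}(\eta_{ij})$
    $\eta_{ij} = \mu_j + \nu_{ij}, \quad \nu_{ij} \sim \ldots$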
  8. REAL DATA AND RESULTS: DOMINATED BY TECHNICAL NOISE

    Biological noise to technical noise ratio: percent of noise (excluding counting noise) attributable to biology ≈ 23%
    [Figure: prior vs. posterior density of the Tr(W)/Tr(V) ratio (signal to noise); x-axis from 0.1 to 10.0.]
  9. REAL DATA AND RESULTS: RIKENELLACEAE RATIO CHANGES UPON STARVATION

    [Figure: balance value of Rikenellaceae / remaining taxa from Nov 30 to Dec 21 for vessels 1 to 4; posterior mean and 95% credible interval.]
    VESSELS 1 AND 2 WERE ACCIDENTALLY STARVED FOR ≈2 DAYS
  10. SUMMARY

    ▸ Microbiome Data is Count-Compositional
    ▸ Multinomial-Logistic Normal Dynamic Linear Models are a Powerful and General Tool for Count-Compositional Data
    ▸ Understanding Noise in Microbiome Data is Essential
  11. ACKNOWLEDGEMENTS

    Duke University: Lawrence David, Sayan Mukherjee, Rachael Bloom, Heather Durand, Firas Midani, Aspen Reese, Max Villa
    Universitat de Girona: Juan José Egozcue, Vera Pawlowsky-Glahn
    UNC Chapel Hill: Rachel Silverman
    Funding: Duke Collaborative Quantitative Approaches to Problems in the Basic and Clinical Sciences; Duke MSTP; NIH T32
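The "Make Count Table" step on slide 2 (Sample Collection and Processing) reduces per-read assignments to a samples × taxa matrix like the example table. Below is a minimal Python sketch. It assumes each read has already been assigned to a sample and to a taxon. The `reads` list and the `make_count_table` helper are hypothetical illustrations, not part of any real pipeline.

```python
from collections import Counter

def make_count_table(assignments):
    """Tally (sample, taxon) read assignments into a samples x taxa count matrix.

    Hypothetical helper: `assignments` is an iterable of (sample_id, taxon_id)
    pairs, one per read, produced by earlier demultiplexing and denoising steps.
    """
    assignments = list(assignments)
    counts = Counter(assignments)
    samples = sorted({s for s, _ in assignments})
    taxa = sorted({t for _, t in assignments})
    # A (sample, taxon) pair that never occurred gets a count of zero.
    table = [[counts[(s, t)] for t in taxa] for s in samples]
    return samples, taxa, table


# Toy input: five reads spread over two samples and three taxa.
reads = [
    ("Sample 1", "Species 1"),
    ("Sample 1", "Species 2"),
    ("Sample 1", "Species 1"),
    ("Sample 2", "Species 3"),
    ("Sample 2", "Species 1"),
]
samples, taxa, table = make_count_table(reads)
for sample, row in zip(samples, table):
    print(sample, row)
# Sample 1 [2, 1, 0]
# Sample 2 [1, 0, 1]
```

Row totals here reflect how many reads each sample happened to receive, not absolute abundance. That is why such a table carries only relative (compositional) information. Note that `sorted` orders labels lexicographically, so "Species 10" would sort before "Species 2".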
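The second formulation on slide 3 (Multinomial-Logistic Normal) can be simulated directly. This is a minimal numpy sketch, and it rests on several assumptions. It uses a Helmert-style ILR basis, because the slides do not say which basis is used. It also picks an illustrative total count N, which the slide's Multinomial(π) leaves implicit. The names `ilr_basis`, `ilr` and `ilr_inv` are chosen here for illustration.

```python
import numpy as np

def ilr_basis(D):
    """Return a D x (D-1) orthonormal basis whose columns each sum to zero.

    Assumption: a Helmert-style basis. Any orthonormal basis of the clr-plane
    gives a valid ILR, but the coordinates depend on this choice.
    """
    B = np.zeros((D, D - 1))
    for i in range(1, D):
        B[:i, i - 1] = 1.0 / i
        B[i, i - 1] = -1.0
        B[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return B

def ilr(x, B):
    """ILR coordinates of strictly positive composition(s) x (overall scale does not matter)."""
    return np.log(x) @ B

def ilr_inv(eta, B):
    """Map ILR coordinates back to the simplex: exponentiate, then close to sum 1."""
    z = np.exp(eta @ B.T)
    return z / z.sum(axis=-1, keepdims=True)

# Generative model: eta ~ MVN(mu, Sigma), pi = ILR^{-1}(eta), Y ~ Multinomial(N, pi).
rng = np.random.default_rng(0)
D, N = 5, 1000                  # assumption: illustrative number of taxa and sequencing depth
B = ilr_basis(D)
mu = np.zeros(D - 1)
Sigma = 0.5 * np.eye(D - 1)
eta = rng.multivariate_normal(mu, Sigma)
pi = ilr_inv(eta, B)
Y = rng.multinomial(N, pi)
print(pi.round(3), Y)
```

Because the basis columns sum to zero, `ilr` ignores the overall scale of its input. For this reason, raw proportions and their rescaled versions map to the same coordinates.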
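The time-evolution model on slide 4 separates three layers: biological noise in the state θ_t, technical noise added to η_t, and counting noise from the multinomial. A forward simulation makes this layering concrete. The sketch below rests on these assumptions:

- F, G, V and W are time-invariant, whereas the slide allows F_t, G_t, V_t and W_t to vary.
- The ILR basis is Helmert-style.
- The depth N is fixed.

The name `simulate_mln_dlm` is hypothetical.

```python
import numpy as np

def helmert_basis(D):
    """Orthonormal D x (D-1) ILR basis (assumption: Helmert-style)."""
    B = np.zeros((D, D - 1))
    for i in range(1, D):
        B[:i, i - 1] = 1.0 / i
        B[i, i - 1] = -1.0
        B[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return B

def simulate_mln_dlm(T, F, G, V, W, theta0, N, rng):
    """Simulate the slide-4 model, assuming time-invariant F, G, V, W and fixed depth N.

    theta_t = G theta_{t-1} + omega_t,  omega_t ~ N(0, W)   (biological noise)
    eta_t   = F' theta_t + nu_t,        nu_t ~ N(0, V)      (technical noise)
    Y_t     ~ Multinomial(N, ILR^{-1}(eta_t))                (counting noise)
    """
    p, q = F.shape                      # state dimension, ILR dimension (D - 1)
    B = helmert_basis(q + 1)
    theta = theta0
    thetas, etas, Ys = [], [], []
    for _ in range(T):
        theta = G @ theta + rng.multivariate_normal(np.zeros(p), W)
        eta = F.T @ theta + rng.multivariate_normal(np.zeros(q), V)
        z = np.exp(B @ eta)             # inverse ILR: back to the simplex
        pi = z / z.sum()
        Ys.append(rng.multinomial(N, pi))
        thetas.append(theta)
        etas.append(eta)
    return np.array(thetas), np.array(etas), np.array(Ys)

# Random-walk example: F = G = I with D = 4 taxa, so q = 3 ILR coordinates.
rng = np.random.default_rng(42)
q = 3
thetas, etas, Y = simulate_mln_dlm(
    T=50, F=np.eye(q), G=np.eye(q), V=0.2 * np.eye(q), W=0.05 * np.eye(q),
    theta0=np.zeros(q), N=2000, rng=rng,
)
print(Y[:3])
```

Choosing F and G differently yields other standard dynamic linear models (trends, seasonality, regressions) without changing the multinomial and ILR layers.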
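Slide 5 (Dynamic Regression Model and Imputation of Zeroes) pairs a local linear trend with a current and a lagged covariate. A zero count makes any log-ratio of the raw counts undefined. The multinomial likelihood evaluated at the latent π_t, however, stays finite. One reading of "imputation of zeroes" is therefore that η_t is inferred in a fitted model rather than computed from the counts.

The sketch below only simulates the model and checks the likelihood; it does not fit it. It rests on these assumptions:

- The ILR basis is Helmert-style.
- x_{-1} = 0 at the first step.
- The covariate series is made up.
- The coefficients β_1 and β_2 are fixed vectors.

All function names are hypothetical.

```python
import math
import numpy as np

def helmert_basis(D):
    """Orthonormal D x (D-1) ILR basis (assumption: Helmert-style)."""
    B = np.zeros((D, D - 1))
    for i in range(1, D):
        B[:i, i - 1] = 1.0 / i
        B[i, i - 1] = -1.0
        B[:, i - 1] *= np.sqrt(i / (i + 1.0))
    return B

def multinomial_logpmf(y, pi):
    """log Multinomial(y | sum(y), pi); finite even when some y_k == 0 (pi > 0)."""
    n = int(y.sum())
    return (math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in y)
            + float(np.sum(y * np.log(pi))))

def simulate_dynamic_regression(T, x, beta1, beta2, V, W, mu0, alpha0, rng):
    """eta_t = mu_t + beta1*x_t + beta2*x_{t-1} + nu_t with a local linear trend.

    Assumption: x_{-1} = 0, so the lagged term vanishes at the first step.
    """
    q = mu0.shape[0]
    mu, alpha = mu0.copy(), alpha0.copy()
    etas = np.empty((T, q))
    for t in range(T):
        mu = mu + alpha                                          # mu_t = mu_{t-1} + alpha_{t-1}
        alpha = alpha + rng.multivariate_normal(np.zeros(q), W)  # alpha_t = alpha_{t-1} + omega_t
        x_lag = x[t - 1] if t > 0 else 0.0
        nu = rng.multivariate_normal(np.zeros(q), V)
        etas[t] = mu + beta1 * x[t] + beta2 * x_lag + nu
    return etas

rng = np.random.default_rng(7)
T, D = 40, 6
q = D - 1
x = rng.normal(size=T)                                 # hypothetical covariate series
etas = simulate_dynamic_regression(
    T, x, beta1=0.8 * np.ones(q), beta2=-0.4 * np.ones(q),
    V=0.3 * np.eye(q), W=0.01 * np.eye(q),
    mu0=np.linspace(-2, 2, q), alpha0=np.zeros(q), rng=rng)
B = helmert_basis(D)
pis = np.exp(etas @ B.T)
pis /= pis.sum(axis=1, keepdims=True)
Y = np.array([rng.multinomial(30, p) for p in pis])    # shallow depth makes zeros likely

with np.errstate(divide="ignore", invalid="ignore"):
    naive = np.log(Y) @ B                              # non-finite in any row containing a zero
loglik = [multinomial_logpmf(y, p) for y, p in zip(Y, pis)]
print("zero counts:", int((Y == 0).sum()))
print("naive transform finite everywhere:", bool(np.isfinite(naive).all()))
print("model log-likelihood finite everywhere:", bool(np.isfinite(loglik).all()))
```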
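Slide 6 (Improving Model Inference) credits marginalizing the state space with a Kalman filter and smoother for a large speed-up. The general idea is that, in a Gaussian state-space model, the states can be integrated out analytically. A sampler then only needs to explore η, V and W instead of every μ_t.

Below is a minimal sketch of the forward (filtering) pass. It targets the local-level model η_t = μ_t + ν_t, μ_t = μ_{t-1} + ω_t, and it assumes a Gaussian prior μ_0 ∼ N(m_0, C_0). The name `kalman_loglik` is chosen here. The smoother, which would recover the states afterwards, is omitted.

```python
import numpy as np

def kalman_loglik(eta, V, W, m0, C0):
    """log p(eta_1..eta_T | V, W) for a local-level model, with mu_0..mu_T integrated out.

    eta: (T, q) array of ILR coordinates; V, W: (q, q) covariances;
    assumption: prior mu_0 ~ N(m0, C0).
    """
    T, q = eta.shape
    m, C = m0, C0
    ll = 0.0
    for t in range(T):
        a, R = m, C + W                 # predict: mu_t | eta_{1:t-1} ~ N(a, R)
        Q = R + V                       # forecast: eta_t | eta_{1:t-1} ~ N(a, Q)
        e = eta[t] - a
        _, logdet = np.linalg.slogdet(Q)
        ll += -0.5 * (q * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(Q, e))
        K = np.linalg.solve(Q, R).T     # Kalman gain R Q^{-1} (R and Q are symmetric)
        m = a + K @ e                   # update: mu_t | eta_{1:t} ~ N(m, C)
        C = R - K @ R
        C = 0.5 * (C + C.T)             # guard against round-off asymmetry
    return ll

# Toy data from a random walk (W = 0.09 I) plus noise (V = 0.25 I).
rng = np.random.default_rng(3)
q, T = 2, 100
eta = (np.cumsum(rng.normal(scale=0.3, size=(T, q)), axis=0)
       + rng.normal(scale=0.5, size=(T, q)))
for w in (0.01, 0.09, 0.5):
    print(w, kalman_loglik(eta, 0.25 * np.eye(q), w * np.eye(q), np.zeros(q), np.eye(q)))
```

The filter costs O(T q³) per evaluation and returns the same value as the full joint Gaussian density of η_{1:T}, which is what makes it a drop-in replacement for sampling the states.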
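In the cross-sectional model on slide 7 (Experimental Design and Modeling), the replicate samples share a vessel-level mean μ_j. Their spread around that mean therefore reflects noise without any time evolution. The sketch below is only a rough moment-based intuition. It estimates that spread with a pooled within-vessel covariance, and it is not the talk's Bayesian fit. It uses η directly, which is latent and observable only in simulation.

The sketch also assumes ν_ij ∼ N(0, V), as in the longitudinal model, because the slide's text is cut off at that point. The function name and simulation settings are hypothetical.

```python
import numpy as np

def pooled_within_cov(eta, groups):
    """Pooled within-group covariance of the rows of eta (n, q), given group labels (n,)."""
    eta = np.asarray(eta, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    S = np.zeros((eta.shape[1], eta.shape[1]))
    for g in labels:
        r = eta[groups == g]
        r = r - r.mean(axis=0)          # remove the vessel-level mean mu_j
        S += r.T @ r
    return S / (len(eta) - len(labels))  # one degree of freedom lost per group mean

# Hypothetical design: 4 vessels x 20 replicates, eta_ij = mu_j + nu_ij,
# with nu_ij ~ N(0, V_true) assumed (the slide's distribution is truncated).
rng = np.random.default_rng(11)
q, n_vessels, n_reps = 3, 4, 20
V_true = np.diag([0.5, 0.3, 0.2])
mu = rng.normal(size=(n_vessels, q))
groups = np.repeat(np.arange(n_vessels), n_reps)
eta = mu[groups] + rng.multivariate_normal(np.zeros(q), V_true, size=n_vessels * n_reps)
print(pooled_within_cov(eta, groups).round(2))
```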
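Slide 8 (Dominated by Technical Noise) summarizes signal to noise through Tr(W)/Tr(V). The small sketch below computes that ratio for each posterior draw. It also computes Tr(W)/(Tr(W) + Tr(V)), which is one plausible reading of "percent of noise attributable to biology"; treat this second quantity as an assumption. The array shapes and the function name are assumptions as well.

```python
import numpy as np

def noise_ratios(W_draws, V_draws):
    """Per-draw Tr(W)/Tr(V) and (assumed definition) Tr(W)/(Tr(W)+Tr(V)).

    W_draws, V_draws: arrays of shape (S, q, q) holding S posterior draws.
    """
    trW = np.trace(W_draws, axis1=1, axis2=2)
    trV = np.trace(V_draws, axis1=1, axis2=2)
    return trW / trV, trW / (trW + trV)

# Toy "draws": W = c * I and V = I for a few values of c.
q = 4
cs = np.array([0.1, 1 / 3, 1.0])
W_draws = cs[:, None, None] * np.eye(q)
V_draws = np.broadcast_to(np.eye(q), (len(cs), q, q))
ratio, frac = noise_ratios(W_draws, V_draws)
print(ratio, frac)  # frac = c / (1 + c): about 0.091, 0.25, 0.5
```

Under this reading, a biological share of 25% corresponds to Tr(W)/Tr(V) = 1/3, so shares below 50% mean the technical trace dominates.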
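Slide 9 (Rikenellaceae Ratio Changes upon Starvation) plots a balance of Rikenellaceae against the remaining taxa. The slide does not state the formula. This sketch assumes the standard ILR balance between a group of r parts and a group of s parts, sqrt(r·s/(r+s)) · ln(g(x_R)/g(x_S)), where g is the geometric mean. A single taxon against the other D − 1 then gives r = 1 and s = D − 1.

The Dirichlet draws below are a hypothetical stand-in for posterior samples of π. They show how a posterior mean and a 95% credible interval of the balance would be summarized.

```python
import numpy as np

def balance(x, num, den):
    """ILR balance sqrt(r*s/(r+s)) * log(g(x[num]) / g(x[den])), g = geometric mean.

    x: (..., D) strictly positive proportions or counts (overall scale does not matter);
    num, den: disjoint lists of column indices.
    """
    logx = np.log(np.asarray(x, dtype=float))
    r, s = len(num), len(den)
    return np.sqrt(r * s / (r + s)) * (logx[..., num].mean(axis=-1)
                                       - logx[..., den].mean(axis=-1))

# Hypothetical stand-in for posterior draws of pi with D = 5 taxa;
# taxon 0 plays the role of Rikenellaceae.
rng = np.random.default_rng(5)
draws = rng.dirichlet(np.ones(5) * 5, size=4000)
b = balance(draws, [0], [1, 2, 3, 4])
print(b.mean(), np.percentile(b, [2.5, 97.5]))   # posterior mean and central 95% interval
```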