CoDAWork 2017 Talk

These are the slides from my talk at CoDAWork 2017.

Justin Silverman

June 09, 2017

Transcript

  1. TIME-SERIES MODELS FOR MICROBIOME DATA

    JUSTIN D. SILVERMAN, MEDICAL SCIENTIST TRAINING PROGRAM, COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, DUKE UNIVERSITY
  2. DIVERSITY AND DISEASE OF THE MICROBIOME: THE MICROBIOME IS DYNAMIC

    David, et al., Genome Biology, 2014
  3. SAMPLE COLLECTION AND PROCESSING: TECHNICAL NOISE CAN ARISE FROM VARIOUS SOURCES

    Adapted from Hamady, et al., Nature Methods, 2008

    Sample Collection and Storage → DNA Extraction → PCR Amplification → Sequencing → Assign Sequences to Samples → Denoise Reads or Cluster in OTUs → Make Count Table

    Species       1    2    3    4    5    6    7    8    9   10
    Sample 1     23   53    2   44   10   88   94   66   73   67
    Sample 2     69   64   70   47    8   97   47    6   64   19
    Sample 3     33  100   68   78   59   87   71   31   67   24
    Sample 4      5   63   57   27   86   81   83   92   46   62
    Sample 5     76   80   46   70   92   92    6   46   37   68
    Sample 6     58    7   37   45   25   62   78   44   89   30
    Sample 7     10   87   32   80    9   91   59   90   67   77
    Sample 8     21   89   73   39   44   80   97   83   80    4
    Sample 9     85   77   82   72   15   19   44    4   83   76
    Sample 10    67   87   68   58   73   29   87    4   48   79
    Sample 11    90    5   28   49   39   20   78   92   12   23
    Sample 12    98   93   55   12   54   75   27   95   83   98
    Sample 13    31   97   52    9   93   84   45   97   81   27
    Sample 14    12   77   22   17   71   12   56   86   18    0
    Sample 15    40   30   71   71   54   13   77   96   75   11
  4. MICROBIOME DATA IS COUNT COMPOSITIONAL: SEQUENCING AS COUNTING WITHOUT A TOTAL

    31% Blue, 19% Orange, 50% Green
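
The "counting without a total" idea can be illustrated with a minimal sketch. It assumes that sequencing behaves like drawing a fixed number of reads with probability proportional to abundance. The function names, the read depth of 5000, and the thousand-fold scaling are illustrative assumptions, not taken from the talk.

```python
import random

def sequence(abundances, depth, rng):
    """Draw `depth` reads; each read comes from a taxon with
    probability proportional to that taxon's abundance."""
    taxa = list(abundances)
    weights = [abundances[t] for t in taxa]
    reads = rng.choices(taxa, weights=weights, k=depth)
    return {t: reads.count(t) for t in taxa}

def proportions(abundances):
    """Relative abundances, which is all that sequencing can recover."""
    total = sum(abundances.values())
    return {t: a / total for t, a in abundances.items()}

# Two hypothetical communities with the slide's proportions
# (31% blue, 19% orange, 50% green) but very different total sizes.
small = {"blue": 31, "orange": 19, "green": 50}
large = {t: 1000 * a for t, a in small.items()}

rng = random.Random(0)
counts_small = sequence(small, depth=5000, rng=rng)
counts_large = sequence(large, depth=5000, rng=rng)

print(counts_small)
print(counts_large)
```

Both count vectors sum to the chosen read depth and estimate the same proportions, so the thousand-fold difference in total abundance leaves no trace in the data.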
  5. CHALLENGES OF MICROBIOME DATA: MICROBIOME DATA IS SPARSE

    Silverman, et al., eLife 2017
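  6. BUILDING A FRAMEWORK: MULTINOMIAL-LOGISTIC NORMAL (OR NORMAL ON THE SIMPLEX)

    $$
    \begin{aligned}
    Y &\sim \text{Multinomial}(\pi) \\
    \pi &\sim \text{Logistic Normal}(\rho, \Xi)
    \end{aligned}
    $$

    Written with the ILR transform:

    $$
    \begin{aligned}
    Y &\sim \text{Multinomial}(\pi) \\
    \pi &= \text{ILR}^{-1}(\eta) \\
    \eta &\sim \text{Multivariate Normal}(\mu, \Sigma)
    \end{aligned}
    $$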
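
A minimal sketch of drawing counts from this multinomial logistic-normal hierarchy follows. Several choices here are assumptions made for illustration: the particular orthonormal ILR basis (each row balances the first i parts against part i+1), a diagonal covariance expressed through per-coordinate standard deviations `sd`, and all function names.

```python
import math
import random

def ilr_basis(d):
    """One possible orthonormal contrast basis (an assumed choice):
    row i balances parts 1..i against part i+1."""
    basis = []
    for i in range(1, d):
        scale = math.sqrt(i / (i + 1))
        basis.append([scale / i] * i + [-scale] + [0.0] * (d - i - 1))
    return basis

def ilr(pi):
    """Map a composition with d parts to d-1 real ILR coordinates."""
    logs = [math.log(p) for p in pi]
    return [sum(b * l for b, l in zip(row, logs)) for row in ilr_basis(len(pi))]

def ilr_inverse(eta):
    """Map d-1 real coordinates back to a composition that sums to 1."""
    d = len(eta) + 1
    basis = ilr_basis(d)
    clr = [sum(eta[i] * basis[i][j] for i in range(d - 1)) for j in range(d)]
    m = max(clr)  # subtract the maximum for numerical stability
    w = [math.exp(c - m) for c in clr]
    s = sum(w)
    return [x / s for x in w]

def sample_counts(mu, sd, depth, rng):
    """eta ~ Normal(mu, diag(sd^2)); pi = ILR^{-1}(eta); Y ~ Multinomial(depth, pi)."""
    eta = [rng.gauss(m, s) for m, s in zip(mu, sd)]
    pi = ilr_inverse(eta)
    draws = rng.choices(range(len(pi)), weights=pi, k=depth)
    return [draws.count(j) for j in range(len(pi))], pi

counts, pi = sample_counts(mu=[0.0, 1.0, -0.5], sd=[0.5, 0.5, 0.5],
                           depth=1000, rng=random.Random(1))
print(counts, [round(p, 3) for p in pi])
```

Because the basis is orthonormal, `ilr(ilr_inverse(eta))` recovers `eta`, which is what lets an ordinary multivariate normal act as a distribution on the simplex.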
  7. BUILDING A FRAMEWORK: MODELING TIME-EVOLUTION

    $$
    \begin{aligned}
    Y_t &\sim \text{Multinomial}(\pi_t) \\
    \pi_t &= \text{ILR}^{-1}(\eta_t) \\
    \eta_t &= F_t' \theta_t + \nu_t, & \nu_t &\sim N(0, V_t) \\
    \theta_t &= G_t \theta_{t-1} + \omega_t, & \omega_t &\sim N(0, W_t)
    \end{aligned}
    $$

    [Diagram: state-space graph. True State with Biological Noise (θ0, θ1, θ2, ..., θT, with W1, W2, ..., WT) → Addition of Technical Noise (η1, η2, ..., ηT, with V1, V2, ..., VT) → ILR → Observed Counts (Y1, Y2, ..., YT)]
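
The three layers of this state-space model (true state with biological noise, added technical noise, observed counts) can be simulated in a minimal sketch. To keep it short, the sketch assumes only two taxa, so there is a single ILR coordinate, and takes F = G = 1 with constant noise standard deviations `w_sd` and `v_sd`. These simplifications and all names are assumptions, not the model fitted in the talk.

```python
import math
import random

def ilr_inverse_2(eta):
    """ILR inverse for two parts: softmax of (eta/sqrt(2), -eta/sqrt(2))."""
    p1 = 1.0 / (1.0 + math.exp(-math.sqrt(2.0) * eta))
    return [p1, 1.0 - p1]

def simulate(T, w_sd, v_sd, depth, rng, theta0=0.0):
    """Simulate T time points of (theta_t, eta_t, counts_t)."""
    theta, rows = theta0, []
    for _ in range(T):
        theta = theta + rng.gauss(0.0, w_sd)  # true state plus biological noise
        eta = theta + rng.gauss(0.0, v_sd)    # addition of technical noise
        pi = ilr_inverse_2(eta)
        # Multinomial counting step; with two taxa this is a binomial draw.
        first = sum(rng.random() < pi[0] for _ in range(depth))
        rows.append((theta, eta, [first, depth - first]))
    return rows

rows = simulate(T=10, w_sd=0.1, v_sd=0.5, depth=200, rng=random.Random(2))
for theta, eta, counts in rows:
    print(round(theta, 3), round(eta, 3), counts)
```

With `v_sd` larger than `w_sd`, the printed `eta` values scatter widely around a slowly drifting `theta`, which is the technical-noise-dominated situation the later slides describe.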
  8. SIMULATED DATA AND RESULTS: A SIMPLE PERTURBATION EXPERIMENT

  9. SIMULATED DATA AND RESULTS: A SIMPLE PERTURBATION EXPERIMENT

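  10. SIMULATED DATA AND RESULTS: DYNAMIC REGRESSION MODEL AND IMPUTATION OF ZEROES

    $$
    \begin{aligned}
    Y_t &\sim \text{Multinomial}(\pi_t) \\
    \pi_t &= \text{ILR}^{-1}(\eta_t) \\
    \eta_t &= \mu_t + \beta_1 x_t + \beta_2 x_{t-1} + \nu_t, & \nu_t &\sim N(0, V) \\
    \mu_t &= \mu_{t-1} + \alpha_{t-1} \\
    \alpha_t &= \alpha_{t-1} + \omega_t, & \omega_t &\sim N(0, W)
    \end{aligned}
    $$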
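
The latent part of this dynamic regression (a trend with a slowly changing slope, plus current and lagged effects of the perturbation covariate) can be traced step by step in a minimal sketch. It handles a single ILR coordinate only. The regression coefficients are called `beta1` and `beta2` here, and the covariate before the first time point is taken to be 0; both are assumptions for illustration.

```python
import random

def simulate_eta(x, beta1, beta2, v_sd, w_sd, rng, mu0=0.0, alpha0=0.0):
    """Simulate eta_t for a covariate series x (one ILR coordinate)."""
    mu, alpha, x_prev, etas = mu0, alpha0, 0.0, []
    for x_t in x:
        mu = mu + alpha                       # mu_t = mu_{t-1} + alpha_{t-1}
        alpha = alpha + rng.gauss(0.0, w_sd)  # alpha_t = alpha_{t-1} + omega_t
        eta = mu + beta1 * x_t + beta2 * x_prev + rng.gauss(0.0, v_sd)
        etas.append(eta)
        x_prev = x_t
    return etas

# With both noise terms switched off the path is deterministic, which makes
# the immediate (beta1) and lagged (beta2) perturbation effects easy to see.
x = [0, 0, 1, 0, 0]  # a single perturbation at the third time point
etas = simulate_eta(x, beta1=2.0, beta2=1.0, v_sd=0.0, w_sd=0.0,
                    rng=random.Random(0), alpha0=0.5)
print(etas)  # [0.5, 1.0, 3.5, 3.0, 2.5]
```

The perturbation raises the third value by `beta1` and the fourth by `beta2`, on top of a trend that rises by 0.5 per step.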
  11. IMPROVING MODEL INFERENCE: OVERVIEW

    4 Time Series of 30 Taxa, each with 144 Samples. As measured by Effective Sample Size of Sampler:
    ▸ Metropolis within Gibbs → Hamiltonian MCMC (Weeks → Days)
    ▸ Marginalize State Space using Kalman Filter and Smoother (Days → ½ Day)
    ▸ Model Assumptions and Simplifications (½ Day → 1 hour)
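
Marginalizing the state space means the sampler never has to draw the latent states θ: for the Gaussian part of the model, a Kalman filter integrates them out analytically and returns a marginal likelihood. The following is a minimal sketch of that idea for a scalar local-level model with Gaussian observations. The filter only (no smoother), the scalar setting, and the names `v`, `w`, `m0`, `c0` are assumptions for illustration; the talk's model is multivariate with multinomial observations on top.

```python
import math

def kalman_loglik(y, v, w, m0=0.0, c0=1e6):
    """Marginal log-likelihood of y with the states integrated out, for
    y_t = theta_t + nu_t (variance v), theta_t = theta_{t-1} + omega_t (variance w).
    Returns (log-likelihood, final filtered mean, final filtered variance)."""
    m, c, ll = m0, c0, 0.0
    for y_t in y:
        a, r = m, c + w  # one-step-ahead state mean and variance
        f, q = a, r + v  # one-step-ahead observation mean and variance
        e = y_t - f      # forecast error
        ll += -0.5 * (math.log(2.0 * math.pi * q) + e * e / q)
        k = r / q        # Kalman gain
        m, c = a + k * e, r - k * r  # filtered state mean and variance
    return ll, m, c

ll, m, c = kalman_loglik([1.0, 1.2, 0.8, 1.1], v=1.0, w=0.1)
print(ll, m, c)
```

A sampler then only needs to explore the remaining parameters (here `v` and `w`), a much lower-dimensional space, which is one plausible reason for the speedup reported in the second bullet.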
  12. EXPERIMENTAL DESIGN AND MODELING: IDENTIFIABILITY AND SIGNAL-TO-NOISE RATIO

    Petris, Petrone, and Campagnoli. Dynamic Linear Models with R, 2009
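  13. EXPERIMENTAL DESIGN AND MODELING: COMBINED LONGITUDINAL AND CROSS-SECTIONAL MODEL

    [Figure labels: 28 DAILY SAMPLES; 120 HOURLY SAMPLES; Attempted Perturbation; 20 REPLICATE SAMPLES; 4x]

    LONGITUDINAL MODEL

    $$
    \begin{aligned}
    Y_{tj} &\sim \text{Multinomial}(\pi_{tj}) \\
    \pi_{tj} &= \text{ILR}^{-1}(\eta_{tj}) \\
    \eta_{tj} &= \mu_{tj} + \nu_{tj}, & \nu_{tj} &\sim N(0, V) \\
    \mu_{tj} &= \mu_{t-1,j} + \omega_{tj}, & \omega_{tj} &\sim N(0, W)
    \end{aligned}
    $$

    CROSS-SECTIONAL MODEL

    $$
    \begin{aligned}
    Y_{ij} &\sim \text{Multinomial}(\pi_{ij}) \\
    \pi_{ij} &= \text{ILR}^{-1}(\eta_{ij}) \\
    \eta_{ij} &= \mu_j + \nu_{ij}, & \nu_{ij} &\sim \ldots
    \end{aligned}
    $$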
  14. REAL DATA AND RESULTS: DOMINATED BY TECHNICAL NOISE

    Biological Noise to Technical Noise Ratio. Percent of Noise (Excluding Counting) Attributable to Biology: ≈23%

    [Figure: Prior vs. Posterior on W/V Ratio (Signal to Noise). x-axis: Tr(W)/Tr(V) Ratio (0.1 to 10.0); y-axis: Density; curves: Posterior, Prior]
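
One way to relate the Tr(W)/Tr(V) ratio to a percent of noise attributable to biology is sketched below. It assumes the percentage is defined as Tr(W) / (Tr(W) + Tr(V)), that is, biological variance over biological plus technical variance with counting noise excluded. That definition and the function names are assumptions; the slide does not state the formula.

```python
def biological_fraction(ratio):
    """Fraction of noise that is biological, given ratio = Tr(W) / Tr(V)."""
    return ratio / (1.0 + ratio)

def ratio_from_fraction(fraction):
    """Inverse map: the Tr(W) / Tr(V) ratio implied by a biological fraction."""
    return fraction / (1.0 - fraction)

def biological_fraction_from_traces(tr_w, tr_v):
    """Same quantity computed directly from the two traces."""
    return tr_w / (tr_w + tr_v)

# Under this assumed definition, a biological share of about 23%
# corresponds to a ratio of roughly 0.3.
print(round(ratio_from_fraction(0.23), 3))
print(round(biological_fraction(0.3), 3))
```

A ratio below 1 means technical noise exceeds biological noise, which matches the slide's headline.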
  15. REAL DATA AND RESULTS: RIKENELLACEAE RATIO CHANGES UPON STARVATION

    [Figure: Balance Value (Rikenellaceae / Remaining Taxa) over time, Nov 30 to Dec 21, for Vessels 1 to 4; Posterior mean and 95% credible interval]

    VESSELS 1 AND 2 WERE ACCIDENTALLY STARVED FOR ≈2 DAYS
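
The balance plotted on this slide contrasts one taxon against all remaining taxa. A minimal sketch of the standard ILR balance formula follows: a scaled log-ratio of the geometric means of the two groups. The function name and the example abundances are hypothetical, and the exact basis used in the talk is not stated.

```python
import math

def balance(numerator, denominator):
    """ILR balance between two groups of parts:
    sqrt(r*s/(r+s)) * log(geometric mean of numerator / geometric mean of denominator)."""
    r, s = len(numerator), len(denominator)
    log_gm_num = sum(math.log(x) for x in numerator) / r
    log_gm_den = sum(math.log(x) for x in denominator) / s
    return math.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)

# Hypothetical relative abundances: one focal family against three other taxa.
focal = [0.40]
remaining = [0.30, 0.20, 0.10]
print(balance(focal, remaining))

# The balance depends only on ratios, so rescaling every part leaves it unchanged.
print(balance([4.0], [3.0, 2.0, 1.0]))
```

A rising balance means the focal taxon is growing relative to the geometric mean of the rest, regardless of how the total abundance changes.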
  16. SUMMARY

    ▸ Microbiome Data is Count-Compositional
    ▸ Multinomial-Logistic Normal Dynamic Linear Models are a Powerful and General Tool for Count-Compositional Data
    ▸ Understanding Noise In Microbiome Data is Essential
  17. ACKNOWLEDGEMENTS

    Duke University: Lawrence David, Sayan Mukherjee, Rachael Bloom, Heather Durand, Firas Midani, Aspen Reese, Max Villa
    Universitat de Girona: Juan José Egozcue, Vera Pawlowsky-Glahn
    UNC Chapel Hill: Rachel Silverman
    Funding: Duke Collaborative Quantitative Approaches to Problems in the Basic and Clinical Sciences; Duke MSTP NIH T32