Estimating Microbial Diversity

Slides from Vanderbilt Microbiome Research Meeting on 31 January, 2012

Chris Fonnesbeck

February 01, 2012

Transcript

  1. Estimating Microbial Diversity. Chris Fonnesbeck, Department of Biostatistics. Wednesday, February 1, 12

  2. Diversity Measures taxon- and phylogenetic-based

  3. α diversity (species richness, evenness)

  4. β diversity (species turnover)

  5. generalization of population estimation methods

  6. sample-based estimate of diversity

  7. [no text on slide]

  8. [no text on slide]

  9. [no text on slide]

  10. n < N

  11. n ≪ N

  12. estimate model data

  13. [Figure: paired Species vs. Frequency plots; panels "community" and "physical sample", linked by "sampling"]

  14. [Figure: paired Species vs. Frequency plots; panels "physical sample" and "amplified sample", linked by "amplification"]

  15. [Figure: paired Species vs. Frequency plots; panels "amplified sample" and "identified species", linked by "sequencing"]

  16. Rarefaction curves

  17. source: Hughes et al. 2001

  18. [no text on slide]

  19. Diversity indices

  20. Shannon-Wiener Index: $H = -\sum_{i=1}^{n} p_i \log p_i$; "evenness": $E = H / \log n$

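
The index and evenness on slide 20 can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names and the example counts are hypothetical, the natural logarithm is assumed, and returning 0.0 for a single-taxon sample is a convention chosen here.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p_i * log(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

def evenness(counts):
    """Evenness E = H / log(n), with n the number of observed taxa."""
    n = sum(1 for c in counts if c > 0)
    if n < 2:
        return 0.0  # E is undefined for one taxon; 0.0 is an assumption made here
    return shannon_index(counts) / math.log(n)

counts = [10, 10, 10, 10]      # hypothetical, perfectly even community
print(shannon_index(counts))   # equals log(4) for four equally abundant taxa
print(evenness(counts))        # evenness is at its maximum for equal abundances
```

A skewed sample such as `[97, 1, 1, 1]` yields a much lower evenness than the equal-abundance case, which is the contrast the index is meant to capture.
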
  21. Non-parametric models

  22. Chao1 (Chao 1984): $\hat{N} = S_{obs} + \frac{n_1^2}{2 n_2}$

  23. Variance of Chao1: $\mathrm{Var}(\hat{S}) = n_2 \left[ \frac{(n_1/n_2)^4}{4} + (n_1/n_2)^3 + \frac{(n_1/n_2)^2}{2} \right]$

  24. Abundance-based Coverage Estimator: $\hat{S}_{ACE} = S_{abund} + \frac{S_{rare}}{C_{ACE}} + \frac{F_1}{C_{ACE}} \gamma^2_{ACE}$, where $C_{ACE} = 1 - \frac{F_1}{\sum_{i=1}^{10} i F_i}$ (also on slide: $\hat{N}_{ACE}$)

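
A rough Python sketch of the ACE calculation follows. The slide gives the estimator and the coverage term but not the form of $\gamma^2_{ACE}$; the expression used below is the commonly cited Chao and Lee form and should be read as an assumption, as should the function name, the `rare_threshold` parameter, and the example counts.

```python
def ace(counts, rare_threshold=10):
    """Abundance-based coverage estimate of richness from per-species counts."""
    counts = [int(c) for c in counts if c > 0]
    rare = [c for c in counts if c <= rare_threshold]
    s_abund = len(counts) - len(rare)   # species with more than 10 individuals
    s_rare = len(rare)                  # species with 10 or fewer individuals
    n_rare = sum(rare)                  # sum_i i * F_i over the rare classes
    f1 = rare.count(1)                  # number of singletons, F_1
    if s_rare == 0:
        return float(s_abund)           # no rare species: nothing to correct
    if f1 == n_rare:
        raise ValueError("coverage is zero: every rare species is a singleton")
    c_ace = 1 - f1 / n_rare             # sample coverage of the rare group
    # Assumed form of gamma^2 (not shown on the slide):
    # max{ (S_rare / C_ACE) * sum_i i(i-1)F_i / (N_rare (N_rare - 1)) - 1, 0 }
    pair_sum = sum(c * (c - 1) for c in rare)
    gamma2 = max((s_rare / c_ace) * pair_sum / (n_rare * (n_rare - 1)) - 1, 0.0)
    return s_abund + s_rare / c_ace + (f1 / c_ace) * gamma2

print(ace([1, 1, 2, 2, 3, 20]))  # hypothetical counts; estimate exceeds the 6 observed species
```
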
  25. Example

      >>> x1
      array([ 2., 1., 1., 1., 10., 1., 1., 1., 1., 1.,
              1., 29., 1., 1., 3., 2., 1., 31., 11., 159.,
              408., 23., 62., 95., 42., 105., 4., 7910., 702., 13.,
              1., 2., 7., 1., 2., 3., 13., 1., 1., 2.,
              1., 6., 2., 8., 1., 1., 15., 16., 13., 3.,
              1., 3., 4., 1., 5., 4., 1., 2., 10., 4.,
              1., 3., 1., 1., 1., 8., 1., 1., 1., 2.,
              1., 2., 2., 1., 1., 1., 1., 1., 2., 1.,
              1., 3., 2., 5., 2., 1., 2., 229., 2., 1.,
              1., 4., 1., 2., 1., 1., 3., 1., 1., 1.,
              12., 5., 45., 1., 1., 1., 1., 1., 1., 1.,
              1., 1., 3., 1., 1., 1., 2., 11., 1., 1.,
              9., 1., 1., 1., 2., 4., 1., 2., 1., 1.,
              13., 4., 6., 44.])

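  26. Example

      >>> import numpy as np
      >>> s = len(x1)            # observed species
      >>> n1 = len(x1[x1==1])    # singletons
      >>> n2 = len(x1[x1==2])    # doubletons
      >>> s, n1, n2
      (134, 66, 19)
      >>> round(s + n1**2 / (2*n2), 2)  # Chao1 estimator, using true division
      248.63
      >>> r = n1 / n2  # integer division would truncate this ratio to 3
      >>> round(float(np.sqrt(n2 * (r**4/4 + r**3 + r**2/2))), 2)  # SE
      40.03
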
  27. Chao2: $\hat{N} = S_{obs} + (1 - 1/t)\,\frac{Q_1^2}{2 Q_2}$

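
The Chao2 formula works on incidence (presence/absence) data rather than counts. A minimal sketch, assuming $Q_1$ and $Q_2$ are the numbers of species found in exactly one and exactly two of the $t$ samples; the function name and the small matrix are illustrative only.

```python
def chao2(incidence):
    """Chao2 richness estimate from a species-by-sample 0/1 incidence matrix."""
    t = len(incidence[0])                       # number of samples
    occurrences = [sum(row) for row in incidence]
    s_obs = sum(1 for o in occurrences if o > 0)
    q1 = occurrences.count(1)                   # species seen in exactly one sample
    q2 = occurrences.count(2)                   # species seen in exactly two samples
    if q2 == 0:
        raise ValueError("Q2 is zero; this form of the estimator is undefined")
    return s_obs + (1 - 1 / t) * q1 ** 2 / (2 * q2)

# Hypothetical data: rows are species, columns are four samples
incidence = [
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
print(chao2(incidence))  # 3 observed species plus a correction of 0.75 * 1 / 2
```
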
  28. sensitivity to library size from Gihring et al. 2011

  29. Parametric models

  30. [Figure: Empirical distributions; axes "# Individuals" and "# Species"]

  31. Empirical distributions

  32. detection parameter: $E(n_i) = N_i p_i$

  33. detection parameter: $\hat{N}_i = n_i / \hat{p}_i$

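  34. abundance and detection: $p_{ij} = 1 - \prod_{k=1}^{n_j} (1 - p_{ijk})$ (individual k, species j, sample i)

  35. abundance and detection: $p_{ij} = 1 - \prod_{k=1}^{n_j} (1 - p_{ijk}) = 1 - (1 - p_{ij})^{n_j}$, where the second form holds when all $n_j$ individuals share one detection probability (the $p_{ij}$ inside the parentheses)
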
  36. mark-recapture designs

  37. unique markings

  38. $n_1$: species in first sample; $n_2$: species in second sample; $m$: species in $n_2$ also seen in $n_1$

  39. marked proportion in 2nd sample = proportion captured in 1st sample: $\frac{m}{n_2} = \frac{n_1}{N} = p_1$

  40. Lincoln-Petersen estimator: $\hat{N} = \frac{n_1}{\hat{p}_1} = \frac{n_1 n_2}{m}$

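
The two-sample logic of slides 38 to 40 reduces to a one-line calculation. The sketch below uses a hypothetical function name and made-up numbers; it estimates $p_1$ as $m / n_2$ and then scales $n_1$ by it.

```python
def lincoln_petersen(n1, n2, m):
    """Estimate total richness N from two samples with m species in common."""
    if m == 0:
        raise ValueError("no species shared between samples; estimate is undefined")
    p1_hat = m / n2        # proportion of the second sample already seen in the first
    return n1 / p1_hat     # algebraically equal to n1 * n2 / m

# Hypothetical example: 50 species in sample 1, 40 in sample 2, 10 shared
print(lincoln_petersen(50, 40, 10))  # 200.0
```
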
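  41. multinomial model: $P(n_1, n_2, m \mid N, p_1, p_2) = \frac{N!}{m!\,(n_1 - m)!\,(n_2 - m)!\,(N - n)!} \times (p_1 p_2)^{m} \times [p_1 (1 - p_2)]^{n_1 - m} \times [(1 - p_1) p_2]^{n_2 - m} \times [(1 - p_1)(1 - p_2)]^{N - n}$, where $n = n_1 + n_2 - m$ is the number of distinct species seen
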
  42. multiple sampling occasions

  43. incidence matrix

                   sample
                   1   2   3   4
      species 1    1   1   0   0
      species 2    0   1   1   1
      species 3    0   0   0   1

  44. simplest model: M0

      observations   probability (π)
      111            p^3
      110            p^2(1-p)
      101            p^2(1-p)
      100            p(1-p)^2
      ...            ...

      $P(x_{ijk} \mid N, \pi_{ijk}) = \frac{N!}{\prod_{ijk} x_{ijk}!} \prod_{ijk} \pi_{ijk}^{x_{ijk}}$

  45. Mh model, individual heterogeneity: $\{p_i\} \sim F(p)$

  46. Mh model, expected multinomial probabilities: $\pi_j = \int_0^1 \frac{K!}{(K - j)!\,j!}\, p^j (1 - p)^{K - j}\, dF(p)$

  47. Mh model estimation: $\hat{N}_k = \sum_{j=1}^{K} a_{jk} f_j$, with jackknife coefficients $a_{jk}$ and capture frequencies $f_j$

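
As an illustration of the linear form on slide 47, the sketch below builds the first-order jackknife coefficients and applies them to capture frequencies. The slide does not list the coefficients; the values used here are those of the standard first-order jackknife, $\hat{N}_1 = S + \frac{K-1}{K} f_1$, which is an assumption about what the talk intended. Function names and data are hypothetical.

```python
def jackknife1_coefficients(K):
    """First-order jackknife coefficients a_j1 for j = 1..K (assumed standard form)."""
    # Every observed species counts once; singletons get an extra (K - 1) / K.
    return [1 + (K - 1) / K] + [1.0] * (K - 1)

def jackknife_estimate(freqs, coeffs):
    """N_hat = sum_j a_j * f_j, where freqs[j-1] is the number of species seen j times."""
    return sum(a * f for a, f in zip(coeffs, freqs))

K = 4                      # sampling occasions
freqs = [4, 2, 1, 0]       # hypothetical: 4 species seen once, 2 twice, 1 three times
print(jackknife_estimate(freqs, jackknife1_coefficients(K)))  # 7 observed + 0.75 * 4
```
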
  48. measures of community variation

  49. community 1: $x_1^{(1)}, x_2^{(1)}, \ldots, x_J^{(1)}$; community 2: $x_1^{(2)}, x_2^{(2)}, \ldots, x_J^{(2)}$

  50. relative richness: $\lambda_i^{(12)} = N_i^{(1)} / N_i^{(2)}$

  51. relative richness: $\hat{\lambda}_i^{(12)} = \hat{N}_i^{(1)} / \hat{N}_i^{(2)}$

  52. species co-occurrence: $\hat{\phi}_i^{(12)} = \hat{M}_i^{(2) \mid R_i^{(1)}} / R_i^{(1)}$ (Cam et al. 2000)

  53. unshared species: $\hat{B}_i^{(12)} = \hat{N}_i^{(2)} - \hat{\phi}_i^{(12)} \hat{N}_i^{(1)}$ (Cam et al. 2000)

  54. Occupancy models: Dorazio and Royle 2005; Dorazio and Royle 2006; MacKenzie et al. 2005

  55. presence-absence data: 1 0 1 1 1 0 1

  56. presence-absence data: 1 0 1 1 1 0 1

  57. sample occurrence of species

  58. Pr(observe species) = Pr(species detected | species present) × Pr(species present)

  59. incidence matrix

                     sample locations
                     1         2         ...   J
      species 1      x11       x12       ...   x1J
      species 2      x21       x22       ...   x2J
      ...
      species n      xn1       xn2       ...   xnJ

      $x_{ij} = 0, 1, \ldots, K_J$ ($K_J$ samples from each location)

  60. incidence matrix augmented with all-zero rows

                     1         2         ...   J
      species 1      x11       x12       ...   x1J
      species 2      x21       x22       ...   x2J
      ...
      species n      xn1       xn2       ...   xnJ
      species n+1    0         0         ...   0
      ...
      species N      0         0         ...   0

  61. [same matrix; rows for species 1 to n labeled "observed", rows for species n+1 to N labeled "unobserved"]

  62. [same matrix, labeled X]

  63. occurrence matrix

                     1         2         ...   J
      species 1      z11       z12       ...   z1J
      species 2      z21       z22       ...   z2J
      ...
      species n      zn1       zn2       ...   znJ
      species n+1    z(n+1)1   z(n+1)2   ...   z(n+1)J
      ...
      species N      zN1       zN2       ...   zNJ

  64. occurrence matrix with some entries shown as 1

                     1         2         ...   J
      species 1      1         z12       ...   1
      species 2      z21       1         ...   z2J
      ...
      species n      1         zn2       ...   znJ
      species n+1    z(n+1)1   z(n+1)2   ...   z(n+1)J
      ...
      species N      zN1       zN2       ...   zNJ

  65. [same matrix, labeled Z]

  66. modeling occurrence (Bernoulli model): $p(z_{ij} \mid \psi_{ij}) = \psi_{ij}^{z_{ij}} (1 - \psi_{ij})^{1 - z_{ij}}$

  67. modeling detection (conditional), if $z_{ij} = 1$: $p(x_{ij} \mid z_{ij} = 1, \theta_{ij}) = \binom{K}{x_{ij}} \theta_{ij}^{x_{ij}} (1 - \theta_{ij})^{K - x_{ij}}$

  68. joint probability: $p(x_{ij}, z_{ij} \mid \psi_{ij}, \theta_{ij}) = \left[ \binom{K}{x_{ij}} \theta_{ij}^{x_{ij}} (1 - \theta_{ij})^{K - x_{ij}} \right]^{z_{ij}} \times \psi_{ij}^{z_{ij}} (1 - \psi_{ij})^{1 - z_{ij}}$

  69. marginal probability of observed species: $p(x_{ij} \mid \psi_{ij}, \theta_{ij}) = \psi_{ij} \binom{K}{x_{ij}} \theta_{ij}^{x_{ij}} (1 - \theta_{ij})^{K - x_{ij}} + (1 - \psi_{ij})\, I(x_{ij} = 0)$

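
The marginal probability on slide 69 is a zero-inflated binomial, which is easy to evaluate directly. The sketch below is illustrative only: the function name and parameter values are hypothetical, and `K` is taken as the number of repeat samples at a location.

```python
from math import comb

def marginal_prob(x, K, psi, theta):
    """p(x | psi, theta) = psi * Binomial(x; K, theta) + (1 - psi) * I(x == 0)."""
    binom = comb(K, x) * theta ** x * (1 - theta) ** (K - x)
    return psi * binom + (1 - psi) * (1 if x == 0 else 0)

# Hypothetical values: occupancy 0.5, per-sample detection 0.5, K = 3 samples
K, psi, theta = 3, 0.5, 0.5
for x in range(K + 1):
    print(x, marginal_prob(x, K, psi, theta))
```

The value at `x = 0` mixes two explanations for a non-detection, a species that is absent and a species that is present but missed, which is why repeated sampling is needed to separate occupancy from detection.
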
  70. models for detection and occupancy: $\mathrm{logit}(\psi_{ij}) = u_i + \alpha_j$; $\mathrm{logit}(\theta_{ij}) = v_i + \beta_j$

  71. from Dorazio and Royle 2005

  72. from Dorazio and Royle 2005

  73. Take-home points
      1. Many diversity measures ignore incomplete or heterogeneous detection
      2. Detection and presence are often confounded
      3. Repeated sampling is an efficient approach to allow detection and occupancy to be estimated
      4. Occupancy modeling is a flexible approach for estimating diversity