Ph.D defense talk

elipapa
May 16, 2012

High-throughput experimental and computational tools for exploring immunity and the microbiome. Thesis defense, MIT, April 2012.

Transcript

  1. Introduction / Mapping antibody responses / Single lymphocyte stimulation / Machine learning for microbiome data / IBD classification / Discussion

    WHY STUDY THE MICROBIOME? Lean/obese mouse studies suggest that the gut microbiota affects energy balance. Microbiota diversity is reduced by antibiotic therapy, leading to pathogenic infections (antibiotic-associated diarrhea, salmonellosis, C. diff colitis). Implicated in autoimmune diseases (IBD, diabetes).
  2. BALANCE BETWEEN IMMUNITY AND GUT MICROBIOME

    The immune system is one of the main determinants of associated microbial diversity. Innate: physical barriers limit microbes from reaching the epithelium; APCs trigger inflammation to reduce the bacterial load. Adaptive: B cells secrete polyreactive and antigen-specific IgA; T cells mediate killing of specific microorganisms. The microbiota influences both innate and adaptive immunity.
  3. DYSBIOSIS

    [Figure 1 from Round et al., Nature Reviews Immunology (2009): Immunological dysregulation associated with dysbiosis of the microbiota. Panels: (a) immunological equilibrium, (b) immunological dysequilibrium. Labels: symbionts, commensals, pathobionts; regulation vs. inflammation.] Caption excerpt: A healthy microbiota contains a balanced composition of many classes of bacteria. Symbionts are organisms with known health-promoting functions.
  4. DYSBIOSIS

    [Figure 1 from Round et al., Nature Reviews Immunology (2009), shown enlarged: Immunological dysregulation associated with dysbiosis of the microbiota. Panels: (a) immunological equilibrium, (b) immunological dysequilibrium. Labels: symbionts, commensals, pathobionts; regulation vs. inflammation.] Caption excerpt: A healthy microbiota contains a balanced composition of many classes of bacteria. Symbionts are organisms with known health-promoting functions. Commensals are permanent residents of this complex ecosystem and provide no benefit or detriment to the host (at least to our knowledge).
  5. DYSBIOSIS /2

    [Figure 3 from Round et al., Nature Reviews Immunology (2009): Proposed causes of dysbiosis of the microbiota. Labels: host genetics (mutations in NOD2, IL23R, ATG16L and IRGM); lifestyle (diet, stress); early colonization (birth in hospitals, altered exposure to microbes); medical practices (vaccination, antibiotic use, hygiene); dysbiosis; disease (T H 1, T H 2 and T H 17 cells); health (T Reg cells).] Caption excerpt: We propose that the composition of the microbiota can shape a healthy immune response or predispose to disease. Many factors can contribute to dysbiosis, including host genetics, lifestyle, exposure to microorganisms and medical practices. Host genetics can potentially influence dysbiosis in many ways. An individual with mutations in genes involved in immune regulatory mechanisms or pro-inflammatory pathways could lead to unrestrained inflammation in the intestine. It is possible that inflammation alone influences the composition of the microbiota, skewing it in favour of pathobionts. Alternatively, a host could 'select' or exclude the colonization of particular organisms. This selection can be either active (as would be the case of an organism recognizing a particular receptor on the host) or passive (the host environment is more conducive to …
  6. IMMUNITY AND MICROBIOTA ARE DEEPLY INTERLINKED

    The microbiota is required for the proper development of immune responses. Microbial influence on immunity is rarely exerted in isolation.
  7. IMMUNITY AND MICROBIOTA ARE DEEPLY INTERLINKED

    The microbiota is required for the proper development of immune responses. Microbial influence on immunity is rarely exerted in isolation. We need more systems-level data for all the players involved, measuring many variables at high resolution.
  8. HIGH-THROUGHPUT MAPPING OF ANTIBODY RESPONSE

    Profiling of immune responses traditionally relies on cell sorting or serum measurements. There is no data on the secretions of single lymphocytes: quantity and timing of secreted cytokines? Antibody affinities?
  9. MICROENGRAVING CHIP
  10. MICROENGRAVING CHIP / CLOSER LOOK
  11. MICROENGRAVING METHOD

    [Diagram labels: glass slides coated with capture Ab; secreted Ab is captured; microengraving; glass slides with replicated microarrays of Ab; PDMS; culture dish.]
  12. MICROENGRAVING METHOD

    [Diagram labels: glass slides coated with capture Ab; secreted Ab is captured; microengraving; glass slides with replicated microarrays of Ab; PDMS; culture dish. Spot types: antigen-specific spot and non-specific spot on anti-mouse Ig. Detection: OVA (variable concentration, green); anti-mouse Ig (10 nM, red).]
  13. MICROENGRAVING METHOD

    [Diagram labels: glass slides coated with capture Ab; secreted Ab is captured; microengraving; glass slides with replicated microarrays of Ab; PDMS; culture dish. Spot types: antigen-specific spot and non-specific spot on anti-mouse Ig. Detection: OVA (variable concentration, green); anti-mouse Ig (10 nM, red). Example readout: [OVA] at 10 pM, 100 pM, 1 nM, 10 nM and 100 nM against [IgG]; rows labeled affinity, isotype and microwell; stains: αB220, αIgM, DNA, IgM.]
  14. HIGH-THROUGHPUT AFFINITY MEASUREMENTS

    [Figure, panels a and b: microwells and microarrays probed with [OVA] and [IgG]. Plot: Ag/Ab ratio vs. concentration (OVA, nM), 0.01 to 100 nM, n = 3461. Comparison of apparent K (K app) from ELISA and from microengraving.]
  15. HIGH-THROUGHPUT AFFINITY MEASUREMENTS

    [Figure, panels a and b: microwells and microarrays probed with [OVA] and [IgG]. Plot: Ag/Ab ratio vs. concentration (OVA, nM), 0.01 to 100 nM, n = 3461. Comparison of apparent K (K app) from ELISA and from microengraving. Added panel: K d (nM), 0 to 40, for 1x boost and 2x boost.]
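The slides plot an Ag/Ab ratio against OVA concentration and report an apparent K, but the transcript does not say how the curve was fitted. As a minimal sketch, assuming a single-site binding isotherm `ratio = r_max * C / (C + K)` (an assumption, not stated in the deck), an apparent K can be estimated from a titration with a simple grid search. The function name, the grid range and the synthetic data are all hypothetical.

```python
def fit_apparent_kd(concs_nM, ratios):
    """Fit ratio = r_max * C / (C + K) by scanning K on a log grid.

    Assumed model (single-site binding); returns (K in nM, r_max).
    """
    best = None
    for i in range(601):
        k = 10 ** (-3 + i * 0.01)              # K from 0.001 nM to 1000 nM
        occ = [c / (c + k) for c in concs_nM]  # fractional occupancy at each C
        # For a fixed K, the least-squares r_max has a closed form.
        r_max = sum(o * r for o, r in zip(occ, ratios)) / sum(o * o for o in occ)
        sse = sum((r - r_max * o) ** 2 for o, r in zip(occ, ratios))
        if best is None or sse < best[0]:
            best = (sse, k, r_max)
    return best[1], best[2]


# OVA concentrations shown on the slides: 10 pM to 100 nM, expressed in nM.
concs = [0.01, 0.1, 1.0, 10.0, 100.0]
# Synthetic, noise-free ratios generated from hypothetical K = 5 nM, r_max = 1.2.
ratios = [1.2 * c / (c + 5.0) for c in concs]

k_app, r_max = fit_apparent_kd(concs, ratios)
print(f"apparent K ~ {k_app:.2f} nM, r_max ~ {r_max:.2f}")
```

With real per-well data, the same fit would be run once per microwell, giving one apparent K per secreting cell.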
  16. CELL IDENTIFICATION VIA AFFINITY

    [Figure, panels a and b: microarray and microwell images for clones c127, c136 and Y3. Plot: ratio (Ag/IgG), 0.0 to 1.0, vs. concentration (tet H-2Kb, nM), 0.001 to 1000 nM. Green = tet H-2Kb (1 nM); red = anti-mouse IgG (10 nM).]
  17. CELL IDENTIFICATION VIA AFFINITY

    [Figure, panels a and b: microarray and microwell images for clones c127, c136 and Y3. Plot: ratio (Ag/IgG), 0.0 to 1.0, vs. concentration (tet H-2Kb, nM), 0.001 to 1000 nM. Green = tet H-2Kb (1 nM); red = anti-mouse IgG (10 nM). Added panel: first vs. second principal component, with cells labeled by staining as C136, Y3 or C127.]
  18. AFFINITY MAP - AN ALTERNATIVE REPRESENTATION

    [Figure, panels a and b: normalized Ag/Ab ratio (a.u.), with median, vs. concentration (tet H-2Kb, nM), 0.001 to 1000 nM. Regions: antigen-specific / saturated; antigen-specific / unsaturated; low affinity / non-specific.]
  19. ISOLATE ANTIGEN-SPECIFIC CELLS

    [Figure: affinity maps for the 1x boost and 2x boost conditions across OVA concentrations of 10 pM, 100 pM, 1 nM, 10 nM and 100 nM, ordered by increasing K D. Groups: ASC (n = 66 and n = 32) and non-specific (n = 1196 and n = 708). Color scale: -1 to 1.]
  20. QUANTITATIVE PROFILE OF AN IMMUNIZATION

    [Figure: gating hierarchy for the unimmunized, 1x boost and 2x boost conditions: single cells, then BCR/B220+, then isotype+, then isotype+ Kd+, with an isotype breakdown (IgG2b, IgG2a, IgG1, IgM) and Ag+ counts. Counts for one panel: single cells 12457; BCR/B220+ 11112; isotype+ 3119; isotype+ Kd+ 2135 (IgG2b 71, IgG2a 29, IgG1 17, IgM 2018). Counts along the same hierarchy for the other two panels: 12425, 8987, 2908, 1135 and 18253, 14199, 3135, 648. Timeline: days 0 to 35, marking immunization, cells in culture/LPS, and sacrifice.]
  21. SUMMARY

    Quantitative profiles that detail the cellular origin, extent and diversity of the B cell response. Flow cytometry and immunosorbent assay data are correlated for each single cell. Expandable to cytokine profiling, T cell profiling and primary splenocytes. Allows cell retrieval.
  22. SINGLE LYMPHOCYTE STIMULATION

    It is difficult to expose single naive lymphocytes to controlled stimuli (e.g. bacteria). Capture of antigen by B cells is critical for the antibody response and has been studied by biochemical and imaging methods. Early dynamics? Quantitative?
  23. SINGLE LYMPHOCYTE "PULSE-CHASE" EXPERIMENTS

    [Diagram: a naive B cell in the region of observation under media flow, exposed to Pulse #1 (αIgM 568) and Pulse #2 (αIgM 647); concentration C(t) vs. time.]
  24. SINGLE LYMPHOCYTE "PULSE-CHASE" EXPERIMENTS

    [Diagram: a naive B cell in the region of observation under media flow, exposed to Pulse #1 (αIgM 568) and Pulse #2 (αIgM 647); concentration C(t) vs. time. Added panel: imposed concentration, theory vs. experiment, C eff / C (0 to 1) vs. time (s), 0 to 60 s.]
  25. SINGLE LYMPHOCYTE "PULSE-CHASE" EXPERIMENTS

    [Diagram: a naive B cell in the region of observation under media flow, exposed to Pulse #1 (αIgM 568) and Pulse #2 (αIgM 647); concentration C(t) vs. time. Panel: imposed concentration, theory vs. experiment, C eff / C (0 to 1) vs. time (s), 0 to 60 s. Added panel: fluorescence intensity (a.u.) of Tfn 568 and Tfn 647 vs. time (s), 0 to 300 s.]
  26. TRACKING B CELL RECEPTOR MICROCLUSTERING

    [Figure, panels a to c: image series at 0, 30 s, 60 s, 90 s, 120 s, 180 s, 210 s and 9 min after the αIgM 568 labeling pulse; normalized fluorescence intensity (0 to 1) vs. angle (0 to 2π); normalized fluorescence intensity vs. time after pulse (min), 0 to 10 min.]
  27. BCR DYNAMIC FOLLOWING REPEATED PULSES

    [Figure: normalized fluorescence intensity (0 to 1) for the αIgM 647 and αIgM 568 pulses and their overlay vs. time after pulse (min), 0 to 6 min; % colocalization vs. time after pulse (s); histograms of the number of cells vs. time after pulse (s) for αIgM 647 and αIgM 568, and vs. the difference between the 568 and 647 times (s).]
  28. MHC CLUSTERS RAPIDLY AND INDEPENDENTLY FROM BCR

    [Figure, panels a and b: normalized fluorescence intensity (0 to 1) vs. time after pulse (min), 0 to 18 min, for MHC II GFP, αIgM 568 and αIgM 647; colocalization over the same period.]
  29. SUMMARY

    Observe the response to sequential doses of ligands in primary naïve B cells. Measure the early dynamics of labeled B cell receptors. Throughput is expandable with a chip redesign.
  30. (1) MACHINE LEARNING APPLIED TO MICROBIOME DATA

    Environmental shotgun 16S rRNA sequencing allows mapping of bacterial composition. 16S rRNA phylogeny is a good approximation of microbial distribution (von Mering 2007). Gene content and phylogeny correlate well (Mueller 2011). Microbial compositional data is large and increasingly difficult to mine. Cheaper sequencing means that analysis is becoming the limiting step. Needed: routine extraction of patterns in microbial data.
  31. WHAT IS MACHINE LEARNING?

    Machine learning algorithms use example data to: learn and discover structure in datasets; classify samples into distinct categories; and, once trained on example data, predict. Machine learning algorithms are the object of extensive research, with applications in computing, finance, biology, etc.
  35. MICROBIOME DATA

    Each microbial taxon is a feature that can be used to discriminate between bacterial communities. We want to: automatically find the taxa that discriminate best; accurately classify communities according to metadata.

    Taxa  Sample1  Sample2  ...
    A     12       2
    B     1        10
    C     5        0
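To make the "each taxon is a feature" idea concrete, here is a minimal sketch that turns the small table above into one feature vector per sample. Converting counts to relative abundances is an assumption made for illustration (the slide shows raw counts), and the function name is hypothetical.

```python
# Counts from the slide's example table: taxon -> {sample: read count}.
otu_counts = {
    "A": {"Sample1": 12, "Sample2": 2},
    "B": {"Sample1": 1, "Sample2": 10},
    "C": {"Sample1": 5, "Sample2": 0},
}


def to_feature_matrix(counts):
    """Return (taxa, {sample: [relative abundance per taxon]})."""
    taxa = sorted(counts)
    samples = sorted({s for row in counts.values() for s in row})
    features = {}
    for sample in samples:
        raw = [counts[t].get(sample, 0) for t in taxa]
        total = sum(raw)
        # Assumed normalization: divide by the sample's total read count.
        features[sample] = [c / total if total else 0.0 for c in raw]
    return taxa, features


taxa, features = to_feature_matrix(otu_counts)
for sample, vector in features.items():
    print(sample, dict(zip(taxa, (round(v, 3) for v in vector))))
```

Each row of the resulting matrix is what a classifier sees for one community; the metadata (for example habitat or diagnosis) supplies the label.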
  36. PREPARING MICROBIOME DATA

    [Workflow diagram labels: 16S DNA sequence reads (e.g. AGCTGCTCGA, TAAGCTGCTCGA, AGCTGCTCGATTCTG); quality filtering and chimera check (MOTHUR); RDP classification; OTU clustering (UCLUST); representative sequences; OTU table; phylogenetic tree.]

    Taxa  Sample1  Sample2  ...
    A     12       2
    B     1        10
    C     5        0
  37. LEARNING AND CLASSIFICATION

    The samples are split into a training set and a test set.

    Taxa     A   B   C   D
    Sample1  12  1   5   0
    Sample2  2   21  5   10
    Sample3  12  11  3   2
    Sample4  1   2   0   15

  38. LEARNING AND CLASSIFICATION

    Same table, split into training set and test set. Steps: build random forest model (what are the best taxa?).

  39. LEARNING AND CLASSIFICATION

    Same table, split into training set and test set. Steps: build random forest model (what are the best taxa?); read taxa abundance from test set.

  40. LEARNING AND CLASSIFICATION

    Same table, split into training set and test set. Steps: build random forest model (what are the best taxa?); read taxa abundance from test set; predict test set.

  41. LEARNING AND CLASSIFICATION

    Same table, split into training set and test set. Steps: build random forest model (what are the best taxa?); read taxa abundance from test set; predict test set; check prediction.

  42. LEARNING AND CLASSIFICATION

    The table is shown twice, each copy with a different training set / test set split. Steps: build random forest model (what are the best taxa?); read taxa abundance from test set; predict test set; check prediction.

  43. LEARNING AND CLASSIFICATION

    The table is shown twice, each copy with a different training set / test set split. Steps: build random forest model (what are the best taxa?); read taxa abundance from test set; predict test set; check prediction; repeat the cross-validation and average.
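The split, train, predict, check, repeat and average loop described on these slides can be sketched as a small repeated k-fold harness. This is an illustration under assumptions: the classifier plugged in here is a nearest-centroid stand-in (not the random forest used in the talk), the data and the labels `habitat_1` / `habitat_2` are made up, and all function names are hypothetical.

```python
import random


def nearest_centroid_fit_predict(train_X, train_y, test_X):
    """Stand-in classifier: assign each test row to the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda lab: dist(x, centroids[lab])) for x in test_X]


def repeated_cv_error(X, y, fit_predict, k=4, repeats=10, seed=0):
    """Average test error over `repeats` random k-fold splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]      # k disjoint test sets
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            predicted = fit_predict([X[i] for i in train_idx],
                                    [y[i] for i in train_idx],
                                    [X[i] for i in test_idx])
            wrong = sum(p != y[i] for p, i in zip(predicted, test_idx))
            errors.append(wrong / len(test_idx))   # check prediction
    return sum(errors) / len(errors)               # average over all splits


# Hypothetical counts for taxa A-D in eight samples from two habitats.
X = [[12, 1, 5, 0], [11, 2, 4, 1], [13, 0, 6, 0], [10, 1, 5, 2],
     [2, 21, 5, 10], [1, 18, 3, 12], [3, 20, 4, 9], [2, 19, 6, 11]]
y = ["habitat_1"] * 4 + ["habitat_2"] * 4

cv_error = repeated_cv_error(X, y, nearest_centroid_fit_predict)
print("mean cross-validated error:", cv_error)
```

Any function with the same `(train_X, train_y, test_X)` signature, such as a random forest, can be passed in place of the stand-in.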
  44. RANDOM FOREST

    Taxa  A   B   C   D
          12  1   5   0
          2   21  5   10
          12  11  3   2

    Steps: pick a random sample (e.g. 2 21 5 10); pick mtry taxa at random (e.g. A, B); build a decision tree (e.g. splits A > 10, B < 11).

  45. RANDOM FOREST

    Taxa  A   B   C   D
          12  1   5   0
          2   21  5   10
          12  11  3   2

    Steps: pick a random sample (e.g. 2 21 5 10); pick mtry taxa at random (e.g. A, B); build a decision tree (e.g. splits A > 10, B < 11); repeat Ntree times; average / take votes.
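The recipe on these slides (random sample, mtry random taxa, one decision tree, repeat Ntree times, take votes) can be sketched in plain Python. This is a simplified illustration, not the implementation used in the talk: trees are capped at depth 2, and the mtry taxa are drawn once per tree as the slide suggests, whereas standard random forests redraw the candidate features at every split. The data, labels and function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features, depth=2):
    """Grow a small tree that may only split on the given taxa indices."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:                      # no taxon separates these rows
        return majority
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       features, depth - 1),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       features, depth - 1))


def predict_tree(tree, row):
    while isinstance(tree, tuple):        # internal node: (taxon, threshold, left, right)
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree                           # leaf: a class label


def random_forest(rows, labels, n_tree=25, mtry=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_tree):
        sample = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        features = rng.sample(range(len(rows[0])), mtry)    # mtry random taxa
        forest.append(build_tree([rows[i] for i in sample],
                                 [labels[i] for i in sample], features))
    return forest


def predict_forest(forest, row):
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]     # majority vote


# Hypothetical counts for taxa A-D in eight samples from two habitats.
rows = [[12, 1, 5, 0], [11, 2, 4, 1], [13, 0, 6, 0], [10, 1, 5, 2],
        [2, 21, 5, 10], [1, 18, 3, 12], [3, 20, 4, 9], [2, 19, 6, 11]]
labels = ["habitat_1"] * 4 + ["habitat_2"] * 4

forest = random_forest(rows, labels)
print(predict_forest(forest, [13, 0, 5, 0]), predict_forest(forest, [1, 21, 4, 12]))
```

For regression targets the final step would average the tree outputs instead of counting votes, which is the "average" half of the slide's last step.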
  46. SUPERVISED LEARNING WORKS WELL

  47. SUPERVISED LEARNING WORKS WELL

    [Figure: classification error (0.10 to 0.35) vs. OTU identity threshold (80 to 95), in separate panels for habitat and host.]

  48. SUPERVISED LEARNING WORKS WELL

    [Figure: classification error (0.10 to 0.35) vs. OTU identity threshold (80 to 95), in separate panels for habitat and host. Added panel: area under ROC curve (0.6 to 1.0) vs. number of samples (40 to 100).]

  49. SUPERVISED LEARNING WORKS WELL

    [Figure: classification error (0.10 to 0.35) vs. OTU identity threshold (80 to 95), in separate panels for habitat and host. Panel: area under ROC curve (0.6 to 1.0) vs. number of samples (40 to 100). Added panel: NO3 predictions (feature selection: significance ranking), predicted log[NO3] vs. measured log[NO3], both 0.0 to 0.6.]
  50. INCORPORATING PHYLOGENETICS

    [Figure: phylogenetic tree with decision nodes 1 to 5 and branches labeled NM, CD and UC; decisions: healthy / sick, then healthy / Crohn's / colitis; p = 0.003; nodes scored as present or absent.] Caption: Hierarchical decision tree outlining the classification of a patient as normal, Crohn's or colitis, depending on whether sequences are present at the given nodes in the phylogenetic tree. Average accuracy is 80%. Decision tree nodes are colored with respect to the hierarchical level. Tree branches are colored based on diagnosis. Bacterial groups in a normal patient are colored green; magenta for Crohn's samples and cyan for colitis samples.
  51. (2) THE CASE OF IBD

    Inflammation of autoimmune origin. Presenting symptoms: abdominal pain, diarrhea, vomiting, weight loss. No known causative agent. IBD seems to have a complex etiology: environmental (smoking, Western diet?); genetic (autophagy loci: NOD2, ATG16); microbial (correlated with some bacteria, dysbiosis?).
  52. IBD TREATMENT

    IBD alternates between flares (active) and periods of remission (inactive). Long-term immunosuppressants are used to maintain remission. Antibiotic therapy is used empirically to treat flare-ups. When medical therapy fails, treatment is bowel resection.
  53. CLASSIFICATION CAN DISTINGUISH IBD AND HEALTHY

    [Figure: ROC curves, sensitivity vs. 1 - specificity. Panel "Frank et al. survey": Frank (AUC = 0.73), Pediatric (AUC = 0.71).]
  54. CLASSIFICATION CAN DISTINGUISH IBD AND HEALTHY

    [Figure: ROC curves, sensitivity vs. 1 - specificity. Panel "Frank et al. survey": Frank (AUC = 0.73), Pediatric (AUC = 0.71).] Area under the ROC curve: the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
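The ranking definition of the area under the ROC curve given on this slide can be computed directly by comparing every positive/negative pair. The sketch below is illustrative: the scores and labels are made up, the function name is hypothetical, and counting ties as half a win is a common convention that the slide does not mention.

```python
def auc_by_ranking(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # assumed convention: a tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores; 1 = IBD, 0 = healthy.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]

auc = auc_by_ranking(scores, labels)
print(round(auc, 3))  # 8 of the 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the values reported on these slides.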
  55. CLASSIFICATION CAN DISTINGUISH IBD AND HEALTHY

    [Figure: ROC curves, sensitivity vs. 1 - specificity. Panel "Frank et al. survey": Frank (AUC = 0.73), Pediatric (AUC = 0.71). Panel "Pediatric case control": IBD (AUC = 0.83), active IBD (AUC = 0.91).] Area under the ROC curve: the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
  56. MUCOSAL AND STOOL SAMPLE HAVE SIMILAR PROFILES

    [Figure: mean difference in abundance per taxon (negative = control, positive = IBD), for two studies: Frank and Pediatric.] Taxa shown: Clostridia (class); Clostridiales (order); Firmicutes (phylum); Ruminococcaceae (family); Lachnospiraceae (family); NA (genus); Subdoligranulum (genus); Porphyromonadaceae (family); Rikenellaceae (family); Alistipes (genus); Coprococcus (genus); Streptococcaceae (family); Eubacterium (genus); Eubacteriaceae (family); Oscillibacter (genus); Odoribacter (genus); Butyricicoccus (genus); Parvimonas (genus); Incertae Sedis XIII (family); Anaerovorax (genus); Akkermansia (genus); Verrucomicrobiaceae (family); Verrucomicrobiae (class); Citrobacter (genus); Anaerotruncus (genus); Roseburia (genus); Coriobacteriales (order); Pasteurellaceae (family); Pasteurellales (order); Lactobacillaceae (family); Actinobacteria (class); Actinobacteria (phylum); Lactobacillales (order); Bacilli (class); Escherichia-Shigella (genus); Enterobacteriaceae (family); Enterobacteriales (order); Gammaproteobacteria (class); Proteobacteria (phylum).
  57. DEEPER SEQUENCING ALSO HELPED

    [Figure: median relative abundance per taxon (10^-3 to 10^-0.5), with log10(p-value) on a scale of 3 to 6.] Taxa shown: Anaerotruncus; Clostridia; Clostridiales; Coprococcus; Lachnospiraceae; Peptococcus; Ruminococcaceae; NA; Incertae Sedis XIII; Peptococcaceae; Akkermansia; Coriobacteriaceae; Coriobacteriales; Verrucomicrobia; Verrucomicrobiaceae; Verrucomicrobiae; Verrucomicrobiales; Acetivibrio; Escherichia-Shigella; Parabacteroides; Phascolarctobacterium; Ruminococcus; Collinsella; Eubacterium; Porphyromonadaceae; Sporobacter; Eubacteriaceae; Odoribacter; Enterobacteriaceae; Enterobacteriales; Anaerovorax; Oscillibacter; Butyricicoccus; Proteobacteria; Gammaproteobacteria; Subdoligranulum; Alistipes; Rikenellaceae.
  58. CHARACTERISTIC TAXA ARE ASSOCIATED WITH IBD

    [Figure: per-taxon q-value and effect size, grouped by phylum, class, order, family and genus; relative abundance (% of max) for Control, CD and UC; annotation columns for activity (control, inactive, mild, moderate, severe), antibiotics and immunosuppression.] Taxa shown: Verrucomicrobia; Proteobacteria; Verrucomicrobiae; Gammaproteobacteria; Verrucomicrobiales; Enterobacteriales; Peptococcaceae; Verrucomicrobiaceae; Enterobacteriaceae; Incertae Sedis XIII; Porphyromonadaceae; Rikenellaceae; Peptococcus; Ethanoligenens; Lawsonia; Phascolarctobacterium; Sporobacter; Anaerotruncus; Akkermansia; Butyricicoccus; Ruminococcus; Odoribacter; Escherichia-Shigella; Parabacteroides; Oscillibacter; Anaerovorax; Subdoligranulum; Alistipes.
  59. Introduction Mapping antibody responses Single lymphocyte stimulation Machine learning for

    microbiome data IBD classification Discussion CLASSIFICATION IDENTIFIES ACTIVITY LEVELS [Figure: avg. % reads per taxon across activity levels (control, inactive, mild, moderate, severe), grouped by rank (phylum, class, order, family, genus), with q-value; second panel: Shannon div. index (axis 1.0 to 2.5) for inactive, mild, moderate, severe and control. Taxa shown: Verrucomicrobia, Actinobacteria, Proteobacteria, Clostridia, Bacilli, Verrucomicrobiae, Actinobacteria, Gammaproteobacteria, Clostridiales, Bifidobacteriales, Lactobacillales, Verrucomicrobiales, Coriobacteriales, Enterobacteriales, Peptococcaceae, Lachnospiraceae, Ruminococcaceae, Incertae Sedis XIV, Bifidobacteriaceae, Peptostreptococcaceae, Verrucomicrobiaceae, Corynebacteriaceae, Incertae Sedis XIII, Coriobacteriaceae, Porphyromonadaceae, Eubacteriaceae, Enterobacteriaceae, Rikenellaceae, Roseburia, NA, Veillonella, Oribacterium, Coprococcus, Blautia, Bifidobacterium, Acetivibrio, Akkermansia, Varibaculum, Anaerotruncus, Atopobium, Phascolarctobacterium, Sporobacter, Corynebacterium, Parabacteroides, Ruminococcus, Odoribacter, Anaerostipes, Escherichia-Shigella, Collinsella, Serratia, Eubacterium, Oscillibacter, Anaerovorax, Butyricicoccus, Alistipes, Subdoligranulum.]
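    The "Shannon div. index" panel on this slide summarizes how evenly reads are spread across taxa in each sample. As a minimal sketch (not the thesis code), the index can be computed from per-taxon read counts as H = -sum(p_i * ln p_i). The function name `shannon_index` and the use of the natural logarithm are assumptions; the slide does not state the log base.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    `counts` is a sequence of per-taxon read counts for one sample.
    The natural log is an assumption; the slide does not state the base.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:  # zero-count taxa contribute nothing to the sum
            p = c / total
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    # Four equally abundant taxa give the maximum H for four taxa, ln(4).
    print(shannon_index([10, 10, 10, 10]))
    # A sample dominated by a single taxon has lower diversity.
    print(shannon_index([97, 1, 1, 1]))
```

    A community dominated by one taxon scores near zero, while an even community of k taxa scores ln(k), which is why the index is a common summary of diversity.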
  60. Introduction Mapping antibody responses Single lymphocyte stimulation Machine learning for

    microbiome data IBD classification Discussion CLASSIFICATION CAN DIFFERENTIATE CD AND UC [Figure: ROC curves (Sensitivity vs. 1 − Specificity). Panel "CD vs. UC": AUC = 0.76. Panel "One vs. all": CD (AUC = 0.68), UC (AUC = 0.82), Control (AUC = 0.83).] [Figure: log10(q-value) and avg. % reads per taxon, grouped by rank (phylum, class, order, family, genus). Taxa shown: Verrucomicrobia, Bacteroidetes, Gammaproteobacteria, Bacilli, Verrucomicrobiae, Bacteroidia, Lactobacillales, Verrucomicrobiales, Bacteroidales, Verrucomicrobiaceae, Bacteroidaceae, Eubacteriaceae, Alistipes, Akkermansia, Butyricimonas, Coprobacillus, Eggerthella, Parasutterella, Bacteroides, Eubacterium.]
  61. Introduction Mapping antibody responses Single lymphocyte stimulation Machine learning for

    microbiome data IBD classification Discussion BLIND VALIDATION CONFIRMS ACCURACY OF THE MODEL [Figure: ROC curves (Sensitivity vs. Specificity) for classification between IBD and non-IBD. Pediatric case control: AUC = 0.848. IBD (AUC = 0.83); active IBD (AUC = 0.91). Test set (n=68); training set (n=91).]
  62. Introduction Mapping antibody responses Single lymphocyte stimulation Machine learning for

    microbiome data IBD classification Discussion CLASSIFICATION IS BETTER THAN FECAL CALPROTECTIN [Figure: ROC curves (Sensitivity vs. 1 − Specificity). Panel "all IBD vs. control (training & validation)": calprotectin (AUC = 0.77), SLiME (AUC = 0.85). Panel "CD vs. UC classification (training & validation)": calprotectin (AUC = 0.50), SLiME (AUC = 0.69).]
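    The AUC values quoted on these slides can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes AUC directly from that pairwise definition, which is equivalent to the area under the ROC curve. It illustrates the metric only and is not the SLiME implementation; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for cases (e.g. IBD) and 0 for controls;
    `scores` holds the classifier output or biomarker level per sample.
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: two cases and two controls.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(labels, scores))  # 3 of 4 pairs are ranked correctly
```

    Under this reading, an AUC of 0.50 (calprotectin for CD vs. UC) means the marker ranks pairs no better than chance.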
  63. Introduction Mapping antibody responses Single lymphocyte stimulation Machine learning for

    microbiome data IBD classification Discussion SUMMARY Classification can accurately distinguish healthy individuals from IBD patients. Patients can be stratified according to disease activity. Identified novel taxa associated with IBD and remission. Validated blindly and against fecal calprotectin measurements. Careful statistical design should be the first step in larger studies.
  64. Introduction Mapping antibody responses Single lymphocyte stimulation Machine learning for

    microbiome data IBD classification Discussion HOW DOES IT FIT INTO IBD PRACTICE? Clinical feasibility will depend on the shrinking cost of sequencing. [Flowchart steps: Primary care screen; Gastroenterologist review; Serology; Physician assessment; Diagnosis suspected?; Endoscopy; Definitive diagnosis.]
  65. Introduction Mapping antibody responses Single lymphocyte stimulation Machine learning for

    microbiome data IBD classification Discussion HOW DOES IT FIT INTO IBD PRACTICE? Clinical feasibility will depend on the shrinking cost of sequencing. [Flowchart steps: Primary care screen; Gastroenterologist review; Serology; Physician assessment; Diagnosis suspected?; Endoscopy; Definitive diagnosis; with fecal biomarkers & microbiome mapping added.]
  67. Introduction Mapping antibody responses Single lymphocyte stimulation Machine learning for

    microbiome data IBD classification Discussion MACHINE LEARNING DEVELOPMENTS While working on this application, other uses of machine learning techniques for microbiome data appeared: detecting mislabeled sequence samples (Knights 2010); tracking the source of microbial contamination (Knights 2011); predicting response to diet in gnotobiotic mice (Faith 2011); wastewater bioreactors (Werner 2011).
  68. Introduction Mapping antibody responses Single lymphocyte stimulation Machine learning for

    microbiome data IBD classification Discussion MOVING FORWARD Adapt machine learning methods to use additional data. Integrate microbiome tools and immune tools. Augment microbiome datasets with immune variables. [Figure from Knights et al., Cell Host & Microbe (2011), Commentary: "Figure 1. Processes for Microbial Signature Discovery. The process begins with the collection of a large set of sequencing data from various bacterial communities associated with different environments or different host phenotypes. These sequences can serve directly as input to a machine-learning algorithm, or they can be transformed through a preprocessing step (data transformation). Although for microbial community analysis data transformation and supervised learning are typically performed as separate steps, we suggest that predictive models will be improved by the development of novel machine-learning techniques that are informed by the potential data transformations. For example, constructing a good predictive model using metabolic characterizations of metagenomics sequences might be easier if the algorithm has knowledge of the hierarchical relationships between metabolic functions. In the case of marker-gene surveys, a machine-learning algorithm may benefit from knowledge of the phylogenetic relationships of the observed lineages, or the network of average nucleotide similarities between the input sequences. These structures may allow models to share statistical strength across related independent variables in cases where there is high variability within a given environment or host phenotype (i.e., lack of a 'core microbiome')."]
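    One simple way to give a learning algorithm some knowledge of hierarchical relationships, as the caption above suggests, is to present abundances at every taxonomic rank rather than at genus level only. The sketch below rolls genus-level read counts up to phylum, class, order and family and returns relative abundances keyed by (rank, name). It is an illustration under stated assumptions, not the method used in the thesis or in Knights et al.; the function name `rollup_abundances`, the input layout and the toy counts are hypothetical.

```python
from collections import defaultdict

# Ranks in the order used on the slides.
RANKS = ("phylum", "class", "order", "family", "genus")


def rollup_abundances(lineage_counts):
    """Aggregate read counts to every rank of each lineage.

    `lineage_counts` maps a lineage tuple (phylum, class, order, family,
    genus) to a read count. Unclassified ranks may be given as None.
    Returns {(rank, name): relative abundance} for one sample.
    """
    total = sum(lineage_counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    features = defaultdict(float)
    for lineage, count in lineage_counts.items():
        for rank, name in zip(RANKS, lineage):
            if name is not None:  # skip ranks without a classification
                features[(rank, name)] += count / total
    return dict(features)


if __name__ == "__main__":
    # Hypothetical toy sample with 100 reads.
    sample = {
        ("Proteobacteria", "Gammaproteobacteria", "Enterobacteriales",
         "Enterobacteriaceae", "Escherichia-Shigella"): 30,
        ("Proteobacteria", "Gammaproteobacteria", "Enterobacteriales",
         "Enterobacteriaceae", "Citrobacter"): 10,
        ("Verrucomicrobia", "Verrucomicrobiae", "Verrucomicrobiales",
         "Verrucomicrobiaceae", "Akkermansia"): 60,
    }
    for key, value in sorted(rollup_abundances(sample).items()):
        print(key, round(value, 3))
```

    Related genera then share a family-level or phylum-level feature, so a signal spread thinly across several genera can still be picked up at a higher rank.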
  69. Eric Alm & the Alm lab; Chris Love & the

    Love lab; Ploegh lab; Athos Bousvaros & Dirk Gevers; Lynn Bry. Funding: HST, Poitras & NSERC. THANK YOU! Any questions?