Bayesian classifier (Bayes’ rule) that makes a simplifying (naive) assumption about how the features interact. • Relies on very simple representation of document: • Bag of words Take for example 2 text samples: D1: The quick brown fox jumps over the lazy dog D2: Never jump over the lazy dog quickly The text samples then form a dictionary: Vectors are then formed to represent the count of each word: D1: [1,1,1,0,1,1,0,1,1,0,2] D2: [0,1,0,1,0,1,1,1,0,1,1] { 'brown': 0, 'dog': 1, 'fox': 2, 'jump': 3, 'jumps': 4, 'lazy': 5, 'never': 6, 'over': 7, 'quick': 8, 'quickly': 9, 'the': 10, }