Information Retrieval and Text Mining 2020 - Text Classification: Naive Bayes

University of Stavanger, DAT640, 2020 fall


Krisztian Balog

September 07, 2020

Transcript

  1. Text Classification: Naive Bayes [DAT640]
     Information Retrieval and Text Mining
     Krisztian Balog, University of Stavanger
     September 7, 2020
     CC BY 4.0
  2. Recap
     • Text classification
       ◦ Problem, binary and multiclass variants
       ◦ Evaluation measures
       ◦ Training text classifiers using words (terms) as features
       ◦ Term weighting (TF-IDF)
       ◦ Text preprocessing (tokenization, stopword removal, stemming)
  3. Today
     • A simple classifier, Naive Bayes, that is commonly applied to text classification
  4. Naive Bayes
     • Example of a generative classifier
     • Estimating the probability of document x belonging to class y:
       P(y|x) = P(x|y) P(y) / P(x)
     • P(x|y) is the class-conditional probability
     • P(y) is the prior probability
     • P(x) is the evidence (note: it is the same for all classes)
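A minimal numeric sketch (the probabilities below are made up for illustration, not taken from the slides) of why the evidence P(x) can be ignored when picking the most likely class:

```python
# Hypothetical values for a single document x and two classes.
p_x_given_y = {"spam": 0.002, "ham": 0.0005}  # assumed class-conditional probabilities P(x|y)
p_y = {"spam": 0.3, "ham": 0.7}               # assumed prior probabilities P(y)

# Unnormalized posteriors: P(y|x) is proportional to P(x|y) * P(y).
scores = {y: p_x_given_y[y] * p_y[y] for y in p_y}

# Normalizing by P(x) (the same constant for every class) does not change the argmax.
p_x = sum(scores.values())
posteriors = {y: s / p_x for y, s in scores.items()}

print(max(scores, key=scores.get))          # spam
print(max(posteriors, key=posteriors.get))  # spam (same class)
```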
  5. Naive Bayes classifier
     • Estimating the class-conditional probability P(x|y)
       ◦ x is a vector of term frequencies {x_1, ..., x_n}:
         P(x|y) = P(x_1, ..., x_n | y)
     • "Naive" assumption: features (terms) are independent:
       P(x|y) = ∏_{i=1}^{n} P(x_i|y)
     • Putting our choices together, the probability that x belongs to class y is estimated using:
       P(y|x) ∝ P(y) ∏_{i=1}^{n} P(x_i|y)
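A short sketch of the resulting decision rule. The `prior` and `cond_prob` dictionaries are assumed to be estimated beforehand (how to do that is the topic of the next slides); the names are illustrative, not from the course code:

```python
from collections import Counter

def predict(doc_terms, classes, prior, cond_prob):
    """Return argmax_y P(y) * prod_i P(x_i|y) for a tokenized document.

    prior:     dict mapping class label y -> P(y)
    cond_prob: dict mapping (term, y) -> P(x_i|y)  (assumed precomputed)
    """
    tf = Counter(doc_terms)  # term frequencies x_1, ..., x_n
    best_class, best_score = None, -1.0
    for y in classes:
        score = prior[y]
        for term, freq in tf.items():
            # Each occurrence of a term contributes one factor P(x_i|y).
            score *= cond_prob.get((term, y), 0.0) ** freq
        if score > best_score:
            best_class, best_score = y, score
    return best_class
```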
  6. Estimating prior class probabilities
     • P(y) is the probability of each class label
     • It is essential when class labels are imbalanced
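One common way to estimate P(y) (a sketch; the slide only defines the quantity) is the fraction of training documents carrying each label:

```python
from collections import Counter

def estimate_priors(train_labels):
    """P(y) = (number of training documents labeled y) / (total number of training documents)."""
    counts = Counter(train_labels)
    total = len(train_labels)
    return {y: c / total for y, c in counts.items()}

# estimate_priors(["spam", "ham", "ham", "ham"]) -> {"spam": 0.25, "ham": 0.75}
```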
  7. Estimating the feature distribution
     • How to estimate P(x_i|y)?
     • Maximum likelihood estimation: the number of times a term occurs in a class, divided by its total number of occurrences:
       P(x_i|y) = c_{i,y} / c_i
       ◦ c_{i,y} is the number of times term x_i appears in class y
       ◦ c_i is the total number of times term x_i appears in the collection
     • But what happens if c_{i,y} is zero?!
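A sketch of the maximum likelihood estimate exactly as defined on the slide, i.e. P(x_i|y) = c_{i,y} / c_i. Here `train_docs` is an assumed list of (tokens, label) pairs:

```python
from collections import Counter, defaultdict

def estimate_cond_probs(train_docs):
    """P(x_i|y) = c_{i,y} / c_i, which is zero whenever a term never occurs in class y."""
    c_iy = defaultdict(Counter)  # c_iy[y][term]: occurrences of term in class y
    c_i = Counter()              # c_i[term]:     occurrences of term in the whole collection
    for tokens, label in train_docs:
        c_iy[label].update(tokens)
        c_i.update(tokens)
    return {(term, y): c_iy[y][term] / c_i[term]
            for y in c_iy for term in c_i}
```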
  8. Smoothing
     • Ensure that P(x_i|y) is never zero
     • Simplest solution: Laplace ("add one") smoothing
       P(x_i|y) = (c_{i,y} + 1) / (c_i + m)
       ◦ m is the number of classes
     (More advanced smoothing methods will follow later, for Language Modeling.)
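The smoothed estimate, keeping the slide's definitions of c_{i,y}, c_i, and m (a sketch, not the course's reference implementation):

```python
def smoothed_cond_prob(c_iy, c_i, m):
    """Laplace ("add one") smoothing: P(x_i|y) = (c_{i,y} + 1) / (c_i + m)."""
    return (c_iy + 1) / (c_i + m)

# A term that never occurs in class y still gets a small non-zero probability:
# smoothed_cond_prob(0, c_i=10, m=2) == 1 / 12
```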
  9. Practical considerations
     • In practice, probabilities are small, and multiplying them may result in numerical underflow
     • Instead, we perform the computations in the log domain:
       log P(y|x) ∝ log P(y) + ∑_{i=1}^{n} log P(x_i|y)
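A sketch of the log-domain scoring rule, assuming smoothed (non-zero) probabilities so that every log argument is positive; variable names are illustrative:

```python
import math
from collections import Counter

def log_score(doc_terms, y, prior, cond_prob):
    """log P(y) + sum_i log P(x_i|y); higher is better, and long documents no longer underflow."""
    score = math.log(prior[y])
    for term, freq in Counter(doc_terms).items():
        score += freq * math.log(cond_prob[(term, y)])  # requires smoothed, non-zero probabilities
    return score

# Classify by taking the class with the highest log score, e.g.:
# best = max(classes, key=lambda y: log_score(tokens, y, prior, cond_prob))
```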