Information Retrieval and Text Mining 2020 - Text Classification: Naive Bayes

University of Stavanger, DAT640, 2020 fall


Krisztian Balog

September 07, 2020

Transcript

  1. Text Classification: Naive Bayes [DAT640]
     Information Retrieval and Text Mining
     Krisztian Balog, University of Stavanger
     September 7, 2020
     CC BY 4.0
  2. Recap
     • Text classification
       ◦ Problem, binary and multiclass variants
       ◦ Evaluation measures
       ◦ Training text classifiers using words (terms) as features
       ◦ Term weighting (TF-IDF)
       ◦ Text preprocessing (tokenization, stopword removal, stemming)
  3. Today
     • A simple classifier, Naive Bayes, that is commonly applied to text classification
  4. Naive Bayes
     • Example of a generative classifier
     • Estimating the probability of document x belonging to class y:
       P(y|x) = P(x|y) P(y) / P(x)
     • P(x|y) is the class-conditional probability
     • P(y) is the prior probability
     • P(x) is the evidence (note: it is the same for all classes)
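A minimal numeric sketch (the probabilities below are made up for illustration, not taken from the slides) of why the evidence P(x) can be ignored when picking the most likely class:

```python
# Hypothetical values for a single document x and two classes.
p_x_given_y = {"spam": 0.002, "ham": 0.0005}  # assumed class-conditional probabilities P(x|y)
p_y = {"spam": 0.3, "ham": 0.7}               # assumed prior probabilities P(y)

# Unnormalized posteriors: P(y|x) is proportional to P(x|y) * P(y).
scores = {y: p_x_given_y[y] * p_y[y] for y in p_y}

# Normalizing by P(x) (the same constant for every class) does not change the argmax.
p_x = sum(scores.values())
posteriors = {y: s / p_x for y, s in scores.items()}

print(max(scores, key=scores.get))          # spam
print(max(posteriors, key=posteriors.get))  # spam (same class)
```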
  5. Naive Bayes classifier
     • Estimating the class-conditional probability P(x|y)
       ◦ x is a vector of term frequencies {x_1, ..., x_n}:
         P(x|y) = P(x_1, ..., x_n | y)
     • "Naive" assumption: features (terms) are independent:
       P(x|y) = ∏_{i=1}^{n} P(x_i|y)
     • Putting our choices together, the probability that x belongs to class y is estimated using:
       P(y|x) ∝ P(y) ∏_{i=1}^{n} P(x_i|y)
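A short sketch of the resulting decision rule. The `prior` and `cond_prob` dictionaries are assumed to be estimated beforehand (how to do that is the topic of the next slides); the names are illustrative, not from the course code:

```python
from collections import Counter

def predict(doc_terms, classes, prior, cond_prob):
    """Return argmax_y P(y) * prod_i P(x_i|y) for a tokenized document.

    prior:     dict mapping class label y -> P(y)
    cond_prob: dict mapping (term, y) -> P(x_i|y)  (assumed precomputed)
    """
    tf = Counter(doc_terms)  # term frequencies x_1, ..., x_n
    best_class, best_score = None, -1.0
    for y in classes:
        score = prior[y]
        for term, freq in tf.items():
            # Each occurrence of a term contributes one factor P(x_i|y).
            score *= cond_prob.get((term, y), 0.0) ** freq
        if score > best_score:
            best_class, best_score = y, score
    return best_class
```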
  6. Estimating prior class probabilities
     • P(y) is the probability of each class label
     • It is essential when class labels are imbalanced
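One common way to estimate P(y) (a sketch; the slide only defines the quantity) is the fraction of training documents carrying each label:

```python
from collections import Counter

def estimate_priors(train_labels):
    """P(y) = (number of training documents labeled y) / (total number of training documents)."""
    counts = Counter(train_labels)
    total = len(train_labels)
    return {y: c / total for y, c in counts.items()}

# estimate_priors(["spam", "ham", "ham", "ham"]) -> {"spam": 0.25, "ham": 0.75}
```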
  7. Estimating the feature distribution
     • How to estimate P(x_i|y)?
     • Maximum likelihood estimation: the number of times a term occurs in a class, divided by its total number of occurrences:
       P(x_i|y) = c_{i,y} / c_i
       ◦ c_{i,y} is the number of times term x_i appears in class y
       ◦ c_i is the total number of times term x_i appears in the collection
     • But what happens if c_{i,y} is zero?!
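A sketch of the maximum likelihood estimate exactly as defined on the slide, i.e. P(x_i|y) = c_{i,y} / c_i. Here `train_docs` is an assumed list of (tokens, label) pairs:

```python
from collections import Counter, defaultdict

def estimate_cond_probs(train_docs):
    """P(x_i|y) = c_{i,y} / c_i, which is zero whenever a term never occurs in class y."""
    c_iy = defaultdict(Counter)  # c_iy[y][term]: occurrences of term in class y
    c_i = Counter()              # c_i[term]:     occurrences of term in the whole collection
    for tokens, label in train_docs:
        c_iy[label].update(tokens)
        c_i.update(tokens)
    return {(term, y): c_iy[y][term] / c_i[term]
            for y in c_iy for term in c_i}
```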
  8. Smoothing
     • Ensure that P(x_i|y) is never zero
     • Simplest solution: Laplace ("add one") smoothing
       P(x_i|y) = (c_{i,y} + 1) / (c_i + m)
       ◦ m is the number of classes
     (More advanced smoothing methods will follow later, for Language Modeling.)
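The smoothed estimate, keeping the slide's definitions of c_{i,y}, c_i, and m (a sketch, not the course's reference implementation):

```python
def smoothed_cond_prob(c_iy, c_i, m):
    """Laplace ("add one") smoothing: P(x_i|y) = (c_{i,y} + 1) / (c_i + m)."""
    return (c_iy + 1) / (c_i + m)

# A term that never occurs in class y still gets a small non-zero probability:
# smoothed_cond_prob(0, c_i=10, m=2) == 1 / 12
```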
  9. Practical considerations
     • In practice, probabilities are small, and multiplying them may result in numerical underflow
     • Instead, we perform the computations in the log domain:
       log P(y|x) ∝ log P(y) + ∑_{i=1}^{n} log P(x_i|y)
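A sketch of the log-domain scoring rule, assuming smoothed (non-zero) probabilities so that every log argument is positive; variable names are illustrative:

```python
import math
from collections import Counter

def log_score(doc_terms, y, prior, cond_prob):
    """log P(y) + sum_i log P(x_i|y); higher is better, and long documents no longer underflow."""
    score = math.log(prior[y])
    for term, freq in Counter(doc_terms).items():
        score += freq * math.log(cond_prob[(term, y)])  # requires smoothed, non-zero probabilities
    return score

# Classify by taking the class with the highest log score, e.g.:
# best = max(classes, key=lambda y: log_score(tokens, y, prior, cond_prob))
```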