Slide 1

BAYESIAN SENTIMENT ANALYSIS WITH RUBY

Slide 2

BAYES’ THEOREM

Slide 3

BAYES’ THEOREM

Posterior: the probability of our hypothesis being true, given the data collected

Slide 4

BAYES’ THEOREM

Posterior: the probability of our hypothesis being true, given the data collected
Likelihood: the probability of collecting this data when our hypothesis is true

Slide 5

BAYES’ THEOREM

Posterior: the probability of our hypothesis being true, given the data collected
Likelihood: the probability of collecting this data when our hypothesis is true
Prior: the probability of the hypothesis being true before collecting any data

Slide 6

BAYES’ THEOREM

Posterior: the probability of our hypothesis being true, given the data collected
Likelihood: the probability of collecting this data when our hypothesis is true
Prior: the probability of the hypothesis being true before collecting any data
Evidence: the probability of collecting this data under all possible hypotheses
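
Written out in the deck’s own notation, the theorem combines these four pieces:

P(hypothesis | data) = P(data | hypothesis) × P(hypothesis) / P(data)
posterior = likelihood × prior / evidence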

Slide 7

BAYES’ THEOREM ON TWEETS

P(positive | tweet) = P(tweet | positive) × P(positive) / P(tweet)

Slide 8

THE MATH

P(positive | tweet) = P(tweet | positive) × P(positive) / P(tweet)

Slide 9

THE MATH

P(positive | tweet) = P(tweet | positive) × P(positive) / P(tweet)

tweet = “I really liked this movie”

P(x) = the probability of x occurring.
P(tweet) = a constant, so we’ll disregard it for now.
P(positive) = 0.5. Since we only have two classes (negative and positive), there is a 50% chance that something is positive.
P(tweet | positive) = the number of times the words in the tweet appear in tweets marked as positive in the training set, divided by the total number of words in tweets marked as positive in the training set.

HOW DO WE GET THIS?

Slide 10

THE MATH

P(tweet | positive) = the number of times the words in the tweet appear in tweets marked as positive in the training set, divided by the total number of words in tweets marked as positive in the training set.

tweet = “I really liked this movie”

Number of times ‘liked’ occurs in positive tweets = 1623
Number of words in all positive tweets = 2342
1623 / 2342 = 0.693

69.3% probability that a tweet containing ‘liked’ is a positive tweet.
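
As a quick Ruby sketch of that ratio (variable names are illustrative; the counts come from this slide):

liked_in_positive   = 1623  # times 'liked' appears in positive training tweets
positive_word_count = 2342  # total words across all positive training tweets
likelihood = liked_in_positive.fdiv(positive_word_count)
puts likelihood.round(3)    # => 0.693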

Slide 11

THE MATH

P(positive | tweet) = P(tweet | positive) × P(positive) / P(tweet)

tweet = “I really liked this movie”

P(x) = the probability of x occurring.
P(tweet) = a constant, so we’ll disregard it for now.
P(positive) = 0.5. Since we only have two classes (negative and positive), there is a 50% chance that something is positive.
P(tweet | positive) = 0.693. The number of times the words in the tweet appear in tweets marked as positive in the training set, divided by the total number of words in tweets marked as positive in the training set.

P(positive | tweet) = 0.693 × 0.5

Slide 12

THE MATH

P(positive | tweet) = P(tweet | positive) × P(positive) / P(tweet)

tweet = “I really liked this movie”

P(x) = the probability of x occurring.
P(tweet) = a constant, so we’ll disregard it for now.
P(positive) = 0.5. Since we only have two classes (negative and positive), there is a 50% chance that something is positive.
P(tweet | positive) = 0.693. The number of times the words in the tweet appear in tweets marked as positive in the training set, divided by the total number of words in tweets marked as positive in the training set.

0.347 = 0.693 × 0.5

34.7% probability that the tweet is positive.
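
In Ruby, that final score is just the product of the two numbers (a sketch using the values above; P(tweet) is still dropped, so the result is proportional to the true posterior rather than equal to it):

likelihood = 0.693      # P(tweet | positive)
prior      = 0.5        # P(positive)
score      = likelihood * prior
puts score.round(3)     # => 0.347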

Slide 13

WHAT ABOUT RUBY?

Slide 14

BAYES’ IN RUBY

Slide 15

BAYES’ IN RUBY

Slide 16

BAYES’ IN RUBY
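
A minimal sketch of the classifier the deck builds up to, assuming a simple word-count model with crude add-one smoothing (the class and method names here are illustrative, not the original slide code):

class NaiveBayes
  def initialize
    @word_counts = { positive: Hash.new(0), negative: Hash.new(0) }
    @total_words = Hash.new(0)
  end

  # Count each word of a labeled example toward its class.
  def train(label, text)
    text.downcase.split.each do |word|
      @word_counts[label][word] += 1
      @total_words[label] += 1
    end
  end

  # Score each class as prior * product of per-word likelihoods,
  # then return the label with the higher score.
  def classify(text)
    scores = @word_counts.keys.map do |label|
      prior = 0.5 # two equally likely classes, as on the earlier slides
      likelihood = text.downcase.split.reduce(1.0) do |product, word|
        # Add one to the counts so unseen words don't zero out the product.
        product * (@word_counts[label][word] + 1).fdiv(@total_words[label] + 1)
      end
      [label, prior * likelihood]
    end
    scores.max_by { |_label, score| score }.first
  end
end

classifier = NaiveBayes.new
classifier.train(:positive, "I really liked this movie")
classifier.train(:negative, "I really hated this movie")
puts classifier.classify("liked it") # => positive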

Slide 17

Thanks for coming!