Talks #6 - Mihai Pricochi - Opinion mining

Opinion mining, also known as sentiment analysis, is a "branch" of natural language processing. Its main goal is to determine the subjective information in a text, hence the opinion/sentiment in the title, for example the polarity of a document or review, which is the part I will focus on.

Talks by Softbinator

March 28, 2013

Transcript

  1. What is opinion mining? • Application of natural language processing,

    computational linguistics and text analytics • Aims to determine the attitude of a speaker/writer, or the polarity of a document
  2. What can it do? • polarity classification • “beyond polarity”

    classification such as emotion states (angry, sad, happy) • subjectivity/objectivity identification • feature/aspect-based sentiment analysis
  3. What tools does it use? • Scaling systems • Latent

    semantic analysis • Support vector machines • “bag of words”
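To make the "bag of words" idea above concrete, here is a minimal sketch: a text is reduced to unordered word counts, discarding word order entirely. The toy reviews and variable names are invented for illustration, not taken from the talk.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a text as unordered word counts, discarding word order."""
    return Counter(text.lower().split())

# Invented toy reviews, for illustration only
positive = bag_of_words("A great movie with a great cast")
negative = bag_of_words("A dull movie with a dull plot")

print(positive["great"])  # 2
print(negative["movie"])  # 1
```

These count vectors are what a classifier such as a support vector machine would then be trained on.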
  4. Contrast with standard fact-based textual analysis • Different from text

    categorization, which can have many classes; opinion mining has relatively few (positive/negative, 3/5 stars). • The extracted information is different. • Templates for opinion-oriented information extraction often generalize well across different domains. • Compared to topic, sentiment can often be expressed in a more subtle manner.
  5. Degrees of positivity • Let’s talk about phones • Does

    “long battery life” sound positive?
  6. Degrees of positivity • Let’s talk about phones • Does

    “long battery life” sound positive? • How about “the battery lasts 8 hours”?
  7. Degrees of positivity • Let’s talk about phones • Does

    “long battery life” sound positive? • How about “the battery lasts 8 hours”? • How about “the battery lasts only 8 hours”?
  8. Degrees of positivity • Let’s talk about phones • Does

    “long battery life” sound positive? • How about “the battery lasts 8 hours”? • How about “the battery lasts only 8 hours”? • Is only the key word here?
  9. Degrees of positivity • Let’s talk about phones • Does

    “long battery life” sound positive? • How about “the battery lasts 8 hours”? • How about “the battery lasts only 8 hours”? • Is only the key word here? • How about “the phone weighs 125 grams” versus “the phone weighs only 125 grams”?
  10. Parts of speech • Adjectives seem very important. Are they

    the most important? • Nouns are important too (“this movie is a gem”)
  11. Parts of speech • Adjectives seem very important. Are they

    the most important? • Nouns are important too (“this movie is a gem”) • Let’s not forget verbs (“I love this movie”)
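A hedged sketch of the point on this slide: sentiment-bearing words are not only adjectives. The tiny lexicon below is hand-made for illustration; a real system would use a POS tagger and a lexicon learned from data.

```python
# Hand-made illustrative lexicon: word -> (part of speech, polarity).
# In a real system both would come from a tagger and training data.
LEXICON = {
    "gem": ("noun", +1),
    "love": ("verb", +1),
    "great": ("adjective", +1),
    "boring": ("adjective", -1),
}

def lexicon_score(text):
    """Sum the polarity of every known sentiment-bearing word."""
    return sum(LEXICON[w][1] for w in text.lower().split() if w in LEXICON)

print(lexicon_score("this movie is a gem"))  # 1  (a noun carries the sentiment)
print(lexicon_score("i love this movie"))    # 1  (a verb carries the sentiment)
```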
  12. Negation • “I like this book” and “I don’t like

    this book” may sound very similar to a computer • But negation doesn’t necessarily mean reversal of polarity, for example “No wonder this is considered one of the best”
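One common way to handle negation in a bag-of-words setting is to mark the tokens that follow a negator, so that “like” and “like_NEG” become distinct features. A minimal sketch; the negator list, the crude clause boundary, and the “no wonder” exception are illustrative assumptions, not the method described in the talk.

```python
NEGATORS = {"not", "no", "never", "don't", "doesn't"}
EXCEPTIONS = {("no", "wonder")}  # "No wonder this is one of the best" stays positive

def mark_negation(tokens):
    """Append _NEG to tokens after a negator, until a clause break."""
    out, negate = [], False
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok in NEGATORS and (tok, nxt) not in EXCEPTIONS:
            negate = True
            out.append(tok)
        elif tok in {".", ",", "but"}:  # crude clause boundary
            negate = False
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negate else tok)
    return out

print(mark_negation("i don't like this book".split()))
# ['i', "don't", 'like_NEG', 'this_NEG', 'book_NEG']
```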
  13. Text structure • It has been noticed that, most of the time,

    the last sentence of a review serves as a conclusion to the entire text • This may completely change the polarity of the review in some cases • “The lighting was bad, the music was out of sync with the events… But in the end I think it was a very good movie”
  14. Dataset • Polarity dataset v2.0 – Movie Review Data •

    1000 positive and 1000 negative processed reviews. Introduced in Pang/Lee ACL 2004. Released June 2004.
  15. How good is good enough? • How does 70% accuracy

    sound? • The inter-rater reliability is around 79% • So even 100% accuracy on the test datasets would mean human raters would disagree with the results about 20% of the time in real-life situations
  16. Best result so far? • 90.2% • Using: • Bag

    of Words – relative frequencies of all words in the text • Appraisal Groups by Attitude & Orientation – total frequency of appraisal groups with each possible combination of Attitude and Orientation, normalized by the total number of appraisal groups in the text.
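The relative-frequency feature mentioned on this slide can be sketched as follows (the example text is illustrative, not from the dataset):

```python
from collections import Counter

def relative_frequencies(text):
    """Each word's count divided by the total number of words."""
    words = text.lower().split()
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

freqs = relative_frequencies("a great movie a great story")
print(freqs["great"])  # 2 of 6 words -> 0.333...
```

Normalizing by length lets short and long reviews be compared on the same scale.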
  17. Sounds complicated? How about something simpler? • How about a

    system that requires little to no pre-processing?
  18. Sounds complicated? How about something simpler? • How about a

    system that requires little to no pre-processing? • How about a system that is language independent?
  19. Sounds complicated? How about something simpler? • How about a

    system that requires little to no pre-processing? • How about a system that is language independent? • How about a character-based analysis system?
  20. How would it work? (short version) • We find all

    character n-grams (groups of n characters) in all the texts • We compare all texts with each other based on the n-grams they have in common and we get a score for each one • We then apply a string kernel on the results and see if it works
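The short version above can be sketched roughly as follows. A cosine-normalized dot product over character n-gram counts (a “spectrum” string kernel) is one common choice; the talk does not say which kernel was actually used, so treat that part as an assumption.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n):
    """All overlapping character n-grams, spaces and punctuation included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def spectrum_similarity(a, b, n):
    """Dot product of n-gram count vectors, cosine-normalized to [0, 1]."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(ga[g] * gb[g] for g in ga)
    norm = sqrt(sum(v * v for v in ga.values())) * sqrt(sum(v * v for v in gb.values()))
    return dot / norm

# A text compared with itself scores ~1.0; unrelated texts score near 0.
print(round(spectrum_similarity("long battery life", "long battery life", 3), 6))  # 1.0
```

Computing this similarity for every pair of texts yields the kernel matrix that a kernel classifier is then trained on.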
  21. Does it work? • If 87% in cross-validation sounds good enough,

    then yes, it does • And according to what was said earlier it should be more than enough • It’s not the best system, but at 87% accuracy it’s one of the best • And compared to other systems it’s much easier to implement and should work just as well on other training datasets, including ones written in other languages (something most systems don’t support without prior data collection)
  22. But why does it work? • The top score was

    obtained for 9-grams, that is, groups of 9 characters (including spaces and punctuation) • 9 characters means groups of about 2 words (in English), and these seem to be enough to extract information related to polarity in movie reviews • 7-, 8-, and 10-grams also achieved over 86% accuracy, and different n’s should work better or worse on different languages, depending mostly on the average word length in each language
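To illustrate why a 9-character window roughly corresponds to two English words, here are the windows over one of the example sentences used earlier in the talk:

```python
text = "the battery lasts only 8 hours"
n = 9

# Every overlapping window of 9 characters, spaces included
grams = [text[i:i + n] for i in range(len(text) - n + 1)]

print(grams[0])    # 'the batte'  -- spans two adjacent words
print(grams[18])   # 'only 8 ho'  -- even three short tokens fit
print(len(grams))  # 22 windows in a 30-character sentence
```

Because the windows straddle word boundaries, they implicitly capture short word pairs like “only 8” without any tokenization, which is why the approach needs so little pre-processing.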