What is cyberbullying? A Natural Language Approach to Detecting Cyberbullying

Lorena Mesa ACT-W @ Chicago http://bit.ly/2irmY5V

"We have been in touch with Ms. McGowan's team," Twitter said in a tweet on Thursday. "We want to explain that her account was temporarily locked because one of her Tweets included a private phone number, which violates our Terms of Service." Source: CNN

Since no single human or computer can sift through all social media and online communication, what do we need to care about when it comes to the margins of error of our statistical models?

Hi, I’m Lorena Mesa.

How I’ll approach today’s talk:
1. What is machine learning?
2. Where does classification fit into machine learning?
3. How can I use Python to solve a classification problem?
4. An example of Python in action: classifying a comment as cyberbullying.


What is cyberbullying? Carrie Brown-Breitwieser, PhD, specializes in youth psychiatry and behavioral health at Sanford Health.

“The math-powered applications powering the data economy were based on choices made by fallible human beings. Some of these choices were no doubt made with the best intentions. Nevertheless, many of these models encoded human prejudice, misunderstanding, and bias into the software systems that increasingly managed our lives. Like gods, these mathematical models were opaque, their workings invisible to all but the highest priests in their domain: mathematicians and computer scientists. Their verdicts, even when wrong or harmful, were beyond dispute or appeal. And they tended to punish the poor and the oppressed in our society, while making the rich richer.” ― Cathy O'Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy

Machine learning is a subfield of computer science [that] stud[ies] pattern recognition and computational learning [in] artificial intelligence. [It] explores the construction and study of algorithms that can learn from and make predictions on data.

Put another way: "A computer program is said to learn from experience (E) with respect to some task (T) and some performance measure (P), if its performance on T, as measured by P, improves with experience E." (Tom Mitchell, Machine Learning, Ch. 1)

Human Experience

Recorded Experience

Classification in Machine Learning

What is the question we want to answer?
- Identifying the offender: Is the responding party a cyberbully?
- Identifying instances of cyberbullying:
  - What is the likelihood a conversation is cyberbullying?
  - Is the conversation aggressive? Is it not?
  - At what level is a conversation deemed cyberbullying?

Task: Classify a text comment. Is a comment an instance of cyberbullying?

Experience: Labeled training data
Comment_id | No
Comment_id | Yes

Performance Measurement: Is the label correct? Verify whether the label predicted for each comment matches its true label.
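The performance measure on this slide ("is the label correct?") can be sketched as plain accuracy: the fraction of comments whose predicted label matches the human label. This is a minimal sketch; the function name and the four labels below are hypothetical placeholders, not rows from the dataset.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of comments whose predicted label matches the human-assigned label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for four comments ("Yes" = cyberbullying).
human = ["No", "Yes", "No", "No"]
model = ["No", "Yes", "Yes", "No"]
print(accuracy(human, model))  # 0.75
```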

Naive Bayes is a type of probabilistic classifier.

First, why Naive Bayes?
1. It requires only a small amount of training data to start making predictions!
2. It is useful if you only need to know which class is most likely, not the exact probability.
3. It can work with missing data!

Naive Bayes in stats theory: The math for Naive Bayes is based on Bayes' theorem, which relates the probability of a class given the observed data to the probability of that data given the class. Naive Bayes classifiers add a “naive” assumption: each feature (for example, each word in a comment) is independent of every other feature once the class is known.
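Written out for a comment with words x_1, ..., x_n and a class c, the naive assumption turns one hard-to-estimate joint probability into a product of simple per-word probabilities. This is the standard textbook form, shown here as a sketch in the same P(c|x) notation the talk uses:

```latex
P(c \mid x_1, \dots, x_n) = \frac{P(x_1, \dots, x_n \mid c)\, P(c)}{P(x_1, \dots, x_n)}

\text{Naive assumption:}\quad P(x_1, \dots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c)

\text{Therefore:}\quad P(c \mid x_1, \dots, x_n) \propto P(c) \prod_{i=1}^{n} P(x_i \mid c)
```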

Independent vs. Dependent Events

Assumption: Independent Events

Naive Bayes in Classifying Cyberbullying
Q: What is the probability of a comment being cyberbullying or not?
P(c|x) = P(x|c) P(c) / P(x)
- P(x|c): likelihood of the predictor given the class, e.g. 28 of the 50 cyberbullying comments contain the word “stupid”
- P(c): prior probability of the class, e.g. 50 of all 150 comments are cyberbullying
- P(x): prior probability of the predictor, e.g. 72 of the 150 comments contain “stupid”
- P(c|x): posterior probability of the class given the predictor
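A worked version of the slide's arithmetic, as a small Python sketch. It reads "28 out of 50" as 28 of the 50 cyberbullying comments containing "stupid"; the variable names are illustrative only. Fractions keep the result exact.

```python
from fractions import Fraction

# Counts from the slide's example (150 comments in total).
total_comments = 150
cyberbullying_comments = 50      # comments labeled cyberbullying
comments_with_stupid = 72        # comments containing "stupid"
cyberbullying_with_stupid = 28   # cyberbullying comments containing "stupid"

likelihood = Fraction(cyberbullying_with_stupid, cyberbullying_comments)  # P(x|c)
prior = Fraction(cyberbullying_comments, total_comments)                  # P(c)
evidence = Fraction(comments_with_stupid, total_comments)                 # P(x)

posterior = likelihood * prior / evidence  # P(c|x) by Bayes' theorem
print(posterior)  # 7/18, roughly 0.39
```

So, under these example counts, a comment containing "stupid" has about a 39% chance of being cyberbullying.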

Select the class with the maximum a posteriori probability (MAP)
label = argmax_c P(x|c) P(c)
P(x) is identical for all classes, so we can drop it.
Q: Is P(c|x) bigger for class one (cyberbully) or class two (not)? A: Pick the MAP!
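The MAP decision can be sketched in Python using the example counts from the Bayes' theorem slide (150 comments, 50 cyberbullying, 72 containing "stupid", 28 of those cyberbullying). The "not cyberbullying" counts are derived by subtraction (100 comments, 44 containing "stupid"); they are not separate figures from the dataset, and `map_label` is a hypothetical helper.

```python
from fractions import Fraction

# class: (comments in class, comments in class containing "stupid")
counts = {
    "cyberbullying": (50, 28),
    "not cyberbullying": (150 - 50, 72 - 28),
}
total = 150


def map_label(counts, total):
    """Return the class maximizing P(x|c) * P(c); P(x) is shared, so it is dropped."""
    scores = {}
    for label, (in_class, with_word) in counts.items():
        likelihood = Fraction(with_word, in_class)  # P(x|c)
        prior = Fraction(in_class, total)           # P(c)
        scores[label] = likelihood * prior
    return max(scores, key=scores.get), scores


label, scores = map_label(counts, total)
print(label)  # not cyberbullying
```

Even though "stupid" is more common among cyberbullying comments (56% vs. 44%), the larger prior of the "not" class wins here, which is exactly why the prior matters.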

Why Naive Bayes? There are other classifier algorithms you could explore, but the math behind Naive Bayes is much simpler and suits what we need to do just fine.

How do I use Python to detect an instance of cyberbullying?

Task: Cyberbullying Comment Classification
- Source: Kaggle crawled dataset
- Formspring.me (now defunct): a forum where users invite others to ask them questions; the soliciting user can opt to respond or not. Anonymity is optional.
- 858 questions (rows)
- Data: text of the question, the user asking the question, and three labels per answer, each indicating whether a labeler sees the answer as cyberbullying, along with a severity score
- Labeled with Mechanical Turk
- Approximately 7% of responses are cyberbullying
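Because each answer carries three separate labels, one option is to collapse them into a single training label by majority vote. This is a minimal pandas sketch under assumptions: the toy rows are hypothetical, and the column names are modeled on the Label_1 to Label_3 fields shown in the false-positive example later in this talk, not on the dataset's exact schema.

```python
import pandas as pd

# Hypothetical toy rows, one label column per Mechanical Turk labeler.
rows = pd.DataFrame(
    {
        "Label_1": ["No", "No", "No", "Yes"],
        "Label_2": ["Yes", "No", "No", "Yes"],
        "Label_3": ["Yes", "No", "Yes", "No"],
    }
)

label_columns = ["Label_1", "Label_2", "Label_3"]
# Count "Yes" votes per row; 2 of 3 labelers agreeing counts as cyberbullying.
yes_votes = (rows[label_columns] == "Yes").sum(axis=1)
rows["is_cyberbullying"] = yes_votes >= 2

# Share of rows labeled cyberbullying (about 7% in the real dataset).
print(rows["is_cyberbullying"].mean())  # 0.5
```

A stricter rule (all three labelers agree) would yield fewer positives; the choice of rule is itself a labeling decision.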

Python Tools for NLP
- sklearn: open source Python machine learning library with classification (e.g. SVM), regression, and other algorithms!
- pandas: open source Python data analysis tool with “expressive data structures”.
- nltk: Natural Language Toolkit for Python; we use it to filter out stop words!
- jupyter: open source web app to create and share code, visualizations, and explanatory text.

Example of text data in the Kaggle Formspring data set:
Question | Hahah. Funny how u defend that beiber kid?
User asking | aprilpooh15
Answer | its also funny how u stalked my whole twitter! Nice goin! B$*&^!!
User answering | N/A
Cyberbullying | Yes
Aggression | 7
Notes | B$*&^!

Task: Train the Cyberbullying Detector

Stemming words: treat words like “shop” and “shopping” alike.
Training a Python classifier by hand
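One way to do the stemming step is NLTK's Porter stemmer, sketched below. The talk does not say which stemmer it used, so treat Porter as an assumption; it runs without downloading any NLTK corpora.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["shop", "shopping", "shops"]
# Map each word to its stem so related forms count as the same feature.
print([stemmer.stem(w) for w in words])  # ['shop', 'shop', 'shop']
```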

Tokenize Text into a Bag of Words
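A bag of words throws away word order and keeps only how often each token appears. A minimal standard-library sketch, using the example comment from the dataset slide; the lowercase-and-regex tokenizer is a simplifying assumption, not necessarily the talk's tokenizer.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


comment = "Hahah. Funny how u defend that beiber kid? its also funny how u stalked my whole twitter!"
bow = bag_of_words(comment)
print(bow.most_common(3))  # "funny", "how" and "u" each appear twice
```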

Using sklearn to Train Classifier
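A sketch of the sklearn route: `CountVectorizer` builds the bag of words and `MultinomialNB` fits per-class word counts, chained with `make_pipeline`. The four training comments are made up for illustration; a real run would use the labeled Formspring answers.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set ("Yes" = cyberbullying).
train_texts = [
    "you are so stupid and ugly",
    "nobody likes you, loser",
    "thanks for the great answer",
    "have a nice day at the beach",
]
train_labels = ["Yes", "Yes", "No", "No"]

# CountVectorizer turns text into word counts; MultinomialNB models those counts per class.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["you stupid loser"]))  # ['Yes']
```

`MultinomialNB` applies add-one smoothing by default (`alpha=1.0`), so words never seen in a class do not zero out its probability.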

Task: Classify the comments

Floating Point Underflow
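Multiplying many small per-word probabilities quickly drops below the smallest representable float and becomes 0.0, which makes every class look equally impossible. The usual fix is to add log probabilities instead. The 400 words at probability 0.001 below are hypothetical numbers chosen to trigger the problem.

```python
import math

# Hypothetical per-word likelihoods: 400 words, each with probability 0.001 given the class.
word_probabilities = [0.001] * 400

product = 1.0
for p in word_probabilities:
    product *= p
print(product)  # 0.0: the true value, 1e-1200, is far below the smallest float

# Summing logs keeps the same ranking information in a representable range.
log_score = sum(math.log(p) for p in word_probabilities)
print(log_score)  # about -2763.1
```

Since log is monotonic, comparing log scores across classes picks the same MAP label as comparing the raw products would.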

Predicting!
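Prediction by hand can be sketched as: score each class with log P(c) plus the sum of log P(word|c), then pick the highest score (the MAP label). This is a minimal multinomial-style sketch with add-one smoothing; `train`, `predict`, and the toy comments are hypothetical and not the talk's own code.

```python
import math
from collections import Counter, defaultdict


def train(texts, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(texts, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(text, class_counts, word_counts, vocabulary):
    """Return the MAP label, scoring each class in log space with add-one smoothing."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log P(c)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the class.
            count = word_counts[label][word] + 1
            score += math.log(count / (n_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


texts = ["you are stupid", "stupid loser", "nice answer", "thanks nice day"]
labels = ["Yes", "Yes", "No", "No"]
model = train(texts, labels)
print(predict("what a stupid loser", *model))  # Yes
```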

Understanding the Performance Measurement

AUC Score
A general function that computes the area under the ROC curve, which plots the true positive rate (y) against the false positive rate (x) across all classification thresholds. It tells us how well the classifier performs over all thresholds, rather than at any single one.
General rule of thumb:
- .90-1 = very good (A)
- .80-.90 = good (B)
- .70-.80 = not so good (C)
- .60-.70 = poor (D)
- .50-.60 = fail (F)
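In sklearn, `roc_curve` sweeps the thresholds and `auc` integrates the resulting points with the trapezoidal rule. The labels and scores below are hypothetical, and `grade` is an illustrative helper that applies the slide's rule of thumb.

```python
from sklearn.metrics import auc, roc_curve

# Hypothetical true labels (1 = cyberbullying) and classifier scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# roc_curve returns false positive rates, true positive rates, and the thresholds used.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
area = auc(fpr, tpr)  # trapezoidal area under the (fpr, tpr) points


def grade(score):
    """Map an AUC score to the rule-of-thumb letter grade."""
    for cutoff, letter in [(0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")]:
        if score >= cutoff:
            return letter
    return "F"


print(area, grade(area))  # 0.75 C
```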

False Positives
Incorrectly labeled training data can lead to false positives.
Question: Hahah. Funny how u defend that beiber kid?
Answer: its also funny how u stalked my whole twitter! Nice goin! B$*&^!!
Label_1: No | Severity_1: 0 | Comments_1: n/a
Label_2: Yes | Severity_2: 7 | Comments_2: B$*&^!
Label_3: Yes | Severity_3: 4 | Comments_3: B$*&^!

Naive Bayes limitations & challenges
- The independence assumption is a simplistic model of the world
- It tends to overestimate the probability of the label it ultimately selects
- Inconsistent labeling of data

Improving Predicting Power
- More and/or better feature extraction
  - Other possible features: use of emoji / shorthand, is there a reporting policy, are users anonymous?
- Collecting more data
  - Does the data introduce a bias? If so, what?
  - What does the absence of data tell us?
- Is there a class imbalance? Are we reflecting the accuracy of the underlying class representation? Can we resample the data to correct this?
- Can we use a combination of models? Different performance measurements (e.g. recall, a measure of a classifier’s completeness)?
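Class imbalance is why accuracy alone can mislead. With roughly 7% positives, as in the Formspring data, a classifier that never flags anything still scores 93% accuracy, while recall exposes that it finds none of the cyberbullying. The labels below are hypothetical, built only to match that 7% share.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive="Yes"):
    """Share of actual positives the classifier found (its completeness)."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical labels: 7 cyberbullying comments out of 100.
y_true = ["Yes"] * 7 + ["No"] * 93
# A useless classifier that never flags anything.
y_pred = ["No"] * 100

print(accuracy(y_true, y_pred), recall(y_true, y_pred))  # 0.93 0.0
```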

“This is a point I’ll be returning to in future chapters: we’ve seen time and again that mathematical models can sift through data to locate people who are likely to face great challenges, whether from crime, poverty, or education. It’s up to society whether to use that intelligence to reject and punish them—or to reach out to them with the resources they need.” ― Cathy O'Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy

Types of Naive Bayes Algorithms
- Multinomial: Are the features counts or frequencies (e.g. how often each word appears)?
- Bernoulli: Are the features representable as booleans?
- Gaussian: Are we working with continuous data?
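sklearn ships one class per variant: `MultinomialNB`, `BernoulliNB`, and `GaussianNB`. The sketch below feeds each the kind of feature it expects. All numbers are hypothetical; the word columns ("stupid", "loser", "nice") and the continuous features (an aggression-style score and a comment length) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

labels = np.array(["Yes", "Yes", "No", "No"])

# Word counts per comment (columns: "stupid", "loser", "nice") -> MultinomialNB.
counts = np.array([[2, 1, 0], [1, 2, 0], [0, 0, 2], [0, 0, 1]])
# Word present / absent per comment -> BernoulliNB.
presence = (counts > 0).astype(int)
# Continuous features (hypothetical aggression score, comment length) -> GaussianNB.
continuous = np.array([[7.0, 40.0], [6.5, 35.0], [0.5, 60.0], [1.0, 55.0]])

multinomial_pred = MultinomialNB().fit(counts, labels).predict([[3, 0, 0]])
bernoulli_pred = BernoulliNB().fit(presence, labels).predict([[1, 0, 0]])
gaussian_pred = GaussianNB().fit(continuous, labels).predict([[6.0, 38.0]])
print(multinomial_pred, bernoulli_pred, gaussian_pred)  # ['Yes'] ['Yes'] ['Yes']
```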

Want to learn more?
- Kaggle, for toy machine learning problems!
- Sarah Guido, Introduction to Machine Learning with Python
- Your local Python user group!
- John Pavlopoulos et al., “Deep Learning for User Comment Moderation”
- Harry Zhang, “The Optimality of Naive Bayes” (2005)
- Jake VanderPlas, “Statistics for Hackers”, PyCon 2016
- Brian Lange, “It’s Not Magic: Explaining Classification Algorithms”, PyData Chicago 2016
- Association for Computational Linguistics 2017: Accepted Papers, Workshop on Abusive Language Online
- Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy

Thank you! Questions? Tweet me @loooorenanicole