What is cyberbullying? A Natural Language Approach to Detecting Cyberbullying

Cyberbullying is a very real threat to users of online communities
today. From incidents like Gamergate to tragic incidents that target
vulnerable youth, cyberbullying has become an increasing danger for
online communities. In this talk we'll frame cyberbullying detection
as a machine learning problem, using the Python scientific stack and
natural language processing to determine what insights, if any, can
be gained for detecting it.

Lorena Mesa

October 20, 2017

Transcript

  1. What is cyberbullying? A Natural Language Approach to Detecting Cyberbullying

    Lorena Mesa ACT-W @ Chicago http://bit.ly/2irmY5V
  2. "We have been in touch with Ms. McGowan's team," Twitter

    said in a tweet on Thursday. "We want to explain that her account was temporarily locked because one of her Tweets included a private phone number, which violates our Terms of Service." Source: CNN
  3. Since no one human or computer can sift through all

    social media and online communication, what margins of error do we need to care about in our statistical models?
  4. How I’ll approach today’s talk

    1. What is machine learning?
    2. Where does classification fit into machine learning?
    3. How can I use Python to solve a classification problem?
    4. Example of Python in action: classifying a comment as cyberbullying.
  5. What is cyberbullying? Carrie Brown-Breitwieser, PhD, specializes in youth psychiatry

    and behavioral health at Sanford Health.
  6. “The math-powered applications powering the data economy were based on

    choices made by fallible human beings. Some of these choices were no doubt made with the best intentions. Nevertheless, many of these models encoded human prejudice, misunderstanding, and bias into the software systems that increasingly managed our lives. Like gods, these mathematical models were opaque, their workings invisible to all but the highest priests in their domain: mathematicians and computer scientists. Their verdicts, even when wrong or harmful, were beyond dispute or appeal. And they tended to punish the poor and the oppressed in our society, while making the rich richer.” ― Cathy O'Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
  7. Machine Learning is a subfield of computer science [that] stud[ies]

    pattern recognition and computational learning [in] artificial intelligence. [It] explores the construction and study of algorithms that can learn from and make predictions on data.
  8. Put another way A computer program is said to learn

    from experience (E) with respect to some task (T) and some performance measure (P), if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, Machine Learning, Ch. 1)
  9. What is the question we want to answer?

    Identifying the offender
    - Is the responding party a cyberbully?
    Identifying instances of cyberbullying
    - What is the likelihood a conversation is cyberbullying?
    - Is the conversation aggressive? Is it not?
    - At what level is a conversation deemed cyberbullying?
  10. Task: Classify a text comment Is a comment an instance

    of cyberbullying?
  11. First, why Naive Bayes?

    1. Requires only a small amount of training data to start making predictions!
    2. Useful if you only need to know which class is most likely, not the actual probability.
    3. Can work with missing data!
  12. Naive Bayes in stats theory The math for Naive Bayes

    is based on Bayes' theorem, which relates the probability of a class given a feature to the probability of that feature given the class. Naive Bayes classifiers add the “naive” assumption that the features are independent of one another given the class.
  13. Naive Bayes in Classifying Cyberbullying

    Q: What is the probability of a comment being cyberbullying or not?
    P(c|x) = P(x|c)P(c) / P(x)
    - P(x|c): likelihood of the predictor in the class, e.g. 28 out of 50 cyberbullying comments have the word “stupid”
    - P(c): prior probability of the class, e.g. 50 of all 150 comments are cyberbullying
    - P(x): prior probability of the predictor, e.g. 72 of 150 comments have “stupid”
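
To make the formula concrete, the example counts quoted on this slide can be plugged in directly. This is only a sketch: it assumes the three counts all describe the same 150 comments, and the variable names are illustrative rather than taken from the talk.

```python
from fractions import Fraction

# Counts quoted on the slide (assumed to describe the same 150 comments).
total_comments = 150
bullying_comments = 50      # comments labeled cyberbullying
stupid_in_bullying = 28     # cyberbullying comments containing "stupid"
stupid_overall = 72         # all comments containing "stupid"

likelihood = Fraction(stupid_in_bullying, bullying_comments)  # P(x|c)
prior = Fraction(bullying_comments, total_comments)           # P(c)
evidence = Fraction(stupid_overall, total_comments)           # P(x)

posterior = likelihood * prior / evidence                     # P(c|x)
print(posterior, float(posterior))
```

Under these counts the posterior reduces to 28/72, so a comment containing "stupid" has roughly a 39% chance of being cyberbullying.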
  14. Select the class with MAP

    Maximum a posteriori probability (MAP)
    label = argmax_c P(x|c)P(c)
    P(x) is identical for all classes, so don’t use it.
    Q: Is P(c|x) bigger for class one (cyberbully) or two (not)?
    A: Pick the MAP!
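
The MAP rule can be sketched in a few lines. The "cyberbully" counts below reuse the example numbers from the previous slide; the "not" counts are an assumption derived from them (150 - 50 comments, 72 - 28 of which contain "stupid"). The function and variable names are illustrative only.

```python
from fractions import Fraction

# "cyberbully" counts come from the talk's example; "not" counts are derived
# from them under the assumption that both describe the same 150 comments.
counts = {
    "cyberbully": {"with_word": 28, "total": 50},
    "not": {"with_word": 44, "total": 100},
}
n_comments = sum(c["total"] for c in counts.values())

def unnormalized_posterior(label):
    """P(x|c) * P(c); P(x) is skipped because it is the same for every class."""
    c = counts[label]
    likelihood = Fraction(c["with_word"], c["total"])
    prior = Fraction(c["total"], n_comments)
    return likelihood * prior

scores = {label: unnormalized_posterior(label) for label in counts}
map_label = max(scores, key=scores.get)
print(scores, map_label)
```

With a single word as evidence, the larger prior of the "not" class wins here; a real classifier multiplies in the likelihood of every word in the comment before taking the maximum.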
  15. Why Naive Bayes? There are other classifier algorithms you could

    explore, but the math behind Naive Bayes is much simpler and suits what we need to do just fine.
  16. How do I use Python to detect an instance of

    cyberbullying?
  17. Task: Cyberbullying Comment Classification

    - Source: Kaggle crawled dataset
    - Formspring.me (now defunct): forum where users invite others to ask questions; the soliciting user can opt to respond or not. Anonymity is optional.
    - 858 questions (rows)
    - Data: text of question, user asking question, three labeled answers indicating whether the labeler sees the answer as cyberbullying, severity score
    - Labeled with Mechanical Turk
    - Approximately 7% of responses labeled cyberbullying
  18. Python Tools for NLP

    - sklearn: open source Python machine learning library including classification, SVM, and regression algorithms!
    - pandas: open source Python data analysis tool with “expressive data structures”.
    - nltk: natural language toolkit for Python; use it to filter out stop words!
    - jupyter: open source web app to create and share code, visualizations, and explanatory text.
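
As a rough idea of how these tools fit together, here is a minimal sketch of a bag-of-words Naive Bayes classifier in sklearn. The comments and labels are hypothetical toy data invented for illustration, not rows from the Formspring dataset, and sklearn's built-in English stop word list stands in for the nltk list mentioned above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy comments, invented for illustration only.
comments = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "shut up you stupid idiot",
    "what is your favorite movie",
    "have a great day at school",
    "do you like pizza or pasta",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = cyberbullying, 0 = not

# Word counts (English stop words dropped) feeding a multinomial Naive Bayes.
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(comments, labels)

prediction = model.predict(["you stupid loser"])
print(prediction)
```

Six comments are far too few to learn anything meaningful; the point is only the shape of the workflow: vectorize text, fit, predict.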
  19. Example of text data in the Kaggle Formspring data set.

    - Question: Hahah. Funny how u defend that beiber kid?
    - User asking: aprilpooh15
    - Answer: its also funny how u stalked my whole twitter! Nice goin! B$*&^!!
    - User answering: N/A
    - Cyberbullying: Yes
    - Aggression: 7
    - Notes: B$*&^!
  20. Stemming words - treat words like “shop” and “shopping” alike.

    Training a Python classifier by hand
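
Stemming can be sketched with nltk's Porter stemmer. The talk does not say which stemmer was used, so the choice of `PorterStemmer` is an assumption, and `stem_tokens` is a hypothetical helper with deliberately simple whitespace tokenization.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_tokens(text):
    """Lowercase, split on whitespace, and stem each token."""
    return [stemmer.stem(token) for token in text.lower().split()]

stems = stem_tokens("shop shopping shops")
print(stems)
```

All three tokens collapse to the same stem, so the classifier counts them as one feature instead of three.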
  21. AUC Score

    General function that computes the area under the ROC curve; summarizes the classifier's performance across all thresholds.
    - x-axis: false positive rate
    - y-axis: true positive rate
    General rule of thumb:
    - .90-1 = very good (A)
    - .80-.90 = good (B)
    - .70-.80 = not so good (C)
    - .60-.70 = poor (D)
    - .50-.60 = fail (F)
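
One way to build intuition for the AUC score: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by brute-force pairwise comparison on hypothetical scores. In practice `sklearn.metrics.roc_auc_score` does this job; `auc_score` here is an illustrative helper, not the function used in the talk.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is scored
    higher than a randomly chosen negative one (ties count as half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores for four comments (1 = cyberbullying).
example_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)
```

Three of the four positive/negative pairs are ranked correctly, giving 0.75, which the rule of thumb above would grade as "not so good".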
  22. False Positives

    Incorrectly labelled training data can lead to false positives.
    Question: Hahah. Funny how u defend that beiber kid?
    Answer: its also funny how u stalked my whole twitter! Nice goin! B$*&^!!
    - Label_1: No, Severity_1: 0, Comments_1: n/a
    - Label_2: Yes, Severity_2: 7, Comments_2: B$*&^!
    - Label_3: Yes, Severity_3: 4, Comments_3: B$*&^!
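
When annotators disagree like this, one common option is a majority vote that also records how strong the agreement was. The talk does not state how the three labels were combined, so the sketch below is an assumption, and `majority_label` is a hypothetical helper.

```python
from collections import Counter

def majority_label(votes):
    """Return (winning label, share of annotators who agreed with it)."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

# The three Mechanical Turk labels shown for the example answer above.
votes = ["No", "Yes", "Yes"]
label, agreement = majority_label(votes)
print(label, round(agreement, 2))
```

Keeping the agreement share makes it possible to drop or down-weight low-agreement rows before training.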
  23. Naive Bayes limitations & challenges

    - Independence assumption is a simplistic model of the world
    - Overestimates the probability of the label ultimately selected
    - Inconsistent labeling of data
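
The overestimation follows from the independence assumption: when correlated features are treated as independent, the same evidence is counted more than once. The sketch below illustrates this with made-up numbers, a single feature that is twice as likely in the positive class and equal priors; `naive_posterior` is a hypothetical helper.

```python
from fractions import Fraction

def naive_posterior(likelihood_ratio, copies, prior=Fraction(1, 2)):
    """Posterior for the positive class when one feature is counted `copies` times.

    likelihood_ratio is P(x|positive) / P(x|negative) for a single feature.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** copies
    return posterior_odds / (1 + posterior_odds)

# One genuinely informative feature, twice as likely in the positive class.
once = naive_posterior(Fraction(2), copies=1)
# The same evidence repeated three times as if it were independent.
thrice = naive_posterior(Fraction(2), copies=3)
print(once, thrice)
```

The evidence has not changed, yet the estimated probability climbs from 2/3 to 8/9. The predicted label is often still right, which is why Naive Bayes remains useful when only the most likely class matters.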
  24. Improving Predicting Power

    - More and/or better feature extraction
    - Other possible features: use of emoji / shorthand, is there a reporting policy, are users anonymous
    - Collecting more data
    - Does the data introduce a bias? If so, what?
    - What does the absence of data tell us?
    - Is there a class imbalance? Are we reflecting the accuracy of the underlying class representation? Can we resample the data to correct this?
    - Can we use a combo of models? Different performance measurements (e.g. recall, a measure of a classifier’s completeness)?
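
The class imbalance question matters because accuracy alone can look good while the classifier finds nothing. The sketch below uses a hypothetical set of 100 answers with 7 positives, mirroring the roughly 7% cyberbullying rate mentioned earlier, and a "classifier" that never flags anything.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Share of actual positives the classifier found."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0

# Hypothetical data: 7 cyberbullying answers out of 100.
y_true = [1] * 7 + [0] * 93
always_negative = [0] * 100  # never flags anything

acc = accuracy(y_true, always_negative)
rec = recall(y_true, always_negative)
print(acc, rec)
```

Accuracy comes out at 93% while recall is zero, which is why recall (or resampling the minority class) is worth considering on data like this.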
  25. “This is a point I’ll be returning to in future

    chapters: we’ve seen time and again that mathematical models can sift through data to locate people who are likely to face great challenges, whether from crime, poverty, or education. It’s up to society whether to use that intelligence to reject and punish them—or to reach out to them with the resources they need.” ― Cathy O'Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
  26. Types of Naive Bayes Algorithms

    - Multinomial: Do the features represent frequencies, such as word counts?
    - Bernoulli: Are the features representable as booleans?
    - Gaussian: Are we working with continuous data?
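
In sklearn the three variants map onto `MultinomialNB`, `BernoulliNB`, and `GaussianNB`. The sketch below fits each one on a tiny hypothetical feature matrix of the matching type; the data and the "share of capital letters" feature are made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([1, 1, 0, 0])  # hypothetical labels: 1 = cyberbullying

# Word counts per comment -> multinomial.
X_counts = np.array([[3, 0], [2, 1], [0, 2], [0, 3]])
# Word present / absent -> Bernoulli.
X_binary = (X_counts > 0).astype(int)
# Continuous measurement (a made-up "share of capital letters") -> Gaussian.
X_continuous = np.array([[0.9], [0.8], [0.1], [0.2]])

models = {
    "multinomial": MultinomialNB().fit(X_counts, y),
    "bernoulli": BernoulliNB().fit(X_binary, y),
    "gaussian": GaussianNB().fit(X_continuous, y),
}
predictions = {
    "multinomial": models["multinomial"].predict(np.array([[4, 0]]))[0],
    "bernoulli": models["bernoulli"].predict(np.array([[1, 0]]))[0],
    "gaussian": models["gaussian"].predict(np.array([[0.85]]))[0],
}
print(predictions)
```

For bag-of-words text features like the ones in this talk, the multinomial or Bernoulli variants are the usual starting points.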
  27. Want to learn more?

    - Kaggle for toy machine learning problems!
    - Introduction to Machine Learning With Python by Sarah Guido
    - Your local Python user group!
    - John Pavlopoulos et al., Deep Learning for User Comment Moderation
    - Harry Zhang’s 2005 “The Optimality of Naive Bayes”
    - Jake Vanderplas, PyCon 2016, “Statistics for Hackers”
    - Brian Lange, PyData Chicago 2016, “It’s Not Magic: Explaining Classification Algorithms”
    - Association for Computational Linguistics 2017: Accepted Papers, Workshop on Abusive Language Online
    - Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy