
PHD_Comprehensive_Public

_themessier
September 28, 2022


The public, redacted version of the slides from my Ph.D. comprehensive exam, held on 9th August 2022.


Transcript

  1. Mitigating the spread of hate on social media Presented by:

    Sarah Masud Advisors: Dr. Tanmoy Chakraborty, Dr. Vikram Goyal External Committee: Dr. Srinath Srinivasa (IIIT-B) Internal Committee: Dr. Md. Shad Akhtar (IIIT-D) Dr. Raghava Mutharaju (IIIT-D) Comprehensive Evaluation 9th August, 2022
  2. Outline • Introduction and Motivation • Major Contributions ◦ Detection

    of Hate ◦ Mitigation of Hate ◦ Diffusion of Hate ◦ Tooling and Visualization of Diffusion • Future Work • Student Profile Disclaimer: Subsequent content has extreme language (verbatim from social media), which does not reflect the opinions of myself or my collaborators. Reader’s discretion is advised.
  3. Hatred is an age-old problem [1]: Wiki [2]: Youtube

    [3], [4]: Anti-Semitic Schooling [5]: Radio and Rwanda Fig 1: List of Extremist/Controversial SubReddits [1] Fig 3, 4: Twitter Hate Speech [3] Fig 2: Youtube Video Inciting Violence and Hate Crime [2] Fig 5: Rwanda Genocide, 1994 [5] “I will surely kill thee” (Story of Cain and Abel)
  4. Internet’s policy w.r.t. curbing Hate Moderated • Twitter • Facebook

    • Instagram • Youtube Semi-Moderated • Reddit Unmoderated • Gab • 4chan • BitChute • Parler • StormFront • Anonymity has led to an increase in anti-social behaviour [1], hate speech being one form of it. • Hate can be studied at a macroscopic as well as a microscopic level. [2] • It exists in various mediums. [1]: Suler, John, CyberPsychology & Behavior, 2004 [2]: Luke Munn, Humanities and Social Sciences Communications, Article 53
  5. Definition of Hate Speech • Hate is subjective, temporal and

    cultural in nature. • UN defines hate speech as “any kind of communication that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are.” [1] • Need sensitisation of social media users. [1]: UN hate [2]: Pyramid of Hate Fig 1: Pyramid of Hate [2]
  6. Workflow for Analysing and Mitigating Hate Speech [1]: Tanmoy and

    Sarah, Nipping in the bud: detection, diffusion and mitigation of hate speech on social media, ACM SIGWEB Winter, Invited Publication Our Contributions so far
  7. Questions we ask • Question: Does the spread of hate depend

    on the topic under consideration? ◦ Takeaway: Yes, topical information drives hate. ◦ Takeaway: Additionally, exogenous signals are as important as endogenous (in-platform) signals in influencing the spread of hate. • Question: Is there a middle ground to help users transition from extreme hate to non-hate? ◦ Takeaway: The way to curb hate speech is more speech. ◦ Takeaway: Free speech and equal opportunity of speech are not the same. • Question: How do different endogenous signals help in the detection of hate? ◦ Takeaway: Context matters in determining hatefulness. ◦ Takeaway: A user’s recent history around a tweet captures similar psycho-linguistic patterns.
  8. Hate is the New Infodemic: A Topic-aware Modeling of Hate

    Speech Diffusion on Twitter Sarah Masud, Subhabrata Dutta, Sakshi Makkar, Chhavi Jain, Vikram Goyal, Amitava Das, Tanmoy Chakraborty Published at ICDE 2021
  9. Literature Overview: Hate Analysis Fig 2: Tweeting behaviour of hateful

    users and their neighbours [1] Fig 1: Affinity of different user types [1] [1]: Ribeiro et al., WebSci’18 • Employing the Twitter retweet network. • Hateful neighbours tend to follow other hateful users. • Hateful users & their neighbours tend to tweet more often, in shorter intervals, and follow more accounts. Fig 3: Centrality of hateful users and their neighbours [1]
  10. Literature Overview: Hate Analysis [1]: Ribeiro et al., WebSci’18 [2]:

    Mathew et al., WebSci’19 Fig 1: Belief propagation to determine hatefulness of users [1] Fig 2: Repost DAG [2] • Source: Gab, as it promotes “free speech”. • User- and network-level features. • They curated their own list of hateful lexicons. • Initial hateful users were enlisted based on a hate-lexicon mapping of users. Fig 3: Difference in hateful and non-hateful cascades [2]
  11. Limitations of Existing Diffusion Analysis • Only exploratory analysis. •

    Consider hateful and non-hateful users to be separate groups. [1] • Generic information-cascade models do not take content into account, only who follows whom. [2, 3] • Motivation: ◦ How can different topics lead to the generation and spread of hate speech in a user network? ◦ How does a hateful tweet diffuse via retweets? [1]: Mathew et al., WebSci’19 [2]: Wang et al., ICDM’17 [3]: Yang et al., IJCAI’19
  12. Proposed Hate Diffusion-Specific Dataset • Crawled a large-scale Twitter

    dataset. ◦ Timeline ◦ Follow network (2-hops) ◦ Metadata • Manually annotated a total of 17k tweets. • Trained a hate detection model on our dataset. • Additionally crawled online news articles (600k). [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
  13. Hate Diffusion Specific Dataset Fig 1. #tag level information of

    RETINA [1] [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
  14. Hate Diffusion Specific Dataset Fig 1. #tag level information of

    RETINA [1] [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
  15. Some Interesting Observations Fig 1: Hatefulness of different users towards

    different hashtags in RETINA [1] Fig 2: Retweet cascades for hateful and non-hateful tweets in RETINA [1] • Different users show varying tendencies to engage in hateful content depending on the topic. • Hate speech spreads faster and within a shorter period. [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
  16. Problem Statement Given a hateful tweet and its associated signals,

    predict whether a given user (a follower account) will retweet the hateful tweet within a given time window. [1] [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
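Framed this way, retweet prediction is binary classification over fused endogenous and exogenous signals. A minimal sketch of that interface, assuming hand-picked feature vectors and a fixed weight vector (all names and numbers are illustrative, not RETINA's learned parameters or attention mechanism):

```python
import math

def predict_retweet(endogenous, exogenous, weights, bias=0.0, threshold=0.5):
    """Toy logistic scorer: fuses endogenous (in-platform) and exogenous
    (e.g. news) signal vectors into a retweet probability.
    Purely illustrative; RETINA uses learned attention, not fixed weights."""
    features = endogenous + exogenous               # concatenate signal vectors
    z = bias + sum(w * x for w, x in zip(weights, features))
    prob = 1.0 / (1.0 + math.exp(-z))               # sigmoid
    return prob, prob >= threshold

# Hypothetical example: follower-activity features plus one news-salience feature
prob, will_retweet = predict_retweet([0.8, 0.3], [0.6], weights=[1.2, 0.5, 0.9])
```

The point of the sketch is only the task shape: signals in, a calibrated probability and a thresholded retweet/no-retweet decision out.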
  17. Proposed Model: RETINA Fig 1: Exogenous Attention Mechanism [1] [1]:

    Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
  18. Proposed Model: RETINA Fig 1: Exogenous Attention Mechanism [1] Fig

    2: Static Retweet Prediction [1] [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
  19. Proposed Model: RETINA Fig 1: Exogenous Attention Mechanism [1] Fig

    2: Static Retweet Prediction [1] Fig 3: Dynamic Retweet Prediction [1] [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
  20. Experimental Results: RETINA Fig 1: Baseline Comparisons [1] Fig 2:

    Behaviour of cascade for different baselines. Darker bars are hate [1]. [1]: Masud et al., Hate is the New Infodemic: A Topic-aware Modeling of Hate Speech Diffusion on Twitter, ICDE 2021
  21. Proactively Reducing the Hate Intensity of Online Posts via Hate

    Speech Normalization Sarah Masud, Manjot Bedi, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty Published at KDD 2022
  22. Hate Intensity • Intensity/severity captures the explicitness

    of hate speech. • High-intensity hate is more likely to contain offensive lexicon, offensive spans, direct attacks, and mentions of the target entity. • Low-intensity hate is more subtle, usually employing sarcasm and humour. “Consuming coffee is bad, I hate it!” (the world can live with this opinion) “Let’s bomb every coffee shop and kill all coffee makers” (this is a threat) Fig 1: Pyramid of Hate [1] [1]: Pyramid of Hate
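The contrast between the two examples above can be caricatured with a fixed-lexicon scorer. This toy is purely illustrative (the lexicon and weights are invented; NACL learns intensity on the 1-10 scale from annotated data rather than counting lexicon hits):

```python
# Illustrative only: weighting direct-attack terms above milder offense terms
# to show why a threat scores higher than an opinion on a 1-10 scale.
THREAT_TERMS = {"bomb": 4, "kill": 4}    # direct attacks: high weight
OFFENSE_TERMS = {"hate": 1, "bad": 1}    # milder lexicon: low weight

def toy_intensity(text, base=1):
    score = base
    for token in text.lower().split():
        word = token.strip(",.!?'")
        score += THREAT_TERMS.get(word, 0) + OFFENSE_TERMS.get(word, 0)
    return min(score, 10)                # clamp to the 1-10 scale

low = toy_intensity("Consuming coffee is bad, I hate it!")
high = toy_intensity("Lets bomb every coffee shop and kill all coffee makers")
```

A real intensity model must of course handle subtle, lexicon-free hate (sarcasm, implicit attacks), which is exactly where fixed lexicons fail.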
  23. How to Combat Hate Speech Reactive countering: intervening after a hateful

    post has been made, to prevent it from spreading further. Proactive countering: intervening before the post goes public.
  24. Literature Overview: From Offense to Non-Offense [1]: Santos et al.,

    ACL’18 Fig 1: Unsupervised conversion of Offense to Neutral [1]
  25. Literature Overview: Intervention during Tweet Creation • 200k users identified

    in the study; 50% randomly assigned to the control group. • H1: Are prompted users less likely to post the current offensive content? • H2: Are prompted users less likely to post offensive content in the future? [1]: Katsaros et al., ICWSM ’22 Fig 1: User behaviour statistics as part of the intervention study [1] Fig 2: Twitter reply test for offensive replies [1]
  26. NACL Dataset • Hateful samples collected from existing hate speech

    datasets. • Manually annotated for hate intensity and hateful spans. • Hate intensity is marked on a scale of 1-10. • Manual generation of the normalised counterpart and its intensity (k = 0.88). Fig 1: Original and Normalised Intensity Distribution [1] Fig 2: Dataset Stats [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
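The reported agreement score (k = 0.88) is presumably an inter-annotator kappa. For reference, a minimal two-annotator Cohen's kappa computation (the example label sequences below are invented, not the NACL annotations):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators: (p_o - p_e) / (1 - p_e),
    where p_o is observed agreement and p_e is chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n   # observed
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations on 10 samples (1 = hateful, 0 = not)
a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
b = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
kappa = cohens_kappa(a, b)
```

Kappa discounts agreement expected by chance, which is why a value of 0.88 signals substantially stronger annotator consistency than raw percent agreement would.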
  27. Motivation & Evidence • Reducing intensity is the stepping stone

    towards non-hate. • Does not force users to change sentiment or opinion. • Evidently leads to lower virality. Fig 1: Difference in predicted number of comments per set per iteration [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  28. Problem Statement For a given hate sample t, our objective

    is to obtain its normalized (sensitised) form t′ such that the intensity of hatred is reduced, φ(t′) < φ(t), while the meaning is still conveyed. [1] Fig 1: Example of an original high-intensity vs normalised sentence [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  29. Proposed Method: NACL (Neural hAte speeCh normaLizer) Hate Intensity Prediction

    (HIP) Hate Span Identification (HSI) Hate Intensity Reduction (HIR) Fig 1: Flowchart of NACL [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  30. Hate Intensity Prediction (HIP) Fig 1: HIP + Framework [1]

    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 BERT + BiLSTM + Self Attention + Linear Activation
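The last two stages of this stack, self-attention pooling over token representations followed by a linear head, can be sketched in plain Python. The token vectors, query, and weights below are arbitrary stand-ins for the learned BERT + BiLSTM representations:

```python
import math

def softmax(xs):
    m = max(xs)                                   # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(token_vecs, query):
    """Score each token vector against a query, softmax the scores,
    and return the attention-weighted average vector."""
    scores = [sum(q * t for q, t in zip(query, vec)) for vec in token_vecs]
    weights = softmax(scores)
    dim = len(token_vecs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, token_vecs))
            for i in range(dim)]

def intensity_head(pooled, w, b):
    """Linear activation mapping the pooled vector to a scalar intensity."""
    return b + sum(wi * xi for wi, xi in zip(w, pooled))

# Toy 2-d "token representations" for a 3-token input
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = attention_pool(tokens, query=[1.0, 1.0])
score = intensity_head(pooled, w=[2.0, 2.0], b=1.0)
```

The design point: attention pooling lets the model weight the tokens most indicative of hatefulness before the regression head, rather than averaging all tokens uniformly.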
  31. Hate Intensity Prediction (HIP) Fig 1: Hate Intensity Prediction Baselines

    [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  32. Hate Span Identification (HSI) Fig 1: Hate Normalization Framework [1]

    [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022 ELMO + BiLSTM + Self Attention + CRF
  33. Hateful Span Identification (HSI) Fig 1: Hate Span Prediction Baselines

    [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  34. Hate Intensity Reduction Overall Loss Reward Fig 1: Hate Normalization

    Framework [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  35. Hate Intensity Reduction (HIR) Fig 1: Hate Intensity Reduction Module

    [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  36. Human Evaluation • Employed 20 diverse users to assess the

    quality of the generated texts. • Metrics: ◦ Intensity ◦ Fluency ◦ Adequacy Fig 1: Results of Human Evaluation for NACL-HSR [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  37. Tool: Detects Hateful spans and suggests changes as you type

    Fig 1: Snapshot of NACL tool [1] [1]: Masud et al., Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, KDD 2022
  38. Literature Overview: Hate Dataset

    Dataset | Source & Language (Modality) | Year | Labels | Annotation
    Waseem & Hovy [1] | Twitter, English, Texts | 2016 | R, S, N | 16k, E, k = 0.84
    Davidson et al. [2] | Twitter, English, Texts | 2017 | H, O, N | 25k, C, k = 0.92
    Wulczyn et al. [3] | Wikipedia comments, English, Texts | 2017 | PA, N | 100k, C, k = 0.45
    Gibert et al. [5] | Stormfront, English, Texts | 2018 | H, N | 10k, k = 0.62
    Founta et al. [4] | Twitter, English, Texts | 2018 | H, A, SM, N | 70k, C, k = ?
    Albadi et al. [6] | Twitter, Arabic, Texts | 2018 | H, N | 6k, C, k = 0.81
    Labels: R: Racism, S: Sexism, H: Hate, PA: Personal Attack, A: Abuse, SM: Spam, O: Offensive, L: Religion, N: Neither, I: Implicit, E: Explicit. Annotation: E: Internal Experts, C: Crowd-sourced.
    [1]: Waseem & Hovy, NAACL’16 [2]: Davidson et al., WebSci’17 [3]: Wulczyn et al., WWW’17 [4]: Founta et al., WebSci’18 [5]: Gibert et al., ALW2’18 [6]: Albadi et al., ANLP’20
  39. Literature Overview: Hate Dataset

    Dataset | Source & Language (Modality) | Year | Labels | Annotation
    Mathur et al. [1] | Twitter, Hinglish, Texts | 2018 | H, O, N | 3k, E, k = 0.83
    Rizwan et al. [3] | Twitter, Urdu (Roman Urdu), Texts | 2020 | A, S, L, P, N | 10k, E, k = ?
    Gomez et al. [4] | Twitter, English, Memes | 2020 | H, N | 150k, C, k = ?
    ElSherief et al. [11] | Twitter, English, Texts | 2021 | I, E, N |
    Also: HASOC [5], Jigsaw Kaggle [6], SemEval [7], FB Hate-Meme Challenge [8], WOAH [9], CONSTRAINT [10]
    Labels: R: Racism, S: Sexism, H: Hate, PA: Personal Attack, A: Abuse, SM: Spam, O: Offensive, L: Religion, N: Neither, I: Implicit, E: Explicit. Annotation: E: Internal Experts, C: Crowd-sourced.
    [1]: Mathur et al., AAAI’20 [3]: Rizwan et al., EMNLP’19 [4]: Gomez et al., WACV’20 [5]: HASOC [6]: Jigsaw Kaggle [7]: SemEval [8]: FB Hate-Meme [9]: WOAH [10]: CONSTRAINT [11]: ElSherief et al., EMNLP’21
  40. Literature Overview: Hate Detection • N-gram TF-IDF + LR/SVM [1, 2]

    • GloVe + CNN, RNN [3] • Transformer-based ◦ Zero-, Few-Shot [4] ◦ Fine-tuning [5] ◦ HateBERT [6] • Generation for classification [7, 11] • Multimodality ◦ Images [8] ◦ Historical context [9] ◦ Network and neighbours [10] ◦ News, trends, prompts [11] [1]: Waseem & Hovy, NAACL’16 [2]: Davidson et al., WebSci’17 [3]: Badjatiya et al., WWW’17 [4]: Pelicon et al., EACL Hackashop’21 [5]: Timer et al., EMNLP’21 [6]: Caselli et al., WOAH’21 [7]: Ke-Li et al. [8]: Kiela et al., NeurIPS’20 [9]: Qian et al., NAACL’19 [10]: Mehdi et al., IJCA’20, Vol 13 [11]: Badr et al.
  41. Limitations of Existing Datasets • A myopic approach to building hate

    speech datasets from hate lexicons. [1, 2] • Hate speech in the real world goes beyond hateful slurs. [3] • Limited study in the Hinglish context. [1]: Waseem & Hovy, NAACL’16 [2]: Davidson et al., WebSci’17 [3]: ElSherief et al., EMNLP’21 REDACTED INFORMATION FOR A WIP
  42. DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for

    Information Diffusion on Large Networks Dhruv Sahnan, Vasu Goel, Sarah Masud, Chhavi Jain, Vikram Goyal, Tanmoy Chakraborty Published at ACM TKDD 2022 Tool’s Demo Video: https://www.youtube.com/watch?v=yu1DsfnJk10
  43. Existing Issues with Diffusion Visualization Engineering Challenges • Closed-source desktop

    tools. • Inactive open-source tools. • Handling large graphs. • Level of modularity. Research Challenges • Testing multiple hypotheses. • Experimenting with newer versions of a diffusion model. • Reproducible and extendable results.
  44. User Requirement Survey Can you list out some limitations that

    you may find while using some of the said tools? • (C1): Lack of support for large networks. • (C2): Lack of support for different graph-input formats. • (C3): Resource and memory-intensive tools are hard to set up. • (C4): Lack of scriptability and customizability, and less interactive UI. According to your use cases, can you list three most important features that you feel must be present in any network diffusion visualization tool that you may use? • (F1): Easy to use, customized visualization of diffusion. • (F2): Spatio-temporal analysis of the information flow. • (F3): Availability of key network and diffusion statistics at a glance. • (F4): Ability to save and load checkpoints. Complete Survey Available At [1] [1]: Dhruv et al., DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for Information Diffusion on Large Networks, Accepted in TKDD
  45. Proposed Tool: DiVA • Upload custom networks in multiple

    networkx-supported formats. • Run standard as well as user-defined epidemic models. • Dual diffusion-analysis mode for comparative study. • Provision for saving results as network/diffusion raw outputs or PDF; extensible to a dashboard. • Web-based. [1]: Dhruv et al., DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for Information Diffusion on Large Networks, Accepted in TKDD
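As an illustration of the kind of epidemic model such a tool runs, here is a minimal discrete-time SI (susceptible-infected) spread over a plain adjacency dict. This is a deterministic toy (infection probability 1); DiVA itself operates on networkx graphs and supports standard stochastic models:

```python
def si_diffusion(adjacency, seeds, steps):
    """Discrete-time SI model with infection probability 1: every
    susceptible neighbour of an infected node becomes infected at the
    next step. Returns the infected set after each step."""
    infected = set(seeds)
    history = [set(infected)]
    for _ in range(steps):
        newly = {nbr for node in infected for nbr in adjacency.get(node, [])}
        infected |= newly
        history.append(set(infected))
    return history

# Toy directed network: edges point from a poster to its followers
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
spread = si_diffusion(graph, seeds={"a"}, steps=2)
```

Recording the infected set per step is what makes the spatio-temporal playback and per-step statistics possible in a visual-analytics frontend.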
  46. Performance Evaluation • Reloading saved networks reduces the

    time to load. [1]: Dhruv et al., DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for Information Diffusion on Large Networks, Accepted in TKDD
  47. [1]: Dhruv et al., DiVA: A Scalable, Interactive and Customizable

    Visual Analytics Platform for Information Diffusion on Large Networks, Accepted in TKDD Dual Mode
  48. Usability Survey Fig 2: Comparative Study [1] Fig 1: SUS

    Study [1] [1]: Dhruv et al., DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for Information Diffusion on Large Networks, Accepted in TKDD • Despite being more feature-rich, DiVA is easy to use and rates high on overall capabilities. • Expert and novice SUS scores are 83.1 and 74.6, respectively.
  49. Improving our understanding of Implicit Hate • Implicit hate has been

    established to be harder for systems to detect. • This is different from obfuscation. • By improving our understanding of implicit hate we can: ◦ Improve hate speech classification via auxiliary modeling of explanations. ◦ Use the explanations as a standalone service for practitioners and content moderators.
  50. Existing Implicit Hate Corpora

    Dataset | # Explicit | # Implicit
    Gab [1] | 6306 | 3900
    Latent Hate [2] | 1089 | 7100
    AbuseEval [3] | 2129 | 798
    Samples from implicit hate corpora: • Latent Hate: they don't look human. they look like a bunch of ugly monkeys, sorry monkeys r better looking !!! [2] • Gab: If you're white and like Niggers you've got serious fucking mental issues. [1]
    [1]: Kennedy et al., PsyArXiv’18 [2]: ElSherief et al., EMNLP’21 [3]: Caselli et al., LREC’20 Fig 1: Intra-class JS Distance for I, E, N classes in Latent Hate [2]
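Fig 1 reports intra-class Jensen-Shannon (JS) distance. For reference, the JS distance between two discrete distributions is the square root of the JS divergence (with base-2 logs it lies in [0, 1]); a minimal sketch:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) with base-2 logs; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: sqrt of the JS divergence w.r.t. the
    mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(jsd)

# Identical distributions have distance 0; disjoint ones have distance 1
same = js_distance([0.5, 0.5], [0.5, 0.5])
far = js_distance([1.0, 0.0], [0.0, 1.0])
```

Unlike raw KL divergence, the JS distance is symmetric and bounded, which makes it suitable for comparing word or topic distributions across the implicit, explicit, and neutral classes.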
  51. Course Work & TAship (Update)

    S.No. | Course Name | Credits | Grade
    1 | Information Retrieval | 4 | 9
    2 | Mining Large Networks | 4 | 8
    3 | Machine Learning | 4 | 7
    4 | Natural Language Processing | 4 | 8
    5 | Deep Learning | 4 | 9
    6 | Topics for AI in Software Eng. | 4 | 10
    7 | Research Methods | 2 | 9
    8 | Independent Study | 4 * 2 | 10
    9 | Thesis Credits | 4 * 8 | -
    Total credits completed: 64 + 2 + 4. Current CGPA: 8.73. TA: • Mining Large Networks (Graduate Level) • Natural Language Processing (Graduate Level)
  52. Accepted Publications • Hate is the New Infodemic: A Topic-aware

    Modeling of Hate Speech Diffusion on Twitter (ICDE 2021) • Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization (KDD 2022) • DiVA: A Scalable, Interactive and Customizable Visual Analytics Platform for Information Diffusion on Large Networks (ACM TKDD 2022) Miscellaneous • Survey: Handling Bias in Toxic Speech Detection: A Survey (Under Review) • Essay: Nipping in the bud: detection, diffusion and mitigation of hate speech on social media (ACM SIGWEB Newsletter, Invited Publication) • Tutorials Conducted: Combating Online Hate Speech (WSDM 2021, ECML PKDD 2021) • Dashboard: RobinWatch (robinwatch.github.io)