Political Attack India

_themessier

July 13, 2023

Transcript

  1. Analysing Social Media Text: A case study of political attacks in India
    Sarah Masud, PhD @ LCS2, IIITD; Visiting Researcher @ TUM
  2. Background into Indian Politics
    • General elections and Assembly elections are held every 5 years at the national and state level, respectively. They elect representatives to the national parliament and to the state legislative assemblies.
    • At the national level there are 2 major parties, the Bharatiya Janata Party (BJP) and the Indian National Congress (INC); both contest elections in all states, either independently or via alliances.
    • Other regional and national parties contest elections in fewer states, e.g. AAP (Delhi, Goa, Punjab) and SP (mainly UP).
  3. Background into Indian Politics
    • Since 2014, the BJP has been in power at the national (central) level.
    • In this study we look at the Indian Assembly elections of February 2022, in which 5 states went to the polls:
      ◦ Uttar Pradesh (UP)
      ◦ Punjab
      ◦ Manipur
      ◦ Goa
      ◦ Uttarakhand
    • Interestingly, UP contributes the largest number of seats to the national parliament (80), compared with 48 for the second-highest state.
  4. Why use Twitter to study Indian politics?
    • India has almost 24M active Twitter users (out of a population of 1.4B).
    • Most political parties and groups have official Twitter accounts, as do most politicians.
    • Political attacks via social media posts, videos and memes are very common in India (as is the spread of fake news and lynching rumours…).
    • Support for Indic languages on social media has further encouraged political parties to use it to communicate with the general public all year round.
  5. Step 1 of text analysis: Data Curation (Collection, Annotation)
    Social media text analysis: STEP 1
    • Data Collection
    • Data Analysis
    • Data Preprocessing
  6. Data Curation
    • Tools: the Twitter API and a timeline crawler.
    • The elections were held in February, so we ran data collection on a weekly basis from January to March 2022 (i.e. before, during and after the elections).
    • We shortlisted 100 politicians who are active on Twitter and associated with the states holding elections. They cover 17 parties and political groups in total.
    • Using general knowledge, Twitter bios and Wikipedia, we mapped the politicians to their political groups.
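The weekly collection schedule can be sketched as a list of date windows, one per crawl. This is a minimal Python sketch, not the study's crawler: the exact period (1 January to 31 March 2022) and the helper name `weekly_windows` are assumptions for illustration.

```python
from datetime import date, timedelta


def weekly_windows(start: date, end: date) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into 7-day half-open windows [lo, hi).

    The last window is truncated so that it stops at end + 1 day.
    """
    windows = []
    lo = start
    stop = end + timedelta(days=1)  # exclusive upper bound
    while lo < stop:
        hi = min(lo + timedelta(days=7), stop)
        windows.append((lo, hi))
        lo = hi
    return windows


# Hypothetical collection period: January to March 2022.
for lo, hi in weekly_windows(date(2022, 1, 1), date(2022, 3, 31)):
    print(f"crawl tweets created in [{lo}, {hi})")
```

Half-open windows mean a tweet posted exactly on a boundary falls into only one crawl; since timeline crawls can still overlap in practice, merging results by tweet ID is a sensible extra safeguard.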
  7. Data Curation
    • We scraped tweets with the hashtag #<STATENAME>AssemblyElection2022.
    • We also scraped the timelines of the official Twitter handles of 6 parties.
    • In total we collected 45k tweets:
      ◦ 32k from politicians.
    • Hindi, English and Punjabi are the top-3 contributing languages.
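A minimal sketch of the hashtag matching, assuming the template expands to tags such as `#PunjabAssemblyElection2022` and that matching is case-insensitive. The state spellings in `STATES` and both helper names are hypothetical; the study collected data through the Twitter API, so this only illustrates the matching logic on text that has already been collected.

```python
# Hypothetical state tokens; the real hashtags may be spelled differently.
STATES = ["UttarPradesh", "Punjab", "Manipur", "Goa", "Uttarakhand"]

# Punctuation that often trails a hashtag in running text.
TRAILING = ".,!?:;\"'"


def election_hashtags(states: list[str], year: int = 2022) -> set[str]:
    """Expand the #<STATENAME>AssemblyElection<YEAR> template (lowercased)."""
    return {f"#{s}AssemblyElection{year}".lower() for s in states}


def has_election_hashtag(tweet: str, tags: set[str]) -> bool:
    """True if any whitespace-separated hashtag in the tweet is a target tag."""
    found = {
        tok.lower().rstrip(TRAILING)
        for tok in tweet.split()
        if tok.startswith("#")
    }
    return not found.isdisjoint(tags)


TAGS = election_hashtags(STATES)
print(has_election_hashtag("Voting today! #PunjabAssemblyElection2022.", TAGS))  # → True
```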
  8. Data Analysis
    • BJP and INC are highly active.
    • SP is the 3rd most active.
    • SP has more interactions per tweet than BJP and INC. Is this finding significant?
    • Are all the followers real humans?
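"Interactions per tweet" can be made concrete by dividing a party's engagement by its tweet count. This Python sketch assumes interaction = likes + retweets (the slides do not define it) and uses made-up numbers for placeholder parties; it reports the median alongside the mean because a per-tweet average is easily dominated by a single viral tweet.

```python
from statistics import mean, median

# Hypothetical per-tweet engagement records: (party, likes, retweets).
TWEETS = [
    ("PartyA", 120, 40), ("PartyA", 90, 30), ("PartyA", 100, 35), ("PartyA", 80, 20),
    ("PartyB", 70, 25), ("PartyB", 60, 20), ("PartyB", 65, 15),
    ("PartyC", 400, 150), ("PartyC", 30, 10), ("PartyC", 40, 15),
]


def interactions_by_party(tweets):
    """Map party -> list of per-tweet interactions (likes + retweets, an assumed definition)."""
    out = {}
    for party, likes, retweets in tweets:
        out.setdefault(party, []).append(likes + retweets)
    return out


for party, vals in interactions_by_party(TWEETS).items():
    print(f"{party}: n={len(vals)} mean={mean(vals):.1f} median={median(vals):.1f}")
```

With these toy numbers PartyC has the highest mean (215) but the lowest median (55), which is one reason a higher per-tweet average alone does not settle whether a difference is meaningful.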
  9. Data Preprocessing
    • Perform the usual preprocessing:
      ◦ Remove emojis.
      ◦ Replace URLs and USER mentions.
      ◦ Remove other special characters.
    • Challenges:
      ◦ Removing special characters can break the detection of hashtags (#<WORDS>).
      ◦ Replacing USER mentions makes it hard to tell who the target is in cases of mudslinging.
      ◦ Hindi characters can get treated as special characters or punctuation and removed completely.
      ◦ How should code mixing be handled?
  10. Data Preprocessing
    • There is a trade-off between preprocessing and loss of information:
      ◦ Pick which aspects are critical.
      ◦ Add separate preprocessing based on the detected language (e.g. lowercasing for English).
      ◦ For code-mixed text, how do we generate embeddings?
    • Our approach:
      ◦ User mentions are kept intact.
      ◦ Punctuation is removed only after hashtags have been detected separately.
      ◦ URLs are removed, as we do not build systems that require searching (what if this were about fact-checking rather than hatefulness?).
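The "our approach" bullets can be sketched as a small Python function. This is a minimal sketch, not the authors' pipeline: it assumes the language tag `lang` is already known, drops URLs, collects hashtags before stripping punctuation, keeps `@mentions` intact, and removes punctuation and symbols by Unicode category instead of a `[^\w\s]`-style regex. The category-based filter is chosen because in Python's `re`, `\w` follows `str.isalnum()`, which is false for Devanagari vowel signs (combining marks), so a naive filter can mangle Hindi words. Code-mixed text is simply left as is.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
# Hashtag body: any run of characters other than whitespace, '#' or '@',
# so Devanagari/Gurmukhi hashtags are captured too.
HASHTAG_RE = re.compile(r"#([^\s#@]+)")
# Twitter handles: ASCII letters, digits and underscores.
MENTION_RE = re.compile(r"@\w+", re.ASCII)


def strip_symbols(text: str) -> str:
    """Remove punctuation (P*) and symbol (S*) code points.

    S* covers most emoji (category So). Combining marks (M*), such as
    Devanagari and Gurmukhi vowel signs, are kept. Note that '_' is
    punctuation (Pc), so it is removed as well.
    """
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith(("P", "S"))
    )


def preprocess(tweet: str, lang: str = "en") -> dict:
    """Hypothetical tweet cleaner following the slide's 'our approach' bullets."""
    text = URL_RE.sub(" ", tweet)  # URLs dropped: no search/fact-checking downstream
    # Detect hashtags *before* punctuation removal deletes the '#'.
    hashtags = [strip_symbols(h) for h in HASHTAG_RE.findall(text)]
    tokens = []
    for tok in text.split():
        mention = MENTION_RE.match(tok)
        if mention:
            tokens.append(mention.group(0))  # keep @handle intact (possible attack target)
            continue
        tok = strip_symbols(tok)
        if tok:
            # Lowercase only English; Indic scripts have no letter case.
            tokens.append(tok.lower() if lang == "en" else tok)
    return {"text": " ".join(tokens), "hashtags": hashtags}


print(preprocess("Vote for @party_x! #UPElection2022 https://t.co/abc 🙏"))
# → {'text': 'vote for @party_x upelection2022', 'hashtags': ['UPElection2022']}
```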
  11. Social media text analysis: STEP 2
    • Data Annotation (manual)
    • Data Annotation (modeling)
  12. Manual Annotations of political attacks
    • Political attacks, though offensive, are not hate speech, as political parties and politicians are not a protected class.
    • We manually annotate 1.7k tweets with explicit, implicit and none (no attack) labels.
    • For BJP, INC, SP and AAP, we also annotate whether each tweet is self-promotion, demotion of the opposition, or both.
  13. Large-scale annotations
    • We use the manual annotations to train two models and use the pseudo-labels they generate to annotate the rest of the tweets at scale.
    • We tested 2 approaches:
      ◦ N-gram based logistic regression
      ◦ Multilingual large language models
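A minimal, dependency-free sketch of the n-gram logistic-regression route: a multinomial (softmax) logistic regression over word unigrams and bigrams, trained on manually labelled tweets and then used to pseudo-label unlabelled ones. The toy tweets, learning rate, epoch count and the optional confidence `threshold` are assumptions for illustration; the slides do not say whether low-confidence predictions were filtered. In practice a library such as scikit-learn would replace the hand-written training loop.

```python
import math
from collections import defaultdict

LABELS = ["none", "explicit", "implicit"]


def ngrams(text: str, n_max: int = 2) -> list[str]:
    """Word n-grams (1..n_max) of a lowercased, whitespace-tokenised tweet."""
    toks = text.lower().split()
    return [
        " ".join(toks[i : i + n])
        for n in range(1, n_max + 1)
        for i in range(len(toks) - n + 1)
    ]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in logits]
    z = sum(exps)
    return [e / z for e in exps]


def scores(weights, feats):
    """One linear score per label: sum of that label's n-gram weights."""
    return [sum(w.get(f, 0.0) for f in feats) for w in weights]


def train(texts, labels, epochs: int = 50, lr: float = 0.5):
    """Per-example gradient descent on the multinomial logistic (cross-entropy) loss."""
    weights = [defaultdict(float) for _ in LABELS]
    for _ in range(epochs):
        for text, gold in zip(texts, labels):
            feats = ngrams(text)
            probs = softmax(scores(weights, feats))
            for k, w in enumerate(weights):
                grad = probs[k] - (1.0 if LABELS[k] == gold else 0.0)
                for f in feats:
                    w[f] -= lr * grad
    return weights


def pseudo_label(weights, texts, threshold: float = 0.0):
    """Predict a label per tweet; None when the top probability is below threshold."""
    out = []
    for text in texts:
        probs = softmax(scores(weights, ngrams(text)))
        best = max(range(len(LABELS)), key=probs.__getitem__)
        out.append(LABELS[best] if probs[best] >= threshold else None)
    return out


# Hypothetical manually annotated tweets (invented for illustration).
MANUAL = [
    ("great rally in lucknow today", "none"),
    ("join our roadshow tomorrow", "none"),
    ("corrupt liar leaders loot the state", "explicit"),
    ("shameful corrupt party", "explicit"),
    ("some people only remember farmers before elections", "implicit"),
    ("we all know who promised jobs and forgot", "implicit"),
]

W = train([t for t, _ in MANUAL], [y for _, y in MANUAL])
print(pseudo_label(W, ["corrupt leaders", "never seen words"], threshold=0.5))
# → ['explicit', None]
```

A tweet made entirely of unseen n-grams gets a uniform distribution (1/3 per label), so a threshold above 1/3 leaves it unlabelled rather than guessing.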
  14. Large Scale Annotations
    • Can be any deep learning based system that can generate numeric embeddings for words.
    Image Source: https://jalammar.github.io/illustrated-word2vec/
  15. Annotations
    • Upper row: manual annotations; lower row: machine-generated annotations.
      ◦ PA: Party handle
      ◦ PO: Politician
    • Overall, neutral > explicit > implicit in the model-predicted samples; in the manual samples, neutral and explicit are nearly tied (695 vs. 696).
    • Implicit attacks are harder to detect.
    • Explicit attacks catch the reader's attention faster.
    • The curated dataset has 695 (resp. 23,838) neutral, 696 (resp. 17,771) explicit, and 329 (resp. 4,858) implicit instances of manually annotated (resp. model-predicted) samples of political attacks, represented in pictogram A (resp. D).
  16. Patterns in Volume of Attack
    • Patterns from manual and machine annotations follow a similar trend.
    • Attacks increase during the election weeks, when in-person rallies were held.
    • Neutral promotional content remains high even before and after the elections (hinting at year-round activity by political parties).
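Weekly volume patterns like these come from bucketing labelled tweets by week. This Python sketch assumes ISO calendar weeks as buckets and uses a handful of hypothetical (date, label) rows; comparing manual and machine trends would mean running it once on each label set.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical (tweet date, label) pairs; labels may be manual or model-predicted.
ROWS = [
    (date(2022, 1, 10), "none"),
    (date(2022, 1, 12), "explicit"),
    (date(2022, 2, 14), "explicit"),
    (date(2022, 2, 15), "implicit"),
    (date(2022, 2, 16), "explicit"),
    (date(2022, 3, 21), "none"),
]


def weekly_label_counts(rows):
    """Return {(iso_year, iso_week): Counter(label -> count)}, sorted by week."""
    weeks = defaultdict(Counter)
    for day, label in rows:
        iso_year, iso_week, _ = day.isocalendar()
        weeks[(iso_year, iso_week)][label] += 1
    return dict(sorted(weeks.items()))


for (year, week), counts in weekly_label_counts(ROWS).items():
    attacks = counts["explicit"] + counts["implicit"]
    print(f"{year}-W{week:02d}: neutral={counts['none']} attacks={attacks}")
```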
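  17. Patterns in Volume of Attack
    • Attacks outnumber neutral tweets roughly 3:2 in the manual annotation samples (1,025 vs. 695).
    • The ratio is roughly 1:1 in the predicted samples (22,629 attacks vs. 23,838 neutral), so the model labels a smaller share of tweets as attacks (under-predicting attacks, maybe?).
    • Explicit attacks outnumber implicit ones by roughly 2:1 in the manual samples and 3.7:1 in the predicted samples.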
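The ratios can be recomputed from the counts listed on the "Annotations" slide. This short Python calculation (the helper name `ratios` is hypothetical) uses only those reported counts: attacks outnumber neutral tweets about 1.47:1 (roughly 3:2) among the manual samples and about 0.95:1 (roughly 1:1) among the predicted ones, while explicit outnumbers implicit about 2.1:1 and 3.7:1, respectively.

```python
# Counts from the "Annotations" slide as (manual, model-predicted).
COUNTS = {
    "neutral": (695, 23_838),
    "explicit": (696, 17_771),
    "implicit": (329, 4_858),
}


def ratios(idx: int) -> dict:
    """Ratios for column idx (0 = manual, 1 = model-predicted)."""
    neutral, explicit, implicit = (COUNTS[k][idx] for k in ("neutral", "explicit", "implicit"))
    attack = explicit + implicit
    return {
        "attack_per_neutral": attack / neutral,
        "explicit_per_implicit": explicit / implicit,
        "attack_share": attack / (attack + neutral),
    }


for idx, name in ((0, "manual"), (1, "predicted")):
    print(name, {k: round(v, 2) for k, v in ratios(idx).items()})
# manual {'attack_per_neutral': 1.47, 'explicit_per_implicit': 2.12, 'attack_share': 0.6}
# predicted {'attack_per_neutral': 0.95, 'explicit_per_implicit': 3.66, 'attack_share': 0.49}
```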
  18. Social media text analysis: STEP 3
    • Analysis and Findings
  19. Patterns in Volume of Attack
    • Based on significance testing, explicit attacks receive more retweets and likes than implicit ones in the machine annotations, but the difference is not significant in the manually annotated samples.
    • Can we trust one statistical test over the other?
    • Probably not, because the machine-annotated samples are about 20x larger: with that many samples, even a small difference in engagement can come out as statistically significant.
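The effect of sample size can be seen with a toy example: keep two retweet distributions fixed, multiply the number of samples by 20, and the p-value for the same difference in means shrinks. This Python sketch uses a two-sample z-test (unequal-variance standard error, normal approximation) purely for illustration; the slides do not state which test was used, the retweet counts are made up, and repeating the lists is only a crude stand-in for drawing more samples from the same distribution. With these numbers the small sample is not significant at the 5% level, while the 20x copy is.

```python
import math
from statistics import mean, variance


def two_sample_z(a, b):
    """Return (z, two-sided p) for a difference in means, normal approximation."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return z, math.erfc(abs(z) / math.sqrt(2))  # erfc(|z|/sqrt 2) = 2 * (1 - Phi(|z|))


# Hypothetical retweet counts for a small, manually annotated subset.
explicit = [12, 30, 7, 45, 22, 18, 9, 27]
implicit = [10, 15, 6, 20, 14, 11, 8, 16]

z_small, p_small = two_sample_z(explicit, implicit)
# Same values repeated 20x, mimicking a much larger machine-annotated set.
z_large, p_large = two_sample_z(explicit * 20, implicit * 20)
print(f"n=8 per group:   z={z_small:.2f}, p={p_small:.4f}")
print(f"n=160 per group: z={z_large:.2f}, p={p_large:.2g}")
```

Because the means are unchanged, the shrinking p-value reflects only the larger sample, which is why significance in the bigger set says little about how large the engagement difference actually is.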
  20. Power dynamics of promotion and demotion
    • We simply check the unique hashtags employed by each party and manually assign each one a promotion or demotion value.
      ◦ We observe that most parties use promotional hashtags more than demotional ones.
      ◦ This is most prominent for the incumbent BJP, which operates from a position of comfort and can therefore use implicitly demoting/challenging hashtags.
      ◦ Opposition parties, on the other hand, employ more directly challenging hashtags, as they are attacking the party in power.
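The hashtag tally can be sketched as: collect each party's unique hashtags, look each one up in a manually built role table, and count roles per party. In this Python sketch the parties, hashtags and role table are invented placeholders, and case-folding tags before deduplication is an assumption; tags missing from the table are reported as "unlabelled" rather than guessed.

```python
from collections import Counter, defaultdict

# Hypothetical manual lookup (invented placeholder tags): hashtag -> role.
HASHTAG_ROLE = {
    "#voteforus": "promotion",
    "#ourwork": "promotion",
    "#theyfailed": "demotion",
    "#changenow": "demotion",
}

# Hypothetical (party, hashtag) pairs extracted from each party's tweets.
OBSERVED = [
    ("PartyA", "#VoteForUs"), ("PartyA", "#OurWork"), ("PartyA", "#VoteForUs"),
    ("PartyA", "#NewTag"),
    ("PartyB", "#TheyFailed"), ("PartyB", "#ChangeNow"), ("PartyB", "#voteforus"),
]


def role_counts(observed, roles):
    """Per party, count *unique* hashtags by their manually assigned role."""
    unique = defaultdict(set)
    for party, tag in observed:
        unique[party].add(tag.lower())  # case-fold so #OurWork and #ourwork count once
    return {
        party: Counter(roles.get(tag, "unlabelled") for tag in tags)
        for party, tags in unique.items()
    }


for party, counts in role_counts(OBSERVED, HASHTAG_ROLE).items():
    labelled = counts["promotion"] + counts["demotion"]
    share = counts["promotion"] / labelled if labelled else float("nan")
    print(f"{party}: {dict(counts)} promotion share={share:.2f}")
```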
  21. Power dynamics of promotion and demotion
    • We use the manual annotations to mark promotion and demotion among the 1.7k manually annotated samples.
    • INC, the largest opposition party at the centre (in terms of resources), attacks BJP the most (most of its attacks are criticisms).
    • BJP focuses more on self-promotion. After self-promotion, the party it targets most is INC (no surprise).
  22. Power dynamics of promotion and demotion
    • Smaller parties like AAP and SP have to balance promotion and demotion.
    • SP was focused on the elections in UP, which is BJP's stronghold; hence it attacks BJP as much as it self-promotes.
    • AAP was focused on the elections in Punjab, where INC and BJP have equal footing, and this shows in the distribution of AAP's attacks.
  23. Conclusion
    • Political attacks help us understand the power dynamics during a specific election season.
    • These dynamics change from one election to the next and from one state to the next.
      ◦ In the 2023 Karnataka state elections, similar promotional tactics did not help the BJP.
      ◦ While the BJP came to power in Manipur, it has not been able to control the ongoing political tension and civil unrest.
      ◦ While SP has grown in popularity both online and offline (as visible from its state-level vote share), this does not mean it will retain that popularity in the general elections.
  24. Conclusion
    • Name calling can prove a boon or a bane, depending on how the audience perceives it and on who wins the elections.
    • Political parties should make critical political attacks without referring to the gender or caste of politicians, so that the criticism does not become hateful.
    • We need information curated from multiple sources, such as Twitter, Facebook, WhatsApp and news articles, to establish an overall sense of how politics shapes social discourse and vice versa. Until then, studies like ours remain anecdotal commentary.