Slide 1

Slide 1 text

Analysing Social Media Text: A Case Study of Political Attacks in India. Sarah Masud, PhD @ LCS2, IIITD; Visiting Researcher @ TUM

Slide 2

Slide 2 text

Background on Indian Politics ● General elections and Assembly elections are held every 5 years at the national and state level, respectively. They elect representatives to the national and state legislative houses, respectively. ● At the national level there are 2 major parties, the Bharatiya Janata Party (BJP) and the Indian National Congress (INC); they contest elections in all states, either independently or via alliances. ● Other regional and national parties contest elections in fewer states, e.g. AAP (Delhi, Goa, Punjab) and SP (mainly UP).

Slide 3

Slide 3 text

Background on Indian Politics ● Since 2014, the BJP has been in power at the national (central) level. ● In this study we look at the Indian Assembly elections of February 2022, in which 5 states held elections: ○ Uttar Pradesh (UP) ○ Punjab ○ Manipur ○ Goa ○ Uttarakhand ● Interestingly, UP contributes far more seats at the central level (80) than the second-highest state (48).

Slide 4

Slide 4 text

Image source: https://blog.google/intl/en-in/pollcheck-2022-digital-training-series-journalists-covering-upcoming-state-elections/

Slide 5

Slide 5 text

Use of Twitter to study Indian politics? ● India has almost 24M active users on Twitter (out of a population of 1.4B). ● Most political parties and groups have official Twitter accounts, and so do most politicians. ● Political attacks via social media posts, videos and memes are very common in India (so is the spread of fake news and lynching rumours…). ● Support for Indic languages on social media has further accelerated the use of social media by political parties to communicate with the general public all year round.

Slide 6

Slide 6 text

Step 1 of text analysis: Data Curation ● Collection ● Annotation Social media text analysis: STEP 1 ● Data Collection ● Data Analysis ● Data Preprocessing

Slide 7

Slide 7 text

Data Curation ● Twitter API and timeline crawler ● The elections were held in February; we ran our data collection on a weekly basis from January to March 2022 (i.e., before, during and after the elections). ● We shortlisted 100 politicians active on Twitter and associated with the states contesting elections. They cover 17 parties and political groups in total. ● Using general knowledge, Twitter bios and Wikipedia, we mapped the politicians to their political groups.

Slide 8

Slide 8 text

Data Curation ● We scraped tweets with hashtags of the form <#STATENAME>Assembly Election 2022 ● We also scraped the official Twitter handles of 6 parties. ● We collected 45k tweets in this process. ○ 32k from politicians ● Hindi, English and Punjabi are the top 3 contributing languages.

Slide 9

Slide 9 text

Data Analysis ● Legend: T = tweets, U = unique politicians, R = retweets, L = likes

Slide 10

Slide 10 text

Data Analysis ● BJP and INC are highly active. ● SP is the 3rd most active. ● SP has more interactions per tweet than BJP and INC. Is this finding of any significance? ● Are all followers REAL humans?
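
The "interactions per tweet" comparison can be recomputed from the T, R and L counts. A minimal sketch, assuming an interaction means a retweet or a like, so the measure is (R + L) / T; the party names and counts below are made-up placeholders, not the study's numbers.

```python
from dataclasses import dataclass


@dataclass
class PartyStats:
    tweets: int    # T
    retweets: int  # R
    likes: int     # L


def interactions_per_tweet(s: PartyStats) -> float:
    """(R + L) / T; returns 0.0 for a party with no tweets."""
    if s.tweets == 0:
        return 0.0
    return (s.retweets + s.likes) / s.tweets


# Hypothetical placeholder counts, NOT the values from the study.
stats = {
    "PARTY_A": PartyStats(tweets=1000, retweets=20000, likes=80000),
    "PARTY_B": PartyStats(tweets=200, retweets=10000, likes=40000),
}

# Rank parties by interactions per tweet, highest first.
ranked = sorted(stats, key=lambda p: interactions_per_tweet(stats[p]), reverse=True)
for party in ranked:
    print(party, interactions_per_tweet(stats[party]))
```

A party with fewer tweets can rank first on this measure, which is the pattern the slide reports for SP; whether that reflects real engagement depends on the follower question raised above.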

Slide 11

Slide 11 text

Data Preprocessing ● Perform the usual preprocessing: ○ Remove emojis ○ Replace URLs and USER mentions ○ Remove other special characters. ● Challenges: ○ Removing special characters can break the detection of hashtags (#). ○ Replacing USER mentions makes it hard to understand who the target is in cases of mudslinging. ○ Hindi characters are treated as special characters and punctuation, and are removed completely. ○ How to handle code mixing?

Slide 12

Slide 12 text

Data Preprocessing ● Trade-off between preprocessing and loss of information. ○ Pick which aspects are critical ○ Add separate preprocessing based on the detected language (lower-casing for English) ○ For code mixing, how to generate embeddings? ● Our approach: ○ User mentions kept intact ○ Punctuation removed after hashtags are detected separately. ○ URLs removed, as we do not build systems that require searching (what if this was about fact checking and not hatefulness?)
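
The approach above can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the regular expressions, the `preprocess` name and the choice of Unicode categories are all hypothetical. Hashtags are captured first, URLs are dropped, user mentions are kept, and characters are filtered by Unicode category so that Devanagari letters and vowel signs are not discarded as "special characters".

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")
# Python's \w does not match Devanagari vowel signs, so the block is added explicitly.
HASHTAG_RE = re.compile(r"#[\w\u0900-\u097F]+")


def preprocess(tweet: str) -> tuple[str, list[str]]:
    """Return (cleaned_text, hashtags) for one tweet."""
    hashtags = HASHTAG_RE.findall(tweet)  # detect hashtags before '#' is stripped
    text = URL_RE.sub(" ", tweet)         # URLs are removed entirely
    kept: list[str] = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat[0] in "LN" or ch in "@_":
            kept.append(ch)               # letters, digits, and mention characters
        elif cat[0] == "M" and kept and kept[-1] != " ":
            kept.append(ch)               # combining marks attached to a kept character
        else:
            kept.append(" ")              # emojis, punctuation, symbols, whitespace
    cleaned = " ".join("".join(kept).split())
    # lower() only affects cased scripts such as Latin; Devanagari is unchanged.
    return cleaned.lower(), hashtags


cleaned, tags = preprocess("@SomeUser ने कहा #UPElections2022 is near!! 🙂 https://example.com/x")
print(cleaned)
print(tags)
```

Lower-casing here is applied to the whole string for brevity; a per-language step, as suggested on the slide, would need a language detector that this sketch does not include.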

Slide 13

Slide 13 text

Social media text analysis: STEP 2 ● Data Annotation (manual) ● Data Annotation (modeling)

Slide 14

Slide 14 text

Manual Annotations of political attacks ● Political attacks, though offensive, are not hate speech, as political parties and politicians are not a protected class. ● We manually annotated 1.7k tweets with explicit, implicit and none labels of attack. ● For BJP, INC, SP and AAP, we also annotated whether the tweet is self-promotion, a demotion of the opposition, or both.

Slide 15

Slide 15 text

Annotations

Slide 16

Slide 16 text

Large-scale annotations ● We use the manual annotations to train two models and use the pseudo-labels they generate for large-scale annotation of the rest of the tweets. ● We tested 2 approaches: ○ An n-gram based logistic regression approach ○ A multilingual large language modeling approach
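
The n-gram route can be sketched in a few lines. This is a simplified stand-in, not the model used in the study: it is binary (attack vs. none) instead of three-way, has no bias term or regularisation, and the training tweets, the `threshold` value and all function names are invented for illustration. It fits a logistic regression on word uni- and bigram counts from a small labelled set, then assigns a pseudo-label to an unlabelled tweet only when the model is confident.

```python
import math
from collections import Counter


def ngrams(text: str, n_max: int = 2) -> Counter:
    """Count word n-grams for n = 1..n_max."""
    tokens = text.lower().split()
    feats: Counter = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def score(weights: dict, text: str) -> float:
    """Probability of the 'attack' class under the current weights."""
    z = sum(weights.get(f, 0.0) * c for f, c in ngrams(text).items())
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs: int = 50, lr: float = 0.5) -> dict:
    """Binary logistic regression (no bias term) fitted by per-sample gradient steps."""
    weights: dict = {}
    for _ in range(epochs):
        for text, label in samples:
            p = score(weights, text)
            for f, c in ngrams(text).items():
                weights[f] = weights.get(f, 0.0) + lr * (label - p) * c
    return weights


def pseudo_label(weights: dict, text: str, threshold: float = 0.8):
    """Return 1 (attack), 0 (none), or None when the model is not confident."""
    p = score(weights, text)
    if p >= threshold:
        return 1
    if p <= 1.0 - threshold:
        return 0
    return None


# Invented toy examples standing in for the manually annotated tweets.
manual = [
    ("the government failed farmers and youth", 1),
    ("corrupt leaders looted the state", 1),
    ("join our rally tomorrow in lucknow", 0),
    ("thank you for your support", 0),
]
model = train(manual)
for tweet in ["the government failed farmers", "join our rally tomorrow", "happy diwali everyone"]:
    print(tweet, "->", pseudo_label(model, tweet))
```

Leaving low-confidence tweets unlabelled is one possible design choice; the slides do not say whether the study filtered pseudo-labels by confidence.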

Slide 17

Slide 17 text

Large Scale Annotations Image Source: https://devopedia.org/n-gram-model

Slide 18

Slide 18 text

Large Scale Annotations Any deep-learning-based system that can generate numeric embeddings for words can be used. Image Source: https://jalammar.github.io/illustrated-word2vec/

Slide 19

Slide 19 text

Annotations ● The upper row shows manual annotations, the lower row machine-generated annotations. ○ PA: Party handle ○ PO: Politician ● Overall # of neutral > explicit > implicit ● Implicit is harder to detect ● Explicit will catch the reader's attention faster. ● The curated dataset has 695 (resp. 23,838) neutral, 696 (resp. 17,771) explicit, and 329 (resp. 4,858) implicit instances among the manually annotated (resp. model-predicted) samples of political attacks, represented in pictogram A (resp. D).

Slide 20

Slide 20 text

Patterns in Volume of Attack ● Patterns from manual and machine annotations follow a similar trend. ● Attacks increase during election weeks, when in-person rallies were held. ● Neutral promotional content remains high even before and after the elections (hinting at year-round activity of political parties).
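
A weekly volume curve of this kind can be built by bucketing labelled tweets by calendar week. A minimal sketch, assuming each tweet is reduced to a (date, label) pair and that ISO weeks are an acceptable bucket; the dates and labels below are invented examples.

```python
from collections import Counter
from datetime import date


def weekly_label_counts(tweets) -> Counter:
    """Count tweets per (ISO year, ISO week, label)."""
    counts: Counter = Counter()
    for day, label in tweets:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1], label)] += 1
    return counts


# Invented (date, label) pairs, not data from the study.
tweets = [
    (date(2022, 1, 5), "neutral"),
    (date(2022, 2, 10), "explicit"),
    (date(2022, 2, 12), "explicit"),
    (date(2022, 2, 12), "implicit"),
]
counts = weekly_label_counts(tweets)
for key in sorted(counts):
    print(key, counts[key])
```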

Slide 21

Slide 21 text

Patterns in Volume of Attack ● The neutral-to-attack ratio is 3:2 in the manually annotated samples. ● The ratio is 1:1 in the predicted samples (over-predicting attacks, maybe?) ● Direct attacks overshadow implicit ones by 2:1 and 3:1 in the manual and predicted samples, respectively.

Slide 22

Slide 22 text

Social media text analysis: STEP 3 ● Analysis and Findings

Slide 23

Slide 23 text

Patterns in Volume of Attack ● Based on significance testing, explicit attacks receive more retweets and likes than implicit ones in the machine annotations, but the results are not significant in the manually annotated samples. ● Can we trust one statistical test over the other? ● Probably not, because the machine-annotated sample is 20x larger.
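
The sample-size caveat can be made concrete with a small calculation. The slides do not name the test that was used, so this sketch assumes a two-sided two-sample z-test with equal group sizes and a shared standard deviation; the mean difference, standard deviation and group sizes are illustrative numbers only. The same effect is non-significant at the smaller size and highly significant at 20x the size.

```python
import math


def two_sample_z_pvalue(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means (equal sd, equal group sizes)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Illustrative numbers: identical effect, the second sample is 20x larger.
p_small = two_sample_z_pvalue(mean_diff=2.0, sd=10.0, n_per_group=100)
p_large = two_sample_z_pvalue(mean_diff=2.0, sd=10.0, n_per_group=2000)
print(f"n=100:  p = {p_small:.4f}")
print(f"n=2000: p = {p_large:.2e}")
```

Reporting an effect size alongside the p-value is one way to compare the two annotation sets without the result being driven by sample size alone.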

Slide 24

Slide 24 text

Power dynamics of promotion and demotion ● Simply collect the unique hashtags employed by each party and then manually assign each a promotion or demotion value. ○ We observe that most parties use promotional hashtags more than demotional ones. ○ This is most prominent for the incumbent BJP, which operates from a position of comfort and can therefore use implicit demotion/challenging hashtags. ○ Opposition parties, on the other hand, employed more directly challenging hashtags, as they are attacking the one in power.
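
The hashtag tally described above can be sketched as follows. The hashtags, the `stance_of` mapping and the function names are hypothetical; in the study the promotion/demotion value was assigned by hand, which the dictionary stands in for here.

```python
import re
from collections import Counter

# \w misses Devanagari vowel signs in Python, so the block is added explicitly.
HASHTAG_RE = re.compile(r"#[\w\u0900-\u097F]+")


def unique_hashtags(tweets) -> set:
    """Unique hashtags across one party's tweets, case-folded."""
    return {tag.lower() for tweet in tweets for tag in HASHTAG_RE.findall(tweet)}


def stance_counts(hashtags, stance_of: dict) -> Counter:
    """Tally hashtags by their manually assigned stance."""
    return Counter(stance_of.get(tag, "unassigned") for tag in hashtags)


# Invented tweets and a hand-made stance mapping, for illustration only.
party_tweets = [
    "Big rally today #VoteForPartyA",
    "#VoteForPartyA #PartyBFailed",
    "Thanks all #Vikas",
]
stance_of = {"#voteforpartya": "promotion", "#partybfailed": "demotion"}

tags = unique_hashtags(party_tweets)
print(sorted(tags))
print(stance_counts(tags, stance_of))
```

Counting unique hashtags, as the slide describes, weights a hashtag used once the same as one used hundreds of times; counting occurrences instead would be a different measure.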

Slide 25

Slide 25 text

Power dynamics of promotion and demotion ● Use manual annotations to mark promotion and demotion among the 1.7k manually annotated samples. ● INC, the largest opposition party at the centre (in terms of resources), attacks BJP the most (most of the attacks are criticisms). ● BJP focuses more on self-promotion. Beyond self-promotion, the party it attacks the most is INC (no surprise).

Slide 26

Slide 26 text

Power dynamics of promotion and demotion ● Smaller parties like AAP and SP have to balance promotion and demotion. ● SP was focused on elections in UP, which is BJP's stronghold; hence it attacks BJP as much as it self-promotes. ● AAP was focused on elections in Punjab, where INC and BJP have equal footing, and we see that in the distribution of attacks by AAP.

Slide 27

Slide 27 text

Conclusion ● Political attacks help us understand the power dynamics during a specific election season. ● These dynamics change from one election to the next and from one state to the next. ○ Recently, in the 2023 state elections, similar promotional tactics did not help BJP in Karnataka. ○ While BJP came to power in Manipur, it has not been able to control the ongoing political tension and civic unrest. ○ While SP has grown in popularity both online and offline (as visible from its vote share at the state level), this does not indicate that it will be able to retain the same in the general elections.

Slide 28

Slide 28 text

Conclusion ● Use of name-calling can prove a boon or a bane depending on how the audience perceives it, and on who wins the elections. ● Political parties should engage in critical political attacks without referring to the gender or caste of politicians, so as not to make the criticism hateful in nature. ● We need information curated from multiple sources like Twitter, Facebook, WhatsApp and news articles to be able to establish an overall sense of how politics shapes social discourse and vice versa. Until then, studies like ours remain an anecdotal commentary.

Slide 29

Slide 29 text

Paper Link [email protected]

Slide 30

Slide 30 text

Thank You Q&A