Detecting Irony on Greek Political Tweets: A Text Mining Approach

Dimitris Spathis
September 25, 2015

Mining Humanistic Data Workshop
Engineering Applications of Neural Networks (EANN 2015 conference)

Published by ACM http://dl.acm.org/citation.cfm?id=2797183



Transcript

  1. Detecting Irony on Greek Political Tweets: A Text Mining Approach
    Basilis Charalampakis, Dimitris Spathis, Elias Kouslis, Katia Kermanidis
    Department of Informatics, Ionian University, Greece
    EANN 2015, Rhodes, MHDW Workshop
  2. Data*
    44,438 Greek political tweets from May 2012, collected one week before and after the elections.
    2 sub-datasets of tweets mentioning parties and leaders.
    126 tweets in the manually labelled training set.
    *Dataset available at di.ionio.gr/hilab
  3. Data workflow
    Cleaning: eliminating duplicates and artifacts (~20k).
    Labelling: manually labelling 126 tweets as ironic or not.
    Feature scoring: scoring every tweet in the dataset on each feature.
    Training: machine learning using the labelled tweets as the training set and the rest of the dataset as the test set.
    Counting: estimating the irony percentage per party.
    Correlating: looking for a trend between the ironic tweets each party received before the elections and its election results.
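The counting step can be illustrated with a minimal sketch. It assumes each classified tweet has already been reduced to a (party, is_ironic) pair. The function name, the data layout and the toy labels are hypothetical and do not come from the slides.

```python
from collections import defaultdict

def irony_percentage_per_party(tweets):
    """Return {party: percentage of that party's tweets classified as ironic}.

    `tweets` is an iterable of (party, is_ironic) pairs (assumed layout).
    """
    totals = defaultdict(int)
    ironic = defaultdict(int)
    for party, is_ironic in tweets:
        totals[party] += 1
        if is_ironic:
            ironic[party] += 1
    return {party: 100.0 * ironic[party] / totals[party] for party in totals}

# Toy labels for two placeholder parties, not real data.
sample = [("A", True), ("A", False), ("B", True), ("B", True)]
percentages = irony_percentage_per_party(sample)
```

The resulting per-party percentages are the quantity the correlating step would then compare against election results.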
  4. The Literature focuses on irony detection & computational linguistics
    Detecting irony by examining the following features: signatures, unexpectedness, style, emotional scenarios, frequent trigrams and pleasantness. (Reyes et al, 2013 & 2011)
    Hashtag analysis method creates noisy results with low accuracy. (Gonzalez-Ibanez et al, 2011)
    Same dataset. Positive-negative distinction. Alignment between actual political results and web sentiment. (Kermanidis & Maragoudakis, 2013)
    High accuracy combining Amazon reviews, tweets and semi-supervised learning. (Gonzalez-Ibanez et al, 2011)
  5. The Features take into consideration structural sentence formations and unexpectedness occurrences
    Spoken: spoken style applied in writing.
    Rarity: the rarest words.
    Meanings: number of WordNet synsets as a measure of ambiguity.
    Lexical: punctuation, prosodic repeated letters, metaphors.
    Emoticons: smiley faces etc.
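As a rough illustration of the lexical and emoticon features, the sketch below counts repeated letters, punctuation runs and simple emoticons with regular expressions. The patterns, thresholds and function name are assumptions chosen for the example. The slides do not specify how these features were computed, and metaphor detection is not attempted here.

```python
import re

# A letter repeated three or more times in a row, e.g. "οοοο" (assumed threshold).
REPEATED_LETTER = re.compile(r"([^\W\d_])\1{2,}")
# Two or more consecutive '!', '?' or '.' characters.
PUNCTUATION_RUN = re.compile(r"[!?.]{2,}")
# A small, deliberately incomplete set of Western-style emoticons.
EMOTICON = re.compile(r"[:;=][-']?[()DPp]")

def lexical_and_emoticon_counts(text):
    """Count surface cues in one tweet (hypothetical helper)."""
    return {
        "repeated_letters": sum(1 for _ in REPEATED_LETTER.finditer(text)),
        "punctuation_runs": sum(1 for _ in PUNCTUATION_RUN.finditer(text)),
        "emoticons": sum(1 for _ in EMOTICON.finditer(text)),
    }

counts = lexical_and_emoticon_counts("Μπράβοοοο!!! :)")
```

Keeping the counts separate leaves open how they would be combined into the lexical and emoticon scores.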
  6. Spoken
    Verbal irony written as everyday chat, using dashes (-) and asterisks (*):
    -γεια σου -τι γίνεται; *χειραψία* (English: -hello -how's it going? *handshake*)
    Use of spoken language is often related to unexpectedness.
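A minimal sketch of how the dash and asterisk markers could be detected follows. It assumes a dialogue turn is a dash at the start of the text or after whitespace, followed directly by a non-space character, and that an action is text wrapped in a pair of asterisks. Both patterns and the counting scheme are illustrative assumptions, not the authors' scoring rule.

```python
import re

# A dash opening a dialogue turn: at the start or after whitespace,
# immediately followed by a non-space character (so "semi-supervised" is ignored).
TURN = re.compile(r"(?:^|\s)-(?=\S)")
# Text wrapped in asterisks, e.g. *χειραψία*.
ACTION = re.compile(r"\*[^*\s][^*]*\*")

def spoken_markers(text):
    """Count spoken-style markers in one tweet (hypothetical helper)."""
    return len(TURN.findall(text)) + len(ACTION.findall(text))

example = "-γεια σου -τι γίνεται; *χειραψία*"
marker_count = spoken_markers(example)
```

On the slide's example this counts two dialogue turns and one asterisk-wrapped action.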
  7. Rarity
    Created a frequency dictionary of the dataset's words.
    Isolated the rarest words.
    Limited rare words to an upper bound of three occurrences.
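The rarity steps above can be sketched as follows, assuming lowercased whitespace tokenization and treating the three-occurrence bound as inclusive. The function names are hypothetical, and the slides do not state how the rare words found in a tweet are turned into its rarity score, so the sketch stops at counting them.

```python
from collections import Counter

MAX_OCCURRENCES = 3  # upper bound stated on the slide (inclusive here by assumption)

def build_frequency_dictionary(tweets):
    """Count how often each lowercased, whitespace-separated token occurs."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts

def rare_words(counts, max_occurrences=MAX_OCCURRENCES):
    """Words that occur at most `max_occurrences` times in the dataset."""
    return {word for word, n in counts.items() if n <= max_occurrences}

def rare_word_count(tweet, rare):
    """Number of tokens in one tweet that belong to the rare-word set."""
    return sum(1 for token in tweet.lower().split() if token in rare)

# Toy corpus with placeholder tokens, not real tweets.
corpus = ["a a a a b", "a b c"]
frequencies = build_frequency_dictionary(corpus)
rare = rare_words(frequencies)
```

Here `a` occurs five times and is excluded, while `b` and `c` fall under the bound.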
  8. Meanings: Balkanet (Wordnet)
    wn.synsets('χώρος', lang='ell')
    [Synset('expanse.n.03'), Synset('space.n.02'), Synset('precinct.n.01'), Synset('space.n.03'), Synset('vacuum.n.03'), Synset('apartment.n.01'), Synset('space.n.01'), Synset('area.n.02'), Synset('space-time.n.01'), Synset('proportion.n.02'), Synset('size.n.01'), Synset('acreage.n.01')]
    “We noticed that ironic tweets present words with more meanings (more WordNet synsets)” (Barbieri et al)
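One way to turn synset counts into a per-tweet number is sketched below. The lookup is passed in as a function so the example runs without WordNet data. With NLTK and its multilingual WordNet data installed, the lookup could be `lambda w: wn.synsets(w, lang='ell')`, matching the call on the slide. Averaging over tokens is an assumption; the slides do not say how the meaning score is aggregated.

```python
def mean_synsets(tokens, synset_lookup):
    """Average number of synsets per token (assumed aggregation).

    `synset_lookup` maps a word to a list of its synsets.
    """
    if not tokens:
        return 0.0
    return sum(len(synset_lookup(token)) for token in tokens) / len(tokens)

# Toy lookup standing in for WordNet: 'χώρος' has 12 synsets on the slide,
# and 'και' is given none purely for illustration.
toy_synsets = {"χώρος": ["synset"] * 12, "και": []}

def toy_lookup(word):
    return toy_synsets.get(word, [])

score = mean_synsets(["χώρος", "και"], toy_lookup)
```

Injecting the lookup also makes it easy to swap in a different lexical resource.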
  9. Example
    “Ο Καμμένος σαν λιοκαμένος μου φαίνεται” (English: “Kammenos looks sunburnt to me”)
    Spoken score: 0. Rarity score: 1.67. Meaning score: 0. Lexical score: 1. Emoticon score: 0.
  10. Hypothesis Evaluation
    No direct correlation with the election results.
    The percentage fluctuation shows a trend at the extremes: the 'loser' parties ND and PASOK receive the most ironic tweets, as do the 'winner' parties SYRIZA and XA.
  11. Limitations
    Average user: young, educated, mostly liberal.
    3.7% of Greece's population uses Twitter.
    Further work: stemmer, lemmatizer, grammatical conjugation, more training data, semi-supervised learning.