How to find spam
on Twitter?
Mourjo Sen
Under the guidance of
Arnaud Legout, Maksym Gabielkov
Outline
◎ Background, problem statement, workflow
◎ Definition of our metric of trust
◎ Spam detection methodology
◎ Testing our method
◎ Conclusion: Next Steps
#JeSuisCharlie
Mentioned 6,500 times per minute
3.4 million times in a day
The dark side of social media
◎ A hacker starts an online rumour about a
plane crash on Twitter
◎ The “news” goes viral and the airline’s
stock plummets
◎ The hacker makes a fortune by short-selling
the stock
Real-world influence of Twitter
◎ Political campaigns
◎ Marketing campaigns + promotions
◎ Stock markets
◎ Journalism: TV, Books, Newspapers…
◎ Customer satisfaction
◎ Awareness programs
A strong incentive to manipulate tweets
The problem
◎ No one knows if tweets can be trusted
◎ Not even Twitter itself
○ Researchers from Twitter
○ Discussion with Vigiglobe
◎ Goal: Robust, on-the-fly spam detection
The workflow
◎ Master 1 PFE ✓: Defining a metric of trust
◎ Master 2 PFE ✓: Analyzing the trust metric
◎ Master 2 Internship: Classifying tweets by using the metric
Do we need a trust metric?
◎ Twitter has manually verified ~113K users
◎ But 99.99% of users are not verified
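As a back-of-the-envelope check, the two figures together imply the rough size of the user base (treating the rounded 99.99% as exact, so the result is only an order of magnitude):

```latex
N_{\text{total}} \approx \frac{N_{\text{verified}}}{1 - 0.9999}
  = \frac{113\,000}{10^{-4}} \approx 1.13 \times 10^{9}\ \text{accounts}
```

In other words, roughly one account in 10,000 carries Twitter's verification.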
The trust score
Retweet chain
[Figure: a retweet chain, from the creator of the tweet to the retweeters]
The power of retweets
◎ Non-repudiation: Public statement of one’s
approval of the content
◎ Not duplication: Gives credit to the original
content publisher
◎ Long retweet chain = High visibility =
Affects a lot of people
◎ Public opinion: Retweets influence trends
Spam detection method: Quality of retweets
Trusted users in the retweet chain indicate
the authenticity of the tweet
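The idea of judging a tweet by the trusted users in its retweet chain can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not say how trust scores are computed or where the cutoff between trusted and untrusted lies, so `TRUST_THRESHOLD`, `min_trusted`, the `Tweet` class and both function names are hypothetical, and the trust scores in the example are made up.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: the slides do not say how a trust score is turned
# into "trusted" / "not trusted", so this value is an assumption.
TRUST_THRESHOLD = 0.5


@dataclass
class Tweet:
    tweet_id: str
    creator: str
    retweeters: list[str] = field(default_factory=list)  # the retweet chain


def count_trusted_retweeters(tweet: Tweet, trust: dict[str, float],
                             threshold: float = TRUST_THRESHOLD) -> int:
    """Count distinct users in the retweet chain whose trust score is at
    least `threshold`. Users with no known score count as untrusted."""
    return sum(1 for user in set(tweet.retweeters)
               if trust.get(user, 0.0) >= threshold)


def looks_authentic(tweet: Tweet, trust: dict[str, float],
                    min_trusted: int = 1,
                    threshold: float = TRUST_THRESHOLD) -> bool:
    """Hypothetical decision rule: a tweet is considered authentic if at
    least `min_trusted` trusted users appear in its retweet chain."""
    return count_trusted_retweeters(tweet, trust, threshold) >= min_trusted


# Example with made-up trust scores.
trust = {"alice": 0.9, "bob": 0.2, "carol": 0.7}
t1 = Tweet("1", "dave", ["alice", "carol", "bob"])
t2 = Tweet("2", "eve", ["bob", "mallory"])
```

Here `t1` has two trusted retweeters (alice and carol) and passes the rule, while `t2` is retweeted only by a low-trust and an unknown account and does not. Counting distinct users means a single account retweeting repeatedly cannot inflate the score.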
How is it robust and on-the-fly?
◎ Easy to send many tweets
◎ Difficult to change follow relationships
◎ Given a tweet, we can obtain the list of
its retweets, i.e., the retweet chain
Testing our method of spam detection
◎ No labeled test set
◎ Manual verification too slow
◎ Need other methods
1. Suspicious keywords
2. Periodic tweets
3. Content copying
Method 2: Finding periodic tweets
◎ Twitter bots often tweet periodically
◎ Difficult to detect periods in a large
collection of tweets
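The slides do not describe how periods are detected, so the following is only one simple heuristic, swapped in for illustration and not the method used in this work: group tweets by account and measure how regular the gaps between consecutive tweets are, using the coefficient of variation (standard deviation divided by mean) of those gaps. The thresholds `max_cv` and `min_tweets` and all function names are hypothetical. A bot that adds random jitter would evade this check; autocorrelation or spectral methods are common alternatives.

```python
from collections import defaultdict
from statistics import mean, pstdev


def interval_cv(timestamps: list[float]) -> float:
    """Coefficient of variation (std / mean) of the gaps between
    consecutive tweets. Values close to 0 mean nearly even spacing."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return float("inf")  # too little data to call anything periodic
    return pstdev(gaps) / mean(gaps)


def looks_periodic(timestamps: list[float], max_cv: float = 0.1,
                   min_tweets: int = 5) -> bool:
    """Hypothetical rule: enough tweets, and gaps that barely vary."""
    return len(timestamps) >= min_tweets and interval_cv(timestamps) <= max_cv


def periodic_accounts(tweets: list[tuple[str, float]], **kwargs) -> set[str]:
    """Group (user, timestamp) pairs by user and keep the periodic ones."""
    by_user: dict[str, list[float]] = defaultdict(list)
    for user, ts in tweets:
        by_user[user].append(ts)
    return {user for user, ts in by_user.items() if looks_periodic(ts, **kwargs)}
```

For example, an account tweeting exactly once per hour has identical gaps (coefficient of variation 0) and is flagged, while an account with gaps of a few minutes followed by several hours is not. Working per account keeps the cost linear in the number of tweets, apart from sorting each account's timestamps.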
Method 3: Replication of tweet content
◎ Some tweets have the same content
◎ Same content → Spam property
◎ Retweets → Non-spam property
Method 3: Replication of tweet content
All tweets in the dataset
◎ Set 1: Tweets with more retweets than copies → Non-spam set
◎ Set 2: Tweets which have not been retweeted → Spam set
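The split into a non-spam set and a spam set can be sketched as follows. Two details are assumptions not fixed by the slides: "same content" is taken to mean exactly identical text, and it is unclear whether Set 2 is restricted to tweets that have copies, so the hypothetical flag `spam_requires_copies` makes that choice explicit. The record layout `(tweet_id, text, retweet_count)` and the function name are also illustrative.

```python
from collections import Counter


def split_by_replication(tweets: list[tuple[str, str, int]],
                         spam_requires_copies: bool = False
                         ) -> tuple[set[str], set[str]]:
    """Split (tweet_id, text, retweet_count) records into
    (non_spam_ids, spam_ids).

    Set 1 / non-spam: more retweets than copies.
    Set 2 / spam:     never retweeted (optionally: and copied at least once).
    Tweets matching neither rule stay unlabeled.
    """
    # Assumption: "same content" means exactly identical text.
    occurrences = Counter(text for _, text, _ in tweets)
    non_spam: set[str] = set()
    spam: set[str] = set()
    for tweet_id, text, retweets in tweets:
        copies = occurrences[text] - 1  # other tweets with the same text
        if retweets > copies:
            non_spam.add(tweet_id)
        elif retweets == 0 and (copies > 0 or not spam_requires_copies):
            spam.add(tweet_id)
    return non_spam, spam
```

The two sets cannot overlap: a tweet with zero retweets can never have more retweets than copies. Tweets that were retweeted but copied at least as often fall into neither set, which keeps both labels conservative.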
Method 3: Replication of tweet content
[Figure: CDF of the number of trusted users in the retweet chain (log scale)]
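A CDF like the one in this figure is the empirical cumulative distribution of per-tweet counts: for each value x, the fraction of tweets whose retweet chain contains at most x trusted users. A minimal sketch (the function name is illustrative):

```python
from bisect import bisect_right


def empirical_cdf(values: list[int]) -> tuple[list[int], list[float]]:
    """Empirical CDF: for each distinct value x, the fraction of
    observations <= x."""
    data = sorted(values)
    n = len(data)
    xs = sorted(set(data))
    # bisect_right counts how many observations are <= x in the sorted data.
    return xs, [bisect_right(data, x) / n for x in xs]
```

Computing one curve per set (for instance, the spam and non-spam sets defined above) lets the two distributions be compared on the same axes; which sets the original figure plots is not stated here. On a log-scale x axis, tweets with zero trusted users cannot be drawn directly, so they would need separate handling, such as plotting x + 1.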
Method 3: Replication of tweet content
[Figure: number of copies vs. number of users in the retweet chain]
Next steps: Plan for the next two months
◎ Testing our method on other datasets
◎ Correlating our results with other methods
◎ Spam detection as a service/API
Conclusion
◎ On-the-fly spam detection
◎ Help prevent manipulation of public
opinion on Twitter
Making social networks safer and more
authentic
Thank you!