Slide 1

Detecting Fraudulent Skype Users via Machine Learning
Presentation by Kevin Markham
March 17, 2014
Based on the research paper: “Early Security Classification of Skype Users via Machine Learning”
http://research.microsoft.com/pubs/205472/aisec10-leontjeva.pdf
Paper and figures are copyright 2013 ACM

Slide 2

What is Skype?
• Tool for:
– Voice-over-IP calls
– Webcam videos
– Instant messaging
• Released in 2003; Microsoft bought it in 2011
• At least 250 million monthly users

Slide 3

Fraud on Skype
• Credit card fraud
• Online payment fraud
• Spam instant messages
• etc.

Slide 4

Detecting Fraud on Skype
Skype already employs techniques for detecting fraud:
• “Majority of fraudulent users are detected within one day”
Some challenges in fraud detection:
• Legitimate accounts get hijacked and don’t necessarily “look” fraudulent
• Sparse data

Slide 5

Improving Fraud Detection
Why is it worth improving?
• Manual fraud detection is very expensive
Who wrote this paper?
• A team from Microsoft Research
What was their goal?
• “Detect stealthy fraudulent users” that fool Skype’s existing defenses for a long period of time

Slide 6

Classification
• Classification problem: predicting whether a user is fraudulent (yes or no)
• Data consists of features (also called “variables” or “predictors”) and a response
• Contrasts with a regression problem: predicting a continuous response such as a stock price
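The distinction can be made concrete with a toy example. This is a minimal sketch using scikit-learn and synthetic data (not the paper’s dataset or model): a classifier predicts a discrete label, where a regressor would predict a number.

```python
# Minimal binary-classification sketch on synthetic data (hypothetical
# features, not the Skype dataset), using scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Features: each row is a user, each column a feature (e.g., account age, call days)
X = rng.normal(size=(100, 2))
# Response: 1 = fraudulent, 0 = legitimate (synthetic labels for illustration)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
pred = clf.predict(X[:5])
print(pred)  # discrete 0/1 class labels, not continuous values
```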

Slide 7

Data Used in the Study
• Anonymized snapshot provided by Skype
• “Does not contain information about individual calls and their contents”

Slide 8

Classification Workflow

Slide 9

Feature Type 1: Profile Information
• Gender
• Age
• Country
• OS platform
• etc.

Slide 10

Feature Type 2: Skype Product Usage
• Activity logs:
– Connected days
– Audio call days
– Video call days
– Chat days
• Data is not “rich”:
– Only indicates the number of days per month that the user performed that activity
– Does not distinguish which pair of users communicated, the number of calls per day, etc.

Slide 11

Feature Type 3: Local Social Activity
• Activity logs (graph data):
– Adding a user
– Being added by a user
– Deleting a user
– Being deleted by a user
• Number of connections in their list
• Acceptance rate of outbound friend requests

Slide 12

Feature Type 4: Global Social Activity
• “PageRank” and “local clustering coefficient” computed for each user

Slide 13

Classification Workflow
• Pre-processing is unnecessary for profile information, but necessary for the other feature types

Slide 14

Pre-processing Activity Logs
• Why?
– Activity logs are time series data
– It doesn’t make sense to use every data point as a feature
– It makes more sense to “compress” the data into a single number
• How?
– For a given feature (e.g., audio calls), build a model of what “normal” user activity looks like and another model of what fraudulent activity looks like
– Score each user based upon which model they are closer to
– This is called computing “log-likelihood ratios”
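The scoring step can be sketched in a few lines. This is a simplified illustration of the log-likelihood-ratio idea, not the paper’s actual models: here each user’s monthly activity counts are scored under a hypothetical “normal” Poisson model and a hypothetical “fraudulent” Poisson model, and the difference in log-likelihoods compresses the whole time series into one number.

```python
# Sketch of a log-likelihood ratio for activity logs. The Poisson models
# and their rates are illustrative assumptions, not the paper's models.
import math

def poisson_log_pmf(k, lam):
    # log P(K = k) for a Poisson(lam) distribution
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_likelihood_ratio(activity, lam_normal, lam_fraud):
    ll_normal = sum(poisson_log_pmf(k, lam_normal) for k in activity)
    ll_fraud = sum(poisson_log_pmf(k, lam_fraud) for k in activity)
    return ll_fraud - ll_normal  # > 0 means closer to the fraudulent model

# Audio-call days per month for one user (made-up numbers)
activity = [20, 18, 22, 19]
score = log_likelihood_ratio(activity, lam_normal=20.0, lam_fraud=2.0)
print(score)  # strongly negative: this user looks like the "normal" model
```

The single score per feature then becomes one input column for the classifier, in place of the raw time series.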

Slide 15

Computing Global Social Scores
• PageRank:
– Invented by Google
– Gives users a high score if they have many connections and if they have connections from other high-scoring users
• Local clustering coefficient:
– A measure of how well a user’s connections are connected to one another
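Both scores can be demonstrated on a tiny toy contact graph. This is a self-contained sketch (a simple power-iteration PageRank and a direct clustering-coefficient count), assuming the contact graph fits in memory as an adjacency map; real systems use graph frameworks at Skype's scale.

```python
# Toy versions of the two global social features over an undirected
# contact graph stored as {node: set of neighbors}.

def pagerank(adj, damping=0.85, iters=50):
    """Simple power-iteration PageRank."""
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Each neighbor passes along its rank, split over its degree
            incoming = sum(rank[m] / len(adj[m]) for m in adj[n])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new
    return rank

def clustering_coefficient(adj, n):
    """Fraction of pairs of n's contacts that are themselves connected."""
    nbrs = list(adj[n])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(pagerank(adj)["c"])                # highest score: the best-connected node
print(clustering_coefficient(adj, "a"))  # 1.0: all of a's contacts know each other
```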

Slide 16

Classification Workflow

Slide 17

Choosing a Classifier
• Trained several classifiers: Random Forests, support vector machines, logistic regression
• Estimated prediction accuracy using cross-validation
• Chose Random Forests because it had the best initial performance
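The model-selection step looks roughly like the following sketch in scikit-learn. The synthetic dataset stands in for the non-public Skype features; it only shows the mechanics of comparing classifiers by cross-validated accuracy.

```python
# Compare classifiers by cross-validated accuracy (synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "random forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```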

Slide 18

Rating Model Accuracy
• ROC curve: plots the “true positive” rate vs. the “false positive” rate
• An ideal classifier hugs the top left corner
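An ROC curve is built by sweeping a decision threshold over the classifier’s scores. A minimal sketch with scikit-learn, using made-up labels and scores for illustration:

```python
# Build an ROC curve from classifier scores (illustrative data).
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 0, 0, 1, 1, 1, 1]                      # 1 = fraudulent
y_score = [0.1, 0.3, 0.4, 0.8, 0.35, 0.6, 0.7, 0.9]    # model's fraud scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr, tpr)))             # curve points: (false positive rate, true positive rate)
print(roc_auc_score(y_true, y_score))  # area under the curve: 0.75 here; 1.0 is a perfect ranking
```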

Slide 19

Rating Model Accuracy (cont’d)
• The best result is obtained by using all four feature types
• At a false positive rate of 5%, the true positive rate was 68%
• The acceptable false positive rate is a business decision

Slide 20

Projected Model Effects

Slide 21

Performance on Different Fraud Types
• Fraud types are defined by Skype but are not public
• Type II is the most common, and the classifier works best on that type

Slide 22

Possible Model Improvements
• Optimize separate models for each fraud type
• Attempt to detect the points in time when accounts are hijacked
• Prevent fraudsters from evading the model

Slide 23

Other Possible Applications
• Predicting credit card fraud
• Predicting failure in data center disks
• Any environment in which user behavior can be monitored and fraudulent behavior “looks different” from normal behavior

Slide 24

Thank You!
Research paper: http://research.microsoft.com/pubs/205472/aisec10-leontjeva.pdf
My blog post: http://www.dataschool.io/detecting-fraudulent-skype-users-via-machine-learning/