Markham March 17, 2014 Based on the Research Paper: “Early Security Classification of Skype Users via Machine Learning” http://research.microsoft.com/pubs/205472/aisec10- leontjeva.pdf Paper and figures are copyright 2013 ACM
fraud detection is very expensive Who wrote this paper? • Team from Microsoft Research What was their goal? • “Detect stealthy fraudulent users” that fool Skype’s existing defenses for a long period of time
Connected days – Audio call days – Video call days – Chat days • Data is not “rich”: – Only indicates the number of days per month that the user performed that activity – Does not distinguish which pair of users communicated, number of calls per day, etc.
series data – Doesn’t make sense to use every data point as a feature – Makes more sense to “compress” the data into a single number • How? – For a given feature (e.g., audio calls), build a model of what “normal” user activity looks like and another model of what fraudulent activity looks like – For each user, score them based upon which model they are closer to – This is called computing “log-likelihood ratios”
– Give users a high score if they have many connections and if they have connections from other high-scoring users • Local clustering coefficient: – Measure of how well your connections are connected to one another