BIForum 2017 - Keystroke Analysis for Fraud Detection

Keystroke Behavioural Analysis for Fraud Detection
Valerio Maggio (@leriomaggio), Data Scientist and Researcher, Fondazione Bruno Kessler (FBK), Trento, Italy

Two Common Forms of Fraud: Account Hijacking, Card Faking

Account Hijacking: User Identification

User Identification

Keystroke Dynamics
Keystroke dynamics consists of analysing the way a user types by monitoring keyboard inputs thousands of times per second and processing this data through an algorithm, which then defines a pattern for future comparison.
Identifying an individual based on their way of typing on a physical or virtual keyboard.

Keystroke Dynamic Analysis
• Down-Down Time: time between two key presses
• Dwell Time: time between one press and one release
• Flight Time: time between one release and one press
• Up-Up Time: time between two key releases
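The four timings above can be derived from raw press and release timestamps. The following is a minimal sketch, assuming each keystroke is recorded as a (key, press time, release time) tuple in milliseconds; that record layout and the function name `timing_features` are illustrative and do not come from the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (key, press_time_ms, release_time_ms).
KeyEvent = Tuple[str, float, float]


def timing_features(events: List[KeyEvent]) -> Dict[str, List[float]]:
    """Compute the four keystroke timings for a sequence of key events."""
    # Dwell time is measured per key: release minus press of the same key.
    dwell = [up - down for _, down, up in events]
    down_down, flight, up_up = [], [], []
    # The other three timings are measured between consecutive keys.
    for (_, d1, u1), (_, d2, u2) in zip(events, events[1:]):
        down_down.append(d2 - d1)  # press to next press
        flight.append(d2 - u1)     # release to next press (negative if keys overlap)
        up_up.append(u2 - u1)      # release to next release
    return {
        "down_down": down_down,
        "dwell": dwell,
        "flight": flight,
        "up_up": up_up,
    }


# Made-up sample: three keys typed one after another.
sample = [("p", 0.0, 90.0), ("a", 150.0, 230.0), ("s", 310.0, 380.0)]
features = timing_features(sample)
```

A sequence of n keys yields n dwell times and n - 1 values for each of the other three timings.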

Keystroke Patterns Learning: State of the Art

Data Pipeline: (1) Data Collection
• Down-Down Time: time between two key presses
• Dwell Time: time between one press and one release
• Flight Time: time between one release and one press
• Up-Up Time: time between two key releases

Data Pipeline: (2) Feature Extraction
• Time between two key presses
• Time between one press and one release
• Time between one release and one press
• Time between two key releases
• Time-shifting key presses if deletions happen
• Only data leading to a successful login

Feature Analysis & Data Preparation

Feature Analysis & Data Preparation
1. Analyse Feature Distribution
2. Rank users accordingly
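The two steps can be illustrated with a small sketch. The slide does not say which statistic drives the ranking, so ordering users by the spread of their dwell times is an assumption made here for illustration, and the sample data is made up.

```python
from statistics import mean, pstdev

# Made-up dwell times (ms) per user; not data from the talk.
dwell_by_user = {
    "user_a": [90.0, 85.0, 95.0, 88.0],
    "user_b": [120.0, 60.0, 150.0, 70.0],
    "user_c": [100.0, 101.0, 99.0, 100.0],
}


def summarise(samples):
    """Step 1: describe the distribution of one feature for one user."""
    return {"mean": mean(samples), "std": pstdev(samples)}


summary = {user: summarise(samples) for user, samples in dwell_by_user.items()}

# Step 2: rank users, here from most to least consistent typing
# (smallest standard deviation first). The criterion is an assumption.
ranking = sorted(summary, key=lambda user: summary[user]["std"])
```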

Up-Up Time - Username Field - web vs app

Up-Up Time - Password Field - web vs app

Dwell Time - Username Field - web vs app

Dwell Time - Password Field - web vs app

Data Cleaning: Complexity-Invariant Distance Measure
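The slide only names the measure. As a reminder of how the complexity-invariant distance (CID) is usually defined, here is a minimal sketch: the Euclidean distance between two equal-length series is multiplied by the ratio of their complexity estimates, where complexity is the length of the series "stretched out" (the root of the summed squared successive differences). How the measure was applied during cleaning is not stated in the deck, and the handling of flat series below is a choice of this sketch.

```python
import math


def euclidean(q, c):
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def complexity(series):
    """Complexity estimate: root of summed squared successive differences."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(series, series[1:])))


def cid_distance(q, c):
    """Euclidean distance scaled by the complexity correction factor."""
    if len(q) != len(c):
        raise ValueError("series must have the same length")
    ce_q, ce_c = complexity(q), complexity(c)
    low, high = min(ce_q, ce_c), max(ce_q, ce_c)
    if high == 0:
        correction = 1.0       # both series are flat: no correction needed
    elif low == 0:
        return math.inf        # one flat, one not: treated as maximally far apart
    else:
        correction = high / low
    return euclidean(q, c) * correction


smooth = [1.0, 2.0, 3.0, 4.0]
jagged = [1.0, 4.0, 1.0, 4.0]
distance = cid_distance(smooth, jagged)
```

The correction factor is always at least 1, so two series of very different complexity end up farther apart than their Euclidean distance alone would suggest.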

Feature Scaling: Original Feature Data, MinMax Scaling, Standard Scaling
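The two scalings compared on this slide can be written out in a few lines. This is a dependency-free sketch of the arithmetic; the slide names presumably refer to scikit-learn's `MinMaxScaler` and `StandardScaler`, which apply the same formulas column by column. The sample values are made up.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


def standard_scale(values):
    """Centre on zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


dwell_ms = [80.0, 90.0, 100.0, 110.0, 120.0]  # made-up feature column
dwell_minmax = min_max_scale(dwell_ms)
dwell_standard = standard_scale(dwell_ms)
```

Min-max scaling is sensitive to a single extreme value, since that value defines the range; standard scaling is less so, but does not bound the output.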

Data Preparation: All Feature Combinations, HDF5 format
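Enumerating feature combinations is straightforward with the standard library. The sketch below assumes the combinations are taken over the four timing features introduced earlier; the slide does not spell this out, and the feature names are illustrative. Writing each resulting dataset to HDF5 is left out.

```python
from itertools import combinations

# Illustrative names for the four timing features.
FEATURES = ("down_down", "dwell", "flight", "up_up")

# Every non-empty subset of the features, from single features up to all four.
feature_sets = [
    combo
    for size in range(1, len(FEATURES) + 1)
    for combo in combinations(FEATURES, size)
]
```

Four features give 2^4 - 1 = 15 non-empty combinations.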

Data Analysis Protocol (DAP)
• Reduce the Selection Bias!!
• 80% / 20% split
• Use separately for HyperParams Search
• Don’t Mix
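A protocol of this kind can be sketched with index bookkeeping alone. The code below produces two disjoint partitions (80% and 20%) and then repeated K-fold splits inside the larger one. Which partition serves which purpose, the fold count, and the number of repeats are assumptions of this sketch rather than details given on the slide.

```python
import random


def split_indices(n_samples, holdout_fraction=0.2, seed=0):
    """Shuffle sample indices once and split them into two disjoint parts."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_holdout = int(round(n_samples * holdout_fraction))
    return indices[n_holdout:], indices[:n_holdout]


def repeated_k_fold(indices, k=5, repeats=3, seed=0):
    """Yield (training, validation) index lists for k folds, repeated with reshuffling."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            validation = folds[i]
            training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield training, validation


development, held_out = split_indices(100)
splits = list(repeated_k_fold(development, k=5, repeats=3))
```

Because the held-out indices never appear in any fold, decisions made on one partition cannot leak into scores computed on the other.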

Keystroke Patterns (Classical) Machine Learning … …

Deep Keystroke Learning
• Deep AutoEncoder (Encoder, Decoder) …
• Classification Deep Network
• One AutoEncoder + FC Network
• Outlier Detector (per user)

Deep Keystroke Learning: User Identification
• Deep AutoEncoder (Encoder, Decoder) …
• Classification Deep Network
• Classification Confusion Matrix
• Avg. Accuracy Score: 0.999090
• One AutoEncoder + FC Network
• Outlier Detector (per user)
• Avg. FPR: 0.002246
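For readers unfamiliar with the two reported metrics, the sketch below shows how an accuracy score and a false positive rate (FPR) are read off a confusion matrix. The matrix is made up, and averaging the FPR per class (a macro average) is an assumption: the slide does not say how its averages were computed.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal (rows: true class, columns: predicted)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def false_positive_rates(cm):
    """One-vs-rest FPR per class: false positives / all samples not in that class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    rates = []
    for k in range(n):
        false_positives = sum(cm[i][k] for i in range(n) if i != k)
        actual_negatives = total - sum(cm[k])
        rates.append(false_positives / actual_negatives)
    return rates


# Made-up 3-user confusion matrix; not the results from the talk.
cm = [
    [48, 2, 0],
    [1, 49, 0],
    [0, 0, 50],
]
acc = accuracy(cm)
fpr_per_user = false_positive_rates(cm)
avg_fpr = sum(fpr_per_user) / len(fpr_per_user)
```

In a fraud setting the FPR is the share of impostor attempts wrongly accepted as a given user, which is why it is reported alongside accuracy.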

Outlier Detection

Feature Importance rf.fit(X,y_DL)
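The `rf.fit(X, y_DL)` call on the slide looks like a scikit-learn random forest. A minimal sketch of that pattern follows, assuming `rf` is a `RandomForestClassifier` and `y_DL` holds class labels (the name suggests labels coming from the deep model, but the slide does not say). The data and column names here are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical column order for the four timing features.
feature_names = ["down_down", "dwell", "flight", "up_up"]

# Synthetic data: labels depend on the "dwell" column only.
X = rng.normal(size=(400, 4))
y_DL = (X[:, 1] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y_DL)

# Impurity-based importances, normalised to sum to 1, sorted high to low.
ranked = sorted(
    zip(feature_names, rf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Impurity-based importances can overstate high-cardinality or correlated features, so they are best read as a rough ranking.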

Conclusions and Takeaways
• Data Processing and Cleaning is never painless
• 80% of the time goes to Data Science Processing
• 20% is for Machine/Deep Learning Code
• 90% of which is looking for Optimum HyperParameters (esp. for Deep Learning)
• Use Unsupervised Approaches to get useful insights on the data
• Feature Scaling is paramount
• Beware of the Selection Bias (Multiple Time K-Fold CV)
• DL is not a silver bullet

Thanks a lot for your kind attention +ValerioMaggio [email protected] it.linkedin.com/in/valeriomaggio
@leriomaggio
