Learning from Biometric Fingerprints to prevent Cyber Security Threats

LEARNING FROM BIOMETRICS TO PREVENT #CyberSecurity THREATS
Valerio Maggio @leriomaggio, Data Scientist & Pythonista @ FBK, [email protected]

SORRY, WHO?
• Post Doc Researcher
• Background in CS
• Interested in Machine & Deep Learning
• Core focus on Biomedicine & Environment
• Applied Machine Learning (a.k.a. Data Science)
We're looking for students for internships & (PhD) theses here: https://mpbalab.fbk.eu

SORRY, WHO?
• Geek & Nerd
• Fellow Pythonista since 2006
This is a better me !-) (100K points if you get this pun !-)
github.com/leriomaggio

Machine Learning

BUZZWORDS

WHAT THE CLOUDS SAY

WHAT THE CLOUDS KEEP SAYING…

WHAT THE CLOUDS STILL SAY…

WHAT THE CLOUDS FINALLY SAY!
Learning from data to make predictions on future data

MACHINE LEARNING FLAVOURS

SUPERVISED SETTING
• Input data are accompanied by labels the ML model can learn from
• i.o.w. labels are the reference the model uses to estimate the expected outcomes

DIGITS CLASSIFICATION Labels are Categories

HOUSE PRICES ESTIMATION Labels are Real numbers
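A minimal sketch of the two supervised flavours, assuming NumPy only: a nearest-centroid classifier stands in for any classifier (labels are categories) and ordinary least squares stands in for any regressor (labels are real numbers). The toy data and the `size_m2` / `price` names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Classification: labels are categories (two toy classes instead of ten digits) ---
X_cls = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),   # class 0 around (0, 0)
                   rng.normal(3.0, 0.5, size=(50, 2))])  # class 1 around (3, 3)
y_cls = np.array([0] * 50 + [1] * 50)

# Nearest-centroid classifier: one mean vector per class
centroids = np.array([X_cls[y_cls == c].mean(axis=0) for c in (0, 1)])

def classify(x):
    """Return the label of the closest class centroid."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

# --- Regression: labels are real numbers (hypothetical house prices) ---
size_m2 = rng.uniform(40, 200, size=100)
price = 1500.0 * size_m2 + 20000.0 + rng.normal(0, 5000, size=100)

# Ordinary least squares fit of price = w * size + b
A = np.column_stack([size_m2, np.ones_like(size_m2)])
(w, b), *_ = np.linalg.lstsq(A, price, rcond=None)

print(classify(np.array([0.2, -0.1])), classify(np.array([2.9, 3.1])))
print(f"price(100 m2) ~ {w * 100 + b:,.0f}")
```

The only thing that changes between the two settings is the type of the label, and therefore the kind of model and loss you pick.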

UNSUPERVISED SETTING
• No labels are provided
• Learning directly from the data
• e.g. Clustering

CLUSTERING
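Clustering can be sketched with plain k-means (Lloyd's algorithm). This is an illustrative NumPy implementation on made-up 2-D data, not the clustering method used in the talk.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k random points as initial centroids
    for _ in range(n_iter):
        # Assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if its cluster is empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two unlabelled blobs: the algorithm has to discover the grouping by itself
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```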



Let’s play with all of this!

APPLIED ML IN 5 STEPS
• Collect the data
1. Look at the data & clean the data
2. Prepare the data
3. Train your model(s)
4. Predict with your best model on unseen data (namely: data NOT used in training)
5. Deploy your system in production

TWO COMMON FRAUDS Account Hijacking Card Faking

TWO COMMON FRAUDS Account Hijacking: User Identification

USER IDENTIFICATION

KEYSTROKE DYNAMICS
Identifying an individual based on the way they type on a physical or virtual keyboard.
Keystroke dynamics consists of analysing the way a user types by monitoring keyboard inputs thousands of times per second and processing this data through an algorithm, which then defines a pattern for future comparison.

KEYSTROKE DYNAMICS
• Time between two key pressures
• Time between one pressure and one release
• Time between one release and one pressure
• Time between two key releases
Intuition: users have unique ways of typing on keyboards (i.e. typing patterns)

KEYSTROKE DYNAMICS
• Down-Down Time: time between two key pressures
• Dwell Time: time between the pressure and the release of a key
• Flight Time: time between one key release and the next key pressure
• Up-Up Time: time between two key releases
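The four timings above can be computed from raw press/release timestamps. A minimal sketch, assuming each keystroke arrives as a hypothetical `(key, press_ms, release_ms)` tuple in typing order; the timings below are invented.

```python
def keystroke_features(events):
    """Per-key dwell times plus Down-Down, Flight and Up-Up times for each consecutive key pair.

    events: list of (key, press_ms, release_ms) tuples, in typing order.
    """
    dwell = [release - press for _, press, release in events]
    down_down, flight, up_up = [], [], []
    for (_, p1, r1), (_, p2, r2) in zip(events, events[1:]):
        down_down.append(p2 - p1)   # press -> next press
        flight.append(p2 - r1)      # release -> next press (negative if the keys overlap)
        up_up.append(r2 - r1)       # release -> next release
    return {"dwell": dwell, "down_down": down_down, "flight": flight, "up_up": up_up}

# Hypothetical timings (ms) for typing "abc"
events = [("a", 0, 95), ("b", 180, 260), ("c", 330, 420)]
features = keystroke_features(events)
print(features)
# {'dwell': [95, 80, 90], 'down_down': [180, 150], 'flight': [85, 70], 'up_up': [165, 160]}
```

Note that Down-Down = Dwell + Flight for each pair, so the four groups are partly redundant; a model can still benefit from seeing them all.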

LOOKING FOR ANOMALIES

DATA COLLECTION
• Features: Down-Down, Dwell, Flight and Up-Up times
• Dataset statistics: 50 different users, 450+ patterns each

STEP 1: LOOK AT THE DATA AND CLEAN THEM

UP-UP TIME - USERNAME FIELD - WEB VS APP

UP-UP TIME - PASSWORD FIELD - WEB VS APP

DWELL TIME - USERNAME FIELD - WEB VS APP

DWELL TIME - PASSWORD FIELD - WEB VS APP

DATA CLEANING Complexity-Invariant Distance Measure
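The Complexity-Invariant Distance (CID), as commonly defined, multiplies the Euclidean distance between two series by a correction factor CF = max(CE(Q), CE(C)) / min(CE(Q), CE(C)), where CE(x) = sqrt(sum of squared successive differences). A sketch in NumPy; how the talk uses CID for cleaning is not shown, so the idea of flagging patterns that sit far from a user's other patterns is only an assumption.

```python
import numpy as np

def complexity_estimate(x):
    """CE(x): sqrt of the sum of squared successive differences (how 'wiggly' the series is)."""
    return np.sqrt(np.sum(np.diff(x) ** 2))

def cid(q, c, eps=1e-12):
    """Complexity-Invariant Distance: Euclidean distance times the complexity correction factor."""
    q, c = np.asarray(q, dtype=float), np.asarray(c, dtype=float)
    ed = np.linalg.norm(q - c)
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    cf = max(ce_q, ce_c) / max(min(ce_q, ce_c), eps)  # >= 1; eps guards against flat series
    return ed * cf

# Hypothetical keystroke timing sequences (ms)
a = [100, 120, 110, 130]
b = [102, 118, 112, 128]       # similar shape and complexity
noisy = [100, 300, 20, 400]    # far more erratic: likely a corrupted pattern
print(cid(a, b), cid(a, noisy))
```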

FEATURE SCALING (NORMALISATION)
[Figure panels: Original feature data · MinMax scaling · Standard scaling]
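The two scalings in the figure, sketched with NumPy on made-up dwell/flight times. MinMax maps each feature to [0, 1]; standard scaling gives each feature zero mean and unit variance.

```python
import numpy as np

def minmax_scale(X):
    """Rescale each feature (column) to [0, 1]; assumes no column is constant."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

def standard_scale(X):
    """Shift each feature to zero mean and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical dwell / flight times in ms (one row per keystroke pair)
X = np.array([[ 95.0,  85.0],
              [ 80.0,  70.0],
              [ 90.0, 120.0],
              [110.0,  60.0]])
print(minmax_scale(X))
print(standard_scale(X))
```

In practice the min/max or mean/std should be computed on the training split only and then reused to transform the test data, otherwise information from the test set leaks into training.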

STEP 2: PREPARE THE DATA
TRAIN-TEST CUT
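A train-test cut boils down to shuffling once and holding out a fraction of the samples. A NumPy sketch for illustration; scikit-learn ships a ready-made `sklearn.model_selection.train_test_split` that does the same job.

```python
import numpy as np

def train_test_split(X, y, test_size=0.25, seed=0):
    """Shuffle the samples once, then hold out a `test_size` fraction of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: row i of X is [2i, 2i + 1] and its label is i
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(y_train, y_test)
```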

WHAT WE DO

WHAT WE REALLY DO: K-Fold Cross Validation
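K-fold cross validation splits the data into k folds and uses each fold once for validation while training on the other k-1. A NumPy sketch of the index generation; plugging in an actual model and score is left out.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is used for validation exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

for fold, (tr, va) in enumerate(k_fold_indices(10, k=5)):
    print(fold, sorted(va.tolist()))
```

The reported performance is then the average (and spread) of the k validation scores, which is less sensitive to a lucky or unlucky single cut.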

STEP 3-4: TRAIN AND TEST THE ML MODEL

DEEPKS
[Diagram: a Deep AutoEncoder (Encoder, Decoder, …) feeding a Classification Deep Network; one AutoEncoder + FC Network as Outlier Detector (per user)]

DEEPKS 1. AUTOENCODER
[Diagram: Deep AutoEncoder (Encoder, Decoder)]
Trained on genuine keystroke patterns: Unsupervised Machine (Deep) Learning
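The idea of training only on genuine patterns and flagging inputs that reconstruct badly can be illustrated without a deep network. This sketch swaps in a *linear* autoencoder fitted in closed form with an SVD (a linear autoencoder trained with squared error spans the same subspace as PCA); it is not the DEEPKS architecture, and the 95th-percentile threshold and synthetic data are assumptions.

```python
import numpy as np

class LinearAutoEncoder:
    """Linear autoencoder fitted via SVD (equivalent to PCA), used as a per-user outlier detector."""

    def __init__(self, n_components=2, quantile=0.95):
        self.n_components = n_components
        self.quantile = quantile  # hypothetical choice: ~5% of genuine training patterns get flagged

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.W_ = vt[: self.n_components].T  # encoder weights (d x k); the decoder uses W_.T
        self.threshold_ = np.quantile(self.reconstruction_error(X), self.quantile)
        return self

    def encode(self, X):
        return (X - self.mean_) @ self.W_

    def decode(self, Z):
        return Z @ self.W_.T + self.mean_

    def reconstruction_error(self, X):
        return np.mean((X - self.decode(self.encode(X))) ** 2, axis=1)

    def is_anomaly(self, X):
        return self.reconstruction_error(X) > self.threshold_

rng = np.random.default_rng(0)
# Hypothetical genuine patterns: 4 timing features driven by 2 latent factors plus small noise
latent = rng.normal(size=(300, 2))
mixing = np.array([[1.0, 0.5, 0.2, 0.8],
                   [0.3, 1.0, 0.9, 0.1]])
genuine = latent @ mixing + 0.05 * rng.normal(size=(300, 4))
ae = LinearAutoEncoder(n_components=2).fit(genuine)

impostor = np.array([[3.0, -3.0, 3.0, -3.0]])  # hypothetical pattern off the genuine subspace
print(ae.is_anomaly(impostor))
```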

DEEPKS 2. DISCRIMINATOR
[Diagram: Deep AutoEncoder (Encoder, Decoder)]
Trained on genuine & adversarial patterns
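The discriminator stage is supervised: it sees both genuine and adversarial patterns. As a stand-in for the FC network, this sketch trains plain logistic regression (a fully connected layer with no hidden units) by gradient descent; the 2-D inputs play the role of hypothetical encoded features and are synthetic.

```python
import numpy as np

def train_discriminator(X, y, lr=0.1, epochs=500):
    """Logistic regression trained with batch gradient descent on the log-loss.

    X: (n, d) features; y: (n,) labels with 1 = genuine and 0 = adversarial.
    """
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(genuine)
        grad = p - y                             # derivative of the log-loss w.r.t. the logit
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_genuine(X, w, b, threshold=0.5):
    return 1.0 / (1.0 + np.exp(-(X @ w + b))) >= threshold

rng = np.random.default_rng(0)
genuine = rng.normal(1.0, 0.5, size=(200, 2))       # hypothetical encoded genuine patterns
adversarial = rng.normal(-1.0, 0.5, size=(200, 2))  # hypothetical adversarial patterns
X = np.vstack([genuine, adversarial])
y = np.concatenate([np.ones(200), np.zeros(200)])

w, b = train_discriminator(X, y)
accuracy = (predict_genuine(X, w, b) == y.astype(bool)).mean()
print(f"training accuracy: {accuracy:.2f}")
```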

EVALUATION METRICS Confusion Matrix over ~5200 samples
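For a binary genuine/impostor decision the confusion matrix has four cells, from which the usual biometric error rates follow: FAR (impostors wrongly accepted) and FRR (genuine users wrongly rejected). A plain-Python sketch on toy labels; these are not the talk's results.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 counts with 1 = genuine (accept) and 0 = impostor (reject): [[TN, FP], [FN, TP]]."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def rates(cm):
    (tn, fp), (fn, tp) = cm
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "FAR": fp / (fp + tn),  # false acceptance rate: impostors accepted as genuine
        "FRR": fn / (fn + tp),  # false rejection rate: genuine users rejected
    }

# Toy example: 4 genuine attempts, 6 impostor attempts
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
r = rates(cm)
print(cm, r)
```

Accuracy alone can hide a poor FAR when impostor attempts are rare, which is why the full matrix is worth looking at.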

SAMPLE SIZE TEST
Q: How many patterns would I need to be confident about the accuracy of the model?
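One standard way to frame this question (not necessarily the test used in the talk) is the normal-approximation confidence interval for a proportion: an accuracy p measured on n samples has half-width z·sqrt(p(1-p)/n), so reaching a margin e needs n = z²·p(1-p)/e². A sketch with the standard library; the 0.95 accuracy below is hypothetical, while ~5200 is the sample count quoted on the previous slide.

```python
import math

def margin_of_error(accuracy, n, z=1.96):
    """Half-width of the normal-approximation confidence interval (z = 1.96 for ~95%)."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n)

def required_samples(accuracy, margin, z=1.96):
    """Smallest n whose normal-approximation interval half-width is at most `margin`."""
    return math.ceil(z ** 2 * accuracy * (1 - accuracy) / margin ** 2)

print(required_samples(0.5, 0.05))             # worst case p = 0.5, +/-5 points at ~95%
print(round(margin_of_error(0.95, 5200), 4))   # hypothetical 95% accuracy on ~5200 samples
```

The approximation is poor when p is very close to 0 or 1 or n is small; a Wilson interval is a common, more robust alternative.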

Feature Importance rf.fit(X,y_DL)
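The slide's `rf.fit(X, y_DL)` reads like a surrogate-model trick: fit a random forest on the inputs X against the labels predicted by the deep model (`y_DL`, presumably), then inspect which features the forest relies on. That interpretation is an assumption. A sketch with scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute; the feature names and the rule generating `y_DL` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["down_down", "dwell", "flight", "up_up"]  # hypothetical feature names
X = rng.normal(size=(500, 4))
# Hypothetical stand-in for the deep model's predictions: depends mostly on dwell, partly on flight
y_DL = (X[:, 1] + 0.5 * X[:, 2] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y_DL)
importances = rf.feature_importances_  # impurity-based importances, normalised to sum to 1

for name, score in sorted(zip(feature_names, importances), key=lambda t: -t[1]):
    print(f"{name:10s} {score:.3f}")
```

Impurity-based importances can be biased towards high-cardinality features; permutation importance (`sklearn.inspection.permutation_importance`) is a common cross-check.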

STEP 5: DEPLOY YOUR SOLUTION

[Architecture diagram. Components: Data Collector, Feature Extraction, Feature Detection, Feature Database, Orchestration, Model Training Service, Models Database, Model Service, Alarms Database, Dashboard, SOC. Numbered flows (1-12) carry raw data, features, labels, models, prediction requests, scores, alarms and alarm confirmation/rejection.]

[Diagram: API Engine, Feature extractor and DL Model exchanging {json} messages containing raw data, features and predictions]
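The request flow in the diagram (raw data in, features extracted, model prediction out, all as JSON) can be sketched with the standard library only. The payload field names, the scoring rule and the threshold are hypothetical placeholders for the real feature extractor and DL model.

```python
import json

THRESHOLD = 0.5  # hypothetical alarm threshold

def extract_features(raw_events):
    """Feature extractor: dwell and flight times (ms) from [key, press_ms, release_ms] triples."""
    dwell = [r - p for _, p, r in raw_events]
    flight = [p2 - r1 for (_, _, r1), (_, p2, _) in zip(raw_events, raw_events[1:])]
    return {"dwell": dwell, "flight": flight}

def dl_model_score(features):
    """Placeholder for the DL model: anomaly score in [0, 1] from a hypothetical rule."""
    mean_dwell = sum(features["dwell"]) / len(features["dwell"])
    return min(1.0, abs(mean_dwell - 90.0) / 90.0)

def handle_request(body: str) -> str:
    """API engine: JSON raw data in; JSON with raw data, features and prediction out."""
    payload = json.loads(body)
    raw = payload["events"]
    features = extract_features(raw)
    score = dl_model_score(features)
    return json.dumps({
        "user": payload.get("user"),
        "raw_data": raw,
        "features": features,
        "prediction": {"score": score, "alarm": score > THRESHOLD},
    })

request = json.dumps({"user": "alice", "events": [["a", 0, 95], ["b", 180, 260], ["c", 330, 420]]})
response = json.loads(handle_request(request))
print(response["prediction"])
```

Returning raw data, features and prediction together makes each response self-describing, so it can be stored for later retraining or for an analyst reviewing an alarm.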

SHAMELESS PLUG

pydata.it pycon.it

EUROSCIPY 2018
Fondazione Bruno Kessler | Associazione Python Italia | University of Trento
Northern Italy | Trentino Region
Tentative dates: Aug. 28 - Sept. 01, 2018
Stay updated on euroscipy.org

trento.python.it
Next Meetup: Feb. 22, 2018, 19:00 ➡ @Clab

SHAMELESS SELF PROMOTION https://github.com/leriomaggio/deep-learning-keras-tensorflow

THANK YOU! Now it’s time for cheers!
@leriomaggio [email protected]
