Anomaly Detection using Autoencoders by Naledi Modise & Angela Lai King

Pycon ZA
October 11, 2019

Finding anomalous behaviour can be like finding a needle in a haystack. Once found, it is very useful for fraud detection and for identifying other unusual behaviour. Machine learning techniques such as autoencoders can assist in this process.

We will present a Jupyter notebook, followed by a visualisation that highlights anomalous activity in an open source credit card dataset. The anomalous activity will be compared to the known fraudulent activity within the dataset. The visualisation is built in Qlik Sense, and the Python autoencoder implementation uses the H2O deep learning estimator package.

Transcript

  1. C2 General: ANOMALY DETECTION USING AUTOENCODERS
    Presented By: Naledi Modise, Angela Lai King
    PYCON 2019
    Keywords: AUTOENCODER, AI, DEEP LEARNING, ANOMALY, ML
  2. WHO ARE WE?
    ANGELA LAI KING: Data Scientist @ Telco, [email protected]
    NALEDI MODISE: Data Scientist @ Telco, [email protected]
  3. TABLE OF CONTENTS
    01 Introduction to Autoencoders: Introduction to what autoencoders are, and a brief history
    02 Uses of Autoencoders: How autoencoders are useful, and why we use them
    03 Types of Autoencoders: RBM and Overcomplete & Undercomplete Models
    04 Python Libraries: The available Python packages
    05 Jupyter Notebook: Introduction to Fraud example, and walkthrough of Jupyter Notebook using H2O
    06 QlikSense Visualisation: Showcase what anomalous behaviour looks like
  4. Neural Networks Architecture (section 01)
    Family of Neural Networks:
    - Feed Forward ANN
    - Convolutional Neural Network
    - Generative Adversarial Networks
    - Recurrent Neural Networks
    - Autoencoders
  5. AUTOENCODERS
    Diagram: Input Layer (nodes 1, 2, 3, 4) -> Encoder -> Hidden Layer (h1, h2, h3) -> Decoder -> Reconstruction Layer (nodes 1, 2, 3, 4)
    Represented as a whole by: [formula not preserved in transcript]
    OBJECTIVE: Minimise the loss function [formula not preserved in transcript]
    Trains using back propagation.
    Autoencoders have been around for decades (LeCun, 1987; Hinton and Zemel, 1994).
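The structure on this slide (4 inputs, 3 hidden units, 4 reconstructed outputs, trained with back propagation to minimise a reconstruction loss) can be sketched in plain NumPy. This is a minimal illustration, not the presenters' notebook: the toy data, the tanh activation, the mean squared error loss, the learning rate and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 rows with 4 features that lie on a
# 2-dimensional subspace, so a 3-unit bottleneck can reconstruct them.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing

n_in, n_hidden = 4, 3
W_enc = rng.normal(scale=0.1, size=(n_in, n_hidden))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_in))
b_dec = np.zeros(n_in)

def encode(x):
    """Encoder: input layer -> hidden layer."""
    return np.tanh(x @ W_enc + b_enc)

def decode(h):
    """Decoder: hidden layer -> reconstruction layer."""
    return h @ W_dec + b_dec

def loss(x):
    """Mean squared reconstruction error over all entries."""
    return float(np.mean((decode(encode(x)) - x) ** 2))

initial_loss = loss(X)
learning_rate = 0.05
for _ in range(2000):
    H = encode(X)
    R = decode(H)
    # Back propagation of the mean squared error.
    dR = 2.0 * (R - X) / X.size
    dW_dec = H.T @ dR
    db_dec = dR.sum(axis=0)
    dH = dR @ W_dec.T
    dZ = dH * (1.0 - H ** 2)          # derivative of tanh
    dW_enc = X.T @ dZ
    db_enc = dZ.sum(axis=0)
    # Plain full-batch gradient descent step.
    W_enc -= learning_rate * dW_enc
    b_enc -= learning_rate * db_enc
    W_dec -= learning_rate * dW_dec
    b_dec -= learning_rate * db_dec
final_loss = loss(X)

# Per-row reconstruction error: the quantity usually thresholded
# when an autoencoder is used for anomaly detection.
row_error = np.mean((decode(encode(X)) - X) ** 2, axis=1)
print(initial_loss, final_loss)
```

Rows that the trained network reconstructs poorly get a high `row_error`, which is what makes the reconstruction error usable as an anomaly score.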
  6. OVER COMPLETE TECHNIQUES (section 03)
    SPARSE: Involves adding a sparsity regularisation function. This regularisation function is applied on the activation functions.
    CONTRACTIVE: Similar to denoising, however, the reconstruction function can resist large noise added to the input due to the derivative of each activation function in the hidden layer being calculated.
    DENOISING: Noise is added to input variables. The model is given a corrupted input, which is the same as the original input however with noise. The reconstruction function can resist small noise to the input.
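The three techniques on this slide differ mainly in what is added to the training objective. The sketch below shows one common form of each term for a single sigmoid hidden layer. It is illustrative only: the layer sizes, the L1 form of the sparsity penalty, the Gaussian noise and the names `sparse_penalty`, `contractive_penalty` and `corrupt` are assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical over-complete layer: 4 inputs, 6 hidden units.
W = rng.normal(scale=0.5, size=(6, 4))   # shape: hidden x input
b = np.zeros(6)
x = rng.normal(size=4)

def hidden(x):
    """Hidden activations for one input vector."""
    return sigmoid(W @ x + b)

def sparse_penalty(x, lam=1e-3):
    """Sparse: L1 penalty on the hidden activations (one common choice)."""
    return lam * np.sum(np.abs(hidden(x)))

def contractive_penalty(x, lam=1e-3):
    """Contractive: squared Frobenius norm of the Jacobian dh/dx."""
    h = hidden(x)
    # For a sigmoid layer, dh_j/dx_i = h_j * (1 - h_j) * W[j, i].
    jacobian = (h * (1.0 - h))[:, None] * W
    return lam * np.sum(jacobian ** 2)

def corrupt(x, noise_std=0.1):
    """Denoising: return a noisy copy of x; the loss is still
    computed against the clean x."""
    return x + rng.normal(scale=noise_std, size=x.shape)

print(sparse_penalty(x), contractive_penalty(x), corrupt(x))
```

In each case the penalty (or the corruption step) is combined with the usual reconstruction loss during training.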
  7. FRAUD (section 05)
    noun: Wrongful or criminal deception intended to result in financial or personal gain.
  8. Examples of Fraudulent Acts (section 05)
    Sectors: Banking Sector, Telecoms, Retail
    Examples: SIM SWAP, DELIVERY ADDRESS, STOCK TAKING, ONLINE PURCHASES, CREDIT CARD, ATM, MOBILE APP
  9. Japan ATM Scam (section 05)
    SA Standard Bank estimated total loss of $19.25m (R295m).
    [Chart: Rule Based vs Emerging Fraud; x-axis: Month (m1, m2, m3); y-axis: Frequency (0 to 120)]
  10. Anomaly (section 05)
    Card_No                     Password_change_occurance
    4548 **** **** **** ****    2
    4549 **** **** **** ****    0
    4550 **** **** **** ****    0
    4551 **** **** **** ****    50
    4552 **** **** **** ****    3
    4553 **** **** **** ****    1
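The card with 50 password changes stands out by eye. As a simple baseline (not the autoencoder approach from the talk), the same judgement can be made with a modified z-score based on the median and the median absolute deviation. The 0.6745 scaling constant and the 3.5 cutoff are the conventional choices for this score; the variable names are hypothetical.

```python
import numpy as np

# Values copied from the slide's table; only the visible card prefix is used.
card_prefix = ["4548", "4549", "4550", "4551", "4552", "4553"]
password_changes = np.array([2, 0, 0, 50, 3, 1], dtype=float)

median = np.median(password_changes)
mad = np.median(np.abs(password_changes - median))   # median absolute deviation

# Modified z-score: robust to the outlier it is trying to detect.
modified_z = 0.6745 * (password_changes - median) / mad

CUTOFF = 3.5
flagged = [card for card, z in zip(card_prefix, modified_z) if abs(z) > CUTOFF]
print(flagged)  # → ['4551']
```

A single-column rule like this only works when the anomaly shows up in one feature; the autoencoder is meant for cases where it only shows up in combinations of features.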
  11. Key Takeaways from Experience
    THRESHOLD: Iteratively determine the best threshold. Set it according to what the business can handle.
    FEATURE INTERPRETABILITY: Too many features make it difficult to understand the cause of anomalous behaviour.
    FEEDBACK LOOP: Feedback from stakeholders on anomalous results is limited by their capacity.
    MAINTAINABILITY: Our H2O framework met the standards of our automated productionised workflow.
    KMEANS VS AUTOENCODER: Autoencoders showed clearer separation between anomalies.
    DATA REPRESENTATION: If there are no underlying patterns in the data, the output ends up obscuring rather than clarifying.
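The threshold takeaway can be made concrete: given a reconstruction error per record and the number of cases the business can review, pick the cut-off so that no more than that many records are flagged. This is a sketch under assumptions; the function name `threshold_for_capacity` and the example error values are hypothetical and not from the talk.

```python
import numpy as np

def threshold_for_capacity(errors, capacity):
    """Return a threshold such that at most `capacity` errors are
    strictly greater than it."""
    ordered = np.sort(np.asarray(errors, dtype=float))[::-1]  # descending
    if capacity >= ordered.size:
        return -np.inf            # everything can be reviewed
    return float(ordered[max(capacity, 0)])

# Hypothetical reconstruction errors for eight records.
errors = np.array([0.02, 0.03, 0.01, 0.90, 0.04, 0.45, 0.02, 0.05])

# Iterate over candidate review capacities to see the trade-off.
for capacity in (1, 2, 4):
    threshold = threshold_for_capacity(errors, capacity)
    flagged = np.flatnonzero(errors > threshold)
    print(capacity, threshold, flagged.tolist())
```

Raising the capacity lowers the threshold and surfaces more borderline records, which is the iteration the slide describes.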
  12. “Concerned parent: If all your friends jumped off a bridge, would you follow them? Machine Learning algorithm: Yes.” (@computerfacts)
  13. RESOURCES
    PAPERS:
    - LeCun, 1987; Bourlard and Kamp, 1988; Hinton and Zemel, 1994
    - Hinton, 2006 (Science paper)
    WEBSITES:
    - https://towardsdatascience.com/deep-inside-autoencoders-7e41f319999f
    - https://www.bbc.com/news/world-asia-36357182