Anomaly Detection using Autoencoders by Naledi Modise & Angela Lai King

Pycon ZA
October 11, 2019

Finding anomalous behaviour can be like finding a needle in a haystack, yet spotting it is very useful for fraud detection and for identifying unusual behaviour. Machine learning techniques such as autoencoders can assist in this process.

We will present a Jupyter notebook followed by a visualisation that indicates anomalous activity in an open-source credit card dataset. The anomalous activity will be compared to known fraudulent activity within the dataset. The visualisation is built in QlikSense, and the Python implementation of autoencoders uses the H2O deep learning estimator.


Transcript

  1. C2 General ANOMALY DETECTION USING AUTOENCODERS

    Presented By: Naledi Modise, Angela Lai King
    PYCON 2019
    Keywords: AUTOENCODER, AI, DEEP LEARNING, ANOMALY, ML
  2. C2 General WHO ARE WE?

    NALEDI MODISE: Data Scientist @ Telco, Naledz@gmail.com
    ANGELA LAI KING: Data Scientist @ Telco, Angela12682@gmail.com
  3. C2 General TABLE OF CONTENTS

    01 Introduction to Autoencoders: Introduction to what autoencoders are, and a brief history
    02 Uses of Autoencoders: How autoencoders are useful, and why we use them
    03 Types of Autoencoders: RBM and Overcomplete & Undercomplete Models
    04 Python Libraries: The available Python packages
    05 Jupyter Notebook: Introduction to Fraud example, and walkthrough of Jupyter Notebook using H2O
    06 QlikSense Visualisation: Showcase what anomalous behaviour looks like
  4. C2 General Neural Networks Architecture 01

    Family of Neural Networks: Feed Forward ANN, Convolutional Neural Network, Generative Adversarial Networks, Recurrent Neural Networks, Autoencoders
  5. C2 General INTRODUCTION TO AUTOENCODERS 01

  6. C2 General AUTOENCODERS

    Represented as a whole by [formula not preserved]
    OBJECTIVE: Minimise the loss function [formula not preserved]
    Trains using back propagation
    [Diagram: Input Layer (nodes 1 to 4) → Encoder → Hidden Layer (h1, h2, h3) → Decoder → Reconstruction Layer (nodes 1 to 4)]
    Autoencoders have been around for decades (LeCun, 1987; Hinton and Zemel, 1994)
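The structure on this slide (encoder, hidden layer, decoder, and a reconstruction loss minimised by backpropagation) can be sketched in plain Python. This is a minimal illustration and not the presenters' H2O notebook code: the 4-3-4 layer sizes follow the diagram, while the tanh activation, the mean squared error loss, the learning rate, and the toy data are assumptions made for the example.

```python
import math
import random


def forward(x, params):
    """Encode x into the hidden layer h, then decode h into a reconstruction r."""
    w_enc, b_enc, w_dec, b_dec = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_enc, b_enc)]
    r = [sum(w * hj for w, hj in zip(row, h)) + b
         for row, b in zip(w_dec, b_dec)]
    return h, r


def reconstruction_error(x, params):
    """Mean squared error between the input and its reconstruction."""
    _, r = forward(x, params)
    return sum((ri - xi) ** 2 for ri, xi in zip(r, x)) / len(x)


def train_autoencoder(data, n_hidden=3, epochs=300, lr=0.05, seed=0):
    """Train a one-hidden-layer autoencoder with per-sample gradient descent."""
    rng = random.Random(seed)
    n_in = len(data[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_enc = [0.0] * n_hidden
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b_dec = [0.0] * n_in
    params = (w_enc, b_enc, w_dec, b_dec)

    for _ in range(epochs):
        for x in data:
            h, r = forward(x, params)
            # Gradient of the mean squared error with respect to the output r.
            d_r = [2.0 * (ri - xi) / n_in for ri, xi in zip(r, x)]
            # Backpropagate through the decoder and the tanh activation.
            d_h = [sum(d_r[i] * w_dec[i][j] for i in range(n_in)) * (1.0 - h[j] ** 2)
                   for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    w_dec[i][j] -= lr * d_r[i] * h[j]
                b_dec[i] -= lr * d_r[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    w_enc[j][k] -= lr * d_h[j] * x[k]
                b_enc[j] -= lr * d_h[j]
    return params
```

After training on records that represent normal behaviour, `reconstruction_error` gives a per-record score; records the network reconstructs poorly are the candidates for anomalies.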
  7. C2 General Uses of Autoencoders DIMENSIONALITY REDUCTION 02

  8. C2 General Uses of Autoencoders DIMENSIONALITY REDUCTION DENOISING DATA 02

  9. C2 General Uses of Autoencoders DIMENSIONALITY REDUCTION ANOMALY DETECTION DENOISING DATA 02

  10. C2 General Uses of Autoencoders DIMENSIONALITY REDUCTION ANOMALY DETECTION DENOISING DATA FEATURE EXTRACTION 02
  11. C2 General Restricted Boltzmann Machine 03

    Simplest Autoencoder. (Hinton 2006) Science Paper
  12. C2 General UNDER COMPLETE vs OVER COMPLETE 03

  13. C2 General OVER COMPLETE TECHNIQUES 03

    SPARSE: Involves adding a sparsity regularisation function. This regularisation function is applied on the activation functions.
    CONTRACTIVE: Similar to denoising; however, the reconstruction function can resist large noise added to the input, because the derivative of each activation function in the hidden layer is calculated.
    DENOISING: Noise is added to the input variables, so the corrupted input is the same as the original input, however with noise. The reconstruction function can resist small noise added to the input.
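The three techniques on this slide are usually written as variations of one reconstruction objective. The formulas below are the common textbook forms, given here as a sketch; the slide's own notation was not preserved, so every symbol is an assumption: $f$ is the encoder, $g$ the decoder, $h = f(x)$ the hidden activations, $L$ the reconstruction loss, $\lambda$ a regularisation weight, and $\tilde{x}$ the input $x$ with noise $\varepsilon$ added.

```latex
\begin{align*}
\text{Sparse:}      \quad & \min_{f,g}\; L\bigl(x,\, g(f(x))\bigr) + \lambda \sum_j \lvert h_j \rvert \\
\text{Contractive:} \quad & \min_{f,g}\; L\bigl(x,\, g(f(x))\bigr) + \lambda \left\lVert \frac{\partial f(x)}{\partial x} \right\rVert_F^2 \\
\text{Denoising:}   \quad & \min_{f,g}\; L\bigl(x,\, g(f(\tilde{x}))\bigr), \qquad \tilde{x} = x + \varepsilon
\end{align*}
```

In each case the extra term or the corrupted input stops an overcomplete network from simply copying its input to its output.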
  14. C2 General PYTHON LIBRARIES 04

  15. C2 General FRAUD (noun) 05

    Wrongful or criminal deception intended to result in financial or personal gain.
  16. C2 General Examples of Fraudulent Acts 05

    Sectors: Banking Sector, Telecoms, Retail
    Examples: SIM SWAP, DELIVERY ADDRESS, STOCK TAKING, ONLINE PURCHASES, CREDIT CARD, ATM, MOBILE APP
  17. C2 General Japan ATM Scam 05

    SA Standard Bank estimated total loss of $19.25m (R295m)
    [Chart: Rule Based vs Emerging Fraud; x-axis: Month (m1, m2, m3); y-axis: Frequency]
  18. C2 General VENN DIAGRAM 05 ANOMALOUS FRAUDULENT
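The Venn diagram's three regions (anomalous only, fraudulent only, and both) can be computed with set operations once a model has flagged some records. The sketch below is illustrative only: the function name `overlap_report` and the record IDs in the usage note are hypothetical, and precision and recall are added as one common way to summarise the overlap, not something the slide specifies.

```python
def overlap_report(flagged_ids, fraud_ids):
    """Compare records flagged as anomalous with records known to be fraudulent."""
    flagged, fraud = set(flagged_ids), set(fraud_ids)
    both = flagged & fraud
    return {
        "anomalous_and_fraudulent": both,
        "anomalous_only": flagged - fraud,
        "fraudulent_only": fraud - flagged,
        # Share of flagged records that are known fraud.
        "precision": len(both) / len(flagged) if flagged else 0.0,
        # Share of known fraud that was flagged.
        "recall": len(both) / len(fraud) if fraud else 0.0,
    }
```

For example, `overlap_report([1, 2, 3, 4], [3, 4, 5])` reports records 3 and 4 in the intersection, records 1 and 2 as anomalous but not fraudulent, and record 5 as fraud that was not flagged.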

  19. C2 General Anomaly 05

    Card_No                     Password_change_occurance
    4548 **** **** **** ****    2
    4549 **** **** **** ****    0
    4550 **** **** **** ****    0
    4551 **** **** **** ****    50
    4552 **** **** **** ****    3
    4553 **** **** **** ****    1
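In this table the card starting 4551, with 50 password changes, stands out against counts of 0 to 3. A simple univariate check makes that concrete. Note that this is a swapped-in technique for illustration, not the autoencoder approach of the talk: it uses the modified z-score based on the median absolute deviation, and the 0.6745 constant and 3.5 cutoff are conventional defaults assumed here. The function name `mad_outliers` is hypothetical.

```python
from statistics import median


def mad_outliers(counts, cutoff=3.5):
    """Return keys whose value has a modified z-score above the cutoff."""
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread at all: anything different from the median stands out.
        return [k for k, v in counts.items() if v != med]
    return [k for k, v in counts.items()
            if abs(0.6745 * (v - med) / mad) > cutoff]


password_changes = {"4548": 2, "4549": 0, "4550": 0, "4551": 50, "4552": 3, "4553": 1}
print(mad_outliers(password_changes))  # ['4551']
```

A single-column rule like this works when one feature tells the whole story; an autoencoder is aimed at cases where the anomaly only shows up in the combination of many features.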
  20. C2 General KAGGLE DATASET 05

  21. C2 General JUPYTER NOTEBOOK

  22. C2 General JUPYTER NOTEBOOK

  23. C2 General JUPYTER NOTEBOOK

  24. C2 General JUPYTER NOTEBOOK Anonymised

  25. C2 General JUPYTER NOTEBOOK

  26. C2 General JUPYTER NOTEBOOK

  27. C2 General JUPYTER NOTEBOOK https://www.kaggle.com/mlg-ulb/creditcardfraud

  28. C2 General Recap 06

  29. C2 General QlikSense Showcase anomalous behaviour through QlikSense. 06

  30. C2 General Key Takeaways from Experience

    THRESHOLD: Iteratively determine the best threshold. Set it according to what the business can handle.
    FEATURE INTERPRETABILITY: Too many features make it difficult to understand the cause of anomalous behaviour.
    FEEDBACK LOOP: Feedback from stakeholders on anomalous results is limited by their capacity.
    MAINTAINABILITY: Our H2O framework met the standards of our automated productionised workflow.
    KMEANS VS AUTOENCODER: Autoencoders showed clearer separation between anomalies.
    DATA REPRESENTATION: If there are no underlying patterns, the output ends up obscuring the data rather than clarifying it.
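The threshold takeaway can be sketched as two small helpers: one that sweeps candidate thresholds to see how many records each would flag, and one that derives a threshold from the number of cases the business can review. This is an assumption-laden illustration, not the presenters' procedure: the function names are hypothetical, and `errors` stands for per-record reconstruction errors from any trained autoencoder.

```python
def sweep_thresholds(errors, thresholds):
    """For each candidate threshold, count the records that would be flagged."""
    return [(t, sum(1 for e in errors if e > t)) for t in thresholds]


def threshold_for_capacity(errors, capacity):
    """Return a threshold that flags at most `capacity` records (those strictly above it)."""
    capacity = max(capacity, 0)
    ranked = sorted(errors, reverse=True)
    if capacity >= len(ranked):
        return float("-inf")  # every record can be reviewed
    # Flagging strictly above the (capacity + 1)-th largest error never exceeds capacity,
    # although ties at the cut-off can leave fewer than `capacity` records flagged.
    return ranked[capacity]


def flag(errors, threshold):
    """Return the positions of records whose error is above the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]
```

With errors `[0.01, 0.02, 0.90, 0.03, 0.45, 0.02]` and a review capacity of 2, the threshold comes out at 0.03 and the records at positions 2 and 4 are flagged. Rerunning the sweep as stakeholder feedback arrives is one way to iterate on the threshold.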
  31. C2 General —@computerfacts “Concerned parent: If all your friends jumped

    off a bridge, would you follow them? Machine Learning algorithm: Yes.”
  32. C2 General THANKS FOR LISTENING!

    Does anyone have any questions?
  33. C2 General RESOURCES

    PAPERS:
    • (LeCun, 1987; Bourlard and Kamp, 1988; Hinton and Zemel, 1994)
    • (Hinton 2006) Science Paper
    WEBSITES:
    • https://towardsdatascience.com/deep-inside-autoencoders-7e41f319999f
    • https://www.bbc.com/news/world-asia-36357182
  34. C2 General CREDITS • Presentation template by Slidesgo • Icons

    by Flaticon