Time Series Anomaly Detection [Learning Lab 18]

Matt Dancho
September 11, 2019

Anomalies in time series can signal critical events, and detecting them can improve forecasting results substantially. In this lab, you'll learn how to implement scalable time series anomaly detection with the anomalize R package.


Transcript

  1. Anomaly Detection For Time Series
    Matt Dancho & David Curry, Business Science
    Learning Lab. Difficulty: Intermediate
  2. Success Story: Masatake Hirono
    - Took 201
    - Completed the 10-Week Course
    - Landed a Job at one of the most Prestigious Management Consulting Firms
    “This course showed me how to place data analytics in real business settings.”
    #Business Science Success
  3. Agenda
    • Business Case Study
      ◦ Google Analytics
      ◦ Web Traffic
      ◦ Anomalies
    • Anomalies
      ◦ 3 Types
      ◦ Algorithms
    • Time Series Anomaly Detection
      ◦ Anomalize Software
      ◦ 80/20 Concepts
      ◦ 3 Key Functions
      ◦ Game Plan
    • 30-Min Demo
      ◦ Web traffic
      ◦ data.table
      ◦ anomalize
    • Pro-Tips:
      ◦ Tactics to Improve Forecasts
  4. Learning Labs PRO: Continuous Learning, Jet Fuel for your Brain
    Every 2 Weeks, 1-Hour Course, Recordings + Code + Slack, $19/month, university.business-science.io
    • Lab 17: Anomaly Detection with H2O Machine Learning
    • Lab 16: R’s Optimization Toolchain, Part 2 - Nonlinear Programming
    • Lab 15: R’s Optimization Toolchain, Part 1 - Linear Programming
    • Lab 14: Customer Churn Survival Analysis
    • Lab 13: Wrangling 4.6M Rows of Financial Data w/ data.table
    • Lab 12: How I built anomalize
  5. Anomalous Web Traffic: Revenue & Web Traffic
    Consumers spend billions online every year. Web-based companies can use web traffic to forecast cash flow.
    Anomalies:
    • Can flag important events
    • Can impact the forecast accuracy
  6. Anomalous Web Traffic: Key Issues
    1. 1000’s of web pages
    2. Scaling the data preparation (cleaning of anomalies) for forecasting
    3. Linking events to anomalous data
  7. Types of Anomalies
    1. Point Anomalies (Single Point)
    2. Contextual (Time Series)
    3. Collective (Cluster of Points)
    Algorithms shown on the slide: H2O Isolation Forest, H2O K-Means (each listed twice) and ???
  9. Time Series Anomaly Detection
    Anomalize enables a simple workflow for scalable anomaly detection for time series
    https://business-science.github.io/anomalize/
  10. 1. STL Method
    Algorithm Internal Process
    • Uses STL Decomposition to decompose time series into seasonal, trend & remainder
    • The key is the remainders (residuals)
    • Uses IQR or GESD to detect anomalies
    Key Concept: Outliers have abnormal residuals (remainders)
    Observed - Seasonal Component - Trend Component = Remaining Component (remainder)
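The key concept on the STL slide (an outlier shows up as an abnormal remainder once the seasonal and trend parts are subtracted) can be sketched in a few lines. This is not STL and not the anomalize implementation: as a simplifying assumption it swaps in a classical additive decomposition (moving-average trend, per-position seasonal means) in place of LOESS, and the function name `decompose` and the page-view numbers are hypothetical.

```python
from statistics import mean

def decompose(observed, period):
    """Additive decomposition sketch (classical, NOT LOESS-based STL)."""
    n = len(observed)
    half = period // 2
    # Trend: centered moving average, with the window clipped at the series edges.
    trend = [mean(observed[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]
    detrended = [o - t for o, t in zip(observed, trend)]
    # Seasonal: average detrended value at each position within the cycle.
    cycle = [mean(detrended[p::period]) for p in range(period)]
    seasonal = [cycle[i % period] for i in range(n)]
    # Remainder: what is left after removing seasonal and trend from observed.
    remainder = [o - s - t for o, s, t in zip(observed, seasonal, trend)]
    return seasonal, trend, remainder

# Hypothetical daily page views: a weekly pattern with one injected spike.
views = [10, 12, 14, 13, 11, 6, 5] * 6
views[20] += 50
seasonal, trend, remainder = decompose(views, period=7)

# The spike is the point with the largest absolute remainder.
largest = max(range(len(views)), key=lambda i: abs(remainder[i]))
```

Because the trend and seasonal averages here are plain means, the spike leaks slightly into both components; a more robust smoother reduces that leakage, which is one reason to prefer a real STL implementation over this sketch.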
  11. 2. Twitter Method
    Algorithm Internal Process
    • Uses Piecewise Medians Decomposition to decompose time series into seasonal, trend & remainder
    • The key is the remainders (residuals)
    • Uses IQR or GESD to detect anomalies
    Key Concept: Only difference is using Piecewise Medians vs LOESS Trend
    Observed - Seasonal Component - Median Component = Remaining Component (remainder)
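To illustrate the piecewise-median idea from the Twitter slide, here is a minimal sketch that replaces a smooth trend with the median of each consecutive block of points. The function name `piecewise_median_trend`, the `span` parameter, and the sample values are assumptions for illustration; the slides do not say how the package chooses spans or whether the median is taken before or after removing seasonality.

```python
from statistics import mean, median

def piecewise_median_trend(values, span):
    """Step-shaped trend: the median of each consecutive block of `span` points."""
    trend = []
    for start in range(0, len(values), span):
        block = values[start:start + span]
        # Every point in the block shares the block's median as its trend level.
        trend.extend([median(block)] * len(block))
    return trend

# Hypothetical series with one extreme value in the first block.
values = [1, 2, 3, 100, 5, 6, 7, 8, 9, 10]
median_trend = piecewise_median_trend(values, span=5)

# For comparison: a block mean is pulled toward the spike, the block median is not.
first_block_mean = mean(values[:5])
```

The design point is robustness: a single extreme value barely moves a block median, so the anomaly stays in the remainder instead of being absorbed into the trend.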
  12. Implementation: 3-Step Process
    1. time_decompose(): Uses STL or Twitter to decompose time series into seasonal, trend & remainder
    2. anomalize(): Uses IQR or GESD to detect anomalies
    3. time_recompose(): Calculates outlier boundaries
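Steps 2 and 3 of the process above can be sketched outside of R as follows. This is a conceptual illustration, not the anomalize API: the Python function names `anomalize_iqr` and `recompose`, the IQR multiplier `k=3.0`, and the small hand-written components are all assumptions. It takes already-decomposed seasonal, trend and remainder values as input, flags remainders outside IQR-based limits, then adds the seasonal and trend parts back to those limits to get boundaries on the scale of the observed data.

```python
from statistics import quantiles

def anomalize_iqr(remainder, k=3.0):
    """Flag remainders outside [Q1 - k*IQR, Q3 + k*IQR]; k is an assumed multiplier."""
    q1, _, q3 = quantiles(remainder, n=4)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    flags = [not (lower <= r <= upper) for r in remainder]
    return lower, upper, flags

def recompose(seasonal, trend, lower, upper):
    """Shift the remainder limits back onto the observed scale."""
    low_band = [s + t + lower for s, t in zip(seasonal, trend)]
    high_band = [s + t + upper for s, t in zip(seasonal, trend)]
    return low_band, high_band

# Hypothetical decomposition output (12 points, one abnormal remainder at index 6).
seasonal = [1.0, -1.0] * 6
trend = [10 + 0.5 * i for i in range(12)]
remainder = [0.2, -0.1, 0.3, -0.3, 0.1, -0.2, 9.0, 0.0, 0.2, -0.1, 0.1, -0.2]
observed = [s + t + r for s, t, r in zip(seasonal, trend, remainder)]

lower, upper, flags = anomalize_iqr(remainder)
low_band, high_band = recompose(seasonal, trend, lower, upper)
anomalous_points = [i for i, flagged in enumerate(flags) if flagged]
```

Plotting `observed` against `low_band` and `high_band` gives the familiar picture of a band that follows the seasonal and trend shape, with anomalies falling outside it.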
  13. Web Forecasting Workflow (Start to Finish)
    1. Exploratory Data Analysis: data.table & ggplot2
    2. Anomaly Detection, Data Cleaning: anomalize
    3. Forecast Web Traffic: parsnip, purrr & ggplot2
  14. Pro Tip: Clean Your Anomalies
    Option 1: Flag Anomalies. Just add the “Anomaly (Y/N)” as a Flag in your model
    • Pro: Predicts well when future has anomalies that are similar to past anomalies
    • Con: May reduce forecasting accuracy
    Option 2: Clean Anomalies. Replace Anomaly Values with Trend + Seasonal Components
    • Pro: Improves Forecasting Performance
    • Con: Doesn’t predict well when future has anomalies
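The two options on this slide amount to two small transformations of the data before modeling. The sketch below is illustrative only; the function names `flag_feature` and `clean_anomalies` and the sample numbers are hypothetical, and it assumes the seasonal, trend and anomaly flags have already been computed by an earlier detection step.

```python
def flag_feature(flags):
    """Option 1: keep the data as-is and expose anomalies as a 0/1 model feature."""
    return [1 if flagged else 0 for flagged in flags]

def clean_anomalies(observed, seasonal, trend, flags):
    """Option 2: replace each flagged value with trend + seasonal; keep the rest."""
    return [s + t if flagged else o
            for o, s, t, flagged in zip(observed, seasonal, trend, flags)]

# Hypothetical inputs: one flagged spike at index 2.
observed = [11.0, 9.5, 40.0, 10.5]
seasonal = [1.0, -1.0, 1.0, -1.0]
trend = [10.0, 10.5, 11.0, 11.5]
flags = [False, False, True, False]

anomaly_flag = flag_feature(flags)
cleaned = clean_anomalies(observed, seasonal, trend, flags)
```

Option 1 leaves the target untouched and lets the model learn the effect of an anomaly, while Option 2 changes the training target itself, which matches the trade-offs listed on the slide.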
  15. Pro Tip: Clean Your Anomalies
    Option 1: Flag Anomalies. Just add the “Anomaly (Y/N)” as a Flag in your model
    Option 2: Clean Anomalies. Replace Anomaly Values with Trend + Seasonal Components
    51% Improvement
  16. Web Forecasting Step-By-Step (Start to Finish)
    1. Exploratory Data Analysis: data.table & ggplot2
    2. Anomaly Detection, Data Cleaning: anomalize
    3. Forecast Web Traffic: parsnip, purrr & ggplot2
    Courses and labs referenced on the slide: 101 & Lab 13; 101 & 201; Lab 18
  17. Business Science University R-Track: 3-Course R-Track System, Project-Based Courses with Business Application
    • Business Analysis with R (DS4B 101-R): Data Science Foundations, 7 Weeks. Visualization, Data Cleaning & Manipulation, Functional Programming & Modeling, Business Reporting
    • Data Science For Business with R (DS4B 201-R): Machine Learning & Business Consulting, 10 Weeks. Advanced Visualization, Advanced Data Wrangling, Advanced Functional Programming & Modeling, Advanced Data Science
    • R Shiny Web Apps For Business (DS4B 102-R): Web Application Development, 4 Weeks. Web Apps
  18. Key Benefits: Business Analysis with R (DS4B 101-R), Data Science Foundations, 7 Weeks
    - Fundamentals - Weeks 1-5 (25 hours of Video Lessons)
      - Data Manipulation (dplyr)
      - Time series (lubridate)
      - Text (stringr)
      - Categorical (forcats)
      - Visualization (ggplot2)
      - Programming & Iteration (purrr)
      - 3 Challenges
    - Machine Learning - Week 6 (8 hours of Video Lessons)
      - Clustering (3 hours)
      - Regression (5 hours)
      - 2 Challenges
    - Learn Business Reporting - Week 7
      - RMarkdown & plotly
      - 2 Project Reports: 1. Product Pricing Algo 2. Customer Segmentation
    Visualization, Data Cleaning & Manipulation, Functional Programming & Modeling, Business Reporting
  19. Key Benefits: Data Science For Business (DS4B 201-R), Machine Learning & Business Consulting, 10 Weeks, End-to-End Churn Project
    - Understanding the Problem & Preparing Data - Weeks 1-4
      - Project Setup & Framework
      - Business Understanding / Sizing Problem
      - Tidy Evaluation - rlang
      - EDA - Exploring Data - GGally, skimr
      - Data Preparation - recipes
      - Correlation Analysis
      - 3 Challenges
    - Machine Learning - Weeks 5, 6, 7
      - H2O AutoML - Modeling Churn
      - ML Performance
      - LIME Feature Explanation
    - Return-On-Investment - Weeks 7, 8, 9
      - Expected Value Framework
      - Threshold Optimization
      - Sensitivity Analysis
      - Recommendation Algorithm
    Advanced Visualization, Advanced Data Wrangling, Advanced Functional Programming & Modeling, Advanced Data Science
  20. Key Benefits: Shiny Apps for Business (DS4B 102-R), Web Application Development, 4 Weeks
    - Learn Shiny & Flexdashboard
      - Build Applications
      - Learn Reactive Programming
      - Integrate Machine Learning
    - App #1: Predictive Pricing App
      - Model Product Portfolio
      - XGBoost Pricing Prediction
      - Generate new products instantly
    - App #2: Sales Dashboard with Demand Forecasting
      - Model Demand History
      - Segment Forecasts by Product & Customer
      - XGBoost Time Series Forecast
      - Generate new forecasts instantly
    Web Apps, Machine Learning
  21. Success Story: Masatake Hirono
    - Took DS4B 201-R
    - Completed the 10-Week Course
    - Landed a Job at one of the most Prestigious Management Consulting Firms
    “This course showed me how to place data analytics in real business settings.”
    #Business Science Success