Slide 1

Slide 1 text

How I Built Anomalize
Programming Case Study
Matt Dancho & David Curry
Business Science Learning Lab
Special Topics: tidyverse and rlang (tidy eval)

Slide 2

Slide 2 text

Learning Lab Structure
● Presentation (20 min)
● Demos (15 min)
● Presentation (20 min)

Slide 3

Slide 3 text

Success Story
Bryan Clark
- Data Scientist at H&M
- Took DS4B 201-R
- Building Apps that integrate R Shiny
- Analyzed Risk Model of…
- Wedding!!! (Congrats!)
“This is legit a milestone in my development.”
#BusinessScienceSuccess

Slide 4

Slide 4 text

Shiny Web App Course Is Open!!!
PROMO CODE: LEARNINGLABS (15% OFF)

Slide 5

Slide 5 text

Agenda
● Case Study
  ○ Time Series Anomaly Detection
  ○ Business Value
● The Solution
  ○ Anomalize
  ○ 3 min Demo
● Functional Programming
  ○ How I made anomalize
  ○ Tools in my Toolkit
● R Demo
  ○ rlang in 15 minutes
● Becoming a Complete Data Scientist
  ○ Learn the skills that transform you & your organization

Slide 6

Slide 6 text

Business Case Study Detecting Time Series Anomalies

Slide 7

Slide 7 text

Web Traffic Anomaly Detection
Business Objectives
● Understand when web traffic was abnormally changing
● Identify cause-effect relationships
● Thousands of time series

Slide 8

Slide 8 text

Step 1: What Software Existed?
Twitter AnomalyDetection
Pros:
● Open Source R Package
● Innovative method for detecting anomalies
● Piecewise Median for trend removal
● GESD for outlier detection
Cons:
● Not scalable to many time series
● User interface was non-tidy
● Growth (trend) presented a challenge
https://github.com/twitter/AnomalyDetection

Slide 9

Slide 9 text

Step 2: Improve on Twitter’s Algorithm
R Package Goals
● Scalability to many time series
● Simple user interface (tidyverse-compliant)
● Handle growth (trend) in time series

Slide 10

Slide 10 text

The Solution anomalize

Slide 11

Slide 11 text

Anomalize: Tidy anomaly detection
https://business-science.github.io/anomalize/

Slide 12

Slide 12 text

Scalable
https://www.kaggle.com/c/web-traffic-time-series-forecasting

Slide 13

Slide 13 text

Good at detecting anomalies
GOT Season 6: Kit Harington’s views on Wikipedia spike
Game of Thrones (season 6), from Wikipedia, the free encyclopedia: "The sixth season of the fantasy drama television series Game of Thrones premiered on HBO on April 24, 2016, and concluded on June 26, 2016."
https://en.wikipedia.org/wiki/Game_of_Thrones_(season_6)

Slide 14

Slide 14 text

Many uses
Malicious Behavior: Is someone (or some organization) attacking our organization?
https://www.business-science.io/business/2018/06/10/infosec-anomalize-threat-hunting.html

Slide 15

Slide 15 text

Quick Demo (3 min) anomalize

Slide 16

Slide 16 text

Functional Programming How I did it

Slide 17

Slide 17 text

3 Simple Steps
1. Remove Trend & Seasonality
2. Analyze residuals & detect outliers
3. Recompose for visualization
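
The three steps above can be sketched with the anomalize package itself. This is a minimal sketch, not the code shown in the talk: it assumes the `tidyverse_cran_downloads` example dataset that ships with anomalize (columns `date`, `count`, `package`) and picks the `lubridate` series purely for illustration.

```r
library(dplyr)
library(anomalize)

# Assumption: use one series from the package's bundled example data.
lubridate_downloads <- tidyverse_cran_downloads %>%
  filter(package == "lubridate") %>%
  ungroup()

lubridate_anomalies <- lubridate_downloads %>%
  time_decompose(count) %>%   # Step 1: split into season, trend, remainder
  anomalize(remainder) %>%    # Step 2: flag outliers in the remainder
  time_recompose()            # Step 3: map the limits back to the observed scale

# Visualize the observed series with the recomposed anomaly bands.
lubridate_anomalies %>%
  plot_anomalies(time_recomposed = TRUE)
```

Each step is a separate function that takes and returns a data frame, which is what lets the steps chain together with the pipe.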

Slide 18

Slide 18 text

Simple Scalability
Scaling is the same process. Just replace filter() with group_by().
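
As a rough illustration of that swap, the sketch below runs the same pipeline over every series at once. It assumes the `tidyverse_cran_downloads` example dataset bundled with anomalize; the data is ungrouped first only so that the `group_by()` call is explicit.

```r
library(dplyr)
library(anomalize)

all_anomalies <- tidyverse_cran_downloads %>%
  ungroup() %>%
  group_by(package) %>%       # one group per time series, instead of filter()
  time_decompose(count) %>%
  anomalize(remainder) %>%
  time_recompose()

# One panel per package; ncol controls the facet layout.
all_anomalies %>%
  plot_anomalies(time_recomposed = TRUE, ncol = 3)
```

The decomposition and outlier steps are unchanged; the grouping alone determines how many series are processed.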

Slide 19

Slide 19 text

Tools Needed
● Functional Programming
  ○ Week 5 - DS4B 101-R Course (Foundational Data Science)
● Tidyverse Compliance - rlang (a tool for developing tidyverse functions)
  ○ Week 2 - DS4B 201-R Course (Advanced ML & Business Consulting)

Slide 20

Slide 20 text

Functional Programming
Make functions that do things
Key Concepts
● How to program in R
● Learn the 2 most common design patterns
  1. Vector Functions
  2. Data Functions
● Learn how to scale analysis with purrr
DS4B 101-R: Week 5 Functional Programming & Iteration
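
One way to read the two design patterns, sketched under assumptions: a vector function takes a vector and returns a vector, a data function takes a data frame and returns a data frame, and purrr applies a function across many inputs. The names `scale_to_unit`, `add_scaled_value` and `series_list` are hypothetical and do not come from the course.

```r
library(dplyr)
library(purrr)

# Vector function (hypothetical): vector in, vector of the same length out.
scale_to_unit <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Data function (hypothetical): data frame in, data frame out.
add_scaled_value <- function(data) {
  data %>% mutate(value_scaled = scale_to_unit(value))
}

# Made-up example data: two small series.
series_list <- list(
  a = tibble(value = c(1, 2, 3)),
  b = tibble(value = c(10, 30, 20))
)

# Scale from one problem to many: apply the data function to every element.
scaled_list <- map(series_list, add_scaled_value)
```

The vector function is reused inside the data function, and `map()` repeats the data function across the list without an explicit loop.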

Slide 21

Slide 21 text

rlang
A tool for developing tidyverse functions
Key Concepts
● Metaprogramming
  ○ Capture expressions before they are evaluated by R
  ○ Evaluate when we are ready
● Make functions that comply with the dplyr and ggplot2 APIs
DS4B 201-R: Week 2 Understanding the Business

Slide 22

Slide 22 text

Capturing Expressions
● If R code hits an expression, it’s automatically evaluated
● Inside tidyverse functions, we need to pass column names (pointers to a column)
● If R sees them, it evaluates them. This causes an error.
1. User enters some column name
2. R tries to evaluate it, looking in the Global Environment
3. ERROR: Object “value” not found
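
The failure described above can be reproduced with a small sketch. The function `mean_naive` and the data frame `df` are hypothetical names used only for illustration, and the example assumes no object called `value` exists in the calling environment.

```r
library(dplyr)

df <- tibble(value = c(1, 2, 3))

# Hypothetical wrapper that passes the column argument straight through.
mean_naive <- function(data, col) {
  # `col` is not a column of `data`, so R falls back to the function argument
  # and tries to evaluate the bare name `value` outside the data frame.
  data %>% summarise(avg = mean(col))
}

# Capture the error instead of stopping the script.
result <- tryCatch(
  mean_naive(df, value),
  error = function(e) e
)

inherits(result, "error")
```

The column exists in `df`, but the bare name is evaluated before dplyr has a chance to look it up in the data.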

Slide 23

Slide 23 text

Capturing Expressions
rlang provides tools to enable the tidyverse style of programming
● enquo(): Captures the expression
● !!: Tells R to evaluate it in the correct context (in relation to the data that the column belongs to)
1. Capture the column before R has a chance to evaluate it
2. Evaluate the captured column in the correct context
IT WORKS!!
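
A minimal sketch of the capture-then-evaluate pattern with `enquo()` and `!!`. The function `mean_by_col` and the data frame `df` are hypothetical names, not code from the talk.

```r
library(dplyr)
library(rlang)

df <- tibble(value = c(1, 2, 3))

# Hypothetical wrapper that accepts a bare column name.
mean_by_col <- function(data, col) {
  col_expr <- enquo(col)          # capture the column before R evaluates it
  data %>%
    summarise(avg = mean(!!col_expr))  # evaluate it in the context of `data`
}

res <- mean_by_col(df, value)
res$avg
```

The caller writes `value` unquoted, just as with dplyr's own verbs, and the lookup happens inside the data frame rather than in the calling environment.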

Slide 24

Slide 24 text

Demo rlang in 15 minutes

Slide 25

Slide 25 text

1,000 per company
My estimate of the number of unique business problems that repeat every day, week, month, or quarter

Slide 26

Slide 26 text

Functional Programming
Solve unique + recurring business problems
1. Automate repetitive tasks
2. Scale from 1 problem to many problems

Slide 27

Slide 27 text

Employee Churn Report
● Your department is charged with monitoring employee churn
● Every month, you report which departments have the highest churn, the rate of change, and the impact (+/-) of churn prevention tactics

Slide 28

Slide 28 text

Calculating Employee Churn Cost

Slide 29

Slide 29 text

Calculating Employee Churn Cost
Automates the process of doing repetitive tasks

Slide 30

Slide 30 text

Becoming a Complete Data Scientist
Learn the skills that transform you & your business

Slide 31

Slide 31 text

Are you like us?
✓ Desire to learn?
✓ As fast as possible?
✓ Focus on business?
✓ Deliver results?
✓ Provide value?

Slide 32

Slide 32 text

If Yes, I can help
The Business Science difference
● Project-Based Courses
● Business Focus
● Repeatable Methodologies
● Tool Integration
● Mentorship

Slide 33

Slide 33 text

YOUR Transformation Start Finish Everything is Taken Care of For You in Our Platform Do Projects Climb the Hill Build Production-Ready Web Apps 1-Hour Courses Domain Analysis & Tool-Specific Courses R-Track: 101 + 201 R-Track: 102 + 202 Learning Labs PRO 1 2 3

Slide 34

Slide 34 text

Business Science University R-Track Courses
Project-Based Courses with Business Application
● Business Analysis with R (DS4B 101-R): Data Science Foundations, 7 Weeks
  ○ Visualization
  ○ Data Cleaning & Manipulation
  ○ Functional Programming & Modeling
  ○ Business Reporting
● Data Science For Business with R (DS4B 201-R): Machine Learning & Business Consulting, 10 Weeks
  ○ Advanced Visualization
  ○ Advanced Data Wrangling
  ○ Advanced Functional Programming & Modeling
  ○ Advanced Data Science
● R Shiny Web Apps For Business (DS4B 102-R): Web Application Development, 4 Weeks
  ○ Web Apps

Slide 35

Slide 35 text

Key Benefits
- Fundamentals - Weeks 1-5 (25 hours of Video Lessons)
  - Data Manipulation (dplyr)
  - Time series (lubridate)
  - Text (stringr)
  - Categorical (forcats)
  - Visualization (ggplot2)
  - Programming & Iteration (purrr)
  - 3 Challenges
- Machine Learning - Week 6 (8 hours of Video Lessons)
  - Clustering (3 hours)
  - Regression (5 hours)
  - 2 Challenges
- Learn Business Reporting - Week 7
  - RMarkdown & plotly
  - 2 Project Reports:
    1. Product Pricing Algo
    2. Customer Segmentation
Business Analysis with R (DS4B 101-R)
Data Science Foundations, 7 Weeks
Visualization, Data Cleaning & Manipulation, Functional Programming & Modeling, Business Reporting

Slide 36

Slide 36 text

Key Benefits
Solve End-to-End Churn Project
Understanding the Problem & Preparing Data - Weeks 1-4
- Project Setup & Framework
- Business Understanding / Sizing Problem
- Tidy Evaluation - rlang
- EDA - Exploring Data - GGally, skimr
- Data Preparation - recipes
- Correlation Analysis
- 3 Challenges
Machine Learning - Weeks 5, 6, 7
- H2O AutoML - Modeling Churn
- ML Performance
- LIME Feature Explanation
Return-On-Investment - Weeks 7, 8, 9
- Expected Value Framework
- Threshold Optimization
- Sensitivity Analysis
- Recommendation Algorithm
Data Science For Business (DS4B 201-R)
Machine Learning & Business Consulting, 10 Weeks
Advanced Visualization, Advanced Data Wrangling, Advanced Functional Programming & Modeling, Advanced Data Science

Slide 37

Slide 37 text

Key Benefits
Learn Shiny & Flexdashboard
- Build Applications
- Learn Reactive Programming
- Integrate Machine Learning
App #1: Predictive Pricing App
- Model Product Portfolio
- XGBoost Pricing Prediction
- Generate new products instantly
App #2: Sales Dashboard with Demand Forecasting
- Model Demand History
- Segment Forecasts by Product & Customer
- XGBoost Time Series Forecast
- Generate new forecasts instantly
Shiny Apps for Business (DS4B 102-R)
Web Application Development, 4 Weeks
Web Apps, Machine Learning

Slide 38

Slide 38 text

Testimonials
Achieve Results that Matter to the Business
“I can already apply a lot of the early gains from the course to current working projects.”
- Adam Mitchell, Data Analyst with Eurostar
“Your program allowed me to cut down to 50% of the time to deliver solutions to my clients.”
- Rodrigo Prado, Managing Partner Big Data Analytics & Strategy at Genesis Partners
“My work became 10X easier. I can spend quality time asking questions rather than wasting time trying to figure out syntax.”
- Mohana Chittor, Data Scientist with Kabbage, Inc

Slide 39

Slide 39 text

R-TRACK BUNDLE
MSRP: $234/mo
6 Low Monthly Payments: $199/mo
Save: $35/mo
PROMO Code: learninglabs

Slide 40

Slide 40 text

Learning Labs PRO
12 Courses & Growing
● Recordings
● Code
● Resources
Community-Driven
Learning Labs PRO: $19/Month
Time Series - Web Scraping - Machine Learning & more!

Slide 41

Slide 41 text

Begin Learning Today
university.business-science.io