R Programming with Tidy Eval and Rlang [Learning Lab 12]

I wrote anomalize - an R package for time series anomaly detection. In the process, I made heavy use of Tidy Eval and Rlang - techniques for programming in the tidyverse.

This lesson goes over the fundamentals of rlang and tidy eval so you can program using the same tools I used to make anomalize. Learn them, and you will be much better equipped to develop tidy-compliant functions!

Matt Dancho

June 19, 2019
Transcript

  1. How I Built Anomalize: Programming Case Study

    Matt Dancho & David Curry. Business Science Learning Lab, Special Topics: tidyverse and rlang (tidy eval)
  2. Success Story

    Bryan Clark - Data Scientist at H&M - Took DS4B 201-R - Building Apps that integrate R Shiny - Analyzed Risk Model of… - Wedding!!! (Congrats!) “This is legit a milestone in my development.” #BusinessScienceSuccess
  3. Agenda

    • Case Study
      ◦ Time Series Anomaly Detection
      ◦ Business Value
    • The Solution
      ◦ Anomalize
      ◦ 3 min Demo
    • Functional Programming
      ◦ How I made anomalize
      ◦ Tools in my Toolkit
    • R Demo
      ◦ Rlang in 15 minutes
    • Becoming a Complete Data Scientist
      ◦ Learn the skills that transform you & your organization
  4. Web Traffic: Business Objectives

    Understand when web traffic was abnormally changing. Identify cause-effect relationships. Thousands of time series. Anomaly Detection.
  5. Step 1: What Software Existed?

    Twitter AnomalyDetection (https://github.com/twitter/AnomalyDetection)
    Pros: • Open Source R Package • Innovative method for detecting anomalies • Piecewise Median for trend removal • GESD for outlier detection
    Cons: • Not scalable to many time series • User interface was non-tidy • Growth presented a challenge
  6. Step 2: Improve on Twitter’s Algorithm

    R Package Goals: • Scalability to many time series • Simple user interface (tidyverse-compliant) • Handle growth (trend) in time series
  7. Good at detecting anomalies

    GOT Season 6: Kit Harington’s page views on Wikipedia spike. From “Game of Thrones (season 6)”, Wikipedia, the free encyclopedia: “The sixth season of the fantasy drama television series Game of Thrones premiered on HBO on April 24, 2016, and concluded on June 26, 2016.” https://en.wikipedia.org/wiki/Game_of_Thrones_(season_6)
  8. Many uses

    Malicious Behavior: Is someone (or some organization) attacking our organization? https://www.business-science.io/business/2018/06/10/infosec-anomalize-threat-hunting.html
  9. 3 Simple Steps

    1. Remove Trend & Seasonality
    2. Analyze residuals & detect outliers
    3. Recompose for visualization
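The three steps can be sketched in base R. In the anomalize package they correspond to time_decompose(), anomalize(), and time_recompose(); the code below is only a rough illustration and not the package's implementation. The simulated series, the injected spike at observation 60, and the 3 x IQR cutoff are all assumptions made for this example.

```r
# Hypothetical data: a monthly series with trend, seasonality, noise, and one injected spike
set.seed(123)
n <- 120
y <- seq(10, 20, length.out = n) +        # trend
  3 * sin(2 * pi * seq_len(n) / 12) +     # seasonality
  rnorm(n, sd = 0.3)                      # noise
y[60] <- y[60] + 10                       # injected anomaly (assumption for the demo)
series <- ts(y, frequency = 12)

# Step 1: remove trend & seasonality with a robust STL decomposition
fit <- stl(series, s.window = "periodic", robust = TRUE)
components <- fit$time.series
remainder <- as.numeric(components[, "remainder"])

# Step 2: analyze residuals & detect outliers with a simple IQR rule
q <- unname(quantile(remainder, c(0.25, 0.75)))
iqr <- q[2] - q[1]
lower <- q[1] - 3 * iqr   # the multiplier 3 is an assumed cutoff
upper <- q[2] + 3 * iqr
is_anomaly <- remainder < lower | remainder > upper

# Step 3: recompose the limits onto the original scale for visualization
baseline <- as.numeric(components[, "seasonal"] + components[, "trend"])
recomposed_lower <- baseline + lower
recomposed_upper <- baseline + upper

which(is_anomaly)  # indices of the flagged observations
```

Plotting `y` together with `recomposed_lower` and `recomposed_upper` gives a band around the expected series, with flagged points falling outside it.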
  10. Tools Needed

    A tool for developing tidyverse functions
    • Functional Programming: Week 5 - DS4B 101-R Course, Foundational Data Science
    • Tidyverse Compliance - rlang: Week 2 - DS4B 201-R Course, Advanced ML & Business Consulting
  11. Functional Programming: Make functions that do things

    Key Concepts: • How to program in R • Learn the 2 most common design patterns: 1. Vector Functions 2. Data Functions • Learn how to scale analysis with purrr
    DS4B 101-R: Week 5, Functional Programming & Iteration
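A minimal sketch of the two design patterns and of scaling with purrr. The function names (`scale_to_unit()`, `add_scaled_column()`) and the toy data are hypothetical and exist only for this illustration.

```r
library(purrr)

# Vector function: takes a vector, returns a vector of the same length
scale_to_unit <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Data function: takes a data frame, returns a data frame
add_scaled_column <- function(data) {
  data$value_scaled <- scale_to_unit(data$value)
  data
}

# Scale the analysis: apply the same data function to many data frames
series_list <- list(
  a = data.frame(value = c(1, 2, 3)),
  b = data.frame(value = c(10, 30, 50))
)
scaled_list <- map(series_list, add_scaled_column)

scaled_list$b
```

The data function wraps the vector function, and `map()` applies it to every element of the list, so the same code handles one series or many.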
  12. rlang: A tool for developing tidyverse functions

    Key Concepts: • Metaprogramming • Capture expressions before they are evaluated by R • Evaluate when we are ready • Make functions that comply with the dplyr and ggplot2 API
    DS4B 201-R: Week 2, Understanding the Business
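A small sketch of "capture now, evaluate when ready" using rlang's `expr()` and `eval_tidy()`. The data frame and the column name `value` are made up for the example.

```r
library(rlang)

# Capture the expression without evaluating it.
# No object called `value` needs to exist at this point.
captured <- expr(value * 2)
print(captured)

# Evaluate later, when we are ready, using a data frame as the context
df <- data.frame(value = c(1, 2, 3))
result <- eval_tidy(captured, data = df)
print(result)
```

Because the expression is stored as an object, it can be passed around and evaluated against whichever data frame is supplied at the end.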
  13. Capturing Expressions

    When R hits an expression, it evaluates it automatically. Inside tidyverse functions, we need to pass column names (pointers to a column). If R sees them, it evaluates them. This causes an error: • User enters some column name • R tries to evaluate it, looking in the Global Environment • ERROR: Object “value” not found
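The failure can be reproduced in a few lines of base R. The function `naive_mean()` is a hypothetical example, and the sketch assumes no object named `value` exists in the calling environment.

```r
df <- data.frame(value = c(10, 20, 30))

# A naive function: `col` is an ordinary argument, so R evaluates it as soon as it is used
naive_mean <- function(data, col) {
  mean(data[[col]])
}

# The user passes a bare column name. R looks for an object called `value`
# in the calling environment, does not find one, and signals an error.
# tryCatch() stores the error message instead of stopping the script.
msg <- tryCatch(
  naive_mean(df, value),
  error = function(e) conditionMessage(e)
)
print(msg)
```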
  14. Capturing Expressions

    Rlang provides tools to enable the tidyverse style of programming.
    • enquo(): Captures the expression (capture the column before R has a chance to evaluate it)
    • !!: Unquotes the captured expression so that R evaluates it in the correct context (in relation to the data that the column belongs to)
    IT WORKS!!
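A minimal sketch of the capture-then-unquote pattern inside a dplyr-style function. `summarise_mean()` and the toy data are hypothetical; `enquo()` and `!!` are the rlang tools named on the slide.

```r
library(dplyr)
library(rlang)

summarise_mean <- function(data, col) {
  # Capture the column before R has a chance to evaluate it
  col_quo <- enquo(col)

  # Unquote with !! so dplyr evaluates it against the columns of `data`
  data %>%
    summarise(mean_value = mean(!!col_quo))
}

df <- tibble(value = c(10, 20, 30))
out <- summarise_mean(df, value)
out
```

The user can now pass a bare column name, just as with dplyr's own verbs, and the function looks it up in the data rather than in the Global Environment.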
  15. 1,000 Per Company

    My estimate of the number of unique business problems that repeat every day, week, month, or quarter
  16. Functional Programming: Solve unique + recurring business problems

    1. Automate repetitive tasks
    2. Scale from 1 problem to many problems
  17. Employee Churn Report

    • Your department is charged with monitoring employee churn
    • Every month, you report which departments have the highest churn, the rate of change, and the impact (+/-) of churn prevention tactics
  18. Are you like us?

    ✓ Desire to learn? ✓ As fast as possible? ✓ Focus on business? ✓ Deliver results? ✓ Provide value?
  19. If Yes, I can help

    The Business Science difference: • Project-Based Courses • Business Focus • Repeatable Methodologies • Tool Integration • Mentorship
  20. YOUR Transformation

    Start to Finish: Everything is Taken Care of For You in Our Platform. • Do Projects, Climb the Hill • Build Production-Ready Web Apps • 1-Hour Courses: Domain Analysis & Tool-Specific Courses. R-Track: 101 + 201 • R-Track: 102 + 202 • Learning Labs PRO
  21. Business Science University R-Track Courses

    Project-Based Courses with Business Application
    • Business Analysis with R (DS4B 101-R): Data Science Foundations, 7 Weeks. Visualization, Data Cleaning & Manipulation, Functional Programming & Modeling, Business Reporting
    • Data Science For Business with R (DS4B 201-R): Machine Learning & Business Consulting, 10 Weeks. Advanced Visualization, Advanced Data Wrangling, Advanced Functional Programming & Modeling, Advanced Data Science
    • R Shiny Web Apps For Business (DS4B 102-R): Web Application Development, 4 Weeks. Web Apps
  22. Key Benefits: Business Analysis with R (DS4B 101-R), Data Science Foundations, 7 Weeks

    - Fundamentals - Weeks 1-5 (25 hours of Video Lessons): Data Manipulation (dplyr), Time series (lubridate), Text (stringr), Categorical (forcats), Visualization (ggplot2), Programming & Iteration (purrr), 3 Challenges
    - Machine Learning - Week 6 (8 hours of Video Lessons): Clustering (3 hours), Regression (5 hours), 2 Challenges
    - Learn Business Reporting - Week 7: RMarkdown & plotly, 2 Project Reports: 1. Product Pricing Algo 2. Customer Segmentation
    Topics: Visualization, Data Cleaning & Manipulation, Functional Programming & Modeling, Business Reporting
  23. Key Benefits: Data Science For Business (DS4B 201-R), Machine Learning & Business Consulting, 10 Weeks

    Solve End-to-End Churn Project
    - Understanding the Problem & Preparing Data - Weeks 1-4: Project Setup & Framework, Business Understanding / Sizing Problem, Tidy Evaluation - rlang, EDA - Exploring Data - GGally, skimr, Data Preparation - recipes, Correlation Analysis, 3 Challenges
    - Machine Learning - Weeks 5, 6, 7: H2O AutoML - Modeling Churn, ML Performance, LIME Feature Explanation
    - Return-On-Investment - Weeks 7, 8, 9: Expected Value Framework, Threshold Optimization, Sensitivity Analysis, Recommendation Algorithm
    Topics: Advanced Visualization, Advanced Data Wrangling, Advanced Functional Programming & Modeling, Advanced Data Science
  24. Key Benefits: Shiny Apps for Business (DS4B 102-R), Web Application Development, 4 Weeks

    - Learn Shiny & Flexdashboard: Build Applications, Learn Reactive Programming, Integrate Machine Learning
    - App #1: Predictive Pricing App: Model Product Portfolio, XGBoost Pricing Prediction, Generate new products instantly
    - App #2: Sales Dashboard with Demand Forecasting: Model Demand History, Segment Forecasts by Product & Customer, XGBoost Time Series Forecast, Generate new forecasts instantly
    Topics: Web Apps, Machine Learning
  25. Testimonials: Achieve Results that Matter to the Business

    “I can already apply a lot of the early gains from the course to current working projects.” -Adam Mitchell, Data Analyst with Eurostar
    “Your program allowed me to cut down to 50% of the time to deliver solutions to my clients.” -Rodrigo Prado, Managing Partner Big Data Analytics & Strategy at Genesis Partners
    “My work became 10X easier. I can spend quality time asking questions rather than wasting time trying to figure out syntax.” -Mohana Chittor, Data Scientist with Kabbage, Inc
  26. Learning Labs PRO

    12 Courses & Growing • Recordings • Code • Resources. Community-Driven. Learning Labs PRO: $19/Month. Time Series - Web Scraping - Machine Learning & more!