Prototyping with R packages

isteves
January 16, 2020

At Riskified, we sometimes come across fraud patterns in eCommerce that our regular models cannot detect. While retraining models on newer data can solve some of these problems, some patterns require entirely new analyses. In this talk, we'll discuss how we tackled fraudsters taking advantage of "Buy Online Pickup in Store" policies, and how R packages helped us gradually transition from analyses to production-ready code.

Transcript

  1. Prototyping with R packages
    Irene Steves (@i_steves) & Yogev Herz (@yogevmh)
    2020-01-16, R Meet-Up, Tel Aviv
  2. About us
    • Data Science Analytics team at Riskified
    • Ecology & evolutionary biology background
    • Fans of Bob the dog
  3. Riskified
    We use machine learning models to prevent fraud throughout the shopping journey
    [Diagram labels: Recover, Auth Rate Optimization, Bank Relationships, Deco, Account Protection, Chargeback Guarantee, Representment, Fraud Review, Checkout, Authorization, Capture/Decline, Login]
  4. Data Science Analytics
    • ~20 team members
    • Key responsibilities:
      ◦ Model training & retraining
      ◦ Feature engineering
      ◦ Research on recurring fraud themes (e.g. Account/Email Takeovers)
      ◦ New product POCs
    • ❤ R
  5. What does it mean to put into production?
    You don’t know what the end result will be
    You have an end result in mind
  6. Using scripts
    1. Load needed packages
    2. Functions & constants
    3. Data ingest
    4. Wrangling, plotting, stats
    FAST & FLEXIBLE
  7. Using scripts
    Functions file
    • Runs first
    • Functions & constants
    • Load (and install) necessary libraries
    Scripts
    • Consistent names, numbered
    • Sequential
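The functions-file-plus-numbered-scripts layout described on slides 6 and 7 could look roughly like the sketch below. The file names, the package list, `DATA_DIR`, `clean_name()` and `run_scripts()` are hypothetical and do not come from the talk; they only illustrate the pattern.

```r
# Functions file (hypothetical name: 00_functions.R). Sourced before any script.

# Load the packages the project needs, installing any that are missing.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}
load_or_install(c("stats", "utils"))  # illustrative package list

# Constants shared across scripts (illustrative value).
DATA_DIR <- "data"

# Helper shared across scripts: lower-case a name, collapse repeated
# whitespace, and trim the ends.
clean_name <- function(x) {
  tolower(trimws(gsub("\\s+", " ", x)))
}

# Run numbered scripts (e.g. 01_ingest.R, 02_wrangle.R) in sequence.
run_scripts <- function(dir = "scripts") {
  scripts <- sort(list.files(dir, pattern = "^[0-9]+_.*\\.R$", full.names = TRUE))
  for (script in scripts) {
    source(script, local = FALSE)  # evaluate in the global environment
  }
  invisible(basename(scripts))
}
```

Numbering the scripts makes the intended run order explicit, which is what lets a helper like `run_scripts()` replay the whole analysis.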
  8. Setting up a research project in R
    R-Project
    Queries & Data
    Functions & Scripts
    End goal: Reproducible research report
  9. Case study: Fighting BOPS fraud
    • Buy Online Pickup in Store
    • Offered by many e-commerce merchants
    • Appealing to customers because it is fast, frictionless and free
  10. How does BOPS fraud work?
    Legitimate order: billing name John Smith; shipping name Jane Smith; pickup by Jane Smith
    Fraud: billing name John Smith; shipping name Fraudy McFraudface; pickup by Fraudy McFraudface
    Recurring fraud: billing name John Smith; shipping name Frauddie J. McFrraudddface; pickup by Fraudy McFraudface
  11. How much of the BOPS fraud is recurring fraud?
    William Bartley, William Barrtleyy, William Barrttley, William Bartkey, William Barrtley, William Bartleyy, William Bartsley, William Barttsley, William Basrtley, William Beartley, William Bertley, William Vartley
  12. Matching names to identities
    Unsorted: Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
    Grouped:
    • Ernick Rodrigue, Ernick Rodrigue, Ernick Roddrifuez, Ernick Rodriguex, Ernickk Rodriguz
    • Troy Holmes, Troy J Holmes, Troy Jesus Holmes, Troy Junior Holm, Troy Junior Holme, Troy Jr. Holmes, Troyy Holmes
    • Nickki Washington, Nicxole Washington, Nickole Washington, Nickii Washington
  13. Matching names to identities
    Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
  14. Matching names to identities
    [Pairwise distance matrix, lower triangle]
           1     2     3     4     5     6     7     8     9    10
     2  0.54
     3  0.54  0.00
     4  0.05  0.56  0.56
     5  0.21  0.50  0.50  0.17
     6  0.63  0.42  0.42  0.61  0.51
     7  0.51  0.47  0.47  0.50  0.52  0.20
     8  0.24  0.51  0.51  0.20  0.22  0.63  0.60
     9  0.20  0.46  0.46  0.17  0.20  0.64  0.60  0.02
    10  0.55  0.08  0.08  0.57  0.51  0.46  0.54  0.56  0.51
    11  0.51  0.43  0.43  0.50  0.52  0.16  0.04  0.60  0.60  0.51
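One way to get from a distance matrix like the one on slide 14 to groups of names is hierarchical clustering. The talk does not state which distance metric or clustering method was used; the sketch below assumes base R's `adist()` (Levenshtein distance) scaled by the longer name's length, average-linkage `hclust()`, and an illustrative cut-off of 0.3. The function names `name_distance()` and `cluster_names()` are hypothetical.

```r
# A subset of the names shown on the slides.
names_seen <- c(
  "Troy Holmes", "Ernick Rodrigue", "Troy J Holmes", "Nickki Washington",
  "Nicxole Washington", "Ernick Roddrifuez", "Nickole Washington",
  "Troyy Holmes", "Ernick Rodriguex"
)

# Pairwise Levenshtein distance, scaled to [0, 1] by the longer name's length.
# (Assumption: the talk's actual metric is not given.)
name_distance <- function(x) {
  x <- tolower(x)
  d <- utils::adist(x)
  lens <- nchar(x)
  d <- d / outer(lens, lens, pmax)
  dimnames(d) <- list(x, x)
  d
}

# Group names whose average pairwise distance stays below `cutoff`.
# (Assumption: average linkage and the 0.3 cut-off are illustrative choices.)
cluster_names <- function(x, cutoff = 0.3) {
  d <- stats::as.dist(name_distance(x))
  tree <- stats::hclust(d, method = "average")
  stats::cutree(tree, h = cutoff)
}

clusters <- cluster_names(names_seen)
print(split(names_seen, clusters))
```

With this cut-off the nine sample names fall into three groups, one per underlying identity. On real data the cut-off would need tuning against known cases.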
  15. BOPS research task results
    • A method for reliably clustering names into entities
    • An estimate of problem severity
  16. BOPS research task results
    • A method for reliably clustering names into entities
    • An estimate of problem severity
    • Insights into fraud patterns
  17. Using scripts
    Challenges
    • Documentation via comments
    • Dependencies on external packages not rigorously checked
    • Often shared via copy & paste
    • Filepath issues
    • Usually not maintained
  18. Why a package?
    • Easy to get started, especially with devtools & usethis helpers
    • Accessible documentation
    • Keeps functions & dependencies organized
    • Installable!
    • Testing infrastructure
  19. Packaging a research project
    Goal: Create functions to detect BOPS fraud
    How to package?
    • Understand who will use the package
  20. Packaging a research project
    Goal: Create functions to detect BOPS fraud
    How to package?
    • Understand who will use the package
    • Understand how people will use your package
  21. Packaging a research project
    Goal: Create functions to detect BOPS fraud
    How to package?
    • Understand who will use the package
    • Understand how people will use your package
    • Handle namespaces
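As an illustration of what a packaged, namespace-aware detection function might look like, here is a minimal sketch. The talk does not show the actual package code: `flag_similar_names()`, its parameters, and the 0.2 threshold are hypothetical, and the scaled edit distance is an assumed stand-in for whatever metric was really used. The `#'` lines are roxygen2 comments, which `devtools::document()` turns into help pages and `NAMESPACE` entries.

```r
#' Flag names that closely match a previously seen name
#'
#' @param new_names Character vector of names on incoming orders.
#' @param known_names Character vector of names seen on past fraud.
#' @param max_distance Largest scaled edit distance counted as a match.
#' @return Logical vector with one element per entry of `new_names`.
#' @export
flag_similar_names <- function(new_names, known_names, max_distance = 0.2) {
  # Calling utils::adist() with an explicit namespace avoids relying on
  # whatever the user happens to have attached. Inside a package, `utils`
  # would also be listed under Imports in DESCRIPTION. The alternative is
  # an `@importFrom utils adist` roxygen tag.
  d <- utils::adist(tolower(new_names), tolower(known_names))
  longest <- outer(nchar(new_names), nchar(known_names), pmax)
  # A name is flagged if its closest known name is within the threshold.
  apply(d / longest, 1, min) <= max_distance
}

flag_similar_names(
  new_names   = c("William Barrtleyy", "Jane Smith"),
  known_names = "William Bartley"
)
```

In this example the misspelled variant is flagged and the unrelated name is not.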
  22. Deploying the package
    • Start simple: run locally and manually to test effects ☕
    • When we feel confident: send it to a remote machine to run automatically
  23. Research to production
    Exploratory analyses
    • Understand business value
    • Example outputs
    Packaging code
    • Add documentation, tests, etc.
    Deploy
    • Start on a weekly/daily basis
    • Offline rather than online
    • Not optimized for speed/scale
    Gradual ramp-up
    Iterate & evaluate
  24. R for prototyping
    Analysis → build mode involves shifting mindsets, not necessarily new tools!
    New insights • Flexibility • Re-use • Stability
  25. Thank you for your time!
    Irene Steves (@i_steves) & Yogev Herz (@yogevmh)
    Check out our tech blog! https://medium.com/riskified-technology