Prototyping with R packages

isteves
January 16, 2020

At Riskified, we sometimes come across fraud patterns in e-commerce that our regular models cannot detect. While retraining models on newer data can solve some of these problems, other patterns require entirely new analyses. In this talk, we'll discuss how we tackled fraudsters taking advantage of "Buy Online Pickup in Store" (BOPS) policies, and how R packages helped us gradually transition from analyses to production-ready code.

Transcript

  1. Prototyping with R packages
    Irene Steves (@i_steves) & Yogev Herz (@yogevmh)
    2020-01-16, R Meet-Up, Tel Aviv
  2. About us
    • Data Science Analytics team at Riskified
    • Ecology & evolutionary biology background
    • Fans of Bob the dog
  3. Riskified: We use machine learning models to prevent fraud throughout the shopping journey
    Diagram labels: Recover, Auth Rate Optimization, Bank Relationships, Deco, Account Protection, Chargeback Guarantee, Representment, Fraud Review, Checkout, Authorization, Capture/Decline, Login
  4. Data Science Analytics
    • ~20 team members
    • Key responsibilities:
      ◦ Model training & retraining
      ◦ Feature engineering
      ◦ Research on recurring fraud themes (e.g. account/email takeovers)
      ◦ New product POCs
    • ❤ R
  5. What does it mean to put into production?
    • You don't know what the end result will be
    • You have an end result in mind
  6. Research to production: exploratory analyses → packaging code → deploy
    Case study: Buy Online Pickup in Store
  7. Exploratory analyses

  8. Using scripts (fast & flexible)
    1. Load needed packages
    2. Functions & constants
    3. Data ingest
    4. Wrangling, plotting, stats
  9. Using scripts
    Functions file
    • Runs first
    • Functions & constants
    • Loads (and installs) necessary libraries
    Scripts
    • Consistent names, numbered
    • Sequential
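
A minimal sketch of the "functions file + numbered scripts" layout from slide 9, assuming scripts named with a numeric prefix such as `01_ingest.R` and `02_wrangle.R`. The file name `00_functions.R`, the helpers `load_or_install()`, `numbered_scripts()` and `run_scripts()`, and the constant `DIST_CUTOFF` are hypothetical illustrations, not code from the talk.

```r
# 00_functions.R (hypothetical): runs before the numbered scripts.

# Load packages, installing any that are missing first.
load_or_install <- function(pkgs) {
  missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Shared constants live here too (illustrative value only).
DIST_CUTOFF <- 0.25

# List numbered scripts ("01_ingest.R", "2_plot.R", "10_report.R") in numeric order.
numbered_scripts <- function(dir = ".") {
  files <- list.files(dir, pattern = "^[0-9]+_.*\\.R$")
  files[order(as.integer(sub("_.*$", "", files)))]
}

# Source each numbered script in sequence into one shared environment.
run_scripts <- function(dir = ".", envir = globalenv()) {
  for (f in numbered_scripts(dir)) {
    message("Running ", f)
    sys.source(file.path(dir, f), envir = envir)
  }
}
```

Ordering by the integer prefix rather than alphabetically keeps `10_report.R` after `2_plot.R` even when prefixes are not zero-padded.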
  10. Setting up a research project in R
    • R Project
    • Queries & data
    • Functions & scripts
    End goal: reproducible research report
  11. Case study: Fighting BOPS fraud
    • Buy Online Pickup in Store
    • Offered by many e-commerce merchants
    • Appealing to customers because it is fast, frictionless and free
  12. How does BOPS fraud work?
    • Legitimate order: billing name John Smith; shipping name Jane Smith; pickup Jane Smith
    • Fraud: billing name John Smith; shipping name Fraudy McFraudface; pickup Fraudy McFraudface
    • Recurring fraud: billing name John Smith; shipping name Frauddie J. McFrraudddface; pickup Fraudy McFraudface
  13. How much of the BOPS fraud is recurring fraud?
    William Bartley, William Barrtleyy, William Barrttley, William Bartkey, William Barrtley, William Bartleyy, William Bartsley, William Barttsley, William Basrtley, William Beartley, William Bertley, William Vartley
  14. Matching names to identities
    Names: Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
    Grouped into identities:
      ◦ Ernick Rodrigue, Ernick Rodrigue, Ernick Roddrifuez, Ernick Rodriguex, Ernickk Rodriguz
      ◦ Troy Holmes, Troy J Holmes, Troy Jesus Holmes, Troy Junior Holm, Troy Junior Holme, Troy Jr. Holmes, Troyy Holmes
      ◦ Nickki Washington, Nicxole Washington, Nickole Washington, Nickii Washington
  15. Matching names to identities

  16. Matching names to identities
    Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
  17. Matching names to identities
    Pairwise distance matrix for items 1 to 11 (lower triangle):
             1     2     3     4     5     6     7     8     9    10
       2  0.54
       3  0.54  0.00
       4  0.05  0.56  0.56
       5  0.21  0.50  0.50  0.17
       6  0.63  0.42  0.42  0.61  0.51
       7  0.51  0.47  0.47  0.50  0.52  0.20
       8  0.24  0.51  0.51  0.20  0.22  0.63  0.60
       9  0.20  0.46  0.46  0.17  0.20  0.64  0.60  0.02
      10  0.55  0.08  0.08  0.57  0.51  0.46  0.54  0.56  0.51
      11  0.51  0.43  0.43  0.50  0.52  0.16  0.04  0.60  0.60  0.51
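
The slide does not say which string metric produced this matrix. As a minimal sketch, assuming an edit-distance metric, the matrix can be computed in base R with `utils::adist()` (generalized Levenshtein distance) divided by the longer name's length, so 0 means identical and values near 1 mean very different. The helper name `name_distances()` is hypothetical, and its numbers will not exactly reproduce the slide's.

```r
# Pairwise distances between names, scaled to [0, 1].
# Assumption: Levenshtein edit distance divided by the longer name's length;
# the slide does not say which metric produced its matrix.
name_distances <- function(x) {
  clean <- tolower(trimws(x))
  raw <- utils::adist(clean)               # n x n matrix of edit distances
  len <- nchar(clean)
  longest <- outer(len, len, pmax)         # longer of the two lengths per pair
  d <- raw / pmax(longest, 1)              # guard against two empty strings
  dimnames(d) <- list(x, x)
  d
}

people <- c("Troy Holmes", "Ernick Rodrigue", "Ernick Rodrigue",
            "Troy J Holmes", "Nickki Washington", "Nickole Washington")
print(round(name_distances(people), 2))
```

Scaling by length matters: two typos in a 20-character name are less suspicious than two typos in a 6-character name.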
  18. Matching names to identities

  19. Matching names to identities
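
Going from a distance matrix to identities requires a grouping step, and the talk does not name its algorithm. One common option, used here purely as an assumption, is agglomerative hierarchical clustering with `stats::hclust()` followed by `stats::cutree()` at a fixed height. The function `cluster_names()`, average linkage, and the cut height of 0.3 are illustrative choices, not the team's settings.

```r
# Group name variants into identities with hierarchical clustering.
# Assumptions: normalized Levenshtein distance, average linkage and a cut
# height of 0.3 are illustrative, not the method or threshold from the talk.
cluster_names <- function(x, cut_height = 0.3) {
  clean <- tolower(trimws(x))
  len <- nchar(clean)
  d <- utils::adist(clean) / pmax(outer(len, len, pmax), 1)
  tree <- stats::hclust(stats::as.dist(d), method = "average")
  stats::cutree(tree, h = cut_height)      # one integer identity id per name
}

pickup_names <- c("Troy Holmes", "Ernick Rodrigue", "Troy J Holmes",
                  "Nickki Washington", "Ernick Roddrifuez", "Nickole Washington",
                  "Troyy Holmes", "Ernickk Rodriguz")
split(pickup_names, cluster_names(pickup_names))
```

The cut height is the knob that trades missed links (too low) against merging different people (too high), so in practice it would be tuned against labelled examples.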

  22. BOPS research task results
    • A method for reliably clustering names into entities
    • An estimate of problem severity
    • Insights into fraud patterns
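
Once names are clustered into entities, slide 13's question ("How much of the BOPS fraud is recurring fraud?") reduces to counting. A minimal sketch, assuming severity is measured as the share of fraudulent BOPS orders whose pickup identity appears on more than one order; that definition, the function `recurring_share()` and the toy ids are all hypothetical.

```r
# Hypothetical severity metric: share of fraud orders whose identity
# (name cluster id) appears on more than one order.
recurring_share <- function(identity_id) {
  orders_per_identity <- table(identity_id)
  repeat_ids <- names(orders_per_identity)[orders_per_identity > 1]
  mean(identity_id %in% repeat_ids)
}

# Toy identity ids, one per fraudulent order (made-up data).
ids <- c(1, 1, 1, 2, 3, 3, 4, 5, 5, 5)
recurring_share(ids)
```

With these toy ids, identities 1, 3 and 5 recur, covering 8 of the 10 orders.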
  23. Packaging code

  24. Using scripts: challenges
    • Documentation via comments
    • Dependencies on external packages not rigorously checked
    • Often shared via copy & paste
    • Filepath issues
    • Usually not maintained
  25. Why a package?
    • Easy to get started, especially with devtools & usethis helpers
    • Accessible documentation
    • Keeps functions & dependencies organized
    • Installable!
    • Testing infrastructure
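
To illustrate "accessible documentation" and "testing infrastructure", here is a sketch of a single package function as it might sit in `R/normalize.R` (a file `usethis::use_r("normalize")` would create). The `#'` roxygen comments become a help page when `devtools::document()` runs. The function `normalize_name()` is hypothetical and the cleaning rules are assumptions for illustration.

```r
#' Normalize a customer name before matching
#'
#' Lower-cases, drops punctuation and collapses repeated whitespace, so that
#' "Troy Jr. Holmes" and "troy  jr holmes" compare as equal.
#'
#' @param x A character vector of names.
#' @return A character vector the same length as `x`.
#' @examples
#' normalize_name("  Troy  Jr. Holmes ")
#' @export
normalize_name <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", "", x)     # "jr." -> "jr"
  x <- gsub("[[:space:]]+", " ", x)   # collapse runs of whitespace
  trimws(x)
}

# In a package this check would live in tests/testthat/test-normalize.R
# (see usethis::use_test()); stopifnot() keeps the sketch dependency-free.
stopifnot(identical(normalize_name("  Troy  Jr. Holmes "), "troy jr holmes"))
```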
  29. Packaging a research project
    Goal: Create functions to detect BOPS fraud
    How to package?
    • Understand who will use the package
    • Understand how people will use your package
    • Handle namespaces
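
One reading of "handle namespaces", sketched under assumptions: inside a package, call other packages' functions as `pkg::fun()` instead of relying on `library()`, and treat optional dependencies (declared under Suggests, e.g. via `usethis::use_package("stringdist", type = "Suggests")`) as possibly absent by checking `requireNamespace()`. The function `name_dist_matrix()` and the choice of Jaro-Winkler with a base-R fallback are hypothetical, not the team's code.

```r
# Hypothetical package function showing two namespace habits:
# 1. call functions from other packages as pkg::fun(), never via library();
# 2. check optional (Suggests) packages with requireNamespace() before use.
name_dist_matrix <- function(x) {
  x <- tolower(trimws(x))
  if (requireNamespace("stringdist", quietly = TRUE)) {
    # Jaro-Winkler distance from stringdist, already in [0, 1].
    d <- as.matrix(stringdist::stringdistmatrix(x, method = "jw", p = 0.1))
  } else {
    # Base-R fallback: Levenshtein distance scaled by the longer string.
    len <- nchar(x)
    d <- utils::adist(x) / pmax(outer(len, len, pmax), 1)
  }
  unname(d)
}
```

The explicit prefixes make every dependency visible at the call site, which is also what `R CMD check` looks for when it compares code against the DESCRIPTION file.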
  30. Deploy

  31. Deploying the package
    • Start simple: run locally and manually to test effects ☕
    • When we feel confident: send it to a remote machine to run automatically
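
A sketch of what "run locally, then automatically on a remote machine" could look like for an offline batch job: one entry-point function that scores a table of orders and writes a dated CSV. The function `run_batch()`, its arguments, the output file naming and the scheduling described in the comments are assumptions for illustration only.

```r
# Hypothetical batch entry point for an offline, scheduled run.
# score_fn stands in for whatever detection function the package exports.
run_batch <- function(orders, score_fn, out_dir, run_date = Sys.Date()) {
  scored <- orders
  scored$flag <- score_fn(orders)                    # one flag per order
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  file_name <- paste0("bops_flags_", format(run_date, "%Y-%m-%d"), ".csv")
  path <- file.path(out_dir, file_name)
  utils::write.csv(scored, path, row.names = FALSE)
  invisible(path)
}

# Locally and manually first, e.g.:
#   run_batch(orders, score_fn = my_scorer, out_dir = "output")
# Later, the same call could be wrapped in a script (hypothetical name
# run_daily.R) and scheduled with cron to run `Rscript run_daily.R` daily.
```

Keeping the entry point a plain function means the manual run and the scheduled run exercise exactly the same code.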
  32. Research to production
    Exploratory analyses
    • Understand business value
    • Example outputs
    Packaging code
    • Add documentation, tests, etc.
    Deploy
    • Start on a weekly/daily schedule
    • Offline rather than online
    • Not optimized for speed/scale
    Gradual ramp-up: iterate & evaluate
  33. R for prototyping
    Moving from analysis to build mode involves shifting mindsets, not necessarily new tools!
    New insights • Flexibility • Re-use • Stability
  34. Thank you for your time!
    Irene Steves (@i_steves), Yogev Herz (@yogevmh)
    Check out our tech blog! https://medium.com/riskified-technology
  35. https://xkcd.com/2054/