Prototyping with R Packages, R-Ladies Amsterdam

isteves
August 19, 2020

At Riskified, we sometimes come across fraud patterns in eCommerce that our regular models cannot detect. Retraining models on newer data solves some of these problems, but other patterns require entirely new analyses. In this talk, we'll discuss how we tackled fraudsters taking advantage of "Buy Online Pickup in Store" (BOPS) policies, and how R packages helped us gradually transition from analyses to production-ready code.

Transcript

  1. Prototyping with R packages
    Irene Steves (@i_steves) & Yogev Herz (@yogevmh)
    2020-08-19, R-Ladies Amsterdam
  2. About us
    • Data Science & Research department at Riskified, based in Tel Aviv
    • Ecology & evolutionary biology background
    • Fans of Bob the dog
    Theoretical overview; Fraud case-study
  3. Riskified
    We use machine learning models to prevent fraud throughout the shopping process
    Diagram labels: Fraud Review, Checkout, Authorization, Capture/Decline, Login
  4. What does it mean to put into production?
    • Research: Code that answers questions and delivers ideas
    • Development: Code that takes an input and consistently produces an output
  5. Using scripts
    1. Load needed packages
    2. Functions & constants
    3. Data ingest
    4. Wrangling, plotting, stats
    FAST & FLEXIBLE
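The four-step script layout on this slide might look roughly like the sketch below. It is only an illustration: the toy `orders` data, the `MIN_ORDERS` constant and the `count_by_name()` helper are hypothetical and do not come from the talk.

```r
# 1. Load needed packages (stats ships with R, so this runs anywhere)
library(stats)

# 2. Functions & constants (both hypothetical)
MIN_ORDERS <- 2  # flag names that appear on at least this many orders

count_by_name <- function(orders) {
  # Count orders per shipping name using the formula interface of aggregate()
  counts <- aggregate(order_id ~ shipping_name, data = orders, FUN = length)
  names(counts) <- c("shipping_name", "n_orders")
  counts
}

# 3. Data ingest (toy in-memory data standing in for a query or CSV file)
orders <- data.frame(
  order_id = 1:5,
  shipping_name = c("Jane Smith", "Fraudy McFraudface", "Jane Smith",
                    "Fraudy McFraudface", "Troy Holmes"),
  stringsAsFactors = FALSE
)

# 4. Wrangling, plotting, stats
name_counts <- count_by_name(orders)
repeat_names <- name_counts[name_counts$n_orders >= MIN_ORDERS, ]
print(repeat_names)
```

Everything lives in one file and runs top to bottom, which is what makes this style fast and flexible for exploration.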
  6. Using scripts
    Functions file
    • Runs first
    • Functions & constants
    • Load (and install) necessary libraries
    Scripts
    • Consistent names, numbered
    • Sequential
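A functions file of this kind could be sketched as follows. The file name `functions.R`, the `load_or_install()` helper, the `MAX_NAME_DISTANCE` constant and `normalize_name()` are all assumptions made for illustration, not the speakers' code.

```r
# functions.R (hypothetical name): sourced first by every numbered script

# Attach a package, installing it first only when it is missing.
# Note: install.packages() needs a configured CRAN mirror in non-interactive runs.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# Packages bundled with R are used here so nothing is actually installed
load_or_install("stats")
load_or_install("utils")

# Shared constants and helpers (hypothetical)
MAX_NAME_DISTANCE <- 0.25

normalize_name <- function(x) {
  # Trim, collapse repeated whitespace, and lower-case a name
  tolower(gsub("\\s+", " ", trimws(x)))
}
```

Numbered scripts such as a hypothetical `01_ingest.R` and `02_analysis.R` would then each begin with `source("functions.R")` and run in sequence.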
  7. Setting up a research project in R
    • R-Project
    • Queries & Data
    • Functions & Scripts
    End goal: Reproducible research report
  8. Case study: Fighting BOPS fraud
    • Buy Online Pickup in Store
    • Offered by many e-commerce merchants
    • Appealing to customers because it is fast, frictionless and free
  9. How does BOPS fraud work?
    Legitimate Order
    • BILLING NAME: John Smith
    • SHIPPING NAME: Jane Smith
    • PICKUP: Jane Smith
    Fraud
    • BILLING NAME: John Smith
    • SHIPPING NAME: Fraudy McFraudface
    • PICKUP: Fraudy McFraudface
    Recurring Fraud
    • BILLING NAME: John Smith
    • SHIPPING NAME: Frauddie J. McFrraudddface
    • PICKUP: Fraudy McFraudface
  10. How much of the BOPS fraud is recurring fraud?
    William Bartley, William Barrtleyy, William Barrttley, William Bartkey, William Barrtley, William Bartleyy, William Bartsley, William Barttsley, William Basrtley, William Beartley, William Bertley, William Vartley
  11. Matching names to identities
    Names as listed: Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
    Names grouped:
    • Ernick Rodrigue, Ernick Rodrigue, Ernick Roddrifuez, Ernick Rodriguex, Ernickk Rodriguz
    • Troy Holmes, Troy J Holmes, Troy Jesus Holmes, Troy Junior Holm, Troy Junior Holme, Troy Jr. Holmes, Troyy Holmes
    • Nickki Washington, Nicxole Washington, Nickole Washington, Nickii Washington
  12. Matching names to identities
    Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
  13. Matching names to identities
    Lower-triangular matrix, rows 2 to 11 by columns 1 to 10:
           1     2     3     4     5     6     7     8     9    10
     2  0.54
     3  0.54  0.00
     4  0.05  0.56  0.56
     5  0.21  0.50  0.50  0.17
     6  0.63  0.42  0.42  0.61  0.51
     7  0.51  0.47  0.47  0.50  0.52  0.20
     8  0.24  0.51  0.51  0.20  0.22  0.63  0.60
     9  0.20  0.46  0.46  0.17  0.20  0.64  0.60  0.02
    10  0.55  0.08  0.08  0.57  0.51  0.46  0.54  0.56  0.51
    11  0.51  0.43  0.43  0.50  0.52  0.16  0.04  0.60  0.60  0.51
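The matrix on this slide appears to hold pairwise distances between names, with small values for near-identical spellings. The talk does not say which string metric produced it, so the sketch below swaps in an assumption: Levenshtein edit distance from base R's `utils::adist()`, divided by the longer name's length so values fall between 0 and 1. The `name_distance()` helper is hypothetical.

```r
# A few names taken from the slides
names_vec <- c("Troy Holmes", "Troyy Holmes", "Ernick Rodrigue",
               "Ernick Rodriguex", "Nickole Washington", "Nickki Washington")

# Normalized edit distance: 0 means identical, values near 1 mean very different.
# Assumes every name is a non-empty string (otherwise the division yields NaN).
name_distance <- function(x) {
  raw <- utils::adist(x)              # matrix of Levenshtein distances
  lens <- nchar(x)
  max_len <- outer(lens, lens, pmax)  # length of the longer name in each pair
  d <- raw / max_len
  dimnames(d) <- list(x, x)
  d
}

dist_mat <- name_distance(names_vec)
print(round(dist_mat, 2))
```

Other metrics would slot into the same structure, as long as the result is a symmetric matrix with zeros on the diagonal.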
  14. BOPS research task results
    • A method for reliably clustering names into entities
    • An estimate of problem severity
  15. BOPS research task results
    • A method for reliably clustering names into entities
    • An estimate of problem severity
    • Insights into fraud patterns
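One way to cluster names into entities is hierarchical clustering on a name-distance matrix, cut at a fixed height. This is a sketch under stated assumptions rather than the method used in the talk: the normalized edit distance, the average linkage, and the cut height of 0.3 are all illustrative choices.

```r
# A few names taken from the slides
names_vec <- c("Troy Holmes", "Troy J Holmes", "Troyy Holmes",
               "Ernick Rodrigue", "Ernick Rodriguex", "Ernickk Rodriguz",
               "Nickole Washington", "Nickki Washington", "Nickii Washington")

# Normalized Levenshtein distance between every pair of names (assumed metric)
lens <- nchar(names_vec)
d <- utils::adist(names_vec) / outer(lens, lens, pmax)

# Average-linkage hierarchical clustering, cut at a hypothetical height of 0.3
tree <- stats::hclust(stats::as.dist(d), method = "average")
entity_id <- stats::cutree(tree, h = 0.3)

entities <- data.frame(name = names_vec, entity_id = entity_id)
print(entities)

# Number of distinct spellings per entity, a rough proxy for recurring use
print(table(entities$entity_id))
```

The cut height controls how aggressively spellings are merged, so it would need tuning against labeled examples before the entity counts could support any severity estimate.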
  16. Using scripts: Challenges
    • Documentation via comments
    • Dependencies on external packages not rigorously checked
    • Often shared via copy & paste
    • Filepath issues
    • Usually not maintained
  17. Why a package?
    • Easy to get started, especially with devtools & usethis helpers
    • Accessible documentation
    • Keeps functions & dependencies organized
    • Testing infrastructure
    • Installable!
  18. Packaging a research project
    Goal: Create functions to detect BOPS fraud
    How to package?
    • Understand who will use the package
  19. Packaging a research project
    Goal: Create functions to detect BOPS fraud
    How to package?
    • Understand who will use the package
    • Understand that other people will use your package
  20. Packaging a research project
    Goal: Create functions to detect BOPS fraud
    How to package?
    • Understand who will use the package
    • Understand people will use your package
    • Handle namespaces
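Handling namespaces in a packaged function might look like the sketch below. The function `cluster_names()`, its arguments, and its default threshold are hypothetical. The roxygen2-style comments and the explicit `pkg::fun()` calls are standard R package conventions, not something the slides spell out.

```r
#' Cluster similar names into entities
#'
#' @param name_vec Character vector of at least two non-empty names.
#' @param max_distance Cut height on the normalized edit-distance scale (0 to 1).
#' @return Integer vector of entity ids, one per element of `name_vec`.
#' @export
cluster_names <- function(name_vec, max_distance = 0.3) {
  # Validate inputs up front, since other people will call this function
  stopifnot(is.character(name_vec), length(name_vec) >= 2,
            all(nchar(name_vec) > 0))
  lens <- nchar(name_vec)
  # Explicit namespaces: no reliance on what the caller happens to have attached
  d <- utils::adist(name_vec) / outer(lens, lens, pmax)
  tree <- stats::hclust(stats::as.dist(d), method = "average")
  unname(stats::cutree(tree, h = max_distance))
}

ids <- cluster_names(c("Troy Holmes", "Troyy Holmes",
                       "Nickki Washington", "Nickii Washington"))
print(ids)
```

In a real package, `utils` and `stats` would also be listed under `Imports` in the `DESCRIPTION` file, and the `@export` tag would let roxygen2 generate the matching `NAMESPACE` entry.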
  21. Deploying the package
    • Start simple: run locally and manually to test effects ☕
    • When we feel confident: send it to a remote machine to run automatically
  22. Research to production
    Exploratory analyses
    • Understand biz value
    • Produce example outputs
    Packaging code
    • Add documentation, tests, etc
    Deploy
    • Start with weekly/daily basis
    • Offline rather than online
    • Not optimized for speed/scale
    Gradual ramp-up
    • Iterate & evaluate
  23. R for prototyping
    • Research: Prioritizes new insights, flexibility
    • Development: Prioritizes re-use, stability, scalability, speed
    Analysis → build mode involves shifting mindsets -- not necessarily new tools!
  24. Thank you for your time!
    Irene Steves (@i_steves), Yogev Herz (@yogevmh)
    Check out our tech blog! https://medium.com/riskified-technology