Prototyping with R Packages, R-Ladies Amsterdam

isteves
August 19, 2020

At Riskified, we sometimes come across fraud patterns in eCommerce that our regular models cannot detect. While retraining models on newer data can solve some of these problems, some patterns require entirely new analyses. In this talk, we'll discuss how we tackled fraudsters taking advantage of "Buy Online Pickup in Store" policies, and how R packages helped us gradually transition from analyses to production-ready code.

Transcript

  1. Prototyping with R packages
      Irene Steves (@i_steves) & Yogev Herz (@yogevmh)
      2020-08-19, R-Ladies Amsterdam
  2. About us
      • Data Science & Research department at Riskified, based in Tel Aviv
      • Ecology & evolutionary biology background
      • Fans of Bob the dog
      Theoretical overview; Fraud case-study
  3. Riskified
      e-Commerce fraud prevention for online merchants: verify orders at checkout and take liability for bad decisions
  4. We use machine learning models to prevent fraud throughout the shopping process
      [Diagram labels: Fraud Review, Checkout, Authorization, Capture/Decline, Login, Riskified]
  5. What does it mean to put into production?

  6. What does it mean to put into production?
      • Research: code that answers questions and delivers ideas
      • Development: code that takes an input and consistently produces an output
  7. Research to production: Exploratory analyses → Packaging code → Deploy

  8. Exploratory analyses

  9. Using scripts
      1. Load needed packages
      2. Functions & constants
      3. Data ingest
      4. Wrangling, plotting, stats
      FLEXIBLE FAST &
  10. Using scripts
      Functions file
      • Runs first
      • Functions & constants
      • Load (and install) necessary libraries
      Scripts
      • Consistent names, numbered
      • Sequential
  11. Setting up a research project in R
      R-Project; Queries & Data; Functions & Scripts
      End goal: Reproducible research report
  12. Case study: Fighting BOPS fraud
      • Buy Online Pickup in Store
      • Offered by many e-commerce merchants
      • Appealing to customers because it is fast, frictionless and free
  13. How does BOPS fraud work?
      • Legitimate Order: BILLING NAME John Smith; SHIPPING NAME Jane Smith; PICKUP Jane Smith
      • Fraud: BILLING NAME John Smith; SHIPPING NAME Fraudy McFraudface; PICKUP Fraudy McFraudface
      • Recurring Fraud: BILLING NAME John Smith; SHIPPING NAME Frauddie J. McFrraudddface; PICKUP Fraudy McFraudface
  14. How much of the BOPS fraud is recurring fraud?
      William Bartley, William Barrtleyy, William Barrttley, William Bartkey, William Barrtley, William Bartleyy, William Bartsley, William Barttsley, William Basrtley, William Beartley, William Bertley, William Vartley
  15. Matching names to identities
      Names as observed: Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
      Names grouped by identity:
      • Ernick Rodrigue, Ernick Rodrigue, Ernick Roddrifuez, Ernick Rodriguex, Ernickk Rodriguz
      • Troy Holmes, Troy J Holmes, Troy Jesus Holmes, Troy Junior Holm, Troy Junior Holme, Troy Jr. Holmes, Troyy Holmes
      • Nickki Washington, Nicxole Washington, Nickole Washington, Nickii Washington
  16. Matching names to identities

  17. Matching names to identities
      Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
  18. Matching names to identities
      Pairwise distances between names (lower triangle):
             1     2     3     4     5     6     7     8     9    10
       2  0.54
       3  0.54  0.00
       4  0.05  0.56  0.56
       5  0.21  0.50  0.50  0.17
       6  0.63  0.42  0.42  0.61  0.51
       7  0.51  0.47  0.47  0.50  0.52  0.20
       8  0.24  0.51  0.51  0.20  0.22  0.63  0.60
       9  0.20  0.46  0.46  0.17  0.20  0.64  0.60  0.02
      10  0.55  0.08  0.08  0.57  0.51  0.46  0.54  0.56  0.51
      11  0.51  0.43  0.43  0.50  0.52  0.16  0.04  0.60  0.60  0.51
  19. Matching names to identities

  20. Matching names to identities
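
The name-matching idea in slides 14 to 20 can be sketched in base R. This is a minimal illustration, not the speakers' implementation: the slides do not name the distance metric or clustering method, so the choice of Levenshtein edit distance scaled by the longer name, average-linkage hierarchical clustering, and a cut height of 0.5 are all assumptions. The names are eleven of those shown on slide 17.

```r
# Eleven of the names shown on slide 17 of the talk.
names_seen <- c(
  "Troy Holmes", "Ernick Rodrigue", "Ernick Rodrigue", "Troy J Holmes",
  "Troy Jesus Holmes", "Nickki Washington", "Nicxole Washington",
  "Troy Junior Holm", "Troy Junior Holme", "Ernick Roddrifuez",
  "Nickole Washington"
)

# Assumption: Levenshtein edit distance from base R, scaled to the 0-1 range
# by the length of the longer name. The talk does not say which metric it used.
edit_dist <- utils::adist(tolower(names_seen))
max_len <- outer(nchar(names_seen), nchar(names_seen), pmax)
norm_dist <- edit_dist / max_len

# Hierarchical clustering on the pairwise distances (average linkage assumed).
tree <- stats::hclust(stats::as.dist(norm_dist), method = "average")

# Assumption: 0.5 is an illustrative cut height, not a tuned threshold.
identity_id <- stats::cutree(tree, h = 0.5)

# One list element per inferred identity.
identities <- split(names_seen, identity_id)
print(identities)
```

In a real analysis the cut height would need checking against known cases, since a threshold that is too high merges different people and one that is too low splits a single fraudster into several identities.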

  21. BOPS research task results
      • A method for reliably clustering names into entities
  22. BOPS research task results
      • A method for reliably clustering names into entities
      • An estimate of problem severity
  23. BOPS research task results
      • A method for reliably clustering names into entities
      • An estimate of problem severity
      • Insights into fraud patterns
  24. Packaging code

  25. Using scripts: Challenges
      • Documentation via comments
      • Dependencies on external packages not rigorously checked
      • Often shared via copy & paste
      • Filepath issues
      • Usually not maintained
  26. Why a package?
      • Easy to get started, especially with devtools & usethis helpers
      • Accessible documentation
      • Keeps functions & dependencies organized
      • Testing infrastructure
      • Installable!
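
As a rough sketch of what "easy to get started" with devtools and usethis can look like, the helper below strings together a few commonly used calls. The function name `scaffold_package`, its arguments, and the file name `cluster_names` are hypothetical and chosen only for illustration; the talk does not show the speakers' actual setup steps.

```r
# Hypothetical helper bundling common usethis/devtools setup steps.
# `path` is where the new package should live; `deps` lists packages to
# declare in DESCRIPTION (they must already be installed).
scaffold_package <- function(path, deps = character()) {
  # Fail early with a clear message if the helper packages are missing.
  for (pkg in c("usethis", "devtools")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Please install '", pkg, "' first.", call. = FALSE)
    }
  }

  usethis::create_package(path, open = FALSE)  # DESCRIPTION, NAMESPACE, R/
  usethis::proj_set(path)                      # make it the active project

  usethis::use_r("cluster_names")              # creates R/cluster_names.R
  for (dep in deps) {
    usethis::use_package(dep)                  # adds dep to Imports
  }
  usethis::use_testthat()                      # sets up tests/testthat/
  usethis::use_test("cluster_names")           # creates a matching test file

  devtools::document(path)                     # builds docs from roxygen comments
  devtools::check(path)                        # runs R CMD check
}
```

In day-to-day work these calls are more often typed one at a time in the console, with `devtools::load_all()` and `devtools::test()` run repeatedly while functions are moved over from scripts.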
  27. Packaging a research project
      Goal: Create functions to detect BOPS fraud
      How to package?
  28. Packaging a research project
      Goal: Create functions to detect BOPS fraud
      How to package?
      • Understand who will use the package
  29. Packaging a research project
      Goal: Create functions to detect BOPS fraud
      How to package?
      • Understand who will use the package
      • Understand that other people will use your package
  30. Packaging a research project
      Goal: Create functions to detect BOPS fraud
      How to package?
      • Understand who will use the package
      • Understand people will use your package
      • Handle namespaces
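
To make the points on slides 28 to 30 concrete, here is a hedged sketch of how a research function might look once it moves into a package: roxygen comments for documentation, input checks because other people will call it, and namespace-qualified calls such as `utils::adist()` instead of relying on attached packages. The function name `cluster_names`, its `cut_height` argument and default, and the edit-distance approach are illustrative assumptions, not the speakers' code.

```r
#' Cluster similar names into identities
#'
#' @param names Character vector of non-empty customer names.
#' @param cut_height Distance between 0 and 1 at which the tree is cut
#'   (the default is an illustrative assumption).
#' @return Integer vector of cluster ids, one per name.
#' @export
cluster_names <- function(names, cut_height = 0.5) {
  # Other people will call this, so check inputs and fail with clear messages.
  if (!is.character(names) || anyNA(names) || any(!nzchar(names))) {
    stop("`names` must be a character vector of non-empty strings.",
         call. = FALSE)
  }
  if (length(names) < 2) {
    return(rep(1L, length(names)))  # hclust() needs at least two observations
  }

  # Namespace-qualified calls: nothing depends on what the user has attached.
  d <- utils::adist(tolower(names))
  d <- d / outer(nchar(names), nchar(names), pmax)
  tree <- stats::hclust(stats::as.dist(d), method = "average")
  unname(stats::cutree(tree, h = cut_height))
}

ids <- cluster_names(
  c("Troy Holmes", "Troy J Holmes", "Nickole Washington", "Nicxole Washington")
)
print(ids)
```

Inside a package, the same namespace handling is usually paired with declaring the dependency in DESCRIPTION, so that installation pulls in everything the function needs.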
  31. Into the riskiverse

  32. Deploy

  33. Deploying the package
      • Start simple: run locally and manually to test effects ☕
  34. Deploying the package
      • Start simple: run locally and manually to test effects ☕
      • When we feel confident: send it to a remote machine to run automatically
  35. Research to production
      • Exploratory analyses: Understand biz value; Produce example outputs
      • Packaging code: Add documentation, tests, etc
      • Deploy: Start with weekly/daily basis; Offline rather than online; Not optimized for speed/scale
      Gradual ramp-up; Iterate & evaluate
  36. R for prototyping
      • Research: Prioritizes new insights, flexibility
      • Development: Prioritizes re-use, stability, scalability, speed
      Analysis → build mode involves shifting mindsets, not necessarily new tools!
  37. Thank you for your time!
      Irene Steves (@i_steves), Yogev Herz (@yogevmh)
      Check out our tech blog! https://medium.com/riskified-technology
  38. https://xkcd.com/2054/