Prototyping with R Packages, R-Ladies Amsterdam

isteves
August 19, 2020

At Riskified, we sometimes come across fraud patterns in e-commerce that our regular models cannot detect. While retraining models on newer data can solve some of these problems, some patterns require entirely new analyses. In this talk, we'll discuss how we tackled fraudsters taking advantage of "Buy Online Pickup in Store" (BOPS) policies, and how R packages helped us gradually transition from analyses to production-ready code.

Transcript

  1. Prototyping with R packages
    Irene Steves & Yogev Herz
    @i_steves @yogevmh
    2020-08-19, R-Ladies Amsterdam

  2. About us
    ● Data Science & Research department at Riskified, based in Tel Aviv
    ● Ecology & evolutionary biology background
    ● Fans of Bob the dog
    Theoretical overview
    Fraud case study

  3. Riskified
    E-commerce fraud prevention for online merchants: verify orders at checkout and take liability for bad decisions

  4. We use machine learning models to prevent fraud throughout the shopping process
    Riskified across the shopping process: Login, Checkout, Authorization, Capture/Decline, Fraud Review

  5. What does it mean to put into production?

  6. What does it mean to put into production?
    Research: code that answers questions and delivers ideas
    Development: code that takes an input and consistently produces an output

  7. Research to production
    Exploratory analyses → Packaging code → Deploy

  8. Exploratory analyses

  9. Using scripts
    1. Load needed packages
    2. Functions & constants
    3. Data ingest
    4. Wrangling, plotting, stats
    FAST & FLEXIBLE

  10. Using scripts
    Functions file
    ● Runs first
    ● Functions & constants
    ● Load (and install) necessary libraries
    Scripts
    ● Consistent names, numbered
    ● Sequential

  11. Setting up a research project in R
    R Project
    Queries & Data
    Functions & Scripts
    End goal: reproducible research report

  12. Case study: Fighting BOPS fraud
    ● Buy Online Pickup in Store
    ● Offered by many e-commerce merchants
    ● Appealing to customers because it is fast, frictionless and free

  13. How does BOPS fraud work?
    Legitimate order: billing name John Smith; shipping name Jane Smith; pickup Jane Smith
    Fraud: billing name John Smith; shipping name Fraudy McFraudface; pickup Fraudy McFraudface
    Recurring fraud: billing name John Smith; shipping name Frauddie J. McFrraudddface; pickup Fraudy McFraudface
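
The recurring-fraud row above (a new order whose names are near-copies of an earlier fraudster's) can be illustrated with a simple fuzzy lookup. This is a minimal sketch in Python, not the authors' R code: it assumes a list of names from previously confirmed fraud is available, and the helper names (`normalize`, `name_similarity`, `matches_known_fraudster`), the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.75 threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def normalize(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the normalized names are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def matches_known_fraudster(
    pickup_name: str,
    known_fraud_names: Iterable[str],
    threshold: float = 0.75,  # hypothetical cut-off; would need tuning on labeled orders
) -> Optional[str]:
    """Return the most similar known fraud name if it clears the threshold, else None."""
    best_name, best_score = None, 0.0
    for known in known_fraud_names:
        score = name_similarity(pickup_name, known)
        if score > best_score:
            best_name, best_score = known, score
    return best_name if best_score >= threshold else None


known = ["Fraudy McFraudface"]
print(matches_known_fraudster("Fraudy McFraudfase", known))  # → Fraudy McFraudface
print(matches_known_fraudster("Jane Smith", known))          # → None
```

A lookup against known names only catches fraudsters who have already been confirmed once; finding new repeat offenders requires grouping unlabeled names into identities, which is what the following slides address.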

  14. How much of the BOPS fraud is recurring fraud?
    William Bartley, William Barrtleyy, William Barrttley, William Bartkey,
    William Barrtley, William Bartleyy, William Bartsley, William Barttsley,
    William Basrtley, William Beartley, William Bertley, William Vartley

  15. Matching names to identities
    Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes,
    Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm,
    Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes,
    Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
    Grouped by identity:
    ● Ernick Rodrigue, Ernick Rodrigue, Ernick Roddrifuez, Ernick Rodriguex, Ernickk Rodriguz
    ● Troy Holmes, Troy J Holmes, Troy Jesus Holmes, Troy Junior Holm, Troy Junior Holme, Troy Jr. Holmes, Troyy Holmes
    ● Nickki Washington, Nicxole Washington, Nickole Washington, Nickii Washington

  16. Matching names to identities

  17. Matching names to identities
    Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes,
    Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm,
    Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes,
    Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz

  18. Matching names to identities
    Pairwise name distances (lower triangle; indices appear to follow the order of the names on the previous slide):
           1     2     3     4     5     6     7     8     9    10
     2  0.54
     3  0.54  0.00
     4  0.05  0.56  0.56
     5  0.21  0.50  0.50  0.17
     6  0.63  0.42  0.42  0.61  0.51
     7  0.51  0.47  0.47  0.50  0.52  0.20
     8  0.24  0.51  0.51  0.20  0.22  0.63  0.60
     9  0.20  0.46  0.46  0.17  0.20  0.64  0.60  0.02
    10  0.55  0.08  0.08  0.57  0.51  0.46  0.54  0.56  0.51
    11  0.51  0.43  0.43  0.50  0.52  0.16  0.04  0.60  0.60  0.51
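
The slide does not say which string metric produced these distances. As a hedged illustration, the Python sketch below builds the same kind of symmetric matrix using a normalized Levenshtein distance (edit distance divided by the length of the longer name), so 0.0 means identical and values near 1.0 mean very different names. The metric choice and the function names (`levenshtein`, `normalized_distance`, `distance_matrix`) are assumptions for illustration; this sketch will not reproduce the exact numbers above.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer lower-cased name."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def distance_matrix(names: List[str]) -> List[List[float]]:
    """Full symmetric matrix; the slide prints only its lower triangle."""
    return [[normalized_distance(x, y) for y in names] for x in names]


names = ["Troy Holmes", "Ernick Rodrigue", "Ernick Rodrigue", "Troy J Holmes"]
# Print the lower triangle in the same layout as the slide (rows start at 2).
for i, row in enumerate(distance_matrix(names)[1:], start=2):
    print(i, " ".join(f"{d:.2f}" for d in row[: i - 1]))
```

Edit distance penalizes an inserted middle initial ("Troy J Holmes") as heavily as a typo. That is one reason a real pipeline might prefer a metric that weights shared prefixes, but which metric the authors chose is not stated here.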

  19. Matching names to identities

  20. Matching names to identities

  21. BOPS research task results
    ● A method for reliably clustering names into entities
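
One simple way to turn pairwise distances into entities is single-linkage grouping: link any two names whose distance is at or below a cut-off and treat each connected group as one identity. The Python sketch below does this with a small union-find. It is an assumption for illustration, since the talk does not say which clustering method or threshold was used. It uses `1 - SequenceMatcher(...).ratio()` from Python's `difflib` as a stand-in distance, and `name_distance`, `cluster_names` and the 0.25 default are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def name_distance(a: str, b: str) -> float:
    """Stand-in distance in [0, 1]: 0.0 for identical (case-insensitive) names."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_names(names: List[str], threshold: float = 0.25) -> List[List[str]]:
    """Single-linkage grouping: two names share an entity if a chain of
    pairwise distances <= threshold connects them."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if name_distance(names[i], names[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


names = ["Troy Holmes", "Ernick Rodrigue", "Nickole Washington", "Troyy Holmes",
         "Ernickk Rodriguz", "Nickii Washington", "Troy J Holmes", "Ernick Rodriguex"]
for entity in cluster_names(names):
    print(entity)
```

Single linkage is easy to reason about, but one ambiguous name can chain two different people into the same entity. Large clusters are therefore worth reviewing by hand before they are treated as a single fraudster.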

  22. BOPS research task results
    ● A method for reliably clustering names into entities
    ● An estimate of problem severity
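
Once names are grouped into entities, one hedged way to size the problem is to compute the share of fraud orders that belong to an entity seen more than once. The Python sketch below assumes each inner list holds the fraud orders attributed to one inferred identity. `recurring_share` and the toy order IDs are hypothetical, and the talk does not state how the authors actually measured severity.

```python
from typing import Sequence


def recurring_share(entities: Sequence[Sequence[str]]) -> float:
    """Fraction of orders that belong to an entity with two or more orders."""
    total = sum(len(orders) for orders in entities)
    recurring = sum(len(orders) for orders in entities if len(orders) > 1)
    return recurring / total if total else 0.0


# Hypothetical toy input: fraud order IDs grouped by inferred fraudster identity.
entities = [["o1", "o4", "o7"], ["o2"], ["o3", "o9"], ["o5"], ["o6"], ["o8"]]
print(f"{recurring_share(entities):.0%} of fraud orders are recurring")
# → 56% of fraud orders are recurring
```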

  23. BOPS research task results
    ● A method for reliably clustering names into entities
    ● An estimate of problem severity
    ● Insights into fraud patterns

  24. Packaging code

  25. Using scripts: challenges
    ● Documentation via comments
    ● Dependencies on external packages not rigorously checked
    ● Often shared via copy & paste
    ● File path issues
    ● Usually not maintained

  26. Why a package?
    ● Easy to get started, especially with devtools & usethis helpers
    ● Accessible documentation
    ● Keeps functions & dependencies organized
    ● Testing infrastructure
    ● Installable!

  27. Packaging a research project
    Goal: create functions to detect BOPS fraud
    How to package?

  28. Packaging a research project
    Goal: create functions to detect BOPS fraud
    How to package?
    ● Understand who will use the package

  29. Packaging a research project
    Goal: create functions to detect BOPS fraud
    How to package?
    ● Understand who will use the package
    ● Understand that other people will use your package

  30. Packaging a research project
    Goal: create functions to detect BOPS fraud
    How to package?
    ● Understand who will use the package
    ● Understand that other people will use your package
    ● Handle namespaces

  31. Into the riskiverse

  32. Deploy

  33. Deploying the package
    ● Start simple: run locally and manually to test effects ☕

  34. Deploying the package
    ● Start simple: run locally and manually to test effects ☕
    ● When we feel confident: send it to a remote machine to run automatically

  35. Research to production
    Exploratory analyses: understand biz value; produce example outputs
    Packaging code: add documentation, tests, etc.
    Deploy: start on a weekly/daily basis; offline rather than online; not optimized for speed/scale
    Gradual ramp-up
    Iterate & evaluate

  36. Research: prioritizes new insights, flexibility
    R for prototyping
    Development: prioritizes reuse, stability, scalability, speed
    Moving from analysis to build mode involves shifting mindsets, not necessarily new tools!

  37. Thank you for your time!
    Irene Steves @i_steves
    Yogev Herz @yogevmh
    Check out our tech blog! https://medium.com/riskified-technology

  38. https://xkcd.com/2054/
