Prototyping with R packages
Irene Steves & Yogev Herz
@i_steves @yogevmh
2020-08-19, R-Ladies Amsterdam
Slide 2
About us
● Data Science & Research department at Riskified, based in Tel Aviv
● Ecology & evolutionary biology background
● Fans of Bob the dog
Theoretical overview
Fraud case-study
Slide 3
Riskified
e-Commerce fraud prevention for online merchants: verify orders at checkout and take liability for bad decisions
Slide 4
Riskified
We use machine learning models to prevent fraud throughout the shopping process
[Diagram of the shopping process; stage labels: Login, Checkout, Authorization, Fraud Review, Capture/Decline]
Slide 5
What does it mean to put into production?
Slide 6
What does it mean to put into production?
Research: Code that answers questions and delivers ideas
Development: Code that takes an input and consistently produces an output
Slide 7
Research to production
Exploratory analyses → Packaging code → Deploy
Slide 8
Exploratory analyses
Slide 9
Using scripts
1. Load needed packages
2. Functions & constants
3. Data ingest
4. Wrangling, plotting, stats
FAST & FLEXIBLE
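As a rough illustration of the four-part script layout listed above, a single-file analysis might look like the sketch below. The toy data, the `HIGH_VALUE` threshold and the `share_high_value()` helper are all made up for illustration; none of them come from the talk.

```r
# 1. Load needed packages (base R only here, so the sketch runs anywhere)
library(stats)

# 2. Functions & constants
HIGH_VALUE <- 100  # hypothetical threshold, in arbitrary currency units

share_high_value <- function(amount) {
  # Fraction of orders whose amount exceeds the threshold
  mean(amount > HIGH_VALUE)
}

# 3. Data ingest (a toy data frame standing in for a query or read.csv())
orders <- data.frame(
  channel = c("pickup", "pickup", "pickup", "delivery", "delivery"),
  amount  = c(250, 40, 180, 120, 80)
)

# 4. Wrangling, plotting, stats
by_channel <- aggregate(amount ~ channel, data = orders, FUN = share_high_value)
print(by_channel)
```

Everything lives in one file and runs top to bottom, which is what makes this style fast and flexible for exploration.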
Setting up a research project in R
R-Project
Queries & Data
Functions & Scripts
End goal: Reproducible research report
Slide 12
Case study: Fighting BOPS fraud
● Buy Online Pickup in Store
● Offered by many e-commerce merchants
● Appealing to customers because it is fast, frictionless and free
Slide 13
How does BOPS fraud work?
Legitimate Order
BILLING NAME: John Smith
SHIPPING NAME: Jane Smith
PICKUP: Jane Smith
Fraud
BILLING NAME: John Smith
SHIPPING NAME: Fraudy McFraudface
PICKUP: Fraudy McFraudface
Recurring Fraud
BILLING NAME: John Smith
SHIPPING NAME: Frauddie J. McFrraudddface
PICKUP: Fraudy McFraudface
Slide 14
How much of the BOPS fraud is recurring fraud?
William Bartley, William Barrtleyy, William Barrttley, William Bartkey,
William Barrtley, William Bartleyy, William Bartsley, William Barttsley,
William Basrtley, William Beartley, William Bertley, William Vartley
Slide 15
Matching names to identities
Names as they appear on orders:
Troy Holmes
Ernick Rodrigue
Ernick Rodrigue
Troy J Holmes
Troy Jesus Holmes
Nickki Washington
Nicxole Washington
Troy Junior Holm
Troy Junior Holme
Ernick Roddrifuez
Nickole Washington
Troy Jr. Holmes
Nickii Washington
Troyy Holmes
Ernick Rodriguex
Ernickk Rodriguz
The same names grouped by identity:
Ernick Rodrigue, Ernick Rodrigue, Ernick Roddrifuez, Ernick Rodriguex, Ernickk Rodriguz
Troy Holmes, Troy J Holmes, Troy Jesus Holmes, Troy Junior Holm, Troy Junior Holme, Troy Jr. Holmes, Troyy Holmes
Nickki Washington, Nicxole Washington, Nickole Washington, Nickii Washington
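The slides do not say which matching method was used. One common way to group misspelled names, shown here only as a sketch under that assumption, is to compute pairwise edit distances and cut a hierarchical clustering tree at a distance threshold. The names are a subset of those on the slide; the `MAX_DIST` cut-off is an arbitrary value chosen for this toy data.

```r
# A few of the names from the slide
names_seen <- c(
  "Troy Holmes", "Troy J Holmes", "Troyy Holmes",
  "Ernick Rodriguex", "Ernickk Rodriguz", "Ernick Roddrifuez",
  "Nickki Washington", "Nickole Washington", "Nickii Washington"
)

# Pairwise Levenshtein (edit) distances, ignoring case
d <- utils::adist(tolower(names_seen))
dimnames(d) <- list(names_seen, names_seen)

# Average-linkage hierarchical clustering on the distance matrix
hc <- stats::hclust(stats::as.dist(d), method = "average")

# Cut the tree at a distance threshold instead of fixing the number of groups,
# since the number of distinct identities is not known in advance
MAX_DIST <- 5  # assumption: arbitrary cut-off for this example
identity_id <- stats::cutree(hc, h = MAX_DIST)

print(split(names_seen, identity_id))
```

On real order data the threshold would need tuning, and a dedicated string-distance package may scale better than a full distance matrix.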
Slide 16
Matching names to identities
Slide 23
BOPS research task results
● A method for reliably clustering names into entities
● An estimate of problem severity
● Insights into fraud patterns
Slide 24
Packaging code
Slide 25
Using scripts
Challenges
● Documentation via comments
● Dependencies on external packages not rigorously checked
● Often shared via copy & paste
● Filepath issues
● Usually not maintained
Slide 26
Why a package?
● Easy to get started, especially with devtools & usethis helpers
● Accessible documentation
● Keeps functions & dependencies organized
● Testing infrastructure
● Installable!
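To make "easy to get started" concrete, the sketch below wraps a typical usethis setup sequence in a helper. `scaffold_package()` is a hypothetical name, as are the `cluster_names` file name and the choice of `stats` as a dependency; the usethis functions themselves are real, but the exact steps a team needs will vary.

```r
# Hypothetical helper: scaffolds a package skeleton with usethis.
# Nothing is created until the function is called.
scaffold_package <- function(path) {
  if (!requireNamespace("usethis", quietly = TRUE)) {
    stop("Please install 'usethis' first: install.packages('usethis')")
  }
  usethis::create_package(path, open = FALSE)  # DESCRIPTION, NAMESPACE, R/
  usethis::proj_set(path)                      # work inside the new package
  usethis::use_r("cluster_names")              # creates R/cluster_names.R
  usethis::use_testthat()                      # sets up tests/testthat/
  usethis::use_test("cluster_names")           # creates a matching test file
  usethis::use_package("stats")                # records a dependency in DESCRIPTION
  invisible(path)
}
```

After something like `scaffold_package(file.path(tempdir(), "bopsdemo"))`, running `devtools::document()`, `devtools::test()` and `devtools::install()` from inside the package would build the documentation, run the tests and install it.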
Slide 30
Packaging a research project
Goal: Create functions to detect BOPS fraud
How to package?
● Understand who will use the package
● Understand that other people will use your package
● Handle namespaces
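One reading of "handle namespaces": inside a package, functions from other packages are called as `pkg::fun()` (or imported through roxygen tags) rather than attached with `library()` as in a script. The sketch below shows a hypothetical package function written that way. `cluster_names()` and its `max_dist` argument are illustrative, and the clustering approach (edit distance plus hierarchical clustering) is an assumption, not necessarily the method used in the talk.

```r
#' Group similar names into identities
#'
#' @param x Character vector of names (at least two).
#' @param max_dist Height at which to cut the clustering tree (illustrative default).
#' @return Integer vector of cluster ids, one per element of `x`.
#' @export
cluster_names <- function(x, max_dist = 5) {
  stopifnot(is.character(x), length(x) >= 2)
  # Explicit pkg::fun() calls: no library() needed and no masking surprises
  d <- utils::adist(tolower(x))
  hc <- stats::hclust(stats::as.dist(d), method = "average")
  unname(stats::cutree(hc, h = max_dist))
}

ids <- cluster_names(c("Troy Holmes", "Troyy Holmes",
                       "Nickki Washington", "Nickii Washington"))
print(ids)
```

The roxygen comments double as the accessible documentation mentioned earlier, and the input check reflects the point that other people will call the function with data the author never saw.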
Slide 31
Into the riskiverse
Slide 32
Deploy
Slide 34
Deploying the package
● Start simple: run locally and manually to test effects ☕
● When we feel confident: send it to a remote machine to run automatically
Slide 35
Research to production
Exploratory analyses: understand business value, produce example outputs
Packaging code: add documentation, tests, etc.
Deploy: start on a weekly/daily basis, offline rather than online, not optimized for speed/scale
Gradual ramp-up
Iterate & evaluate
Slide 36
Research: Prioritizes new insights, flexibility
R for prototyping
Development: Prioritizes re-use, stability, scalability, speed
Analysis → build mode involves shifting mindsets, not necessarily new tools!
Slide 37
Thank you for your time!
Irene Steves @i_steves
Yogev Herz @yogevmh
Check out our tech blog! https://medium.com/riskified-technology