Slide 1

Slide 1 text

Prototyping with R packages
Irene Steves & Yogev Herz (@i_steves, @yogevmh)
2020-08-19, R-Ladies Amsterdam

Slide 2

Slide 2 text

About us
● Data Science & Research department at Riskified, based in Tel Aviv
● Ecology & evolutionary biology background
● Fans of Bob the dog
Theoretical overview
Fraud case-study

Slide 3

Slide 3 text

Riskified
e-Commerce fraud prevention for online merchants: verify orders at checkout and take liability for bad decisions

Slide 4

Slide 4 text

We use machine learning models to prevent fraud throughout the shopping process
Diagram labels: Fraud Review, Checkout, Authorization, Capture/Decline, Login, Riskified

Slide 5

Slide 5 text

What does it mean to put into production?

Slide 6

Slide 6 text

What does it mean to put into production?
Research: Code that answers questions and delivers ideas
Development: Code that takes an input and consistently produces an output

Slide 7

Slide 7 text

Research to production
Exploratory analyses → Packaging code → Deploy

Slide 8

Slide 8 text

Exploratory analyses

Slide 9

Slide 9 text

Using scripts
1. Load needed packages
2. Functions & constants
3. Data ingest
4. Wrangling, plotting, stats
FAST & FLEXIBLE
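A minimal sketch of a script laid out in the four steps above, using base R only. The inline data, the column names, the `HIGH_VALUE` constant, and the `names_differ` helper are all invented for illustration; the name-mismatch rule is a placeholder, not the fraud logic from the talk.

```r
# 1. Load needed packages ('stats' ships with R and stands in for real dependencies)
library(stats)

# 2. Functions & constants (hypothetical)
HIGH_VALUE <- 100
names_differ <- function(billing, pickup) {
  # TRUE when the pickup name is not the billing name (case-insensitive)
  tolower(billing) != tolower(pickup)
}

# 3. Data ingest (an inline table stands in for a query or file read)
orders <- data.frame(
  billing = c("John Smith", "John Smith", "Ann Lee"),
  pickup  = c("John Smith", "Jane Smith", "Ann Lee"),
  amount  = c(40, 120, 75),
  stringsAsFactors = FALSE
)

# 4. Wrangling, plotting, stats
orders$mismatch   <- names_differ(orders$billing, orders$pickup)
orders$high_value <- orders$amount > HIGH_VALUE
summary_by_flag   <- aggregate(amount ~ mismatch, data = orders, FUN = mean)
print(summary_by_flag)
```

Everything lives in one file and runs top to bottom, which is what makes this style fast to write and easy to change.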

Slide 10

Slide 10 text

Using scripts
Functions file
● Runs first
● Functions & constants
● Load (and install) necessary libraries
Scripts
● Consistent names, numbered
● Sequential
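The layout above could look roughly like the sketch below: one functions file that runs first, followed by numbered scripts sourced in order. The file names, the `load_or_install` helper, and the toy data are assumptions made for the example, and the files are written to a temporary directory so the sketch is self-contained.

```r
project <- file.path(tempdir(), "bops_scripts_demo")  # hypothetical project folder
dir.create(project, showWarnings = FALSE)

# Functions file: runs first; loads (and installs) libraries, defines functions & constants
writeLines(c(
  "load_or_install <- function(pkg) {",
  "  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)",
  "  library(pkg, character.only = TRUE)",
  "}",
  "load_or_install('stats')",
  "PICKUP_LABEL <- 'pickup'"
), file.path(project, "00_functions.R"))

# Numbered scripts: consistent names, each building on the previous one
writeLines("orders <- data.frame(method = c('pickup', 'delivery', 'pickup'))",
           file.path(project, "01_ingest.R"))
writeLines("n_pickup <- sum(orders$method == PICKUP_LABEL)",
           file.path(project, "02_count.R"))

# Run everything sequentially; the numeric prefixes make the order explicit
env <- new.env()
scripts <- sort(list.files(project, pattern = "\\.R$", full.names = TRUE))
for (s in scripts) source(s, local = env)
print(env$n_pickup)
```

Sourcing into a dedicated environment keeps the demo from cluttering the workspace; in day-to-day research the scripts would more likely be run one by one in the console.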

Slide 11

Slide 11 text

Setting up a research project in R
R-Project
Queries & Data
Functions & Scripts
End goal: Reproducible research report

Slide 12

Slide 12 text

Case study: Fighting BOPS fraud
● Buy Online Pickup in Store
● Offered by many e-commerce merchants
● Appealing to customers because it is fast, frictionless and free

Slide 13

Slide 13 text

How does BOPS fraud work?
Legitimate Order: BILLING NAME John Smith, SHIPPING NAME Jane Smith, PICKUP Jane Smith
Fraud: BILLING NAME John Smith, SHIPPING NAME Fraudy McFraudface, PICKUP Fraudy McFraudface
Recurring Fraud: BILLING NAME John Smith, SHIPPING NAME Frauddie J. McFrraudddface, PICKUP Fraudy McFraudface

Slide 14

Slide 14 text

How much of the BOPS fraud is recurring fraud?
William Bartley, William Barrtleyy, William Barrttley, William Bartkey, William Barrtley, William Bartleyy, William Bartsley, William Barttsley, William Basrtley, William Beartley, William Bertley, William Vartley

Slide 15

Slide 15 text

Matching names to identities
Names as observed: Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
Names grouped by identity:
● Ernick Rodrigue, Ernick Rodrigue, Ernick Roddrifuez, Ernick Rodriguex, Ernickk Rodriguz
● Troy Holmes, Troy J Holmes, Troy Jesus Holmes, Troy Junior Holm, Troy Junior Holme, Troy Jr. Holmes, Troyy Holmes
● Nickki Washington, Nicxole Washington, Nickole Washington, Nickii Washington

Slide 16

Slide 16 text

Matching names to identities

Slide 17

Slide 17 text

Matching names to identities
Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz

Slide 18

Slide 18 text

Matching names to identities
       1     2     3     4     5     6     7     8     9    10
 2  0.54
 3  0.54  0.00
 4  0.05  0.56  0.56
 5  0.21  0.50  0.50  0.17
 6  0.63  0.42  0.42  0.61  0.51
 7  0.51  0.47  0.47  0.50  0.52  0.20
 8  0.24  0.51  0.51  0.20  0.22  0.63  0.60
 9  0.20  0.46  0.46  0.17  0.20  0.64  0.60  0.02
10  0.55  0.08  0.08  0.57  0.51  0.46  0.54  0.56  0.51
11  0.51  0.43  0.43  0.50  0.52  0.16  0.04  0.60  0.60  0.51
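The numbers on this slide read as pairwise distances between names on a 0 to 1 scale (identical names score 0.00). The slides do not say which string metric was used, so the sketch below swaps in a simple stand-in: Levenshtein edit distance from base R's `utils::adist`, divided by the length of the longer name. The values it produces will not match the slide.

```r
names_seen <- c("Troy Holmes", "Troy J Holmes", "Troyy Holmes",
                "Ernick Rodrigue", "Ernick Rodriguex",
                "Nickki Washington", "Nickole Washington")

# Edit distance between every pair of names (insertions, deletions, substitutions)
raw <- utils::adist(tolower(names_seen))

# Scale by the longer name's length so every value falls between 0 and 1
lens <- nchar(names_seen)
norm_dist <- raw / outer(lens, lens, pmax)
dimnames(norm_dist) <- list(names_seen, names_seen)

print(round(norm_dist, 2))
```

Small values mark likely variants of the same name; large values mark unrelated names.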

Slide 19

Slide 19 text

Matching names to identities

Slide 20

Slide 20 text

Matching names to identities

Slide 21

Slide 21 text

BOPS research task results
● A method for reliably clustering names into entities

Slide 22

Slide 22 text

BOPS research task results
● A method for reliably clustering names into entities
● An estimate of problem severity

Slide 23

Slide 23 text

BOPS research task results
● A method for reliably clustering names into entities
● An estimate of problem severity
● Insights into fraud patterns
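One way to turn name distances into entities is hierarchical clustering. The sketch below is an assumption-laden illustration rather than the method from the talk: it uses normalized Levenshtein distance (base R `utils::adist`), average linkage, and a hypothetical cutoff of 0.4, none of which are stated on the slides.

```r
names_seen <- c("Troy Holmes", "Troy J Holmes", "Troyy Holmes",
                "Ernick Rodrigue", "Ernick Rodriguex", "Ernickk Rodriguz",
                "Nickki Washington", "Nickole Washington", "Nickii Washington")

# Pairwise edit distance, scaled to 0-1 by the longer name's length
lens <- nchar(names_seen)
d <- utils::adist(tolower(names_seen)) / outer(lens, lens, pmax)

# Build a tree of names, then cut it: names that merge below the cutoff share an entity
tree   <- stats::hclust(stats::as.dist(d), method = "average")
entity <- stats::cutree(tree, h = 0.4)  # hypothetical cutoff, would need tuning on real data

print(split(names_seen, entity))
```

Counting how many orders fall into multi-name entities is then one route to an estimate of how much fraud is recurring.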

Slide 24

Slide 24 text

Packaging code

Slide 25

Slide 25 text

Using scripts: Challenges
● Documentation via comments
● Dependencies on external packages not rigorously checked
● Often shared via copy & paste
● Filepath issues
● Usually not maintained

Slide 26

Slide 26 text

Why a package?
● Easy to get started, especially with devtools & usethis helpers
● Accessible documentation
● Keeps functions & dependencies organized
● Testing infrastructure
● Installable!
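To make the points above concrete, here is a bare-bones package skeleton written by hand with base R, so it runs without extra dependencies. In practice helpers such as `usethis::create_package()`, `usethis::use_r()`, `usethis::use_testthat()`, and `devtools::document()` generate these files. The package name `bopsdemo` and the function `name_distance` are hypothetical, and this is an illustration of the layout rather than a complete, checked package.

```r
pkg <- file.path(tempdir(), "bopsdemo")  # hypothetical package name
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg, "tests", "testthat"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: metadata plus declared dependencies
writeLines(c(
  "Package: bopsdemo",
  "Title: Helpers for Matching Names (Illustrative)",
  "Version: 0.0.1",
  "Description: A toy package used only to show the layout of an R package.",
  "License: GPL-3",
  "Imports: utils",
  "Suggests: testthat"
), file.path(pkg, "DESCRIPTION"))

# NAMESPACE: what the package exports and imports
writeLines(c("export(name_distance)", "importFrom(utils, adist)"),
           file.path(pkg, "NAMESPACE"))

# R/: function code, documented with roxygen-style comments
writeLines(c(
  "#' Normalized edit distance between two names",
  "#' @param a,b Character strings.",
  "#' @return A number between 0 and 1.",
  "#' @export",
  "name_distance <- function(a, b) {",
  "  drop(utils::adist(tolower(a), tolower(b))) / max(nchar(a), nchar(b))",
  "}"
), file.path(pkg, "R", "name_distance.R"))

# tests/testthat/: unit tests that travel with the code
writeLines(c(
  'test_that("identical names have zero distance", {',
  '  expect_equal(name_distance("Troy Holmes", "Troy Holmes"), 0)',
  '})'
), file.path(pkg, "tests", "testthat", "test-name_distance.R"))

print(list.files(pkg, recursive = TRUE))

# Quick manual check of the function outside the package machinery
source(file.path(pkg, "R", "name_distance.R"))
print(name_distance("Troy Holmes", "Troyy Holmes"))
```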

Slide 27

Slide 27 text

Packaging a research project
Goal: Create functions to detect BOPS fraud
How to package?

Slide 28

Slide 28 text

Packaging a research project
Goal: Create functions to detect BOPS fraud
How to package?
● Understand who will use the package

Slide 29

Slide 29 text

Packaging a research project
Goal: Create functions to detect BOPS fraud
How to package?
● Understand who will use the package
● Understand that other people will use your package

Slide 30

Slide 30 text

Packaging a research project
Goal: Create functions to detect BOPS fraud
How to package?
● Understand who will use the package
● Understand that other people will use your package
● Handle namespaces
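A small illustration of why namespaces matter once other people run your code. In a script, an unqualified call picks up whatever function of that name comes first on the user's search path; inside a package, the usual remedy is to declare the dependency and call `pkg::fun()` (or import it in NAMESPACE). The function names `fragile_center` and `robust_center` are made up for the example.

```r
# A same-named function in the user's workspace masks the one from 'stats'
median <- function(x) "not the median you wanted"

# Unqualified call: resolved by name lookup, so it picks up the mask
fragile_center <- function(x) median(x)

# Qualified call: always uses the function from the named package
robust_center <- function(x) stats::median(x)

fragile <- fragile_center(c(1, 5, 9))
robust  <- robust_center(c(1, 5, 9))
print(fragile)
print(robust)

rm(median)  # clean up the mask so later code sees stats::median again
```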

Slide 31

Slide 31 text

Into the riskiverse

Slide 32

Slide 32 text

Deploy

Slide 33

Slide 33 text

Deploying the package
● Start simple: run locally and manually to test effects ☕

Slide 34

Slide 34 text

Deploying the package
● Start simple: run locally and manually to test effects ☕
● When we feel confident: send it to a remote machine to run automatically
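A rough sketch of the two stages above, assuming the packaged logic is exposed through a single entry-point function. The function `run_batch`, the column names, the output path, and the name-mismatch rule are all hypothetical placeholders; the slides do not describe the actual job or scheduler.

```r
# Hypothetical entry point: takes an input table, writes the rows it selects to a file
run_batch <- function(orders, out_file) {
  # Placeholder rule for illustration only: pickup name differs from billing name
  selected <- orders[tolower(orders$billing) != tolower(orders$pickup), , drop = FALSE]
  write.csv(selected, out_file, row.names = FALSE)
  invisible(nrow(selected))
}

# Stage 1: run locally and manually on a small sample, then inspect the output
sample_orders <- data.frame(
  billing = c("John Smith", "John Smith", "Ann Lee"),
  pickup  = c("John Smith", "Fraudy McFraudface", "Ann Lee"),
  stringsAsFactors = FALSE
)
out <- file.path(tempdir(), "selected_orders.csv")
n_selected <- run_batch(sample_orders, out)
print(read.csv(out))

# Stage 2 (not run here): once the output looks right, the same call can live in a
# short script that a scheduler on a remote machine runs on a daily or weekly basis.
```

Keeping the entry point identical in both stages means the automated run exercises the same code that was checked by hand.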

Slide 35

Slide 35 text

Research to production
Exploratory analyses
● Understand biz value
● Produce example outputs
Packaging code
● Add documentation, tests, etc
Deploy
● Start with weekly/daily basis
● Offline rather than online
● Not optimized for speed/scale
Gradual ramp-up
Iterate & evaluate

Slide 36

Slide 36 text

R for prototyping
Research: Prioritizes new insights, flexibility
Development: Prioritizes re-use, stability, scalability, speed
Analysis → build mode involves shifting mindsets, not necessarily new tools!

Slide 37

Slide 37 text

Thank you for your time! Irene Steves @i_steves Yogev Herz @yogevmh Check out our tech blog! https://medium.com/riskified-technology

Slide 38

Slide 38 text

https://xkcd.com/2054/