Slide 1

Slide 1 text

Prototyping with R packages
Irene Steves (@i_steves) & Yogev Herz (@yogevmh)
2020-01-16, R Meet-Up, Tel Aviv

Slide 2

Slide 2 text

About us
● Data Science Analytics team at Riskified
● Ecology & evolutionary biology background
● Fans of Bob the dog

Slide 3

Slide 3 text

Riskified
We use machine learning models to prevent fraud throughout the shopping journey.
● Shopping journey: Login → Checkout → Authorization → Capture/Decline
● Products: Account Protection, Chargeback Guarantee, Fraud Review, Auth Rate Optimization, Bank Relationships, Deco, Recover, Representment

Slide 4

Slide 4 text

Data Science Analytics
● ~20 team members
● Key responsibilities:
  ○ Model training & retraining
  ○ Feature engineering
  ○ Research on recurring fraud themes (e.g. account/email takeovers)
  ○ New product POCs
● ❤ R

Slide 5

Slide 5 text

What does it mean to put into production?
● Research: you don’t know what the end result will be
● Production: you have an end result in mind

Slide 6

Slide 6 text

Research to production
Exploratory analyses → Packaging code → Deploy
Case study: Buy Online Pickup in Store

Slide 7

Slide 7 text

Exploratory analyses

Slide 8

Slide 8 text

Using scripts: fast & flexible
1. Load needed packages
2. Functions & constants
3. Data ingest
4. Wrangling, plotting, stats

Slide 9

Slide 9 text

Using scripts
Functions file
● Runs first
● Defines functions & constants
● Loads (and installs) necessary libraries
Scripts
● Consistent names, numbered
● Run sequentially
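The script convention on this slide (functions file first, then consistently named, numbered scripts run in order) can be sketched as a small ordering helper. This is a Python illustration, not the authors' R setup; the file names `functions.R` and the `NN_name.R` prefix pattern are hypothetical. In R, each file would typically be run with `source()` in one session, so later scripts can use what earlier ones defined.

```python
import re

# Hypothetical convention: one functions file plus scripts prefixed with a number, e.g. "01_ingest.R".
FUNCTIONS_FILE = "functions.R"
NUMBERED = re.compile(r"^(\d+)_.*\.R$")


def script_order(filenames):
    """Return the run order: the functions file first, then numbered scripts by number."""
    numbered = []
    for name in filenames:
        match = NUMBERED.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    # Sort on the integer prefix so "10_report.R" runs after "02_wrangle.R".
    ordered = [name for _, name in sorted(numbered)]
    if FUNCTIONS_FILE in filenames:
        ordered.insert(0, FUNCTIONS_FILE)
    return ordered


files = ["02_wrangle.R", "functions.R", "10_report.R", "01_ingest.R", "notes.txt"]
print(script_order(files))
# ['functions.R', '01_ingest.R', '02_wrangle.R', '10_report.R']
```

Files that match neither pattern (here `notes.txt`) are simply left out of the run order.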

Slide 10

Slide 10 text

Setting up a research project in R
R project
● Queries & data
● Functions & scripts
End goal: reproducible research report

Slide 11

Slide 11 text

Case study: Fighting BOPS fraud
● Buy Online Pickup in Store
● Offered by many e-commerce merchants
● Appealing to customers because it is fast, frictionless and free

Slide 12

Slide 12 text

How does BOPS fraud work?
● Legitimate order: billing name John Smith, shipping name Jane Smith, pickup Jane Smith
● Fraud: billing name John Smith, shipping name Fraudy McFraudface, pickup Fraudy McFraudface
● Recurring fraud: billing name John Smith, shipping name Frauddie J. McFrraudddface, pickup Fraudy McFraudface
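Recurring fraud relies on small spelling variations of the same name. A minimal sketch of catching them is to check whether a name on a new order fuzzily matches a name from previously confirmed fraud. This is a Python illustration under stated assumptions: the slides do not say which string metric the team used, so `difflib.SequenceMatcher` is a stand-in, and `matches_known_fraud` and its 0.7 cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def matches_known_fraud(name, known_fraud_names, min_similarity=0.7):
    """True if `name` closely matches any name from a confirmed fraud order.

    The 0.7 cutoff is a hypothetical value, not one taken from the talk.
    """
    return any(name_similarity(name, known) >= min_similarity for known in known_fraud_names)


known = ["Fraudy McFraudface"]
print(matches_known_fraud("Fraudy McFraudface", known))   # True
print(matches_known_fraud("Fraudy McFraudfacee", known))  # True
print(matches_known_fraud("Jane Smith", known))           # False
```

In practice the cutoff trades missed variants against false alarms on legitimately similar names, so it would need tuning on labeled orders.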

Slide 13

Slide 13 text

How much of the BOPS fraud is recurring fraud?
William Bartley, William Barrtleyy, William Barrttley, William Bartkey, William Barrtley, William Bartleyy, William Bartsley, William Barttsley, William Basrtley, William Beartley, William Bertley, William Vartley

Slide 14

Slide 14 text

Matching names to identities
Unsorted: Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz
Grouped into identities:
● Ernick Rodrigue, Ernick Rodrigue, Ernick Roddrifuez, Ernick Rodriguex, Ernickk Rodriguz
● Troy Holmes, Troy J Holmes, Troy Jesus Holmes, Troy Junior Holm, Troy Junior Holme, Troy Jr. Holmes, Troyy Holmes
● Nickki Washington, Nicxole Washington, Nickole Washington, Nickii Washington

Slide 15

Slide 15 text

Matching names to identities

Slide 16

Slide 16 text

Matching names to identities
Troy Holmes, Ernick Rodrigue, Ernick Rodrigue, Troy J Holmes, Troy Jesus Holmes, Nickki Washington, Nicxole Washington, Troy Junior Holm, Troy Junior Holme, Ernick Roddrifuez, Nickole Washington, Troy Jr. Holmes, Nickii Washington, Troyy Holmes, Ernick Rodriguex, Ernickk Rodriguz

Slide 17

Slide 17 text

Matching names to identities
Pairwise distances between names (lower triangle):

       1     2     3     4     5     6     7     8     9    10
 2  0.54
 3  0.54  0.00
 4  0.05  0.56  0.56
 5  0.21  0.50  0.50  0.17
 6  0.63  0.42  0.42  0.61  0.51
 7  0.51  0.47  0.47  0.50  0.52  0.20
 8  0.24  0.51  0.51  0.20  0.22  0.63  0.60
 9  0.20  0.46  0.46  0.17  0.20  0.64  0.60  0.02
10  0.55  0.08  0.08  0.57  0.51  0.46  0.54  0.56  0.51
11  0.51  0.43  0.43  0.50  0.52  0.16  0.04  0.60  0.60  0.51
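A lower-triangular matrix like the one on this slide comes from computing a distance for every pair of names. The slides do not name the metric, so this Python sketch swaps in `1 - difflib.SequenceMatcher.ratio()`; its numbers will not reproduce the slide's values. Because difflib's ratio can depend on argument order, the sketch takes the larger of both directions to keep the matrix symmetric. `name_distance` and `distance_matrix` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def name_distance(a, b):
    """0.0 for identical names (case-insensitive), approaching 1.0 for unrelated ones."""
    a, b = a.lower(), b.lower()
    # ratio() can differ when the arguments are swapped; max() makes the distance symmetric.
    ratio = max(SequenceMatcher(None, a, b).ratio(), SequenceMatcher(None, b, a).ratio())
    return 1.0 - ratio


def distance_matrix(names):
    """Full symmetric matrix of pairwise name distances."""
    n = len(names)
    return [[name_distance(names[i], names[j]) for j in range(n)] for i in range(n)]


names = ["Troy Holmes", "Ernick Rodrigue", "Ernick Rodrigue", "Troy J Holmes"]
dist = distance_matrix(names)

# Print only the lower triangle, using 1-based labels like the slide.
for i in range(1, len(names)):
    print(i + 1, " ".join(f"{dist[i][j]:.2f}" for j in range(i)))
```

Identical names (2 and 3) get a distance of exactly 0.00, and spelling variants of one person (1 and 4) land far closer together than names belonging to different people.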

Slide 18

Slide 18 text

Matching names to identities

Slide 19

Slide 19 text

Matching names to identities
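Grouping names into identities means turning pairwise distances into clusters. This Python sketch uses single-linkage clustering with a distance cutoff, implemented with union-find: any two names within the cutoff end up in the same identity, and matches chain transitively. Single linkage, the 0.3 cutoff, and the difflib-based distance are all assumptions for illustration; the slides do not state which clustering method or threshold the team used.

```python
from difflib import SequenceMatcher


def name_distance(a, b):
    """Symmetric, case-insensitive distance: 1 - the larger difflib ratio of both argument orders."""
    a, b = a.lower(), b.lower()
    return 1.0 - max(SequenceMatcher(None, a, b).ratio(), SequenceMatcher(None, b, a).ratio())


def cluster_names(names, max_distance=0.3):
    """Single-linkage clustering: names within max_distance of each other share an identity."""
    parent = list(range(len(names)))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if name_distance(names[i], names[j]) <= max_distance:
                parent[find(i)] = find(j)  # merge the two identities

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


names = [
    "Troy Holmes", "Ernick Rodrigue", "Ernick Rodrigue", "Troy J Holmes", "Troy Jesus Holmes",
    "Nickki Washington", "Nicxole Washington", "Troy Junior Holm", "Troy Junior Holme",
    "Ernick Roddrifuez", "Nickole Washington", "Troy Jr. Holmes", "Nickii Washington",
    "Troyy Holmes", "Ernick Rodriguex", "Ernickk Rodriguz",
]
clusters = cluster_names(names)
for cluster in clusters:
    print(cluster)
```

Single linkage is forgiving of chains of small typos, but a looser cutoff could also chain two different people together, so the cutoff would need checking against labeled examples.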

Slide 20

Slide 20 text

BOPS research task results
● A method for reliably clustering names into entities

Slide 21

Slide 21 text

BOPS research task results
● A method for reliably clustering names into entities
● An estimate of problem severity
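Once fraud orders carry an identity label, the question "How much of the BOPS fraud is recurring fraud?" reduces to counting. This Python sketch assumes a hypothetical definition, namely that an order counts as recurring fraud when its identity appears on more than one fraud order; the identity labels below are made up for illustration.

```python
from collections import Counter


def recurring_share(identity_ids):
    """Share of fraud orders whose identity appears on more than one fraud order."""
    if not identity_ids:
        return 0.0
    counts = Counter(identity_ids)
    recurring = sum(n for n in counts.values() if n > 1)
    return recurring / len(identity_ids)


# Hypothetical identity labels for six BOPS fraud orders (e.g. output of name clustering).
orders = ["troy", "troy", "ernick", "nickole", "nickole", "nickole"]
print(recurring_share(orders))
```

Here five of the six orders belong to identities seen more than once; only the single "ernick" order is counted as one-off fraud.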

Slide 22

Slide 22 text

BOPS research task results
● A method for reliably clustering names into entities
● An estimate of problem severity
● Insights into fraud patterns

Slide 23

Slide 23 text

Packaging code

Slide 24

Slide 24 text

Using scripts: challenges
● Documentation via comments
● Dependencies on external packages not rigorously checked
● Often shared via copy & paste
● File path issues
● Usually not maintained

Slide 25

Slide 25 text

Why a package?
● Easy to get started, especially with devtools & usethis helpers
● Accessible documentation
● Keeps functions & dependencies organized
● Installable!
● Testing infrastructure

Slide 26

Slide 26 text

Packaging a research project
Goal: create functions to detect BOPS fraud
How to package?

Slide 27

Slide 27 text

Packaging a research project
Goal: create functions to detect BOPS fraud
How to package?
● Understand who will use the package

Slide 28

Slide 28 text

Packaging a research project
Goal: create functions to detect BOPS fraud
How to package?
● Understand who will use the package
● Understand how people will use your package

Slide 29

Slide 29 text

Packaging a research project
Goal: create functions to detect BOPS fraud
How to package?
● Understand who will use the package
● Understand how people will use your package
● Handle namespaces

Slide 30

Slide 30 text

Deploy

Slide 31

Slide 31 text

Deploying the package
● Start simple: run locally and manually to test effects ☕
● When we feel confident: send it to a remote machine to run automatically

Slide 32

Slide 32 text

Research to production
Exploratory analyses
● Understand business value
● Example outputs
Packaging code
● Add documentation, tests, etc.
Deploy
● Start on a weekly/daily schedule
● Offline rather than online
● Not optimized for speed/scale
Gradual ramp-up
Iterate & evaluate

Slide 33

Slide 33 text

R for prototyping
Moving from analysis to build mode involves shifting mindsets, not necessarily new tools!
● Analysis: new insights, flexibility
● Build: re-use, stability

Slide 34

Slide 34 text

Thank you for your time!
Irene Steves @i_steves
Yogev Herz @yogevmh
Check out our tech blog! https://medium.com/riskified-technology

Slide 35

Slide 35 text

https://xkcd.com/2054/