Where's my money?

WHERE’S MY MONEY?
MAY 8, 2018, PYDATA MEETUP AMSTERDAM
RUBEN VAN DE GEER, QINGCHEN WANG, SANDJAI BHULAI

Vrije Universiteit Amsterdam, Beta Faculteit / Analytics & Optimization
INTRODUCTION: ABOUT THE AUTHORS
§ Final-year PhD candidate @ Analytics & Optimization group, Department of Mathematics, Vrije Universiteit Amsterdam
§ This talk concerns the paper: “Data-driven Consumer Debt Collection via Machine Learning and Approximate Dynamic Programming”
§ Joint work with Qingchen Wang and Prof. dr. Sandjai Bhulai.

BACKGROUND
§ In the Netherlands, the percentage of households with overdue debt rose from 8.3% in 2008 to 17.9% in 2014.
§ Companies that rely on installment payments must manage the collection of payments efficiently.
§ Cole (1986): “Collection work would be easier and the results better, if there were some magic way in which each account could be immediately and accurately classified (...)”

HOW DEBT COLLECTION WORKS
§ Typically, a client whose payments have been overdue for a long period of time (8-26 weeks) is placed “in collections”.
§ The collector can use a variety of tools to persuade debtors to settle: letters, e-mails, calls.
§ Ultimately, if the debtor still refuses to pay, a legal procedure can be invoked.

STANDARD OPERATING PROCEDURE
§ Each letter communicates with increasing urgency the need to pay.
§ Phone calls can be made at any time.
  > Negotiate payment plans
  > Deferred payments
[Timeline figure: Day 0, Day 7, Day 14, Day 21, Day 28]

THE DEBT COLLECTOR’S PROBLEM
At any point in time, with limited calling capacity: which debtors should be called to maximize the total amount of debt collected in the long run?
§ Calling someone who was going to pay anyway is wasted effort.
§ Not calling someone who is deliberating whether to pay risks non-repayment.

OUR PAPER
1. Presents a framework that allows for data-driven optimization of the scheduling of outbound calls made by debt collectors.
2. Formulates the problem as a Markov Decision Process (MDP).
3. Uses gradient-boosted decision trees (lightgbm) for value function approximation of this MDP.
4. Identifies insights that can be linked to a more efficient collection process.
5. Validates the policy induced by our model in a controlled field experiment.

OUR PAPER
… is not about debt collection really, but about:
Domain Knowledge + Data → Supervised Learning Problem → Optimization Problem → Evaluation → Deployment

DATA DESCRIPTION I
§ Dataset provided by a Dutch collection agency
  > Handles approximately 250,000 collection cases each year (equivalent to approximately 120 million euros)
  > Information on 80,138 debtors that arrived between January 1, 2014 and September 30, 2016.
  > All these debtors are clients of the same insurance company.
§ Current calling policy: static (business rules)
§ The dataset comprises four different data sources:
  > Debtor-specific information, for example: type of insurance, date of arrival, postal code, original debt, collection fee.
  > Log of communication with the debtor.
  > Log of incoming payments.
  > Log of status and substatus changes.

DATA DESCRIPTION II

MARKOV DECISION PROCESS
§ Markov decision processes provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker (Wikipedia).
§ x_i = “state of debtor i” = { # phone calls made so far, $ debt outstanding, # days since arrival }
§ For example, at arrival: x_i = {0, $100, 0}.

Day   State x_i        Action
0     {0, $100, 0}     -
1     {0, $100, 1}     -
2     {1, $100, 2}     ☏
3     {1, $0, 3}       -
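The three-component state and the example trajectory on this slide can be sketched in a few lines of Python. This is only an illustration: the names `DebtorState` and `step` are hypothetical, and the sketch treats the payment as an observed random outcome rather than something the collector controls.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DebtorState:
    """State x_i of one debtor, with the three components listed on the slide."""
    calls_made: int          # phone calls made so far
    debt_outstanding: float  # debt outstanding
    days_since_arrival: int  # days since arrival


def step(state: DebtorState, call: bool = False, payment: float = 0.0) -> DebtorState:
    """Advance one day (hypothetical helper).

    `call` is the collector's action; `payment` is the random outcome
    observed on that day, not something the collector chooses.
    """
    return replace(
        state,
        calls_made=state.calls_made + (1 if call else 0),
        debt_outstanding=max(0.0, state.debt_outstanding - payment),
        days_since_arrival=state.days_since_arrival + 1,
    )


# Reproduce the example trajectory: arrival, a quiet day, a call, then full payment.
x0 = DebtorState(calls_made=0, debt_outstanding=100.0, days_since_arrival=0)
x1 = step(x0)
x2 = step(x1, call=True)
x3 = step(x2, payment=100.0)

for x in (x0, x1, x2, x3):
    print(x)
```

An immutable dataclass keeps each day's state as a separate value, which makes it straightforward to store snapshots per day.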

STATE SPACE DEFINITION

VALUE FUNCTION APPROXIMATION
§ Assume that “debtor C” has been in the process for T days.
§ Assume that “debtor A” and “debtor B” once were T days in the process as well [but are now closed cases].

                          # phone calls made until T    $ debt outstanding at T    outcome
Debtor A after T days     4                             100                        1
Debtor B after T days     3                             150                        0
Debtor C after T days     3 (+1?)                       125                        ?

X = the two feature columns; y = the outcome column.
§ We train gradient-boosted decision trees (GBDT) using lightgbm.
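A minimal sketch of how the rows of this table could be built from case histories. The record layout, the `snapshot` helper, the value T = 20, and the individual call days and payments are all hypothetical; they were chosen only so that the snapshot reproduces the numbers shown for debtors A, B and C.

```python
T = 20  # snapshot day (hypothetical value)


def snapshot(case: dict, t: int) -> list:
    """Features of a case after t days: [# calls made until t, debt outstanding at t]."""
    calls = sum(1 for day in case["call_days"] if day <= t)
    paid = sum(amount for day, amount in case["payments"] if day <= t)
    return [calls, case["original_debt"] - paid]


# Closed cases (made-up histories): the final outcome is known, so they can be training rows.
closed_cases = {
    "A": {"call_days": [3, 8, 12, 17, 25], "payments": [(30, 100.0)],
          "original_debt": 100.0, "repaid": 1},
    "B": {"call_days": [5, 10, 15], "payments": [(12, 50.0)],
          "original_debt": 200.0, "repaid": 0},
}

# Open case: the outcome is unknown, so it is a row to score rather than to train on.
case_c = {"call_days": [4, 9, 14], "payments": [], "original_debt": 125.0}

X = [snapshot(case, T) for case in closed_cases.values()]  # feature matrix
y = [case["repaid"] for case in closed_cases.values()]     # target vector

x_c = snapshot(case_c, T)          # debtor C as observed after T days
x_c_called = [x_c[0] + 1, x_c[1]]  # debtor C if one extra call were made ("+1?")

print(X, y, x_c, x_c_called)
```

The resulting `X` and `y` are in the shape a GBDT learner such as lightgbm expects; the model fitting itself is not shown here.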

MODEL TRAINING
§ Split dataset into training and validation sets by debtor.
  > Debtors who arrived between January 1, 2014 and July 19, 2015 are used for training.
  > Debtors who arrived between July 20, 2015 and September 30, 2016 are used for validation.
§ For each debtor, repayment is defined as having settled the debt in full prior to the stage of legal action (76.1%).
§ Train one GBDT model for each day since arrival T = 1,2,…,50.
  > Data becomes sparse after 50 days.
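The split by arrival date can be sketched as follows. The cut-off dates and the range T = 1,…,50 come from the slide; the function name `assign_split` and the example debtors are hypothetical.

```python
from datetime import date
from typing import Optional

TRAIN_START = date(2014, 1, 1)
TRAIN_END = date(2015, 7, 19)
VALID_END = date(2016, 9, 30)

# One model is trained per day since arrival, T = 1, 2, ..., 50.
SNAPSHOT_DAYS = range(1, 51)


def assign_split(arrival: date) -> Optional[str]:
    """Assign a debtor to a set based only on the arrival date."""
    if TRAIN_START <= arrival <= TRAIN_END:
        return "train"
    if TRAIN_END < arrival <= VALID_END:
        return "validation"
    return None  # outside the observation window


# Hypothetical debtors, for illustration only.
arrivals = {
    "debtor_1": date(2014, 3, 14),
    "debtor_2": date(2015, 7, 19),
    "debtor_3": date(2015, 7, 20),
    "debtor_4": date(2016, 9, 30),
}
splits = {debtor: assign_split(day) for debtor, day in arrivals.items()}
print(splits)
```

Splitting on arrival date keeps all snapshots of one debtor in the same set, and the validation debtors all arrive after the training debtors.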

DEBT REPAYMENT PREDICTION PERFORMANCE

GBDT-INDUCED POLICY
§ Recap: using lightgbm we can quantify the value of a debtor being in a particular state.
§ Without much effort we can compute how this value changes if we make an extra phone call.
§ The marginal value of a phone call depends on the state, for example:
  > picked up the phone before?
  > # days since last outgoing call
  > promised to pay?
  > product type
  > …
§ Call the debtors that have the highest marginal value per phone call.
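The ranking rule on this slide can be sketched as below, under several assumptions. `toy_value` is a made-up stand-in for the trained model, the state is reduced to the three components from the MDP slide, and `capacity` (the number of calls available today) is a hypothetical parameter. In practice `value` would wrap the predictions of the trained lightgbm model.

```python
from typing import Callable, Dict, List, Tuple

# (calls made so far, debt outstanding, days since arrival)
State = Tuple[int, float, int]


def marginal_value(value: Callable[[State], float], state: State) -> float:
    """Change in estimated value if one extra phone call is made now."""
    calls, debt, days = state
    return value((calls + 1, debt, days)) - value(state)


def select_calls(value: Callable[[State], float],
                 states: Dict[str, State],
                 capacity: int) -> List[str]:
    """Return the debtors with the highest marginal value per call, up to capacity."""
    ranked = sorted(states,
                    key=lambda debtor: marginal_value(value, states[debtor]),
                    reverse=True)
    return ranked[:capacity]


def toy_value(state: State) -> float:
    """Made-up value function with diminishing returns in the number of calls."""
    calls, debt, _days = state
    repay_probability = 0.5 + 0.4 * calls / (1 + calls)
    return repay_probability * debt


# Hypothetical open cases.
open_cases = {
    "a": (0, 100.0, 5),
    "b": (3, 150.0, 5),
    "c": (1, 50.0, 5),
}
to_call = select_calls(toy_value, open_cases, capacity=2)
print(to_call)
```

Passing the value function in as a callable keeps the ranking logic independent of how the value is estimated.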

CONTROLLED FIELD EXPERIMENT
§ Assigned 921 debtors that arrived in Jan-Feb 2018 randomly to either:
  > the incumbent (static) policy [466 assigned]
  > the data-driven policy [455 assigned]
§ Results: [incumbent vs data-driven]
  > +3.6% fully collected cases [275 cases vs 285 cases]
  > -1.89 days time until recovery [22.16 days vs 20.27 days]
  > +14.0% average recovery rate [57.0% vs 65.0%]
  > -21.4% phone calls made [1,355 calls vs 1,064 calls]
  > +45.6% amount collected per call [EUR/call 31.80 vs EUR/call 46.30]
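As a quick arithmetic check, the headline differences can be recomputed from the bracketed figures. The sketch assumes the percentages are relative changes with respect to the incumbent policy, which is one reading of the slide; small deviations from the slide's numbers (for example in the phone-call figure) may come from rounding.

```python
def rel_change(incumbent: float, data_driven: float) -> float:
    """Relative change in percent, measured against the incumbent policy."""
    return 100.0 * (data_driven - incumbent) / incumbent


# (incumbent, data-driven) pairs taken from the bracketed figures on the slide.
results = {
    "fully collected cases": (275, 285),
    "average recovery rate (%)": (57.0, 65.0),
    "phone calls made": (1355, 1064),
    "amount collected per call (EUR)": (31.80, 46.30),
}

for name, (incumbent, data_driven) in results.items():
    print(f"{name}: {rel_change(incumbent, data_driven):+.1f}%")

# Time until recovery is reported as an absolute difference in days.
days_diff = 20.27 - 22.16
print(f"time until recovery: {days_diff:+.2f} days")
```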

MANAGERIAL INSIGHTS
§ The data-driven policy outperforms the incumbent policy substantially.
§ How do the policies differ?
  > In general: the data-driven approach assigns more resources to “difficult” debtors.
§ The data-driven policy more often calls debtors that:
  > have been in the collection process longer.
  > have not been called recently.
  > have picked up the phone in the past.
  > have contacted the collector themselves in the past.

CONCLUSION
§ Used machine learning to approximate a complex high-dimensional decision problem.
§ Provided the business with an easy-to-understand tool.
  > Ranking based on “predicted repayment probability”
§ Validated the model by means of a controlled field experiment.
§ Will be submitting to INFORMS Management Science soon.
§ Future research: include textual comments added by the collector’s agents.
§ Thank you.
