Scoring Models and Weight of Evidence (WoE)

Slide 1

Slide 1 text

Scoring Modelling: WoE (Weight of Evidence)
Jurek Gurycz, PhD, Senior Solutions Consultant, Dell
Nuno Cruz Antonio, Manager and Data Scientist, Dell
Data Science London Meetup, 18.09.2014

Slide 2

Slide 2 text

Agenda
1. Brief Overview
2. (Credit) Scoring
3. Financial Process
4. Statistical Methods
5. New Methods
6. Case Studies
7. Questions and Answers

Slide 3

Slide 3 text

Scoring

Slide 4

Slide 4 text

Problem statement
• Good guys & bad guys
  – Default/Non-Default, Fraud/Non-Fraud, Dead/Alive, Response/No-Response
• A decision to make
• An uncertain outcome
• Something to lose, something to gain, and something to optimize
• Data driven: historical behaviour is available
• Time, change, stability, adjustments
• Model using scores assigned to cases

Slide 5

Slide 5 text

What do you use scorecards for?
• Credit scoring – Is an applicant a good or a bad credit risk?
• Insurance – Is a claim fraudulent?
• Marketing – Is a contact likely to produce a sale?
• Medicine – Is a patient likely to be readmitted to the hospital?
• Churn analysis – Will a customer leave or stay?

Slide 6

Slide 6 text

History of (consumer) credit scoring
• 1941: Research: linear discrimination applied to credit
• 1940s: Rules for finance houses
• 1950s: B. Fair and E. Isaac consultancy
• 1960s: Credit cards (UK): Diners Club (1962), Amex (1963), Barclaycard (1966)
• 1960s: Computing power. A: adoption of empirically based brute force instead of rules; R: default rates drop
• 1974: US Equal Credit Opportunity Act(s): “no discrimination on sex, religion, race”, “unless empirically derived and statistically valid”
• 1975: Retail Credit Company (founded 1899) changes its name to Equifax
• 1980: Logistic regression applied to credit scoring
• 1980s: Scoring introduced to consumer lending
• 1980s–90s: Trees, neural nets, nonlinear models; scoring models applied to other industries
• 2004: UK card expenditure exceeds cash
• 2010s: Big Data, social data, boosting, reason scores, uplift modelling
• Next challenge

Slide 7

Slide 7 text

Big Data projects (study of 600): who uses big data?
Source: Operationalizing the Buzz Report 2013, Enterprise Management Associates, Shawn Rogers. Full report: http://www.9sight.com/BigData_2013_Survey.pdf

Slide 8

Slide 8 text

Financial Services Process with Scoring Components
Four typical decision processes that benefit from predictive analytics.

Slide 9

Slide 9 text

Model and Decision Lifecycle
• Data Preparation
• Model Building
• Model Evaluation & Comparison
• Champion Selection
• Model Monitoring
• Model Management
• Rules Management
• Conditional Scoring

Slide 10

Slide 10 text

Tools, tools, tools are needed to build and evaluate models… but don’t forget the engine: statistics!
• Data manipulation
• Management & operationalization
• Visualization
• Advanced analytics
• Queries, data management
• Toolboxes, libraries
• Ideas and algorithms

Slide 11

Slide 11 text

Why do we do that?
• Fame?
• Fun?
• To save the world?
• … Fortune…
• Customer insight…
• Business insight…

Slide 12

Slide 12 text

Why do we do that?

Slide 13

Slide 13 text

Functionality and description:
• Building predictive (score) models
  – Sophisticated data mining tool with modern algorithms
  – Predictive modeling using text and data mining for analytical purposes, and building statistical & policy models (credit, collection, insurance, churn, etc.)
• Administration of reports, model flows, users, dashboards
  – Complete model flows managed by Risk Management resources (without IT resources)
  – Administration of model flows through environments: development, test, acceptance, and production
  – Management of complete workflows (including database connections, data preparation, policy rules, scoring)
  – Off-loading tasks to the Enterprise server
• Scoring (calculation) and deploying score models
  – Real time (one by one)
  – Near real time (small batches)
  – Batch (Big Data)
• Monitoring
  – Automatic monitoring of processes and critical parameters at the same time
• Reporting
  – Basel reports, WoE, capital allocation, Pillar II & III
  – Scorecard monitoring reports
  – Portfolio reporting (delinquency, roll rate, etc.)
• Integration
  – Web service/SOAP
  – OLEDB to databases
• Data and model management
  – Access control (different users, different access)
  – Version control (roll back, compare)
  – Change log
  – Automatic model and report documentation
  – Role-based security
Tools: Model Builder, Scoring Server, Monitoring and Alerting Server, Model Management, Decisioning Platform, Reporter
Purpose/problem: credit risk, fraud detection, asset & liability management, compliance, market risk, fraud risk, operational risk, customer acquisition, customer retention

Slide 14

Slide 14 text

WoE

Slide 15

Slide 15 text

Weight of Evidence (WOE)
• Weight of Evidence (WOE) binning groups the values of a continuous predictor variable (like vehicle age) in a way that maximizes that variable’s ability to distinguish between the two possible outcomes of the target variable, e.g. fraud or not fraud.
• It allows us to create a fairly precise model (with large lift) that, at the same time, provides simple rules that can be easily communicated to employees, managers, customers, and regulators.

Slide 16

Slide 16 text

• The overall idea is to divide the data into subgroups so that the difference between the two classes of the response variable across groups is as large as possible.
• But how do we measure, for example, the “difference in fraud” among groups? We use a measure called the “Information Value” (IV), which is based on an intermediate calculation called the “Weight of Evidence” (WoE).

Slide 17

Slide 17 text

• There are different ways of doing this binning, and we can select the one most appropriate for the current needs. More complex binnings (more groups) tend to have a larger Information Value (IV), because splitting a group into smaller groups never decreases IV. A larger Information Value roughly corresponds to more lift.
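The claim that more complex binnings carry more information can be checked numerically. This is a minimal sketch with made-up counts (the data and the function name `information_value` are illustrative assumptions, not from the talk). It computes IV as the sum over groups of (DistrGoods − DistrBads) · ln(DistrGoods/DistrBads) and compares a four-group binning with the two-group binning obtained by merging neighbouring groups; by the log-sum inequality, merging groups can never increase IV.

```python
import math

def information_value(goods, bads):
    """IV = sum over groups of (DistrGoods - DistrBads) * ln(DistrGoods / DistrBads).

    goods[i] and bads[i] are the numbers of goods and bads in group i.
    Assumes every group contains at least one good and one bad.
    """
    total_goods, total_bads = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        distr_goods, distr_bads = g / total_goods, b / total_bads
        iv += (distr_goods - distr_bads) * math.log(distr_goods / distr_bads)
    return iv

# Hypothetical counts for a "complex" binning with four groups.
fine_goods = [50, 150, 800, 1000]
fine_bads = [200, 250, 150, 100]

# A simpler binning obtained by merging groups 1+2 and groups 3+4.
coarse_goods = [fine_goods[0] + fine_goods[1], fine_goods[2] + fine_goods[3]]
coarse_bads = [fine_bads[0] + fine_bads[1], fine_bads[2] + fine_bads[3]]

iv_fine = information_value(fine_goods, fine_bads)
iv_coarse = information_value(coarse_goods, coarse_bads)
print(f"IV with 4 groups: {iv_fine:.3f}")
print(f"IV with 2 groups: {iv_coarse:.3f}")
```

The price of the higher IV is a rule with more groups, which is exactly the trade-off between accuracy and simplicity discussed next.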

Slide 18

Slide 18 text

Why calculate multiple models of varying complexity? Why not always use the model with the largest Information Value (IV)?
• Sometimes it is better to trade accuracy for simplicity.
• Some models may violate rules against using certain criteria, e.g. gender or customer age, to price a product.
• Possible considerations include:
  – The need for rules that can be easily explained to employees, managers, customers, and regulators.
  – Legal requirements to treat genders and ethnicities equally.
  – The list of possibilities is endless.
• A more complex rule, with a bigger Information Value (IV), may violate some of these requirements, so a simpler rule may be needed instead.

Slide 19

Slide 19 text

But where does all this come from? The theory of information
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities.”
Shannon, C. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656.

Slide 20

Slide 20 text

Information theory can be defined as the branch of mathematics that formalizes the study of “how to move a message from A to B”.
$$H(X) = -\sum_{i=1}^{n} p_i \log p_i$$
H(X), the entropy, provides a measure of the “quantity of information” provided by the message X, where p_i is the probability of the i-th of the n possible values of X.
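As a small illustration of the entropy formula, here is a sketch that estimates H(X) from the symbol frequencies of a message. It uses base-2 logarithms so that H is measured in bits; the slide leaves the base unspecified, so that choice and the helper name `entropy_bits` are our assumptions.

```python
import math
from collections import Counter

def entropy_bits(message):
    """Shannon entropy H(X) = -sum_i p_i * log2(p_i), with p_i estimated from symbol frequencies."""
    n = len(message)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(message).values())

h_constant = entropy_bits("aaaaaaaa")  # one symbol only: no uncertainty, 0 bits
h_two = entropy_bits("abababab")       # two equally likely symbols: 1 bit per symbol
h_four = entropy_bits("abcdabcd")      # four equally likely symbols: 2 bits per symbol
print(h_constant, h_two, h_four)
```

The more evenly spread the possible values are, the more information each observed symbol carries.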

Slide 21

Slide 21 text

A very common application of this concept is the Shannon index, widely used in ecology. It quantifies the diversity of species (or other taxonomic units) in a given study area.
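The Shannon index is usually computed with natural logarithms as H' = −Σ p_i ln p_i, where p_i is the share of individuals belonging to species i. A minimal sketch, with invented counts for two hypothetical study areas:

```python
import math

def shannon_index(counts):
    """Shannon diversity index H' = -sum_i p_i * ln(p_i), where p_i is the share of species i."""
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical numbers of individuals per species in two study areas.
even_area = [25, 25, 25, 25]   # four species, evenly represented
skewed_area = [85, 5, 5, 5]    # the same four species, one of them dominant

h_even = shannon_index(even_area)
h_skewed = shannon_index(skewed_area)
print(round(h_even, 3), round(h_skewed, 3))
```

With the same number of species, the evenly populated area reaches the maximum ln 4, while the area dominated by one species scores much lower.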

Slide 22

Slide 22 text

Information value of x for measuring y: quantifying the predictive power of x in explaining y.
Example: if y is binary (0/1), we can sort the data by x, divide the population into 10 equal parts (deciles), and compute
$$IV = \sum_{i=1}^{10} (g_i - b_i) \ln\frac{g_i}{b_i}$$
where g_i and b_i are the shares of all goods and of all bads that fall into decile i.
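The decile recipe can be sketched in a few lines of Python. The coding y = 1 for "bad" and y = 0 for "good", the simulated data, and the function name `decile_information_value` are all assumptions made for illustration. The sketch sorts the cases by x, cuts them into ten equal-size groups, and sums (g_i − b_i) · ln(g_i/b_i); a predictor that is related to y should get a clearly larger IV than pure noise.

```python
import math
import random

def decile_information_value(x, y, n_bins=10):
    """IV of predictor x for a binary target y (1 = bad, 0 = good), using n_bins equal-size bins of sorted x.

    IV = sum_i (g_i - b_i) * ln(g_i / b_i), where g_i and b_i are the shares of all goods
    and of all bads falling into bin i. Assumes each bin holds at least one good and one bad.
    """
    pairs = sorted(zip(x, y))                 # sort cases by x
    n = len(pairs)
    total_bads = sum(y)
    total_goods = n - total_bads
    iv = 0.0
    for i in range(n_bins):
        chunk = pairs[i * n // n_bins:(i + 1) * n // n_bins]  # the i-th decile when n_bins = 10
        bads = sum(label for _, label in chunk)
        goods = len(chunk) - bads
        g, b = goods / total_goods, bads / total_bads
        iv += (g - b) * math.log(g / b)
    return iv

# Simulated data: the chance of "bad" falls from 0.5 to 0.1 as x goes from 0 to 100.
random.seed(42)
x = [random.uniform(0, 100) for _ in range(5000)]
y = [1 if random.random() < 0.5 - 0.004 * xi else 0 for xi in x]
noise = [random.uniform(0, 100) for _ in range(5000)]  # unrelated to y

iv_signal = decile_information_value(x, y)
iv_noise = decile_information_value(noise, y)
print(f"IV(x) = {iv_signal:.3f}, IV(noise) = {iv_noise:.3f}")
```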

Slide 23

Slide 23 text

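Weight of Evidence
The Weight of Evidence, or WoE, is a widely used measure of the “strength” of a grouping for separating good (G) and bad (B) risk (default). For groups i = 1, …, n with g_i goods and b_i bads:
$$G = \sum_{i=1}^{n} g_i, \qquad B = \sum_{i=1}^{n} b_i, \qquad \mathrm{WoE}_i = 100 \ln\frac{g_i / G}{b_i / B}$$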

Slide 24

Slide 24 text

• The response variable is binary (good/bad, fraud/no fraud, …).
• Consider a simple example where one of our explanatory variables is divided into 2 groups.
• Define the Distribution of Goods (DistrGoods) for a group as the ratio of the goods in that group to all goods in all groups. In our fraud example, good = no fraud.
• Similarly, the Distribution of Bads (DistrBads) for a group is the ratio of the bads in that group to all bads in all groups. In our fraud example, bad = fraud.
• Then, for each group, WOE = 100 * ln(DistrGoods/DistrBads).
Let’s make this more operational…
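The definition translates directly into code. In this sketch the function name `weight_of_evidence` is our own, and the counts are hypothetical, chosen so that the three groups show the three possible signs of WOE: zero when DistrGoods = DistrBads, negative when DistrGoods < DistrBads, and positive when DistrGoods > DistrBads.

```python
import math

def weight_of_evidence(goods, bads):
    """WOE of each group = 100 * ln(DistrGoods / DistrBads).

    goods[i], bads[i]: numbers of goods and bads in group i.
    DistrGoods = goods[i] / all goods; DistrBads = bads[i] / all bads.
    """
    total_goods, total_bads = sum(goods), sum(bads)
    return [
        100 * math.log((g / total_goods) / (b / total_bads))
        for g, b in zip(goods, bads)
    ]

# Hypothetical counts chosen so the three groups show the three possible signs.
goods = [100, 100, 200]   # DistrGoods = 0.25, 0.25, 0.50
bads = [25, 50, 25]       # DistrBads  = 0.25, 0.50, 0.25
woe = weight_of_evidence(goods, bads)
print([round(w, 2) for w in woe])
```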

Slide 25

Slide 25 text

IV and WoE
• Let’s see how this works with a simple “Monotone” model.
• Total goods = 199 + 1857 = 2056
• Total bads = 525 + 151 = 676
• Group 1:
  – DistrGoods = 199/2056 = 0.0968 and DistrBads = 525/676 = 0.7766
  – WOE = 100 * ln(0.0968/0.7766) = -208.23 (with the unrounded ratios the result is -208.24; the difference is due to rounding)
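The same arithmetic in Python, using the counts from the two-group “Monotone” example (the variable names are our own). At full precision, group 1 gives about -208.24 (the slide’s -208.23 comes from the rounded ratios) and group 2 gives about 139.71.

```python
import math

# Counts from the two-group "Monotone" example.
goods = [199, 1857]
bads = [525, 151]

total_goods, total_bads = sum(goods), sum(bads)   # 2056 and 676
woes = []
for group, (g, b) in enumerate(zip(goods, bads), start=1):
    distr_goods = g / total_goods
    distr_bads = b / total_bads
    woe = 100 * math.log(distr_goods / distr_bads)
    woes.append(woe)
    print(f"Group {group}: DistrGoods={distr_goods:.4f}, DistrBads={distr_bads:.4f}, WOE={woe:.2f}")
```

The negative WOE of group 1 says that this group holds a much larger share of the bads than of the goods; group 2 is the reverse.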

Slide 26

Slide 26 text

• We calculate IV like this, summing the contributions of all groups:
  IV = Σ (DistrGoods – DistrBads) * ln(DistrGoods/DistrBads)
  IV = Σ (DistrGoods – DistrBads) * WOE/100
• Why?
  – We want groups that distinguish between good and bad, e.g. fraud and not fraud.
  – So for each group, we want that group’s proportion of the total goods to be different from its proportion of the total bads. The bigger that difference, the better.
  – It’s all about the “amount of information”…
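A minimal sketch of the IV sum, applied to the counts of the two-group “Monotone” example (the function name `information_value` is our own). Group 1 contributes about 1.42 and group 2 about 0.95, giving a total IV of about 2.37.

```python
import math

def information_value(goods, bads):
    """Total IV = sum over groups of (DistrGoods - DistrBads) * WOE / 100."""
    total_goods, total_bads = sum(goods), sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        distr_goods, distr_bads = g / total_goods, b / total_bads
        woe = 100 * math.log(distr_goods / distr_bads)
        contribution = (distr_goods - distr_bads) * woe / 100  # never negative
        iv += contribution
    return iv

iv = information_value([199, 1857], [525, 151])   # counts from the "Monotone" example
print(f"IV = {iv:.3f}")
```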

Slide 27

Slide 27 text

• Groups with equal proportions of the total goods (DistrGoods) and total bads (DistrBads) do not tell us anything.
• Recall: WOE = 100 * ln(DistrGoods/DistrBads).
• So DistrGoods = DistrBads => WOE = 100 * ln(1) = 0.
• So WOE = 0 means membership in that group tells us nothing. Conversely, the farther WOE is from zero, the more membership in that group tells us.
• But remember, we do not care whether DistrGoods is bigger than DistrBads or vice versa; we only care that they are different. This is why each group’s IV contribution, (DistrGoods – DistrBads) * WOE/100, is never negative: the difference and the logarithm always have the same sign.

Slide 28

Slide 28 text

• WOE = 100 * ln(DistrGoods/DistrBads).
• If DistrGoods = DistrBads, the ratio is 1 => ln() = 0 => WOE = 0.
• If DistrGoods > DistrBads, the ratio is > 1 => ln() > 0 => WOE > 0.
• If DistrGoods < DistrBads, the ratio is < 1 => ln() < 0 => WOE < 0.

Slide 29

Slide 29 text

Please show me some graphs…

Slide 30

Slide 30 text

Starting point
[Figure: histogram of Months Acct (x-axis: Months Acct; y-axis: number of observations), categorized by Credit Standing: Good / Bad]

Slide 31

Slide 31 text

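$$G = \sum_{i=1}^{n} g_i, \qquad B = \sum_{i=1}^{n} b_i, \qquad \mathrm{WoE}_i = 100 \ln\frac{g_i / G}{b_i / B}$$
where g_i and b_i are the numbers of goods and bads in group i.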

Slide 32

Slide 32 text

Variable categorized by WoE
[Figure: WoE trend for variable Months Acct; x-axis: bins (-inf,10], (10,13], (13,19], (19,25], (25,37], (40,inf); y-axis: WoE (weight of evidence)]

Slide 33

Slide 33 text

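The process leads into a scorecard
Dictionary:
• Characteristics: the variables used in the scorecard (e.g. Months Acct)
• Attributes: the bins or values of each characteristic (e.g. Months Acct in (10,13])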
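To make the dictionary concrete, here is a sketch of how a finished scorecard is applied: each characteristic is split into attributes (bins), each attribute carries a number of points, and a case’s score is the sum of the points of the attributes it falls into. All bin edges, points, and names (`SCORECARD`, `score`) are hypothetical; in practice the points are typically derived from a logistic regression on WoE-coded characteristics, which this sketch does not attempt.

```python
import bisect

# Hypothetical scorecard: characteristic -> (upper bin edges, points per attribute).
# Each attribute is one bin: (-inf, e0], (e0, e1], ..., (e_last, inf).
SCORECARD = {
    "months_acct": ([10, 13, 19, 25, 37], [10, 18, 25, 31, 38, 45]),
    "age": ([25, 35, 50], [12, 20, 28, 35]),
}

def attribute_index(edges, value):
    """Index of the attribute (bin) that contains value; bins are closed on the right."""
    return bisect.bisect_left(edges, value)

def score(case):
    """Total score = sum over characteristics of the points of the attribute the case falls into."""
    return sum(
        points[attribute_index(edges, case[characteristic])]
        for characteristic, (edges, points) in SCORECARD.items()
    )

print(score({"months_acct": 12, "age": 40}))  # 18 + 28 points
```

Because the score is just a table lookup and a sum, it is easy to explain to the employees, managers, customers, and regulators mentioned earlier.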

Slide 34

Slide 34 text

Why use the Weight of Evidence?
• It converts a measure of risk into a linear (log-odds) scale => easier for humans to interpret.
• “Bucketing” the variables makes them easier to implement on a scorecard.
• It makes the variables comparable, because they are all coded to the same scale.
• It is very useful for binning.
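A minimal sketch of WoE coding: compute the WOE of each bin on training data, then replace each raw value by the WOE of its bin, so that every coded variable lives on the same scale. The tiny data set, the bin edges, and the helper names `fit_woe` and `woe_encode` are hypothetical, and the sketch assumes every bin contains at least one good and one bad (real tools merge or adjust empty cells).

```python
import bisect
import math

def fit_woe(values, is_bad, edges):
    """WOE = 100 * ln(DistrGoods / DistrBads) per bin (-inf, e0], (e0, e1], ..., (e_last, inf)."""
    n_bins = len(edges) + 1
    goods, bads = [0] * n_bins, [0] * n_bins
    for v, bad in zip(values, is_bad):
        i = bisect.bisect_left(edges, v)
        if bad:
            bads[i] += 1
        else:
            goods[i] += 1
    total_goods, total_bads = sum(goods), sum(bads)
    # Assumes no bin is empty of goods or of bads.
    return [100 * math.log((g / total_goods) / (b / total_bads)) for g, b in zip(goods, bads)]

def woe_encode(values, edges, woe):
    """Replace each raw value by the WOE of the bin it falls into."""
    return [woe[bisect.bisect_left(edges, v)] for v in values]

# Hypothetical training data: months with account and whether the case went bad (1) or not (0).
months = [3, 5, 8, 11, 12, 15, 18, 22, 30, 45, 50, 60, 7, 14, 40, 55]
is_bad = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
edges = [10, 20]

woe = fit_woe(months, is_bad, edges)
encoded = woe_encode([4, 16, 48], edges, woe)
print([round(w, 1) for w in woe])
print([round(w, 1) for w in encoded])
```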

Slide 35

Slide 35 text

What are the challenges?
• Finding the right binning
• Building a good model
• Monitoring performance
• Keeping compliance
• Integrating policy rules
• Time, time, time
  – Model development
  – Implementation
  – LOB enablement, real-time scoring
  – Model stability monitoring

Slide 36

Slide 36 text

Model Evaluation

Slide 37

Slide 37 text

Scorecard Building Process Overview
• Data Preparation
  – Feature Selection
  – Interactions and Rules
  – Attribute Building
• Modeling
  – Scorecard Preparation
  – Survival
  – Reject Inference
• Evaluation and Calibration
  – Model Evaluation
  – Cut-off Point Selection
  – Score Cases
  – Calibration Tests
• Monitoring
  – Population Stability
• Deployment
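Population stability is often tracked with the Population Stability Index (PSI), which has the same form as IV but compares the distribution of cases over score bands at development time with the distribution today. The sketch below is an illustration under that assumption; the band counts and the function name are hypothetical. Commonly quoted rules of thumb treat PSI below 0.1 as stable and above 0.25 as a significant shift; these thresholds are conventions, not part of the talk.

```python
import math

def population_stability_index(expected_counts, actual_counts):
    """PSI = sum_i (A_i - E_i) * ln(A_i / E_i), where E_i / A_i are the shares of the
    development (expected) and current (actual) populations in score band i.
    Assumes no band is empty in either population."""
    total_e, total_a = sum(expected_counts), sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share, a_share = e / total_e, a / total_a
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi

# Hypothetical numbers of cases per score band at development time and today.
development = [100, 200, 400, 200, 100]
current_same = [50, 100, 200, 100, 50]      # same shape, half the volume
current_shifted = [200, 300, 300, 150, 50]  # population has moved to lower bands

psi_same = population_stability_index(development, current_same)
psi_shifted = population_stability_index(development, current_shifted)
print(f"PSI (same shape) = {psi_same:.4f}, PSI (shifted) = {psi_shifted:.4f}")
```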

Slide 38

Slide 38 text

Example Scorecard

Slide 39

Slide 39 text

AR and KGB: Accept/Reject (AR) and Known Good/Bad (KGB)
• Through-the-door: 10,000
  – Rejects: 1,850
  – Accepts: 8,150
    – Goods: 7,167
    – Bads: 983
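The tree above can be read as simple arithmetic. An accept/reject (AR) model can use all 10,000 through-the-door applicants, but a known good/bad (KGB) model can only be trained on the 8,150 accepts, because the outcome of the 1,850 rejects is never observed; that gap is what reject inference tries to address. The sketch below only checks the figures and derives rates from them; the variable names are our own.

```python
through_the_door = 10_000
accepts = 8_150
rejects = through_the_door - accepts   # 1,850 applicants whose outcome is unknown
goods, bads = 7_167, 983               # known good/bad, observed only for accepts

assert goods + bads == accepts         # the KGB population is exactly the accepts
accept_rate = accepts / through_the_door
bad_rate_accepts = bads / accepts
print(f"Accept rate: {accept_rate:.1%}")
print(f"Bad rate among accepts: {bad_rate_accepts:.2%}")
print(f"Good:bad odds among accepts: {goods / bads:.2f}")
```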

Slide 40

Slide 40 text

AR and KGB modelling – example project

Slide 41

Slide 41 text

Cut-off Point Selection
• Use profit to determine a cut-off point.
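One way to use profit to choose a cut-off: for each candidate cut-off, accept every case scoring at or above it, add a profit for each accepted good and subtract a loss for each accepted bad, and keep the cut-off with the highest total. This is a minimal sketch; the profit and loss figures, the scores, and the name `best_cutoff` are hypothetical.

```python
# Hypothetical economics: profit from an accepted good, loss from an accepted bad.
PROFIT_PER_GOOD = 100.0
LOSS_PER_BAD = 500.0

def best_cutoff(scores, is_bad):
    """Return (cutoff, profit) maximizing total profit when accepting every case with score >= cutoff."""
    best = (None, float("-inf"))
    for cutoff in sorted(set(scores)):
        profit = sum(
            -LOSS_PER_BAD if bad else PROFIT_PER_GOOD
            for s, bad in zip(scores, is_bad)
            if s >= cutoff
        )
        if profit > best[1]:
            best = (cutoff, profit)
    return best

# Hypothetical scored cases with known outcomes (1 = bad).
scores = [500, 520, 540, 560, 580, 600, 620, 640]
is_bad = [1, 1, 0, 1, 0, 0, 0, 0]
cutoff, profit = best_cutoff(scores, is_bad)
print(f"Best cut-off: {cutoff}, profit: {profit:.0f}")
```

Because one bad costs as much as five goods earn here, the optimal cut-off sits above the last bad case even though that means rejecting some goods.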

Slide 42

Slide 42 text

Nonparametric Methods

Slide 43

Slide 43 text

K-Nearest Neighbours

Slide 44

Slide 44 text

Support Vector Machines

Slide 45

Slide 45 text

Boosted Trees

Slide 46

Slide 46 text

Do they work any better?
• PAKDD 2010 Credit Risk Modelling Competition: won by Grzegorz Haranczyk, StatSoft
  – Combination of boosted trees and optimization: “Predictive model optimization by searching parameter space”
• International Computational Intelligence Algorithm Competition (CIAC), co-organized by NeuroTech S.A.
  – Knut Opdal, StatSoft: boosted trees, neural networks
  – “Benchmarking of different classes of models used for credit scoring”
• CRC 2013: Thomas Hill, Vladimir Rastunkov, Knut Opdal, “General Approximators for Credit Scoring – Practical Considerations”, http://www.business-school.ed.ac.uk/waf/crc_archive/2013/27.pdf
• Practical implementations of general approximators DO also exist.

Slide 47

Slide 47 text

Challengers… how companies are evaluated today
[Figure: quadrant chart with axes Vision (Partial to Complete) and Ability to execute (Low to High); quadrants: Leaders, Challengers, Visionaries, Home Players; annotations: Technology masters, Ideas and Inventions]

Slide 48

Slide 48 text

Dell has a comprehensive, agnostic, and modular portfolio
• Services: IT/business alignment, infrastructure readiness, analytics maturity, performance measurements
• Layers: infrastructure, management, integration, BI and data discovery, advanced analytics
• Data platforms: Oracle, SQL Server, Hadoop, MongoDB, IBM DB2, and more
• Storage, servers, networking
• Products: TOAD, Hadoop, Statistica, Toad BI, Kitenga (Big Data, in/out), Boomi, SharePlex

Slide 49

Slide 49 text

Model and Decision Lifecycle – Time Flies
• Data Preparation
• Model Building
• Model Evaluation & Comparison
• Champion Selection
• Model Monitoring
• Model Management
• Rules Management
• Conditional Scoring

Slide 50

Slide 50 text

Case Study: Danske Bank – Validated Modeling
Challenges
• Danske Bank made the strategic decision to overhaul and upgrade its existing (SAS-based) risk- and credit-scoring solution platform and infrastructure.
• Requirements included efficient (risk) modeling and model lifecycle management, efficient model deployment from development through production environments, and fast batch and real-time scoring of customer records.
• The platform needed to support Danske’s existing IT assets and be deployable flexibly and on demand in a virtualized hardware environment.
Solutions
• Statistica offered the technology that was easiest to integrate (and perhaps the only one that could be integrated) with their existing tools.
• The Statistica Decisioning Platform solution for Banks (deployed at Danske) was capable of the highest-speed scoring of all Danske Bank accounts in a parallel, distributed virtual environment.
• Statistica offered the quickest model lifecycle management, measured in weeks (vs. months in SAS and IBM) for the development and deployment of a risk model.
Result
• After the highly successful deployment of this mission-critical risk management system, Danske Bank is in the process of moving all its analytics needs (e.g., Basel II compliance reporting) from the SAS legacy system to Statistica over the next 2 years.
• Danske is expanding its use of the Statistica platform, including a current project on an automated property valuation solution for over 1 million properties in Denmark.
Company Profile
• Founded: 1871
• Employees: 20,184
• Headquarters: Denmark
• Revenue: $6.4 Billion

Slide 52

Slide 52 text

Case Study: Premiership League – CRM Propensity Modeling
Challenges
• Need to understand supporters’ purchasing behaviour
• Insights to drive marketing campaigns
• Budget constraints
Solutions
• Propensity modelling based on historical supporter activity and external (demographic) data
• Segmentation
• Propensity model built by a StatSoft consultant
Result
• Accepted by the client and tested in marketing campaigns
• Initial project for Arsenal FC
• Follow-up analysis for 12 Premiership football clubs
• Sports Alliance developed a statistical product add-on for CRM
Company Profile
• Founded: 2010
• Employees: undisclosed
• Headquarters: London
• SME

Slide 53

Slide 53 text

Case Study: Grupo Nelson Paschoalotto (Brazil) – Propensity Modeling
Challenges
• Credit recovery success rate
• Optimal resource usage
Solutions
• Scoring the propensity of responding positively to various credit recovery methods
Result
• Solution still in development
Company Profile
• Founded: 1996
• Employees: 10,000
• Headquarters: Brazil
• Revenue: Large
• Main business: Credit recovery

Slide 55

Slide 55 text

Propaganda
• Yes, we can confirm that Dell is hiring for Data Scientist roles in London… in the area of risk and compliance for banking and financial services.
Lunch & Learn Case Studies, in London, by Dell Software: www.statsoft.co.uk/event/nextlunch
Upcoming event: 30th of September, London, EC2M 1JH, from 12:45
“Improving Patient Care with Predictive Analytics”: how a Macular Unit is using advanced analytics to monitor and analyze AMD (Age-Related Macular Degeneration) data in clinical practice and clinical audits.
To book your place: http://www.eventbrite.com/e/lunch-learn-case-study-improving-patient-care-with-predictive-analytics-tickets-12855492123

Slide 56

Slide 56 text

Thank You
Jurek_Gurycz@dell.com
Dell Information Management Group, Advanced Analytics, Statistica