Scoring Models and Weight of Evidence (WoE)

Scoring Modelling, WoE (Weight of Evidence)
Jurek Gurycz, PhD, Senior Solutions Consultant, Dell
Nuno Cruz Antonio, Manager and Data Scientist, Dell
Data Science London Meetup, 18.09.2014

2 Agenda
1. Brief Overview
2. (Credit) Scoring
3. Financial Process
4. Statistical Methods
5. New Methods
6. Case Studies
7. Questions and Answers

3 Scoring

4 Problem statement
• Good guys & bad guys – Default/Non-Default, Fraud/Non-Fraud, Dead/Alive, Response/No-Response
• Decision to make
• Uncertain outcome
• Something to lose, something to gain, and something to optimize
• Data driven, historical behaviour available
• Time, change, stability, adjustments
• Model using scores assigned to cases

5 What do you use Scorecards for?
• Credit Scoring – Are applicants a good credit risk or a bad credit risk?
• Insurance – Is a claim fraudulent?
• Marketing – Is a contact likely to produce a sale?
• Medicine – Is a patient likely to be readmitted to the hospital?
• Churn Analysis – Will a customer leave or stay?

6 History of (consumer) credit scoring
Timeline labels on the slide: 1941, 1940s, 1950s, 1960s, 1963, 1974, 1980, 1980s, 1990s, 2004, 2010s
• Retail Credit Company (1899) changes name to Equifax, 1975
• Research: linear discrimination applied to credit
• Rules for Finance Houses
• B. Fair, E. Isaac consultancy
• Credit Cards (UK): 62 Diners Club, 63 Amex, 66 Barclaycard
• Computing power. A: adoption of empirically based brute force instead of rules. R: default rates drop
• US Equal Credit Opportunity Act(s): “no discrimination on sex, religion, race” “unless empirically derived and statistically valid”
• 1980: Logistic regression applied to credit scoring
• Scoring introduced to consumer lending
• 1980s–90s: Scoring models applied to other industries
• 2004: UK card expenditure exceeds cash
• Trees, Neural Nets, Nonlinear
• 2010s: Big Data, Social Data, Boosting, Reason Scores, Uplift Modelling
• Next Challenge

7 Big Data Projects (study of 600) – who uses big data?
Source: Operationalizing the Buzz Report 2013, Enterprise Management Associates, Shawn Rogers
Full report: http://www.9sight.com/BigData_2013_Survey.pdf

8 Four typical decision processes that benefit from predictive analytics.
Financial Services Process with Scoring Components

9 Model and Decision Lifecycle
Data Preparation; Model Building; Model Evaluation & Comparison; Champion Selection; Model Monitoring; Model Management; Rules Management; Conditional Scoring

10 Tools, Tools, Tools are needed to build and evaluate models … don’t forget the engine
Data Manipulation; Management & Operationalization; Visualization; Advanced analytics; Queries, Data Management; Toolboxes; Libraries; Ideas and Algorithms; Statistics!

11 Why do we do that?
• Fame?
• Fun?
• Save the world?
• … Fortune
• Customer Insight
• Business Insight

12 Why do we do that?

13 Building predictive (score) models
Table columns: Functionality, Description, Tool, Purpose/problem
• Sophisticated data mining tool with modern algorithms
• Predictive modeling using text and data mining, for analytical purposes and for building statistical & policy models (credit, collection, insurance, churn etc.)
Integration; Administration of reports, model flows, users, dashboards; Data and model management; Scoring (calculation); Deploy score models; Monitoring; Reporting
• Complete model flows managed by Risk Management resources (without IT resources)
• Administer model flows through environments: development, test, acceptance and production
• Real time (1 by 1)
• Near real time (small batches)
• Batch (Big Data)
• Automatic monitoring of processes and critical parameters at the same time
• Basel reports, WoE, capital allocation, Pillar II & III
• Scorecard monitoring reports
• Portfolio reporting (delinquency, roll rate etc.)
• Webservice/SOAP
• OLEDB to databases
• Access control (different users, different access)
• Version control (roll back, compare)
• Change log
• Automatic model and report documentation
• Manage complete workflows (incl. db connections, data preparation, policy rules, scoring)
• Off-load tasks to the Enterprise server
Scoring Server; Model Builder; Monitoring and alerting Server; Model Management; Decisioning Platform; Reporter
Credit risk; Fraud detection; Asset & Liability Man.; Compliance; Market Risk; Fraud risk; Customer Acquisition; Customer retention; Role-based security; Operational risk

14 WoE

15 Weight of Evidence (WOE)
• Weight of Evidence (WOE) is a method of grouping the data in a continuous predictor variable (like vehicle age) in a way that maximizes that variable’s ability to distinguish between the two possibilities, e.g. fraud or not fraud, in the target variable.
• It allows us to create a fairly precise model (with large lift) that simultaneously provides simple rules that can be easily communicated to employees, managers, customers, and regulators.

16
• The overall idea is to divide the data into subgroups so that the difference between the two classes of the response variable is as large as possible across groups.
• But how do we measure, as an example, the “difference in fraud” among groups? We use a measure called “Information Value” (IV), which is based on an intermediate calculation called the “Weight of Evidence” or “WoE”.

17
• There are different ways of doing this binning. We can select the model that is most appropriate for the current needs. More complex models (finer binnings) will generally have a larger Information Value (IV), and a larger Information Value roughly corresponds to more lift.
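The relationship between binning complexity and IV can be illustrated with a small sketch. The counts below are hypothetical and the `information_value` helper is not taken from the slides; it simply sums (DistrGoods - DistrBads) * ln(DistrGoods/DistrBads) over the bins, then compares a four-bin grouping with the two-bin grouping obtained by merging adjacent bins.

```python
import math

def information_value(bins):
    """bins: list of (goods, bads) counts per bin, all counts non-zero."""
    total_goods = sum(g for g, _ in bins)
    total_bads = sum(b for _, b in bins)
    return sum(
        (g / total_goods - b / total_bads)
        * math.log((g / total_goods) / (b / total_bads))
        for g, b in bins
    )

# Hypothetical counts, for illustration only.
fine = [(100, 300), (200, 200), (400, 100), (500, 50)]
# The same data with adjacent bins merged pairwise.
coarse = [(100 + 200, 300 + 200), (400 + 500, 100 + 50)]

iv_fine = information_value(fine)
iv_coarse = information_value(coarse)
print(f"IV with 4 bins: {iv_fine:.3f}, IV with 2 bins: {iv_coarse:.3f}")
```

Merging bins cannot increase IV, so the finer grouping scores at least as high here; whether the extra bins are worth the added complexity is a separate judgment.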

18 Why calculate multiple models of varying complexity? Why not always use the model with the largest Information Value (IV)?
• Sometimes it is better to trade accuracy for simplicity.
• Or maybe some models violate rules against using certain criteria, e.g. gender or customer age, to price a product.
• Possible considerations are:
– The need for rules that can be easily explained to employees, managers, customers, and regulators.
– Legal requirements to treat genders and ethnicities equally.
– The list of possibilities is endless.
• A more complex rule, with a bigger Information Value (IV), may violate some of these requirements. And so a simpler rule may be needed instead.

19 The theory of information
But where does all this come from?
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities.”
Shannon, C. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656.

20 Information Theory can be defined as the branch of mathematics that formalizes the study of “how to move a message from A to B”.
$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$
H(X) (Entropy) provides a measure of the "quantity of information" provided by the message X.
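As a rough illustration of the entropy measure, the sketch below computes H(X) for a few discrete distributions. The function name `shannon_entropy` and the choice of base 2 (bits) are assumptions made for the example; the slide does not specify a logarithm base.

```python
import math

def shannon_entropy(probabilities, base=2.0):
    """Return H(X) = -sum(p_i * log(p_i)) for a discrete distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p_i == 0 are skipped: p * log(p) tends to 0 as p -> 0.
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

fair_coin = shannon_entropy([0.5, 0.5])    # maximal uncertainty for two outcomes
biased_coin = shannon_entropy([0.9, 0.1])  # more predictable, so less information
certain = shannon_entropy([1.0])           # a known outcome carries no information
print(fair_coin, biased_coin, certain)
```

The more evenly spread the probabilities, the higher the entropy; a message whose content is already certain has entropy zero.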

21 A very common use of this concept is the Shannon index, widely used in ecology. It quantifies the diversity of species (or of another taxonomic unit) in a given study area.

22 Information value of x for measuring y: quantifying the predictive power of x in explaining y.
Example: If y is binary (0,1), we can sort the data by x and divide the population into 10 equal parts (deciles).
$IV = \sum_{i=1}^{10} (b_i - g_i) \ln\left(\frac{b_i}{g_i}\right)$
where $g_i$ and $b_i$ are the shares of all goods and of all bads that fall into decile $i$.
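A minimal sketch of the decile procedure follows. The function name, the coding of y (1 = bad, 0 = good), the synthetic data, and the decision to skip bins with no goods or no bads are all assumptions made for illustration, not something the slides prescribe.

```python
import math

def information_value_by_deciles(x, y, n_bins=10):
    """IV of x for a binary y (1 = bad, 0 = good) using equal-sized bins."""
    pairs = sorted(zip(x, y))  # sort the cases by x
    n = len(pairs)
    total_bad = sum(label for _, label in pairs)
    total_good = n - total_bad
    if total_bad == 0 or total_good == 0:
        raise ValueError("y must contain both classes")
    iv = 0.0
    for k in range(n_bins):
        chunk = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        bad = sum(label for _, label in chunk)
        good = len(chunk) - bad
        if bad == 0 or good == 0:
            continue  # assumption: skip bins where the logarithm is undefined
        b, g = bad / total_bad, good / total_good
        iv += (b - g) * math.log(b / g)
    return iv

# Synthetic data: the bad rate falls as x grows, so x is informative.
x = list(range(100))
y_informative = [1 if (i % 10) < (10 - i // 10) // 2 else 0 for i in x]
# Here every decile has the same mix of goods and bads, so x says nothing.
y_uninformative = [i % 2 for i in x]

iv_informative = information_value_by_deciles(x, y_informative)
iv_uninformative = information_value_by_deciles(x, y_uninformative)
print(iv_informative, iv_uninformative)
```

Skipping empty bins is the simplest workaround; in practice such bins are often merged with a neighbour instead.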

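23 Weight of Evidence
The Weight of Evidence or WoE value is a widely used measure of the “strength” of a grouping for separating good (G) and bad (B) risk (default).
For group $i$ of $n$, with $G_i$ goods and $B_i$ bads:
$\mathrm{DistrGoods}_i = \frac{G_i}{\sum_{j=1}^{n} G_j}$
$\mathrm{DistrBads}_i = \frac{B_i}{\sum_{j=1}^{n} B_j}$
$\mathrm{WoE}_i = 100 \cdot \ln\left(\frac{\mathrm{DistrGoods}_i}{\mathrm{DistrBads}_i}\right)$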

24
• The response variable is binary (good/bad, fraud/no fraud, …).
• Consider a simple example where one of our explanatory variables is divided into 2 groups.
• Define the Distribution of Goods (DistrGoods) for a group to be the ratio of goods in that group to all goods in all groups. In our fraud example, good = no fraud.
• Similarly, the Distribution of Bads (DistrBads) for a group is the ratio of bads in that group to all bads in all groups. In our fraud example, bad = fraud.
• Then WOE for each group = 100 * ln(DistrGoods/DistrBads).
Let’s make this more operational…

25 IV and WoE
• Let’s see how this works with a simple “Monotone” model.
• Total goods = 199 + 1857 = 2056
• Total bads = 525 + 151 = 676
• Group 1:
• DistrGoods = 199/2056 = 0.0968 and DistrBads = 525/676 = 0.7766
• WOE = 100 * ln(0.0968/0.7766) = -208.23 (rounding difference)
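The Group 1 arithmetic can be reproduced in a few lines, and the same formula gives the value for the second group. The counts come from the totals on the slide; the label "Group 2" and the helper name `woe` are assumptions for the example. Because the code uses unrounded ratios, the last digit can differ slightly from the hand calculation.

```python
import math

# Counts from the two-group "Monotone" example.
groups = {
    "Group 1": {"goods": 199, "bads": 525},
    "Group 2": {"goods": 1857, "bads": 151},
}
total_goods = sum(c["goods"] for c in groups.values())  # 2056
total_bads = sum(c["bads"] for c in groups.values())    # 676

def woe(goods, bads):
    """WOE = 100 * ln(DistrGoods / DistrBads) for one group."""
    distr_goods = goods / total_goods
    distr_bads = bads / total_bads
    return 100 * math.log(distr_goods / distr_bads)

woe_by_group = {name: woe(**counts) for name, counts in groups.items()}
for name, value in woe_by_group.items():
    print(f"{name}: WOE = {value:.2f}")
```

Group 1 holds most of the bads and gets a strongly negative WOE, while the other group gets a positive one.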

26
• We calculate each group’s contribution to IV like this:
IV = (DistrGoods – DistrBads) * ln(DistrGoods/DistrBads)
IV = (DistrGoods – DistrBads) * WOE/100
The IV of the variable is the sum of these contributions over all groups.
• Why?
– We want groups that distinguish between good and bad, e.g. fraud and not fraud.
– So for each group, we want that group’s proportion of the total goods to be different from its proportion of the total bads. The bigger that difference is, the better.
– It’s all about the “amount of information”…
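Applied to the two-group example from the previous slide, the IV calculation might look like the sketch below. The per-group terms are summed to get the IV of the variable; the helper name `iv_contribution` is an assumption for the example.

```python
import math

# (goods, bads) per group, taken from the two-group example.
counts = [(199, 525), (1857, 151)]
total_goods = sum(g for g, _ in counts)
total_bads = sum(b for _, b in counts)

def iv_contribution(goods, bads):
    """(DistrGoods - DistrBads) * ln(DistrGoods / DistrBads) for one group."""
    distr_goods = goods / total_goods
    distr_bads = bads / total_bads
    return (distr_goods - distr_bads) * math.log(distr_goods / distr_bads)

contributions = [iv_contribution(g, b) for g, b in counts]
information_value = sum(contributions)
print([round(c, 4) for c in contributions], round(information_value, 4))
```

Each contribution is non-negative because the difference and the logarithm always share the same sign, so it does not matter which of the two distributions is larger.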

27
• Groups with equal proportions of the total goods (DistrGoods) and total bads (DistrBads) do not tell us anything.
• Recall, WOE = 100 * ln(DistrGoods/DistrBads).
• So DistrGoods = DistrBads => WOE = 100 * ln(1) = 0.
• So WOE = 0 means membership in that group tells us nothing. Conversely, the farther WOE is from zero, the more membership in that group tells us.
• But remember, we do not care if DistrGoods is bigger than DistrBads or vice versa. We only care that they are different.

28
• WOE = 100 * ln(DistrGoods/DistrBads).
• If DistrGoods = DistrBads, the ratio is = 1 => ln() = 0 => WOE = 0.
• If DistrGoods > DistrBads, the ratio is > 1 => ln() > 0 => WOE > 0.
• If DistrGoods < DistrBads, the ratio is < 1 => ln() < 0 => WOE < 0.

29 Please show me some graphs…

30 Starting point
[Figure: histogram of Months Acct, categorized by Credit Standing (Good, Bad); x-axis: Months Acct, y-axis: No of obs]

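31 $\mathrm{DistrGoods}_i = \frac{G_i}{\sum_{j=1}^{n} G_j}$, $\mathrm{DistrBads}_i = \frac{B_i}{\sum_{j=1}^{n} B_j}$, $\mathrm{WoE}_i = 100 \cdot \ln\left(\frac{\mathrm{DistrGoods}_i}{\mathrm{DistrBads}_i}\right)$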

32 Variable categorized by WoE
[Figure: WoE trend for variable Months Acct over the bins (-inf,10>, (10,13>, (13,19>, (19,25>, (25,37>, (40,inf); y-axis: WoE (weight of evidence)]

33 The process leads into a scorecard
Dictionary
• Characteristics
• Attributes

34 Why use the Weight of Evidence?
• Converts a measure of risk into a linear scale => easier for humans to interpret.
• “Bucketing” the variables makes them easier to implement on a scorecard.
• Makes the variables comparable because they are all coded to the same scale.
• Very useful for binning.
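To make the "coded to the same scale" point concrete, here is a minimal sketch of WoE coding: each raw value is replaced by the WoE of the bin it falls into. The bin edges loosely follow the Months Acct chart, but the WoE values are hypothetical placeholders, and treating everything above 37 as the last bin is an assumption.

```python
import bisect

# Right-closed upper edges of the bins (-inf,10], (10,13], (13,19], (19,25], (25,37].
upper_edges = [10, 13, 19, 25, 37]
# Hypothetical WoE values: one per bin above, plus one for the open-ended last bin.
woe_values = [-110.0, -60.0, -10.0, 30.0, 70.0, 120.0]

def woe_encode(months_acct):
    """Return the WoE of the bin that contains months_acct."""
    # bisect_left keeps a value equal to an edge in the lower (right-closed) bin.
    index = bisect.bisect_left(upper_edges, months_acct)
    return woe_values[index]

raw = [4, 10, 11, 24, 50]
encoded = [woe_encode(v) for v in raw]
print(encoded)
```

After this step every characteristic, whatever its original units, is expressed on the same WoE scale.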

35 What are the challenges?
• Finding the right binning
• Build a Good Model
• Monitor Performance
• Keep Compliance
• Integrate Policy Rules
• Time, Time, Time
– Model Development
– Implementation
– LOB enablement, real time scoring
– Model stability monitoring

37 Model Evaluation

38 Scorecard Building Process Overview
• Data Preparation
– Feature Selection
– Interactions and Rules
– Attribute Building
• Modeling
– Scorecard Preparation
– Survival
– Reject Inference
• Evaluation and Calibration
• Model Evaluation
• Cut-off Point Selection
• Score Cases
• Calibration Tests
• Monitoring
• Population Stability
• Deployment

39 Example Scorecard

40 AR and KGB: Accept/Reject and Known Good/Bad
Through-the-Door: 10,000
– Rejects: 1,850
– Accepts: 8,150
  – Goods: 7,167
  – Bads: 983
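The population counts on this slide can be checked with a little arithmetic. The derived rates below (acceptance rate, bad rate among accepts, good/bad odds) are not stated on the slide; they are simply computed from its figures as an illustration.

```python
through_the_door = 10_000
rejects = 1_850
accepts = 8_150
goods = 7_167
bads = 983

# The counts are internally consistent.
assert accepts + rejects == through_the_door
assert goods + bads == accepts

acceptance_rate = accepts / through_the_door  # share of applicants accepted
bad_rate = bads / accepts                     # only observable among accepts
good_bad_odds = goods / bads

print(f"acceptance rate: {acceptance_rate:.1%}")
print(f"bad rate among accepts: {bad_rate:.1%}")
print(f"good/bad odds: {good_bad_odds:.2f}")
```

Good/bad outcomes are known only for accepted applicants, which is why the rejects need separate treatment such as reject inference.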

41 AR and KGB modelling – example project

42 Cut-off Point Selection
• Use profit to determine a cut-off point.
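One way to read "use profit" is to try each candidate cut-off and keep the one with the highest total profit. The sketch below assumes a fixed gain per accepted good and a fixed loss per accepted bad; the scores, the amounts, and the function name `best_cutoff` are hypothetical.

```python
def best_cutoff(scored_cases, gain_per_good, loss_per_bad):
    """scored_cases: list of (score, is_bad); a case is accepted if score >= cutoff."""
    best = (None, 0.0)  # accepting nobody yields zero profit
    for cutoff in sorted({score for score, _ in scored_cases}):
        profit = sum(
            -loss_per_bad if is_bad else gain_per_good
            for score, is_bad in scored_cases
            if score >= cutoff
        )
        if profit > best[1]:
            best = (cutoff, profit)
    return best

# Hypothetical scored sample: higher scores are meant to indicate lower risk.
cases = [
    (300, True), (350, True), (400, False), (450, True),
    (500, False), (550, False), (600, False),
]
cutoff, profit = best_cutoff(cases, gain_per_good=100, loss_per_bad=500)
print(cutoff, profit)
```

Because one bad costs far more than one good earns in this setup, the chosen cut-off sits above the highest-scoring bad case.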

43 NonParametric Methods

44 K Nearest neighbours

45 Support Vector Machines

46 Boosted Trees

47 Do they work any better?
• PAKDD 2010 Credit Risk Modelling Competition (win): Grzegorz Haranczyk, StatSoft – combination of Boosted Trees and optimization, “Predictive model optimization by searching parameter space”
• International Computational Intelligence Algorithm Competition (CIAC), co-organized by NeuroTech S.A.: Knut Opdal, StatSoft – Boosting Trees, NNs, “Benchmarking of different classes of models used for credit scoring”
• CRC 2013: Thomas Hill, Vladimir Rastunkov, Knut Opdal, “General Approximators for Credit Scoring – Practical Considerations” http://www.business-school.ed.ac.uk/waf/crc_archive/2013/27.pdf
• Practical implementations of general approximators DO also exist

48 Challengers … How companies are evaluated today
[Figure: quadrant chart; axes: Vision (Partial to Complete) and Ability to execute (Low to High); quadrants: Challengers, Leaders, Home Players, Visionaries; annotations: Technology masters, Ideas and Inventions]

49 Dell has a comprehensive, agnostic, and modular portfolio
Services: IT/business alignment, infrastructure readiness, analytics maturity, performance measurements
Infrastructure; Management; Integration; BI and data discovery; Advanced analytics
Data platforms: Oracle, SQL Server, Hadoop, MongoDB, IBM DB2, and more
Storage, servers, networking
TOAD, Hadoop; Statistica; Toad BI; Kitenga (Big Data, in/out); Boomi, SharePlex

50 Model and Decision Lifecycle – Time Flies
Data Preparation; Model Building; Model Evaluation & Comparison; Champion Selection; Model Monitoring; Model Management; Rules Management; Conditional Scoring

51 Case Study: Danske Bank, Validated Modeling
(Slide columns: Challenges, Solutions, Result)
• Danske Bank made the strategic decision to overhaul and upgrade their existing (SAS-based) risk- and credit-scoring solution platform and infrastructure
• Requirements included efficient (risk) modeling and model lifecycle management, efficient model deployment from development through production environments, and fast batch and real-time scoring of customer records
• The platform needed to support Danske’s existing IT assets and be deployable flexibly and on-demand in a virtualized hardware environment
• Statistica offered technology that was easiest to integrate (and perhaps the only one that could be integrated) using their existing tools
• The Statistica Decisioning Platform solution for Banks (deployed at Danske) was capable of the highest speed scoring in a parallel distributed virtual environment of all Danske Bank accounts
• Statistica offered the quickest model lifecycle management, measuring weeks (vs. months in SAS and IBM) for the development and deployment of a risk model
• After the highly successful deployment of this mission-critical risk management system, Danske Bank is in the process of moving all its analytics needs (e.g., Basel II compliance reporting) from the SAS legacy system to Statistica over the next 2 years
• Danske is expanding the use of the Statistica platform and is currently engaged in a project with Danske Bank for an automated property valuation solution for over 1 million properties in Denmark
Company Profile
• Founded: 1871
• Employees: 20,184
• Headquarters: Denmark
• Revenue: $6.4 Billion

53 Case Study: Premiership League, CRM Propensity Modeling
(Slide columns: Challenges, Solutions, Result)
• Need to understand supporters’ purchasing behaviour
• Insights to drive marketing campaigns
• Budget constraints
• Propensity modelling based on historical supporter activity and external data (demographic)
• Segmentation
• Propensity model built by StatSoft consultant
• Accepted by client and tested in marketing campaigns
• Initial project for Arsenal FC
• Follow-up analysis for 12 premiership football clubs
• Sports Alliance developed a statistical product add-on for CRM
Company Profile
• Founded: 2010
• Employees: undisclosed
• Headquarters: London
• SME

54 Case Study: Grupo Nelson Paschoalotto (Brazil), Propensity Modeling
(Slide columns: Challenges, Solutions, Result)
• Credit Recovery Success Rate
• Optimal Resource Usage
• Scoring propensity of responding positively to various credit recovery methods
• Solution still in development
Company Profile
• Founded: 1996
• Employees: 10,000
• Headquarters: Brazil
• Revenue: Large
• Main Business: Credit Recovery

References
L. Thomas, D. Edelman, J. Crook – Credit Scoring and Its Applications, SIAM, 2002
N. Siddiqi – Credit Risk Scorecards, John Wiley, 2006
Statistica Scorecard Formula Guide http://documentation.statsoft.com/portals/0/formula%20guide/STATISTICA%20Scorecard%20Formula%20Guide.pdf

propaganda
• Yes, we can confirm that Dell is hiring for Data Scientist roles in London, in the area of risk and compliance for banking and financial services
Lunch & Learn Case Studies, in London, by Dell Software: www.statsoft.co.uk/event/nextlunch
Upcoming Event: 30th of September, London, EC2M 1JH, from 12:45
“Improving Patient Care with Predictive Analytics”: how a Macular Unit is using advanced analytics to monitor and analyze AMD (Age-Related Macular Degeneration) data in clinical practice and clinical audits
To book your place: http://www.eventbrite.com/e/lunch-learn-case-study-improving-patient-care-with-predictive-analytics-tickets-12855492123

Thank You
Jurek_Gurycz@dell.com [email protected]
Dell Information Management Group, Advanced Analytics, Statistica

Questions and Answers
Jurek_Gurycz@dell.com [email protected]
