
Scoring Models and Weight of Evidence (WoE)

Talk by Jurek Gurycz, Sr. Solutions Consultant, Dell Analytics, Statistica, and Nuno Antonio, Data Scientist at Dell EMEA, at the Data Science London meetup (@ds_ldn)

Data Science London

September 23, 2014

Transcript

  1. Scoring Modelling WoE (Weight of Evidence)

    Jurek Gurycz, PhD, Senior Solutions Consultant, Dell
    Nuno Cruz Antonio, Manager and Data Scientist, Dell
    Data Science London Meetup, 18.09.2014
  2. Agenda

    1. Brief Overview
    2. (Credit) Scoring
    3. Financial Process
    4. Statistical Methods
    5. New Methods
    6. Case Studies
    7. Questions and Answers
  3. Problem statement

    • Good guys & Bad guys: Default/Non-Default, Fraud/Non-Fraud, Dead/Alive, Response/No-Response
    • Decision to make
    • Uncertain outcome
    • Something to lose, something to gain, and something to optimize
    • Data driven, historical behaviour available
    • Time, change, stability, adjustments
    • Model using scores assigned to cases
  4. What do you use Scorecards for?

    • Credit Scoring: Are applicants a good credit risk or a bad credit risk?
    • Insurance: Is a claim fraudulent?
    • Marketing: Is a contact likely to produce a sale?
    • Medicine: Is a patient likely to be readmitted to the hospital?
    • Churn Analysis: Will a customer leave or stay?
  5. History of (consumer) credit scoring

    [Timeline figure. Axis dates: 1941, 1940s, 1950s, 1960s, 1963, 1974, 1980, 1980s, 1990s, 2004, 2010s. Labels recovered from the slide, in extraction order:]
    • Research: linear discrimination applied to credit
    • Rules for Finance Houses
    • Credit cards (UK): '62 Diners Club, '63 Amex, '66 Barclaycard
    • B. Fair, E. Isaac consultancy
    • Computing power. A: adoption of empirically based brute force instead of rules. R: default rates drop
    • US Equal Credit Opportunity Act(s): "no discrimination on sex, religion, race" "unless empirically derived and statistically valid"
    • 1980: Logistic regression applied to credit scoring
    • Scoring introduced to consumer lending
    • 2004: UK card expenditure exceeds cash
    • 1980s-90s: Scoring models applied to other industries
    • 2010s: Big Data, Social Data, Boosting, Reason Scores, Uplift Modelling
    • Trees, Neural Nets, Nonlinear
    • Retail Credit Company (1899) changes name to Equifax (1975)
    • Next Challenge
  6. Big Data Projects (study of 600): who uses big data?

    Source: Operationalizing the Buzz Report 2013, Enterprise Management Associates, Shawn Rogers. Full report: http://www.9sight.com/BigData_2013_Survey.pdf
  7. Financial Services Process with Scoring Components

    Four typical decision processes that benefit from predictive analytics.
  8. Model and Decision Lifecycle

    • Data Preparation
    • Model Building
    • Model Evaluation & Comparison
    • Champion Selection
    • Model Monitoring
    • Model Management
    • Rules Management
    • Conditional Scoring
  9. Tools, Tools, Tools are needed to build and evaluate models ... don't forget the engine

    • Data Manipulation
    • Management & Operationalization
    • Visualization
    • Advanced analytics
    • Queries, Data Management
    • Toolboxes, Libraries
    • Ideas and Algorithms
    • Statistics!
  10. Why do we do that?

    • Fame?
    • Fun?
    • Save the world?
    • ... Fortune ...
    • Customer Insight ...
    • Business Insight ...
  11. Functionality, Description, Tool, Purpose/problem

    Functionality: Building predictive (score) models • Integration • Administration of reports, model flows, users, dashboards • Data and model management • Scoring (calculation) • Deploy score models • Monitoring • Reporting

    Description:
    • Sophisticated data mining tool with modern algorithms
    • Predictive modeling using text and data mining for analytical purposes, to build statistical and policy models (credit, collection, insurance, churn, etc.)
    • Complete model flows managed by Risk Management resources (without IT resources)
    • Administer model flows through environments: development, test, acceptance and production
    • Real time (1 by 1)
    • Near real time (small batches)
    • Batch (Big Data)
    • Automatic monitoring of processes and critical parameters at the same time
    • Basel reports, WoE, capital allocation, Pillar II & III
    • Scorecard monitoring reports
    • Portfolio reporting (delinquency, roll rate, etc.)
    • Webservice/SOAP
    • OLEDB to databases
    • Access control (different users, different access)
    • Version control (roll back, compare)
    • Change log
    • Automatic model and report documentation
    • Manage complete workflows (including database connections, data preparation, policy rules, scoring)
    • Off-load tasks to the Enterprise server

    Tool: Scoring Server • Model Builder • Monitoring and Alerting Server • Model Management • Decisioning Platform • Reporter

    Purpose/problem: Credit risk • Fraud detection • Asset & liability management • Compliance • Market risk • Fraud risk • Customer acquisition • Customer retention • Role-based security • Operational risk
  12. Weight of Evidence (WOE)

    • Weight of Evidence (WOE) is a method of grouping the data in a continuous predictor variable (like vehicle age) in a way that maximizes that variable's ability to distinguish between the two possibilities, e.g. fraud or not fraud, in the target variable.
    • It allows us to create a fairly precise model (with large lift) that simultaneously provides simple rules that can be easily communicated to employees, managers, customers, and regulators.
  13. Weight of Evidence and Information Value

    • The overall idea is to divide the data into subgroups so that the difference between the two classes of the response variable is maximized across groups.
    • But how do we measure, for example, the "difference in fraud" among groups? We use a measure called "Information Value" (IV), which is based on an intermediate calculation called the "Weight of Evidence" or "WoE".
  14. Choosing a binning

    • There are different ways of doing this binning. We can select the model that is most appropriate for the current needs. More complex models will have a larger Information Value (IV), and a larger Information Value roughly corresponds to more lift.
  15. Why calculate multiple models of varying complexity? Why not always use the model with the largest Information Value (IV)?

    • Sometimes it is better to trade accuracy for simplicity.
    • Or maybe some models violate rules against using certain criteria, e.g. gender or customer age, to price a product.
    • Possible considerations are:
      – The need for rules that can be easily explained to employees, managers, customers, and regulators.
      – Legal requirements to treat genders and ethnicities equally.
      – The list of possibilities is endless.
    • A more complex rule, with a bigger Information Value (IV), may violate some of these requirements, so a simpler rule may be needed instead.
  16. But where does all this come from? The theory of information

    "The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities."

    Shannon, C. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656.
  17. Information Theory can be defined as the branch of mathematics that formalizes the study of "how to move a message from A to B".

    $$H(X) = -\sum_{i=1}^{n} p_i \log p_i$$

    H(X) (entropy) provides a measure of the "quantity of information" provided by the message X, where $p_i$ is the probability of the $i$-th possible value of X.
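As a rough illustration of the entropy formula, the sketch below computes H(X) for a few two-outcome distributions. The function name `shannon_entropy` and the choice of base-2 logarithms (bits) are assumptions made for this example; the slide does not fix a base.

```python
import math

def shannon_entropy(probs, base=2.0):
    """Entropy H(X) = -sum(p_i * log(p_i)) of a discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p = 0 contribute nothing (p * log p tends to 0 as p -> 0).
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin = shannon_entropy([0.5, 0.5])    # most uncertain two-outcome case
biased_coin = shannon_entropy([0.9, 0.1])  # more predictable, lower entropy
certain = shannon_entropy([1.0, 0.0])      # fully predictable outcome
print(fair_coin, biased_coin, certain)
```

The more predictable the message, the lower its entropy; a fully predictable message carries no information.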
  18. The Shannon index

    A very common use of these concepts is the Shannon index, widely used in ecology. It quantifies the diversity of species (or of another taxonomic unit) in a given study area.
  19. Information value of x for measuring y: quantifying the predictive power of x in explaining y.

    Example: if y is binary (0, 1), we can sort the data by x, divide the population into 10 equal parts (deciles), and calculate

    $$IV = \sum_{i=1}^{10} (b_i - g_i)\,\ln\frac{b_i}{g_i}$$

    where $b_i$ and $g_i$ are the shares of all bads and of all goods that fall in decile $i$.
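The decile procedure can be sketched in a few lines of Python. Everything here is illustrative: the function name `information_value`, the synthetic data, and the decision to skip bins that contain no goods or no bads (where the logarithm is undefined) are assumptions, not part of the talk.

```python
import math
import random

def information_value(x, y, n_bins=10):
    """IV of predictor x for a binary target y (1 = bad, 0 = good),
    using equal-sized bins after sorting the cases by x."""
    pairs = sorted(zip(x, y))
    total_bad = sum(label for _, label in pairs)
    total_good = len(pairs) - total_bad
    iv = 0.0
    for k in range(n_bins):
        # Slice boundaries that cut the sorted data into n_bins near-equal parts.
        lo = k * len(pairs) // n_bins
        hi = (k + 1) * len(pairs) // n_bins
        bad = sum(label for _, label in pairs[lo:hi])
        good = (hi - lo) - bad
        if bad == 0 or good == 0:
            continue  # simplification: skip bins where the log ratio is undefined
        b = bad / total_bad    # share of all bads in this bin
        g = good / total_good  # share of all goods in this bin
        iv += (b - g) * math.log(b / g)
    return iv

# Synthetic data (made up): the probability of "bad" falls as x grows.
rng = random.Random(0)
x = [rng.uniform(0, 100) for _ in range(2000)]
y = [1 if rng.random() < 0.6 - 0.005 * xi else 0 for xi in x]
noise = [rng.random() for _ in x]  # a predictor unrelated to y

iv_signal = information_value(x, y)
iv_noise = information_value(noise, y)
print(f"IV of informative predictor: {iv_signal:.3f}")
print(f"IV of noise predictor:       {iv_noise:.3f}")
```

Each bin's term is non-negative, so IV can only grow as bins separate goods from bads; a predictor unrelated to the target should score close to zero.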
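  20. Weight of Evidence

    $$\text{DistrGoods}_i = \frac{G_i}{\sum_{j=1}^{n} G_j}, \qquad \text{DistrBads}_i = \frac{B_i}{\sum_{j=1}^{n} B_j}, \qquad \text{WoE}_i = 100 \cdot \ln\frac{\text{DistrGoods}_i}{\text{DistrBads}_i}$$

    The Weight of Evidence or WoE value is a widely used measure of the "strength" of a grouping for separating good (G) and bad (B) risk (default).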
  21. Let's make this more operational…

    • The response variable is binary (good/bad, fraud/no fraud, …).
    • Consider a simple example where one of our explanatory variables is divided into 2 groups.
    • Define the Distribution of Goods (DistrGoods) for a group to be the ratio of goods in that group to all goods in all groups. In our fraud example, good = no fraud.
    • Similarly, the Distribution of Bads (DistrBads) for a group is the ratio of bads in that group to all bads in all groups. In our fraud example, bad = fraud.
    • Then WOE for each group = 100 * ln(DistrGoods/DistrBads).
  22. IV and WoE

    • Let's see how this works with a simple "Monotone" model.
    • Total goods = 199 + 1857 = 2056
    • Total bads = 525 + 151 = 676
    • Group 1:
      – DistrGoods = 199/2056 = 0.0968 and DistrBads = 525/676 = 0.7766
      – WOE = 100 * ln(0.0968/0.7766) = -208.23 (rounding difference)
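The arithmetic on this slide can be reproduced with a short Python sketch. The counts for Group 2 (1857 goods, 151 bads) are inferred from the totals shown on the slide; the dictionary layout and variable names are assumptions for illustration.

```python
import math

# Counts from the two-group "Monotone" example; Group 2 is inferred from the totals.
groups = {
    "Group 1": {"goods": 199, "bads": 525},
    "Group 2": {"goods": 1857, "bads": 151},
}

total_goods = sum(g["goods"] for g in groups.values())
total_bads = sum(g["bads"] for g in groups.values())

woe = {}
for name, g in groups.items():
    distr_goods = g["goods"] / total_goods  # group's share of all goods
    distr_bads = g["bads"] / total_bads     # group's share of all bads
    woe[name] = 100 * math.log(distr_goods / distr_bads)
    print(f"{name}: DistrGoods={distr_goods:.4f} "
          f"DistrBads={distr_bads:.4f} WOE={woe[name]:.2f}")
```

Working with unrounded ratios gives a Group 1 value that differs from the slide's figure only in the second decimal place, which is the rounding difference the slide mentions. Group 2, which holds most of the goods, comes out strongly positive.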
  23. Calculating IV

    • We calculate IV for each group like this, and sum over the groups:
      IV = (DistrGoods – DistrBads) * ln(DistrGoods/DistrBads)
      IV = (DistrGoods – DistrBads) * WOE/100
    • Why?
      – We want groups that distinguish between good and bad, e.g. fraud and not fraud.
      – So for each group, we want that group's proportion of the total goods to be different from its proportion of the total bads. The bigger that difference is, the better.
      – It's all about the "amount of information"…
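A minimal Python sketch of the IV calculation, applied to the two-group example from the previous slide (199/525 and 1857/151 goods/bads). The helper name `iv_contribution` is hypothetical; the check at the end confirms that the two ways of writing the formula agree.

```python
import math

def iv_contribution(goods, bads, total_goods, total_bads):
    """One group's IV term: (DistrGoods - DistrBads) * ln(DistrGoods/DistrBads)."""
    distr_goods = goods / total_goods
    distr_bads = bads / total_bads
    return (distr_goods - distr_bads) * math.log(distr_goods / distr_bads)

counts = [(199, 525), (1857, 151)]  # (goods, bads) per group
total_goods = sum(g for g, _ in counts)
total_bads = sum(b for _, b in counts)

terms = [iv_contribution(g, b, total_goods, total_bads) for g, b in counts]
iv_total = sum(terms)

# Same quantity via WOE: (DistrGoods - DistrBads) * WOE / 100.
g1, b1 = counts[0]
woe_1 = 100 * math.log((g1 / total_goods) / (b1 / total_bads))
via_woe = (g1 / total_goods - b1 / total_bads) * woe_1 / 100

print(f"IV terms: {[round(t, 4) for t in terms]}  total IV: {iv_total:.4f}")
```

Both factors change sign together, so every group's term is non-negative whichever of DistrGoods and DistrBads is larger.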
  24. What WOE = 0 means

    • Groups with equal proportions of the total goods (DistrGoods) and total bads (DistrBads) do not tell us anything.
    • Recall, WOE = 100 * ln(DistrGoods/DistrBads).
    • So DistrGoods = DistrBads => WOE = 100 * ln(1) = 0.
    • So WOE = 0 means membership in that group tells us nothing. Conversely, the farther WOE is from zero, the more membership in that group tells us.
    • But remember, we do not care if DistrGoods is bigger than DistrBads or vice versa. We only care that they are different.
  25. The sign of WOE

    • WOE = 100 * ln(DistrGoods/DistrBads).
    • If DistrGoods = DistrBads, the ratio is = 1 => ln() = 0 => WOE = 0.
    • If DistrGoods > DistrBads, the ratio is > 1 => ln() > 0 => WOE > 0.
    • If DistrGoods < DistrBads, the ratio is < 1 => ln() < 0 => WOE < 0.
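The three cases can be checked numerically. The shares below are hypothetical values chosen only to hit each case; `woe` is a helper defined for this sketch.

```python
import math

def woe(distr_goods, distr_bads):
    """WOE = 100 * ln(DistrGoods / DistrBads)."""
    return 100 * math.log(distr_goods / distr_bads)

# Hypothetical (DistrGoods, DistrBads) pairs, one per case.
woe_equal = woe(0.30, 0.30)       # equal shares: no information
woe_more_goods = woe(0.50, 0.20)  # group over-represents goods
woe_more_bads = woe(0.20, 0.50)   # group over-represents bads

print(woe_equal, round(woe_more_goods, 2), round(woe_more_bads, 2))
```

Swapping the two shares flips the sign but keeps the magnitude, which is why only the distance from zero matters for how informative a group is.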
  26. Starting point

    [Figure: Histogram of Months Acct, categorized by Credit Standing (Good, Bad). x-axis: Months Acct; y-axis: No of obs.]
  27. Variable categorized by WoE

    [Figure: WoE trend for variable Months Acct. Bins: (-inf,10>, (10,13>, (13,19>, (19,25>, (25,37>, (40,inf). y-axis: WoE (weight of evidence), scale from -120 to 140.]
  28. Why use the Weight of Evidence?

    • Converts a measure of risk into a linear scale => easier for humans to interpret.
    • "Bucketing" the variables makes them easier to implement on a scorecard.
    • Makes the variables comparable because they are all coded to the same scale.
    • Very useful for binning.
  29. What are the challenges?

    • Finding the right binning
    • Build a Good Model
    • Monitor Performance
    • Keep Compliance
    • Integrate Policy Rules
    • Time, Time, Time
      – Model Development
      – Implementation
      – LOB enablement, real time scoring
      – Model stability monitoring
  30. Scorecard Building Process Overview

    • Data Preparation
      – Feature Selection
      – Interactions and Rules
      – Attribute Building
    • Modeling
      – Scorecard Preparation
      – Survival
      – Reject Inference
    • Evaluation and Calibration
    • Model Evaluation
    • Cut-off Point Selection
    • Score Cases
    • Calibration Tests
    • Monitoring
    • Population Stability
    • Deployment
  31. AR and KGB: Accept/Reject, Known Good/Bad

    • Through-the-Door: 10,000
      – Rejects: 1,850
      – Accepts: 8,150
        · Goods: 7,167
        · Bads: 983
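The counts on this slide imply a few standard portfolio ratios, sketched below in Python. The variable names are assumptions; the numbers are the ones shown on the slide.

```python
# Counts from the slide.
through_the_door = 10_000
rejects = 1_850
accepts = 8_150
goods = 7_167
bads = 983

# Sanity checks: the populations partition as the slide implies.
assert rejects + accepts == through_the_door
assert goods + bads == accepts

accept_rate = accepts / through_the_door  # share of applicants accepted
bad_rate = bads / accepts                 # observed only among accepts
good_bad_odds = goods / bads              # goods per bad among accepts

print(f"Accept rate: {accept_rate:.1%}")
print(f"Bad rate among accepts: {bad_rate:.2%}")
print(f"Good:bad odds: {good_bad_odds:.2f} to 1")
```

Good/bad outcomes are known only for the accepted applicants; nothing is observed for the 1,850 rejects, which is the gap reject inference tries to address.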
  32. Do they work any better?

    • PAKDD 2010 Credit Risk Modelling Competition: win by Grzegorz Haranczyk, StatSoft. Combination of Boosted Trees and Optimization, "Predictive model optimization by searching parameter space"
    • International Computational Intelligence Algorithm Competition (CIAC), co-organized by NeuroTech S.A.: Knut Opdal, StatSoft. Boosting Trees, NNs, "Benchmarking of different classes of models used for credit scoring"
    • CRC 2013: Thomas Hill, Vladimir Rastunkov, Knut Opdal, "General Approximators for Credit Scoring – Practical Considerations" http://www.business-school.ed.ac.uk/waf/crc_archive/2013/27.pdf
    • Practical implementations of general approximators DO exist as well.
  33. Challengers … How companies are evaluated today

    [Quadrant figure. Axes: Vision (Partial to Complete) and Ability to execute (Low to High). Labels: Challengers, Technology masters, Leaders, Home Players, Visionaries, Ideas and Inventions.]
  34. Dell has a comprehensive, agnostic, and modular portfolio

    • Services: IT/business alignment, infrastructure readiness, analytics maturity, performance measurements
    • Infrastructure Management
    • Integration
    • BI and data discovery
    • Advanced analytics
    • Data platforms: Oracle, SQL Server, Hadoop, MongoDB, IBM DB2, and more
    • Storage, servers, networking
    • Products shown: TOAD, Hadoop; Statistica; Toad BI; Kitenga (Big Data, in/out); Boomi, SharePlex
  35. Model and Decision Lifecycle – Time Flies

    • Data Preparation
    • Model Building
    • Model Evaluation & Comparison
    • Champion Selection
    • Model Monitoring
    • Model Management
    • Rules Management
    • Conditional Scoring
  36. Case Study: Danske Bank - Validated Modeling

    Challenges
    • Danske Bank made the strategic decision to overhaul and upgrade their existing (SAS-based) risk- and credit-scoring solution platform and infrastructure
    • Requirements included efficient (risk) modeling and model life-cycle management, efficient model deployment from development through production environments, and fast batch and real-time scoring of customer records
    • The platform needed to support Danske's existing IT assets and be deployable flexibly and on demand in a virtualized hardware environment

    Solutions
    • Statistica offered technology that was easiest to integrate (and perhaps the only one that could be integrated) using their existing tools
    • The Statistica Decisioning Platform solution for Banks (deployed at Danske) was capable of the highest-speed scoring of all Danske Bank accounts in a parallel distributed virtual environment
    • Statistica offered the quickest model lifecycle management, measured in weeks (vs. months in SAS and IBM) for the development and deployment of a risk model

    Result
    • After the highly successful deployment of this mission-critical risk management system, Danske Bank is in the process of moving all its analytics needs (e.g. Basel II compliance reporting) from the SAS legacy system to Statistica over the next 2 years
    • Danske is expanding its use of the Statistica platform; a project with Danske Bank for an automated property valuation solution for over 1 million properties in Denmark is currently under way

    Company Profile
    • Founded: 1871
    • Employees: 20,184
    • Headquarters: Denmark
    • Revenue: $6.4 Billion
  38. Case Study: Premiership League - CRM Propensity Modeling

    Challenges
    • Need to understand supporters' purchasing behaviour
    • Insights to drive marketing campaigns
    • Budget constraints

    Solutions
    • Propensity modelling based on historical supporter activity and external (demographic) data
    • Segmentation
    • Propensity model built by StatSoft consultant

    Result
    • Accepted by client and tested in marketing campaigns
    • Initial project for Arsenal FC
    • Follow-up analysis for 12 Premiership football clubs
    • Sports Alliance developed a statistical product add-on for CRM

    Company Profile
    • Founded: 2010
    • Employees: undisclosed
    • Headquarters: London
    • SME
  39. Case Study: Grupo Nelson Paschoalotto (Brazil) - Propensity Modeling

    Challenges
    • Credit Recovery Success Rate
    • Optimal Resource Usage

    Solutions
    • Scoring propensity of responding positively to various credit recovery methods

    Result
    • Solution still in development

    Company Profile
    • Founded: 1996
    • Employees: 10,000
    • Headquarters: Brazil
    • Revenue: Large
    • Main Business: Credit Recovery
  40. References

    • L. Thomas, D. Edelman, J. Crook, Credit Scoring and Its Applications, SIAM, 2002
    • N. Siddiqi, Credit Risk Scorecards, John Wiley, 2006
    • Statistica Scorecard Formula Guide, http://documentation.statsoft.com/portals/0/formula%20guide/STATISTICA%20Scorecard%20Formula%20Guide.pdf
  41. propaganda

    • Yes, we can confirm that Dell is hiring for Data Scientist roles in London, in the area of risk and compliance for banking and financial services.
    • Lunch & Learn Case Studies, in London, by Dell Software: www.statsoft.co.uk/event/nextlunch
    • Upcoming event: 30th of September, London, EC2M 1JH, from 12:45. "Improving Patient Care with Predictive Analytics": how a Macular Unit is using advanced analytics to monitor and analyze AMD (Age-Related Macular Degeneration) data in clinical practice and clinical audits.
    • To book your place: http://www.eventbrite.com/e/lunch-learn-case-study-improving-patient-care-with-predictive-analytics-tickets-12855492123