FICO

Elastic Co
March 11, 2015

This talk was presented at the inaugural Elastic{ON} conference, http://elasticon.com

Session Abstract:

By leveraging Elasticsearch, FICO can use unstructured and semistructured data to significantly improve the performance of its analytics models. Its Analytic Modeler for Text product makes these advanced analytics and visualizations available to end users in an intuitive and engaging way. This talk covers how FICO has integrated advanced descriptive, diagnostic, and predictive analytics with Elasticsearch, and how it has extended Kibana to provide advanced visualizations against the same data.

Presented by Osvaldo Driollet, FICO

Transcript

  1. Behind Your Credit Score – FICO Financial Credit and Risk Analytics System
     Osvaldo Driollet, Ph.D., Principal Scientist, Director
  2. FICO Text and Visual Analytics (CC-BY-ND 4.0)
       o  Some facts about FICO
       o  The unstructured data opportunity
       o  FICO Text and Visual Analytics for Unstructured Data Analysis (ELK stack + FICO Analytics)
       o  Demo
       o  Questions
  3. Today
     FICO analytics
       o  Are used in three out of four US mortgage originations
     FICO is ranked
       o  #7 worldwide in business analytics
       o  #1 in services and analytics applications
     FICO clients include
       o  Most of the top 100 banks in the world
       o  90% of the 100 largest financial institutions in the US
       o  The top 10 US property and casualty insurers
       o  50% of the largest US retailers
       o  70% of the world's top 10 pharmaceutical companies
       o  Nine of the top 10 Fortune 500 companies
     FICO score
       o  The most widely used credit bureau score in the world
       o  More than 100 billion sold
       o  Used by over 70,000 retailers in the USA alone
  4. FICO® Text and Visual Analytics: The Unstructured Data Opportunity
     Increasing data variety and complexity
       o  Internal Customer Data: Purchase History, Customer Profiles, Loyalty Programs, Call Center Data, Customer Communications
       o  External Customer Data: Geographic, Demographic, Consumer Reports
       o  Social Data: User Generated Content, Ratings and Reviews, Forums, Twitter, Facebook, LinkedIn, Google+
     Other diagram labels: Enterprise Data; Big Data; 85% unstructured; FICO® Text and Visual Analytics; Billions of Lost Insights
  5. FICO® Text and Visual Analytics: The Power to Discover
       o  Automates the analysis and exploration of unstructured data
       o  Enables more sophisticated descriptive and predictive models when used with the FICO Solution Stack and the FICO Analytic Cloud
       o  Facilitates actionable intelligence for improved business performance
       o  Extracts valuable insights through embedded Visual Analytics
       o  Discovers hidden patterns and sentiment buried in text with less time and effort
       o  Offers a rich and easy-to-use set of integrated data mining tools
  6. FICO® Text and Visual Analytics: In a Nutshell
     By leveraging Elasticsearch and Kibana, FICO Text and Visual Analytics offers advanced descriptive and predictive analytics embedded in an interactive visualization framework in an intuitive and engaging way.
     Diagram labels, in slide order:
       o  Natural Language Processing
       o  Statistical Analysis
       o  Text Mining
       o  Entity Extraction
       o  Sentiment Analysis
       o  Visual Data Exploration
       o  Feature Selection
       o  Preprocessing and Filtering
       o  Normalization
       o  Information Theory
       o  Information Retrieval
       o  Dedicated Dashboards
       o  Machine Learning for Classification, Clustering, Recommendation, and Topic Modeling
       o  Time Series Analysis
       o  Topic Identification
       o  Automatic Summarization
       o  Relational Concept Mining
       o  Ontology Management
       o  And we are working to add…
       o  FICO Analytic Panels
       o  Unsupervised Sentiment Analysis
  7. Structured and Unstructured Data Flow
     Diagram labels, in slide order:
       o  Filters
       o  Queries
       o  Information Retrieval
       o  <Pass Through>
       o  Structured Data Analysis
       o  Text Analysis
       o  Unstructured Data Analysis
       o  Sentiment Analysis
       o  Entity Extraction
       o  Data
       o  Index (Split)
       o  Aggregations
       o  Hadoop Connector
       o  Dynamic Analyses
       o  Visualizations
       o  Visual Analytics
       o  Feature Extraction
       o  Dynamic Analytics
       o  Feature Extraction
       o  Search
  8. Extending ELK with FICO Analytics: Statistical Text Mining
  9. Extending ELK with FICO Analytics: Relational Concept Mining
  10. Extending ELK with FICO Analytics: Entity Extraction and Sentiment Analysis
  11. FICO® Text and Visual Analytics: Better Data Insights
      Data Access
        •  Preprocessing, Indexing and Metadata support
        •  Multi-format support
        •  Multi-language support
        •  Security Layer
      Analytic Services
        •  Text Mining
        •  Statistical Analysis
        •  Entity Extraction
        •  Sentiment Analysis
      Business Solutions
        •  Fraud and Risk Mitigation
        •  Predictive Modeling
        •  Insurance Assessment
        •  Marketing Campaigns
        •  Customer Experience Management
        •  Social Media Analysis
      Visualization and Discovery
        •  Interactive Data Visualizations
        •  Customizable Dashboards, Queries and Filters
        •  Interactive Insights
  12. Discovering Hidden Insights for Better Business Decisions
        o  Add Social Media insights into business intelligence schemes
        o  Improve Business Assessment
        o  Minimize Risk
        o  Increase Response Rate for Marketing Campaigns
        o  Curb Customer Attrition
        o  Detect Fraud
        o  Create Better Predictive Models
        o  Explore Customer Experience Management (CEM) strategies
        o  Anticipate Resource Demands
  13. Demo of Selected Use Case
        o  Getting Started (application overview)
        o  Core Text Analysis
        o  Insights Discovery
        o  Named Entity Recognition
        o  Sentiment Analysis
        o  Visualization of Results
  14. Data Description
      National Highway Traffic Safety Administration (NHTSA) Database
      http://www.safercar.gov/Vehicle+Owners
      Complaint and defective automotive parts database including car makers, brand names, accident occurrences, injuries, fatalities, drivers’ comments, dealers’ info, etc.
        o  About 1 million complaints from 1980 to 2013
        o  47 data fields
             -  4 categorical indicators (Crash, Fire, Injured, and Fatalities)
             -  2 text fields: Failing Component Description and Complaint Description
        o  For demo purposes, 3 random samples of about 70,000 data records were generated:
             -  NHTSA 1980-2013
             -  NHTSA 1980-1999
             -  NHTSA 2000-2013
      You will see (with a few clicks)
        •  How to use the out-of-the-box Predictive Mining Panel to unleash the power of unstructured data
        •  How to add Interactive Visual Analytics to your data applications
        •  How to customize and profile your data and find the ‘needle in the haystack’
        •  How to use the FICO Solution Stack to transform the discovered insights into more profitable decisions
        •  How easily and quickly Insurance Assessment, Survey Analysis, Brand Reputation, Predictive Modeling, and other applications can be built
  15. Real Data, Real Results
  16. Accuracy: FICO’s Comparative Advantage
      Sentiment Analysis Comparison

      Tool                             SemEval 2013 (*)         NHTSA
                                       (2,218 tweets)           (5,000 records)
      Evaluation criterion             F-Measure on Average     Classification (Negative - Neutral - Positive)
      FICO Text and Visual Analytics   54% (Twitter Model)      97% - 1% - 2% (Product Model)
      Semantria (Lexalytics)           50%                      41% - 53% - 5%
      RapidMiner                       24%                      23% - 10% - 67%
      AYLIEN                           49%                      81% - 1% - 16%
      Textalytics                      51%                      48% - 35% - 17%

      (*) http://www.cs.york.ac.uk/semeval-2013/task2/
      F-Measure: Weighted combination of Precision and Recall
      Precision: Percentage of documents correctly classified over the total found
      Recall: Percentage of documents correctly classified over the actual total
  17. Better decisions through better data insights: The Power to Discover
      Insights from almost any data source
        o  Blogs and Social Media
        o  Online Forums
        o  Review Sites
        o  Email and Messages
        o  Market and Consumer Reviews
        o  Call Center Notes and Transcripts
        o  News and Articles
      Extracting almost any feature
        o  Sentiment and Emotions
        o  Customer Defined Entities
        o  Named Entities
        o  Metadata
        o  Predictive Features
        o  Hidden Patterns
      For virtually any application
        o  Customer Service
        o  Financial Services
        o  Reputation Management
        o  Banking and Insurance
        o  Risk Management
        o  CEM and CRM
      To drive better business decisions
  18. Fraud, Waste and Abuse Application. FICO® Text and Visual Analytics: The Power to Discover
  19. Fraud, Waste and Abuse Application. FICO® Text and Visual Analytics: The Power to Discover
  20. This work is licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/4.0/ or send a letter to: Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
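
Slide 16 defines precision, recall, and the F-measure only in words. The following is a minimal Python sketch of those definitions, not FICO's or SemEval's scoring code. The function names, the `beta` weighting parameter, the plain average over classes, and the toy labels are all assumptions made for illustration; the slide does not say which weighting or averaging scheme was used.

```python
def precision_recall_f(gold, predicted, label, beta=1.0):
    """Precision, recall and F-measure for a single class label.

    gold and predicted are equal-length sequences of class labels.
    beta is an assumed weighting knob: beta=1 weights precision and
    recall equally (the common F1 score).
    """
    pairs = list(zip(gold, predicted))
    correct = sum(1 for g, p in pairs if g == label and p == label)
    found = sum(1 for _, p in pairs if p == label)   # "total found" by the classifier
    actual = sum(1 for g, _ in pairs if g == label)  # "actual total" in the gold data
    precision = correct / found if found else 0.0
    recall = correct / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure


def average_f(gold, predicted, labels, beta=1.0):
    """Unweighted mean of the per-class F-measures (an assumed averaging scheme)."""
    scores = [precision_recall_f(gold, predicted, label, beta)[2] for label in labels]
    return sum(scores) / len(scores)


# Hypothetical toy data: six documents with gold and predicted sentiment.
gold = ["neg", "neg", "neg", "neu", "pos", "pos"]
pred = ["neg", "neg", "pos", "neu", "pos", "neg"]

for label in ("neg", "neu", "pos"):
    p, r, f = precision_recall_f(gold, pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
print(f"average F = {average_f(gold, pred, ['neg', 'neu', 'pos']):.2f}")
```

In the toy data, two of the three documents predicted as "neg" are correct and two of the three actual "neg" documents are found, so precision and recall for that class are both 2/3. Averaging per-class scores keeps a frequent class from dominating the result, which matters for skewed data such as the NHTSA complaints, where most records are classified as negative.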