Machine Learning at Zopa

How we do training and deployment of ML models at Zopa.

techsessions

December 07, 2017

Transcript

  1. Machine Learning at Zopa
    7 December 2017
    Vlasios Vasileiou, Head of Data and Data Science
  2. A pioneering financial services company
    World’s 1st peer-to-peer lending platform in 2004
    £2.8 billion lent to date. Strong annual growth >50%
    246,000 people have taken a Zopa loan
    59,000 actively invest through Zopa
  3. Strong, consistent growth
    [Chart: amount lent and forecast by year, 2005 to 2020; axis marks at £1bn and £2bn]
  4. ML & Advanced Analytics have been a strong supporter of our growth
    ✓ Credit Risk Assessment
    ✓ Onboarding Funnel Simulation
    ✓ Pricing Optimization
    ✓ Fraud Identification
    ✓ Document Forensics
    ✓ Optimized Prospect and Existing Customer Marketing
    ✓ Income and Rent Estimation for Affordability Evaluation
  5. ML & Advanced Analytics have been a strong supporter of our growth
    ✓ Credit Risk Assessment
    ✓ Onboarding Funnel Simulation
    ✓ Pricing Optimization
    ✓ Fraud Identification
    ✓ Document Forensics
    ✓ Optimized Prospect and Existing Customer Marketing
    ✓ Income and Rent Estimation for Affordability Evaluation
  6. ML & Advanced Analytics have been a strong supporter of our growth
    How did we get here? What were our challenges and learnings?
    ✓ Credit Risk Assessment
    ✓ Onboarding Funnel Simulation
    ✓ Pricing Optimization
    ✓ Fraud Identification
    ✓ Document Forensics
    ✓ Optimized Prospect and Existing Customer Marketing
    ✓ Income and Rent Estimation for Affordability Evaluation
  7. Analytical infancy (2004 – 2013)
    • SQL
    • Excel
    • Some Python
    • Externally produced credit models
  8. Analytical renaissance (2014 – 2015)
    2014, £15m investment
    Board recognized need for data-driven growth
    ✓ Creation of Data Science function
    ✓ Investment in Data Analytics
  9. Systematizing Machine Learning at Zopa (2014)
    Wanted to be able to produce ML models that were:
    ▪ Rapidly generated
    ▪ Easily vettable
    ▪ Highly predictive
    ▪ Easily deployable
    Several considerations:
    • Common codebase or personal choice of tools?
    • Buy or build?
    • Which language? Which package?
    (Image created by Alekksall - Freepik.com)
  10. Systematizing Machine Learning at Zopa (2014)
    Wanted to be able to produce ML models that were:
    ▪ Rapidly generated
    ▪ Easily vettable
    ▪ Highly predictive
    ▪ Easily deployable
    Several considerations:
    • Common codebase or personal choice of tools? → Common codebase
    • Buy or build?
    • Which language? Which package?
  11. Systematizing Machine Learning at Zopa (2014)
    Wanted to be able to produce ML models that were:
    ▪ Rapidly generated
    ▪ Easily vettable
    ▪ Highly predictive
    ▪ Easily deployable
    Several considerations:
    • Common codebase or personal choice of tools? → Common codebase
    • Buy or build? → Built in-house
    • Which language? Which package?
  12. Systematizing Machine Learning at Zopa (2014)
    Wanted to be able to produce ML models that were:
    ▪ Rapidly generated
    ▪ Easily vettable
    ▪ Highly predictive
    ▪ Easily deployable
    Several considerations:
    • Common codebase or personal choice of tools? → Common codebase
    • Buy or build? → Built in-house
    • Python
  13. Predictor – Zopa’s ML Toolkit (2014)
    • Streamlined, Automated ML Application
    • Implements all stages of producing an ML model, requiring minimal user input
    • Leverages PyData Ecosystem
    • 9k lines
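The idea of producing a model from raw data plus a small configuration can be sketched with scikit-learn. This is only an illustration under stated assumptions: the deck does not show Predictor's internals, so the config keys (`model`, `model_params`, `impute_strategy`), the `MODEL_REGISTRY` mapping, and the `build_pipeline` helper are all hypothetical, and the data is synthetic.

```python
import json

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical config format; Predictor's real config is not shown in the deck.
CONFIG_JSON = """
{
  "model": "random_forest",
  "model_params": {"n_estimators": 50, "random_state": 0},
  "impute_strategy": "median"
}
"""

# Hypothetical mapping from config names to scikit-learn estimators.
MODEL_REGISTRY = {
    "logistic_regression": LogisticRegression,
    "random_forest": RandomForestClassifier,
    "gradient_boosting": GradientBoostingClassifier,
}


def build_pipeline(config):
    """Chain imputation, scaling, and the configured model into one object."""
    model_cls = MODEL_REGISTRY[config["model"]]
    return Pipeline([
        ("impute", SimpleImputer(strategy=config.get("impute_strategy", "median"))),
        ("scale", StandardScaler()),
        ("model", model_cls(**config.get("model_params", {}))),
    ])


config = json.loads(CONFIG_JSON)
pipeline = build_pipeline(config)

# Synthetic "unprocessed" data: labels are derived first, then 10% of cells go missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan

pipeline.fit(X, y)
train_accuracy = pipeline.score(X, y)
```

Because preprocessing and the estimator live in one `Pipeline` object, the same fitted object can be reused unchanged at prediction time, which is one common way to avoid a separate translation step between training and production.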
  14. How is Predictor Used?
    • Same exact code used by Data Scientists and in Production
        ✓ No “Model Translation” overhead
        ✓ No restrictions on which ML techniques can be used
    • Training
        • Needs unprocessed data and a simple config file
        • Driven via CLI, interacting with Python codebase, or Jupyter GUI
    • Querying Mode
        • Driven by all of the above +
        • REST API (Flask-based microservice) in Production, <1-2 s/call
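A Flask-based querying service of the kind mentioned above might look roughly like the following. This is a minimal sketch, not Zopa's service: `StubModel`, `create_app`, the `/predict` route, and the feature names are hypothetical placeholders, and the stub simply computes a linear score so the example is self-contained.

```python
from flask import Flask, jsonify, request


class StubModel:
    """Hypothetical stand-in for a trained model; not Predictor's real interface."""

    def __init__(self, weights, bias=0.0):
        self.weights = weights
        self.bias = bias

    def predict(self, record):
        # Linear score over named features; features missing from the record count as 0.
        return self.bias + sum(
            weight * float(record.get(name, 0.0))
            for name, weight in self.weights.items()
        )


def create_app(model):
    """Wrap a model object in a small JSON-over-HTTP scoring service."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        record = request.get_json(silent=True)
        if not isinstance(record, dict):
            return jsonify(error="expected a JSON object"), 400
        return jsonify(score=model.predict(record))

    return app


# Hypothetical feature names and weights, for illustration only.
app = create_app(StubModel({"income": 0.5, "existing_debt": -1.0}, bias=1.0))

# Exercise the endpoint in-process with Flask's test client (no server needed).
client = app.test_client()
response = client.post("/predict", json={"income": 4.0, "existing_debt": 1.0})
result = response.get_json()
bad_response = client.post("/predict", data="not json")
```

The service only depends on the model exposing a `predict` method, so the object loaded in production can be the same one a data scientist trained interactively.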
  15. Learnings?
    • We have full control of how we do ML
        ✓ Methodology
        ✓ Deployment
        ✓ Features
        ✓ Competitive advantage, selling point to DSs
        ✓ One day we will open source → Brand boost
    ➢ Need to maintain at least 2 people familiar with the code base
    ➢ Need to keep improving it – otherwise we’ll miss latest developments
  16. How have we used Predictor?
    • Used to credit-assess >£2 billion worth of loans since 2015
    • Production: credit and fraud risk assessment, affordability evaluation
    • Offline: Pricing and marketing optimization
    • Future: Operational efficiency for Underwriting, Collections, Customer Services
    • Techniques we have been using
        • Neural Networks
        • Bagged logistic regressions and Multivariate Adaptive Regression Splines
        • Gradient Boosted Trees and Random Forests
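As an illustration of one technique named on the last slide, bagged logistic regressions can be sketched with scikit-learn's `BaggingClassifier`. The data here is synthetic and the hyperparameters are arbitrary assumptions; nothing below reflects Zopa's actual models or features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for a risk dataset.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each of the 25 logistic regressions is fit on a bootstrap sample of the
# training set; predicted probabilities are averaged across the ensemble.
bagged_lr = BaggingClassifier(
    LogisticRegression(max_iter=1000), n_estimators=25, random_state=0
)
bagged_lr.fit(X_train, y_train)

proba = bagged_lr.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
```

Bagging a linear model mainly reduces the variance of its coefficient estimates while keeping each component model easy to inspect.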