Predictive models for P2P lending

Predictive models for loan approval/rejection, loan default and risk calculation, and borrower rating classification, using data from www.prosper.com


May 03, 2014


  1. Prosper Platform
    • Prosper's dataset is richer than Lending Club's.
    • Historical data since its inception (2005).
    • Prosper provides observations across 7 objects: Listings, Loans, Groups, Categories, Marketplaces, Members
  2. How does it work?
    • Borrower → creates a listing → Prosper platform
    • Investor → places a bid (investment, $$$) → Prosper platform
    • Borrower repays in monthly EMIs
    • Slide note: "Please read the offer documentation carefully"
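For a standard fixed-rate amortizing loan, the monthly EMI is P·r·(1+r)^n / ((1+r)^n − 1), where P is the principal, r the monthly rate, and n the number of payments. The following is a minimal Python sketch. It assumes BorrowerRate is an annual rate expressed as a decimal and Term is in months. It ignores fees, so it may not reproduce Prosper's MonthlyLoanPayment exactly. The function name `monthly_emi` is hypothetical.

```python
def monthly_emi(principal: float, annual_rate: float, term_months: int) -> float:
    """Standard amortized monthly payment (EMI).

    Assumes a fixed annual rate (decimal) compounded monthly and no fees;
    Prosper's own MonthlyLoanPayment may include adjustments not modeled here.
    """
    if term_months <= 0:
        raise ValueError("term_months must be positive")
    r = annual_rate / 12.0  # monthly interest rate
    if r == 0:
        return principal / term_months  # zero-interest edge case: equal principal slices
    growth = (1.0 + r) ** term_months  # (1 + r)^n
    return principal * r * growth / (growth - 1.0)


if __name__ == "__main__":
    # Hypothetical loan: $10,000 at a 12% annual rate over a 36-month term
    print(round(monthly_emi(10_000, 0.12, 36), 2))  # 332.14
```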
  3. Prosper Data
    • 3.2 GB of XML
    • 2.3 million records and 70+ variables
    • Subsets: 2008 (95k) and 2013 (50k)
    • ~70 variables
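At 3.2 GB, the XML export is easier to process with a streaming parser than by loading the whole tree into memory. The following is a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`. It assumes each record is a `<Listing>` element with one child element per variable. The tag names `Listing`, `ListingKey`, and `CreationDate` are hypothetical and do not come from Prosper's documented schema.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterable, Iterator, Union


def iter_records(source: Union[str, IO[bytes]], record_tag: str = "Listing") -> Iterator[Dict[str, str]]:
    """Stream flat records from a large XML file, one dict per record element.

    The record tag and the one-child-per-variable layout are assumptions
    about the export format.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # e.g. <AmountRequested>5000</AmountRequested> -> {"AmountRequested": "5000"}
            yield {child.tag: (child.text or "").strip() for child in elem}
            elem.clear()  # drop the finished record's children to limit memory use


def subset_by_year(records: Iterable[Dict[str, str]], date_field: str, year: int) -> Iterator[Dict[str, str]]:
    """Keep records whose date field begins with the given year (ISO-style dates assumed)."""
    prefix = str(year)
    for rec in records:
        if rec.get(date_field, "").startswith(prefix):
            yield rec


if __name__ == "__main__":
    sample = b"""<Listings>
  <Listing><ListingKey>a</ListingKey><CreationDate>2008-03-01</CreationDate></Listing>
  <Listing><ListingKey>b</ListingKey><CreationDate>2013-06-15</CreationDate></Listing>
</Listings>"""
    rows = list(subset_by_year(iter_records(io.BytesIO(sample)), "CreationDate", 2008))
    print(rows)  # [{'ListingKey': 'a', 'CreationDate': '2008-03-01'}]
```

The same generator chain works on a file path, for example `iter_records("listings.xml")`, where the filename is illustrative.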
  4. Variables
    • Predictors, quantitative: AmountRequested, BidCount, EstimatedLoss, LenderYield, ProsperScore, DebtToIncomeRatio, OnTimeProsperPayments, ProsperPaymentsLessThanOneMonthLate, ProsperPaymentsOneMonthPlusLate, AmountFunded, AmountRemaining, BidMaximumRate, BorrowerMaximumRate, BorrowerRate, Category, Duration, ActiveProsperLoans, TotalProsperLoans, ProsperPrincipalBorrowed, ProsperPrincipalOutstanding, TotalProsperPaymentsBilled, CreditScoreRangeLower, CreditScoreRangeUpper, MonthlyLoanPayment, BankDraftFeeAnnualRate, GroupLeaderRewardRate, PercentFunded, Term
    • Predictors, categorical: HasVerifiedBankAccount, IsBorrowerHomeowner, FundingOption, City, State, GroupName, GroupRating
    • Response: ProsperRating, ListingStatus, LoanStatus
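Because the variables fall into quantitative predictors, categorical predictors, and responses, the raw fields need a typing step before modeling. The following is a sketch. It assumes every raw field arrives as a string, as it would from the XML. The variable sets are abbreviated from the slide, and the helper `coerce_row` is hypothetical.

```python
from typing import Dict, Optional, Union

# Abbreviated from the slide; extend each set with the remaining variables.
QUANTITATIVE = {
    "AmountRequested", "BidCount", "EstimatedLoss", "LenderYield",
    "ProsperScore", "DebtToIncomeRatio", "BorrowerRate", "Term",
}
CATEGORICAL = {
    "HasVerifiedBankAccount", "IsBorrowerHomeowner", "FundingOption",
    "City", "State", "GroupName", "GroupRating",
}
RESPONSES = {"ProsperRating", "ListingStatus", "LoanStatus"}

Value = Optional[Union[float, str]]


def coerce_row(raw: Dict[str, str]) -> Dict[str, Value]:
    """Type raw string fields by variable group.

    Quantitative fields become floats (None when blank or unparsable);
    categorical and response fields stay strings (None when blank).
    Fields outside all three groups are dropped.
    """
    out: Dict[str, Value] = {}
    for name, text in raw.items():
        text = (text or "").strip()
        if name in QUANTITATIVE:
            try:
                out[name] = float(text) if text else None
            except ValueError:
                out[name] = None
        elif name in CATEGORICAL or name in RESPONSES:
            out[name] = text or None
    return out


if __name__ == "__main__":
    raw = {"AmountRequested": "5000", "BidCount": "", "State": "CA",
           "LoanStatus": "Defaulted", "Unlisted": "x"}
    print(coerce_row(raw))
    # {'AmountRequested': 5000.0, 'BidCount': None, 'State': 'CA', 'LoanStatus': 'Defaulted'}
```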
  5. Loan Default Prediction
    • 2013 data didn't work out.
    • Loan terms were 3 and 5 years
    • Binary classification
    • Response: LoanStatus (Defaulted, Complete)
    • Random forests
    • 86% prediction accuracy
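Only loans with a final LoanStatus can be labeled for default prediction. The following sketch builds the binary dataset. It assumes the raw status strings are exactly `Defaulted` and `Complete`, as on the slide. It also assumes any other status should be dropped; `Current` in the example is a hypothetical open-loan status. `build_default_dataset` and `majority_baseline` are hypothetical helpers. The majority-class baseline gives context for an accuracy figure such as 86%, because a classifier must beat always guessing the more common class.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Class labels as written on the slide; the exact raw strings are an assumption.
POSITIVE = "Defaulted"
NEGATIVE = "Complete"


def build_default_dataset(
    rows: Iterable[Dict[str, str]], features: Sequence[str]
) -> Tuple[List[List[Optional[str]]], List[int]]:
    """Return (X, y) for binary default prediction (1 = Defaulted, 0 = Complete).

    Rows whose LoanStatus is neither label are skipped, because their
    final outcome is not yet known.
    """
    X: List[List[Optional[str]]] = []
    y: List[int] = []
    for row in rows:
        status = row.get("LoanStatus")
        if status not in (POSITIVE, NEGATIVE):
            continue
        X.append([row.get(f) for f in features])
        y.append(1 if status == POSITIVE else 0)
    return X, y


def majority_baseline(y: Sequence[int]) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    if not y:
        raise ValueError("y must be non-empty")
    return Counter(y).most_common(1)[0][1] / len(y)


if __name__ == "__main__":
    rows = [
        {"LoanStatus": "Defaulted", "BorrowerRate": "0.29"},
        {"LoanStatus": "Complete", "BorrowerRate": "0.12"},
        {"LoanStatus": "Current", "BorrowerRate": "0.18"},  # hypothetical open loan, dropped
    ]
    X, y = build_default_dataset(rows, ["BorrowerRate"])
    print(X, y)  # [['0.29'], ['0.12']] [1, 0]
    print(majority_baseline([0, 0, 0, 1]))  # 0.75
```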
  6. Loan Approval Prediction
    • Approved/Rejected?
    • Binary classification
    • Response variable: ListingStatus (Completed, Cancelled)
    • Random forests didn't work (categorical predictors with more than 32 categories)
    • Naïve Bayes classifier
    • 87% prediction accuracy
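Naïve Bayes places no limit on how many categories a predictor may take, which matters here because the random forests failed on predictors with many categories. The following is a from-scratch sketch of categorical Naïve Bayes with Laplace (add-alpha) smoothing. The deck does not say which implementation produced the 87% figure (e1071 and Weka are both listed as tools). The class name and parameters below are illustrative.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-alpha smoothing (alpha > 0)."""

    def __init__(self, alpha: float = 1.0):
        if alpha <= 0:
            raise ValueError("alpha must be positive so unseen values get nonzero probability")
        self.alpha = alpha

    def fit(self, X: List[Sequence[Hashable]], y: List[Hashable]) -> "CategoricalNaiveBayes":
        if not X or len(X) != len(y):
            raise ValueError("X and y must be non-empty and the same length")
        n_features = len(X[0])
        self.class_counts = Counter(y)
        self.total = len(y)
        # counts[c][j][v]: how often feature j takes value v among rows of class c
        self.counts = {c: [Counter() for _ in range(n_features)] for c in self.class_counts}
        # distinct values seen per feature, used in the smoothing denominator
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[label][j][v] += 1
                self.values[j].add(v)
        return self

    def _log_posterior(self, row: Sequence[Hashable], c: Hashable) -> float:
        # log P(c) + sum_j log P(x_j | c), up to a constant shared by all classes
        logp = math.log(self.class_counts[c] / self.total)
        for j, v in enumerate(row):
            num = self.counts[c][j][v] + self.alpha
            den = self.class_counts[c] + self.alpha * len(self.values[j])
            logp += math.log(num / den)
        return logp

    def predict(self, X: List[Sequence[Hashable]]) -> List[Hashable]:
        return [max(self.class_counts, key=lambda c: self._log_posterior(row, c)) for row in X]


if __name__ == "__main__":
    # Toy listings: (State, HasVerifiedBankAccount) -> ListingStatus; values are made up
    X = [["CA", "True"], ["CA", "True"], ["NY", "False"], ["NY", "False"]]
    y = ["Completed", "Completed", "Cancelled", "Cancelled"]
    model = CategoricalNaiveBayes(alpha=1.0).fit(X, y)
    print(model.predict([["CA", "True"], ["NY", "False"]]))  # ['Completed', 'Cancelled']
```

Prediction picks the class with the largest log-posterior. The same class therefore works unchanged for a multiclass response.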
  7. Borrower Ratings
    • Multiclass classification
    • Response: ProsperRating
    • Completed: Naïve Bayes, 90% accuracy
    • In progress: gradient boosted trees, SVM
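With several rating classes, a single accuracy figure can hide weak performance on the rarer ratings. The following is a minimal evaluation sketch that reports confusion counts and per-class recall alongside accuracy. The function names are hypothetical. The example labels use Prosper rating letters (AA through HR) purely for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Counter:
    """Count (true, predicted) pairs; off-diagonal pairs show which ratings get mixed up."""
    return Counter(zip(y_true, y_pred))


def per_class_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """For each true class, the fraction of its examples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}


if __name__ == "__main__":
    y_true = ["AA", "A", "B", "B", "HR"]
    y_pred = ["AA", "B", "B", "B", "HR"]
    print(accuracy(y_true, y_pred))  # 0.8
    print(confusion_counts(y_true, y_pred)[("A", "B")])  # 1
    print(per_class_recall(y_true, y_pred))  # {'AA': 1.0, 'A': 0.0, 'B': 1.0, 'HR': 1.0}
```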
  8. Tools
    • grep, sed, AWK
    • libxml (Python)
    • MySQL
    • R: caret, e1071, gbm, libsvm
    • Rattle
    • Weka
  9. Future Scope
    • Improving prediction accuracy by analyzing individual credit profile data
    • Money flow: analyzing the bids placed
    • Impact of social networking (friends, references)
    • Influence of groups and categories