Slide 1

Slide 1 text

Predictive Models for the Prosper Peer-to-Peer Lending Platform
Jeyaram Ashokraj

Slide 2

Slide 2 text

P2P Lending
• Crowd-sourced funding
• No intermediaries
• Investment options
• Lending Club
• Prosper

Slide 3

Slide 3 text

Prosper Platform
• Prosper has a richer dataset than Lending Club.
• Historical data from its inception (2005).
• Prosper provides observations across 7 objects
• Listings, Loans, Groups, Categories, Marketplaces, Members

Slide 4

Slide 4 text

How does it work?
[Diagram: Borrower, Prosper Platform, Investor. Labels: Creates Listing, Places Bid, Investment ($$$), Monthly EMI. Note: Please read the offer documentation carefully.]

Slide 5

Slide 5 text

Objectives
• Will my loan get approved?
• Will my investment default?
• What rating will a borrower get?

Slide 6

Slide 6 text

Prosper Data
• 3.2 GB XML
• 2.3 million records and 70+ variables
• Subset: 2008 (95k) and 2013 (50k) data
• ~70 variables
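A file of this size is usually read as a stream rather than loaded whole. The following is a minimal Python sketch of subsetting records by year, not the author's pipeline. The tag names (`Listings`, `Listing`, `CreationDate`) and the sample records are assumptions made for illustration; the actual Prosper export schema may differ.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical export layout and made-up records; the real schema may differ.
SAMPLE_XML = b"""<Listings>
  <Listing><CreationDate>2008-03-14</CreationDate><AmountRequested>5000</AmountRequested></Listing>
  <Listing><CreationDate>2010-07-01</CreationDate><AmountRequested>12000</AmountRequested></Listing>
  <Listing><CreationDate>2013-11-02</CreationDate><AmountRequested>8000</AmountRequested></Listing>
</Listings>"""


def iter_records(source, record_tag, date_field, years):
    """Yield one dict per record whose date field starts with a wanted year."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Flatten the record's child elements into a tag -> text mapping.
        row = {child.tag: (child.text or "").strip() for child in elem}
        if row.get(date_field, "")[:4] in years:
            yield row
        elem.clear()  # drop the record's children to keep memory use low


subset = list(
    iter_records(io.BytesIO(SAMPLE_XML), "Listing", "CreationDate", {"2008", "2013"})
)
print(len(subset), [r["CreationDate"][:4] for r in subset])
```

For a real file, `source` would be a path or an open binary file handle instead of the in-memory sample.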

Slide 7

Slide 7 text

Variables

Predictors, quantitative: AmountRequested, BidCount, EstimatedLoss, LenderYield, ProsperScore, DebtToIncomeRatio, OnTimeProsperPayments, ProsperPaymentsLessThanOneMonthLate, ProsperPaymentsOneMonthPlusLate, AmountFunded, AmountRemaining, BidMaximumRate, BorrowerMaximumRate, BorrowerRate, Category, Duration, ActiveProsperLoans, TotalProsperLoans, ProsperPrincipalBorrowed, ProsperPrincipalOutstanding, TotalProsperPaymentsBilled, CreditScoreRangeLower, CreditScoreRangeUpper, MonthlyLoanPayment, BankDraftFeeAnnualRate, GroupLeaderRewardRate, PercentFunded, Term

Predictors, categorical: HasVerifiedBankAccount, IsBorrowerHomeowner, FundingOption, City, State, GroupName, GroupRating

Response, categorical: ProsperRating, ListingStatus, LoanStatus

Slide 8

Slide 8 text

Key Variables
• BidCount
• ProsperScore
• CreditScore
• Debt to Income Ratio
• OnTimePayments

Slide 9

Slide 9 text

GBM Relative Influence

Slide 10

Slide 10 text

Loan Default Prediction
• 2013 data didn’t work out.
• Loan terms were 3 and 5 years
• Binary classification
• Response: LoanStatus (Defaulted, Complete)
• Random forests
• 86% prediction accuracy

Slide 11

Slide 11 text

Confusion Matrix
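A confusion matrix counts each pairing of actual and predicted class, and prediction accuracy is the share of cases on its diagonal. The Python sketch below shows that calculation. The labels are made up for illustration and are not results from the Prosper data.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Share of cases where the predicted label equals the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total


# Made-up labels for illustration only.
actual = ["Defaulted", "Complete", "Complete", "Defaulted", "Complete", "Complete"]
predicted = ["Defaulted", "Complete", "Defaulted", "Complete", "Complete", "Complete"]

cm = confusion_matrix(actual, predicted)
labels = sorted(set(actual) | set(predicted))
for a in labels:
    # One row per actual class, one column per predicted class.
    print(a, [cm[(a, p)] for p in labels])
print("accuracy:", round(accuracy(cm), 3))
```

The same functions work unchanged for more than two classes, since the matrix is keyed by label pairs.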

Slide 12

Slide 12 text

Loan Approval Prediction
• Approved or rejected?
• Binary classification
• Response variable: ListingStatus (Completed, Cancelled)
• Random forests didn’t work (categorical predictors with more than 32 categories)
• Naïve Bayes classifier
• 87% prediction accuracy
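Naïve Bayes copes with many-valued categorical predictors because each one reduces to a per-class table of counts. The following Python sketch shows a categorical Naïve Bayes classifier with add-one smoothing. It is not the author's model: the class name `CategoricalNB`, the two features, and the training rows are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes over categorical features with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.counts = defaultdict(Counter)  # (feature index, label) -> value counts
        self.values = defaultdict(set)      # feature index -> distinct values seen
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior of the class
            for i, value in enumerate(row):
                seen = self.counts[(i, label)][value]
                # Smoothing keeps an unseen value from zeroing the likelihood.
                score += math.log((seen + 1) / (n + len(self.values[i]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up rows of (State, IsBorrowerHomeowner); not Prosper data.
rows = [("CA", "True"), ("CA", "True"), ("NY", "False"),
        ("TX", "False"), ("NY", "True"), ("TX", "False")]
labels = ["Completed", "Completed", "Cancelled",
          "Cancelled", "Completed", "Cancelled"]

model = CategoricalNB().fit(rows, labels)
print(model.predict(("CA", "True")), model.predict(("TX", "False")))
```

Working in log space avoids numeric underflow when many predictors are multiplied together.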

Slide 13

Slide 13 text

Borrower Ratings
• 7 ratings
• Key for investment
• AA: least risky
• HR: high risk
• Prosper algorithm

Slide 14

Slide 14 text

Borrower Ratings
• Multiclass classification
• Response: ProsperRating
• Completed: Naïve Bayes
• 90% accuracy
• In progress: Gradient Boosted Trees, SVM

Slide 15

Slide 15 text

Confusion Matrix

Slide 16

Slide 16 text

Tools
• Grep, Sed, AWK
• Libxml (Python)
• MySQL
• R: caret, e1071, gbm, libsvm
• Rattle
• Weka

Slide 17

Slide 17 text

Future Scope
• Improving prediction accuracy: analyzing individual credit profile data
• Money flow: analyzing bids placed
• Impact of social networking (friends, references)
• Influence of groups and categories

Slide 18

Slide 18 text

Thanks for your patience