Machine Learning for Chief Future Officers

Data Science for Finance Meetup

Christophe Bourguignat

March 10, 2016

Transcript

  1. Machine Learning for Chief Financial Future Officers
    Data Science in Finance Meetup - March 10, 2016
    Christophe Bourguignat - @chris_bour
  2. Data breaks down domain frontiers
    “The winning team included a mathematician and an engineer, but no doctor”
  3. The CFO as an organization control tower
    Data:
    - Invoices
    - Orders
    - Payments
    - HR
    - Contracts
    - ...
  4. “If people could see in high dimensions, machine learning would not be necessary.”
    “A Few Useful Things to Know about Machine Learning”, Pedro Domingos
  5. Framing The Problem
    Id  Supplier Id  Budget Code  Buyer Id  Order Date  Amount  Invoice Date  Payment Date  Ageing
    34  SH24         FR657        B26       12/03/14    7600    15/06/14      25/06/14      10
    35  SH87         FR425        B22       02/03/14    12390   10/07/14      30/07/14      20
    98  SH65         FR034        B33       02/10/14    72980   01/11/14      25/11/14      24
    With these (heterogeneous) features ... predict that target
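
The target column of the table can be sketched in a few lines. This is a minimal illustration, assuming that Ageing is the number of days between Invoice Date and Payment Date (the slide does not define it, but the three sample rows are consistent with that reading) and that dates are written day/month/two-digit year. The function name `ageing_days` is hypothetical.

```python
from datetime import datetime

# Assumption: dates on the slide are day/month/two-digit year.
DATE_FORMAT = "%d/%m/%y"


def ageing_days(invoice_date: str, payment_date: str) -> int:
    """Days elapsed between the invoice and its payment (assumed definition)."""
    invoiced = datetime.strptime(invoice_date, DATE_FORMAT)
    paid = datetime.strptime(payment_date, DATE_FORMAT)
    return (paid - invoiced).days


# The three sample rows from the slide (only the columns needed here).
rows = [
    {"id": 34, "invoice_date": "15/06/14", "payment_date": "25/06/14"},
    {"id": 35, "invoice_date": "10/07/14", "payment_date": "30/07/14"},
    {"id": 98, "invoice_date": "01/11/14", "payment_date": "25/11/14"},
]

targets = [ageing_days(r["invoice_date"], r["payment_date"]) for r in rows]
print(targets)  # [10, 20, 24], matching the Ageing column
```
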
  6. Feature engineering (2/2)
    Order Date -> day of the week, month, holiday indicator
    Invoice Date -> day of the week, month, holiday indicator, number of unpaid invoices to date
    Order Date, Invoice Date -> number of days between order and invoice
    Supplier ID -> average order amount, average number of orders per year, size of the supplier, revenue of the supplier
    Budget Code -> average number of orders per year, % of budget
    Buyer Id -> average number of orders per year
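
A few of these derived features can be sketched with the standard library alone. This is an illustration under stated assumptions, not the speaker's pipeline: the `HOLIDAYS` calendar is a hypothetical placeholder, all function and key names are invented for the example, and only the date features, the order-to-invoice delay, and the supplier's average order amount are covered.

```python
from collections import defaultdict
from datetime import date, datetime

DATE_FORMAT = "%d/%m/%y"  # assumption: day/month/two-digit year
HOLIDAYS = {date(2014, 7, 14)}  # hypothetical holiday calendar

# Sample rows from the "Framing The Problem" slide.
orders = [
    {"supplier_id": "SH24", "order_date": "12/03/14", "invoice_date": "15/06/14", "amount": 7600},
    {"supplier_id": "SH87", "order_date": "02/03/14", "invoice_date": "10/07/14", "amount": 12390},
    {"supplier_id": "SH65", "order_date": "02/10/14", "invoice_date": "01/11/14", "amount": 72980},
]


def parse_date(text: str) -> date:
    return datetime.strptime(text, DATE_FORMAT).date()


def date_features(prefix: str, day: date) -> dict:
    """Day of the week (Monday = 0), month, and holiday indicator."""
    return {
        f"{prefix}_weekday": day.weekday(),
        f"{prefix}_month": day.month,
        f"{prefix}_is_holiday": int(day in HOLIDAYS),
    }


def supplier_average_amounts(rows: list) -> dict:
    """Average order amount per supplier."""
    amounts = defaultdict(list)
    for row in rows:
        amounts[row["supplier_id"]].append(row["amount"])
    return {supplier: sum(v) / len(v) for supplier, v in amounts.items()}


def engineer(row: dict, supplier_averages: dict) -> dict:
    """Turn one raw order row into a flat dictionary of numeric features."""
    order_day = parse_date(row["order_date"])
    invoice_day = parse_date(row["invoice_date"])
    features = {}
    features.update(date_features("order", order_day))
    features.update(date_features("invoice", invoice_day))
    features["order_to_invoice_days"] = (invoice_day - order_day).days
    features["supplier_avg_amount"] = supplier_averages[row["supplier_id"]]
    return features


supplier_averages = supplier_average_amounts(orders)
feature_rows = [engineer(row, supplier_averages) for row in orders]
print(feature_rows[0])
```

In a real project, aggregates such as the supplier average would probably be computed on the training period only, so that no information from the test period leaks into the features.
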
  7. Where we are now
    [Sample of the numeric feature matrix: 20 000 rows, 30 columns, with the target column marked]
  8. Selection criteria
    - Small / medium Data (< 1 GB)
    - Batch
    - Flat Files Data Sources
    - Open Source
    - Machine Learning
    - (French)
  10. Train / Test split advanced strategies (1/2)
    Problem with a random train / test split: bulk supplier payments make the model performance overoptimistic.

    All rows:
    Supplier ID  Invoice Date  Payment Date
    S921         20/02/2013    31/03/2013
    S921         20/02/2013    31/03/2013
    S921         20/02/2013    31/03/2013

    Train:
    Supplier ID  Invoice Date  Payment Date
    S921         20/02/2013    31/03/2013
    S921         20/02/2013    31/03/2013

    Test:
    Supplier ID  Invoice Date  Payment Date
    S921         20/02/2013    31/03/2013

    The test set contains observations already seen in the train set. This is cheating ...
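
The leakage described on this slide can be checked mechanically. The sketch below uses the three S921 rows from the slide and a hypothetical helper, `leaked_rows`, that lists test rows whose exact (supplier, invoice date, payment date) combination already appears in the train set. Because the rows are identical, any split of them leaks.

```python
def leaked_rows(train: list, test: list) -> list:
    """Return the test rows that also appear, identically, in the train set."""
    seen = set(train)
    return [row for row in test if row in seen]


# The bulk payment from the slide: three invoices from the same supplier,
# issued and paid on the same days.
invoices = [("S921", "20/02/2013", "31/03/2013")] * 3

# Whatever the shuffle, splitting identical rows puts copies on both sides.
train, test = invoices[:2], invoices[2:]

leaks = leaked_rows(train, test)
print(f"{len(leaks)} of {len(test)} test rows also appear in train")
```
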
  11. Train / Test split advanced strategies (2/2)
    Better to split by date, consistently with the fiscal period.
    E.g.: train = fiscal year N, test = fiscal year N+1
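
A date-based split of this kind might look as follows. This is a sketch under several assumptions that the slide does not state: the split is made on the invoice date, the fiscal year is named after the calendar year in which it starts, and it starts in January by default (`start_month=1`). The sample invoices and all names are hypothetical.

```python
from datetime import date, datetime

DATE_FORMAT = "%d/%m/%Y"  # assumption: day/month/four-digit year


def fiscal_year(day: date, start_month: int = 1) -> int:
    """Fiscal year of a date, named after the calendar year in which it starts."""
    return day.year if day.month >= start_month else day.year - 1


def split_by_fiscal_year(rows: list, train_year: int, start_month: int = 1):
    """Train on fiscal year N, test on fiscal year N+1; other years are ignored."""
    train, test = [], []
    for row in rows:
        invoice_day = datetime.strptime(row["invoice_date"], DATE_FORMAT).date()
        year = fiscal_year(invoice_day, start_month)
        if year == train_year:
            train.append(row)
        elif year == train_year + 1:
            test.append(row)
    return train, test


# Hypothetical invoices spanning two fiscal years.
invoices = [
    {"supplier_id": "S921", "invoice_date": "20/02/2013"},
    {"supplier_id": "S921", "invoice_date": "20/02/2013"},
    {"supplier_id": "S100", "invoice_date": "05/11/2013"},
    {"supplier_id": "S921", "invoice_date": "14/01/2014"},
    {"supplier_id": "S200", "invoice_date": "03/06/2014"},
]

train, test = split_by_fiscal_year(invoices, train_year=2013)
print(len(train), "train rows,", len(test), "test rows")
```

With this split, the two identical S921 invoices from February 2013 land on the same side, so a bulk payment can no longer appear in both train and test.
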
  12. The Loneliness of the Data Scientist who made an ML model
    Will anybody ever use it?
  13. Other Use Cases
    - Late payment risk prediction
    - Cash forecasting
    - Audit / anomaly detection
    - Contracts assessments
    - Automated validation
    - Smart Procurement
    - ...
    @chris_bour