Technical Debt in Machine Learning

Lia
August 29, 2018

Presentation regarding the paper: https://ai.google/research/pubs/pub43146

Transcript

  1. ML - Examples

    Input: personal information (address, age, education, taxes, last name…); each observation is a row
    Model: decision tree / random forest
    Output: a positive number, a single value between R$954 and R$1.79 trillion
  2. ML - Examples

    Question: is it a cat or a dog?
    Input: image, a 32x32x3 matrix where 3 is the RGB representation; each observation is a 3D matrix
    Model: neural network
    Output: probability of being a cat
  3. ML - Training (supervised learning)

    Inputs / Features; Output
    - Which algorithm is appropriate for this problem?
    - How do I find the best parameters?
    - What am I trying to optimize?
    - What is my cost / loss?
    - How should I evaluate it?
  4. ML - Training (supervised learning)

    Inputs / Features; Output
    - Linear algebra
    - Statistics
    - Calculus, convex and non-convex optimization
    - Business
  5. ML - Training (supervised learning)

    Inputs / Features
    Train / Find best parameters and hyperparameters
  6. ML - Training (supervised learning)

    Inputs / Features + Labels
    Labels: *Observed “outputs” from historical data*
    Train / Find best parameters and hyperparameters
  7. ML - Training (supervised learning)

    Inputs / Features + Labels
    Labels: *Observed “outputs” from historical data*
    Train / Find best parameters and hyperparameters
    Evaluation
    Deploy: API call; library, package; job
  8. ML - Training (supervised learning)

    Inputs / Features + Labels
    Labels: *Observed “outputs” from historical data*
    Train / Find best parameters and hyperparameters
    Evaluation
    Deploy: API call; library, package; job
    Remember it!
  9. ML - Training (supervised learning)

    Inputs / Features + Labels
    Labels: *Observed “outputs” from historical data*
    Train / Find best parameters and hyperparameters
    Evaluation
    Deploy: API call; library, package; job
    Remember it!
    - Getting and cleaning data takes about 80% of working time (not sexy)
    - Most companies do not have a mature data warehouse to support it
  10. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    Real Time
    - You can’t wait more than one second
  11. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
  12. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    - It is possible to wait until a piece of information becomes available
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
    - It has to give an answer even if some information is missing
  13. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    - It is possible to wait until a piece of information becomes available
    - It can take more information into account
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
    - It has to give an answer even if some information is missing
    - Information is limited
  14. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    - It is possible to wait until a piece of information becomes available
    - It can take more information into account
    - The model can be more complex (more time to run)
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
    - It has to give an answer even if some information is missing
    - Information is limited
    - The model has to be simpler
  15. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    - It is possible to wait until a piece of information becomes available
    - It can take more information into account
    - The model can be more complex (more time to run)
    - Deployment is usually easier
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
    - It has to give an answer even if some information is missing
    - Information is limited
    - The model has to be simpler
    - Deployment is usually harder
  16. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    - It is possible to wait until a piece of information becomes available
    - It can take more information into account
    - The model can be more complex (more time to run)
    - Deployment is usually easier
    - It does not affect your current infra a lot
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
    - It has to give an answer even if some information is missing
    - Information is limited
    - The model has to be simpler
    - Deployment is usually harder
    - It affects your current infra a lot
    Cool project to deploy
    Cool project in C - YOLO
  17. Technical Debt - System Level Spaghetti

    … due to “engineering” and “research” roles (the person who trains is not the person who deploys)
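
The supervised-learning steps on slides 3 to 9 (features plus labels from historical data, train, evaluate, deploy) can be sketched in plain Python. This is only an illustration, not code from the deck: the synthetic data, the one-feature linear model, and the names `make_history`, `train`, `mse`, and `predict` are all assumptions chosen to keep the example free of dependencies.

```python
import random


def make_history(n=200, seed=0):
    """Hypothetical historical data: feature x, observed label y = 3x + 2 + noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = 3.0 * x + 2.0 + rng.gauss(0.0, 0.05)
        rows.append((x, y))
    return rows


def train(rows, lr=0.5, epochs=1000):
    """Find parameters (w, b) that minimize mean squared error by gradient descent.

    lr and epochs are hyperparameters; here they are picked by hand.
    """
    w, b = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in rows) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in rows) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(params, rows):
    """Evaluation: mean squared error (the cost / loss) on the given rows."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


def predict(params, x):
    """The piece that would be deployed behind an API call, a library, or a job."""
    w, b = params
    return w * x + b


history = make_history()
train_rows, test_rows = history[:160], history[160:]  # hold out data for evaluation
params = train(train_rows)
test_error = mse(params, test_rows)
print("parameters:", params, "test MSE:", test_error)
```

Evaluating on rows the model never saw during training is what answers the slide's question "How should I evaluate it?"; the 160/40 split is an arbitrary choice for this sketch.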
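
The batch versus real-time contrast on slides 10 to 16 comes down to how each path treats missing information. The sketch below assumes a hypothetical linear model over three features named after slide 1; `WEIGHTS`, `BIAS`, `DEFAULTS`, `batch_job`, and `realtime_service` are illustrative names and values, not anything from the deck.

```python
# Hypothetical model: weights for three named features plus a bias.
WEIGHTS = {"age": 0.02, "education_years": 0.05, "taxes": 0.001}
BIAS = 0.1
# Fallback values, used only by the real-time path when a feature is missing.
DEFAULTS = {"age": 35.0, "education_years": 12.0, "taxes": 0.0}


def score(features):
    """Linear score over a complete feature dict."""
    return BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


def batch_job(rows):
    """Batch: score only complete rows; incomplete ones wait for a later run."""
    scored, deferred = {}, []
    for row_id, features in rows.items():
        if all(features.get(name) is not None for name in WEIGHTS):
            scored[row_id] = score(features)
        else:
            deferred.append(row_id)
    return scored, deferred


def realtime_service(features):
    """Real time: always answer, filling missing features with defaults."""
    filled = {}
    for name in WEIGHTS:
        value = features.get(name)
        filled[name] = DEFAULTS[name] if value is None else value
    return score(filled)


rows = {
    "a": {"age": 40.0, "education_years": 16.0, "taxes": 100.0},
    "b": {"age": 25.0, "education_years": None, "taxes": 50.0},
}
scored, deferred = batch_job(rows)       # "b" is deferred until its data arrives
answer = realtime_service(rows["b"])     # "b" gets an answer immediately
print(scored, deferred, answer)
```

The batch path can afford to defer row "b" because nobody is waiting on the result, while the service trades some accuracy for an immediate answer.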