Technical Debt in Machine Learning

Lia
August 29, 2018

Presentation regarding the paper: https://ai.google/research/pubs/pub43146


Transcript

  1. ML - Examples

    Input: Personal Information (Address, Age, Education, Taxes, Last Name…). Each observation is a row. Model: Decision Tree / Random Forest. Output: Positive Number, a single number between R$954 and R$1.79 trillion.
  2. ML - Examples

    Input: Image, a 32x32x3 matrix where 3 is the RGB representation. Each observation is a 3D matrix. Model: Neural Network. Output: Probability of being a cat. Is it a cat or a dog?
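
To make the image example concrete, here is a minimal sketch of the data shape and the output type only. It assumes pixel values scaled to [0, 1] and uses a single logistic unit with untrained, random weights as a stand-in for the neural network; the function names are hypothetical and nothing here comes from the deck itself.

```python
import math
import random

HEIGHT, WIDTH, CHANNELS = 32, 32, 3  # 3 channels: red, green, blue


def random_image(rng):
    """Build one observation: a 32x32x3 nested list of pixel values in [0, 1]."""
    return [[[rng.random() for _ in range(CHANNELS)]
             for _ in range(WIDTH)]
            for _ in range(HEIGHT)]


def flatten(image):
    """Turn the 3D matrix into a flat list of 32 * 32 * 3 = 3072 numbers."""
    return [value for row in image for pixel in row for value in pixel]


def cat_probability(image, weights, bias):
    """Single logistic unit: weighted sum of pixels squashed into (0, 1)."""
    score = sum(w * x for w, x in zip(weights, flatten(image))) + bias
    return 1.0 / (1.0 + math.exp(-score))


rng = random.Random(0)
image = random_image(rng)
# Untrained, made-up weights; a real network would learn these from labels.
weights = [rng.uniform(-0.01, 0.01) for _ in range(HEIGHT * WIDTH * CHANNELS)]
p_cat = cat_probability(image, weights, bias=0.0)
print(len(flatten(image)), round(p_cat, 3))
```

Whatever the model, the contract is the same as on the slide: one 3D matrix goes in, one number between 0 and 1 comes out.
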
  3. ML - Training (supervised learning)

    Inputs / Features, Output. Which algorithm is appropriate for this problem? How do I find the best parameters? What am I trying to optimize? What is my cost / loss? How should I evaluate it?
  4. ML - Training (supervised learning)

    Inputs / Features, Output. Linear Algebra. Statistics. Calculus, convex and non-convex optimization. Business.
  5. ML - Training (supervised learning)

    Inputs / Features. Train / Find best parameters and hyperparameters.
  6. ML - Training (supervised learning)

    Inputs / Features + Labels. Labels: *Observed “outputs” from historical data*. Train / Find best parameters and hyperparameters.
  7. ML - Training (supervised learning)

    Inputs / Features + Labels. Labels: *Observed “outputs” from historical data*. Train / Find best parameters and hyperparameters. Evaluation. Deploy: API call; library, package; job.
  8. ML - Training (supervised learning)

    Inputs / Features + Labels. Labels: *Observed “outputs” from historical data*. Train / Find best parameters and hyperparameters. Evaluation. Deploy (remember it!): API call; library, package; job.
  9. ML - Training (supervised learning)

    Inputs / Features + Labels. Labels: *Observed “outputs” from historical data*. Train / Find best parameters and hyperparameters. Evaluation. Deploy (remember it!): API call; library, package; job. Getting and cleaning data takes about 80% of working time (not sexy). Most companies do not have a mature data warehouse to support it.
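
The training steps listed on these slides (features plus labels, a loss, parameter search, evaluation) can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not the presenter's code: the data is synthetic, the model is a one-feature linear regression, the loss is mean squared error, and the parameters are found with plain batch gradient descent. All names and values are hypothetical.

```python
import random


def make_data(n, rng):
    """Synthetic historical data: feature x, label y = 3x + 2 plus noise."""
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = 3.0 * x + 2.0 + rng.gauss(0.0, 0.1)
        rows.append((x, y))
    return rows


def mse(rows, w, b):
    """Mean squared error: the cost we optimize and the metric we evaluate."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


def train(rows, learning_rate, epochs):
    """Find parameters w and b with batch gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in rows) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in rows) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


rng = random.Random(42)
data = make_data(200, rng)
train_rows, test_rows = data[:150], data[150:]  # hold out rows for evaluation

# learning_rate and epochs are hyperparameters: chosen by us, not learned.
w, b = train(train_rows, learning_rate=0.5, epochs=2000)
test_error = mse(test_rows, w, b)
print(f"w={w:.2f} b={b:.2f} test MSE={test_error:.4f}")
```

Evaluating on rows the model never saw during training is what answers "How should I evaluate it?"; the learned `w` and `b` are what would later be shipped behind an API call, a package, or a job.
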
  10. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    Real Time
    - You can’t wait more than one second
  11. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
  12. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    - It is possible to wait until information becomes available
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
    - Has to give an answer even if some information is missing
  13. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    - It is possible to wait until information becomes available
    - It can take more information into account
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
    - Has to give an answer even if some information is missing
    - Information is limited
  14. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    - It is possible to wait until information becomes available
    - It can take more information into account
    - Model can be more complex (more time to run)
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
    - Has to give an answer even if some information is missing
    - Information is limited
    - Model has to be simpler
  15. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    - It is possible to wait until information becomes available
    - It can take more information into account
    - Model can be more complex (more time to run)
    - Deployment is usually easier
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
    - Has to give an answer even if some information is missing
    - Information is limited
    - Model has to be simpler
    - Deployment is usually harder
  16. ML - Batch and Real Time

    Batch
    - You can wait to get the answer (5 min, 1 day...)
    - It is closer to a job
    - It is possible to wait until information becomes available
    - It can take more information into account
    - Model can be more complex (more time to run)
    - Deployment is usually easier
    - Does not affect your current infra much
    Real Time
    - You can’t wait more than one second
    - It is closer to a service
    - Has to give an answer even if some information is missing
    - Information is limited
    - Model has to be simpler
    - Deployment is usually harder
    - Affects your current infra a lot
    Cool project to deploy. Cool project in C - YOLO
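
One of the contrasts above, waiting for missing information versus answering anyway, can be sketched in a few lines. This is an illustrative assumption about how the two paths might differ, not something shown in the deck: the feature names, default values, and the toy scoring function are all hypothetical.

```python
# Hypothetical feature defaults used when the real-time path cannot wait.
DEFAULTS = {"age": 35.0, "income": 2000.0}


def score(features):
    """Toy stand-in for a trained model; the coefficients are made up."""
    return 0.01 * features["age"] + 0.0001 * features["income"]


def predict_realtime(features):
    """Service path: always answer, filling missing values with defaults."""
    filled = {}
    for name, default in DEFAULTS.items():
        value = features.get(name)
        filled[name] = default if value is None else value
    return score(filled)


def predict_batch(rows):
    """Job path: score complete rows, defer incomplete ones to a later run."""
    scored, deferred = [], []
    for row in rows:
        if all(row.get(name) is not None for name in DEFAULTS):
            scored.append(score(row))
        else:
            deferred.append(row)
    return scored, deferred


rows = [{"age": 40.0, "income": 3000.0}, {"age": 25.0, "income": None}]
print(predict_realtime(rows[1]))  # answers immediately using the default income
print(predict_batch(rows))        # scores the first row, defers the second
```

The real-time function trades accuracy for availability, while the batch function can afford to hold a row back until its data arrives.
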
  17. Technical Debt - System Level Spaghetti

    … due to “engineering” and “research” roles (the person who trains is not the person who deploys)