Machine Learning Under Test @ BiForum2015

Valerio Maggio
October 14, 2015

One point that is usually underestimated, or omitted altogether, when dealing with machine learning algorithms is how to write *good quality* code. The obvious way to address this issue is to apply automated testing, which aims at producing (likely) less buggy and higher-quality code.

However, testing machine learning code introduces additional concerns that have to be considered. On the one hand, some constraints are imposed by the domain and by the risks intrinsically related to machine learning methods, such as handling unstable data or avoiding under/overfitting. On the other hand, testing scientific code requires additional testing tools (e.g., `numpy.testing`) specifically suited to handling numerical data.
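
For instance, a minimal sketch of the kind of numerical assertion `numpy.testing` makes possible (the `standardise` function, data, and tolerances below are illustrative assumptions, not part of the talk material):

```python
import numpy as np
import numpy.testing as npt


def standardise(x):
    """Toy preprocessing step: scale a vector to zero mean and unit variance."""
    return (x - x.mean()) / x.std()


def test_standardise_gives_zero_mean_unit_variance():
    x = np.array([1.0, 2.0, 3.0, 4.0])
    result = standardise(x)
    # A plain `result.mean() == 0.0` would be brittle with floating point:
    # numpy.testing compares values within an explicit tolerance instead.
    npt.assert_allclose(result.mean(), 0.0, atol=1e-12)
    npt.assert_allclose(result.std(), 1.0, atol=1e-12)
```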

In this talk, some of the most famous machine learning techniques will be discussed and analysed from the `testing` point of view, and some of the most important *Model Evaluation* workflows will be presented.

The talk is intended for an *intermediate* audience. The content of the talk is intended to be mostly practical and code oriented, so good proficiency with the Python language is **required**. Conversely, **no prior knowledge** of testing or Machine Learning algorithms is necessary to attend this talk.

Transcript

  1. Machine learning under test. Valerio Maggio (@leriomaggio), PostDoc @ University of Salerno, Italy. Budapest BI FORUM 2015.
  2. Testing Machine Learning Code & Algorithms (a.k.a. Models) • Part 1: Common Risks and Pitfalls (related to learning models). Slides already online on my (speaker)deck: https://goo.gl/yYBhri
  3. Testing Machine Learning Code & Algorithms (a.k.a. Models) • Part 1: Common Risks and Pitfalls (related to learning models) • Part 2: Testing Machine Learning Code • What does it mean? • Which tools am I required to use? Slides already online on my (speaker)deck: https://goo.gl/yYBhri
  4. Machine learning is the systematic study of algorithms and systems that improve their knowledge or performance with experience. (T. Mitchell, 1997)
  5. Machine learning is the systematic study of algorithms and systems that improve their knowledge or performance with experience. (T. Mitchell, 1997) • algorithms means code • knowledge implies data • performance requires testing
  6. Machine Learning is a child of statistics, computer science, and mathematical optimisation. Along the way, it took inspiration from information theory, neural science, theoretical physics, and many other fields. […] The result is that Machine Learning papers are often full of impenetrable mathematics and technical jargon. (Alice Zheng, Chief Data Scientist @ Dato, 2015)
  7. Machine Learning is a child of statistics, computer science, and mathematical optimisation. Along the way, it took inspiration from information theory, neural science, theoretical physics, and many other fields. […] The result is that Machine Learning papers are often full of impenetrable mathematics and technical jargon. (Alice Zheng, Chief Data Scientist @ Dato, 2015)
  8. Machine learning teaches machines how to carry out tasks by themselves. It is that simple. (W. Richert & L.P. Coelho, 2013, Building Machine Learning Systems with Python)
  9. Machine learning teaches machines how to carry out tasks by themselves. It is that simple. The complexity comes with the details. (W. Richert & L.P. Coelho, 2013, Building Machine Learning Systems with Python)
  10. What will be missing: A/B Testing Pitfalls • Problem Formulation (Def.): the process of matching a dataset and a desired output to a well-understood Machine Learning task.
  11. What will be missing: A/B Testing Pitfalls • Problem Formulation (Def.): the process of matching a dataset and a desired output to a well-understood Machine Learning task • Feature Engineering: extremely important! Having good features can make a big difference in the quality of the delivered results (even more than the choice of the model).
  12. Risks related to ML (a.k.a. What to test?) • Unstable data • programming fault (despite outliers reduction)
  13. Risks related to ML (a.k.a. What to test?) • Unstable data • programming fault (despite outliers reduction) • Underfitting • the learning function does not take into account enough information to accurately model the phenomenon
  14. Risks related to ML (a.k.a. What to test?) • Unstable data • programming fault (despite outliers reduction) • Underfitting • the learning function does not take into account enough information to accurately model the phenomenon • Overfitting • the learning function does not generalise enough to properly model the phenomenon
  15. Risks related to ML (a.k.a. What to test?) • Unstable data • programming fault (despite outliers reduction) • Underfitting • the learning function does not take into account enough information to accurately model the phenomenon • Overfitting • the learning function does not generalise enough to properly model the phenomenon • Unpredictable Future (Online Evaluation) • we don't actually know if our model is working or not! (running time checking)
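
The slides above list the risks without code; a minimal sketch of how a test suite might guard against gross under/overfitting (the dataset, estimator, and thresholds are illustrative assumptions, not part of the original material) could look like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def test_model_neither_underfits_nor_overfits():
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    model = DecisionTreeClassifier(max_depth=3, random_state=42)
    model.fit(X_train, y_train)

    train_score = model.score(X_train, y_train)
    test_score = model.score(X_test, y_test)

    # Underfitting guard: training accuracy should clear a sanity threshold.
    assert train_score > 0.8
    # Overfitting guard: the gap between training and held-out accuracy
    # should stay small; a large gap means the model memorised the data.
    assert train_score - test_score < 0.15
```
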
  16. Deal with Unstable Data. “Cross Validation, RMSE, and Grid Search walk into a bar. The bartender looks up and says: Who the heck are you?” (Warning: there’s an attempt of a joke up there.)
  17. Deal with Unstable Data: Model Evaluation. “Cross Validation, RMSE, and Grid Search walk into a bar. The bartender looks up and says: Who the heck are you?” (Warning: there’s an attempt of a joke up there.)
  18. Model Evaluation • (Recall) The goal of prototyping is to find the right model to fit the data
  19. Model Evaluation • (Recall) The goal of prototyping is to find the right model to fit the data • The model must be evaluated on a dataset that is statistically independent from the one it was trained on • (Why?) Because performance on the training set is an overly optimistic estimate of the true performance
  20. Model Evaluation • (Recall) The goal of prototyping is to find the right model to fit the data • The model must be evaluated on a dataset that is statistically independent from the one it was trained on • (Why?) Because performance on the training set is an overly optimistic estimate of the true performance • This gives an estimate of the generalisation error
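
A hedged sketch of this evaluation workflow with scikit-learn, using k-fold cross-validation to keep each validation fold statistically independent from the data the model was fitted on (the dataset and estimator are chosen only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each fold is trained on 4/5 of the data and scored on the held-out 1/5,
# so no sample is ever scored by the model that was trained on it.
scores = cross_val_score(model, X, y, cv=5)
print("accuracy per fold:", scores)
print("estimated generalisation accuracy: %.3f +/- %.3f"
      % (scores.mean(), scores.std()))
```

The mean and spread of the per-fold scores are what give the estimate of the generalisation error mentioned in the slide.
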
  21. Performance Metrics • Different Machine Learning tasks have different performance metrics • Examples (in this talk) • Classification: Average Accuracy, Confusion Matrix • Regression: RMSE, R2
  22. RMSE: the closer to zero, the better the model performance. R2 score: the closer to one, the better the model performance.
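
As a rough sketch of how these metrics are typically computed with scikit-learn (the toy labels and predictions below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_squared_error, r2_score)

# Classification: average accuracy and confusion matrix.
y_true_cls = [0, 1, 1, 0, 1]
y_pred_cls = [0, 1, 0, 0, 1]
print("accuracy:", accuracy_score(y_true_cls, y_pred_cls))
print("confusion matrix:\n", confusion_matrix(y_true_cls, y_pred_cls))

# Regression: RMSE (closer to zero is better) and R2 (closer to one is better).
y_true_reg = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_reg = np.array([2.5, 0.0, 2.0, 8.0])
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
print("RMSE:", rmse)
print("R2 score:", r2_score(y_true_reg, y_pred_reg))
```
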
  23. 1

  24. 1 2

  25. (A couple of) interesting things left behind: Fuzz testing or fuzzing is an (automated) software testing technique that involves providing invalid, unexpected, or random data to the inputs (source: Wikipedia). Check out Hypothesis: https://hypothesis.readthedocs.org/en/latest/
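
For reference, a minimal property-based test with Hypothesis, in the spirit of the fuzzing idea above (the `normalise` function and the property being checked are illustrative assumptions, not taken from the slides):

```python
import numpy as np
from hypothesis import given
from hypothesis import strategies as st


def normalise(values):
    """Toy preprocessing step: rescale values into the [0, 1] interval."""
    x = np.asarray(values, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


@given(st.lists(st.floats(min_value=-1e6, max_value=1e6,
                          allow_nan=False, allow_infinity=False),
                min_size=2))
def test_normalise_stays_in_unit_interval(values):
    # Hypothesis generates many (often pathological) input lists, which is
    # exactly the "invalid, unexpected, or random data" idea of fuzzing.
    result = normalise(values)
    assert np.all(result >= 0.0)
    assert np.all(result <= 1.0)
```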