Machine Learning Under Test @ EuroPython2015

One point usually underestimated or omitted when dealing with
machine learning algorithms is how to write *good quality* code.
The obvious way to face this issue is to apply automated testing, which aims at producing (likely) less buggy and higher-quality code.

However, testing machine learning code introduces additional concerns that have to be considered. On the one hand, some constraints are imposed by the domain and by the risks intrinsically related to machine learning methods, such as handling unstable data or avoiding under/overfitting. On the other hand, testing scientific code requires additional testing tools (e.g., `numpy.testing`) specifically suited to handling numerical data.
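
As a minimal sketch of what testing numerical code with `numpy.testing` can look like, the example below checks a small preprocessing step with tolerance-based assertions instead of exact equality. The `standardize` helper and the test name are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
import numpy as np
from numpy.testing import assert_allclose


def standardize(X):
    """Hypothetical helper: rescale each column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def test_standardize_has_zero_mean_and_unit_variance():
    # Fixed seed so the test is reproducible.
    rng = np.random.RandomState(42)
    X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
    Z = standardize(X)

    # Floating-point results are compared up to a tolerance, not with ==.
    # An absolute tolerance is needed when the expected value is zero.
    assert_allclose(Z.mean(axis=0), np.zeros(4), atol=1e-12)
    assert_allclose(Z.std(axis=0), np.ones(4), rtol=1e-12)
    assert Z.shape == X.shape


test_standardize_has_zero_mean_and_unit_variance()
```

The tolerances used here are assumptions; in practice they should reflect the precision the computation can reasonably be expected to reach.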

In this talk, some of the most famous machine learning techniques will be discussed and analysed from the `testing` point of view, emphasizing that testing also allows for a better understanding of how the whole learning model works under the hood.

The talk is intended for an *intermediate* audience.
The content of the talk is intended to be mostly practical, and code
oriented. Thus a good proficiency with the Python language is **required**.
Conversely, **no prior knowledge** of testing or Machine Learning
algorithms is necessary to attend this talk.

Valerio Maggio

July 20, 2015

Transcript

  1. Machine learning under test
    Valerio Maggio (@leriomaggio), PostDoc @ University of Salerno, Italy
    EuroPython 2015 @ Bilbao
  2. Testing Machine Learning Code & Algorithms (a.k.a. Models)
    • Part 1: Common Risks and Pitfalls (related to learning models)
    • Part 2: Testing Machine Learning Code
      • What does it mean?
      • What tools am I required to use?
  3. Please answer to Five questions THREE questions
    • Do you already know what Machine Learning is?
    • Do you already know/use/hear about Testing?
    • Have you ever used Scikit-Learn?
  4. So, what is Machine Learning?
    "Machine learning is the systematic study of algorithms and systems that improve their knowledge or performance with experience" (T. Mitchell, 1997)
  5. So Basically…
    "Machine learning teaches machines how to carry out tasks by themselves. It is that simple. The complexity comes with the details."
    W. Richert & L.P. Coelho, 2013, Building Machine Learning Systems with Python
  6. Risk with Machine Learning (a.k.a. What to test?)
    • Unstable data
      • programming fault (despite outliers reduction)
    • Underfitting
      • the learning function does not take into account enough information to accurately model the phenomenon
    • Overfitting
      • the learning function does not generalise enough to properly model the phenomenon
    • Unpredictable Future
      • We don’t actually know if our model is working or not! (running time checking)
  7. 1
    • All 150 training examples are correctly identified
    • Polynomial Degree 4 for features
    • This does not mean that our model is perfect!
    • Indeed, it is far from that!
    • We can simulate this by splitting our data into a training set and a testing set.
  8. How to evaluate the performance of a Regression Model
    More accurate when comparing multiple models!
  9. • RMSE: The closer to zero, the better the model performance
    • R2 Score: The closer to one, the better the model performance
  10. (one of) the interesting things left behind
    "Fuzz testing or fuzzing is an (automated) software testing technique that involves providing invalid, unexpected, or random data to the inputs" (source: Wikipedia)
    Check out Hypothesis: https://hypothesis.readthedocs.org/en/latest/