What is your ML test score?

Using ML in real-world applications and production systems is a complex task, involving issues rarely encountered in toy problems, R&D environments, or offline settings. Key considerations for assessing the decay, current status, and production readiness of ML systems include testing, monitoring, and logging, but how much is enough? It’s difficult to know where to get started, or even who should be responsible for the testing and monitoring. If you’ve heard the phrase “test in production” too often when it comes to ML, perhaps you need to change your strategy.

Tania Allard dives deep into some of the most frequent issues encountered in real-life ML applications and how you can make your systems more robust, and she explores a number of indicators that point to decay of models or algorithms in production systems. Topics covered include the problems and pitfalls of ML in production; a rubric for testing and monitoring your ML applications; testing data and features; testing your model development; monitoring your ML applications; and model decay.

You’ll leave with a clear rubric of actionable tests and examples to ensure the quality of your models in production is adequate. Engineers, DevOps practitioners, and data scientists will gain valuable guidelines to evaluate and improve the quality of their ML models before anything reaches the production stage.

Tania Allard

July 18, 2019

Transcript

  1. What is your
    machine learning
    test score?
    Tania Allard, PhD
    Developer Advocate @
    Microsoft
    Google Developer Expert - ML /
    TensorFlow

  2. 2
    Let’s avoid disappointment
    @ixek

  3. 3
    Scoring is also called prediction, and is the
    process of generating values based on a trained
    machine learning model, given some new input
    data.
    @ixek

  4. 4
    Scores may refer to a quantification of a
    model’s or algorithm’s performance on various
    metrics.
    @ixek

  5. 5
    @ixek
    So what are we
    talking about?

  6. 6
    This is what we are covering:
    @ixek
    ● Machine learning systems validation / quality assurance
    ● How to establish clear testing responsibilities
    ● How to establish a rubric to measure how good we are at testing
    ● We are not covering generic software engineering best practices
    ● Or specific techniques like unit-testing, smoke or pen testing
    ● This is not a deep technical dive into ML testing strategies

  7. 7
    Why do we need testing or quality assurance
    anyway?
    @ixek

  8. 8
    The “subtle” differences
    between production
    systems and offline or
    R&D examples
    @ixek

  9. 9
    The (ML) systems are continuously evolving:
    from collecting and aggregating more data,
    to retraining models and improving their
    accuracy
    @ixek

  10. 10
    @ixek
    Pet projects can
    be a bit more
    forgiving

  11. 11
    We can also get
    some good laughs...
    @ixek
    https://www.reddit.com/r/funny/comments/7r9ptc/i_took_a_few_shots_at_lake_louise_today_and/dsvv1nw/

  12. 12
    A high number of false negatives or type-II
    errors can lead to havoc (e.g. in the healthcare
    and financial sectors)
    @ixek

  13. 13
    @ixek
    Automation bias: “The tendency to disregard or
    not search for contradictory information in
    light of a computer-generated solution that is
    accepted as correct” (Parasuraman & Riley,
    1997)

  14. 14
    @ixek

  15. 15
    Quality control and assurance should
    be performed before consumption by users,
    to increase the reliability and reduce
    bias in our systems
    @ixek

  16. 16
    Where do unit tests fit in software?
    @ixek

  17. 17
    @ixek

  18. 18
    @ixek
    If only ML looked like this

  19. 19
    @ixek
    But they look a bit more like this

  20. 20
    @ixek
    So what do we test?

  21. 21
    @ixek
    What should we
    keep an eye on?

  22. 22
    Who is
    responsible?
    @ixek

  23. 23
    Keeping a score
    @ixek
    For manual testing: 1 point
    For automated testing: 1 point

  24. 24
    @ixek
    Features and data

  25. 25
    @ixek
    Test your features and distributions
    Do they match your expectations?
    From the iris data set: is the sepal
    length consistent? Is the width what
    you’d expect?
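
To make this concrete, here is a minimal pytest-style sketch (not from the slides) of such a check on the iris data; the exact bounds are illustrative assumptions you would replace with values agreed for your own features.

```python
# Minimal sketch: sanity-check feature ranges for the iris data.
# The bounds below are illustrative assumptions, not reference values.
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame

def test_sepal_length_within_expected_range():
    assert df["sepal length (cm)"].between(4.0, 8.0).all()

def test_sepal_width_distribution_looks_plausible():
    width = df["sepal width (cm)"]
    assert width.between(1.5, 5.0).all()
    # A gross shift in the mean would point to a data problem upstream.
    assert 2.5 < width.mean() < 3.5
```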

  26. 26
    @ixek
    The cost of each feature

  27. 27
    @ixek
    Test the correlation between features and target
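
A minimal sketch of what such a check could look like, again using the iris data; the 0.1 cutoff is an arbitrary assumption for illustration.

```python
# Minimal sketch: every feature should carry at least some signal about the
# target. The 0.1 threshold is an illustrative assumption.
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame

def test_each_feature_correlates_with_the_target():
    correlations = df.corr()["target"].drop("target")
    weak = correlations[correlations.abs() < 0.1]
    assert weak.empty, f"Features with almost no signal: {list(weak.index)}"
```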

  28. 28
    @ixek https://www.tylervigen.com/spurious-correlations

  29. 29
    @ixek
    Test the correlation between features and target

  30. 30
    @ixek
    Test your privacy control across the pipeline
    Towards the science of security and privacy in Machine Learning. N Papernot, P McDaniel et al.
    https://pdfs.semanticscholar.org/ebab/687cd1be7d25392c11f89fce6a63bef7219d.pdf

  31. 31
    @ixek Great Expectations - Python package
    Test all code that creates input features
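
A minimal sketch of how this might look with the Great Expectations package, assuming its classic pandas-backed API; the column names and bounds are placeholders for the output of your own feature code.

```python
# Minimal sketch using Great Expectations' pandas-backed interface
# (classic 0.x-style API). Columns and bounds are placeholders.
import great_expectations as ge
import pandas as pd

features = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "species": ["setosa", "setosa", "virginica"],
})
dataset = ge.from_pandas(features)

dataset.expect_column_values_to_not_be_null("sepal_length")
dataset.expect_column_values_to_be_between("sepal_length", min_value=4.0, max_value=8.0)
dataset.expect_column_values_to_be_in_set(
    "species", ["setosa", "versicolor", "virginica"])

results = dataset.validate()
assert results["success"]
```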

  32. 32
    Model development

  33. 33
    @ixek
    Best practices

  34. 34
    @ixek
    Every piece of code is peer reviewed

  35. 35
    @ixek
    Test the impact of each tunable hyperparameter
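
As a sketch (not from the slides), one way to do this is a small sweep that checks the hyperparameter actually moves the validation metric; the model, values, and margin here are assumptions.

```python
# Minimal sketch: sweep one tunable hyperparameter and check it has a
# measurable effect on the cross-validated score. Values are illustrative.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def test_regularisation_strength_has_measurable_impact():
    scores = [
        cross_val_score(LogisticRegression(C=C, max_iter=1000), X, y, cv=5).mean()
        for C in (0.001, 0.1, 10.0)
    ]
    # If every setting scores the same, the hyperparameter is doing nothing
    # (or the sweep range is wrong) and tuning effort is wasted.
    assert max(scores) - min(scores) > 0.01
```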

  36. 36
    @ixek
    Test for model staleness
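
A hedged sketch of a staleness check: compare the deployed model against a fresh retrain on recent data. `load_recent_data`, `load_deployed_model`, and `retrain` are hypothetical helpers standing in for your own pipeline, and the 0.05 margin is an assumption.

```python
# Minimal sketch: if a fresh retrain on recent data beats the deployed model
# by a wide margin, the model in production has gone stale.
# `load_recent_data`, `load_deployed_model` and `retrain` are hypothetical
# stand-ins for your own pipeline; the 0.05 margin is an assumption.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def test_deployed_model_is_not_stale():
    X, y = load_recent_data(days=30)
    X_train, X_eval, y_train, y_eval = train_test_split(X, y, random_state=0)

    deployed = load_deployed_model()
    candidate = retrain(X_train, y_train)

    deployed_acc = accuracy_score(y_eval, deployed.predict(X_eval))
    candidate_acc = accuracy_score(y_eval, candidate.predict(X_eval))
    assert candidate_acc - deployed_acc < 0.05
```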

  37. 37
    @ixek
    Test against a simpler model
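
A minimal sketch of a baseline comparison using scikit-learn's DummyClassifier; the dataset, model, and required margin are illustrative assumptions.

```python
# Minimal sketch: the trained model should clearly beat a trivial baseline,
# otherwise its extra complexity is not paying for itself.
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def test_model_beats_a_trivial_baseline():
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    # The 0.1 required margin over the baseline is an illustrative assumption.
    assert model.score(X_test, y_test) > baseline.score(X_test, y_test) + 0.1
```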

  38. 38
    @ixek
    Test for implicit bias
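
One common way to probe for this is sliced evaluation: compute the same metric per group of a sensitive attribute and flag large gaps. This sketch assumes a fitted `model`, test data, and a `group` array from your own pipeline; the 0.1 maximum gap is an assumption.

```python
# Minimal sketch: sliced evaluation across a sensitive attribute.
# `model`, `X_test`, `y_test` and `group` come from your own pipeline;
# the 0.1 maximum gap is an illustrative assumption.
import numpy as np
from sklearn.metrics import accuracy_score

def check_accuracy_is_comparable_across_groups(model, X_test, y_test, group):
    per_group = {
        g: accuracy_score(y_test[group == g], model.predict(X_test[group == g]))
        for g in np.unique(group)
    }
    gap = max(per_group.values()) - min(per_group.values())
    assert gap < 0.1, f"Accuracy gap across groups too large: {per_group}"
```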

  39. 39
    Infrastructure
    @ixek

  40. 40
    @ixek
    Integration of the full pipeline
    From ingestion through training and serving
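
A hedged sketch of what an end-to-end smoke test could look like; every helper name here (`ingest`, `build_features`, `train`, `save_model`, `load_serving_model`) is a hypothetical placeholder for your own pipeline stages, and `tmp_path` is pytest's temporary-directory fixture.

```python
# Minimal sketch: a smoke test that exercises ingestion, feature building,
# training and serving on a tiny fixture dataset. All helpers are hypothetical
# placeholders for your own pipeline stages.
def test_full_pipeline_runs_end_to_end(tmp_path):
    raw = ingest("tests/fixtures/sample_events.csv")
    features = build_features(raw)
    model = train(features)
    model_path = save_model(model, tmp_path / "model.pkl")

    service = load_serving_model(model_path)
    prediction = service.predict(features.head(1))
    assert prediction is not None
```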

  41. 41
    @ixek
    Test model quality before serving
    Test against known output data
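
A minimal sketch of such a pre-serving gate, assuming a held-out validation set and a file of known-good ("golden") predictions; the 0.9 threshold and the file path are placeholders.

```python
# Minimal sketch: gate a candidate model on a validation set with known labels
# and on a stored set of known-good predictions. Threshold and paths are
# illustrative placeholders.
import json
from sklearn.metrics import accuracy_score

def check_candidate_before_serving(candidate_model, X_val, y_val, X_golden):
    assert accuracy_score(y_val, candidate_model.predict(X_val)) >= 0.9

    with open("tests/fixtures/golden_predictions.json") as f:
        expected = json.load(f)
    assert list(candidate_model.predict(X_golden)) == expected
```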

  42. 42
    @ixek
    Test how quickly and safely you can rollback

  43. 43
    Test the reproducibility of training
    Train at least two models on the same data and
    compare differences in aggregated metrics, sliced
    metrics, or example-by-example predictions.
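
A minimal sketch of a reproducibility check: train the same model twice on the same data with a fixed seed and compare both an aggregate metric and the example-by-example predictions.

```python
# Minimal sketch: two training runs on the same data should agree both on
# aggregate metrics and on example-by-example predictions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

def test_training_is_reproducible():
    model_a = RandomForestClassifier(random_state=42).fit(X, y)
    model_b = RandomForestClassifier(random_state=42).fit(X, y)
    assert model_a.score(X, y) == model_b.score(X, y)
    assert np.array_equal(model_a.predict(X), model_b.predict(X))
```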
    @ixek

  44. 44
    @ixek
    Adding up

  45. 45
    Getting your score
    @ixek
    1. Add points for features and data
    2. Add points for development
    3. Add points for infrastructure
    Which is your lowest score?
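
Reading the slide as "your overall score is your weakest section", the tallying could be sketched like this; the point values below are made-up examples.

```python
# Minimal sketch of the tallying above: one point per manual or automated test
# in each section, with the overall score taken as the weakest section.
# The points below are made-up example values.
section_points = {
    "features_and_data": [1, 1, 0],
    "model_development": [1, 0, 0],
    "infrastructure": [1, 1, 1],
}

section_scores = {name: sum(points) for name, points in section_points.items()}
overall_score = min(section_scores.values())
print(section_scores, "overall score:", overall_score)
```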

  46. 46
    0 points: not production ready
    1-2 points: might have reliability holes
    3-4 points: reasonably tested
    5-6 points: good level of testing
    7+ points: very strong levels of
    automated testing
    @ixek

  47. 47
    Thank you
    @ixek
