
Pitfalls of Machine Learning


Roman Roznik (Kiwi.Com, Data Science Slave) @ Moscow Python Conf 2017
"No doubt machine learning has been a hot topic in recent years; it seems everybody can easily become a data scientist and do ML within a few lines of code. Reality is much harder. Understanding the problem, preparing the right training data, cleaning it, designing features, interpretability / complexity of the model, defining the right metrics, looking at false positives / negatives, interpretation of ML results or AB tests: those are topics closely tied to data science that are often overlooked and underrated. I'd like to emphasize that those are very important and that ML itself is just one small piece of the complex data science puzzle. Not a single line of code in this talk."
Video: https://conf.python.ru/ml-pitfalls/

Moscow Python Meetup

October 20, 2017



Transcript

  1. Roman Rožník
    Pitfalls of Machine Learning


  2. About
    Supervised ML, numbers based, easy
    Problem, data, features, model, metrics, domain knowledge, customers,
    overengineering, cause & effect, brain, AB tests, …
    Why?


  3. Data
    This is data, do the science.
    It’s not data.
    Clean them.
    Still they are biased.
    Train / test / (validation) split.
    https://arxiv.org/pdf/1611.04135.pdf
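
The "Train / test / (validation) split" bullet can be sketched as below. This is a minimal illustration using only the standard library; the function name `train_val_test_split`, the fractions, and the seed are assumptions made for the example, not something shown in the talk.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut the rows into three disjoint parts."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical data: 100 row ids standing in for real records.
train, val, test = train_val_test_split(range(100))
```

A random split like this does not remove bias already present in the data, and for time-ordered data a split by date is usually the safer choice.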


  4. Features
    Aren’t they from the future?
    The fewer the better.
    https://medium.com/@michalillich/how-google-s-financial-predictor-predicts-the-past-58dc4d644703
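
One possible way to check whether features come "from the future" is to record when each feature value became known and compare that with the moment of prediction. The sketch below assumes such timestamps exist; the feature names, dates, and the helper `leaked_features` are hypothetical.

```python
from datetime import datetime


def leaked_features(feature_times, prediction_time):
    """Return names of features that were only known at or after prediction time."""
    return sorted(
        name
        for name, known_at in feature_times.items()
        if known_at >= prediction_time
    )


# Hypothetical example: when was each feature value actually available?
prediction_time = datetime(2017, 10, 20, 12, 0)
feature_times = {
    "price_yesterday": datetime(2017, 10, 19, 23, 59),
    "search_count_last_week": datetime(2017, 10, 20, 0, 0),
    "closing_price_today": datetime(2017, 10, 20, 18, 0),  # not known yet at noon
}

leaks = leaked_features(feature_times, prediction_time)
print(leaks)
```

Any feature that shows up in `leaks` would let the model "predict" something it has effectively already seen.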


  5. Model
    NN: MCLXII + CLXI = MCCCXXIII
    Algorithm: 1162 + 161 = 1323
    The simpler the better
    Model / metaparams
    Overfitting, learning the bias
    Try random features
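
The "Algorithm" line of the Roman numeral example can be made concrete with a few lines of deterministic code: convert to integers, add, convert back. This is only a sketch of the contrast the slide draws; the slide itself shows no implementation, and the function names are assumptions.

```python
# Value/symbol pairs in descending order, including the subtractive forms.
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def roman_to_int(text):
    """Greedy left-to-right parse of a Roman numeral."""
    total, pos = 0, 0
    for value, symbol in ROMAN:
        while text.startswith(symbol, pos):
            total += value
            pos += len(symbol)
    if pos != len(text):
        raise ValueError(f"not a Roman numeral: {text!r}")
    return total


def int_to_roman(number):
    """Build the numeral by repeatedly taking the largest symbol that fits."""
    parts = []
    for value, symbol in ROMAN:
        while number >= value:
            parts.append(symbol)
            number -= value
    return "".join(parts)


result = int_to_roman(roman_to_int("MCLXII") + roman_to_int("CLXI"))
print(result)  # MCCCXXIII
```

When an exact rule like this exists, it is simpler and easier to verify than a learned model, which is the point of "The simpler the better".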


  6. Metrics
    $ / CR / CTR / relevance / presence / FP / FN
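
To illustrate why FP and FN are worth looking at separately, the sketch below counts them for a small set of made-up binary labels and compares accuracy with precision and recall. The labels and the helper `confusion_counts` are assumptions for the example.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred and truth:
            counts["tp"] += 1
        elif pred and not truth:
            counts["fp"] += 1
        elif truth:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Made-up labels: 3 real positives, of which the model finds only 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

c = confusion_counts(y_true, y_pred)
accuracy = (c["tp"] + c["tn"]) / len(y_true)
precision = c["tp"] / (c["tp"] + c["fp"])
recall = c["tp"] / (c["tp"] + c["fn"])
print(c, accuracy, precision, recall)
```

Here accuracy looks acceptable while two of the three positives are missed, which only becomes visible once FN is inspected.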


  7. Define the problem
    Understand your customers / business


  8. Overengineering
    Complex / complicated
    Evolving into a monster


  9. Ceiling analysis


  10. [slide without text]

  11. AB tests


  12. AB tests


  13. Number intelligence
    :-(


  14. ASS pyramid
    pull out of one's ass
    have a think
    stats
    ML
    AB
    who / IQ / $ / t
    ape / 50 / 0$ / 0s
    human / 100 / 10$ / 1min
    analyst / 120 / 100$ / 1h
    data scientist / 200 / 10000$ / 1month
    God


  15. :-(


  16. The end
    Avoid ML
    Reveal the blackbox
    Look at data, metrics, FP / FN
    Be aware that ML NEVER does what you think it does
    Use brain & common sense
