Pitfalls of Machine Learning

Roman Roznik (Kiwi.Com, Data Science Slave) @ Moscow Python Conf 2017
"No doubt machine learning has been a hot topic in recent years, and it seems everybody can easily become a data scientist and do ML within a few lines of code. Reality is much harder. Understanding the problem, preparing the right training data, cleaning it, designing features, interpretability / complexity of the model, defining the right metrics, looking at false positives / negatives, interpretation of ML results or AB tests - those are topics closely tied to data science that are often overlooked and underrated. I'd like to emphasize that they are very important and that ML itself is just one small piece of a complex data science puzzle. Not a single line of code in this talk."
Video: https://conf.python.ru/ml-pitfalls/

Moscow Python Meetup

October 20, 2017

Transcript

  1. About: Supervised ML, numbers based, easy. Problem, data, features, model, metrics, domain knowledge, customers, overengineering, cause & effect, brain, AB tests, … Why?
  2. Data: This is data, do the science. It’s not data. Clean it. It is still biased. Train / test / (validation) split. https://arxiv.org/pdf/1611.04135.pdf
  3. Model: NN: MCLXII + CLXI = MCCCXXIII. Algorithm: 1162 + 161 = 1323. The simpler the better. Model / metaparams. Overfitting, learning the bias. Try random features.
  4. ASS pyramid
     Levels: pull out of one's ass, have a think, stats, ML, AB
     who / IQ / $ / t
     ape / 50 / 0$ / 0s
     human / 100 / 10$ / 1min
     analyst / 120 / 100$ / 1h
     data scientist / 200 / 10000$ / 1month
     God
  5. :-(

  6. The end: Avoid ML. Reveal the blackbox. Look at data, metrics, FP / FN. Be aware: ML NEVER does what you think it does. Use brain & common sense.
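
The talk itself contains no code, so the following is only an illustrative sketch of the closing advice to look at metrics and at FP / FN rather than at a single score. The labels are made up, and the helper names `confusion_counts` and `accuracy` are hypothetical, not taken from the slides. It shows how a "model" that always predicts the negative class can still report a high accuracy while missing every positive case.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


# Made-up labels for illustration: 2 positives among 10 examples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A trivial "model" that always predicts the negative class.
y_always_negative = [0] * len(y_true)

print(accuracy(y_true, y_always_negative))          # 0.8
print(confusion_counts(y_true, y_always_negative))  # (0, 0, 2, 8)
```

An accuracy of 0.8 looks respectable here, yet both positives are false negatives, which only becomes visible once the FP / FN counts are inspected.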