{Machine, Deep} Learning for software engineers

Piotr Migdał
November 30, 2016

An opinionated recommendation for data science: a Python environment, scikit-learn for Machine Learning and Keras for Deep Learning. Presented at the Product Tech Stories Meetup @ Codility.


Transcript

  1. {Machine, Deep} Learning for software engineers • Dr. Piotr Migdał, freelancer / deepsense.io • Product Tech Stories Meetup @ Codility, Warsaw, 30 Nov 2016

  2. http://xkcd.com/1425/

  3. But… https://laughingsquid.com/park-or-bird-a-national-park-and-bird-identifying-app-inspired-by-an-xkcd-comic/ …and now it is a simple exercise in Deep Learning

  4. http://deepsense.io/deep-learning-right-whale-recognition-kaggle/

  5. ML and DL progress • image recognition, neural style, word analogies, character-level translation, playing ATARI games, Go, [no idea what’s next] • fast-paced (faster than quantum physics, the field of my PhD): 6 months ago a breakthrough, now a baseline • (no questions about the Singularity, please!)

  6. https://devblogs.nvidia.com/parallelforall/mocha-jl-deep-learning-julia/

  7. Challenges • data science is both statistics and programming • ML algorithms rely on randomness and data • trying a wide array of options & parameters • unavoidable overlap between research and production

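
Two of the challenges on this slide, randomness and trying many options and parameters, are commonly handled in scikit-learn by fixing `random_state` seeds and running a cross-validated grid search. A minimal sketch; the iris dataset, the random forest and the grid values are illustrative choices, not taken from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# A fixed random_state makes the train/test split identical between runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Every combination in the grid is scored with 5-fold cross-validation.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),  # seeded, so the forest is reproducible
    param_grid,
    cv=5,
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("held-out accuracy:", search.score(X_test, y_test))
```

Rerunning the script gives the same split, the same forests and therefore the same chosen parameters, which makes comparisons between experiments meaningful.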
  8. http://www.economist.com/news/business/21695908-silicon-valley-fights-talent-universities-struggle-hold-their

  9. What do I {use, teach}? • general Machine Learning: scikit-learn (in Python) • general Deep Learning: Keras (in Python) • spaCy+gensim, SparkML, Neptune, …
  10. Why Python? • de facto standard for ML/DL • sane language + new stuff + Jupyter Notebook • why not R, MATLAB or Julia? http://sebastianraschka.com/blog/2015/why-python.html • not JavaScript? oh, wait… http://cs.stanford.edu/people/karpathy/convnetjs/ • warning: Python 2.7 is still the default :/
  11. Machine Learning scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

  12. scikit-learn http://scikit-learn.org • many popular techniques with the same interface • fast, reliable • good documentation • XGBoost also offers a scikit-learn-compatible interface • not much for time series (see statsmodels, R forecast) • or natural language processing (see spaCy, gensim)
  13. Code example
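
The code on this slide is not captured in the transcript. As a stand-in, a minimal sketch of the "same interface" point from slide 12: four different classifiers are trained and scored with identical `fit` and `score` calls. The dataset (`make_moons`) and model settings are illustrative assumptions, loosely in the spirit of the classifier-comparison page linked on slide 11.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Two interleaving half-circles with noise: a small nonlinear toy problem.
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

classifiers = {
    "logistic regression": LogisticRegression(),
    "k nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF kernel)": SVC(kernel="rbf", gamma=2, C=1),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                 # same call for every model
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name:>20}: {scores[name]:.3f}")
```

Because every estimator shares this interface, swapping models is a one-line change, which is what makes trying many techniques cheap.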

  14. Performance of 500 trees https://github.com/szilard/benchm-ml http://datascience.la/benchmarking-random-forest-implementations/
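
The linked benchmark compares implementations training a 500-tree random forest. A rough sketch of the scikit-learn side of such a measurement; it uses synthetic data from `make_classification` (an assumption, not the benchmark's datasets), so its timings are not comparable to the published numbers.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data; sizes are illustrative only.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, random_state=0
)

# 500 trees, built in parallel on all available cores (n_jobs=-1).
forest = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)

start = time.perf_counter()
forest.fit(X, y)
elapsed = time.perf_counter() - start

print(f"trained {len(forest.estimators_)} trees in {elapsed:.2f} s")
```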

  15. Deep Learning

  16. (image-only slide)
  17. Neural style, from paper to product: research paper on arXiv (Aug ‘15, http://arxiv.org/abs/1508.06576) → code on GitHub → online tools (e.g. deepart.io) → movies (Apr ‘16); other dates on the timeline: Oct ‘15, Dec ‘15
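
The arXiv paper on this slide (Gatys et al., "A Neural Algorithm of Artistic Style") measures style with Gram matrices of convolutional feature maps: for a layer with N filters over M spatial positions, G_ij = Σ_k F_ik F_jk, and the layer's style loss is Σ_ij (G_ij − A_ij)² / (4 N² M²), where A is the Gram matrix of the style image. A minimal NumPy sketch of that formula; the random arrays are stand-ins for real network activations, and this is not the code released on GitHub.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of one layer's activations, given as (height, width, channels)."""
    h, w, c = features.shape
    F = features.reshape(h * w, c).T  # shape (N, M) = (channels, positions)
    return F @ F.T                    # G_ij = sum_k F_ik * F_jk, shape (N, N)

def style_layer_loss(generated, style):
    """Style loss for one layer: squared Gram difference, scaled by 1 / (4 N^2 M^2)."""
    h, w, c = generated.shape
    n_filters, n_positions = c, h * w
    G = gram_matrix(generated)
    A = gram_matrix(style)
    return np.sum((G - A) ** 2) / (4.0 * n_filters**2 * n_positions**2)

# Stand-ins for activations of one conv layer (8x8 positions, 4 filters).
rng = np.random.default_rng(0)
style_features = rng.random((8, 8, 4))
gen_features = rng.random((8, 8, 4))

print(style_layer_loss(gen_features, style_features))
print(style_layer_loss(style_features, style_features))  # 0.0 for identical maps
```

The Gram matrix discards where features occur and keeps only which features co-occur, which is why it captures texture and style rather than layout.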
  18. Keras https://keras.io/ • Theano or TensorFlow backend • abstraction at the right level (the rule of least power) • a LOT of EASY examples for NEW techniques • (yes, we can do a sparse Matrix Factorisation) • also for JavaScript, with GPU support :) https://github.com/transcranial/keras-js
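
One common way to express the sparse matrix factorisation mentioned on this slide, sketched here as an assumption rather than the talk's own code: learn one embedding vector per user and per item, predict each rating as their dot product, and train only on the observed (user, item) pairs. The sizes, the latent dimension `k` and the random ratings are hypothetical placeholders. It assumes a current Keras install (2.x or 3.x); current Keras versions no longer support the Theano backend.

```python
import numpy as np
import keras
from keras import layers

# Hypothetical problem size and latent dimension.
n_users, n_items, k = 50, 40, 8

# Synthetic observed (user, item, rating) triples, only to make the sketch runnable.
rng = np.random.default_rng(0)
n_obs = 1000
users = rng.integers(0, n_users, size=(n_obs, 1)).astype("int32")
items = rng.integers(0, n_items, size=(n_obs, 1)).astype("int32")
ratings = rng.random(n_obs).astype("float32")

user_in = keras.Input(shape=(1,), dtype="int32", name="user")
item_in = keras.Input(shape=(1,), dtype="int32", name="item")
# Each Embedding holds a factor matrix: one k-dimensional row per user / item.
user_vec = layers.Flatten()(layers.Embedding(n_users, k, name="user_factors")(user_in))
item_vec = layers.Flatten()(layers.Embedding(n_items, k, name="item_factors")(item_in))
# Predicted rating = dot product of the two factor vectors, shape (batch, 1).
pred = layers.Dot(axes=1)([user_vec, item_vec])

model = keras.Model(inputs=[user_in, item_in], outputs=pred)
model.compile(optimizer="adam", loss="mse")
# Only observed entries are fed in, so the full rating matrix is never materialised.
model.fit([users, items], ratings, epochs=5, batch_size=64, verbose=0)

U = model.get_layer("user_factors").get_weights()[0]  # shape (n_users, k)
V = model.get_layer("item_factors").get_weights()[0]  # shape (n_items, k)
preds = model.predict([users[:5], items[:5]], verbose=0)
print(preds.ravel())
```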
  19. Code example
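
The code on this slide is not captured in the transcript. As a stand-in, a minimal sketch of a small Keras classifier: a two-layer network on the iris dataset (an illustrative choice, loaded through scikit-learn). It assumes a current Keras install (2.x or 3.x); the layer sizes and the number of epochs are arbitrary.

```python
import keras
from keras import layers
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each feature

# 4 input features -> 16 hidden units -> probabilities over 3 classes.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer class labels
    metrics=["accuracy"],
)
history = model.fit(X, y, epochs=50, batch_size=16, verbose=0)

probs = model.predict(X[:5], verbose=0)  # one probability row per sample
print(probs.round(3))
```

The whole model is declared as a list of layers and trained with one `fit` call, which is the "abstraction at the right level" point from slide 18.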

  20. VGG16 in Keras

  21. https://twitter.com/fchollet/status/765212287531495424 DL framework popularity

  22. keras2ascii https://gist.github.com/stared/8411d4e7e457b0f14f39d700afc8511c

  23. other tools

  24. spaCy + gensim + pyLDAvis http://press.deepsense.io/10012-deepsense-io-brings-big-data-to-the-united-nations-office-of-information-and-communications-technology

  25. Neptune http://neptune.deepsense.io • try Neptune (private beta) and wait a few days… or drop me an email if you want it now :)
  26. Where to learn…? • scikit-learn: http://p.migdal.pl/2016/03/15/data-science-intro-for-math-phys-background.html and the Machine Learning section • Keras: https://gist.github.com/stared/70daf8e0334abf6e7527259e7221f568 and references there • everything: http://workshops.deepsense.io
  27. Thank you! http://p.migdal.pl, pmigdal@gmail.com • “linear space of words (word2vec vis)” • “dating for nerds” coming soon • see: data science stuff + quantum game