{Machine, Deep} Learning for software engineers

Piotr Migdał
November 30, 2016

An opinionated recommendation for data science: a Python environment, scikit-learn for Machine Learning and Keras for Deep Learning. Presented at the Product Tech Stories Meetup @ Codility.

Transcript

  1. {Machine, Deep} Learning
    for software engineers
    Dr Piotr Migdał
    freelancer / deepsense.io
    Product Tech Stories Meetup @ Codility

    Warsaw, 30 Nov 2016

  2. http://xkcd.com/1425/

  3. But…
    https://laughingsquid.com/park-or-bird-a-national-park-and-bird-identifying-app-inspired-by-an-xkcd-comic/
    …and now a simple exercise in Deep Learning

  4. http://deepsense.io/deep-learning-right-whale-recognition-kaggle/

  5. ML and DL progress
    • image recognition, neural style, word analogies,
    character-level translation, playing Atari games, Go,
    [no idea what’s next]
    • fast-paced (more so than quantum physics, my PhD field):
    6 months ago a breakthrough, now a baseline
    • (no questions about the Singularity, please!)

  6. https://devblogs.nvidia.com/parallelforall/mocha-jl-deep-learning-julia/

  7. Challenges
    • data science is both statistics and programming
    • ML algorithms rely on randomness and data
    • trying a wide array of options & parameters
    • unavoidable research-production overlap
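The randomness and parameter-search points can be made concrete. A minimal sketch, assuming scikit-learn is installed: fixing `random_state` makes a randomized model reproducible, and `GridSearchCV` automates trying a grid of parameters with cross-validation. The synthetic dataset and the grid values are arbitrary choices for illustration, not from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data; random_state fixes the randomness so runs are repeatable
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# A small, arbitrary grid of hyperparameters to try (2 x 2 = 4 combinations)
param_grid = {"n_estimators": [10, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),  # seeded model
    param_grid,
    cv=3,  # 3-fold cross-validation for each parameter combination
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))

# Same seeds and same data give the same result on a rerun
again = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3).fit(X, y)
print(again.best_score_ == search.best_score_)  # True
```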

  8. http://www.economist.com/news/business/21695908-silicon-valley-fights-talent-universities-struggle-hold-their

  9. What do I {use, teach}?
    • general Machine Learning: scikit-learn (in Python)
    • general Deep Learning: Keras (in Python)
    • spaCy+gensim, SparkML, Neptune, …

  10. Why Python?
    • de facto standard for ML/DL
    • sane language + new stuff + Jupyter Notebook
    • why not R, MATLAB or Julia?
    http://sebastianraschka.com/blog/2015/why-python.html
    • why not JavaScript? oh, wait…
    http://cs.stanford.edu/people/karpathy/convnetjs/
    • warning: Python 2.7 is still the default :/

  11. Machine Learning
    scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

  12. scikit-learn
    http://scikit-learn.org
    • many popular techniques with the same interface
    • fast, reliable
    • good documentation
    • XGBoost has a scikit-learn-compatible interface
    • not much for time series (see statsmodels, R’s forecast)
    • or natural language processing (see spaCy, gensim)
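To illustrate the "same interface" point: a minimal sketch, assuming scikit-learn is installed, that trains several unrelated models with identical `fit`/`score` calls. The chosen models, their settings and the `make_moons` toy dataset are arbitrary, loosely in the spirit of the classifier comparison on the previous slide.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Toy 2D dataset with some noise, split into train and test parts
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "logistic regression": LogisticRegression(),
    "k nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF kernel)": SVC(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # the same call for every model
    scores[name] = model.score(X_test, y_test)   # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.2f}")
```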

  13. Code example
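The code shown on this slide is not part of the transcript. As a stand-in (an assumption, not the original slide), a minimal sketch of a typical scikit-learn workflow: chain preprocessing and a model in a pipeline, then estimate accuracy with cross-validation. The iris dataset and the model choice are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling and classification combined into one estimator;
# the pipeline itself exposes fit / predict / score
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation: fit on 4/5 of the data, score on the rest, 5 times
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")

# Fit on all data and predict classes for the first three flowers
clf.fit(X, y)
print(clf.predict(X[:3]))
```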

  14. Performance of random forest implementations
    (500 trees)
    https://github.com/szilard/benchm-ml
    http://datascience.la/benchmarking-random-forest-implementations/
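For a rough local check of training time, a minimal sketch assuming scikit-learn's `RandomForestClassifier`: it trains 500 trees, as in the benchmark, and times the fit. The data size is an arbitrary, small choice; the linked benchmark uses far larger datasets and compares several implementations, so numbers from this sketch are not comparable to it. `n_jobs=-1` builds the trees in parallel on all CPU cores.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset (arbitrary size, for illustration only)
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

forest = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)

start = time.perf_counter()
forest.fit(X, y)
elapsed = time.perf_counter() - start

print(f"{len(forest.estimators_)} trees trained in {elapsed:.1f} s")
```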

  15. Deep Learning

  16.

  17. http://arxiv.org/abs/1508.06576
    [timeline figure] research paper on arXiv (Aug ’15), code on GitHub,
    online tools (e.g. deepart.io), movies (Apr ’16);
    other dates shown: Oct ’15, Dec ’15

  18. Keras
    https://keras.io/
    • Theano or TensorFlow backend
    • abstraction at the right level (the rule of least power)
    • a LOT of EASY examples for NEW techniques
    • (yes, we can do a sparse matrix factorisation)
    • also for JavaScript, with GPU support :)
    https://github.com/transcranial/keras-js
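The matrix factorisation remark can be sketched in a few lines: each user and each item gets a learned embedding vector, and a rating is predicted as their dot product. This is a minimal sketch assuming a recent Keras (2.x or 3.x functional API, which differs from the 2016 Keras 1.x on the slide); the sizes, the embedding dimension and the random "ratings" are made up for illustration.

```python
import numpy as np
from keras import Model, layers

n_users, n_items, dim = 100, 50, 8  # hypothetical sizes

# Inputs: one user index and one item index per example
user_in = layers.Input(shape=(1,), dtype="int32")
item_in = layers.Input(shape=(1,), dtype="int32")

# Learned latent vectors (the two factor matrices), flattened to (batch, dim)
user_vec = layers.Flatten()(layers.Embedding(n_users, dim)(user_in))
item_vec = layers.Flatten()(layers.Embedding(n_items, dim)(item_in))

# Predicted rating = dot product of the user and item vectors, shape (batch, 1)
rating = layers.Dot(axes=1)([user_vec, item_vec])
model = Model(inputs=[user_in, item_in], outputs=rating)
model.compile(optimizer="adam", loss="mse")

# Sparse observations: only 1000 (user, item, rating) triples out of 100 * 50
rng = np.random.default_rng(0)
users = rng.integers(0, n_users, size=(1000, 1), dtype=np.int32)
items = rng.integers(0, n_items, size=(1000, 1), dtype=np.int32)
ratings = rng.uniform(1, 5, size=(1000, 1)).astype("float32")

model.fit([users, items], ratings, epochs=2, batch_size=64, verbose=0)
pred = model.predict([users[:5], items[:5]], verbose=0)
print(pred.shape)  # (5, 1)
```

Only the observed pairs contribute to the loss, which is what makes this a factorisation of a sparse matrix rather than of a dense one.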

  19. Code example
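The code shown on this slide is not part of the transcript. As a stand-in, a minimal sketch of the typical Keras workflow (define layers, `compile`, `fit`, `evaluate`), assuming a recent Keras version; the random toy data and layer sizes are arbitrary and not from the talk.

```python
import numpy as np
from keras import Sequential, layers

# Toy binary classification: the label is 1 when the features sum to more than 0
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

model = Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of class 1
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hold out 20% of the data to monitor overfitting during training
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
loss, acc = model.evaluate(X, y, verbose=0)
print(f"accuracy: {acc:.2f}")
```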

  20. VGG16 in Keras
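Keras ships VGG16 as a ready-made model in `keras.applications`. A minimal sketch, assuming a recent Keras: it builds the convolutional part of VGG16 with random weights (`weights=None`) so that nothing is downloaded; passing `weights="imagenet"` instead downloads pretrained weights, e.g. for feature extraction or fine-tuning. The random input image is a placeholder.

```python
import numpy as np
from keras.applications import VGG16
from keras.applications.vgg16 import preprocess_input

# Convolutional base only (no fully connected classifier on top)
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))

# One fake RGB image with pixel values in [0, 255]
rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(1, 224, 224, 3)).astype("float32")

# The preprocessing VGG16 expects when used with pretrained weights
features = base.predict(preprocess_input(image), verbose=0)
print(features.shape)  # (1, 7, 7, 512)
```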

  21. https://twitter.com/fchollet/status/765212287531495424
    DL framework popularity

  22. keras2ascii
    https://gist.github.com/stared/8411d4e7e457b0f14f39d700afc8511c

  23. other tools

  24. spaCy + gensim + pyLDAvis
    http://press.deepsense.io/10012-deepsense-io-brings-big-data-to-the-united-nations-office-of-information-and-communications-technology

  25. Neptune
    http://neptune.deepsense.io: Try Neptune (private beta)
    and wait a few days… or drop me an email if you want it now :)

  26. Where to learn…?
    • scikit-learn:
    http://p.migdal.pl/2016/03/15/data-science-intro-for-math-phys-background.html
    and the Machine Learning section
    • Keras:
    https://gist.github.com/stared/70daf8e0334abf6e7527259e7221f568
    and references there
    • everything:
    http://workshops.deepsense.io

  27. Thank you!
    http://p.migdal.pl
    [email protected]
    “linear space of words (word2vec vis)”
    “dating for nerds”
    coming soon:
    see: data science stuff + quantum game