Slide 1

Slide 1 text

{Machine, Deep} Learning for software engineers
dr Piotr Migdał, freelancer / deepsense.io
Product Tech Stories Meetup @ Codility, Warsaw, 30 Nov 2016

Slide 2

Slide 2 text

http://xkcd.com/1425/

Slide 3

Slide 3 text

But… https://laughingsquid.com/park-or-bird-a-national-park-and-bird-identifying-app-inspired-by-an-xkcd-comic/
…and now a simple exercise in Deep Learning

Slide 4

Slide 4 text

http://deepsense.io/deep-learning-right-whale-recognition-kaggle/

Slide 5

Slide 5 text

ML and DL progress
• image recognition, neural style, word analogies, per-char translations, playing Atari games, Go, [no idea what’s next]
• fast-paced (more than my quantum physics PhD): 6 months ago a breakthrough, now a baseline
• (no questions about Singularity please!)

Slide 6

Slide 6 text

https://devblogs.nvidia.com/parallelforall/mocha-jl-deep-learning-julia/

Slide 7

Slide 7 text

Challenges
• data science is both statistics and programming
• ML algorithms are based on randomness and data
• trying a wide array of options & parameters
• unavoidable research-production overlap

Slide 8

Slide 8 text

http://www.economist.com/news/business/21695908-silicon-valley-fights-talent-universities-struggle-hold-their

Slide 9

Slide 9 text

What do I {use, teach}?
• general Machine Learning: scikit-learn (in Python)
• general Deep Learning: Keras (in Python)
• spaCy + gensim, SparkML, Neptune, …

Slide 10

Slide 10 text

Why Python?
• de facto standard for ML/DL
• sane language + new stuff + Jupyter Notebook
• not R, MATLAB or Julia? http://sebastianraschka.com/blog/2015/why-python.html
• not JavaScript? oh, wait… http://cs.stanford.edu/people/karpathy/convnetjs/
• warning: Python 2.7 is still the default :/

Slide 11

Slide 11 text

Machine Learning scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

Slide 12

Slide 12 text

scikit-learn http://scikit-learn.org
• many popular techniques with the same interface
• fast, reliable
• good documentation
• XGBoost provides a scikit-learn-compatible interface
• not much for time series (see statsmodels, R forecast)
• or natural language processing (see spaCy, gensim)

Slide 13

Slide 13 text

Code example
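
The code shown on this slide is not part of the transcript. As a stand-in, here is a minimal sketch of the "same interface" point: three different scikit-learn classifiers trained and scored with identical `fit` / `score` calls. The Iris dataset, the choice of classifiers, and all hyperparameters are assumptions made for illustration, and the snippet assumes Python 3.6+ with scikit-learn 0.18 or newer.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small built-in dataset (an assumption; any feature matrix X and labels y would do).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Very different algorithms, one shared estimator interface.
classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                  # train on the training split
    scores[name] = clf.score(X_test, y_test)   # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.2f}")
```

Swapping in another estimator usually means changing only the constructor line; the training and evaluation code stays the same.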

Slide 14

Slide 14 text

Performance of 500 trees
https://github.com/szilard/benchm-ml
http://datascience.la/benchmarking-random-forest-implementations/

Slide 15

Slide 15 text

Deep Learning

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

http://arxiv.org/abs/1508.06576
Timeline stages: research paper on arXiv, code on GitHub, online tools (e.g. deepart.io), movies
Dates on the slide: Aug ‘15, Oct ‘15, Dec ‘15, Apr ‘16

Slide 18

Slide 18 text

Keras https://keras.io/
• Theano or TensorFlow backend
• abstraction at the right level (the rule of least power)
• a LOT of EASY examples for NEW techniques
• (yes, we can do a sparse Matrix Factorisation)
• also for JavaScript, with GPU support :) https://github.com/transcranial/keras-js

Slide 19

Slide 19 text

Code example
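
The code shown on this slide is not part of the transcript either. The sketch below is an assumed stand-in that shows the general shape of a Keras model: stack layers, compile, fit, predict. The synthetic data, layer sizes, and training settings are made up for illustration, and the snippet assumes a recent Keras release in which `Sequential` accepts an `Input` object as its first element.

```python
import numpy as np
from keras.layers import Dense, Input
from keras.models import Sequential

# Synthetic toy data (an assumption): 4 features, class decided by a simple rule.
rng = np.random.RandomState(0)
X = rng.normal(size=(300, 4)).astype("float32")
labels = (X[:, 0] + X[:, 1] > 0).astype("int64")
y = np.eye(2, dtype="float32")[labels]  # one-hot encoded targets

# A small fully connected network: one hidden layer, softmax output.
model = Sequential([
    Input(shape=(4,)),
    Dense(16, activation="relu"),
    Dense(2, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Train quietly, then predict class probabilities for a few rows.
history = model.fit(X, y, epochs=20, batch_size=32, verbose=0)
probs = model.predict(X[:5], verbose=0)
print(probs.shape)
```

Each row of `probs` holds the predicted probabilities for the two classes. Larger architectures follow the same pattern with more (and different) layers.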

Slide 20

Slide 20 text

VGG16 in Keras

Slide 21

Slide 21 text

https://twitter.com/fchollet/status/765212287531495424 DL framework popularity

Slide 22

Slide 22 text

keras2ascii https://gist.github.com/stared/8411d4e7e457b0f14f39d700afc8511c

Slide 23

Slide 23 text

other tools

Slide 24

Slide 24 text

spaCy + gensim + pyLDAvis
http://press.deepsense.io/10012-deepsense-io-brings-big-data-to-the-united-nations-office-of-information-and-communications-technology

Slide 25

Slide 25 text

Neptune: go to http://neptune.deepsense.io and click “Try Neptune” (private beta), then wait a few days… or drop me an email if you want it now :)

Slide 26

Slide 26 text

Where to learn…?
• scikit-learn: http://p.migdal.pl/2016/03/15/data-science-intro-for-math-phys-background.html (and the Machine Learning section there)
• Keras: https://gist.github.com/stared/70daf8e0334abf6e7527259e7221f568 (and references there)
• everything: http://workshops.deepsense.io

Slide 27

Slide 27 text

Thank you!
http://p.migdal.pl (see: data science stuff + quantum game)
[email protected]
coming soon: “linear space of words (word2vec vis)”, “dating for nerds”