An opinionated recommendation for data science: Python environment, scikit-learn for Machine Learning and Keras for Deep Learning. At Product Tech Stories Meetup @ Codility
But… https://laughingsquid.com/park-or-bird-a-national-park-and-bird-identifying-app-inspired-by-an-xkcd-comic/ …and now a simple exercise in Deep Learning
ML and DL progress
• image recognition, neural style, word analogies, per-char translations, playing ATARI games, Go, [no idea what’s next]
• fast-paced (more than my quantum physics PhD): 6 months ago a breakthrough, now a baseline
• (no questions about Singularity please!)
Challenges
• data science is both statistics and programming
• ML algorithms depend on randomness and data
• trying a wide array of options & parameters
• unavoidable research-production overlap
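To make the "randomness" and "wide array of options & parameters" points concrete, here is a minimal sketch using scikit-learn (the library recommended later in this talk). The Iris dataset, the random forest model and the parameter grid are illustrative assumptions, not something from the slides. Fixing `random_state` makes the run repeatable, and `GridSearchCV` tries every parameter combination with cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy dataset, chosen only for illustration.
X, y = load_iris(return_X_y=True)

# Hypothetical grid: 2 x 3 = 6 combinations to evaluate.
param_grid = {
    "n_estimators": [10, 50],
    "max_depth": [2, 4, None],
}

# random_state pins the randomness inside the forest,
# so repeated runs give the same scores.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,  # 5-fold cross-validation for each combination
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

Even this toy grid trains 30 models (6 combinations times 5 folds), which hints at why keeping track of experiments quickly becomes a problem of its own.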
What do I {use, teach}?
• general Machine Learning: scikit-learn (in Python)
• general Deep Learning: Keras (in Python)
• spaCy+gensim, SparkML, Neptune, …
Why Python?
• de facto standard for ML/DL
• sane language + new stuff + Jupyter Notebook
• not R, MATLAB or Julia? http://sebastianraschka.com/blog/2015/why-python.html
• not JavaScript? oh, wait… http://cs.stanford.edu/people/karpathy/convnetjs/
• warning: Python 2.7 is still the default :/
scikit-learn http://scikit-learn.org
• many popular techniques with the same interface
• fast, reliable
• good documentation
• XGBoost ships a scikit-learn-compatible interface
• not much for time series (use statsmodels, R forecast)
• or for natural language processing (use spaCy, gensim)
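A short sketch of what "the same interface" means in practice: very different models are trained and scored with the same `fit` and `score` calls. The dataset and the three models below are arbitrary picks for illustration, not the speaker's own example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Three unrelated algorithms, one shared API.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                # same call for every model
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

Swapping one model for another means changing a single line, which is what makes quick comparisons cheap.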
Keras https://keras.io/
• Theano or TensorFlow backend
• abstraction at the right level (the rule of least power)
• a LOT of EASY examples for NEW techniques
• (yes, we can do a sparse Matrix Factorisation)
• also for JavaScript, with GPU support :) https://github.com/transcranial/keras-js
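As an illustration of the matrix factorisation remark, here is a minimal sketch, assuming a current Keras install with a working backend such as TensorFlow. The sizes, the synthetic ratings and the learning rate are made up for the example. Only observed (user, item, rating) triplets are fed to the model, so the full ratings matrix is never stored densely.

```python
import numpy as np
import keras
from keras import layers

# Hypothetical sizes and synthetic data, for illustration only.
n_users, n_items, n_factors, n_obs = 20, 30, 4, 200
rng = np.random.default_rng(0)
user_ids = rng.integers(0, n_users, size=(n_obs, 1))
item_ids = rng.integers(0, n_items, size=(n_obs, 1))
ratings = rng.uniform(1.0, 5.0, size=(n_obs, 1)).astype("float32")

# One embedding vector per user and per item (the two factor matrices).
user_in = keras.Input(shape=(1,), dtype="int32", name="user")
item_in = keras.Input(shape=(1,), dtype="int32", name="item")
user_vec = layers.Flatten()(layers.Embedding(n_users, n_factors)(user_in))
item_vec = layers.Flatten()(layers.Embedding(n_items, n_factors)(item_in))

# Predicted rating = dot product of the user and item vectors.
pred = layers.Dot(axes=1)([user_vec, item_vec])

model = keras.Model([user_in, item_in], pred)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.05), loss="mse")
history = model.fit(
    [user_ids, item_ids], ratings, epochs=20, batch_size=32, verbose=0
)

predictions = model.predict([user_ids, item_ids], verbose=0)
print("first-epoch loss:", history.history["loss"][0])
print("last-epoch loss:", history.history["loss"][-1])
```

The whole model is two embeddings and a dot product, which is the kind of "right level" abstraction the slide refers to.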
Thank you! http://p.migdal.pl “linear space of words (word2vec vis)” “dating for nerds” coming soon: see: data science stuff + quantum game