Slide 1

MACHINE LEARNING for the rest of us @providenz

Slide 2

$ whoami @providenz

Slide 3

$ whoami
datamateur

Slide 4

Machine learning?

Rooms  Bedrooms  Area  Neighborhood  Price
3      2         90    Montferrand   137000
3      3         76    La Plaine     168500
7      5         160   Les Cézeaux   276500
6      4         126   Saint Alyre   149000
…      …         …     …             …

Slide 5

Machine learning?

Rooms  Bedrooms  Area  Neighborhood     Price
8      4         101   Pré de la reine  ??????

Slide 6

Machine learning

if block == sthg:
    if area < 90:
        if rooms > 3:
            if bedrooms >= 2:
                sq_meters_price = XX
if …

Slide 8

Machine learning? Phase 1/2: training

Data (lots of it, or even more) → ALGO → MODEL

Slide 9

Machine learning? Phase 2/2: prediction

Rooms  Bedrooms  Area  Neighborhood     Price
8      4         101   Pré de la reine  ??????

New row → MODEL → Prediction
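
The two phases can be sketched with scikit-learn as follows. This is only an illustration under assumptions that are not in the slides: `LinearRegression` is an arbitrary choice of algorithm, and the neighborhood column is dropped so that every feature is numeric.

```python
from sklearn.linear_model import LinearRegression

# Training rows from the example table: rooms, bedrooms, area.
# Assumption: the neighborhood column is left out to keep the sketch numeric.
X_train = [
    [3, 2, 90],
    [3, 3, 76],
    [7, 5, 160],
    [6, 4, 126],
]
y_train = [137000, 168500, 276500, 149000]  # known prices

# Phase 1/2: training. The algorithm turns data into a model.
model = LinearRegression()
model.fit(X_train, y_train)

# Phase 2/2: prediction for the listing whose price is unknown.
X_new = [[8, 4, 101]]
predicted_price = model.predict(X_new)[0]
print(f"Predicted price: {predicted_price:.0f}")
```

Four rows are far too few for a trustworthy estimate; the point is only the fit-then-predict sequence.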

Slide 10

Applications

Vision
Search engines
Antispam
Recommender systems
Finance
Fraud detection
Medical
Sentiment analysis
Language detection
Shazam
Siri
Art
…

Slide 11

2 types of problems

Classification
Regression

Slide 12

Classification

Slide 13

Iris

Setosa
Versicolor
Virginica
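
A minimal classification sketch on the iris dataset that ships with scikit-learn. The choice of `KNeighborsClassifier` and the sample measurements are assumptions made for illustration; the slides do not name an algorithm here.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target  # 150 flowers, 4 measurements each

# Assumption: a 5-nearest-neighbors classifier, picked only for simplicity.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)

# Sepal length, sepal width, petal length, petal width (cm).
sample = [[5.1, 3.5, 1.4, 0.2]]
label = clf.predict(sample)[0]
print(iris.target_names[label])
```

The model answers with one of the three species names, which is what makes this a classification problem rather than a regression one.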

Slide 14

Regression

Slide 16

Implementations

Slide 17

In Python

scikit-learn
gensim
orange
…

Slide 18

Outside Python

Weka: http://www.cs.waikato.ac.nz/ml/weka/
Vowpal Wabbit: https://github.com/JohnLangford/vowpal_wabbit
Shogun: http://www.shogun-toolbox.org/

Slide 19

Predictive APIs

BigML: bigml.com
Google Prediction API
aka ML as a service

Slide 20

Python

Slide 21

Process

Specify the problem
Collect and prepare the data
Feature engineering
Select an algorithm
Training
Validation
Prediction

Slide 22

Feature engineering

Rooms  Bedrooms  Area  Neighborhood  Price
3      2         90    Montferrand   137000

Rooms  Bedrooms  Area  Lat       Long     Price
3      2         90    45.77966  3.08628  137000
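
The transformation above, replacing a neighborhood name with numeric coordinates, can be sketched in plain Python. The lookup table, the function name and the dictionary keys are hypothetical; only the Montferrand coordinates come from the slide, matching the listing that the first table places in Montferrand.

```python
# Hypothetical lookup table from neighborhood name to (latitude, longitude).
# Only the Montferrand entry comes from the slide; a real table would need
# one entry per neighborhood.
NEIGHBORHOOD_COORDS = {
    "Montferrand": (45.77966, 3.08628),
}


def add_coordinates(row):
    """Return a copy of row with 'neighborhood' replaced by 'lat' and 'long'."""
    lat, long = NEIGHBORHOOD_COORDS[row["neighborhood"]]
    new_row = {key: value for key, value in row.items() if key != "neighborhood"}
    new_row["lat"] = lat
    new_row["long"] = long
    return new_row


listing = {
    "rooms": 3,
    "bedrooms": 2,
    "area": 90,
    "neighborhood": "Montferrand",
    "price": 137000,
}
engineered = add_coordinates(listing)
print(engineered)
```

Numeric coordinates let an algorithm treat nearby neighborhoods as similar, which a bare name cannot express.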

Slide 23

       Features                              Target
       Rooms  Bedrooms  Area  Neighborhood   Price
Train  3      2         90    Montferrand    137000
       3      3         76    La Plaine      168500
       7      5         160   Les Cézeaux    276500
       6      4         126   Saint Alyre    149000
Test   3      2         68    N/A            93000
       3      2         67    Gare           112000

Cross validation
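
A hedged sketch of the train/test split and of cross-validation with scikit-learn. The iris dataset stands in for the housing table because it ships with the library; the 25% test size, the 5 folds and the random forest are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)  # X: features, y: target

# Hold out 25% of the rows as a test set the model never sees in training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

# 5-fold cross-validation: each row lands in the held-out fold exactly once.
cv_scores = cross_val_score(clf, X, y, cv=5)

print(f"Test accuracy: {test_accuracy:.2f}")
print(f"Cross-validation accuracy: {cv_scores.mean():.2f}")
```

Cross-validation repeats the split several times, so the score depends less on which rows happened to land in the test set.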

Slide 24

SCIKIT LEARN

http://scikit-learn.org/

pip install scikit-learn

Slide 25

Complementary tools

ipython
pandas
numpy
scipy
hdf5
nltk

Slide 27

Classifiers (Algos)

random forest
k-means
support vector machines
lasso
ridge regression
…
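
In scikit-learn, estimators share the same fit/predict interface, so trying another algorithm is mostly a one-line change. The sketch below compares two entries from the list on the iris dataset; the dataset and the 5-fold scoring are assumptions, and k-means (clustering) as well as lasso and ridge (regression) are left out because they target other problem types.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Two classifiers from the list, both with near-default settings.
candidates = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support vector machine": SVC(),
}

mean_scores = {}
for name, estimator in candidates.items():
    # Same call for every estimator: only the object changes.
    mean_scores[name] = cross_val_score(estimator, X, y, cv=5).mean()
    print(f"{name}: {mean_scores[name]:.2f}")
```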

Slide 28

Notebooks

Slide 29

Warnings

Slide 30

Too little data

Slide 31

Warnings

Correlation != causation
Spurious correlations: http://tylervigen.com/
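
A small illustration of how a strong correlation can appear without any causal link. Both series are made up: each one simply grows over time with its own random noise, and neither is computed from the other.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(50)

# Two invented series that both trend upward but never influence each other.
series_a = 1.0 * years + rng.normal(0, 3, size=years.size)
series_b = 2.5 * years + rng.normal(0, 5, size=years.size)

# Pearson correlation coefficient between the two series.
r = np.corrcoef(series_a, series_b)[0, 1]
print(f"Pearson correlation: {r:.2f}")
```

The coefficient comes out close to 1 only because both series share a time trend, which is the kind of trap a model trained on such features would fall into.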

Slide 32

Resources

Libs
http://scikit-learn.org/
http://pandas.pydata.org/
http://matplotlib.org/
http://stanford.edu/~mwaskom/software/seaborn/

Challenges
https://www.kaggle.com/

Slide 33

Resources: books

Bootstrapping Machine Learning: http://www.louisdorard.com/machine-learning-book/
Building Machine Learning Systems with Python: https://www.packtpub.com/big-data-and-business-intelligence/building-machine-learning-systems-python
Programming Collective Intelligence: http://www.amazon.com/gp/product/0596529325/
An Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/
La statistique sans formule mathématique, Bernard Py

Slide 34

Photo credits

https://www.flickr.com/photos/52569650@N00/
https://www.flickr.com/photos/haglundc/

Slide 35

Thank you

@providenz