[StartupCourse/18] Discover Machine Learning

Do you want to predict the future?

Machine Learning is not that complex!

Let's discover it!

Fabien Vauchelles

April 12, 2016

Transcript

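  1. BACK TO MACHINE LEARNING: If you know, for each passenger on the list:

    • Gender
    • Age
    • Ticket class
    • Did the passenger survive?

    ... you can create a Decision Tree for this Supervised Problem! It is supervised because the answer (survived or not) is known for every passenger in the data.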
  2. A feature is a column in a dataset. The passengers' age is a feature!
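To make "a feature is a column" concrete, here is a minimal Python sketch. The three passenger rows, the field names and the `column` helper are hypothetical, invented for illustration (they are not real Titanic records). The target `survived` is kept apart from the features, as in a supervised problem.

```python
# Hypothetical passenger rows, invented for illustration (not real Titanic data)
passengers = [
    {"gender": "male", "age": 22, "ticket_class": 3, "survived": False},
    {"gender": "female", "age": 38, "ticket_class": 1, "survived": True},
    {"gender": "female", "age": 4, "ticket_class": 2, "survived": True},
]

FEATURES = ["gender", "age", "ticket_class"]  # the columns we learn from
TARGET = "survived"                           # the column we want to predict


def column(rows, name):
    """A feature is a column: collect one field across every row."""
    return [row[name] for row in rows]


X = [[row[f] for f in FEATURES] for row in passengers]  # one feature list per passenger
y = column(passengers, TARGET)                          # one label per passenger

print(column(passengers, "age"))  # [22, 38, 4]
```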
  3. DATA ANALYSIS / DISTRIBUTION: BOX PLOT

    The box plot marks the 5%, 25%, 50%, 75% and 95% quantiles. The 1st to 4th quartiles lie between these marks, and points beyond the ends are outliers.
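The quantiles drawn on the box plot can be computed with Python's standard `statistics` module. This sketch assumes the "inclusive" interpolation method (other tools interpolate slightly differently), and `box_plot_summary` is a hypothetical helper name. The sample values are the ones used on the outlier slide of this deck.

```python
import statistics


def box_plot_summary(values):
    """Return the 5%, 25%, 50%, 75% and 95% quantiles shown on the box plot."""
    # n=20 gives 19 cut points at 5%, 10%, ..., 95%; keep the five we need
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return {5: cuts[0], 25: cuts[4], 50: cuts[9], 75: cuts[14], 95: cuts[18]}


summary = box_plot_summary([10, 20, 30, 1000, 50])
print(summary)  # {5: 12.0, 25: 20.0, 50: 30.0, 75: 50.0, 95: 810.0}
```

Note how the single value 1000 drags the 95% mark far above the rest, while the median (30) does not move.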
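  4. DATA ANALYSIS / MISSING VALUES: Choose the replacement value carefully!

    Filling with 0: 10, NaN, 20, NaN, 30 → 10, 0, 20, 0, 30. Mean = 12. BAD!!! The zeros pull the mean far below 20, the mean of the known values.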
  5. DATA ANALYSIS / MISSING VALUES: Fill empty values with the median (20):

    10, NaN, 20, NaN, 30 → 10, 20, 20, 20, 30. Mean = 20. GOOD!!!
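Both fills from the two slides above can be reproduced with a minimal standard-library sketch. Missing values are modelled as `float("nan")`, and `fill_missing` is a hypothetical helper (with pandas one would typically use `fillna` instead).

```python
import math
import statistics


def fill_missing(values, fill_value):
    """Return a copy of `values` with every NaN replaced by `fill_value`."""
    return [fill_value if math.isnan(v) else v for v in values]


nan = float("nan")
raw = [10, nan, 20, nan, 30]
known = [v for v in raw if not math.isnan(v)]  # [10, 20, 30]

filled_zero = fill_missing(raw, 0)                            # [10, 0, 20, 0, 30]
filled_median = fill_missing(raw, statistics.median(known))   # [10, 20, 20, 20, 30]

print(statistics.mean(filled_zero))    # 12  (BAD: dragged down by the zeros)
print(statistics.mean(filled_median))  # 20  (GOOD: same as the known values)
```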
  6. DATA ANALYSIS / REMOVE OUTLIERS: Replace the outlier with the median (30):

    10, 20, 30, 1000, 50 → 10, 20, 30, 30, 50. Mean = 28. GOOD!!! With the outlier left in, the mean would be 222.
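The deck does not say how the outlier is detected. This sketch assumes Tukey's fences (a point is an outlier if it lies more than 1.5 × IQR outside the quartiles), a common convention rather than the author's stated rule, and then replaces each outlier with the median as on the slide. `replace_outliers_with_median` is a hypothetical helper name.

```python
import statistics


def replace_outliers_with_median(values, k=1.5):
    """Replace points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences) by the median."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    med = statistics.median(values)  # median of the raw data, outlier included
    return [med if v < low or v > high else v for v in values]


raw = [10, 20, 30, 1000, 50]
cleaned = replace_outliers_with_median(raw)

print(cleaned)                   # [10, 20, 30, 30, 50]
print(statistics.mean(raw))      # 222
print(statistics.mean(cleaned))  # 28
```

The median is used because, unlike the mean, it barely reacts to the single extreme value.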
  7. REGRESSION: Find the house price.

    Surface (m²)   Rooms   Bedrooms   Garden (m²)   Price (€)
    200            5       2          200           500 000
    100            3       1          0             200 000
    300            5       2          300           800 000
    150            4       2          100           300 000
    200            4       1          200           ?
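The slide asks for the missing price but does not show the method. As a hedged illustration of linear regression, this sketch fits ordinary least squares on a single feature, the surface. That is a deliberate simplification: four known houses are too few to pin down a coefficient for every column plus an intercept. `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# The four houses with a known price (surface in m², price in €)
surfaces = [200, 100, 300, 150]
prices = [500_000, 200_000, 800_000, 300_000]

slope, intercept = fit_line(surfaces, prices)
predicted = slope * 200 + intercept  # the fifth house has 200 m²
print(round(predicted))  # 488571
```

The fitted slope is roughly 3 086 € per extra m²; adding the other columns would require more houses than the table provides.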
  8. ALGORITHMS

    • Decision Tree: Random Forest, Gradient Boosting, XGBoost
    • Deep Learning: Perceptron, Convolutional Neural Network, Recurrent Neural Network, Deep Boltzmann Machine
    • Regression: Linear Regression, Logistic Regression
    • Clustering: k-Means, k-Medians, Hierarchical Clustering
    • Bayesian: Gaussian Naive Bayes, Multinomial Naive Bayes, Bayesian Network
    • NLP: TF-IDF, Word2Vec
  9. DECISION TREES / SPLIT: Questions to ask: Passenger in class 3? Adult passenger? Male passenger? Each leaf gives the outcome: Alive or Dead.
  10. DECISION TREES / SPLIT: First split on "Passenger in class 3?" (YES / NO). One branch ends in a DEAD leaf.
  11. DECISION TREES / SPLIT: The other branch is split on "Adult?" (YES / NO). Again, one branch ends in a DEAD leaf.
  12. DECISION TREES / SPLIT: The last split, "Male?" (YES / NO), ends in an ALIVE leaf and a DEAD leaf.
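The slides show the splits but not how each question is chosen. A common criterion, assumed here rather than stated in the deck, is Gini impurity: at each node, pick the yes/no question whose split leaves the purest groups. The passengers below are hypothetical toy rows, and `gini`, `best_question`, `build_tree` and `predict` are illustrative helpers, not a library API.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 = pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_question(rows, labels, questions):
    """Pick the yes/no question whose split has the lowest weighted Gini impurity."""
    best, best_score = None, float("inf")
    for q in questions:
        yes = [lab for row, lab in zip(rows, labels) if row[q]]
        no = [lab for row, lab in zip(rows, labels) if not row[q]]
        if not yes or not no:
            continue  # this question does not separate the passengers
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if score < best_score:
            best, best_score = q, score
    return best


def build_tree(rows, labels, questions):
    """Split recursively; a leaf holds the majority outcome of its passengers."""
    majority = Counter(labels).most_common(1)[0][0]
    q = best_question(rows, labels, questions) if len(set(labels)) > 1 else None
    if q is None:
        return majority
    rest = [other for other in questions if other != q]
    branches = {}
    for answer in (True, False):
        idx = [i for i, row in enumerate(rows) if row[q] == answer]
        branches[answer] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return {"question": q, "yes": branches[True], "no": branches[False]}


def predict(tree, row):
    """Walk from the root, answering each question, until a leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["yes"] if row[tree["question"]] else tree["no"]
    return tree


# Hypothetical toy passengers (invented, not real Titanic records)
rows = [
    {"class3": True, "adult": True, "male": True},
    {"class3": True, "adult": False, "male": False},
    {"class3": True, "adult": True, "male": False},
    {"class3": False, "adult": True, "male": True},
    {"class3": False, "adult": True, "male": False},
    {"class3": False, "adult": False, "male": True},
    {"class3": False, "adult": False, "male": False},
]
labels = ["dead", "dead", "dead", "dead", "alive", "alive", "alive"]

tree = build_tree(rows, labels, ["class3", "adult", "male"])
print(tree["question"])  # class3
```

On these invented rows, "class 3?" gives the purest first split, as on the slides; the deeper branches depend entirely on the made-up data, so they should not be read as the deck's own tree.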
  13. CLUSTERING / DISTANCE

    Manhattan: $d(A,B) = |X_B - X_A| + |Y_B - Y_A|$
    Euclidean: $d(A,B) = \sqrt{(X_B - X_A)^2 + (Y_B - Y_A)^2}$
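Both distances translate directly into code. As an assumption beyond the slide, the sketch also shows where a distance is used in clustering: the assignment step of k-means, where each point joins its nearest centroid. `nearest_centroid` is a hypothetical helper, and the example points are made up.

```python
import math


def manhattan(a, b):
    """|X_B - X_A| + |Y_B - Y_A|, generalised to any number of coordinates."""
    return sum(abs(bi - ai) for ai, bi in zip(a, b))


def euclidean(a, b):
    """sqrt((X_B - X_A)^2 + (Y_B - Y_A)^2), generalised to n coordinates."""
    return math.sqrt(sum((bi - ai) ** 2 for ai, bi in zip(a, b)))


def nearest_centroid(point, centroids, distance=euclidean):
    """Index of the closest centroid (the assignment step of k-means)."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


A, B = (1, 2), (4, 6)
print(manhattan(A, B))  # 7
print(euclidean(A, B))  # 5.0

# The choice of distance can change which cluster a point joins
centroids = [(3, 3), (0, 5)]
print(nearest_centroid((0, 0), centroids))                      # 0
print(nearest_centroid((0, 0), centroids, distance=manhattan))  # 1
```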
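  14. PRECISION

    Reality \ Prediction   Positive         Negative
    True                   True positive    False negative
    False                  False positive   True negative

    $\text{Precision} = \frac{\text{True positive}}{\text{True positive} + \text{False positive}}$

    Precision is the share of positive predictions that are actually right.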
  15. RECALL

    With the same confusion matrix (reality True / False × prediction Positive / Negative):

    $\text{Recall} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}}$

    Recall is the share of real positives that the model actually finds.
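Both formulas can be checked on a handful of predictions with a minimal pure-Python sketch (scikit-learn provides the same metrics as `precision_score` and `recall_score`). The labels below are made up, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def confusion_counts(y_true, y_pred):
    """Count the four cells of the confusion matrix (True = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fp, fn, tn


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0  # 0.0 if nothing predicted positive


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0  # 0.0 if there are no real positives


y_true = [True, True, True, False, False, False]
y_pred = [True, True, False, False, False, False]
print(precision(y_true, y_pred))         # 1.0
print(round(recall(y_true, y_pred), 3))  # 0.667
```

Here every positive prediction is correct (precision 1.0), but one real positive is missed (recall 2/3), which is why the two metrics are read together.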
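  16. TRAIN & TEST

    Randomly split the dataset (features X, target y) into TRAIN (70%) and TEST (30%). Fit the model on the TRAIN X and y, predict y_predict from the TEST X, then measure the gap Δ between y_predict and the TEST y.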
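The split above can be sketched in a few lines of standard-library Python (scikit-learn offers the same idea as `train_test_split` with a `test_size` parameter). The toy data, the mean-predicting baseline "model" and the use of mean absolute error for Δ are assumptions for illustration only.

```python
import random
import statistics


def train_test_split(X, y, test_size=0.3, seed=None):
    """Randomly hold out `test_size` of the rows for testing (70/30 by default)."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # the "random" step on the slide
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


# Hypothetical toy data: one feature per row, target = 2 * feature
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, seed=42)

# Hypothetical baseline "model": always predict the mean of y_train
baseline = statistics.mean(y_train)
y_predict = [baseline for _ in X_test]

# Δ: mean absolute error between the predictions and the true test targets
delta = statistics.mean(abs(p - t) for p, t in zip(y_predict, y_test))
print(len(X_train), len(X_test))  # 7 3
```

The model never sees the TEST rows during training, so Δ estimates how it would do on new data.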
  17. RESOURCES

    • Coursera Machine Learning: https://www.coursera.org/learn/machine-learning
    • Fondamentaux et Etudes de cas, Eric Biernat: http://www.amazon.fr/dp/2212142439
    • Meetup Machine Learning Paris: http://www.meetup.com/fr-FR/Paris-Machine-learning-applications-group/
    • Le Meilleur Data Scientist de France: http://www.meetup.com/fr-FR/FrenchData/events/228508819/