1 Represent the problem as data
• Understand data ⇔ problem
• Understand and use Machine Learning
5 Evaluate the solution: business choices ⇔ algorithm performance
6 Communicate the solution and make decisions
7 Satisfied user
What is S?
An environment for statistical analysis.
1976 S developed in Fortran at Bell Laboratories.
1988 Rewritten in C.
1991 Implementation of R begins at the University of Auckland.
2007 Revolution Analytics founded to provide commercial support for R.
2008 S-PLUS becomes the property of TIBCO.
2015 R Consortium created.
     Revolution Analytics acquired by Microsoft.
     R at version 3.2.
Vector representation
Caroline
F M F
24 30 23

DataFrame representation
   prenoms  sexe  ages
1  Alice    F     24
2  Bob      M     30
3  Carol..  F     23

monDF[2, 'prenoms'], monDF[2, 1] and monDF$prenoms[2] all select the same cell: row 2 of the prenoms column.
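To make the three equivalent lookups concrete, here is a minimal sketch in Python (standard library only, not R) that models the data frame as a set of named columns. The `cell` helper and the `mon_df` name are hypothetical and exist only for this illustration; row and column positions are kept 1-based to mirror R.

```python
# Column-oriented model of the slide's data frame (an illustration,
# not how R implements data frames).
mon_df = {
    "prenoms": ["Alice", "Bob", "Carol.."],
    "sexe": ["F", "M", "F"],
    "ages": [24, 30, 23],
}


def cell(df, row, col):
    """Return one cell using R-style 1-based row numbers.

    `col` is either a column name or a 1-based column position.
    """
    if isinstance(col, int):
        col = list(df)[col - 1]  # dicts keep insertion order
    return df[col][row - 1]


by_name = cell(mon_df, 2, "prenoms")   # like monDF[2, 'prenoms']
by_position = cell(mon_df, 2, 1)       # like monDF[2, 1]
by_column = mon_df["prenoms"][2 - 1]   # like monDF$prenoms[2]
print(by_name, by_position, by_column)
```

All three expressions reach the same value because a data frame is a list of equal-length columns: picking the column first and then the row, or both at once, addresses the same cell.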
Lowercase
Doc1: i wanted to share with you what I feel are the remaining "to dos"
Doc2: Make big money with foreclosed real estate in your area!
⇒
Doc1: i wanted to share with you what i feel are the remaining "to dos"
Doc2: make big money with foreclosed real estate in your area!
Punctuation and special characters
Doc1: i wanted to share with you what i feel are the remaining "to dos"
Doc2: make big money with foreclosed real estate in your area!
⇒
Doc1: i wanted to share with you what i feel are the remaining to dos
Doc2: make big money with foreclosed real estate in your area
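The lowercasing and punctuation-removal steps can be sketched as follows. This is a minimal Python illustration using only the standard library, not the R code used in the course; the `normalize` function name is an assumption, and only ASCII punctuation is handled.

```python
import string

docs = [
    'i wanted to share with you what I feel are the remaining "to dos"',
    "Make big money with foreclosed real estate in your area!",
]

# Map every ASCII punctuation character to None so that
# str.translate deletes it.
drop_punctuation = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase the text, then strip ASCII punctuation."""
    return text.lower().translate(drop_punctuation)


cleaned = [normalize(d) for d in docs]
for line in cleaned:
    print(line)
```

Lowercasing first means that "I" and "i", or "Make" and "make", later count as the same word.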
Stop words
Doc1: i wanted to share with you what i feel are the remaining to dos
Doc2: make big money with foreclosed real estate in your area
⇒
Doc1: wanted share feel remaining dos
Doc2: make big money foreclosed real estate area
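Stop-word removal amounts to filtering tokens against a fixed list. The sketch below is in Python rather than R, and the `STOP_WORDS` set is a hypothetical, deliberately tiny list chosen to cover the two example documents; real stop-word lists are much longer.

```python
# Hypothetical mini stop-word list, just large enough for the examples.
STOP_WORDS = {"i", "to", "with", "you", "what", "are", "the", "in", "your"}


def remove_stop_words(text):
    """Split on whitespace and keep only tokens not in STOP_WORDS."""
    return [token for token in text.split() if token not in STOP_WORDS]


doc1 = "i wanted to share with you what i feel are the remaining to dos"
doc2 = "make big money with foreclosed real estate in your area"

kept1 = remove_stop_words(doc1)
kept2 = remove_stop_words(doc2)
print(" ".join(kept1))
print(" ".join(kept2))
```

The filter assumes the text is already lowercased and free of punctuation; otherwise "I" or "area!" would not match the entries in the list.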
Stemming
Doc1: wanted share feel remaining dos
Doc2: make big money foreclosed real estate area
⇒
Doc1: want share feel remain dos
Doc2: make big money foreclos real estat area
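To show the idea behind stemming, here is a deliberately crude suffix-stripping sketch in Python. It is not the Porter algorithm or any library's stemmer: the `toy_stem` function and its length thresholds are assumptions tuned only so that these example words reduce to the stems shown above.

```python
def toy_stem(word):
    """Strip at most one common suffix (toy rules, not Porter's)."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]   # remaining -> remain
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]   # wanted -> want, foreclosed -> foreclos
    if word.endswith("e") and len(word) > 5:
        return word[:-1]   # estate -> estat (share and make are kept)
    return word


doc1 = "wanted share feel remaining dos".split()
doc2 = "make big money foreclosed real estate area".split()

stems1 = [toy_stem(w) for w in doc1]
stems2 = [toy_stem(w) for w in doc2]
print(" ".join(stems1))
print(" ".join(stems2))
```

A stem does not have to be a dictionary word ("foreclos", "estat"); it only has to be the same string for every inflected form of a word, so that "wanted" and "wants" end up in the same column later on.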
Bag of Words
Doc1: i wanted to share with you what I feel are the remaining "to dos"
Doc2: Make big money with foreclosed real estate in your area!

      want  share  feel  remain  dos  make  big  money  foreclos  real  estat  area
Doc1     1      1     1       1    1     0    0      0         0     0      0     0
Doc2     0      0     0       0    0     1    1      1         1     1      1     1
Bag of Words
Doc3: She wants to make sure that all the sharing remains good for you!

      want  share  feel  remain  dos  make  big  money  foreclos  real  estat  area
Doc1     1      1     1       1    1     0    0      0         0     0      0     0
Doc2     0      0     0       0    0     1    1      1         1     1      1     1
Doc3     1      1     0       1    0     1    0      0         0     0      0     0

Doc3 is encoded against the existing vocabulary: its row is filled in one word at a time (want, make, share, remain), and words with no column are ignored.
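The construction of the matrix can be sketched in a few lines of Python (standard library only, not R). The token list for Doc3 is an assumption: it is what the earlier preprocessing steps would plausibly produce for that sentence, and the `bag_of_words` helper is a hypothetical name.

```python
# Preprocessed documents (lowercased, cleaned, stemmed).
doc1 = "want share feel remain dos".split()
doc2 = "make big money foreclos real estat area".split()
# Assumed preprocessing result for Doc3.
doc3 = "want make sure share remain good".split()

# The vocabulary is built from Doc1 and Doc2 only, in order of first
# appearance, so each term owns one column.
vocabulary = []
for token in doc1 + doc2:
    if token not in vocabulary:
        vocabulary.append(token)


def bag_of_words(tokens, vocab):
    """Count how often each vocabulary term occurs in the tokens."""
    return [tokens.count(term) for term in vocab]


rows = {
    "Doc1": bag_of_words(doc1, vocabulary),
    "Doc2": bag_of_words(doc2, vocabulary),
    "Doc3": bag_of_words(doc3, vocabulary),
}
for name, row in rows.items():
    print(name, row)
```

Terms such as "sure" and "good" are silently dropped for Doc3 because the vocabulary was fixed before Doc3 arrived; this is why a new document can always be encoded with the same number of columns as the training documents.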
How do we predict a categorical variable (spam/ham)?
⇒ Logistic regression: predict the probability that an outcome is true, P(y = class).
A probability is a number between 0 and 1 (inclusive).
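As a rough sketch of how logistic regression turns word counts into a probability, the Python snippet below applies the logistic (sigmoid) function to a weighted sum of features. The weights and intercept are made-up values for illustration only, not coefficients fitted on any data, and `spam_probability` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def spam_probability(features, weights, intercept):
    """P(y = spam) for one document under a logistic model."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Bag-of-words rows for two documents over a 12-term vocabulary.
doc1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
doc2 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Made-up coefficients: negative for the first five terms,
# positive for the last seven.
weights = [-0.5] * 5 + [0.6] * 7
intercept = -1.0

p_doc1 = spam_probability(doc1, weights, intercept)
p_doc2 = spam_probability(doc2, weights, intercept)
print(round(p_doc1, 3), round(p_doc2, 3))
```

Choosing a class then means comparing the probability with a threshold (0.5 is a common default): above it the e-mail is labeled spam, below it ham.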
manipulated energy generation capacity and forced prices up
• Investigators are seeking to determine Enron's role during the crisis and estimate a fine of $1.52 billion
keywords
  ◦ “electricity prices”
  ◦ “energy distribution schedule”
• This method is time-consuming and expensive
  ◦ A lawyer can review 80 to 125 e-mails (with attachments) per day
  ◦ Fees can reach $1,000/day
  ◦ Courts accept only documents verified by a licensed attorney