Addam Hardy
November 10, 2015

Decision Trees, Data Science, and Machine Learning: Using Entropy to Discover Path to Purchase

Transcript

  1. DECISION TREES, DATA SCIENCE & MACHINE LEARNING: USING ENTROPY TO DISCOVER PATH TO PURCHASE

    ADDAM HARDY
    NWA Tech Summit, 10 Nov 2015
  2. [Slide filled with a wall of RGB color triples, from RGB(84,109,172) through RGB(93,219,123)]
  3. THERE IS NO LACK OF TOOLS:

    ID3 DECISION TREES
    LOGISTIC REGRESSION
    RANDOM FORESTS
    SUPPORT VECTOR MACHINES
    NEURAL NETWORKS
    NAIVE BAYES
    K-MEANS
    DEEP BOLTZMANN MACHINE
    PRINCIPAL COMPONENT ANALYSIS
    AND ON.. AND ON..
  6. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    PREDICTORS                                 TARGET
    Outlook    Temp   Humidity   Windy         Run?
    Sunny      Hot    High       Weak          No
    Sunny      Hot    High       Strong        No
    Overcast   Hot    High       Weak          Yes
    Rain       Mild   High       Weak          Yes
    Rain       Cool   Normal     Weak          Yes
    Rain       Cool   Normal     Strong        No
    Overcast   Cool   Normal     Strong        Yes
    Sunny      Mild   High       Weak          No
    Sunny      Cool   Normal     Weak          Yes
    Rain       Mild   Normal     Weak          Yes
    Sunny      Mild   Normal     Strong        Yes
    Overcast   Mild   High       Strong        Yes
    Overcast   Hot    Normal     Weak          Yes
    Rain       Mild   High       Strong        No
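As a minimal sketch of how the table above might be held in code, the block below stores the 14 rows as tuples and separates the four predictor columns from the target column. The names `COLUMNS`, `ROWS`, `PREDICTORS`, `TARGET`, `X`, and `y` are illustrative choices, not something taken from the slides.

```python
# The 14 observations from the slide, one tuple per row.
COLUMNS = ("Outlook", "Temp", "Humidity", "Windy", "Run?")
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

# Hypothetical naming: the last column is the target, the rest are predictors.
TARGET = "Run?"
PREDICTORS = [name for name in COLUMNS if name != TARGET]

records = [dict(zip(COLUMNS, row)) for row in ROWS]
X = [{name: rec[name] for name in PREDICTORS} for rec in records]  # predictors only
y = [rec[TARGET] for rec in records]                               # target only

print(PREDICTORS)
print(len(X), "rows;", y.count("Yes"), "Yes /", y.count("No"), "No")
```

The Yes/No counts printed here (9 and 5) are the inputs to the single-attribute entropy calculation later in the deck.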
  8. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    ITERATIVE DICHOTOMISER 3 (ID3) IS A TOP-DOWN, GREEDY SEARCH THROUGH THE SPACE OF POSSIBLE BRANCHES WITH NO BACKTRACKING. USING THIS METHOD, WE PARTITION A DATA SET AND MEASURE ENTROPY AND INFORMATION GAIN AT EACH SPLIT TO DECIDE HOW TO STRUCTURE THE DECISION TREE.

    ALRIGHT, ENOUGH DEFINITION
  13. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    SINGLE ATTRIBUTE CALCULATION

        Run
    Yes    No
     9      5

    E(Run) = E(5, 9)
           = E(5/14, 9/14)
           = E(0.36, 0.64)
           = -(0.36 log2 0.36) - (0.64 log2 0.64)
           = 0.94
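The single-attribute calculation can be checked with a few lines of Python. This is a minimal sketch, assuming the 9 Yes / 5 No split from the slide; the function name `entropy` is an illustrative choice.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels.

    Each class contributes p * log2(1 / p), which equals -p * log2(p).
    """
    total = len(labels)
    return sum(n / total * log2(total / n) for n in Counter(labels).values())


# Target column from the slide: 9 "Yes" and 5 "No".
run = ["Yes"] * 9 + ["No"] * 5
print(round(entropy(run), 2))  # 0.94
```

A column with only one class has entropy 0, and a 50/50 split has entropy 1, so 0.94 says the target is close to evenly mixed before any split.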
  17. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    MULTIPLE ATTRIBUTE CALCULATION

                         Run
                         Yes   No   Total
    Outlook   Sunny       2     3     5
              Overcast    4     0     4
              Rain        3     2     5
                                     14

    E(Run, Outlook) = P(Sunny) * E(2, 3) + P(Overcast) * E(4, 0) + P(Rain) * E(3, 2)
                    = (5/14) * 0.971 + (4/14) * 0.0 + (5/14) * 0.971
                    = 0.693
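The multiple-attribute calculation is a weighted average: the entropy of each Outlook group, weighted by the share of rows in that group. The sketch below recomputes it from the Outlook and Run? columns of the 14-row table earlier in the deck. The helper name `split_entropy` is an assumption made for illustration. At full precision the result is about 0.694; the slide's 0.693 comes from rounding the intermediate entropies to three decimals.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return sum(n / total * log2(total / n) for n in Counter(labels).values())


def split_entropy(values, labels):
    """Weighted average entropy of `labels` after grouping rows by `values`."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())


# Outlook and Run? columns, in the row order of the 14-row table.
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]
run = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
       "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

after_split = split_entropy(outlook, run)
print(f"{after_split:.4f}")
```

The Overcast group contributes nothing to the sum because all four of its rows are Yes, which is why splitting on Outlook lowers the entropy so much.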
  18. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    INFORMATION GAIN IS THE DIFFERENCE IN ENTROPY BEFORE AND AFTER THE DATA IS PARTITIONED.
  22. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    INFORMATION GAIN CALCULATION

    G(T, X) = E(T) - E(T, X)

                         Run
                         Yes   No
    Outlook   Sunny       2     3
              Overcast    4     0
              Rain        3     2
    Gain = 0.247 (HIGHEST INFORMATION GAIN)

                         Run
                         Yes   No
    Temp      Hot         2     2
              Mild        4     2
              Cool        3     1
    Gain = 0.029

    G(Run, Outlook) = E(Run) - E(Run, Outlook)
                    = 0.940 - 0.693
                    = 0.247
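To see why Outlook wins the first split, the same gain calculation can be run for every predictor in the 14-row table. This is a sketch under the assumption that the table is stored as tuples with the target in the last position; `information_gain`, `gains`, and `best` are illustrative names.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ("Outlook", "Temp", "Humidity", "Windy", "Run?")
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return sum(n / total * log2(total / n) for n in Counter(labels).values())


def information_gain(rows, column, target=4):
    """Entropy of the target before the split minus the weighted entropy after it."""
    labels = [row[target] for row in rows]
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - after


# Gain for each of the four predictor columns (indexes 0 to 3).
gains = {COLUMNS[i]: information_gain(ROWS, i) for i in range(4)}
best = max(gains, key=gains.get)

for name, value in gains.items():
    print(f"{name:<9}{value:.3f}")
print("split on:", best)
```

The Outlook and Temp values round to the 0.247 and 0.029 shown on the slide. Humidity and Windy are not shown on the slide but go through the same calculation.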
  23. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    THE ALGORITHM IS RUN ON EACH BRANCH RECURSIVELY UNTIL IT TERMINATES ON A LEAF NODE.
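The recursion can be sketched compactly. The block below is one possible reading of ID3 rather than the speaker's own code. It picks the attribute with the highest information gain, splits the rows on it, and repeats on each branch until a branch is pure or no attributes remain. The nested-dict tree shape and the names `id3`, `partition`, and `predict` are assumptions made for illustration.

```python
from collections import Counter
from math import log2

COLUMNS = ("Outlook", "Temp", "Humidity", "Windy", "Run?")
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
rows = [dict(zip(COLUMNS, r)) for r in ROWS]


def entropy(rows, target):
    """Shannon entropy, in bits, of the target column over `rows`."""
    counts = Counter(r[target] for r in rows)
    return sum(n / len(rows) * log2(len(rows) / n) for n in counts.values())


def partition(rows, attr):
    """Group rows by their value for `attr`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return groups


def gain(rows, attr, target):
    """Information gain from splitting `rows` on `attr`."""
    after = sum(len(g) / len(rows) * entropy(g, target)
                for g in partition(rows, attr).values())
    return entropy(rows, target) - after


def id3(rows, attrs, target):
    """Return a leaf label or a nested dict {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # pure branch: stop with a leaf
        return labels[0]
    if not attrs:                      # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))  # greedy choice
    rest = [a for a in attrs if a != best]
    return {best: {value: id3(subset, rest, target)
                   for value, subset in partition(rows, best).items()}}


def predict(tree, row):
    """Walk the tree to a leaf; raises KeyError on a value never seen in training."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


tree = id3(rows, ["Outlook", "Temp", "Humidity", "Windy"], "Run?")
print(tree)
```

On this table the root is Outlook, the Overcast branch is a leaf, and the Sunny and Rain branches split once more on Humidity and Windy respectively. The greedy choice at each node is never revisited, which is the "no backtracking" property mentioned earlier in the deck.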
  24. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    FROM THIS:

    Outlook    Temp   Humidity   Windy    Run?
    Sunny      Hot    High       Weak     No
    Sunny      Hot    High       Strong   No
    Overcast   Hot    High       Weak     Yes
    Rain       Mild   High       Weak     Yes
    Rain       Cool   Normal     Weak     Yes
    Rain       Cool   Normal     Strong   No
    Overcast   Cool   Normal     Strong   Yes
    Sunny      Mild   High       Weak     No
    Sunny      Cool   Normal     Weak     Yes
    Rain       Mild   Normal     Weak     Yes
    Sunny      Mild   Normal     Strong   Yes
    Overcast   Mild   High       Strong   Yes
    Overcast   Hot    Normal     Weak     Yes
    Rain       Mild   High       Strong   No
  25. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    DEMOGRAPHIC DATA + PURCHASE

    Gender   Ethnicity          Income     Age     Purchase
    Female   African American   50k-100k   45-54   No
    Male     African American   50k-100k   18-24   Yes
    Female   Hispanic           <50k       25-34   Yes
    Male     African American   <50k       45-54   Yes
    Female   Asian              >100k      35-44   Yes
    Female   Hispanic           <50k       18-24   No
    Female   Asian              50k-100k   25-34   No
    Female   African American   <50k       25-34   No
    Male     Hispanic           >100k      25-34   No
    Male     Caucasian          50k-100k   25-34   No
    Female   Hispanic           >100k      18-24   No
    Female   Asian              >100k      18-24   Yes
    Male     African American   <50k       55+     Yes
  26. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    DEMOGRAPHIC DATA + PURCHASE
    FEATURES: GENDER, ETHNICITY, INCOME, AGE
    83.3% ACCURACY
  27. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    DEMOGRAPHIC DATA + PURCHASE
    FEATURES: AGE, INCOME, ETHNICITY
    83.3% ACCURACY
  28. DECISION TREES: ITERATIVE DICHOTOMISER 3 (ID3)

    DEMOGRAPHIC DATA + PURCHASE
    FEATURES: GENDER, ETHNICITY, INCOME, AGE
    WE SHOULD TRY FOCUSING OUR MARKETING ON HISPANICS AGED 25-34 WHO MAKE LESS THAN $50K A YEAR