Strataconf 2014 - Chicago Bars, Prisoner's Dilemma, and Practical Models in Search

My talk from Strataconf 2014... it has a Shia LaBeouf GIF

Chris Harland

April 08, 2014

Transcript

  1. Chicago Bars, Prisoner’s Dilemma, and Practical Models in Search Chris

    Harland Data Scientist @ Microsoft Strataconf 2014 Santa Clara, Ca @cdubhland chharlan@microsoft.com
  2. Chicago Bars, Prisoner’s Dilemma, and Practical Models in Search Chris

    Harland Data Scientist @ Microsoft Strataconf 2014 Santa Clara, Ca @cdubhland chharlan@microsoft.com Who I am…
  3. Chicago Bars, Prisoner’s Dilemma, and Practical Models in Search Chris

    Harland Data Scientist @ Microsoft Strataconf 2014 Santa Clara, Ca @cdubhland chharlan@microsoft.com Who I am… What I do…
  4. Chicago Bars, Prisoner’s Dilemma, and Practical Models in Search Chris

    Harland Data Scientist @ Microsoft Strataconf 2014 Santa Clara, Ca @cdubhland chharlan@microsoft.com Who I am… What I do… Where I work…
  5. Chicago Bars, Prisoner’s Dilemma, and Practical Models in Search Chris

    Harland Data Scientist @ Microsoft Strataconf 2014 Santa Clara, Ca @cdubhland chharlan@microsoft.com Who I am… What I do… Where I work… Where you can find me…
  6. Not a technical talk… Not an infrastructure talk… Not an

    “in the future” talk…
  7. Going from question -> data -> models -> information ->

    solution
  8. Going from question -> data -> models -> information ->

    solution (by example)
  9. What about the bars? Living in Chicago taught me some

    things…
  10. People take drinking seriously…

  11. None
  12. The “best” model is not always the one that achieves

    the desired results…
  13. The “best” model is not always the one that achieves

    the desired results…
  14. Goal: Best score on leaderboard The “best” model is not

    always the one that achieves the desired results… Goal: Make money
  15. Life is tough for a Chicago Bar… “No new tavern

    licenses can be issued to any location that is within 400 feet of existing businesses already licensed for the sale of alcoholic liquor in certain zoning districts.” – City of Chicago Ordinance
  16. Bars need to make more money

  17. Bars need to make more money Get new customers

  18. Bars need to make more money Get new customers Or…

    Get current customers to spend more
  19. Bars need to make more money Get new customers Or…

    Get current customers to spend more
  20. Mail drop…

  21. Mail drop… You don’t choose

  22. Mail drop… You don’t choose They promise ~200k

  23. Mail drop… You don’t choose They promise ~200k You get

    what you get…
  24. Build a recommendation (or at least rank)

  25. Build a recommendation (or at least rank) Collaborative Filtering

  26. Build a recommendation (or at least rank) Collaborative Filtering From

    my bag of 200k possible “users”…who do I send a mailer to first?
  27. Users + Potential Users Bars or User Features

  28. User Group 02 0.986 User Group 15 0.963 User Group

    13 0.942 User Group 20 0.921 User Group 16 0.900 User Group 05 0.898 My Users Start Sending Mail
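The ranking on the slide above (candidate user groups scored against "My Users", best score mailed first) can be sketched as a similarity ranking. This is only an illustration: the talk does not say which collaborative filtering method or similarity measure was used, so cosine similarity is an assumption here, and the feature vectors and group names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_groups(my_users, candidate_groups):
    """Return (group, score) pairs, most similar to my_users first."""
    scored = [(name, cosine(my_users, vec)) for name, vec in candidate_groups.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical aggregate feature profiles (not data from the talk).
my_users = [0.8, 0.1, 0.6]
candidates = {
    "group_a": [0.7, 0.2, 0.5],
    "group_b": [0.1, 0.9, 0.0],
    "group_c": [0.8, 0.1, 0.6],
}

ranking = rank_groups(my_users, candidates)
for name, score in ranking:
    print(f"{name} {score:.3f}")
```

Mailers would then go out in the order of `ranking`, stopping wherever the budget runs out.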
  29. 13.5% lift in response rate

  30. 13.5% lift in response rate Response rate -> $$

  31. 13.5% lift in response rate Response rate -> $$ No

    change in business model / tactics
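For reference, here is one way a lift figure like this can be computed. The slides do not say whether the 13.5% is relative or absolute, so relative lift over an untargeted baseline is an assumption, and the counts below are hypothetical numbers chosen only to reproduce a 13.5% result.

```python
def response_rate(responses, mailers_sent):
    """Fraction of mailers that produced a response."""
    if mailers_sent <= 0:
        raise ValueError("mailers_sent must be positive")
    return responses / mailers_sent


def relative_lift(baseline_rate, targeted_rate):
    """Relative improvement of the targeted rate over the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (targeted_rate - baseline_rate) / baseline_rate


# Hypothetical counts (not from the talk).
baseline = response_rate(200, 10_000)   # untargeted mail drop
targeted = response_rate(227, 10_000)   # mailers sent in ranked order
lift = relative_lift(baseline, targeted)
print(f"lift: {lift:.1%}")
```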
  32. To a Bar Owner… This is magic…

  33. What user features are important?

  34. What user features are important?

  35. What user features are important? IsCoronaDrinker: -> strong predictive power

    -> captures a lot of variation
  36. What user features are important? IsCoronaDrinker: -> strong predictive power

    -> captures a lot of variation How does a bar digest this information?
  37. What user features are important? IsCoronaDrinker: -> strong predictive power

    -> captures a lot of variation How does a bar digest this information? They don’t need a model… They need an action
  38. “I didn’t hire you to teach me to computer” -

    Bar owner that keeps it real
  39. “It is…rude to hand someone…a distribution when all [they] asked

    for was an estimate” - Cam Davidson-Pilon
  40. Don’t waste time optimizing CF for them… Just get to

    the actions
  41. Bars need to make more money Get new customers Or…

    Get current customers to spend more
  42. How do we interpret this problem given the data at

    hand?
  43. How do we interpret this problem given the data at

    hand? Bucket users into spending types
  44. How do we interpret this problem given the data at

    hand? Bucket users into spending types Find the good buckets
  45. None
  46. None
  47. Potential Users

  48. Potential Users

  49. Move them to the good bucket =)

  50. Build RF classifiers… Find variables that segment strongly… Tell the

    bars to change themselves accordingly…
  51. Old friend Corona… IsCoronaDrinker is of high importance / predictive

    power
  52. Old friend Corona… IsCoronaDrinker is of high importance / predictive

    power But it doesn’t help my bar… They can’t make you drink Corona…
  53. Remove features from model…leave ones with actionable segmentation

  54. Remove features from model…leave ones with actionable segmentation Pay cost

    in accuracy for the benefit of action… (few percent)
  55. Remove features from model…leave ones with actionable segmentation Pay cost

    in accuracy for the benefit of action… (few percent) visitsHappyHour bubbles up the variable importance…
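The idea of dropping non-actionable features and seeing what rises to the top can be sketched without a full model. The talk used random forest classifiers and their variable importance. The score below is a much simpler stand-in (the gap in good-bucket rate between users with and without a feature), and the user records are hypothetical. Only the feature names IsCoronaDrinker and visitsHappyHour come from the slides.

```python
def segmentation_strength(users, feature, label="good_bucket"):
    """Absolute gap in good-bucket rate between users with and without a feature."""
    with_f = [u[label] for u in users if u[feature]]
    without_f = [u[label] for u in users if not u[feature]]
    if not with_f or not without_f:
        return 0.0
    return abs(sum(with_f) / len(with_f) - sum(without_f) / len(without_f))


def rank_features(users, features, actionable=None):
    """Rank features by segmentation strength, optionally keeping only actionable ones."""
    kept = [f for f in features if actionable is None or f in actionable]
    scored = [(f, segmentation_strength(users, f)) for f in kept]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical user records (not data from the talk).
users = [
    {"IsCoronaDrinker": True, "visitsHappyHour": True, "good_bucket": True},
    {"IsCoronaDrinker": True, "visitsHappyHour": True, "good_bucket": True},
    {"IsCoronaDrinker": True, "visitsHappyHour": False, "good_bucket": True},
    {"IsCoronaDrinker": False, "visitsHappyHour": True, "good_bucket": False},
    {"IsCoronaDrinker": False, "visitsHappyHour": False, "good_bucket": False},
    {"IsCoronaDrinker": False, "visitsHappyHour": False, "good_bucket": False},
]
features = ["IsCoronaDrinker", "visitsHappyHour"]

all_ranked = rank_features(users, features)
# A bar can change its happy hour; it cannot change what a customer drinks.
actionable_ranked = rank_features(users, features, actionable={"visitsHappyHour"})
print(all_ranked)
print(actionable_ranked)
```

In this toy data IsCoronaDrinker separates the buckets best, but once it is excluded the weaker, actionable visitsHappyHour is what the bar gets told about.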
  56. What exactly is Happy Hour?

  57. None
  58. Amount of special time users spend at your bar

  59. Amount of perceived special time users spend at your bar

  60. Amount of perceived special time users spend at your bar

    Cheap food for full price drinks…
  61. Ulterior motive: start at happy hour… stay past… $$

  62. Ulterior motive: start at happy hour… stay past… $$ Happy

    Hour Time Window
  63. Ulterior motive: start at happy hour… stay past… $$ Happy

    Hour Time Window Just put people on the edge of the window
  64. Okay…but how do you do all of this for one

    bar?
  65. Okay…but how do you do all of this for one

    bar? You don’t…you do it for a lot of bars
  66. Pool data to make recommendations to everyone

  67. Pool data to make recommendations to everyone Problem…competitive information sharing

  68. Image by Chris Jensen and Greg Riestenberg Players: Diamond Circle

    Scenario: Crime Decision: Defect Penalty: Prison
  69. Image by Chris Jensen and Greg Riestenberg Players: Diamond Circle

    Scenario: Crime Decision: Defect Penalty: Prison Turn on each other
  70. Image by Chris Jensen and Greg Riestenberg Players: Diamond Circle

    Scenario: Crime Decision: Defect Penalty: Prison Luck out
  71. Image by Chris Jensen and Greg Riestenberg Players: Diamond Circle

    Scenario: Crime Decision: Defect Penalty: Prison Keep quiet
  72. Image by Chris Jensen and Greg Riestenberg Players: Diamond Circle

    Scenario: Crime Decision: Defect Penalty: Prison Players: Bar A Bar B Scenario: Data Decision: Share Penalty: Loss of potential $$
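The mapping from prisoners to bars can be made concrete with a small payoff table. The payoff numbers below are hypothetical and follow the standard prisoner's dilemma ordering. Treating "share" as the cooperative move and "withhold" as the defection is one reading of the slide, not something the talk spells out.

```python
# PAYOFFS[(a_choice, b_choice)] = (payoff to Bar A, payoff to Bar B).
# Hypothetical values, not figures from the talk.
PAYOFFS = {
    ("share", "share"): (3, 3),        # both benefit from pooled data
    ("share", "withhold"): (0, 5),     # A gives data away, B free-rides
    ("withhold", "share"): (5, 0),     # A free-rides on B
    ("withhold", "withhold"): (1, 1),  # nobody gains anything new
}
CHOICES = ("share", "withhold")


def best_response_for_a(b_choice):
    """Bar A's payoff-maximizing choice given Bar B's choice."""
    return max(CHOICES, key=lambda a: PAYOFFS[(a, b_choice)][0])


def dominant_strategy_for_a():
    """Return A's choice if it is best against every B choice, else None."""
    responses = {best_response_for_a(b) for b in CHOICES}
    return responses.pop() if len(responses) == 1 else None


dominant = dominant_strategy_for_a()
both_share = sum(PAYOFFS[("share", "share")])
both_withhold = sum(PAYOFFS[("withhold", "withhold")])
print(dominant, both_share, both_withhold)
```

With these numbers each bar does best individually by withholding, even though both sharing leaves them collectively better off, which is the tension the pooled-data approach has to resolve.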
  73. One big problem… Once a bar shares data…there’s no going

    back…
  74. We hold all the data…but don’t expose to participants This

    is the most crucial piece of the whole system… Central Data Bar A Bar B Bar C
  75. But what about search users? Search is whatever users want

    it to be… Value can come from exploring search behavior and surfacing scenarios…
  76. None
  77. What “most” people do…

  78. What “most” people do… What an almost equal number of

    people do…
  79. What is a user to a data scientist?

  80. What is a user to a data scientist? Collection of

    log lines…
  81. What is a user to a data scientist? Collection of

    log lines… What does a user mean when they type “Tom Cruise”?
  82. None
  83. Tom Cruise Mia Sara

  84. Tom Cruise Mia Sara Tim Curry

  85. Tom Cruise Mia Sara Tim Curry David Bennent

  86. We have a graph… We have an adjacent graph (think

    Wikipedia)… Find the “best” path between nodes in adjacent graph…
  87. Legend Tom Cruise Mia Sara Tim Curry David Bennent Nodes

    are from session graph… Links are from adjacent graph Defining the transition from one graph to the other is tough
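A toy version of the two-graph idea: take the entities a user searched for in one session as nodes, then look for paths between them in an adjacent knowledge graph. The talk does not define the "best" path, so unweighted shortest path via breadth-first search is an assumption here, and the tiny adjacent graph below is hand-built for illustration.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def connecting_nodes(graph, session):
    """Count how often each intermediate node links consecutive session queries."""
    counts = {}
    for a, b in zip(session, session[1:]):
        path = shortest_path(graph, a, b)
        if path:
            for node in path[1:-1]:
                counts[node] = counts.get(node, 0) + 1
    return counts


# Hand-built adjacent graph (actor <-> film links), undirected.
edges = [
    ("Tom Cruise", "Legend"), ("Mia Sara", "Legend"),
    ("Tim Curry", "Legend"), ("David Bennent", "Legend"),
    ("Tom Cruise", "Top Gun"),
    ("Mia Sara", "Ferris Bueller's Day Off"),
    ("Tim Curry", "The Rocky Horror Picture Show"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)

session = ["Tom Cruise", "Mia Sara", "Tim Curry", "David Bennent"]
counts = connecting_nodes(graph, session)
print(counts)
```

The node that keeps appearing between consecutive queries is a candidate for what the session was "about".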
  88. To this user… Right now… Tom Cruise means “Legend”

  89. Models like this are great for back end understanding…

  90. Models like this are great for back end understanding… They

    allow for long tail behavior bucketing…
  91. Models like this are great for back end understanding… They

    allow for long tail behavior bucketing… But…they are bad for naïve application…almost no one saw “Legend”
  92. Models like this are great for back end understanding… They

    allow for long tail behavior bucketing… But…they are bad for naïve application…almost no one saw “Legend” And can sometimes transition to production…
  93. When making a model… Create with a purpose… Abstract your

    business question…but not too far… Understand when good is good enough…
  94. Chicago Bars, Prisoner’s Dilemma, and Practical Models in Search Chris

    Harland Data Scientist @ Microsoft Strataconf 2014 Santa Clara, Ca @cdubhland chharlan@microsoft.com