Topic Modelling on Customer Reviews

A look at how N Brown approached classifying their Customer Reviews, whilst also taking us through the company's data science journey.

Chris Billingham

September 13, 2018

Transcript

  1. Who we are

    Data Scientist and Analytical Business Partner
    - Joined N Brown in 2016
    - Previously worked at Openreach
    - Had a non-traditional Data Science upbringing
  2. Background – Customer Pulse

    “We need a better way of understanding what our customers are saying”

    - “Excellent experience shopping both online and on the phone.” – JD Williams Customer
    - “Order arrived very promptly, items were as described and pictured. Would order again.” – Simply Be Customer
    - “I used “Joke a mo” to buy a pair of jeans and a polo shirt…” – Jacamo Customer
  3. Customer Pulse – The Word Climate

    What are people talking about this week?

    Positive / Negative

    Rank  Word         Frequency  WOW
    1     product      18         61%
    2     great        21         50%
    3     pay          14         25%
    4     wait         20         19%
    5     quickly      16         14%
    6     service      44         12%
    7     stock        36         12%
    8     experience   20         10%

    Rank  Word         Frequency  WOW
    1     dress        58         110%
    2     shoes        51         107%
    3     date         21         77%
    4     friends      16         47%
    5     jacamo       32         47%
    6     things       20         45%
    7     class        17         44%
    8     comfortable  21         42%
  4. Customer Pulse – What we learned

    - It (mostly) worked
    - It only worked on my computer
    - I didn’t entirely understand what it was doing
    - It needed some manual finessing each week
    - Other people want to know R
    - The business was hungry for more detail
  5. Customer Pulse – What happened next

    - “This is all well and good but what kind of things are our customers talking about?”
    - “This code is a mess, why can’t other people run it? Hey what’s this tidytext thing?”
  6. “The Jane Austen Epidemic”

    - “But every time I turn R on bloody Jane Austen is there.”
    - “This seems much easier, I think we can use this over the tm package.”
  7. The N Brown R User Group

    - Bi-monthly or so
    - Great opportunity to share
    - “Analysts Anonymous” sessions
    - An opportunity to berate people into using projects
    - Cross-business
  9. What we learned about Topic Models

    - You should always, always properly read the manual
    - You need to spend the time tuning your available tokens to get a coherent model
    - The stm package is a fantastic tool for generating and assessing good quality models, particularly when you have accompanying metadata
    - Progress bars are good
    - Some of your topics are just going to be wonky; it’s down to you to minimise those
    - At some point you’re going to go manual and hand classify
  10. …and our Data Science Team

    - We now have a bi-monthly (or so) R User Group hosted by the team
    - R is the primary tool of choice (we have acquired some Pythonistas as well)
    - Use Projects (mostly)
    - Are using Version Control (mostly)
    - Currently in the process of building out an on-prem Data Science server for those bigger workloads
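
The Word Climate on slide 3 lists words alongside a frequency and a WOW percentage. As a rough illustration only, the sketch below shows one way a table of that shape could be computed. It is written in Python to stay self-contained, whereas the talk's own work was done in R. The stop-word list, the function names, the sample reviews, and the reading of WOW as the week-on-week percentage change in a word's frequency are all assumptions, not details taken from the slides.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be far longer and is
# probably where much of the weekly "manual finessing" would happen.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "i", "it", "to", "of"}


def tokenize(text):
    """Lower-case a review, split it into words and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def word_counts(reviews):
    """Count how often each remaining word occurs across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(review))
    return counts


def word_climate(this_week, last_week, top_n=8):
    """Return (rank, word, frequency, change_pct) rows for this week.

    change_pct is the assumed WOW figure: the percentage change in a
    word's frequency against last week. Words with no occurrences last
    week are skipped because the change is undefined for them.
    """
    now = word_counts(this_week)
    before = word_counts(last_week)
    rows = []
    for word, freq in now.items():
        prev = before.get(word, 0)
        if prev == 0:
            continue
        change_pct = round(100 * (freq - prev) / prev)
        rows.append((word, freq, change_pct))
    # Largest movers first; ties are broken alphabetically.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return [(rank, w, f, c) for rank, (w, f, c) in enumerate(rows[:top_n], 1)]


if __name__ == "__main__":
    # Made-up reviews, purely for demonstration.
    last_week = ["The dress was great", "Great service", "Service was slow"]
    this_week = [
        "Great dress, great service",
        "The dress arrived quickly",
        "Lovely dress",
    ]
    for rank, word, freq, change in word_climate(this_week, last_week):
        print(f"{rank}  {word:<10} {freq:>3} {change:>5}%")
```

Skipping words with no baseline is a design choice in this sketch: it avoids a divide-by-zero, at the cost of hiding words that are entirely new this week.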