Mining the Voice of the Customer (Stefan Debortoli, MineMyText)

It is estimated that more than 80% of today’s data is stored in unstructured form (e.g., text, audio, video), and much of its content is expressed in rich and ambiguous natural language. Traditionally, natural language has been analyzed with qualitative data analysis approaches such as manual coding. Yet the size of the textual datasets available from online social networks like Twitter or rating platforms (e.g., Tripadvisor, Yelp, Amazon) exceeds the information processing capacity of human analysts. One way to overcome this limitation is to use text mining techniques to (semi-)automatically extract implicit, previously unknown, and potentially useful knowledge from large amounts of unstructured textual data. Although text mining and related natural language processing techniques only scratch the surface of the meaning of natural language, they have proven to be reliable tools when fed with sufficiently large datasets.

The goal of this talk is to demonstrate how topic modeling reduces the effort needed to analyze unstructured, textual customer feedback. Topic models are unsupervised machine learning algorithms for inductively discovering latent topics running through large collections of documents. Topic modeling algorithms identify topics in a purely data-driven way: they require neither prior labelling of documents, nor predefined categorization schemes, nor human input. The talk will include a live demo with real data.

Presentation given at the 4ländereck Data Science Meetup: https://www.meetup.com/4laendereck-Data-Science-Meetup/events/234041619/

Transcript

  1. 1 Mining the Voice of the Customer

    1st Data Science Meetup @ Avira
    Stefan Debortoli, www.MineMyText.com
  2. 5

  3. 7 Advanced Text Analytics

    Exemplary customer review about a Fitbit Flex: «I bought this for my 14 year old daughter as a gift. She received it in July. It works great - she lost 6 pounds in 2 weeks. The Fitbit makes staying in shape easy. The iPhone app works fine.»

    Topics: Gift / present, Losing weight, Mobile app
    Sentiments: + + o –

    Two major applications
    > Automatic text categorization (e.g., topic modeling)
    > Automatic opinion mining (e.g., sentiment analysis)
  4. 8 Topic Modeling

    [Bar chart: topic distribution of one document over Topic 01 to Topic 20; axis from 0 to 0.6]

    Document: «I bought this for my 14 year old daughter as a gift. She received it in July. She lost 6 pounds in 2 weeks. The Fitbit makes staying in shape easy. The iPhone app works fine.»

    Top words per topic, with their probabilities:
    Topic 01: love 0.13, recommend 0.08, thing 0.07, color 0.07, purchased 0.06, band 0.06, buy 0.03, mine 0.03, amazing 0.03, friend 0.02
    Topic 02: gift 0.10, love 0.07, christmas 0.07, bought 0.06, husband 0.05, daughter 0.04, received 0.03, birthday 0.03, present 0.02, son 0.02
    Topic 03: weight 0.08, loss 0.05, pounds 0.04, lose 0.04, week 0.02, lb 0.02, month 0.02, helped 0.01, shape 0.01, goal 0.01
    Topic 04: battery 0.15, day 0.09, charge 0.07, life 0.05, week 0.03, time 0.02, hour 0.02, low 0.01, recharge 0.01, dead 0.01
    Topic 05: heart 0.11, rate 0.10, monitor 0.07, blood 0.02, pressure 0.02, pedometer 0.02, measure 0.02, tracking 0.01, device 0.01, glucose 0.01
    Topic 06: app 0.12, iphone 0.08, sync 0.03, ipad 0.02, work 0.01, io 0.01, apple 0.01, android 0.01, computer 0.01, update 0.01
  5. 9 Topic Modeling

    [Diagram: Documents (d1, d2, …, dx) are linked to Topics (t1, t2, …, ty) by the “Per-Document Topic Distribution”; Topics are linked to Words (w1, w2, …, wz) by the “Per-Topic Word Distribution”.]
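
The talk does not name a specific algorithm for estimating these two distributions. As a minimal sketch, assuming latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, the following pure-Python code returns a per-document topic distribution (`theta`) and a per-topic word distribution (`phi`). The function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy reviews are illustrative assumptions, not material from the slides.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (toy scale)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # token count per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to
                # (topic share in this document) * (word share in the topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed estimates of the two distributions named on the slide.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
         for w in vocab}
        for t in range(num_topics)
    ]
    return theta, phi


# Toy, made-up reviews (already tokenized); not data from the talk.
docs = [
    "gift daughter birthday present gift".split(),
    "battery charge battery life recharge".split(),
    "gift christmas husband present".split(),
    "battery dead charge hour".split(),
]
theta, phi = fit_lda(docs, num_topics=2)
for t, word_probs in enumerate(phi):
    top_words = sorted(word_probs, key=word_probs.get, reverse=True)[:3]
    print(f"topic {t}: {top_words}")
```

Each row of `theta` and each dictionary in `phi` sums to 1, which is what makes them readable as the bar chart and the top-word lists of the previous slide. On real review collections a tested library implementation would normally be preferable to this sketch.
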
  6. 10 Sentiment Analysis

    I loooove [3] [+0.6 spelling emphasis] this product!! [+1 punctuation emphasis]

    «I loooove this product!!»
    Positive Sentiment Score: 4.6
    Negative Sentiment Score: 0.0
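
The slide shows the result of the scoring but not the rules behind it. The sketch below is one way such a lexicon-based score could be computed, and it reproduces the positive score of 4.6 for the example sentence. The one-word lexicon, the regular expressions, and the aggregation rule (strongest positive word plus emphasis bonuses) are assumptions made for illustration; only the values 3, +0.6, and +1 come from the slide. The negative score is not modeled.

```python
import re

# Hypothetical mini-lexicon; the strength 3 for "love" is the value on the slide.
POSITIVE_LEXICON = {"love": 3}
SPELLING_EMPHASIS = 0.6     # bonus shown on the slide for "loooove"
PUNCTUATION_EMPHASIS = 1.0  # bonus shown on the slide for "!!"

# A character repeated three or more times in a row, e.g. "oooo".
ELONGATED = re.compile(r"(\w)\1{2,}")


def normalize(token):
    """Return (lexicon word or None, whether the token was elongated)."""
    if token in POSITIVE_LEXICON:
        return token, False
    if ELONGATED.search(token):
        # Try collapsing each run to two letters first, then to one letter.
        for replacement in (r"\1\1", r"\1"):
            candidate = ELONGATED.sub(replacement, token)
            if candidate in POSITIVE_LEXICON:
                return candidate, True
    return None, False


def positive_score(text):
    """Strongest positive word plus emphasis bonuses (assumed aggregation)."""
    best = 0.0
    for token in re.findall(r"[a-z]+", text.lower()):
        word, elongated = normalize(token)
        if word is not None:
            bonus = SPELLING_EMPHASIS if elongated else 0.0
            best = max(best, POSITIVE_LEXICON[word] + bonus)
    # Assumption: repeated exclamation marks boost an already positive text once.
    if best > 0 and re.search(r"!{2,}", text):
        best += PUNCTUATION_EMPHASIS
    return round(best, 1)


print(positive_score("I loooove this product!!"))  # 3 + 0.6 + 1
```

Without the elongation and the double exclamation mark ("I love this product"), the same function returns the plain lexicon strength of 3.
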
  7. 11 Real World Example

    Sentiment: I loooove [3] [+0.6 spelling emphasis] this product!! [+1 punctuation emphasis]
    «I loooove this product!!» Positive Sentiment Score: 4.6, Negative Sentiment Score: 0.0

    Topics: [Bar chart: topic distribution of one document over Topic 01 to Topic 20; axis from 0 to 0.6]
    Document: «I bought this for my 14 year old daughter as a gift. She received it in July. She lost 6 pounds in 2 weeks. The Fitbit makes staying in shape easy. The iPhone app works fine.»
    Top-word lists for Topic 01 to Topic 06 repeat those of the Topic Modeling slide, except that Topic 03 lists “eat 0.01” in place of “shape 0.01”.

    What do customers like / dislike about Fitbit products?
  8. 12

  9. 13 Enhanced Review Data Structure

    – Review Text
    – Date
    – Star-Rating
    – Author
    – Positive Sentiment Score
    – Negative Sentiment Score
    – Average Sentiment Score
    – Topic Probabilities
      > Topic 01
      > Topic 02
      > …
      > Topic 50

    What do customers like / dislike about our products?
    What? Topics
    Like / dislike? Star-Rating / Sentiment
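
The record listed on this slide could be represented as follows. This is a sketch only: the class name `EnhancedReview`, the field names and types, and the `dominant_topic` helper are assumptions, and the example values other than the review text are made up. The average sentiment score is stored as given because the slide does not define how it is derived.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class EnhancedReview:
    """One customer review enriched with sentiment scores and topic probabilities."""
    review_text: str
    review_date: date
    star_rating: int
    author: str
    positive_sentiment: float
    negative_sentiment: float
    average_sentiment: float  # stored, not computed: the slide gives no formula
    topic_probabilities: List[float] = field(default_factory=list)  # Topic 01..50

    def dominant_topic(self) -> int:
        """Return the 1-based number of the most probable topic."""
        best_index = max(
            range(len(self.topic_probabilities)),
            key=self.topic_probabilities.__getitem__,
        )
        return best_index + 1


# Made-up values for illustration; only the text is quoted from the slides.
probabilities = [0.0] * 50
probabilities[1] = 0.5  # Topic 02
probabilities[2] = 0.3  # Topic 03
probabilities[5] = 0.2  # Topic 06

review = EnhancedReview(
    review_text="I bought this for my 14 year old daughter as a gift.",
    review_date=date(2015, 1, 1),
    star_rating=5,
    author="example-author",
    positive_sentiment=2.0,
    negative_sentiment=0.0,
    average_sentiment=1.0,
    topic_probabilities=probabilities,
)
print(review.dominant_topic())
```

Keeping the topic probabilities next to the star rating and the sentiment scores is what allows the "what?" (topics) and the "like / dislike?" (rating, sentiment) parts of the question to be analyzed together.
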
  10. 14 What is the effect of each topic on the customer satisfaction?

    ==============================================================================
    Dep. Variable:                 rating   R-squared:                       0.503
    Model:                            OLS   Adj. R-squared:                  0.501
    Method:                 Least Squares   F-statistic:                     265.5
    Date:                Fri, 22 May 2015   Prob (F-statistic):               0.00
    Time:                        13:49:10   No. Observations:                12910
    Df Residuals:                   12860   Df Model:                           49
    ==============================================================================
                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
    ------------------------------------------------------------------------------
    Intercept      2.9586      0.156     19.002      0.000         2.653     3.264
    ...
    T3             2.9716      0.202     14.690      0.000         2.575     3.368
    ...
    T9             2.4776      0.195     12.688      0.000         2.095     2.860
    ...
    T20           -2.4133      0.189    -12.758      0.000        -2.784    -2.043
    ...
    T39           -3.7576      0.227    -16.531      0.000        -4.203    -3.312
    ...

    Annotations next to the T3, T9, T20, and T39 rows, in this order: Losing Weight, Gift / present, Stopped working, Cost / benefit
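
To illustrate how the coefficients can be read, the sketch below computes the rating predicted by the linear model for a review with a given share of one of the four listed topics. It uses only the intercept and the four coefficients visible on the slide. All other coefficients are elided there and are not modeled, so this shows how to read the table and is not a usable predictor. The function name `predicted_rating` and the example shares are assumptions.

```python
# Values copied from the regression output on the slide.
INTERCEPT = 2.9586
COEFFICIENTS = {
    "T3": 2.9716,    # annotated "Losing Weight" on the slide
    "T9": 2.4776,    # annotated "Gift / present"
    "T20": -2.4133,  # annotated "Stopped working"
    "T39": -3.7576,  # annotated "Cost / benefit"
}


def predicted_rating(topic_shares):
    """Intercept plus coefficient * topic probability for the listed topics.

    topic_shares maps a topic name to its probability in one review.
    Topics elided on the slide ("...") cannot be used here; any remaining
    probability mass is treated as contributing nothing (an assumption).
    """
    unknown = set(topic_shares) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"no coefficient shown on the slide for: {sorted(unknown)}")
    return INTERCEPT + sum(
        COEFFICIENTS[topic] * share for topic, share in topic_shares.items()
    )


# Hypothetical reviews: half about losing weight, or 30% about cost / benefit.
print(round(predicted_rating({"T3": 0.5}), 2))
print(round(predicted_rating({"T39": 0.3}), 2))
```

A positive coefficient raises the predicted star rating as the topic's share in a review grows, and a negative one lowers it, which is the basis for the split into positive and negative topics.
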
  11. 15 Implications

    – Positive topics
      > Losing weight
      > Gift / present
      > …
    – Negative topics
      > Stopped working
      > Cost / benefit
      > …

    Use these insights for enhancing advertisements and promotions
    Feedback for product engineers
    Update pricing strategy
    …
  12. 16 Summary

    > What do customers like / dislike about our products?
    > Automated analysis of more than 12,000 online customer reviews
    > Inductive identification of latent topics
    > Temporal analysis of individual topics
    > Impact of topics on user’s satisfaction