Exploratory Seminar #47 - Survey Data Analysis Part 1 - PCA, Clustering, & NPS

Running a survey is easy, but getting value out of the survey response data is a different story.

In this seminar, Kan will present a few analytics and data wrangling techniques to gain more value from your survey data.

* Understanding Correlation among Questions with PCA (Principal Component Analysis)
* Segmenting Customers based on Answers with Clustering
* Evaluating Customer Satisfaction with NPS (Net Promoter Score)

Subscribe ↓
https://www.youtube.com/channel/UCOVfLaSQBvMRwZCyiccq4Iw

Twitter ↓
https://twitter.com/ExploratoryData

UI Tool: Exploratory (https://exploratory.io/)
Exploratory Online Seminar: https://exploratory.io/online-seminar

Kan Nishida

June 03, 2021

Transcript

  1. EXPLORATORY Online Seminar #47: Survey Data Analysis Part 1 (PCA, Clustering, & NPS)
  2. Speaker: Kan Nishida (@KanAugust), CEO/co-founder, Exploratory. Summary: In Spring 2016, launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams to build various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
  3. The Third Wave: Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for Everyone to do Data Science.
  4. Data Science Workflow: Questions, Communication, Data Access, Data Wrangling, Visualization, Analytics (Statistics / Machine Learning)
  5. Exploratory: Modern & Simple UI. Questions, Communication (Dashboard, Note, Slides), Data Access, Data Wrangling, Visualization, Analytics (Statistics / Machine Learning)
  6. EXPLORATORY Online Seminar #47: Survey Data Analysis Part 1 (PCA, Clustering, & NPS)
  7. Survey Data Analysis Part 1: 1. Correlation, 2. PCA (Principal Component Analysis), 3. Clustering, 4. NPS
  8. Survey Data Analysis Part 1: 1. Correlation, 2. PCA (Principal Component Analysis), 3. Clustering, 4. NPS
  9. None
  10. None
  11. None
  12. None
  13. When you run a survey, you want to: • Get as many responses as you can • Get high-quality answers
  14. The more questions there are, the fewer completions there are.
  15. The more questions there are, the lower the quality of the answers becomes.
  16. "We can’t remove these questions because some people want them to be asked." "How about using an Amazon Gift card?" "We already have too many questions. It will take more than 20 minutes to answer them all."
  17. "I know, but this is a great opportunity to know them better. I don’t want to miss anything potentially important." "We should keep them minimal so that they can all be answered in under 5 minutes."
  18. We want to have our questions answered with high quality by as many customers as possible. How can we ask fewer questions without losing important information?
  19. None
  20. If two questions have very similar answers, then you can guess how a given person would answer one of the questions if you know how he/she answered the other.

    Name      Passionate about my work   Consider my work is important   Support company’s mission
    John      5                          5                               2
    Nancy     5                          5                               3
    Yoko      5                          4                               3
    Mike      4                          5                               5
    Stephany  4                          3                               4
    Mary      3                          3                               2
    Ken       3                          2                               1
    Sunil     2                          2                               4
    Tom       2                          1                               3
    Brenda    1                          1                               3

  21. Scatter plot titled "Correlation". Axes: Passionate about work, My work is important (both on a scale of 1 to 5).
  22. Correlation scale: -1 (Strong Negative Correlation), 0 (No Correlation), 1 (Strong Positive Correlation)
  23. The Correlation Coefficient is 0.84, which indicates a strong positive correlation between the two questions.
  24. (The table from slide 20 is shown again.) If two questions have very similar answers, then you can guess how a given person would answer one of the questions if you know how he/she answered the other.
  25. On the other hand…
  26. (The table from slide 20 is shown again.) Some questions are very different in terms of how they are answered.
  27. The correlation coefficient is 0.019, which means there is almost no correlation between the two questions.
  28. (The table from slide 20 is shown again.) This means that removing one of the questions would lose a significant part of the information about the employees.
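As a rough illustration of the coefficient discussed on slides 23 and 27, here is a minimal Python sketch that computes the Pearson correlation for the ten-row table from slide 20. The function name `pearson` and the variable names are made up for this example; the seminar itself uses the Exploratory UI, not code. The small table is only illustrative, so the values it produces need not match the 0.84 and 0.019 quoted on the slides.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Answers from the table on slide 20, in row order (John ... Brenda).
passionate = [5, 5, 5, 4, 4, 3, 3, 2, 2, 1]
important = [5, 5, 4, 5, 3, 3, 2, 2, 1, 1]
mission = [2, 3, 3, 5, 4, 2, 1, 4, 3, 3]

r_similar = pearson(passionate, important)    # two questions answered alike
r_different = pearson(passionate, mission)    # two questions answered differently

print(f"passionate vs important: {r_similar:.2f}")
print(f"passionate vs mission:   {r_different:.2f}")
```

A value close to 1 suggests that one of the two questions could be dropped with little loss, while a value close to 0 suggests that each question carries its own information.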
  29. We have more questions, and we can investigate every single combination. But…
  30. You can use ‘Correlation’ under the Analytics view to investigate the correlation between any given combination of variables.
  31. Select the variables (questions) and run it.
  32. You can see which pairs of questions are the most highly correlated.
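Outside the Exploratory UI, the same "every pair, ranked" view can be sketched in a few lines of Python. This is only an assumption about how one might reproduce the idea, not what the Correlation analytics does internally; the `answers` dictionary reuses the three questions from the table on slide 20, and `pearson` is a helper written for this example.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# One list of answers per question (rows: John ... Brenda).
answers = {
    "Passionate about my work": [5, 5, 5, 4, 4, 3, 3, 2, 2, 1],
    "Consider my work is important": [5, 5, 4, 5, 3, 3, 2, 2, 1, 1],
    "Support company's mission": [2, 3, 3, 5, 4, 2, 1, 4, 3, 3],
}

# Compute the coefficient for every pair and sort by strength (absolute value).
ranked = sorted(
    ((a, b, pearson(answers[a], answers[b])) for a, b in combinations(answers, 2)),
    key=lambda item: abs(item[2]),
    reverse=True,
)

for a, b, r in ranked:
    print(f"{r:+.2f}  {a} / {b}")
```

The pairs at the top of the list are the first candidates to review when looking for questions that could be removed.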
  33. Survey Data Analysis Part 1: 1. Correlation, 2. PCA (Principal Component Analysis), 3. Clustering, 4. NPS
  34. ‘Correlation’ helps you understand how strong (or weak) the relationship between two variables is. Using the Correlation Coefficient values, you can compare which combinations are more correlated than others. However, it doesn’t give you an overall picture of how all the questions are related to one another.
  35. PCA (Principal Component Analysis): Generates a new set of artificial dimensions (components) that are not correlated to one another and that carry as much information from the original data as possible with fewer dimensions. It is one of the ‘Dimensionality Reduction’ techniques.
  36. PCA • Find the directions (Components) in the data that have high variance. • Find a few components with high variance that can explain most of the variance of the data. (Principal Components)
  37. How does PCA find the new dimensions?

    1. Finds the center point of the whole data presented in the multi-dimensional space.
    2. Finds the direction that has the highest variance. (The 1st Component)
    3. Finds the direction that is orthogonal to the 1st component and has the highest variance. (The 2nd Component)
    4. Finds the direction that is orthogonal to the 1st and the 2nd components and has the highest variance. (The 3rd Component)
    5. Repeats until the last (Nth) component.
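The steps on slide 37 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not Exploratory's implementation: it uses power iteration with deflation on the covariance matrix (one of several ways to find the components), it does not scale the variables, and all function names are invented for the example. The data is the three-question table from slide 20.

```python
import math

def center(rows):
    """Step 1: shift the data so that its center point sits at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def highest_variance_direction(cov, iterations=500):
    """Power iteration: the unit direction along which `cov` has the most variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

def pca(rows, n_components):
    """Steps 2 to 5: repeatedly take the highest-variance direction that is left."""
    cov = covariance(center(rows))
    d = len(cov)
    components = []
    for _ in range(n_components):
        v, variance = highest_variance_direction(cov)
        components.append((v, variance))
        # Deflation: remove the variance already explained, so the next
        # direction found is orthogonal to the ones found so far.
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components

# Rows: John ... Brenda. Columns: passionate, important, mission.
answers = [(5, 5, 2), (5, 5, 3), (5, 4, 3), (4, 5, 5), (4, 3, 4),
           (3, 3, 2), (3, 2, 1), (2, 2, 4), (2, 1, 3), (1, 1, 3)]

components = pca(answers, n_components=3)
total = sum(variance for _, variance in components)
for number, (direction, variance) in enumerate(components, start=1):
    rounded = [round(x, 2) for x in direction]
    print(f"Component {number}: direction={rounded}, share of variance={variance / total:.0%}")
```

Questions whose weights in the first components are similar point in a similar direction on a biplot, which is the pattern the later slides read off visually.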
  38. PCA for Survey Data Analysis: PCA helps you understand which questions are similar to one another, how similar they are, and how different they are. You can grasp the overall relationship among all the questions. This helps you understand which questions can be removed and which should be kept.
  39. Let’s Try!
  40. Sample Data: Employee Satisfaction Survey
  41. None
  42. Each row represents an employee. Each column represents a question. Each cell is a survey answer (on a scale of 1 to 5).
  43. Select ‘Principal Component Analysis (PCA)’.
  44. Click on the Variable Columns button to select the variables.
  45. Select all the questions (variables) and run it.
  46. You will see a chart called ‘Biplot’, which presents all the variables in a 2-dimensional space and places all the rows (employees) as dots in relation to the variables.
  47. Variables that point in a similar direction are considered highly and positively correlated.
  48. For example, these two questions are asking about a similar thing.
  49. You can see that they are highly correlated when you visualize them with a Scatter chart.
  50. Both of these questions are about ‘amount of work’ and are similar.
  51. You can see that they are highly correlated when you visualize them with a Scatter chart.
  52. These two questions diverge from each other at almost 90 degrees. This means they are independent of each other in the context of all the variables.
  53. You can see that they are not correlated at all when you visualize them with a Scatter chart.
  54. These are the questions we can consider removing because removing them won’t lose much information.
  55. With a Scatter chart, you can visualize the relationship between a given pair of questions intuitively. With Correlation under Analytics, you can investigate the strength of the correlation for every single combination of all the questions. With PCA under Analytics, you can visualize the relationship among all the questions and see which questions are similar or different in an overall view.
  56. With these tools, you can investigate what the minimal set of questions is without losing much information. By reducing the number of questions, you will have more people complete your survey with high-quality answers, which will help you understand your customers better.
  57. Survey Data Analysis Part 1: 1. Correlation, 2. PCA (Principal Component Analysis), 3. Clustering, 4. NPS
  58. Some people answer the questions the same way, but some don’t. Can we segment them based on how they answer the questions so that we can approach them differently in more optimized ways?
  59. Let’s say we ask what is important about their work.
  60. For some people Relationship is more important, but for others Salary is more important.

    Name      Relationship is important for my work   Salary is important for my work
    John      5                                       2
    Nancy     5                                       1
    Yoko      5                                       2
    Mike      4                                       2
    Stephany  4                                       1
    Mary      4                                       1
    Ken       1                                       4
    Sunil     2                                       5
    Tom       2                                       5
    Brenda    1                                       5

  61. (The table from slide 60 is shown again.) We can segment them into 2 groups: ‘Relationship is more important’ and ‘Salary is more important’.
  62. We have more questions! Can we automatically segment them based on how they answered all these questions?
  63. Clustering
  64. Clustering • Detect the inherent structures in the data • Categorize the data into groups of maximum commonality
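To make the idea of grouping by commonality concrete, here is a minimal sketch of K-means, the clustering method the seminar selects in the Analytics view later on, applied to the two-question table from slide 60. The starting centroids (the first and last respondents) are an arbitrary choice made so the example is deterministic; real implementations usually pick them randomly. The function names are invented for this sketch.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two answer vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iterations=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    labels = []
    for _ in range(max_iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    return labels, centroids

names = ["John", "Nancy", "Yoko", "Mike", "Stephany", "Mary", "Ken", "Sunil", "Tom", "Brenda"]
# (Relationship is important, Salary is important) for each respondent.
points = [(5, 2), (5, 1), (5, 2), (4, 2), (4, 1), (4, 1), (1, 4), (2, 5), (2, 5), (1, 5)]

# Assumed starting centroids: the first and the last respondent.
labels, centroids = kmeans(points, centroids=[points[0], points[-1]])

for name, label in zip(names, labels):
    print(f"{name}: cluster {label + 1}")
```

With more than two questions the procedure is the same; each respondent simply becomes a longer vector of answers.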
  65. Let's do it!
  66. Sample Data: Employee Satisfaction Survey
  67. None
  68. Each row represents an employee with his/her survey answers.
  69. Select ‘K-Means Clustering’ under the Analytics view.
  70. Select all the numerical variables (questions) and run it.
  71. Once you run it, you will see a Biplot chart similar to the one we saw with PCA.
  72. People in Cluster 1 are located on the opposite side from the satisfaction questions, which means that they score low on these questions. They are not happy!
  73. On the other hand, people in Cluster 2 scored high on the satisfaction questions. We can consider this group a ‘happy’ group.
  74. People in Cluster 3 score high on the company mission and the work amount related questions.
  75. The Boxplot tab shows the distribution of the scores on each question in each cluster. The Y-Axis values are the scores on a standardized scale.
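A standardized scale usually means z-scores: each score minus the mean, divided by the standard deviation. The sketch below shows that calculation for one question from the earlier example table. Whether Exploratory uses the sample or the population standard deviation is not stated in the slides, so the use of `statistics.stdev` (sample) here is an assumption, and `standardize` is a name made up for this example.

```python
import statistics

def standardize(scores):
    """Convert raw scores to z-scores: 0 is the average, +1 is one SD above it."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # assumption: sample standard deviation
    return [(score - mean) / sd for score in scores]

# Raw 1-to-5 answers to one question (rows: John ... Brenda).
passionate = [5, 5, 5, 4, 4, 3, 3, 2, 2, 1]
z_scores = standardize(passionate)

for raw, z in zip(passionate, z_scores):
    print(f"raw={raw}  standardized={z:+.2f}")
```

On this scale, a cluster whose box sits below 0 scores lower than the overall average on that question, regardless of the question's original range.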
  76. Cluster 1’s satisfaction levels are low on all measures. This is the ‘unhappy’ group.
  77. Cluster 2’s satisfaction levels are high overall, though their support for the mission is relatively low.
  78. People in Cluster 3 score high on the mission, the salary, and the amount of work related questions.
  79. We can use the Label Column to see how another column is related to each cluster.
  80. By assigning the Age column to the Label, you can see the age bucket of each employee, who is shown as a dot.
  81. Under the Stack Bar tab, we can see the ratio of each age bucket in each cluster.
  82. For example, Cluster 2 is the ‘happy’ group, and we can see that it consists mainly of employees in their 40s.
  83. On the other hand, Cluster 3 is the group that supports the company mission the most, and it consists mainly of employees in their 20s and 30s.
  84. With Clustering under Analytics, you can segment the respondents (customers, employees, etc.) of the survey into a few groups and understand the characteristics of each group. This type of insight will help you strategize how you can approach or communicate with your customers (or employees) in more optimized ways.
  85. Survey Data Analysis Part 1: 1. Correlation, 2. PCA (Principal Component Analysis), 3. Clustering, 4. NPS
  86. Often, we do surveys because we want to understand whether and how much customers are satisfied with our product or service so that we can improve it.
  87. A typical question about customer satisfaction would be…
  88. The problem with this question is that it is vague, and many people end up scoring too high (or too low) without giving it much thought.
  89. This is where NPS comes to the rescue. NPS is a measure of how much value customers find in your product or service.
  90. NPS asks how likely they are to recommend your product or service to other people.
  91. Because the question is more specific, people don’t blindly score high unless they can see themselves really doing it.
  92. According to Airbnb, 4% of the customers who scored 10 referred other customers within a year, while none (0%) of the customers who scored between 0 and 6 made a referral.
  93. Now that 100 people have answered the NPS question, how should we calculate the overall NPS? Not by averaging.
  94. We’ll group the scores into 3 buckets on the 0 to 10 scale: 0 to 6, 7 to 8, and 9 to 10.
  95. First, the people who scored 6 or less (0 to 6) are called ‘Detractors’.
  96. Second, the ones who scored 7 or 8 are called ‘Passives’. They are not disappointed, but they also don’t think your product is superb.
  97. Last, people who scored 9 or 10 are called ‘Promoters’. These are the people who are really satisfied and therefore will tell their friends good things about your product.
  98. NPS = % of Promoters - % of Detractors
  99. Let’s say 10 people have answered the NPS question, as shown on the slide.
  100. We can segment them into the three groups (the slide labels the Detractors and the Promoters).
  101. We calculate the % of Promoters and the % of Detractors. Detractors: 30% (3/10). Promoters: 40% (4/10).
  102. We can subtract the % of Detractors from the % of Promoters: NPS = % of Promoters (40%) - % of Detractors (30%) = 10.
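The bucketing and subtraction described on slides 94 to 102 can be written as a short Python sketch. The function names are made up for this example, and the ten scores are hypothetical: the individual answers on the slide are not in the transcript, so the list below was chosen only to reproduce the same counts (3 Detractors, 3 Passives, 4 Promoters).

```python
def nps_category(score):
    """Bucket a 0-10 answer: 0-6 Detractor, 7-8 Passive, 9-10 Promoter."""
    if not 0 <= score <= 10:
        raise ValueError(f"NPS answers must be between 0 and 10, got {score}")
    if score <= 6:
        return "Detractor"
    if score <= 8:
        return "Passive"
    return "Promoter"

def net_promoter_score(scores):
    """NPS = % of Promoters - % of Detractors (Passives only affect the total)."""
    categories = [nps_category(score) for score in scores]
    promoters_pct = 100 * categories.count("Promoter") / len(scores)
    detractors_pct = 100 * categories.count("Detractor") / len(scores)
    return promoters_pct - detractors_pct

# Hypothetical answers matching the slide's counts: 3 Detractors, 3 Passives, 4 Promoters.
scores = [3, 5, 6, 7, 8, 8, 9, 9, 10, 10]
nps = net_promoter_score(scores)

print("Categories:", [nps_category(score) for score in scores])
print("NPS:", nps)
```

The `nps_category` helper is also the kind of calculation needed to add a Promoter / Passive / Detractor column to each survey row before summarizing.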
  103. In general, if your NPS is greater than 50, it is considered ‘Excellent’. If it is greater than 70, it is considered ‘World Class!’
  104. NPS values shown on the slide: 62, 68, 72, 96, 74, 77
  105. When the NPS is around 50, there tend to be many Passives.
  106. When the NPS goes beyond 70, a significant portion of people score 9 or 10, and there are not many Detractors.
  107. Here is a distribution of NPS scores for Airbnb. NPS = 74
  108. Let's do it!
  109. Each row represents a customer’s answer.
  110. We don’t have a column that indicates whether a given customer is a Promoter, Passive, or Detractor, so we need to create one.
  111. None
  112. None

  113. None
  114. None
  115. None
  116. None
  117. None
  118. None
  119. None
  120. None
  121. None
  122. None
  123. None
  124. None
  125. By creating a dashboard from the calculated NPS data, you can understand the latest performance and the time-series trend.

  126. That’s it for today!

  127. Next Seminar

  128. EXPLORATORY Online Seminar #48: Exploratory v6.6. 6/16/2021 (Wed) 11AM PT

  129. None
  130. Information. Email: kan@exploratory.io. Website: https://exploratory.io. Twitter: @ExploratoryData. Seminar: https://exploratory.io/online-seminar

  131. Q & A

  132. EXPLORATORY