Slide 1

Slide 1 text

EXPLORATORY Online Seminar #43 Cohort Analysis Part 2 Retention / Churn Analysis with Survival Curve

Slide 2

Slide 2 text

Kan Nishida CEO/co-founder Exploratory Summary In Spring 2016, launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle leading teams to build various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data. @KanAugust Speaker

Slide 3

Slide 3 text

3 Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for Everyone to do Data Science. The Third Wave

Slide 4

Slide 4 text

4 Questions Communication Data Access Data Wrangling Visualization Analytics (Statistics / Machine Learning) Data Analysis Data Science Workflow

Slide 5

Slide 5 text

5 Questions Communication (Dashboard, Note, Slides) Data Access Data Wrangling Visualization Analytics (Statistics / Machine Learning) Data Analysis ExploratoryɹModern & Simple UI

Slide 6

Slide 6 text

EXPLORATORY Online Seminar #43 Cohort Analysis Part 2 Retention / Churn Analysis with Survival Curve

Slide 7

Slide 7 text

7 Agenda • Survival Curve - Understand Customer Retention / Churn Better • Survival Rate - How to Calculate? • Cohort Analysis - Compare Customer Retention among Groups

Slide 8

Slide 8 text

8 Let’s say our business’s MRR is growing as below. Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Slide 9

Slide 9 text

By dividing the MRR by the cohort of when the users converted, you can see how efficiently your business is growing and can grow further. 9 Month Ago 8 Month Ago 7 Month Ago 6 Month Ago 5 Month Ago 4 Month Ago 3 Month Ago 2 Month Ago 1 Month Ago

Slide 10

Slide 10 text

Layer Cake chart helps you understand how efficient your MRR is growing by visualizing the net revenue retention for each cohort of customers based on when they converted. Ideally, you want to have each cohort to expand, not shrink, in terms of the revenue.

Slide 11

Slide 11 text

Now, the Net Revenue Retention is the result of Retention / Churn. But if you want to see how your customers are churning or if your customer retention is getting better or not, you want to see the retention rates over time and compare them among various cohorts (groups) of customers.

Slide 12

Slide 12 text

Over the customer lifetime when do the customers churn the most, at the first month, or in the first few months? Does the churn stabilize after a certain time period? Or, would the retention rates keep dropping until everybody churn? If the retention becomes stable, when is it? And what is the retention rate when it becomes stable? Example Questions

Slide 13

Slide 13 text

13 We can calculate the retention rates through the customer lifetime and draw a curve like below. Passing Month 35% 100% 40% 32% 60% 48%

Slide 14

Slide 14 text

14 Each retention rate on this chart indicate the rate of customers who would retain through a given period. 48% 48% of Customers would retain through 2nd month. Month 35% 100% 40% 32% 60%

Slide 15

Slide 15 text

15 1st Month 100% 60% 40% 32% 48% 35% By visualizing the retention rate over the customer lifetime, we can spot where the retention drops more significantly. In general, the steeper the curve is the more attention you want to pay.

Slide 16

Slide 16 text

In the world of Data Science / Statistics, this curve is called ‘Survival Curve’ and each ‘retention rate’ is called ‘Survival Rate’. 16

Slide 17

Slide 17 text

17 How to Calculate Survival Rates?

Slide 18

Slide 18 text

Example: Customers Retention for Subscription Business

Slide 19

Slide 19 text

Apr Mar Feb Jan Cancelled Cancelled Jessica Tien Nancy Victor

Slide 20

Slide 20 text

4th Month 3rd Month 2nd Month 1st Month Jessica Tien Nancy Victor

Slide 21

Slide 21 text

1 0.66 0.75 1 Survival Rate of Each Month Jessica Tien Nancy Victor

Slide 22

Slide 22 text

Survival Rate of Each Period 0.75 1 0.75 1 Survival Rate through the period 1*0.75 Jessica Tien Nancy Victor

Slide 23

Slide 23 text

Survival Rate of Each Period 0.5 0.75 1 0.66 0.75 1 Survival Rate through the period 1*0.75 0.75*0.66 Jessica Tien Nancy Victor

Slide 24

Slide 24 text

Survival Rate of Each Period 0.5 0.5 0.75 1 1 0.66 0.75 1 Survival Rate through the period 1*0.75 0.75*0.66 0.5*1 Jessica Tien Nancy Victor

Slide 25

Slide 25 text

In Reality, customers convert at different times.

Slide 26

Slide 26 text

Apr Mar Feb Jan Jessica Tien Nancy Victor Jane

Slide 27

Slide 27 text

4th Month 3rd Month 2nd Month 1st Month Jessica Tien Nancy Victor Jane Align all customers based on how many months they are in.

Slide 28

Slide 28 text

? ? ? ? 4th Month 3rd Month 2nd Month 1st Month We don’t know if the recently joined customers will quit or not. Jessica Tien Nancy Victor Jane

Slide 29

Slide 29 text

4th Month 3rd Month 2nd Month 1st Month Label them as Censored. Jessica Tien Nancy Victor Jane

Slide 30

Slide 30 text

Survival Rate of Each Month 1 0.66 0.75 1 Jessica Tien Nancy Victor Jane

Slide 31

Slide 31 text

1 0.66 0.75 1 0.5 0.5 0.75 1 Survival Rate through the period 1*0.75 0.75*0.66 0.5*1 Survival Rate of Each Period Jessica Tien Nancy Victor Jane

Slide 32

Slide 32 text

32 0.5 0.5 0.75 1 Survival Rate through 4th Month 3rd Month 2nd Month 1st Month Survival Curve!

Slide 33

Slide 33 text

33 There is an algorithm called ‘Kaplan-Meier estimator’, which calculate the survival rate though the lifetime taking into an account for the censored data.

Slide 34

Slide 34 text

34 In Exploratory, you can use ‘Survival Curve’ under Analytics view.

Slide 35

Slide 35 text

35 Let’s try!

Slide 36

Slide 36 text

• One row represents one observation. • Each observation needs to have the following information. • Start Time • End Time • Event Status: e.g. Cancelled, Quit, Died, etc. Input data for Survival Analysis

Slide 37

Slide 37 text

37 Expected Input Data for Survival Curve Unique Observation ID Event Status Survival Period

Slide 38

Slide 38 text

38 We’ll use a sample data published at Exploratory Data Catalog.

Slide 39

Slide 39 text

39 Select ‘Data Catalog’ from the Data Frame menu.

Slide 40

Slide 40 text

Search with ‘Cohort’ and click the Import button for ‘Sample Data for Cohort Analysis - Survival Curve’.

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

43 Expected Input Data for Survival Curve Unique Observation ID Event Status Survival Period

Slide 44

Slide 44 text

In reality though, most likely your data don’t look like this! Rather, you might have the data that is more like a payment data where each row represents each payment of a given user or each access of a given user. If so, you need to transform the data to the expected format. Take a look at this blog post for more details on how to prepare the data. Appendix:

Slide 45

Slide 45 text

45 Now, let’s run the Survival Curve!

Slide 46

Slide 46 text

46 Select ‘Survival Curve’ under the Analytics view.

Slide 47

Slide 47 text

47 Assign the first_date, the last_date, and the canceled columns as follows.

Slide 48

Slide 48 text

48 We’ve got our first survival curve!

Slide 49

Slide 49 text

49 The default period is set automatically based on the data period.

Slide 50

Slide 50 text

50 We can change it to ‘Month’ to see the monthly survival rates.

Slide 51

Slide 51 text

51 We have about 30% of the customer who churned by the end of the first month.

Slide 52

Slide 52 text

52 But after that, the slope of the curve becomes stable, which indicates that the churn seems to settle.

Slide 53

Slide 53 text

The ‘Data’ tab shows the raw data.

Slide 54

Slide 54 text

• Time - (e.g. Days since sign up) In survival data, one row represents one point of time. • Observations - Number of observations whom the event hans’t happened to yet at a given time. • Events - Number of observations whom the event happened at a given time. • Censored - Number of observations censored at a given time. • Survival Rate - Survival Rate through this point of time. These are the values visualized as Survival Curve • Std Error - Standard error for the estimates • Conf Low - Lower limit of the confidence interval for the estimate • Conf High - Upper limit of the confidence interval for estimate Survival Analysis Data

Slide 55

Slide 55 text

Day 1 Day 2 Day 3 Observations Events Censored Survival Rate Day 1 Day 2 Day 3

Slide 56

Slide 56 text

Day 1 Day 2 Day 3 Observations Events Censored Survival Rate Day 1 10 Day 2 8 Day 3 4

Slide 57

Slide 57 text

Day 1 Day 2 Day 3 Observations Events Censored Survival Rate Day 1 10 1 Day 2 8 2 Day 3 4 1

Slide 58

Slide 58 text

Day 1 Day 2 Day 3 Observations Events Censored Survival Rate Day 1 10 1 1 Day 2 8 2 2 Day 3 4 1 1

Slide 59

Slide 59 text

Day 1 Day 2 Day 3 Observations Events Censored Survival Rate Day 1 10 1 1 0.9 Day 2 8 2 2 0.675 Day 3 4 1 1 0.5025 Survival Rates through the day

Slide 60

Slide 60 text

Now, is this retention rate trend a problem? Is this bad or good? 60

Slide 61

Slide 61 text

61

Slide 62

Slide 62 text

Cohort Analysis

Slide 63

Slide 63 text

• The team was originated from epidemiology. • Analysis of the Survival Rates of multiple groups. • Each group is called Cohort. Cohort Analysis

Slide 64

Slide 64 text

64 We can draw the survival curves by cohorts (groups). Here, I’ve created two cohorts of customers based on what device they use. PC Users Mobile Users

Slide 65

Slide 65 text

65 The survival rates for the Mobile users are better than the ones for the PC users. The slower the drop of survival rates is better. PC Users Mobile Users

Slide 66

Slide 66 text

66 It could be because the experience for the Mobile users is better or there are specific values that only Mobile users are getting out of it. PC Users Mobile Users

Slide 67

Slide 67 text

67 On the other hand, there seems be some problems for PC users. PC Users Mobile Users

Slide 68

Slide 68 text

68 Let’s try!

Slide 69

Slide 69 text

69 We’ll investigate if there is a difference between Mac users and Windows users in terms of the survival curve.

Slide 70

Slide 70 text

70 Assign the ‘os’ column to Color and run it.

Slide 71

Slide 71 text

71 The survival curve of the Mac users seems to be better. Mac Windows

Slide 72

Slide 72 text

72 But is it really???

Slide 73

Slide 73 text

73 Let’s take a look at the difference at the 1st month.

Slide 74

Slide 74 text

74 Beginning of 1st Month End of 1st Month 70% If we had 3 out of 10 customers churned the survival rate becomes 70%.

Slide 75

Slide 75 text

75 70%

Slide 76

Slide 76 text

But, if 1 of the 3 churned customers decided not to churn in the last minute, the survival rate would become 80%. 76 80% Beginning of 1st Month End of 1st Month

Slide 77

Slide 77 text

Whether one customer churns or not can make the survival rate very different. 77 80%

Slide 78

Slide 78 text

In the opposite scenario, what if one of the retained customers decided to churn at the last minute? The survival rate would become 60%. 78 60% Beginning of 1st Month End of 1st Month

Slide 79

Slide 79 text

In this case, the survival rate of the Mac customers would go down and become worse than the Windows customers. 79 60%

Slide 80

Slide 80 text

80 Just a few customers’ churn can influence the survival rates a lot. And that can make the Mac customers’ survival rates better or worse than the Windows customers.

Slide 81

Slide 81 text

81 Now, what if we had 100 customers instead of 10?

Slide 82

Slide 82 text

82 70% If we had 30 out of 100 customers churned the survival rate becomes 70%. Beginning of 1st Month End of 1st Month

Slide 83

Slide 83 text

83 71% If one of the churned customers decided to not churn at the last minute, the survival rate becomes 71%. Beginning of 1st Month End of 1st Month

Slide 84

Slide 84 text

The survival curve for the Mac customers doesn’t make much difference. 84 71%

Slide 85

Slide 85 text

2݄ 85 1݄ 69% In the opposite scenario, if one of the retained customers decided to churn at the last minute, the survival rate becomes 69%.

Slide 86

Slide 86 text

86 Again, the survival curve for the Mac customers doesn’t make much difference. 69%

Slide 87

Slide 87 text

When we have many customers then the difference caused by a few customers won’t make much of the difference on the survival curve. 87

Slide 88

Slide 88 text

88 The survival rates vary, and whether a few customers churn or not can make the survival rates different. The difference is depending on how many customers we have. So we can’t conclude that they’re different just because a given pair of the survival curves are different. How can we compare the two and conclude if there is a meaningful difference or not?

Slide 89

Slide 89 text

We want to show a range for the survival curve so that we can say the ‘True’ survival curve will reside within this range. Then we can compare the two survival curves with such ranges and see if one is really better than the other. 89 Confidence Interval!

Slide 90

Slide 90 text

We typically do this for Mean or Ratio.

Slide 91

Slide 91 text

3.3 3.4 3.6 3.5 3.7 True Mean Sample Mean

Slide 92

Slide 92 text

3.3 3.4 3.6 3.5 3.7 True Mean Sample Mean 95% Confidence Interval

Slide 93

Slide 93 text

What is 95% Confidence Interval?

Slide 94

Slide 94 text

94 Take many samples and calculate the 95% Confidence Interval for each group.

Slide 95

Slide 95 text

95 True Mean 95% of these confidence intervals should include the true mean of the population. }Sample

Slide 96

Slide 96 text

96

Slide 97

Slide 97 text

97 We happen to be looking at one of the sample and its mean and its confidence interval. } True Mean Sample

Slide 98

Slide 98 text

We can do the same thing for the Survival Rates

Slide 99

Slide 99 text

99 Take a look at ‘Visualizing Uncertainty’ seminar for more details on the confidence interval!

Slide 100

Slide 100 text

100

Slide 101

Slide 101 text

101 Turn on the Confidence Interval to show the range along with the curves.

Slide 102

Slide 102 text

102 The confidence intervals of the two are not overlapping up through the 3rd month.

Slide 103

Slide 103 text

103 If they’re not overlapping there is a significant difference between the two.

Slide 104

Slide 104 text

104 But, they overlap after the 4th month.

Slide 105

Slide 105 text

105 So we can conclude that there is no much difference between the two after the 4th month.

Slide 106

Slide 106 text

106 How about the difference among the countries?

Slide 107

Slide 107 text

107 Assign the ‘country’ column to the Color.

Slide 108

Slide 108 text

108 One thing to note…

Slide 109

Slide 109 text

109 There are 87 countries.

Slide 110

Slide 110 text

If we draw the survival curves for all the countries…. 110

Slide 111

Slide 111 text

This time, we’ll keep only the 3 most frequent countries and put all the other countries in a group called ‘Others’. 111

Slide 112

Slide 112 text

112 When you assign a column with many unique values it will automatically create ‘Other’ group. You can click the green text to adjust the setting.

Slide 113

Slide 113 text

113 We can set 3 for ‘Number of Most Frequents’.

Slide 114

Slide 114 text

Now we have the survival curves for the top 3 countries and ‘other’ group. 114

Slide 115

Slide 115 text

115 Looks the curve for United Kingdom is better than the others.

Slide 116

Slide 116 text

The 95% confidence intervals are overlapping on each other. 116

Slide 117

Slide 117 text

If we show only the United Kingdom and Others…. 117

Slide 118

Slide 118 text

Given that they don’t overlap up through the 2nd month there is a significant difference between United Kingdom and Others. 118

Slide 119

Slide 119 text

119 One more thing…

Slide 120

Slide 120 text

120 One of the things we wanted to know is whether the customer retention is getting better or worse over time. Are the recent customers tend to churn more or less compared to the older customers?

Slide 121

Slide 121 text

121 We can create the cohorts based on when the customers converted.

Slide 122

Slide 122 text

122 Converted Month

Slide 123

Slide 123 text

In SaaS / Subscription business, ‘Cohort Analysis’ generally means analyzing the survival curves by the cohort of the converted month (or year). 123

Slide 124

Slide 124 text

124 If the survival curves of the recently converted customers are getting better (less steep) that is a sign of your service or product is improving for the customers.

Slide 125

Slide 125 text

125 On the other hand, if the survival curves of the recently converted customers are getting worse (steeper) than the older ones that’s a sign of problems for your product, customer service, user experience, etc.

Slide 126

Slide 126 text

Let’s do it! 126

Slide 127

Slide 127 text

127 We want to assign the ‘first_date’ column to color to create cohorts based on the converted month, but…

Slide 128

Slide 128 text

128 We can’t find the column in the dropdown because it is already assigned to the Start Date.

Slide 129

Slide 129 text

129 We can copy the ‘First_Date’ column as a different name!

Slide 130

Slide 130 text

130 Select ‘Create Calculation’ from the column header of the ‘first_date’ column.

Slide 131

Slide 131 text

Confirm that the ‘first_date’ column is entered in the editor. Type the new column name and Enter. 131

Slide 132

Slide 132 text

132 Now we have a new column ‘joined_date’ which has the same value as the last_date column.

Slide 133

Slide 133 text

133 Before assigning the new column to Color we need to move the Pin, otherwise the new column won’t show up in the list.

Slide 134

Slide 134 text

134 Assign the ‘first_date’ column to the Color.

Slide 135

Slide 135 text

135 Change the scale to ‘Month’.

Slide 136

Slide 136 text

136 Now we have the survival curves drawn up by the cohorts based on the converted month.

Slide 137

Slide 137 text

Next Seminar

Slide 138

Slide 138 text

EXPLORATORY Online Seminar #44 5/12/2021 (Wed) 11AM PT Cohort Analysis Part 3 Customer Lifetime Value

Slide 139

Slide 139 text

139 For Subscription businesses, it’s important to know how much you can make as revenue from your customers so that you can calculate how much you can spend to acquire the customers.

Slide 140

Slide 140 text

But, calculating the Customer Lifetime Value is not as simple as multiplying the monthly (or yearly) payment by the number of payment periods. 140 Jan $10 Feb Mar Total … Apr $120? ച੾ $10 $10 $10

Slide 141

Slide 141 text

141 Not all customers become customers for the same number of periods.

Slide 142

Slide 142 text

We can use the survival rates to calculate the CLV (Customer Lifetime Value) more accurately. 142

Slide 143

Slide 143 text

And we can calculate the CLV for each cohort (or group). 143

Slide 144

Slide 144 text

No content

Slide 145

Slide 145 text

Information Email kan@exploratory.io Website https://exploratory.io Twitter @ExploratoryData Seminar https://exploratory.io/online-seminar

Slide 146

Slide 146 text

Q & A 146

Slide 147

Slide 147 text

EXPLORATORY 147