Social Network Retention Analysis

Qize (Chase) Le

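What Do New Users Look Like? (1/21/2017)
[Figure: percentage, average, and median per behavior for new users. Behaviors shown: Page Views, Follows, Search, Likes, Re-blogs, grouped under "Information Receiver" and "Information Sharer"]
• Percentages, in slide order: 89%, 65%, 28%, 16%, 50%
• Average / median pairs, in slide order: 6.6 / 3, 7.5 / 5, 3.9 / 3, 4.5 / 4, 5.9 / 6
* Displayed medians exclude users who do not exhibit the behavior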

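What Do New Users Look Like? (Continued)
[Figure: percentage, average, and median per metric for new users. Metrics shown: Original Post, Received Engagement, Un-follow, Retention, Email Verification, with groupings "Creator" and "Popular User"]
• Percentage, average, median, in slide order: 18%, 0.7, 1; 16%, 0.4, 1; 9%, 0.6, 2
• Percentages shown without an average or median: 36%, 34%
* Displayed medians exclude users who do not exhibit the behavior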

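What Do New Users Look Like? (Continued)
[Figure: new-user breakdown by Register Time, Devices, and Source]
• 40%, 36%, 23% (labels in slide order: web, android, iphone, api, yahoo)
• 16%, 7%, 6%, 3%, 8%, 60% (labels in slide order: login, no referer, google, follow page, others, unknown)
• 29%, 23%, 28%, 20% (labels in slide order: morning, afternoon, night, late night)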

Differences between Active and Inactive Users
Metric                Non-active   Active   Lift
Received Engagement   0.3          1.1      282%
Page Views            4.2          10.9     163%
Original Post         0.4          1.1      157%
Likes                 2.2          6.9      214%
• Active users showed a 113% lift over non-active users in unfollows
• Active users showed a 113% lift over non-active users in searches
• Active users showed a 55% lift over non-active users in email verification
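The slide does not define "lift". A minimal sketch, assuming it means the relative increase of the active-user mean over the non-active mean; the function name `lift` is illustrative. Recomputing from the rounded means shown on the slide gives values close to, but not always identical to, the reported percentages, presumably because the slide rounds the displayed means.

```python
def lift(non_active_mean, active_mean):
    """Relative increase of the active mean over the non-active mean.

    Assumption: the deck's "lift" is (active - non_active) / non_active.
    """
    return (active_mean - non_active_mean) / non_active_mean


# Rounded means as displayed on the slide: (non-active, active)
slide_means = {
    "Received Engagement": (0.3, 1.1),
    "Page Views": (4.2, 10.9),
    "Original Post": (0.4, 1.1),
    "Likes": (2.2, 6.9),
}

for metric, (non_active, active) in slide_means.items():
    print(f"{metric}: {lift(non_active, active):.0%}")
```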

Identify Factors Correlated with User Retention - Methodology
Workflow: Full Dataset -> Training Data (80%) and Validation Data (20%) -> Build Random Forest (RF) and Gradient Boosting (GB) Models -> Identify Variable Importance -> Performance Measurement
• Data cleansing, imputation, and dummy variable creation will be performed if necessary
• We will not perform variable transformation, feature engineering, variable interaction, etc.
• After variable importance is identified, the Spearman correlation is calculated as a sanity check
• Both models produce predictions as probabilities. Performance measurement will be based on AUC and the Gain Chart
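The split and the AUC measurement described above can be sketched in plain Python. This is an illustration under stated assumptions, not the deck's implementation: the helper names `train_validation_split` and `auc` are hypothetical, the split is a simple seeded shuffle, and AUC is computed from its pairwise definition (the chance that a randomly chosen retained user receives a higher predicted probability than a randomly chosen non-retained user, with ties counted as half).

```python
import random


def train_validation_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows and cut it into training and validation parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties = 0.5.

    Quadratic in the number of rows, which is fine for a sketch.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


train, validation = train_validation_split(range(100))
print(len(train), len(validation))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the deck's percentages are reported.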

Identify Factors Correlated with User Retention - Results
High Correlation (RF Imp > 10%, GB Imp > 8%, Corr > 20%)
• Pageviews
• Follows
• Searches
• Registration Time*
Medium Correlation (RF Imp > 2%, GB Imp > 3%, Corr > 10%)
• Likes
• Original Posts
• Reblogs
• Is_verified
• Receive_engagement
• Unfollows**
* Registration Time showed >20% importance in the RF model and >10% in the GB model, but only 5% Spearman correlation (explained further in the univariate chart)
** Unfollows has 2.8% importance in the GB model, so it only marginally meets the criteria
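The Spearman check can be sketched as a Pearson correlation computed on ranks. The helper names and the toy data below are hypothetical. Because Spearman only measures monotonic association, a variable with a non-monotonic relationship to retention can score near zero here while still ranking high in tree-based importance, which is one possible reading of the Registration Time footnote.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical): monotonic vs. U-shaped relationships
monotonic = spearman([1, 2, 3, 4], [1, 4, 9, 16])
u_shaped = spearman([1, 2, 3, 4, 5], [1, 0, 0, 0, 1])
print(monotonic, u_shaped)
```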

Identify Factors Affecting User Retention - Validation
• Core Model: uses only the 4 highly correlated variables. AUC: Random Forest 65.2%, Gradient Boosting 71.5%
• Extended Model: uses both the highly and medium correlated variables. AUC: Random Forest 68.6%, Gradient Boosting 73.1%
• Full Model (with all variables) measured 71.1% AUC in the RF model and 74.2% AUC in the GB model
• Both the Core Model and the Extended Model performed reasonably well
[Figures: two Cumulative Gain Charts, each comparing the RF Model and GB Model across deciles 1 to 10 on a 0% to 100% scale]
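A cumulative gain chart of the kind shown on this slide can be computed as follows. This is a sketch, assuming the usual construction: users are sorted by predicted probability from highest to lowest, split into ten equal-sized bins, and the chart reports the share of all retained users captured up to each bin. The function name and the toy data are hypothetical.

```python
def cumulative_gain(labels, scores, n_bins=10):
    """Share of all positives captured within the top b/n_bins of ranked rows."""
    ranked = [
        label
        for _, label in sorted(
            zip(scores, labels), key=lambda pair: pair[0], reverse=True
        )
    ]
    total_positives = sum(ranked)
    n = len(ranked)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = round(n * b / n_bins)
        gains.append(sum(ranked[:cutoff]) / total_positives)
    return gains


# Toy validation set (hypothetical): 10 users, 4 of them retained
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
toy_gains = cumulative_gain(toy_labels, toy_scores)
print(toy_gains)
```

A model whose curve rises faster in the first few deciles concentrates more of the retained users at the top of its ranking.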

Business Strategy - User Retention Early Indicator
[Figure: decision tree segmenting New Users by Page Views (<5, 5 to 10, >10) and Follows (<=3, >3) into Low, Med, and High Retention segments]
Segment   % of Users   Retention Rate
Low       50%          24%
Med       34%          42%
High      15%          69%
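As a quick arithmetic check on the segment figures, the overall retention rate implied by the three segments is their share-weighted average. This is a sketch using only the slide's rounded numbers; the shares sum to 99%, so the weights are normalized, and the variable and function names are illustrative.

```python
# (share of users, retention rate) per segment, as shown on the slide
segments = {
    "Low": (0.50, 0.24),
    "Med": (0.34, 0.42),
    "High": (0.15, 0.69),
}


def blended_retention(segments):
    """Share-weighted average retention; shares are normalized to sum to 1."""
    total_share = sum(share for share, _ in segments.values())
    weighted = sum(share * rate for share, rate in segments.values())
    return weighted / total_share


overall = blended_retention(segments)
print(f"{overall:.1%}")
```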

Proposed Actionable Marketing Strategy
Suggested actions (each only if the user has never taken it):
• View Pages
• Follow Others
• Search a Topic
• Like a Post
• Write a Post
• Share Another User's Post
Notes:
• Target the low and medium retention user segments after 24 hours
• Depending on whether users log in, communications can be delivered on-site or by email
• Key metrics can be retention rate, number of logins, etc.
Experiment design: Test Group vs. Control Group (No Communications)
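The deck does not say how the Test and Control groups would be compared. One common option, shown here purely as an assumption, is a two-proportion z-test on retention rate. The function name and all counts in the example are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(retained_test, n_test, retained_control, n_control):
    """Two-sided z-test for a difference in retention rate between two groups.

    Uses the pooled-proportion standard error and a normal approximation.
    """
    p_test = retained_test / n_test
    p_control = retained_control / n_control
    pooled = (retained_test + retained_control) / (n_test + n_control)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / standard_error
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical counts: 1,000 users per group, 300 vs. 240 retained
z_stat, p_val = two_proportion_z_test(300, 1000, 240, 1000)
print(z_stat, p_val)
```

A small p-value would suggest the communications changed retention; with small groups the normal approximation becomes unreliable and an exact test may be preferable.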

Appendix

Univariate Plot – Page Views
• Empirical Logit is defined as: log((n_event + alpha) / (n_nonevent + alpha)), where alpha = 0.5
[Figure: Volume and Empirical Logit plotted against Variable Mean]
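The empirical logit formula translates directly into code. A minimal sketch, assuming natural logarithm and that `n_event` and `n_nonevent` are the counts of retained and non-retained users in one bin of the variable; the bin counts in the example are hypothetical. The `alpha` term keeps the value finite when a bin has no events or no non-events.

```python
from math import log


def empirical_logit(n_event, n_nonevent, alpha=0.5):
    """log((n_event + alpha) / (n_nonevent + alpha)), as defined on the slide."""
    return log((n_event + alpha) / (n_nonevent + alpha))


# Hypothetical bins: (retained users, non-retained users)
example_bins = [(0, 20), (10, 10), (30, 5)]
example_logits = [empirical_logit(e, ne) for e, ne in example_bins]
print(example_logits)
```

A value of 0 means events and non-events are equally frequent in the bin; negative values mean retention is below 50% in that bin.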

Univariate Plot – Follows
• Empirical Logit is defined as: log((n_event + alpha) / (n_nonevent + alpha)), where alpha = 0.5
[Figure: Volume and Empirical Logit plotted against Variable Mean]

Univariate Plot – Searches
• Empirical Logit is defined as: log((n_event + alpha) / (n_nonevent + alpha)), where alpha = 0.5
[Figure: Volume and Empirical Logit plotted against Variable Mean]

Univariate Plot – Registration Time
• Empirical Logit is defined as: log((n_event + alpha) / (n_nonevent + alpha)), where alpha = 0.5
[Figure: Volume and Empirical Logit plotted against Variable Mean]
Possible explanation: Internet bots register around midnight
