Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Exploratory Seminar #27: Google Analytics meets Data Science

Exploratory Seminar #27: Google Analytics meets Data Science

Introducing how to query and import data from Google Analytics and how to use Data Science methods to find deeper insights from your Google Analytics data.

- Introduction to Google Analytics Data
- Visualizing Google Analytics Data
- User Engagement (DAU/MAU) Analysis
- Text Analysis
- Time Series Forecasting with Prophet

19fc8f6113c5c3d86e6176362ff29479?s=128

Kan Nishida
PRO

March 12, 2020
Tweet

Transcript

  1. Exploratory Seminar #27 Data Science X

  2. 2 EXPLORATORY

  3. Kan Nishida CEO/co-founder Exploratory Summary Beginning of 2016, launched Exploratory,

    Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle leading teams for building various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data. @KanAugust Speaker
  4. Data Science is not just for Engineers and Statisticians. Exploratory

    makes it possible for Everyone to do Data Science. The Third Wave
  5. First Wave Second Wave Third Wave Proprietary Open Source UI

    & Programming Programming 2016 2000 1976 Monetization Commoditization Democratization Statisticians Data Scientists Democratization of Data Science Algorithms Experience Tools Open Source UI & Automation Business Users Theme Users
  6. Questions Communication Data Access Data Wrangling Visualization Analytics (Statistics /

    Machine Learning) Data Analysis Data Science Workflow
  7. Questions Communication (Dashboard, Note, Slides) Data Access Data Wrangling Visualization

    Analytics (Statistics / Machine Learning) Data Analysis ExploratoryɹModern & Simple UI
  8. Exploratory Seminar #27 Data Science X

  9. None
  10. None
  11. Google Analytics Data is Treasure Trove • The pre-built dashboards

    on the Google Analytics page are optimized for general purpose, they are not designed to answer your questions you need to answer for your business. • By downloading the data, visualizing it from various perspectives, wrangling it flexibly, and applying various analytics methods quickly, you can gain deeper insights to answer your own questions.
  12. Communication Data Access Data Wrangling Visualization Analytics (Statistics / Machine

    Learning) Data Analysis with EXPLORATORY
  13. Data Access

  14. 1. Select View 2. Select Period 3. Select Dimensions &

    Measures 4. Select Segment (Optional) 5. Import Import Google Analytics Data
  15. Import Data

  16. None
  17. Select Account, Property, and View.

  18. Select Period.

  19. Select Segment

  20. Segment

  21. Create Segments in Google Analytics

  22. • Users who have visited a product page. • Users

    who have converted. • Sessions that came from Google Search (Organic) • Sessions that came from mobile devices Example Segments
  23. Dimensions & Metrics

  24. Select Dimensions & Metrics

  25. Dimension Metric

  26. Dimensions: They are the attributes of what you are interested

    in measuring for. Landing Page, Country, Device Type, Source, etc. Metrics: Quantitative measurements of what you are interested. Number of Sessions, Page Views, Bounce Rates, Conversion Rates, etc.
  27. Scope

  28. Scope • Scope is how Google Analytics collects data. Google

    Analytics collects data at various different levels and summarize the data by the levels. • Each dimension and measure belongs to a particular level of scope. • This means that when you mix the dimensions and measures that belong to different levels you will get inaccurate data.
  29. Top Page Page A Page D Page A Page D

    Add to Cart Purchase Confirm Page Top Page Page A Page B
  30. User : It uses each web browser as a proxy

    of each user. This level of data is measured across the sessions. Session : Collected per Visit. The activities during the time a given user comes to the site and exits. Hit : Collected per Action. The data about each action such as Click.
  31. Session can end without a clear exit • If you

    kept opening the same page with no activity for 30 minutes then the session is considered as Exit. • When the day changes, passing midnight. • Even within the same 30 minutes duration with the same web browser, if you revisit the same site from different source (e.g. Google Search vs. Facebook) a new session starts.
  32. Select Dimensions & Measures

  33. Show only the selected Dimensions & Measures

  34. Click the Run button to preview the data.

  35. Click the Save button to import the data.

  36. It will show the summary information once the data is

    imported.
  37. Communication Data Access Data Wrangling Visualization Analytics (Statistics / Machine

    Learning) Data Analysis with EXPLORATORY
  38. The Common Challenges for Visualizing GA Data • Want to

    aggregate data by specific date and time levels. • Want to group the data into multiple groups and compare them. • Too many unique values for dimensions. • Want to visualize the variation and the uncertainty of data rather than the summary values.
  39. The Common Challenges for Visualizing GA Data • Want to

    aggregate data by specific date and time levels. • Want to group the data into multiple groups and compare them. • Too many unique values for dimensions. • Want to visualize the variation and the uncertainty of data rather than the summary values.
  40. Round by Day

  41. Round by Week

  42. Round by Month

  43. What is this drop!?!?

  44. This month has only 10 days and is not complete.

  45. None
  46. Remove ‘This Month’ Data

  47. None
  48. Extract Month

  49. Extract Day of Week

  50. Different Level for X-Axis, Color, and Repeat By

  51. The Common Challenges for Visualizing GA Data • Want to

    aggregate data by specific date and time levels. • Want to group the data into multiple groups and compare them. • Too many unique values for dimensions. • Want to visualize the variation and the uncertainty of data rather than the summary values.
  52. Break Down with Repeat By

  53. The Common Challenges for Visualizing GA Data • Want to

    aggregate data by specific date and time levels. • Want to group the data into multiple groups and compare them. • Too many unique values for dimensions. • Want to visualize the variation and the uncertainty of data rather than the summary values.
  54. Too many unique values for dimensions.

  55. None
  56. • Limit Values - Top 10 • Limit Values -

    Condition • Create ‘Other’ Group • Highlight A few options to address
  57. • Limit Values - Top 10 • Limit Values -

    Condition • Create ‘Other’ Group • Highlight A few options to address
  58. None
  59. Limit to Top 30 Countries

  60. • Limit Values - Top 10 • Limit Values -

    Condition • Create ‘Other’ Group • Highlight A few options to address
  61. None
  62. None
  63. None
  64. Limit values with more than 1,000 sessions.

  65. • Limit Values - Top 10 • Limit Values -

    Condition • Create ‘Other’ Group • Highlight A few options to address
  66. Too Many Values (Lines)

  67. None
  68. None
  69. Limit to only the United States and Japan

  70. • Limit Values - Top 10 • Limit Values -

    Condition • Create ‘Other’ Group • Highlight A few options to address
  71. Sometimes, too many lines is not a bad thing if

    you can highlight.
  72. None
  73. Highlight

  74. The Common Challenges for Visualizing GA Data • Want to

    aggregate data by specific date and time levels. • Want to group the data into multiple groups and compare them. • Too many unique values for dimensions. • Want to visualize the variation and the uncertainty of data rather than the summary values.
  75. None
  76. None
  77. None
  78. None
  79. None
  80. None
  81. Communication Data Access Data Wrangling Visualization Analytics (Statistics / Machine

    Learning) Data Analysis with EXPLORATORY
  82. • Text Data Wrangling • Joining • Merging

  83. Text data almost always needs Clean Up!

  84. • Extract Text • Remove Text • Replace Text •

    Convert Text - lower case / UPPER CASE / Title Case
  85. Text Wrangling - Extract

  86. Text Wrangling - Remove

  87. Example Visualize Search Parameter Text

  88. Landing Page Path

  89. If you visualize the path as is…

  90. Extract Parameter Value

  91. Extract Parameter Value

  92. Extract Parameter Value

  93. None
  94. Clean up the Text! - Handle multiple tags - Convert

    to Title Case - Remove Text - Remove double quotes Remove Leasing/Training Spaces
  95. None
  96. None
  97. None
  98. None
  99. None
  100. Visualize it with Word Cloud

  101. • Text Data Wrangling • Joining • Merging

  102. Engagement

  103. Goal • Improve the Conversion Rate • Reduce the Churn

    Rate
  104. None
  105. Lagging Indicator • Metrics to Confirm • It’s too late

    when you find out Leading Indicator • Metrics to Predict • You can take actions when you find out
  106. Lagging Indicator • Conversion Rate • Churn Rate Leading Indicator

    • Engagement
  107. Goal: Want to improve the engagement Challenge: How can we

    measure the engagement?
  108. DAU

  109. MAU

  110. • Active User Metrics (DAU, MAU, etc.) are not good

    indicators of the engagement. • As business grows, they tend to grow regardless of whether users are more engaged or not. • Sales and Marketing can improve the activity, but not necessary the engagement. Activity ≠ Engagement
  111. A Engagement Metric • Measure how often the same users

    visit the site or use the service. • It was popularized by Facebook who was using it to grow the user base at the early stage. DAU / MAU
  112. DAU / MAU /

  113. How can we do with Google Analytics data?

  114. We can use ‘1 Day Active Users (User)’ and ’30

    Day Active Users (User)’.
  115. 115 1 Day Active Users / 30 Day Active Users

    And calculate inside Exploratory!
  116. However…, there is one problem.

  117. Due to the scope limitation, we can’t get these metrics

    data in the same query. (They are aggregated at different levels (1 day vs. 30 days).
  118. Get DAU Get MAU So, we can get those metrics

    data separately…
  119. Get DAU Get MAU Then, join them together later. Join

  120. Join Get DAU Get MAU Join

  121. 121

  122. Get DAU Get MAU Join Calculate DAU / MAU

  123. Create Calculation Get DAU Get MAU Calculate DAU / MAU

    Join
  124. DAU / MAU

  125. Visualize the Engagement Trend

  126. Add a ‘Smooth Curve’ trendline with LOESS method.

  127. • Text Data Wrangling • Joining • Merging

  128. Import Data by specifying the Segment

  129. Name Page View Bootcamp 15 Bootcamp 100 Bootcamp 20 Name

    Page View Not Bootcamp 20 Not Bootcamp 95 Not Bootcamp 30 Name Sales Bootcamp 15 Bootcamp 100 Bootcamp 20 Not Bootcamp 20 Not Bootcamp 95 Not Bootcamp 30 Merge
  130. Merge two data frames.

  131. Data Reproducibility

  132. Data gets updated daily (or hourly). Can we automate the

    data wrangling steps? 132
  133. Click Re-Import Button to download the latest data from GA

    133
  134. Communication Data Access Data Wrangling Visualization Analytics (Statistics / Machine

    Learning) Data Analysis with EXPLORATORY
  135. • Statistical Modeling • Prediction with Machine Learning Models •

    Time Series Forecasting • Causal Impact Analysis • Clustering Analytics
  136. • Statistical Modeling • Prediction with Machine Learning Models •

    Time Series Forecasting • Causal Impact Analysis • Clustering Analytics
  137. Time Series Forecasting with Prophet

  138. • Want to find out how we can prepare our

    web service infrastructure in order to keep the current performance level. • Number of servers, whether moving to higher spec machines, should we create regional (Japan, France, etc.) servers, etc. • Preparing more servers and higher spec machines will increase the cost. • We want to minimize the cost of adding more servers, but at the same time it will damage our business if we are not ready for higher demands. Challenge
  139. • Want to forecast the page accesses in the next

    few months. • Based on the forecast, we can plan to add more servers or reduce the number of servers. • If we can forecast by region, then we can allocate adequate number of machines in the area where we expect higher demands. Challenge
  140. Visualizing Page View Trend

  141. Where will our page views be over the next few

    months?
  142. Time Series Forecasting with Prophet

  143. • A ‘curve fitting’ algorithm to build time series forcasting

    models. • Designed for ease of use without expert knowledge on time series forecasting or statistics. • Built by Data Scientists (Sean J. Taylor & co.) at Facebook and open sourced. (https:// facebook.github.io/prophet) Prophet Sean J. Taylor @seanjtaylor
  144. Build a model by finding a best smooth line which

    can be represented as sum of the following components. • Overall growth trend • Seasonality - Yearly, Weekly, Daily, etc. • Holiday effects - X’mas, New Year, July 4th, etc. • External Predictors Prophet - Additive Model
  145. Weekly Sales Trend

  146. Build a Forecasting Model with Prophet

  147. Assign Order Date to Date/Time and Sales to Value.

  148. The blue line is the actual data (Sales), and the

    orange line is the forecasted data.
  149. The last area is the forecasted period where there is

    no actual data.
  150. The default is 10 units, in this case, that is

    10 weeks.
  151. Set the Forecasting Period to 52 weeks.

  152. Under the Trend tab, you can see the overall trend

    that is used by the model. The blue line is the actual (Sales) data, and the green line is the trend.
  153. The vertical light green bars are Change Points where the

    trend changes.
  154. Under the Yearly tab you can see the Yearly Seasonality.

  155. Every year, the sales doesn’t pick up until June, then

    it goes down in July.
  156. Google Analytics with Prophet • Get the Daily Page View

    data from Google Analytics. • Run a time series forecasting model with Prophet under Analytics view.
  157. None
  158. Forecasting for the next 6 months

  159. None
  160. Forecasting for the next 12 months

  161. None
  162. Yearly Seasonality

  163. Trend

  164. with Repeat By

  165. with Repeat By

  166. Q & A

  167. Information Email kan@exploratory.io Website https://exploratory.io Twitter @KanAugust Training https://exploratory.io/training

  168. EXPLORATORY