Exploratory Seminar #27: Google Analytics meets Data Science

An introduction to querying and importing data from Google Analytics, and to using Data Science methods to find deeper insights in that data.

- Introduction to Google Analytics Data
- Visualizing Google Analytics Data
- User Engagement (DAU/MAU) Analysis
- Text Analysis
- Time Series Forecasting with Prophet

Kan Nishida

March 12, 2020

Transcript

  1. Speaker: Kan Nishida (@KanAugust), CEO/co-founder, Exploratory
    Summary: At the beginning of 2016, Kan launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams that built various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
  2. The Third Wave
    Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for Everyone to do Data Science.
  3. Democratization of Data Science
    • First Wave (1976): Theme: Monetization. Users: Statisticians. Algorithms: Proprietary. Experience: UI & Programming.
    • Second Wave (2000): Theme: Commoditization. Users: Data Scientists. Algorithms: Open Source. Experience: Programming.
    • Third Wave (2016): Theme: Democratization. Users: Business Users. Algorithms: Open Source. Experience: UI & Automation.
  4. Exploratory: Modern & Simple UI
    Data Analysis: Questions, Communication (Dashboard, Note, Slides), Data Access, Data Wrangling, Visualization, Analytics (Statistics / Machine Learning)
  5. Google Analytics Data is a Treasure Trove
    • The pre-built dashboards on the Google Analytics page are optimized for general purposes; they are not designed to answer the specific questions you need to answer for your business.
    • By downloading the data, visualizing it from various perspectives, wrangling it flexibly, and applying various analytics methods quickly, you can gain deeper insights to answer your own questions.
  6. Import Google Analytics Data
    1. Select View
    2. Select Period
    3. Select Dimensions & Measures
    4. Select Segment (Optional)
    5. Import
  7. Example Segments
    • Users who have visited a product page.
    • Users who have converted.
    • Sessions that came from Google Search (Organic).
    • Sessions that came from mobile devices.
  8. Dimensions: The attributes of what you are interested in measuring. Landing Page, Country, Device Type, Source, etc.
    Metrics: Quantitative measurements of what you are interested in. Number of Sessions, Page Views, Bounce Rates, Conversion Rates, etc.
  9. Scope
    • Scope is how Google Analytics collects data. Google Analytics collects data at several different levels and summarizes the data by level.
    • Each dimension and measure belongs to a particular level of scope.
    • This means that when you mix dimensions and measures that belong to different levels, you will get inaccurate data.
  10. Top Page, Page A, Page D, Page A, Page D, Add to Cart, Purchase Confirm Page, Top Page, Page A, Page B
  11. User: Each web browser is used as a proxy for a user. This level of data is measured across sessions.
    Session: Collected per visit. A session covers the activities from the time a given user comes to the site until they exit.
    Hit: Collected per action. The data about each action, such as a click.
  12. A session can end without a clear exit
    • If you keep the same page open with no activity for 30 minutes, the session is considered to have ended.
    • When the day changes, that is, when the clock passes midnight.
    • Even within the same 30 minutes and the same web browser, if you revisit the same site from a different source (e.g. Google Search vs. Facebook), a new session starts.
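The three rules above can be sketched in a few lines of Python. This is a simplified illustration, not Google Analytics' actual implementation: the function name `assign_sessions`, the `(timestamp, source)` hit format, and the sample hits are all assumptions made for the example, and the real source-change rule has more nuances than a plain string comparison.

```python
from datetime import datetime, timedelta

# Assumed inactivity timeout, taken from the 30-minute rule on the slide.
SESSION_TIMEOUT = timedelta(minutes=30)


def assign_sessions(hits):
    """Return a session number for each hit of a single browser.

    `hits` is a list of (timestamp, source) tuples sorted by timestamp.
    A new session starts after 30 minutes of inactivity, when the date
    changes, or when the traffic source changes.
    """
    session_ids = []
    current = 0
    previous = None
    for timestamp, source in hits:
        if previous is None:
            current = 1
        else:
            prev_time, prev_source = previous
            timed_out = timestamp - prev_time >= SESSION_TIMEOUT
            new_day = timestamp.date() != prev_time.date()
            new_source = source != prev_source
            if timed_out or new_day or new_source:
                current += 1
        session_ids.append(current)
        previous = (timestamp, source)
    return session_ids


# Hypothetical hits from one browser.
hits = [
    (datetime(2020, 3, 12, 10, 0), "google"),
    (datetime(2020, 3, 12, 10, 10), "google"),    # same session
    (datetime(2020, 3, 12, 10, 50), "google"),    # 40-minute gap
    (datetime(2020, 3, 12, 10, 55), "facebook"),  # source changed
    (datetime(2020, 3, 12, 23, 50), "facebook"),  # long gap
    (datetime(2020, 3, 13, 0, 5), "facebook"),    # passed midnight
]
print(assign_sessions(hits))  # [1, 1, 2, 3, 4, 5]
```

Note that the last hit starts a new session even though only 15 minutes have passed, because the date changed.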
  13. The Common Challenges for Visualizing GA Data
    • Want to aggregate data by specific date and time levels.
    • Want to group the data into multiple groups and compare them.
    • Too many unique values for dimensions.
    • Want to visualize the variation and the uncertainty of data rather than the summary values.
  17. A few options to address
    • Limit Values - Top 10
    • Limit Values - Condition
    • Create ‘Other’ Group
    • Highlight
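As a rough illustration of the "Top N" and "Other group" options, the sketch below keeps the most frequent values of a dimension and lumps everything else into a single group. The function `lump_top_n` and the sample page paths are hypothetical; Exploratory offers these as UI options, and this is only one plausible way to express the same idea in Python.

```python
from collections import Counter


def lump_top_n(values, n=10, other_label="Other"):
    """Keep the n most frequent values and relabel the rest as `other_label`."""
    counts = Counter(values)
    keep = {value for value, _ in counts.most_common(n)}
    return [value if value in keep else other_label for value in values]


# Hypothetical landing-page values with a long tail.
pages = ["/home", "/home", "/home", "/pricing", "/pricing", "/blog/a", "/blog/b"]
lumped = lump_top_n(pages, n=2)
print(lumped)
# ['/home', '/home', '/home', '/pricing', '/pricing', 'Other', 'Other']
```

Dropping the relabelled rows instead of keeping them would correspond to the plain "Limit Values - Top 10" option.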
  23. • Extract Text
    • Remove Text
    • Replace Text
    • Convert Text - lower case / UPPER CASE / Title Case
  24. Clean up the Text!
    - Handle multiple tags
    - Convert to Title Case
    - Remove Text
    - Remove double quotes
    - Remove Leading/Trailing Spaces
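A minimal sketch of these cleanup steps in Python, assuming the tags arrive as one comma-separated string per row (the slide does not say how they are stored). The function name `clean_tags` and the sample input are made up for illustration.

```python
def clean_tags(raw, separator=","):
    """Split a raw tag string and normalize each tag.

    Each tag has its double quotes removed, its leading and trailing
    spaces stripped, and is converted to Title Case. Empty tags are dropped.
    """
    tags = []
    for part in raw.split(separator):
        tag = part.replace('"', "").strip().title()
        if tag:
            tags.append(tag)
    return tags


print(clean_tags(' "data science", "google analytics" ,  prophet '))
# ['Data Science', 'Google Analytics', 'Prophet']
```

`str.title()` capitalizes the letter after every non-letter character, so tags containing apostrophes or acronyms may need a different rule.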
  25. Lagging Indicator
    • Metrics to Confirm
    • It’s too late when you find out
    Leading Indicator
    • Metrics to Predict
    • You can take action when you find out
  26. DAU

  27. MAU

  28. Activity ≠ Engagement
    • Active User Metrics (DAU, MAU, etc.) are not good indicators of engagement.
    • As the business grows, they tend to grow regardless of whether users are more engaged or not.
    • Sales and Marketing can improve the activity, but not necessarily the engagement.
  29. DAU / MAU: An Engagement Metric
    • Measures how often the same users visit the site or use the service.
    • It was popularized by Facebook, which used it to grow its user base at an early stage.
  30. Get 1 Day Active Users and 30 Day Active Users, and calculate 1 Day Active Users / 30 Day Active Users inside Exploratory!
  31. Due to the scope limitation, we can’t get the data for these metrics in the same query. (They are aggregated at different levels: 1 day vs. 30 days.)
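One way to picture the workaround is to run two queries, one per metric, then join the results on date and divide. The sketch below assumes each query result has already been reduced to a date-to-count mapping; the function name `dau_mau` and all numbers are invented for illustration and are not real Google Analytics output.

```python
def dau_mau(one_day_users, thirty_day_users):
    """Join two date -> count mappings on date and return DAU / MAU per date.

    Dates missing from either mapping, or with a zero 30-day count,
    are skipped.
    """
    ratios = {}
    for day, dau in one_day_users.items():
        mau = thirty_day_users.get(day)
        if mau:
            ratios[day] = dau / mau
    return ratios


# Hypothetical results of the two separate queries.
one_day = {"2020-03-10": 120, "2020-03-11": 150, "2020-03-12": 90}
thirty_day = {"2020-03-10": 1200, "2020-03-11": 1250, "2020-03-12": 1200}

ratios = dau_mau(one_day, thirty_day)
for day, ratio in ratios.items():
    print(day, round(ratio, 3))
```

A higher ratio means a larger share of the month's users came back on that particular day.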
  33. Merge

    Name            Page View
    Bootcamp        15
    Bootcamp        100
    Bootcamp        20

    Name            Page View
    Not Bootcamp    20
    Not Bootcamp    95
    Not Bootcamp    30

    Merged:

    Name            Page View
    Bootcamp        15
    Bootcamp        100
    Bootcamp        20
    Not Bootcamp    20
    Not Bootcamp    95
    Not Bootcamp    30
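The merge shown here stacks the rows of two tables that share the same columns. A small Python sketch of that operation follows; `merge_rows` is a hypothetical helper, and the rows simply reuse the values from the slide.

```python
def merge_rows(*tables):
    """Stack tables (lists of dicts) that all share the same columns."""
    merged = []
    columns = None
    for table in tables:
        for row in table:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError("all rows must have the same columns")
            merged.append(dict(row))
    return merged


# One table per segment, using the values from the slide.
bootcamp = [{"name": "Bootcamp", "page_view": v} for v in (15, 100, 20)]
not_bootcamp = [{"name": "Not Bootcamp", "page_view": v} for v in (20, 95, 30)]

merged = merge_rows(bootcamp, not_bootcamp)
print(len(merged))  # 6
```

Keeping the segment name as a column is what makes the two groups comparable in a single chart after the merge.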
  34. Analytics
    • Statistical Modeling
    • Prediction with Machine Learning Models
    • Time Series Forecasting
    • Causal Impact Analysis
    • Clustering
  36. Challenge
    • Want to find out how we can prepare our web service infrastructure in order to keep the current performance level.
    • Number of servers, whether to move to higher spec machines, whether to create regional (Japan, France, etc.) servers, etc.
    • Preparing more servers and higher spec machines will increase the cost.
    • We want to minimize the cost of adding more servers, but at the same time it will damage our business if we are not ready for higher demand.
  37. Challenge
    • Want to forecast the page accesses for the next few months.
    • Based on the forecast, we can plan to add more servers or reduce the number of servers.
    • If we can forecast by region, then we can allocate an adequate number of machines in the areas where we expect higher demand.
  38. Prophet
    • A ‘curve fitting’ algorithm to build time series forecasting models.
    • Designed for ease of use without expert knowledge of time series forecasting or statistics.
    • Built by Data Scientists (Sean J. Taylor, @seanjtaylor, & co.) at Facebook and open sourced. (https://facebook.github.io/prophet)
  39. Prophet - Additive Model
    Build a model by finding the best smooth line that can be represented as the sum of the following components.
    • Overall growth trend
    • Seasonality - Yearly, Weekly, Daily, etc.
    • Holiday effects - Christmas, New Year, July 4th, etc.
    • External Predictors
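Written as a formula, the sum of components above looks roughly like the following. The symbols are not on the slide; they follow the notation commonly used to describe Prophet, with the external predictors written here as an extra linear term, so treat the exact form as an assumption.

```latex
y(t) = g(t) + s(t) + h(t) + \beta^{\top} x(t) + \varepsilon_t
```

Here $y(t)$ is the observed value at time $t$, $g(t)$ is the overall growth trend, $s(t)$ is the seasonality, $h(t)$ is the holiday effect, $x(t)$ holds the external predictors with coefficients $\beta$, and $\varepsilon_t$ is the error the model does not explain.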
  40. The blue line is the actual data (Sales), and the orange line is the forecasted data.
  41. Under the Trend tab, you can see the overall trend that is used by the model. The blue line is the actual (Sales) data, and the green line is the trend.
  42. Google Analytics with Prophet
    • Get the Daily Page View data from Google Analytics.
    • Run a time series forecasting model with Prophet under the Analytics view.