Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Pubcon Vegas 2024: Intro to BigQuery

Pubcon Vegas 2024: Intro to BigQuery

Presentation given at Pubcon Las Vegas 2024. The topics covered include:

- Is BigQuery worth the headache? What you get from leveraging this tool.
- Setting up your first account, and connecting to GA4 and GSC
- Setting up budget alerts so you don't get caught by surprise
- Diving into the data you have and making sense of it (how is GA4 data structured in BQ for example)
- Brief intro to SQL query structure, with a tools showcase to build queries for you (cause why do the work when a tool can do it for you?!)
- Sample query build and walkthrough to Looker Studio report

Sam Torres

October 16, 2024
Tweet

More Decks by Sam Torres

Other Decks in Marketing & SEO

Transcript

  1. @Pubcon @SamTorresATL Hi! I’m Sam Torres Gray Dot Co Official

    role Chief Digital Officer at Gray Dot Co Unofficial (but important) roles Data & dev nerd SEO mermaid Day-to-day focus Creating solutions where search, data, and development intersect. @Pubcon @SamTorresATL
  2. @Pubcon @SamTorresATL Gray Dot Co Have you ever felt like

    shaking your fist at… @Pubcon @SamTorresATL
  3. @Pubcon @SamTorresATL Well, some good news… Gray Dot Co Own

    your data Work with bigger datasets Scale without limitations Connect data from multiple sources I’ll share how BigQuery solves many of the internal data pain points we’ve all experienced. So you can… @Pubcon @SamTorresATL
  4. @Pubcon @SamTorresATL Gray Dot Co We’ll cover… Is BigQuery worth

    it? Setting up your account & connecting to data sources Navigating BigQuery Building aggregate tables How BigQuery pricing works @Pubcon @SamTorresATL
  5. @Pubcon @SamTorresATL First things first, what is it? Gray Dot

    Co “BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence. BigQuery’s serverless architecture lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.” @Pubcon @SamTorresATL
  6. @Pubcon @SamTorresATL Just kidding, I fixed it… Gray Dot Co

    “BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence. BigQuery’s serverless architecture lets you use SQL queries to answer your organization’s biggest questions with zero infrastructure management. BigQuery’s scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.” @Pubcon @SamTorresATL
  7. @Pubcon @SamTorresATL The benefits of leveraging BigQuery Gray Dot Co

    Unsampled data Data scales as you do 1 warehouse for many sources Lower costs & budget alerts Big requests in seconds Access data back to day 1 Platforms report sampled data Data setup isn’t scalable Data is in many places High data-storage costs Big requests break or are slow Historical data is limited Before BQ With BQ @Pubcon @SamTorresATL
  8. @Pubcon @SamTorresATL Create a billing account in Google Cloud Gray

    Dot Co Go to console.cloud.google.com and select Billing SETTING UP YOUR ACCOUNT @Pubcon @SamTorresATL
  9. @Pubcon @SamTorresATL BigQuery’s pricing structure Gray Dot Co Storage Computation

    How much data is contained within BQ Free tier gets you 10 GB of data storage How much handling of that data is BQ doing Free tier gets you 1Tb of computation or processing
  10. @Pubcon @SamTorresATL The sneaky build of compute costs Gray Dot

    Co This single branded impression chart is filtering for specific words across queries, grouping according to month and summing impressions.
  11. @Pubcon @SamTorresATL 1. Create a budget Gray Dot Co It

    can be for one or more projects, or the whole cloud account (which is the default).
  12. @Pubcon @SamTorresATL 1. Create a budget Gray Dot Co It

    can be for one or more projects, or the whole cloud account (which is the default).
  13. @Pubcon @SamTorresATL 2. Specify a budget Gray Dot Co Note

    that you can set this for specific organization, projects or even services (BigQuery is a service).
  14. @Pubcon @SamTorresATL 3. Create alert thresholds Gray Dot Co Need

    to send alerts to different emails than the project owner? You can do that here.
  15. @Pubcon @SamTorresATL Sample Budget Alerts Gray Dot Co Set cost

    thresholds: • Last month’s spend • ANY spending (my personal favorite!) Limit to specific services: • BigQuery Storage • BigQuery Data Transfer Service: Responsible for the scheduled queries
  16. @Pubcon @SamTorresATL Setting up your account & connecting to data

    sources Gray Dot Co @Pubcon @SamTorresATL
  17. @Pubcon @SamTorresATL 1. Create a project in Google Cloud Gray

    Dot Co CONNECTING TO GSC @Pubcon @SamTorresATL
  18. @Pubcon @SamTorresATL 2. Enable the BigQuery API Gray Dot Co

    CONNECTING TO GSC Can’t find this page? Go to > APIs & Services and search for “BigQuery” @Pubcon @SamTorresATL
  19. @Pubcon @SamTorresATL 2. Add your project ID Gray Dot Co

    CONNECTING TO GSC Find your ID in the Project Overview page: > Cloud Overview
  20. @Pubcon @SamTorresATL 3. Set the name & location Gray Dot

    Co CONNECTING TO GSC Useful if you’re bringing in multiple GSC properties Set the country that your data governance is based in
  21. @Pubcon @SamTorresATL 1. Create a project in Google Cloud Gray

    Dot Co CONNECTING TO GA4 Or you can use the same project. Consider data storage size and separate budget alerts/ ceiling needs.
  22. @Pubcon @SamTorresATL 3. Configure streams & events Gray Dot Co

    CONNECTING TO GA4 Selecting the right location is extremely important!
  23. @Pubcon @SamTorresATL 3. Configure streams & events Gray Dot Co

    CONNECTING TO GA4 Limit is 1M events per day. Take what you need.
  24. @Pubcon @SamTorresATL For GSC, your data includes Gray Dot Co

    Site Data Individual URL Data Great for seeing total site performance Includes page-level performance Where SERP Features data is found
  25. @Pubcon @SamTorresATL In GA4, your data includes Gray Dot Co

    Table PER day Nested columns (the details that apply to specific types of data)
  26. @Pubcon @SamTorresATL How do I wade through all of these

    tables?! Gray Dot Co @Pubcon @SamTorresATL
  27. @Pubcon @SamTorresATL The basics of query structure Gray Dot Co

    Retrieve data from tables or expressions. SELECT <field>,<field> FROM <database table> WHERE <field>=’Value’ ORDER BY <field> Which table(s) should the data be retrieved from? Which rows should be retrieved? Which columns should dictate sorting, in which direction?
  28. @Pubcon @SamTorresATL A query can be…simple Gray Dot Co SELECT

    * FROM `table.searchconsole.searchdata_ site_impression` WHERE data_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY) AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) AND country = 'usa' AND query IS NOT NULL
  29. @Pubcon @SamTorresATL A query can be…complex Gray Dot Co SELECT

    CASE WHEN session_source IS NULL THEN 'Direct' WHEN REGEXP_CONTAINS(session_campaign_name, 'cross-network') THEN 'Cross-network' WHEN ( REGEXP_CONTAINS( session_source, 'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart' ) OR REGEXP_CONTAINS( session_campaign_name, '^(.*(([^a-df-z]|^)shop|shopping).*)$' ) ) AND REGEXP_CONTAINS(session_medium, '^(.*cp.*|ppc|paid.*)$') THEN 'Paid Shopping' WHEN REGEXP_CONTAINS( session_source, 'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex' ) AND REGEXP_CONTAINS(session_medium, '^(.*cp.*|ppc|paid.*)$') THEN 'Paid Search' WHEN REGEXP_CONTAINS( session_source, 'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp' ) AND REGEXP_CONTAINS(session_medium, '^(.*cp.*|ppc|paid.*)$') THEN 'Paid Social' WHEN REGEXP_CONTAINS( session_source, 'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube' ) AND REGEXP_CONTAINS(session_medium, '^(.*cp.*|ppc|paid.*)$') THEN 'Paid Video' WHEN session_medium IN ( 'display', 'banner', 'expandable', 'interstitial', 'cpm' ) THEN 'Display' WHEN REGEXP_CONTAINS( session_source, 'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart' ) OR REGEXP_CONTAINS( session_campaign_name, '^(.*(([^a-df-z]|^)shop|shopping).*)$' ) THEN 'Organic Shopping' WHEN REGEXP_CONTAINS( session_source, 'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp' ) OR session_medium IN ( 'social', 'social-network', 'social-media', 'sm', 'social network', 'social media' ) THEN 'Organic Social' WHEN REGEXP_CONTAINS( session_source, 'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube' ) OR REGEXP_CONTAINS(session_medium, '^(.*video.*)$') THEN 'Organic Video' WHEN REGEXP_CONTAINS( session_source, 'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex' ) OR session_medium = 'organic' THEN 'Organic Search' WHEN REGEXP_CONTAINS(session_source, 'email|e-mail|e_mail|e mail') OR REGEXP_CONTAINS(session_medium, 'email|e-mail|e_mail|e mail') THEN 'Email' WHEN session_medium = 'affiliate' THEN 'Affiliates' WHEN session_medium = 'referral' THEN 'Referral' WHEN session_medium = 'audio' THEN 'Audio' WHEN session_medium = 'sms' THEN 'SMS' WHEN session_medium LIKE '%push' OR REGEXP_CONTAINS(session_medium, 'mobile|notification') THEN 'Mobile Push Notifications' ELSE 'Unassigned' END AS session_channel, DATE, device_category, COUNT(DISTINCT session_id) AS sessions, COUNT(DISTINCT new_users) AS new_users, COUNT(DISTINCT user_pseudo_id) AS total_users, SUM(purchases) AS purchases, SUM(revenue) AS revenue, COUNT( DISTINCT CASE WHEN new_users IS NOT NULL AND total_purchasers IS NOT NULL THEN user_pseudo_id ELSE NULL END ) AS first_time_purchasers, COUNT(DISTINCT total_purchasers) AS total_purchasers
  30. @Pubcon @SamTorresATL Either way, tools can write it for you

    Gray Dot Co I generally use GA4SQL but that’s just one example.
  31. @Pubcon @SamTorresATL Gray Dot Co Enter…aggregated tables! CREATE TABLES AND

    JOB TO RUN DAILY CONNECT TO LOOKER STUDIO USE YOUR MEASUREMENT PLAN
  32. @Pubcon @SamTorresATL 1. Decide what kind of data you want

    Gray Dot Co CREATING AGGREGATE TABLES GSC - Queries & metrics aggregated by: • Page • Device Type • Country GA4 - Event data aggregated by: • Entry page • Page path • Channel grouping Start with only a few. You can always add more later - and the query will go through your entire set of data (meaning you can aggregate retroactively!)
  33. @Pubcon @SamTorresATL 2. Run the SQL query in Explorer Gray

    Dot Co CREATING AGGREGATE TABLES Here, you can see the compute cost. If you’re testing, try adding a limit to your query until you’re happy with the data you’re getting. Then, open the query (and number of results). LIMIT 1000
  34. @Pubcon @SamTorresATL 5. Schedule the new query Gray Dot Co

    CREATING AGGREGATE TABLES Don’t remember your Table ID?
  35. @Pubcon @SamTorresATL 6. Use your shiny new tables for Looker

    Studio Gray Dot Co CREATING AGGREGATE TABLES • Don’t forget that you can blend tables in Looker Studio too! • Why does this save on processing & time? ◦ Calculations happen in BQ ◦ Larger grouping calculations only happen once a day
  36. @Pubcon @SamTorresATL Gray Dot Co Thank you! LinkedIn /in/samantha-torres-seo /company/graydotco

    X (Formerly Twitter) @SamTorresATL @GrayDotCo @Pubcon @SamTorresATL