Upgrade to Pro — share decks privately, control downloads, hide ads and more …

[Denys Tkalich] Using BigQuery as a data warehouse in B2B startup

[Denys Tkalich] Using BigQuery as a data warehouse in B2B startup

Presentation from GDG DevFest Ukraine 2018 - the biggest community-driven Google tech conference in the CEE.

Learn more at: https://devfest.gdg.org.ua

__

Setup and usage of Google BigQuery and ETLs in B2B startup Reply.

Google Developers Group Lviv

October 13, 2018
Tweet

More Decks by Google Developers Group Lviv

Other Decks in Programming

Transcript

  1. Case study Denys Tkalich Data Analyst @ Reply Using Google

    BigQuery as Data Warehouse in B2B startup
  2. #dfua Tasks 1. Revenue attribution to different traffic channels and

    ad campaigns. 2. Marketing and product dashboards.
  3. #dfua Data Sources 1. User data in backend database (Microsoft

    Azure). 2. Website and web app sessions/hits/events data in Google Analytics. 3. Ad cost data in Google AdWords, AdRoll and Facebook Ads. 4. Payments and subscriptions data in Chargebee. 5. Customer data in Prosperworks Copper.
  4. #dfua Why Google BigQuery 1. Initial setup takes minutes. 2.

    It’s fast. 3. 3rd party integrations. 4. Built-in support of BigQuery in Google Data Studio.
  5. #dfua Why Google BigQuery 5. Price! Active storage $0.02 per

    GB The first 10 GB is free each month. Long-term storage $0.01 per GB The first 10 GB is free each month. Streaming Inserts $0.01 per 200 MB You are charged for rows that are successfully inserted. Individual rows are calculated using a 1 KB minimum size. Queries (analysis) $5 per TB First 1 TB per month is free. Flat-rate pricing is also available for high-volume customers.
  6. #dfua Google Analytics integration via OWOX BI Pros 1. Fast

    setup. 2. Raw sessions data as GA “sees” it. 3. Retrospective user_id updates. 4. Low price for small data. 5. Built-in attribution. Cons 1. Not always reliable. 2. Infrequent lags up to 3 days. 3. Might be pricy for big data.
  7. #dfua Azure integration via Stitch Pros 1. Fast and easy

    setup. 2. Free for small data. 3. Flexible settings. 4. Tables replication support. Cons 1. Doesn’t support DATETIMEOFFSET. 2. Some integrations are only in paid plans. 3. No data pre-aggregation.
  8. #dfua Iteration One problems 1. Limited data import from Azure

    because of Stitch free plan conditions and its limitations (no DATETIMEOFFSET support). 2. No data pre-aggregation. 3. No out-of-the-box jobs automatization in BigQuery.
  9. #dfua What is BigQuery manager? SQL query Config - Source


    - Destination
 - Schedule
 - Rules Python script + +
  10. #dfua data = { "query" : "users_sql", "name" : "users_sql",

    "querytype" : "sqlreplica", "overwrite":1, "active":1, "rule" : “30 10 * *", "mode":"WRITE_TRUNCATE", "table" : "reply_prod_to_bigquery.Users" } r = requests.post( url=url+"/addconfig", data = json.dumps(data) ) r.text BigQuery manager config example Had to add a slide with code because it’s a dev conference
  11. #dfua Iteration Two Google Analytics BigQuery Azure Cost data Chargebee

    Copper OWOX BI OWOX BI API API BigQuery manager
  12. #dfua Iteration Two list of services 1. Google BigQuery –

    Data Warehouse. 2. OWOX BI – Cost Data → Google Analytics and Google Analytics → BigQuery pipelines. 3. BigQuery manager – Azure → BigQuery pipeline and BigQuery jobs automation.
  13. #dfua Total monthly cost BigQuery $0* OWOX BI $115** BigQuery

    manager $0 Total $115 * up to 10GB of storage and 1TB of queries ** up to 100K monthly unique users
  14. #dfua Resources 1. BigQuery for Data Warehouse Practitioners
 https://cloud.google.com/solutions/bigquery-data-warehouse 2.

    Google BigQuery documentation
 https://cloud.google.com/bigquery/docs/ 3. Standard SQL reference
 https://cloud.google.com/bigquery/docs/reference/standard-sql/