Upgrade to Pro — share decks privately, control downloads, hide ads and more …

[Denys Tkalich] Using BigQuery as a data warehouse in B2B startup

[Denys Tkalich] Using BigQuery as a data warehouse in B2B startup

Presentation from GDG DevFest Ukraine 2018 - the biggest community-driven Google tech conference in the CEE.

Learn more at: https://devfest.gdg.org.ua

__

Setup and usage of Google BigQuery and ETLs in B2B startup Reply.

Google Developers Group Lviv

October 13, 2018
Tweet

More Decks by Google Developers Group Lviv

Other Decks in Programming

Transcript

  1. Case study
    Denys Tkalich
    Data Analyst @ Reply
    Using Google BigQuery as
    Data Warehouse in B2B startup

    View Slide

  2. #dfua
    Tasks
    1. Revenue attribution to different traffic channels and ad campaigns.
    2. Marketing and product dashboards.

    View Slide

  3. #dfua
    Data Sources
    1. User data in backend database (Microsoft Azure).
    2. Website and web app sessions/hits/events data in Google Analytics.
    3. Ad cost data in Google AdWords, AdRoll and Facebook Ads.
    4. Payments and subscriptions data in Chargebee.
    5. Customer data in Prosperworks Copper.

    View Slide

  4. #dfua
    Solution
    Start collecting all company’s data in a data warehouse

    View Slide

  5. #dfua
    Data Warehouse requirements
    1. Fast implementation.
    2. Cheap (or even free).
    3. Reliable.

    View Slide

  6. #dfua
    Image source: huntedhive.com
    INEXPENSIVE
    HIGH
    QUALITY
    FAST
    EXPENSIVE
    SLOW
    POOR
    QUALITY
    IDEAL

    View Slide

  7. #dfua
    Image source: huntedhive.com
    INEXPENSIVE
    HIGH
    QUALITY
    FAST
    EXPENSIVE
    SLOW
    POOR
    QUALITY
    BigQuery

    View Slide

  8. #dfua
    Why Google BigQuery
    1. Initial setup takes minutes.
    2. It’s fast.
    3. 3rd party integrations.
    4. Built-in support of BigQuery in Google Data Studio.

    View Slide

  9. #dfua
    Why Google BigQuery
    5. Price!
    Active storage $0.02 per GB The first 10 GB is free each month.
    Long-term
    storage
    $0.01 per GB The first 10 GB is free each month.
    Streaming
    Inserts
    $0.01 per 200 MB
    You are charged for rows that are successfully
    inserted. Individual rows are calculated using a 1
    KB minimum size.
    Queries
    (analysis)
    $5 per TB
    First 1 TB per month is free. Flat-rate pricing is
    also available for high-volume customers.

    View Slide

  10. #dfua
    Schema concept
    Google
    Analytics
    BigQuery
    Azure Cost data Chargebee Copper

    View Slide

  11. #dfua
    Demo time
    Let’s have some fun with data

    View Slide

  12. #dfua
    Iteration One
    Google
    Analytics
    BigQuery
    Azure
    Cost data
    Chargebee
    Copper
    OWOX BI
    OWOX BI
    Stitch API
    API

    View Slide

  13. #dfua
    Google Analytics integration via OWOX BI
    Pros
    1. Fast setup.
    2. Raw sessions data as GA “sees” it.
    3. Retrospective user_id updates.
    4. Low price for small data.
    5. Built-in attribution.
    Cons
    1. Not always reliable.
    2. Infrequent lags up to 3 days.
    3. Might be pricy for big data.

    View Slide

  14. #dfua
    Azure integration via Stitch
    Pros
    1. Fast and easy setup.
    2. Free for small data.
    3. Flexible settings.
    4. Tables replication support.
    Cons
    1. Doesn’t support DATETIMEOFFSET.
    2. Some integrations are only in paid
    plans.
    3. No data pre-aggregation.

    View Slide

  15. #dfua
    Iteration One problems
    1. Limited data import from Azure because of Stitch free plan conditions and its
    limitations (no DATETIMEOFFSET support).
    2. No data pre-aggregation.
    3. No out-of-the-box jobs automatization in BigQuery.

    View Slide

  16. #dfua
    BigQuery manager to the rescue!

    View Slide

  17. #dfua
    What is BigQuery manager?
    SQL query
    Config
    - Source

    - Destination

    - Schedule

    - Rules
    Python script
    + +

    View Slide

  18. #dfua
    data = {
    "query" : "users_sql",
    "name" : "users_sql",
    "querytype" : "sqlreplica",
    "overwrite":1,
    "active":1,
    "rule" : “30 10 * *",
    "mode":"WRITE_TRUNCATE",
    "table" : "reply_prod_to_bigquery.Users"
    }
    r = requests.post(
    url=url+"/addconfig",
    data = json.dumps(data)
    )
    r.text
    BigQuery manager config example
    Had to add a slide with code because it’s a dev conference

    View Slide

  19. #dfua
    What is BigQuery manager?
    BigQuery
    manager
    BigQuery
    Azure

    View Slide

  20. #dfua
    Iteration Two
    Google
    Analytics
    BigQuery
    Azure
    Cost data
    Chargebee
    Copper
    OWOX BI
    OWOX BI
    API
    API
    BigQuery
    manager

    View Slide

  21. #dfua
    Iteration Two list of services
    1. Google BigQuery – Data Warehouse.
    2. OWOX BI – Cost Data → Google Analytics and Google Analytics → BigQuery
    pipelines.
    3. BigQuery manager – Azure → BigQuery pipeline and BigQuery jobs
    automation.

    View Slide

  22. #dfua
    Another demo time
    You all signed the NDA, right?

    View Slide

  23. #dfua
    Total monthly cost
    BigQuery $0*
    OWOX BI $115**
    BigQuery manager $0
    Total $115
    * up to 10GB of storage and 1TB of queries
    ** up to 100K monthly unique users

    View Slide

  24. #dfua
    Resources
    1. BigQuery for Data Warehouse Practitioners

    https://cloud.google.com/solutions/bigquery-data-warehouse
    2. Google BigQuery documentation

    https://cloud.google.com/bigquery/docs/
    3. Standard SQL reference

    https://cloud.google.com/bigquery/docs/reference/standard-sql/

    View Slide

  25. Denys Tkalich
    [email protected]
    Questions?
    Thank you!

    View Slide