are, is evolving into a software company, and by extension a data company. Turning everything into “data” drives innovation Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
modern BigData tools without running a dedicated team for it. Small companies should do BigData - but how? Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
analytics - large, multi-source, complex, unstructured ❏ Be real time ❏ Terabyte scale - Cost effective ❏ Run Ad-Hoc reports - Without Developer - interactive ❏ Minimal engineering efforts - no dedicated BigData team ❏ Simple Query language (prefered SQL / Javascript) Making analytics accessible to more companies Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
Report & Share Business Analysis Scheduled Tasks Batch Processing Compute Engine Multiple Instances Behind the Scenes: Days To Insights Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
Cached Platform Services CMS/Framework Report & Share Business Analysis Scheduled Tasks Batch Processing Compute Engine Multiple Instances Minutes to kick in Hours to Run Batch Processing Hours to Clean and Aggregate DAYS TO INSIGHTS Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
by Google (US or EU zone) • Scales into Petabytes • Ridiculously fast • SQL 2011 Standard + Javascript UDF (User Defined Functions) • Familiar DB Structure (table, views, record, nested, JSON) • Open Interfaces (REST, ODBC, Web UI, BQ command line tool) • Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors • Decent pricing (queries $5/TB, storage: $20/TB cold: $10/TB) *Oct 2017 What is BigQuery? Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
Pipelines FluentD Event Sourcing Frontend Platform Services Metrics / Logs/ Streaming Development Team Data Analysts Report & Share Business Analysis Tools Tableau QlikView Data Studio Internal Dashboard Database SQL Application Servers Servers Cloud Storage archive Load Export Replay Standard Devices HTTPS Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
1. Transform a record 2. Copy event to multiple outputs 3. Store event data in File (for backup/log purposes) 4. Stream to BigQuery (for immediate analyses) Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
@type forest subtype file </store> <store> @type bigquery </store> … </match> Filter plugin mutates incoming data. Add/modify/delete event data transform attributes without a code deploy. 1 2 3 4 Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua The copy output plugin copies events to multiple outputs. File(s), multiple databases, DB engines. Great to ship same event to multiple subsystems. The Bigquery output plugin on the fly streams the event to the BigQuery warehouse. No need to write integration. Data is available immediately for querying. Whenever needed other output plugins can be wired in: Kafka, Google Cloud Storage output plugin.
host <record> bq {"insert_id":"${uid}","host":"${host}", "created":"${time.to_i}"} avg ${record["total"] / record["count"]} </record> </filter> syntax: Ruby, easy to use. Great for: - date transformation, - quick normalizations, - calculating something on the fly, and store in clear log/analytics db - renaming without code deploy. 1 Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua 2 3 4
traditional databases • On exploring unstructured data • Not a replacement to traditional DBs, but it compliments the system • Applying Javascript UDF on columnar storage to resolve complex tasks (eg: Javascript for natural language processing) • On streams (forms, Kafka, IoT streams) • Major strength is handling Large datasets Where to use BigQuery? Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
consuming user actions from using 25x more custom events/hits than Google Analytics ➢ Email engagement Having stored every open/click raw data improve: subject line, layout, follow up action emails, assistant like experience by heavy A/B Split Tests on email marketing campaigns (interactive feedback loop) ➢ Funnel Analysis Wrangle all the data to discover: a small improvement, an AI driven upsell personal like experience, pre-sell products configured on the go - not yet in catalog, but easily can be tweaked/customized Achievements - goal reached by measuring everything Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua
idle resources • No maintenance windows • No manual scaling • No file mgmt BigQuery: Serverless Data Warehouse Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua serverless data warehouse depicted
no more focus on large scale execution plan • no more throwing away-, expiring-, aggregating old data. • run raw ad-hoc queries (either by analysts/sales or Devs) • use Javascript in SQL to have an awesome BigData experience wrangling “unstructured” like nerd Our benefits Powering Interactive Data Analysis with Google BigQuery @martonkodok #dfua