Taming JSON with SQL

Taming JSON with SQL: From Raw to Results OakTable World
2015 | #otw15 2015-10-26 Greg Rahn | @GregRahn 1

About Me { "Name": "Greg Rahn", "Twitter": "@GregRahn", "CompaniesWorked": [
{ "Company": "Snowflake Computing", "NumberOfYears": 2.0 }, { "Company": "Cloudera", "NumberOfYears": 1.5 }, { "Company": "Oracle", "NumberOfYears": 8.0 } ] } 2

Why Is JSON So Popular? • JavaScript is (has been)
eating the world • Atwood’s Law: any application that can be written in JavaScript, will eventually be written in JavaScript. • REST APIs produce/consume data in JSON format (IoT, Mobile) • Application logs (feeding into Logstash or Elasticsearch) • Have you ever tried to work with (or read) XML? • A self-describing text format makes data very portable • Open Datasets (data.gov, datasf.org, data.cityofnewyork.us, Yelp) 4

Flexible Data Models 5

MongoDB Aggregation vs. SQL db.zipcodes.aggregate( [ { $group: { _id:
"$state", totalPop: { $sum: "$pop" } } }, { $match: { totalPop: { $gte: 10*1000*1000 } } } ] ) SELECT state, SUM(pop) AS totalPop FROM zipcodes GROUP BY state HAVING totalPop >= (10*1000*1000) 6

Example Data Pipeline 9 Land Load Discovery & Apply Schema
Reﬁnement & Augmentation Publish Query

If it were only that easy… 10

Challenges • Get data in w/o defining a schema •
Performing analytics on text files or string fields is slow • Variable schema data does not fit well with many formats • Need ANSI SQL support and useful extensions & functions for JSON 11

Numerous Options 12

Thinking Differently • Native ingest support for JSON, Avro, &
XML • No schema required — discovered on ingest • VARIANT data type • Optimized storage for both relational and complex types • Optimizations used across both formats 13

SQL> _

Summary 16

Self-Defined Redefined • We can land and ingest "raw" JSON
• We can have data pipelines with both self-defined and pre-defined schemas • We can have SQL on self-defining data and complex types (JSON, etc.) • We can be agile without building fragile systems — taking the best of both worlds • Have your JSON and SQL too! 17

Thank you! Questions? @GregRahn https://speakerdeck.com/gregrahn @SnowﬂakeDB https://snowﬂake.net 18

Taming JSON with SQL

Taming JSON with SQL

Greg Rahn

More Decks by Greg Rahn

Other Decks in Technology

Featured

Transcript