Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Taming JSON with SQL

Greg Rahn
October 26, 2015

Taming JSON with SQL

Talk for OakTable World 2015

Greg Rahn

October 26, 2015
Tweet

More Decks by Greg Rahn

Other Decks in Technology

Transcript

  1. Taming JSON with SQL:
    From Raw to Results
    OakTable World 2015 | #otw15
    2015-10-26
    Greg Rahn | @GregRahn
    1

    View Slide

  2. About Me
    { "Name": "Greg Rahn",
    "Twitter": "@GregRahn",
    "CompaniesWorked": [
    { "Company": "Snowflake Computing",
    "NumberOfYears": 2.0 },
    { "Company": "Cloudera",
    "NumberOfYears": 1.5 },
    { "Company": "Oracle",
    "NumberOfYears": 8.0 } ]
    }
    2

    View Slide

  3. 3

    View Slide

  4. Why Is JSON So Popular?
    • JavaScript is (has been) eating the world
    • Atwood’s Law: any application that can be written in JavaScript, will
    eventually be written in JavaScript.
    • REST APIs produce/consume data in JSON format (IoT, Mobile)
    • Application logs (feeding into Logstash or Elasticsearch)
    • Have you ever tried to work with (or read) XML?
    • A self-describing text format makes data very portable
    • Open Datasets (data.gov, datasf.org, data.cityofnewyork.us, Yelp)
    4

    View Slide

  5. Flexible Data Models
    5

    View Slide

  6. MongoDB Aggregation vs. SQL
    db.zipcodes.aggregate( [
    { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
    { $match: { totalPop: { $gte: 10*1000*1000 } } }
    ] )
    SELECT state, SUM(pop) AS totalPop
    FROM zipcodes
    GROUP BY state
    HAVING totalPop >= (10*1000*1000)
    6

    View Slide

  7. View Slide

  8. View Slide

  9. Example Data Pipeline
    9
    Land Load
    Discovery &
    Apply Schema
    Refinement &
    Augmentation
    Publish Query

    View Slide

  10. If it were only that easy…
    10

    View Slide

  11. Challenges
    • Get data in w/o defining a schema
    • Performing analytics on text files or string fields is slow
    • Variable schema data does not fit well with many formats
    • Need ANSI SQL support and useful extensions & functions for JSON
    11

    View Slide

  12. Numerous Options
    12

    View Slide

  13. Thinking Differently
    • Native ingest support for JSON, Avro, &
    XML
    • No schema required — discovered on
    ingest
    • VARIANT data type
    • Optimized storage for both relational
    and complex types
    • Optimizations used across both formats
    13

    View Slide

  14. View Slide

  15. SQL> _

    View Slide

  16. Summary
    16

    View Slide

  17. Self-Defined Redefined
    • We can land and ingest "raw" JSON
    • We can have data pipelines with both self-defined and pre-defined schemas
    • We can have SQL on self-defining data and complex types (JSON, etc.)
    • We can be agile without building fragile systems — taking the best of both
    worlds
    • Have your JSON and SQL too!
    17

    View Slide

  18. Thank you!
    Questions?
    @GregRahn
    https://speakerdeck.com/gregrahn
    @SnowflakeDB
    https://snowflake.net
    18

    View Slide