Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Agile Database Development with JSON

Chris
June 18, 2020

Agile Database Development with JSON

An overview of JSON functionality in Oracle Database

Chris

June 18, 2020
Tweet

More Decks by Chris

Other Decks in Technology

Transcript

  1. 1 Agile Database Development with JSON Chris Saxon Developer Advocate,

    @ChrisRSaxon & @SQLDaily blogs.oracle.com/sql youtube.com/c/TheMagicofSQL asktom.oracle.com
  2. Photo by Brannon Naito on Unsplash working in short sprints

    and releasing often So we need to be Agile
  3. 7 Agile Database Development with JSON Chris Saxon Developer Advocate,

    @ChrisRSaxon & @SQLDaily blogs.oracle.com/sql youtube.com/c/TheMagicofSQL asktom.oracle.com
  4. The following is intended to outline our general product direction.

    It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. Statements in this presentation relating to Oracle’s future plans, expectations, beliefs, intentions and prospects are “forward-looking statements” and are subject to material risks and uncertainties. A detailed discussion of these factors and other risks that affect our business is contained in Oracle’s Securities and Exchange Commission (SEC) filings, including our most recent reports on Form 10-K and Form 10-Q under the heading “Risk Factors.” These filings are available on the SEC’s website or on Oracle’s website at http://www.oracle.com/investor. All information in this presentation is current as of September 2019 and Oracle undertakes no duty to update any statement in light of new information or future events. Safe Harbor
  5. User Story #1 We must be able to store product

    & order details So we need to create the tables and define CRUD operations on them
  6. create table products ( product_id integer not null primary key,

    product_json ##TODO## not null, check ( product_data is json ) ); create table orders ( order_id integer not null primary key, order_json ##TODO## not null, check ( order_data is json ) ); The tables are just a primary key, JSON column, & is json constraint
  7. create table products ( product_id integer not null primary key,

    product_json ##TODO## not null, check ( product_data is json ) ); create table orders ( order_id integer not null primary key, order_json ##TODO## not null, check ( order_data is json ) ); But which data type to use for JSON?!
  8. Which data type should you use for JSON? "Small" documents

    varchar2 "Large" documents ??? <= 4,000 bytes / 32k
  9. "Small" documents varchar2 "Large" documents blob JSON data type in

    21c Avoids character set conversions Less storage than clob
  10. create table products ( product_id integer not null primary key,

    product_json blob not null, check ( product_data is json ) ); create table orders ( order_id integer not null primary key, order_json blob not null, check ( order_data is json ) );
  11. insert into products ( product_json ) values ( utl_raw.cast_to_raw (

    '{ "productName": "..." }' ) ); BLOBs need extra processing on insert
  12. select json_serialize ( product_json returning clob pretty ) jdata from

    products; JDATA { "productName": "..." } Added in 19c json_serialize converts JSON data to text; which you can pretty print for readability
  13. select json_query ( product_json, '$' returning clob pretty ) jdata

    from products; JDATA { "productName": "..." } In earlier releases use json_query The clob return type was added in 18c
  14. User Story #2 Customers must be able to search by

    price So we need to query the products table for JSON where the unitPrice is in the specified range
  15. { "productName": "GEEKWAGON", "descripion": "Ut commodo in …", "unitPrice": 35.97,

    "bricks": [ { "colour": "red", "shape": "cube", "quantity": 13 }, { "colour": "green", "shape": "cube", "quantity": 17 }, … ] } We need to search for this value in the documents
  16. select * from products p where p.product_json.unitPrice <= :max_price; But

    remember it returns varchar2 => implicit conversion! Use simple dot-notation to access the value
  17. select * from products p where json_value ( product_json, '$.unitPrice'

    returning number ) <= :max_price; json_value gives you more control So this returns number => no implicit conversion! :)
  18. select * from products p where p.product_json.unitPrice.number() <= :max_price; From

    19c you can state the return type with simple dot-notation
  19. User Story #3 Customers must be able to view their

    orders Showing order details and a list of what they bought So we need to join the order productIds to products
  20. { "customerId" : 2, "orderDatetime" : "2019-01-01T03:25:43", "products" : [

    { "productId" : 1, "unitPrice" : 74.95 }, { "productId" : 10, "unitPrice" : 35.97 }, … ] } We need to extract these from the product array
  21. select o.order_json.products[*].productId from orders o; PRODUCTS [2,8,5] [3,9,6] [1,10,7,4] ...

    With simple dot-notation, you can get an array of the values…
  22. select json_query ( order_json, '$.products[*].productId' with array wrapper ) from

    orders o; PRODUCTS [2,8,5] [3,9,6] [1,10,7,4] ... But to join these to products, we need to convert them to rows… …or with json_query
  23. with order_items as ( select order_id, t.* from orders o,

    json_table ( order_json columns ( customerId, nested products[*] columns ( productId, unitPrice ) ) ) t ) Simplified syntax 18c
  24. with order_items as ( select order_id, t.* from orders o,

    json_table ( order_json columns ( customerId, nested products[*] columns ( productId, unitPrice ) ) ) t ) This tells the database to return a row for each element in the products array…
  25. select order_id, p.product_json.productName product, unitPrice from order_items oi join products

    p on oi.productId = p.product_id where customerId = :cust_var order by oi.order_id desc, p.product_id …So you can join output to the products table!
  26. Copyright © 2019 Oracle and/or its affiliates. Soooo… How many

    orders today? …people have lots of questions As always, post release… Ryan McGuire / Gratisography
  27. User Story #4 Sales must be able to view today's

    orders We need to create a dashboard counting orders So we need to search for orders placed today
  28. { "customerId" : 2, "orderDatetime" : "2019-01-01T03:25:43", "products" : [

    { "productId" : 1, "unitPrice" : 74.95 }, { "productId" : 10, "unitPrice" : 35.97 }, … ] } We need to search for this value in the documents
  29. select * from orders o where o.order_json.orderDatetime >= trunc (

    sysdate ); ORA-01861: literal does not match format string Remember the implicit conversions? It fails for dates! Use simple dot-notation to access the value
  30. select * from orders o where json_value ( order_json, '$.orderDatetime'

    returning date ) >= trunc ( sysdate ) So you need to define the return type; JSON dates conform to ISO 8601 date
  31. select * from orders o where json_value ( order_json, '$.orderDatetime'

    returning date ) >= trunc ( sysdate ) But the query is very slow…
  32. select * from orders o where json_value ( order_json, '$.orderDatetime'

    returning date ) >= trunc ( sysdate ) { "customerId": 1, … } { "customerId": 2, … } …
  33. User Story #4b … and make it fast! currently the

    query does a full table scan To speed it up we need to create an index!
  34. create index orders_date_i on orders ( order_json ); ORA-02327: cannot

    create index on expression with datatype LOB You can't index LOB data
  35. create search index orders_json_i on orders ( order_json ) for

    json parameters ( 'sync (on commit)' ); Added in 12.2, a json search index enables JSON queries to use an index JSON Search Indexes
  36. select * from orders o where json_value ( order_json, '$.orderDatetime'

    returning date ) >= trunc ( sysdate ) { "customerId": 1, … } { "customerId": 2, … } …
  37. ----------------------------------------------------- | Id | Operation | Name | ----------------------------------------------------- |

    0 | SELECT STATEMENT | | |* 1 | TABLE ACCESS BY INDEX ROWID| ORDERS | |* 2 | DOMAIN INDEX | ORDERS_JSON_I | ----------------------------------------------------- With the search index in place, the optimizer can use it
  38. Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(JSON_VALUE("ORDER_JSON"

    FORMAT JSON , '$.orderDatetime' RETURNING TIMESTAMP NULL ON ERROR) >= TIMESTAMP' 2019-01-15 00:00:00') 2 - access("CTXSYS"."CONTAINS"("O"."ORDER_JSON", 'sdatap(TMS_orderDatetime >= "2019-01-15T00:00:00+00:00" /orderDatetime)')>0) Under the covers, this uses Oracle Text
  39. create index order_date_i on orders ( json_value ( order_json, '$.orderDatetime'

    returning date error on error null on empty ) ); It's more efficient to create a function- based index, matching the search you'll do This has some other benefits…
  40. create index order_date_i on orders ( json_value ( order_json, '$.orderDatetime'

    returning date error on error null on empty ) ); Data validation! If the value is not a JSON date; inserts will raise an exception
  41. create index order_date_i on orders ( json_value ( order_json, '$.orderDatetime'

    returning date error on error null on empty ) ); From 12.2 you can also raise an error if the attribute is not present
  42. ------------------------------------------------------------ | Id | Operation | Name | ------------------------------------------------------------ |

    0 | SELECT STATEMENT | | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| ORDERS | |* 2 | INDEX RANGE SCAN | ORDER_DATE_I | ------------------------------------------------------------ The function-based index is more efficient, so the optimizer will choose this over the search index
  43. Search vs. Function-Based Indexes JSON Search Index Function-based Index Applicability

    Any JSON query Matching function Performance Slower Faster Use Ad-hoc queries Application queries
  44. 0 5 10 15 20 25 With the dashboard in

    place, it's clear sales are levelling off We need a way to increase sales!
  45. We need to offer discounts! …discount promotion codes Marketing have

    a brilliant plan… Ryan McGuire / Gratisography
  46. User Story #5 Customers may be able to enter a

    promotion code This will give a discount We need to store the code and discount value
  47. { …, "promotion": { "code": "20OFF", "discountAmount": 20 } }

    The order JSON will include the an promotion object… …so there are no changes needed in the database!
  48. Nothing to do in the database! relax! So you can

    sit back and count the money! Ryan McGuire / Gratisography
  49. 0 20 40 60 80 100 120 Customers love the

    promotion Sales are going through the roof!
  50. Where's the $$$?! …the $$$ tells a different story But

    finance are unhappy… Ryan McGuire / Gratisography
  51. -250 -200 -150 -100 -50 0 50 100 150 Red

    bars = sales Yellow line = profits The discount is too big! We're losing money!
  52. User Story #6 Store unit cost for each brick We

    need to update the product JSON; adding unitCost to every object in the bricks arrays
  53. { …, "bricks": [ { "colour": "red", "shape": "cube", "quantity":

    13 }, { "colour": "green", "shape": "cuboid", "quantity": 17 }, … ] } Add unitCost
  54. "bricks": [ { "colour": "red", "shape": "cube", "quantity": 13 },

    { "colour": "green", "shape": "cuboid", "quantity": 17 }, … ] join on colour, shape We need to combine the spreadsheet data with the stored JSON
  55. select * from external ( ( colour varchar2(30), shape varchar2(30),

    unit_cost number ) default directory tmp location ( 'costs.csv' ) ) From 18c you can query files "on the fly" with an inline external table
  56. select product_id, j.* from products, json_table ( product_json columns (

    nested bricks[*] columns ( pos for ordinality, colour path '$.colour', shape path '$.shape', brick format json path '$' ) ) ) j Using JSON_table to extract the bricks as rows
  57. select product_id, j.* from products, json_table ( product_json columns (

    nested bricks[*] columns ( pos for ordinality, colour path '$.colour', shape path '$.shape', brick format json path '$' ) ) ) j
  58. select product_id, j.* from products, json_table ( product_json columns (

    nested bricks[*] columns ( pos for ordinality, colour path '$.colour', shape path '$.shape', brick format json path '$' ) ) ) j
  59. with costs as ( select * from external … ),

    bricks as ( select product_id, j.* from products, json_table ( … ) ) select … from bricks join costs on … We've joined the data, but how do we convert it back to JSON?
  60. select json_object ( 'colour' value b.colour, 'shape' value b.shape, 'quantity'

    value b.quantity, 'unitCost' value c.cost ) from bricks b join costs c on b.colour = c.colour and b.shape = c.shape; So you can create a brick object with json_object…
  61. select json_mergepatch ( brick, '{ "unitCost": ' || c.cost ||

    '}' ) from bricks b join costs c on b.colour = c.colour and b.shape = c.shape; Add/replace this… …to this document … or use json_mergepatch (19c) to add it to the brick object
  62. { "colour": "red", "shape": "cube", "quantity": 13, "unitCost": 0.59 }

    { "colour": "green", "shape": "cuboid", "quantity": 17, "unitCost": 0.39 } This returns a row for each brick To combine them into an array for each product, use json_arrayagg
  63. [ { "colour": "red", "shape": "cube", "quantity": 13, "unitCost": 0.59

    }, { "colour": "green", "shape": "cuboid", "quantity": 17, "unitCost": 0.39 }, … ] Make the array into an object with json_object
  64. json_object ( 'bricks' value json_arrayagg ( json_mergepatch ( brick, '{

    "unitCost": ' || cost || '}' ) order by pos ) )
  65. "bricks": [ { "colour": "red", "shape": "cube", "quantity": 13, "unitCost":

    0.59 }, { "colour": "green", "shape": "cuboid", "quantity": 17, "unitCost": 0.39 }, … ] And replace this array in the product JSON with json_mergepatch
  66. json_mergepatch ( product, json_object ( 'bricks' value json_arrayagg ( json_mergepatch

    ( brick, '{ "unitCost": ' || cost || '}' ) order by pos ) ) )
  67. { "productName": "GEEKWAGON", "descripion": "Ut commodo in …", "unitPrice": 35.97,

    "bricks": [ { …, "unitCost": 0.59 }, { …, "unitCost": 0.39 }, … ] } Finally! We've added unitCost to every element in the array We just need to update the table…
  68. update products set product_json = ( with costs as (

    select * from external … ), bricks as ( select … ) select json_mergepatch … )
  69. …at least we can view order profitability now That was

    hard work Ryan McGuire / Gratisography
  70. User Story #7 Create report prices - discount – total

    cost We've got the data; but want an easier way to query it…
  71. dbms_json.add_virtual_columns ( 'orders', 'order_json' ); JSON Data Guide Added in

    12.2, the JSON Data Guide enables you to expose attributes as virtual columns in the table. To do this, the column must have a json search index
  72. desc orders Name Null? Type ORDER_ID NOT NULL NUMBER(38) ORDER_JSON

    NOT NULL BLOB ORDER_JSON$customerId NUMBER ORDER_JSON$orderDatetime VARCHAR2(32) ORDER_JSON$code VARCHAR2(8) ORDER_JSON$discountAmount NUMBER Sadly it only exposes scalar (non-array) values
  73. select product_id, "PRODUCT_JSON$shape" shape, "PRODUCT_JSON$colour" colour from product_bricks_vw order by

    product_id, shape, colour You can now query the view to see JSON as rows-and-columns
  74. PRODUCT_ID SHAPE COLOUR 1 cube green 1 cube red 1

    cylinder blue 1 cylinder blue 1 cylinder green 1 cylinder green … … … The unique key for a brick is (colour, shape) Some products have duplicate entries in the bricks array! We're shipping too many bricks!
  75. User Story #8 FIX ALL THE DATAZ! We need to

    remove all the duplicate entries from the product brick arrays
  76. { ..., "bricks" : [ { "colour" : "red", "shape"

    : "cylinder", "quantity" : 20, "unitCost" : 0.39 }, { "colour" : "red", "shape" : "cylinder", "quantity" : 20, "unitCost" : 0.39 } { ..., "bricks" : [ { "colour" : "red", "shape" : "cylinder", "quantity" : 8, "unitCost" : 0.39 }, { "colour" : "blue", "shape" : "cylinder", "quantity" : 10, "unitCost" : 0.98 } Comparing the brick arrays for two products shows unitCost is duplicated
  77. { ..., "bricks" : [ { "colour" : "red", "shape"

    : "cylinder", "quantity" : 20, "unitCost" : 0.39 }, { "colour" : "red", "shape" : "cylinder", "quantity" : 20, "unitCost" : 0.39 } { ..., "bricks" : [ { "colour" : "red", "shape" : "cylinder", "quantity" : 8, "unitCost" : 0.39 }, { "colour" : "blue", "shape" : "cylinder", "quantity" : 10, "unitCost" : 0.98 } And the brick itself is duplicated within an array
  78. Wrong Data Model PRODUCTS BRICKS The JSON models the relationship

    between products and bricks as 1:M This is the wrong data model the relationship is M:M
  79. Fixed It! PRODUCTS BRICKS PRODUCT_BRICKS unique ( product_id, brick_id )

    { JSON } { JSON } { JSON } You need a junction table between products and bricks This avoids duplication & enables constraints
  80. Copyright © 2019 Oracle and/or its affiliates. "The more I

    work with existing NoSQL deployments however, the more I believe that their schemaless nature has become an excuse for sloppiness and unwillingness to dwell on a project’s data model beforehand" - Florents Tselai https://tselai.com/modern-data-practice-and-the-sql-tradition.html
  81. select distinct "PRODUCT_JSON$shape" shape, "PRODUCT_JSON$colour" colour, "PRODUCT_JSON$unitCost" unit_cost from product_bricks_vw

    Moving from 1:M to M:M Using the JSON Data Guide view, you can find all the unique brick types…
  82. with vals as ( select distinct "PRODUCT_JSON$shape" shape, "PRODUCT_JSON$colour" colour,

    "PRODUCT_JSON$unitCost" unit_cost from product_bricks_vw ) select rownum brick_id, v.* from vals v; …assign a unique ID to each ( colour, shape ) …
  83. create table bricks as with vals as ( select distinct

    "PRODUCT_JSON$shape" shape, "PRODUCT_JSON$colour" colour, "PRODUCT_JSON$unitCost" unit_cost from product_bricks_vw ) select rownum brick_id, v.* from vals v; …and create a table from the results!
  84. create table bricks as with vals as ( select distinct

    "PRODUCT_JSON$shape" "shape", "PRODUCT_JSON$colour" "colour", "PRODUCT_JSON$unitCost" "unitCost" from product_bricks_vw ) select rownum brick_id, json_object ( v.* ) brick_json from vals v; 19c simplification (Storing the values as JSON if you want)
  85. json_mergepatch ( product_json, '{ "bricks": null }' ) If you

    pass a null value for an attribute to JSON_mergepatch, it's removed from the source Removing the bricks array from products
  86. Storing JSON can be the right choice for… JSON responses

    - 3rd party APIs - IoT devices Schema extensions - flex fields - sparse columns 1 2
  87. Further Reading How to Store, Query, and Create JSON Documents

    in Oracle Database Blog Post http://bit.ly/json-in-oracle-db Presentation Live SQL Scripts http://bit.ly/agile-json-livesql Copyright © 2019 Oracle and/or its affiliates.
  88. Copyright © 2019 Oracle and/or its affiliates. VS Some people

    suggest JSON and relational are fundamentally different
  89. Copyright © 2019 Oracle and/or its affiliates. VS This is

    not the case! However you store data, you still need to normalize it to avoid duplication and errors
  90. Copyright © 2019 Oracle and/or its affiliates. How you store

    the data is a spectrum from just rows-and-columns to wholly JSON and everything in-between