Agile Database Development with JSON Chris Saxon Developer Advocate, @ChrisRSaxon & @SQLDaily

We're creating a new online store, selling boxes of brick models

We need to respond to customer feedback…

…and evolve the application rapidly

So we need to be Agile: working in short sprints and releasing often

To support this we'll store data as { JSON }

User Story #1 We must be able to store product & order details So we need to create the tables and define CRUD operations on them

create table products ( product_id integer not null primary key, product_json ##TODO## not null, check ( product_json is json ) ); create table orders ( order_id integer not null primary key, order_json ##TODO## not null, check ( order_json is json ) ); The tables are just a primary key, a JSON column, and an is json constraint

create table products ( product_id integer not null primary key, product_json ##TODO## not null, check ( product_json is json ) ); create table orders ( order_id integer not null primary key, order_json ##TODO## not null, check ( order_json is json ) ); But which data type to use for JSON?!

Which data type should you use for JSON? "Small" documents (<= 4,000 bytes / 32k): varchar2. "Large" documents: ???

"Small" documents: varchar2. "Large" documents: blob, which avoids character set conversions and uses less storage than clob. From 21c there is also a JSON data type.

create table products ( product_id integer not null primary key, product_json blob not null, check ( product_json is json ) ); create table orders ( order_id integer not null primary key, order_json blob not null, check ( order_json is json ) );
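As a point of comparison for the data type discussion, here is a minimal sketch of the same table using the native JSON data type. It assumes a 21c or later database; the table name products_21c is hypothetical and not part of the original deck.

```sql
-- Assumes Oracle Database 21c or later; products_21c is a hypothetical name.
-- A JSON-typed column only accepts valid JSON, so no "is json" check is needed.
create table products_21c (
  product_id   integer not null primary key,
  product_json json    not null
);
```

The rest of this deck continues with the blob version, which works on earlier releases.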

insert into products ( product_json ) values ( utl_raw.cast_to_raw ( '{ "productName": "..." }' ) ); BLOBs need extra processing on insert

select product_json from products; PRODUCT_JSON 7B202274686973223A20227468617422207D BLOBs also need extra processing on select to make them human readable

select json_serialize ( product_json returning clob pretty ) jdata from products; JDATA { "productName": "..." } Added in 19c, json_serialize converts JSON data to text, which you can pretty print for readability

select json_query ( product_json, '$' returning clob pretty ) jdata from products; JDATA { "productName": "..." } In earlier releases, use json_query. The clob return type was added in 18c

User Story #2 Customers must be able to search by price So we need to query the products table for JSON where the unitPrice is in the specified range

{ "productName": "GEEKWAGON", "description": "Ut commodo in …", "unitPrice": 35.97, "bricks": [ { "colour": "red", "shape": "cube", "quantity": 13 }, { "colour": "green", "shape": "cube", "quantity": 17 }, … ] } We need to search for the unitPrice value in the documents

select * from products p where p.product_json.unitPrice <= :max_price; Use simple dot-notation to access the value. But remember it returns varchar2 => implicit conversion!

select * from products p where json_value ( product_json, '$.unitPrice' returning number ) <= :max_price; json_value gives you more control So this returns number => no implicit conversion! :)

select * from products p where p.product_json.unitPrice.number() <= :max_price; From 19c you can state the return type with simple dot-notation
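The user story asks for a price range, while the queries above only apply an upper bound. A minimal sketch of the range version, assuming a second bind variable :min_price (hypothetical, not in the original slides):

```sql
-- :min_price is a hypothetical bind variable added for illustration.
-- returning number avoids the implicit varchar2-to-number conversion.
select *
from   products p
where  json_value ( p.product_json, '$.unitPrice' returning number )
         between :min_price and :max_price;
```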

User Story #3 Customers must be able to view their orders Showing order details and a list of what they bought So we need to join the order productIds to products

{ "customerId" : 2, "orderDatetime" : "2019-01-01T03:25:43", "products" : [ { "productId" : 1, "unitPrice" : 74.95 }, { "productId" : 10, "unitPrice" : 35.97 }, … ] } We need to extract these from the product array

select o.order_json.products[*].productId from orders o; PRODUCTS [2,8,5] [3,9,6] [1,10,7,4] ... With simple dot-notation, you can get an array of the values…

select json_query ( order_json, '$.products[*].productId' with array wrapper ) from orders o; PRODUCTS [2,8,5] [3,9,6] [1,10,7,4] ... …or with json_query. But to join these to products, we need to convert them to rows…

json_table: with json_table you can convert JSON to relational rows-and-columns

with order_items as ( select order_id, t.* from orders o, json_table ( order_json columns ( customerId, nested products[*] columns ( productId, unitPrice ) ) ) t ) Simplified json_table syntax, added in 18c

with order_items as ( select order_id, t.* from orders o, json_table ( order_json columns ( customerId, nested products[*] columns ( productId, unitPrice ) ) ) t ) This tells the database to return a row for each element in the products array…

select order_id, p.product_json.productName product, unitPrice from order_items oi join products p on oi.productId = p.product_id where customerId = :cust_var order by oi.order_id desc, p.product_id …So you can join output to the products table!

Minimum viable product complete! Ship it!

As always, post release… …people have lots of questions. Soooo… How many orders today?

User Story #4 Sales must be able to view today's orders We need to create a dashboard counting orders So we need to search for orders placed today

{ "customerId" : 2, "orderDatetime" : "2019-01-01T03:25:43", "products" : [ { "productId" : 1, "unitPrice" : 74.95 }, { "productId" : 10, "unitPrice" : 35.97 }, … ] } We need to search for this value in the documents

select * from orders o where o.order_json.orderDatetime >= trunc ( sysdate ); ORA-01861: literal does not match format string Use simple dot-notation to access the value. Remember the implicit conversions? It fails for dates!

select * from orders o where json_value ( order_json, '$.orderDatetime' returning date ) >= trunc ( sysdate ); So you need to define the return type. JSON dates conform to ISO 8601

ISO 8601 date: 2019-01-01. This is YYYY-MM-DD for dates. There is no time component in an ISO date!

ISO 8601 timestamp: 2019-01-01T03:25:43. Use ISO timestamps to include times. Note the "T" between the date and time!
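To make the date versus timestamp distinction concrete, here is a small sketch that parses both ISO 8601 forms from literal documents. It assumes a release where json_value supports the date and timestamp return types; the orderDate attribute is hypothetical and only used for illustration.

```sql
-- Literal JSON documents, so this runs without any tables.
-- orderDate is a hypothetical attribute; orderDatetime matches the order documents.
select json_value ( '{ "orderDate": "2019-01-01" }',
                    '$.orderDate' returning date ) as order_date,
       json_value ( '{ "orderDatetime": "2019-01-01T03:25:43" }',
                    '$.orderDatetime' returning timestamp ) as order_ts
from   dual;
```

When the time of day matters, returning timestamp is the safer choice for values stored in the timestamp format.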

select * from orders o where json_value ( order_json, '$.orderDatetime' returning date ) >= trunc ( sysdate ) But the query is very slow…

User Story #4b … and make it fast! Currently the query does a full table scan. To speed it up we need to create an index!

create index orders_date_i on orders ( order_json ); ORA-02327: cannot create index on expression with datatype LOB You can't index LOB data

JSON Search Indexes create search index orders_json_i on orders ( order_json ) for json parameters ( 'sync (on commit)' ); Added in 12.2, a JSON search index enables JSON queries to use an index

select * from orders o where json_value ( order_json, '$.orderDatetime' returning date ) >= trunc ( sysdate )

-----------------------------------------------------
| Id  | Operation                   | Name          |
-----------------------------------------------------
|   0 | SELECT STATEMENT            |               |
|*  1 |  TABLE ACCESS BY INDEX ROWID| ORDERS        |
|*  2 |   DOMAIN INDEX              | ORDERS_JSON_I |
-----------------------------------------------------
With the search index in place, the optimizer can use it

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(JSON_VALUE("ORDER_JSON" FORMAT JSON , '$.orderDatetime' RETURNING TIMESTAMP NULL ON ERROR) >= TIMESTAMP' 2019-01-15 00:00:00')
   2 - access("CTXSYS"."CONTAINS"("O"."ORDER_JSON", 'sdatap(TMS_orderDatetime >= "2019-01-15T00:00:00+00:00" /orderDatetime)')>0)
Under the covers, this uses Oracle Text

create index order_date_i on orders ( json_value ( order_json, '$.orderDatetime' returning date error on error null on empty ) ); It's more efficient to create a function-based index, matching the search you'll do. This has some other benefits…

create index order_date_i on orders ( json_value ( order_json, '$.orderDatetime' returning date error on error null on empty ) ); Data validation! If the value is not a JSON date, inserts will raise an exception

create index order_date_i on orders ( json_value ( order_json, '$.orderDatetime' returning date error on error null on empty ) ); From 12.2 you can also raise an error if the attribute is not present
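The validation side effect of the error on error clause can be sketched as follows. This assumes the order_date_i index above exists on the blob-based orders table; the order_id value -1 and the document contents are hypothetical test data.

```sql
-- Hypothetical test row: orderDatetime is not an ISO 8601 date.
-- With order_date_i in place (error on error), the database has to evaluate
-- json_value to maintain the index, so this insert is expected to fail.
insert into orders ( order_id, order_json )
values ( -1,
         utl_raw.cast_to_raw (
           '{ "customerId": 1, "orderDatetime": "not a date" }'
         ) );
```

The exact error raised depends on the release, so none is shown here.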

------------------------------------------------------------
| Id  | Operation                           | Name         |
------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ORDERS       |
|*  2 |   INDEX RANGE SCAN                  | ORDER_DATE_I |
------------------------------------------------------------
The function-based index is more efficient, so the optimizer will choose this over the search index

Search vs. Function-Based Indexes
               JSON Search Index   Function-based Index
Applicability  Any JSON query      Matching function
Performance    Slower              Faster
Use            Ad-hoc queries      Application queries

[Sales chart] With the dashboard in place, it's clear sales are levelling off. We need a way to increase sales!

Marketing have a brilliant plan… …discount promotion codes. We need to offer discounts!

User Story #5 Customers may be able to enter a promotion code This will give a discount We need to store the code and discount value

{ …, "promotion": { "code": "20OFF", "discountAmount": 20 } } The order JSON will include a promotion object… …so there are no changes needed in the database!

Nothing to do in the database! Relax! So you can sit back and count the money!

[Sales chart] Customers love the promotion. Sales are going through the roof!
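Although storing the promotion needs no schema change, reading it back uses the same techniques as before. A minimal sketch, assuming the orders table and the promotion object shown above; the column aliases are made up for illustration.

```sql
-- promo_code and discount_amount are illustrative aliases.
-- json_exists keeps only the orders that actually carry a promotion object.
select o.order_id,
       o.order_json.promotion.code as promo_code,
       json_value ( o.order_json, '$.promotion.discountAmount'
                    returning number ) as discount_amount
from   orders o
where  json_exists ( o.order_json, '$.promotion' );
```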

Cake for everyone! The promotion is a success!

But finance are unhappy… …the $$$ tells a different story. Where's the $$$?!

[Chart: red bars = sales, yellow line = profits] The discount is too big! We're losing money!

Finance need to view order profitability They need to understand where we're losing money

User Story #6 Store unit cost for each brick We need to update the product JSON; adding unitCost to every object in the bricks arrays

{ …, "bricks": [ { "colour": "red", "shape": "cube", "quantity": 13 }, { "colour": "green", "shape": "cuboid", "quantity": 17 }, … ] } Add unitCost

"Luckily" we have the costs in a spreadsheet!

"bricks": [ { "colour": "red", "shape": "cube", "quantity": 13 }, { "colour": "green", "shape": "cuboid", "quantity": 17 }, … ] join on colour, shape We need to combine the spreadsheet data with the stored JSON

Step 1: transform JSON to rows-and-columns. Step 2: join the costs. Step 3: convert back to JSON

Buckle up! This will be a bumpy ride!

select * from external ( ( colour varchar2(30), shape varchar2(30), unit_cost number ) default directory tmp location ( 'costs.csv' ) ) From 18c you can query files "on the fly" with an inline external table

select product_id, j.* from products, json_table ( product_json columns ( nested bricks[*] columns ( pos for ordinality, colour path '$.colour', shape path '$.shape', brick format json path '$' ) ) ) j Using JSON_table to extract the bricks as rows

with costs as ( select * from external … ), bricks as ( select product_id, j.* from products, json_table ( … ) ) select … from bricks join costs on … We've joined the data, but how do we convert it back to JSON?

JSON Generation Functions (12.2): json_object, json_objectagg, json_array, json_arrayagg

select json_object ( 'colour' value b.colour, 'shape' value b.shape, 'quantity' value b.quantity, 'unitCost' value c.unit_cost ) from bricks b join costs c on b.colour = c.colour and b.shape = c.shape; So you can create a brick object with json_object…

select json_mergepatch ( brick, '{ "unitCost": ' || c.unit_cost || '}' ) from bricks b join costs c on b.colour = c.colour and b.shape = c.shape; … or use json_mergepatch (19c) to add it to the brick object. The second argument is the patch to add/replace; the first is the document it is applied to

{ "colour": "red", "shape": "cube", "quantity": 13, "unitCost": 0.59 } { "colour": "green", "shape": "cuboid", "quantity": 17, "unitCost": 0.39 } This returns a row for each brick To combine them into an array for each product, use json_arrayagg

json_arrayagg ( json_mergepatch ( brick, '{ "unitCost": ' || unit_cost || '}' ) order by pos )

[ { "colour": "red", "shape": "cube", "quantity": 13, "unitCost": 0.59 }, { "colour": "green", "shape": "cuboid", "quantity": 17, "unitCost": 0.39 }, … ] Make the array into an object with json_object

json_object ( 'bricks' value json_arrayagg ( json_mergepatch ( brick, '{ "unitCost": ' || unit_cost || '}' ) order by pos ) )

"bricks": [ { "colour": "red", "shape": "cube", "quantity": 13, "unitCost": 0.59 }, { "colour": "green", "shape": "cuboid", "quantity": 17, "unitCost": 0.39 }, … ] And replace this array in the product JSON with json_mergepatch

json_mergepatch ( product, json_object ( 'bricks' value json_arrayagg ( json_mergepatch ( brick, '{ "unitCost": ' || unit_cost || '}' ) order by pos ) ) )

{ "productName": "GEEKWAGON", "description": "Ut commodo in …", "unitPrice": 35.97, "bricks": [ { …, "unitCost": 0.59 }, { …, "unitCost": 0.39 }, … ] } Finally! We've added unitCost to every element in the array. We just need to update the table…

update products set product_json = ( with costs as ( select * from external … ), bricks as ( select … ) select json_mergepatch … )

That was hard work… …at least we can view order profitability now

User Story #7 Create report prices - discount – total cost We've got the data; but want an easier way to query it…

JSON Data Guide dbms_json.add_virtual_columns ( 'orders', 'order_json' ); Added in 12.2, the JSON Data Guide enables you to expose attributes as virtual columns in the table. To do this, the column must have a JSON search index

desc orders
Name                       Null?     Type
ORDER_ID                   NOT NULL  NUMBER(38)
ORDER_JSON                 NOT NULL  BLOB
ORDER_JSON$customerId                NUMBER
ORDER_JSON$orderDatetime             VARCHAR2(32)
ORDER_JSON$code                      VARCHAR2(8)
ORDER_JSON$discountAmount            NUMBER
Sadly it only exposes scalar (non-array) values

Luckily you can create a view instead! dbms_json.create_view_on_path ( 'product_bricks_vw', 'products', 'product_json', '$' ); This creates the view product_bricks_vw, using json_table on the JSON column

select product_id, "PRODUCT_JSON$shape" shape, "PRODUCT_JSON$colour" colour from product_bricks_vw order by product_id, shape, colour You can now query the view to see JSON as rows-and-columns

PRODUCT_ID  SHAPE     COLOUR
1           cube      green
1           cube      red
1           cylinder  blue
1           cylinder  blue
1           cylinder  green
1           cylinder  green
…           …         …
The unique key for a brick is (colour, shape). Some products have duplicate entries in the bricks array! We're shipping too many bricks!

User Story #8 FIX ALL THE DATAZ! We need to remove all the duplicate entries from the product brick arrays

{ ..., "bricks" : [ { "colour" : "red", "shape" : "cylinder", "quantity" : 20, "unitCost" : 0.39 }, { "colour" : "red", "shape" : "cylinder", "quantity" : 20, "unitCost" : 0.39 } { ..., "bricks" : [ { "colour" : "red", "shape" : "cylinder", "quantity" : 8, "unitCost" : 0.39 }, { "colour" : "blue", "shape" : "cylinder", "quantity" : 10, "unitCost" : 0.98 } Comparing the brick arrays for two products shows unitCost is duplicated

{ ..., "bricks" : [ { "colour" : "red", "shape" : "cylinder", "quantity" : 20, "unitCost" : 0.39 }, { "colour" : "red", "shape" : "cylinder", "quantity" : 20, "unitCost" : 0.39 } { ..., "bricks" : [ { "colour" : "red", "shape" : "cylinder", "quantity" : 8, "unitCost" : 0.39 }, { "colour" : "blue", "shape" : "cylinder", "quantity" : 10, "unitCost" : 0.98 } And the brick itself is duplicated within an array

Wrong Data Model [Diagram: PRODUCTS, BRICKS] The JSON models the relationship between products and bricks as 1:M. This is the wrong data model: the relationship is M:M

Fixed It! [Diagram: PRODUCTS, PRODUCT_BRICKS with unique ( product_id, brick_id ), BRICKS; each table can hold { JSON }] You need a junction table between products and bricks. This avoids duplication & enables constraints

You still need to model { JSON } data!

Copyright © 2019 Oracle and/or its affiliates. "The more I work with existing NoSQL deployments however, the more I believe that their schemaless nature has become an excuse for sloppiness and unwillingness to dwell on a project’s data model beforehand" - Florents Tselai

Moving from 1:M to M:M select distinct "PRODUCT_JSON$shape" shape, "PRODUCT_JSON$colour" colour, "PRODUCT_JSON$unitCost" unit_cost from product_bricks_vw Using the JSON Data Guide view, you can find all the unique brick types…

with vals as ( select distinct "PRODUCT_JSON$shape" shape, "PRODUCT_JSON$colour" colour, "PRODUCT_JSON$unitCost" unit_cost from product_bricks_vw ) select rownum brick_id, v.* from vals v; …assign a unique ID to each ( colour, shape ) …

create table bricks as with vals as ( select distinct "PRODUCT_JSON$shape" shape, "PRODUCT_JSON$colour" colour, "PRODUCT_JSON$unitCost" unit_cost from product_bricks_vw ) select rownum brick_id, v.* from vals v; …and create a table from the results!
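Note that the distinct above also includes unit_cost, so a (colour, shape) pair stored with two different costs would receive two brick_ids. A quick check, sketched under the assumption that bricks was created with the plain-column statement above; the constraint name is hypothetical.

```sql
-- Any rows returned are (colour, shape) pairs that got more than one brick_id.
select colour, shape, count(*) as copies
from   bricks
group  by colour, shape
having count(*) > 1;

-- Once the check returns no rows, enforce the key stated on the earlier slide.
-- bricks_colour_shape_u is a hypothetical constraint name.
alter table bricks
  add constraint bricks_colour_shape_u unique ( colour, shape );
```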

19c simplification (storing the values as JSON if you want): create table bricks as with vals as ( select distinct "PRODUCT_JSON$shape" "shape", "PRODUCT_JSON$colour" "colour", "PRODUCT_JSON$unitCost" "unitCost" from product_bricks_vw ) select rownum brick_id, json_object ( v.* ) brick_json from vals v;

Create the Join Table create table product_bricks as select distinct product_id, brick_id from product_bricks_vw join bricks on ...

Removing the bricks array from products json_mergepatch ( product_json, '{ "bricks": null }' ) If you pass a null value for an attribute to json_mergepatch, it's removed from the source
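Tables built with create table ... as select carry no keys, so the constraints from the "Fixed It!" model have to be added afterwards. A minimal sketch, assuming the bricks and product_bricks tables created above; all constraint names are hypothetical.

```sql
-- All constraint names below are hypothetical.
alter table bricks
  add constraint bricks_pk primary key ( brick_id );

-- The unique key from the data model slide.
alter table product_bricks
  add constraint product_bricks_u unique ( product_id, brick_id );

alter table product_bricks
  add constraint prbr_product_fk
  foreign key ( product_id ) references products ( product_id );

alter table product_bricks
  add constraint prbr_brick_fk
  foreign key ( brick_id ) references bricks ( brick_id );
```

With these in place the database rejects the duplicate entries that the JSON arrays allowed.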
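Applied to the table, the expression above might be used as in the following sketch. It assumes the blob-based products table from earlier and that the brick data has already been copied into the new tables; the returning blob clause is added here to match the column type.

```sql
-- Run only after the brick data has been copied to bricks / product_bricks.
-- returning blob matches the blob column used in this deck.
update products
set    product_json = json_mergepatch (
         product_json,
         '{ "bricks": null }'
         returning blob
       );
```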

When should I store { JSON }?

Storing JSON can be the right choice for…
1. JSON responses
   - 3rd party APIs
   - IoT devices
2. Schema extensions
   - flex fields
   - sparse columns

Further Reading How to Store, Query, and Create JSON Documents in Oracle Database Blog Post Presentation Live SQL Scripts

Some people suggest JSON and relational are fundamentally different

This is not the case! However you store data, you still need to normalize it to avoid duplication and errors

How you store the data is a spectrum from just rows-and-columns to wholly JSON and everything in-between

Oracle Database supports it all! However you store your data

#MakeDataGreatAgain