From XML to JSON and beyond (english version)

JSON is often described as an alternative to XML, or even as its replacement. That holds for several applications, but XML frequently persists because of its richer feature set and its rather different architectural scope. Conversion tools and mapping conventions are therefore crucial for exchanging data effectively between the two formats. Besides the many existing conventions, XPath 3.1 also takes a step in this direction. So don't throw away old XML data: use it for what it is best suited to, and convert it when you need to send chunks of JSON to other systems or applications.

Davide Brunato

June 05, 2022

Transcript

  1. XML and JSON
    • XML (eXtensible Markup Language, 1998)
      – similar to HTML but without predefined tags to use
      – is a subset of SGML
      – interoperability with both SGML and HTML
    • JSON (JavaScript Object Notation, 1999)
      – is a lightweight data-interchange format
      – is easy for humans to read and write
      – is easy for machines to parse and generate
  2. JSON is not a replacement for XML
    • No standard way of representing metadata
    • No comments
    • Easier to process (easier to read?)
    • More secure at low level
      – Unicode based by default (no entities needed)
      – No processing instructions
      – No external entities
  3. JSON is replacing XML?
    [Line chart: Google Trends interest (scale 0 to 100) for XML and JSON, 2010 to 2022. Data source: Google Trends (https://www.google.com/trends)]
  4. XML ecosystem
    • XML Schema
    • XPath/XQuery
    • XLink
    • XSLT
    • XPointer
    • XML-RPC
    • SOAP
  5. JSON ecosystem
    • JSON Schema (Draft 2020-12)
    • JSONPath
    • JSON Pointer
    • JSON-RPC
    • JSON data types

        JSON             Python
        object           dict
        array            list
        string           str
        number (int)     int
        number (real)    float
        true             True
        false            False
        null             None
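The type mapping in the table above can be checked with Python's standard `json` module. This is a minimal sketch; the sample document and the variable names are made up for illustration.

```python
import json

# Hypothetical sample document that touches every JSON data type.
text = (
    '{"obj": {"k": "v"}, "arr": [1, 2], "s": "hi", '
    '"i": 7, "r": 1.5, "t": true, "f": false, "n": null}'
)

decoded = json.loads(text)

# Record the name of the Python type the standard decoder chose for each value.
type_names = {key: type(value).__name__ for key, value in decoded.items()}
print(type_names)
```

Note that Python reports both `true` and `false` as `bool`, and `null` as `NoneType`, the type of `None`.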
  6. Can XML and JSON be friends?
    [Image with the labels "XML" and "JSON"]
    PNG image released under Creative Commons (CC BY-NC 4.0); author: Lydia Simmons; source: https://freepngimg.com/png/109077-story-toy-free-hd-image
  7. New Python packages for XML
    (Don't trust anyone who tells you “but the schema will not change anymore”)
    • https://github.com/sissaschool/xmlschema
      – XML Schema validator and decoder (2016)
        ➔ lxml.etree.XMLSchema (XSD 1.0)
        ➔ Many alternatives for decoding, but none of them schema-based
    • https://github.com/sissaschool/elementpath
      – XPath processor (2018)
        ➔ lxml.etree supports XPath 1.0
  8. XML to JSON (xmltodict)
    XML source:

        <?xml version="1.0" encoding="UTF-8"?>
        <col:collection xmlns:col="http://example.com/ns/collection">
          <object id="b0836217462" available="true">
            <position>1</position>
            <title>The Umbrellas</title>
            <year>1886</year>
            <author id="PAR">
              <name>Pierre-Auguste Renoir</name>
              <born>1841-02-25</born>
              <dead>1919-12-03</dead>
              <qualification>painter</qualification>
            </author>
            <estimation>10000.00</estimation>
          </object>
          <object id="b0836217463" available="true">
            <position>2</position>
            <title/>
            <year>1925</year>
            <author id="JM">
              <name>Joan Miró</name>
              <born>1893-04-20</born>
              <dead>1983-12-25</dead>
              <qualification>painter, sculptor, ceramicist</qualification>
            </author>
          </object>
        </col:collection>

    JSON result:

        {
          "http://example.com/ns/collection:collection": {
            "object": [
              {
                "@id": "b0836217462", "@available": "true",
                "@xmlns": {"col": "http://example.com/ns/collection"},
                "position": "1", "title": "The Umbrellas", "year": "1886",
                "author": {
                  "@id": "PAR", "name": "Pierre-Auguste Renoir",
                  "born": "1841-02-25", "dead": "1919-12-03",
                  "qualification": "painter"
                },
                "estimation": "10000.00"
              },
              {
                "@id": "b0836217463", "@available": "true",
                "position": "2", "title": null, "year": "1925",
                "author": {
                  "@id": "JM", "name": "Joan Mir\u00f3",
                  "born": "1893-04-20", "dead": "1983-12-25",
                  "qualification": "painter, sculptor, ceramicist"
                }
              }
            ]
          }
        }
  9. Default converter (xmlschema)
    XML source: the same col:collection document as in the xmltodict slide.
    JSON result:

        {
          "@xmlns:col": "http://example.com/ns/collection",
          "object": [
            {
              "@id": "b0836217462", "@available": true,
              "position": 1, "title": "The Umbrellas", "year": "1886",
              "author": {
                "@id": "PAR", "name": "Pierre-Auguste Renoir",
                "born": "1841-02-25", "dead": "1919-12-03",
                "qualification": "painter"
              },
              "estimation": 10000.0
            },
            {
              "@id": "b0836217463", "@available": true,
              "position": 2, "title": null, "year": "1925",
              "author": {
                "@id": "JM", "name": "Joan Mir\u00f3",
                "born": "1893-04-20", "dead": "1983-12-25",
                "qualification": "painter, sculptor, ceramicist"
              }
            }
          ]
        }

    • Many options to change the prefixes or to preserve the root element
  10. Conventions over conversions
    • Open311 website
      – A collaborative model and open standard for civic issue tracking
      – http://wiki.open311.org/JSON_and_XML_Conversion/
      – XML to JSON Conventions
    • Parker
    • BadgerFish
    • Abdera
    • JsonML (http://www.jsonml.org)
    • Others ... (Spark, GData, OData)
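To make one of these conventions concrete, the following is a minimal, schema-less sketch of Parker-style rules using only the standard library: attributes and the root element name are dropped, repeated children become a list, and empty elements become `null`. The helper names (`local_name`, `parker`) and the shortened sample are illustrative assumptions, not the xmlschema converter; without a schema every text value stays a string.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rsplit('}', 1)[-1]


def parker(elem):
    """Parker-style conversion: ignore attributes, keep only child data."""
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        return text or None  # empty element -> None (JSON null)
    result = {}
    for child in children:
        key = local_name(child.tag)
        value = parker(child)
        if key in result:
            # A repeated tag turns the entry into a list of values.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


XML = """<col:collection xmlns:col="http://example.com/ns/collection">
  <object id="b0836217462" available="true">
    <position>1</position>
    <title>The Umbrellas</title>
  </object>
  <object id="b0836217463" available="true">
    <position>2</position>
    <title/>
  </object>
</col:collection>"""

data = parker(ET.fromstring(XML))
print(data)
```

A schema-aware converter can additionally decode `position` to an integer, which this sketch cannot know.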
  11. Parker convention
    XML source: the same col:collection document as in the xmltodict slide.
    JSON result:

        {
          "object": [
            {
              "position": 1, "title": "The Umbrellas", "year": "1886",
              "author": {
                "name": "Pierre-Auguste Renoir",
                "born": "1841-02-25", "dead": "1919-12-03",
                "qualification": "painter"
              },
              "estimation": 10000.0
            },
            {
              "position": 2, "title": null, "year": "1925",
              "author": {
                "name": "Joan Mir\u00f3",
                "born": "1893-04-20", "dead": "1983-12-25",
                "qualification": "painter, sculptor, ceramicist"
              }
            }
          ]
        }
  12. Abdera convention
    XML source: the same col:collection document as in the xmltodict slide.
    JSON result:

        {
          "object": [
            {
              "attributes": {"id": "b0836217462", "available": true},
              "children": [{
                "position": 1, "title": "The Umbrellas", "year": "1886",
                "author": {
                  "attributes": {"id": "PAR"},
                  "children": [{
                    "name": "Pierre-Auguste Renoir",
                    "born": "1841-02-25", "dead": "1919-12-03",
                    "qualification": "painter"
                  }]
                },
                "estimation": 10000.0
              }]
            },
            {
              "attributes": {"id": "b0836217463", "available": true},
              "children": [{
                "position": 2, "title": [], "year": "1925",
                "author": {
                  "attributes": {"id": "JM"},
                  "children": [{
                    "name": "Joan Mir\u00f3",
                    "born": "1893-04-20", "dead": "1983-12-25",
                    "qualification": "painter, sculptor, ceramicist"
                  }]
                }
              }]
            }
          ]
        }

    ➔ Doesn't map namespaces
    ➔ Objects for storing attributes and children
  13. BadgerFish convention
    XML source: the same col:collection document as in the xmltodict slide.
    JSON result:

        {
          "@xmlns": {"col": "http://example.com/ns/collection"},
          "col:collection": {
            "object": [
              {
                "@id": "b0836217462", "@available": true,
                "position": {"$": 1},
                "title": {"$": "The Umbrellas"},
                "year": {"$": "1886"},
                "author": {
                  "@id": "PAR",
                  "name": {"$": "Pierre-Auguste Renoir"},
                  "born": {"$": "1841-02-25"},
                  "dead": {"$": "1919-12-03"},
                  "qualification": {"$": "painter"}
                },
                "estimation": {"$": 10000.0}
              },
              {
                "@id": "b0836217463", "@available": true,
                "position": {"$": 2},
                "title": {},
                "year": {"$": "1925"},
                "author": {
                  "@id": "JM",
                  "name": {"$": "Joan Mir\u00f3"},
                  "born": {"$": "1893-04-20"},
                  "dead": {"$": "1983-12-25"},
                  "qualification": {"$": "painter, sculptor, ceramicist"}
                }
              }
            ]
          }
        }
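The core BadgerFish rules visible above (attributes under `@name`, text content under `$`, empty elements as `{}`) can be sketched with the standard library. This is a simplified illustration, not the xmlschema converter: the function names are hypothetical, namespace declarations are not mapped to `@xmlns`, and values stay strings because no schema is used.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rsplit('}', 1)[-1]


def badgerfish_node(elem):
    """Convert one element: '@name' for attributes, '$' for text content."""
    node = {"@" + name: value for name, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        node["$"] = text
    for child in elem:
        key = local_name(child.tag)
        value = badgerfish_node(child)
        if key in node:
            # A repeated tag turns the entry into a list of nodes.
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    return node


def badgerfish(root):
    """Unlike Parker, the root element name is preserved as the single top-level key."""
    return {local_name(root.tag): badgerfish_node(root)}


XML = '<object id="b1" available="true"><position>1</position><title/></object>'
result = badgerfish(ET.fromstring(XML))
print(result)
```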
  14. JsonML convention
    XML source: the same col:collection document as in the xmltodict slide.
    • JSON arrays to represent XML elements
    • JSON objects to represent attributes
    • JSON strings to represent text nodes
    JSON result:

        [
          "col:collection",
          {"xmlns:col": "http://example.com/ns/collection"},
          [
            "object",
            {"id": "b0836217462", "available": true},
            ["position", 1],
            ["title", "The Umbrellas"],
            ["year", "1886"],
            ["author", {"id": "PAR"},
              ["name", "Pierre-Auguste Renoir"],
              ["born", "1841-02-25"],
              ["dead", "1919-12-03"],
              ["qualification", "painter"]
            ],
            ["estimation", 10000.0]
          ],
          [
            "object",
            {"id": "b0836217463", "available": true},
            ["position", 2],
            ["title"],
            ["year", "1925"],
            ["author", {"id": "JM"},
              ["name", "Joan Mir\u00f3"],
              ["born", "1893-04-20"],
              ["dead", "1983-12-25"],
              ["qualification", "painter, sculptor, ceramicist"]
            ]
          ]
        ]
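The three JsonML rules listed above map naturally onto a short recursive function. The sketch below uses only the standard library; the function names are hypothetical, namespace prefixes are reduced to local names, and text values stay strings because no schema is involved.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rsplit('}', 1)[-1]


def jsonml(elem):
    """Element -> [name, {attributes}?, text or child arrays ...]."""
    node = [local_name(elem.tag)]
    if elem.attrib:
        node.append(dict(elem.attrib))  # attributes as a JSON object
    text = (elem.text or "").strip()
    if text:
        node.append(text)  # text node as a JSON string
    for child in elem:
        node.append(jsonml(child))  # child element as a nested array
        tail = (child.tail or "").strip()
        if tail:
            node.append(tail)  # text following the child (mixed content)
    return node


XML = '<object id="b1"><position>1</position><title/></object>'
result = jsonml(ET.fromstring(XML))
print(result)
```

Because element order and mixed content are kept in the arrays, this is the convention in the list that stays closest to the original document structure.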
  15. Columnar converter (xmlschema)
    XML source: the same col:collection document as in the xmltodict slide.
    JSON result:

        {
          "collection": {
            "object": [
              {
                "objectid": "b0836217462", "objectavailable": true,
                "position": 1, "title": "The Umbrellas", "year": "1886",
                "author": {
                  "authorid": "PAR", "name": "Pierre-Auguste Renoir",
                  "born": "1841-02-25", "dead": "1919-12-03",
                  "qualification": "painter"
                },
                "estimation": 10000.0
              },
              {
                "objectid": "b0836217463", "objectavailable": true,
                "position": 2, "title": null, "year": "1925",
                "author": {
                  "authorid": "JM", "name": "Joan Mir\u00f3",
                  "born": "1893-04-20", "dead": "1983-12-25",
                  "qualification": "painter, sculptor, ceramicist"
                }
              }
            ]
          }
        }

    ➔ Proposed by a user to produce JSON data for converting to Parquet format (Spark)
    ➔ Attributes renamed with tag-based prefix
  16. Conversion of big data files
    • Supported by lazy mode in xmlschema
    • Requires a custom JSONEncoder class

        import json
        from collections.abc import Iterator
        from typing import Any, List, Type

        from xmlschema import XMLSchemaValidationError


        def get_lazy_json_encoder(errors: List[XMLSchemaValidationError]) -> Type[json.JSONEncoder]:

            class JSONLazyEncoder(json.JSONEncoder):
                def default(self, obj: Any) -> Any:
                    if isinstance(obj, Iterator):
                        # Pull items until a decoded value shows up; validation
                        # errors met on the way are collected, not encoded.
                        while True:
                            result = next(obj, None)
                            if isinstance(result, XMLSchemaValidationError):
                                errors.append(result)
                            else:
                                return result
                    return json.JSONEncoder.default(self, obj)

            return JSONLazyEncoder
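The mechanism behind such an encoder is the `default()` hook of `json.JSONEncoder`, which is called only when the encoder meets an object it cannot serialize. The following stdlib-only sketch shows the hook on a plain generator; `IteratorEncoder` and `squares` are hypothetical names, and unlike the xmlschema encoder it simply turns the iterator into a list at the moment the encoder reaches it.

```python
import json
from collections.abc import Iterator


class IteratorEncoder(json.JSONEncoder):
    """Illustrative encoder: serializes any iterator as a JSON array."""

    def default(self, obj):
        if isinstance(obj, Iterator):
            # The iterator is consumed only now, when the encoder reaches it.
            return list(obj)
        return super().default(obj)  # raises TypeError for unknown types


def squares(n):
    """A generator standing in for lazily decoded data."""
    for i in range(n):
        yield i * i


text = json.dumps({"squares": squares(4)}, cls=IteratorEncoder)
print(text)
```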
  17. XPath 3.1 (2017)
    https://www.w3.org/TR/xpath/
    “this version of XPath supports JSON as well as XML, adding maps and arrays to the data model and supporting them with new expressions in the language and new functions”
    • maps
    • arrays
    • functions for JSON encode/decode
      – fn:parse-json
      – fn:json-doc
      – fn:json-to-xml
      – fn:xml-to-json
  18. XPath maps
    • Key/value mappings like Python dictionaries
    • Key must be an atomic value
    • Value can also be a node
    • Lookup with a function call, e.g.: $b("book")("title")
    • Ambiguity resolution
        ✗ map{a:b}
        ✔ map{a :b}
        ✔ map{a: b}
        ✔ map{a:b:c}
        ✔ map{a:*:c}
        ✔ map{*:b:c}

        map {
          "book": map {
            "title": "Data on the Web",
            "year": 2000,
            "author": [
              map { "last": "Abiteboul", "first": "Serge" },
              map { "last": "Buneman", "first": "Peter" },
              map { "last": "Suciu", "first": "Dan" }
            ],
            "publisher": "Morgan Kaufmann Publishers",
            "price": 39.95
          }
        }
  19. XPath arrays
    • An indexable sequence similar to a Python list
    • Two constructors:
      – Square array constructor:
        • [ 1, 2, 5, 7 ]
        • [ (), (27, 17, 0)]
      – Curly array constructor:
        • array { 1, 2, 5, 7 }
        • array { (), (27, 17, 0) }
    • Evaluation using a function call:
      – array { (), (27, 17, 0) }(1) evaluates to 27
      – [ [1, 2, 3], [4, 5, 6]](2)(2) evaluates to 5
      – [ 'a', 123, <name>Robert Johnson</name> ](3) evaluates to <name>Robert Johnson</name>
  20. fn:json-to-xml
    • XML Representation of JSON
      – Can represent any valid JSON in XML
      – Lossless conversion, except that information is lost if
        • duplicate keys appear within a JSON object
        • a number cannot be represented exactly as a double-precision floating point value
    • Uses an implicit XSD schema
    • Signatures

        fn:json-to-xml($json-text as xs:string?) as document-node()?
        fn:json-to-xml($json-text as xs:string?, $options as map(*)) as document-node()?
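Both sources of information loss can be reproduced with Python's own JSON decoder, which serves here only as an analogy: its handling of duplicates (the last value wins) is Python's behavior and not necessarily what an XPath processor does, and the variable names are illustrative.

```python
import json

seen_pairs = []


def record_pairs(pairs):
    """object_pairs_hook that remembers every key/value pair before building the dict."""
    seen_pairs.extend(pairs)
    return dict(pairs)


# 1) Duplicate keys: the parser sees both pairs, the mapping keeps only one.
decoded = json.loads('{"a": 1, "a": 2}', object_pairs_hook=record_pairs)
print(seen_pairs, decoded)

# 2) Double precision: integers above 2**53 are not all representable.
big = 2**53 + 1
as_double = float(big)
print(big, int(as_double))
```

The second case matters for identifiers or counters stored as large JSON integers: once they pass through a double they may come back as a neighboring value.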
  21. XML Representation of JSON
    JSON source:

        {
          "desc" : "Distances between several cities.",
          "updated" : "2014-02-04T18:50:45",
          "uptodate": true,
          "author" : null,
          "cities" : {
            "Brussels": [
              {"to": "Paris", "distance": 265},
              {"to": "Amsterdam", "distance": 173}
            ],
            "Amsterdam": [
              {"to": "Brussels", "distance": 173},
              {"to": "Paris", "distance": 431}
            ]
          }
        }

    XML representation:

        <map xmlns="http://www.w3.org/2005/xpath-functions">
          <string key='desc'>Distances between several cities.</string>
          <string key='updated'>2014-02-04T18:50:45</string>
          <boolean key="uptodate">true</boolean>
          <null key="author"/>
          <map key='cities'>
            <array key="Brussels">
              <map>
                <string key="to">Paris</string>
                <number key="distance">265</number>
              </map>
              <map>
                <string key="to">Amsterdam</string>
                <number key="distance">173</number>
              </map>
            </array>
            <array key="Amsterdam">
              <map>
                <string key="to">Brussels</string>
                <number key="distance">173</number>
              </map>
              <map>
                <string key="to">Paris</string>
                <number key="distance">431</number>
              </map>
            </array>
          </map>
        </map>
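The structure of this representation (one element per JSON value, with a `key` attribute for object members) can be sketched in Python with the standard library. This is a rough approximation of what `fn:json-to-xml` produces, not an implementation of the function: `to_element` and `json_to_xml` are hypothetical names, options are ignored, and numbers are written with Python's `str()`.

```python
import json
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"


def to_element(value, key=None):
    """Build the element for one decoded JSON value."""
    if isinstance(value, dict):
        elem = ET.Element(f"{{{FN_NS}}}map")
        for member_key, member in value.items():
            elem.append(to_element(member, member_key))
    elif isinstance(value, list):
        elem = ET.Element(f"{{{FN_NS}}}array")
        for item in value:
            elem.append(to_element(item))
    elif isinstance(value, bool):  # checked before numbers: bool subclasses int
        elem = ET.Element(f"{{{FN_NS}}}boolean")
        elem.text = "true" if value else "false"
    elif value is None:
        elem = ET.Element(f"{{{FN_NS}}}null")
    elif isinstance(value, (int, float)):
        elem = ET.Element(f"{{{FN_NS}}}number")
        elem.text = str(value)
    else:
        elem = ET.Element(f"{{{FN_NS}}}string")
        elem.text = str(value)
    if key is not None:
        elem.set("key", key)  # only members of a map carry a key
    return elem


def json_to_xml(json_text):
    return to_element(json.loads(json_text))


root = json_to_xml(
    '{"uptodate": true, "author": null, '
    '"cities": {"Brussels": [{"to": "Paris", "distance": 265}]}}'
)
print(ET.tostring(root, encoding="unicode"))
```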
  22. fn:xml-to-json
    • Converts an XML tree into a string conforming to the JSON grammar
      – The XML must conform to the XML representation of JSON
    • Signatures

        fn:xml-to-json($input as node()?) as xs:string?
        fn:xml-to-json($input as node()?, $options as map(*)) as xs:string?

    • Examples

        <array xmlns="http://www.w3.org/2005/xpath-functions">
          <number>1</number>
          <string>is</string>
          <boolean>1</boolean>
        </array>
        ➔ [1,"is",true]

        <map xmlns="http://www.w3.org/2005/xpath-functions">
          <number key="Sunday">1</number>
          <number key="Monday">2</number>
        </map>
        ➔ {"Sunday":1,"Monday":2}
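The reverse direction can be sketched in the same way. The code below walks an element tree in the `http://www.w3.org/2005/xpath-functions` namespace and serializes it with the `json` module; it is an approximation written for the two examples above (hypothetical function names, no options, no escaping attributes), not a conforming `fn:xml-to-json`.

```python
import json
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"


def from_element(elem):
    """Turn one element of the XML representation back into a Python value."""
    kind = elem.tag.replace(f"{{{FN_NS}}}", "")
    text = elem.text or ""
    if kind == "map":
        return {child.get("key"): from_element(child) for child in elem}
    if kind == "array":
        return [from_element(child) for child in elem]
    if kind == "string":
        return text
    if kind == "number":
        number = float(text)
        # Write whole numbers without a trailing ".0".
        return int(number) if number.is_integer() else number
    if kind == "boolean":
        return text.strip() in ("true", "1")  # the true lexical forms of xs:boolean
    if kind == "null":
        return None
    raise ValueError(f"unexpected element: {elem.tag}")


def xml_to_json(xml_text):
    value = from_element(ET.fromstring(xml_text))
    return json.dumps(value, separators=(",", ":"))


array_json = xml_to_json(
    '<array xmlns="http://www.w3.org/2005/xpath-functions">'
    "<number>1</number><string>is</string><boolean>1</boolean></array>"
)
map_json = xml_to_json(
    '<map xmlns="http://www.w3.org/2005/xpath-functions">'
    '<number key="Sunday">1</number><number key="Monday">2</number></map>'
)
print(array_json)
print(map_json)
```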
  23. fn:parse-json/fn:json-doc
    • A different approach: parses JSON data and returns a result composed of maps and arrays
      – json-doc is identical to parse-json but decodes JSON data from a resource reference
    • Signatures

        fn:parse-json($json-text as xs:string?) as item()?
        fn:parse-json($json-text as xs:string?, $options as map(*)) as item()?
        fn:json-doc($href as xs:string?) as item()?
        fn:json-doc($href as xs:string?, $options as map(*)) as item()?

    • Examples
      – parse-json('"abcd"') returns "abcd"
      – parse-json('{"x":1, "y":[3,4,5]}') returns map{"x":1e0,"y":[3e0,4e0,5e0]}
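In the second example every JSON number comes back as a double (`1e0`, `3e0`, ...). A rough Python analogy, assuming the standard `json` module and a hypothetical wrapper name, is to decode integers as floats through the `parse_int` hook:

```python
import json


def parse_json_like_xpath(json_text):
    """Decode JSON so that every number becomes a float, as in the XPath result."""
    return json.loads(json_text, parse_int=float)


scalar = parse_json_like_xpath('"abcd"')
result = parse_json_like_xpath('{"x":1, "y":[3,4,5]}')
print(scalar, result)
```

Python dicts and lists play the role of XPath maps and arrays here; the analogy does not cover the `$options` argument.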
  24. Beyond ...
    • Data conversion is a necessity
      – Machine-to-machine
      – Big data analysis
      – There is no joker card or silver bullet that makes everyone agree
    • xmlschema as decoder/encoder
      – Much in demand from the start
      – Several decoding options already added (filler, fill_missing, keep_unknown, process_skipped, max_depth, depth_filler, value_hook)
    • XPath 3.1 implementation in elementpath
      – A language inside a programming language (301 tokens, with 201 functions)
      – XPath nodes issue (7 types of nodes)
      – ElementTree problem with the namespace map