From XML to JSON and beyond (italian version)

JSON has often been described as an alternative to XML, or even as its replacement. That holds for several applications, but XML often persists thanks to its richer feature set and its quite different architectural scope. Conversion tools and mapping conventions are therefore crucial for effective data exchange between the two formats. Besides the many conventions already defined, XPath 3.1 also takes a step in this direction. So don't throw away old XML data: use it for what it suits best, and convert it to send chunks of JSON data to other systems or applications.

Note: the presentation is in Italian.

Davide Brunato

June 05, 2022

Transcript

  1. 3 XML and JSON

    • XML (eXtensible Markup Language, 1998)
      – Similar to HTML but with no predefined tags to use
      – It is a subset of SGML
      – Interoperable with both SGML and HTML
    • JSON (JavaScript Object Notation, 1999)
      – a lightweight data-interchange format
      – easy for people to read and write
      – easy for machines to parse and generate
  2. 4 JSON is not a replacement for XML

    • No standard way to represent metadata
    • No comments
    • Easier to process (easier to read?)
    • Safer at a low level:
      – Based on Unicode by default (no substitution entities needed)
      – No external entities
  3. 5 Is JSON replacing XML?

    [Chart: search interest (scale 0 to 100) for XML and JSON, 2010 to 2022. Data source: Google Trends (https://www.google.com/trends)]
  4. 6 The XML ecosystem

    • XML Schema
    • XPath/XQuery
    • XLink
    • XSLT
    • XPointer
    • XML-RPC
    • SOAP
  5. 7 The JSON ecosystem

    • JSON Schema (Draft 2020-12)
    • JSONPath
    • JSON Pointer
    • JSON-RPC
    • JSON data types

      JSON            Python
      object          dict
      array           list
      string          str
      number (int)    int
      number (real)   float
      true            True
      false           False
      null            None
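The type mapping in the table above can be checked with Python's standard `json` module. This is a minimal sketch; the sample document and its single-letter keys are made up for illustration.

```python
import json

# One made-up JSON document that exercises every JSON data type in the table.
text = '{"o": {}, "a": [], "s": "x", "i": 1, "r": 1.5, "t": true, "f": false, "n": null}'

decoded = json.loads(text)

# Map each key to the name of the Python type chosen by the decoder.
type_names = {key: type(value).__name__ for key, value in decoded.items()}
print(type_names)
```

Note that `true` and `false` decode to Python `bool` values and `null` decodes to `None`, whose type name is `NoneType`.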
  6. 8 Can XML and JSON be friends?

    [Image labeled "XML" and "JSON"]
    PNG image released under Creative Commons (CC BY-NC 4.0); author: Lydia Simmons; source: https://freepngimg.com/png/109077-story-toy-free-hd-image
  7. 9 New Python packages for XML (don't trust anyone who tells you "but the schema will never change again")

    • https://github.com/sissaschool/xmlschema
      – XML Schema validator and decoder (2016)
        ➔ lxml.etree.XMLSchema (XSD 1.0)
        ➔ Several alternatives exist for decoding, but they are not schema-based
    • https://github.com/sissaschool/elementpath
      – XPath processor (2018)
        ➔ lxml.etree supports XPath 1.0
  8. 10 XML to JSON (xmltodict)

    XML input:

    <?xml version="1.0" encoding="UTF-8"?>
    <col:collection xmlns:col="http://example.com/ns/collection">
      <object id="b0836217462" available="true">
        <position>1</position>
        <title>The Umbrellas</title>
        <year>1886</year>
        <author id="PAR">
          <name>Pierre-Auguste Renoir</name>
          <born>1841-02-25</born>
          <dead>1919-12-03</dead>
          <qualification>painter</qualification>
        </author>
        <estimation>10000.00</estimation>
      </object>
      <object id="b0836217463" available="true">
        <position>2</position>
        <title/>
        <year>1925</year>
        <author id="JM">
          <name>Joan Miró</name>
          <born>1893-04-20</born>
          <dead>1983-12-25</dead>
          <qualification>painter, sculptor, ceramicist</qualification>
        </author>
      </object>
    </col:collection>

    JSON output:

    {
      "http://example.com/ns/collection:collection": {
        "object": [
          {
            "@id": "b0836217462",
            "@available": "true",
            "@xmlns": {"col": "http://example.com/ns/collection"},
            "position": "1",
            "title": "The Umbrellas",
            "year": "1886",
            "author": {
              "@id": "PAR",
              "name": "Pierre-Auguste Renoir",
              "born": "1841-02-25",
              "dead": "1919-12-03",
              "qualification": "painter"
            },
            "estimation": "10000.00"
          },
          {
            "@id": "b0836217463",
            "@available": "true",
            "position": "2",
            "title": null,
            "year": "1925",
            "author": {
              "@id": "JM",
              "name": "Joan Mir\u00f3",
              "born": "1893-04-20",
              "dead": "1983-12-25",
              "qualification": "painter, sculptor, ceramicist"
            }
          }
        ]
      }
    }
  9. 11 Default converter (xmlschema)

    XML input: the same col:collection document shown on the xmltodict slide.

    JSON output:

    {
      "@xmlns:col": "http://example.com/ns/collection",
      "object": [
        {
          "@id": "b0836217462",
          "@available": true,
          "position": 1,
          "title": "The Umbrellas",
          "year": "1886",
          "author": {
            "@id": "PAR",
            "name": "Pierre-Auguste Renoir",
            "born": "1841-02-25",
            "dead": "1919-12-03",
            "qualification": "painter"
          },
          "estimation": 10000.0
        },
        {
          "@id": "b0836217463",
          "@available": true,
          "position": 2,
          "title": null,
          "year": "1925",
          "author": {
            "@id": "JM",
            "name": "Joan Mir\u00f3",
            "born": "1893-04-20",
            "dead": "1983-12-25",
            "qualification": "painter, sculptor, ceramicist"
          }
        }
      ]
    }

    • Many options are available to change the prefixes or to preserve the root element
  10. 12 Conversion conventions

    • Open311 website
      – A collaborative model and open standard for civic issue tracking
      – http://wiki.open311.org/JSON_and_XML_Conversion/
      – XML to JSON Conventions
        • Parker
        • BadgerFish
        • Abdera
        • JsonML (http://www.jsonml.org)
        • Others ... (Spark, GData, oData)
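To make the first convention in the list concrete, here is a minimal sketch of a Parker-style conversion written with the standard library only. It is not the xmlschema implementation: the function name `parker` and the reduced, namespace-free sample document are assumptions made for illustration, and because no schema is involved every value stays a string.

```python
import xml.etree.ElementTree as ET

def parker(elem):
    """Parker-style conversion: attributes and the root tag are dropped."""
    children = list(elem)
    if not children:
        # Leaf element: its text is the value, an empty element becomes None (JSON null).
        text = (elem.text or "").strip()
        return text or None
    result = {}
    for child in children:
        value = parker(child)
        if child.tag in result:
            # Repeated sibling tags are grouped into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

# Reduced sample input (hypothetical, without namespaces).
XML = """<collection>
  <object id="b0836217462"><position>1</position><title>The Umbrellas</title></object>
  <object id="b0836217463"><position>2</position><title/></object>
</collection>"""

data = parker(ET.fromstring(XML))
print(data)
```

A schema-aware decoder can go further and turn `"1"` into the number `1`, since it knows the declared type of each element.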
  11. 13 Parker convention

    XML input: the same col:collection document shown on the xmltodict slide.

    JSON output:

    {
      "object": [
        {
          "position": 1,
          "title": "The Umbrellas",
          "year": "1886",
          "author": {
            "name": "Pierre-Auguste Renoir",
            "born": "1841-02-25",
            "dead": "1919-12-03",
            "qualification": "painter"
          },
          "estimation": 10000.0
        },
        {
          "position": 2,
          "title": null,
          "year": "1925",
          "author": {
            "name": "Joan Mir\u00f3",
            "born": "1893-04-20",
            "dead": "1983-12-25",
            "qualification": "painter, sculptor, ceramicist"
          }
        }
      ]
    }
  12. 14 Abdera convention

    XML input: the same col:collection document shown on the xmltodict slide.

    JSON output:

    {
      "object": [
        {
          "attributes": {"id": "b0836217462", "available": true},
          "children": [{
            "position": 1,
            "title": "The Umbrellas",
            "year": "1886",
            "author": {
              "attributes": {"id": "PAR"},
              "children": [{
                "name": "Pierre-Auguste Renoir",
                "born": "1841-02-25",
                "dead": "1919-12-03",
                "qualification": "painter"
              }]
            },
            "estimation": 10000.0
          }]
        },
        {
          "attributes": {"id": "b0836217463", "available": true},
          "children": [{
            "position": 2,
            "title": [],
            "year": "1925",
            "author": {
              "attributes": {"id": "JM"},
              "children": [{
                "name": "Joan Mir\u00f3",
                "born": "1893-04-20",
                "dead": "1983-12-25",
                "qualification": "painter, sculptor, ceramicist"
              }]
            }
          }]
        }
      ]
    }

    ➔ Does not map namespaces
    ➔ Uses JSON objects for attributes and children
  13. 15 BadgerFish convention

    XML input: the same col:collection document shown on the xmltodict slide.

    JSON output:

    {
      "@xmlns": {"col": "http://example.com/ns/collection"},
      "col:collection": {
        "object": [
          {
            "@id": "b0836217462",
            "@available": true,
            "position": {"$": 1},
            "title": {"$": "The Umbrellas"},
            "year": {"$": "1886"},
            "author": {
              "@id": "PAR",
              "name": {"$": "Pierre-Auguste Renoir"},
              "born": {"$": "1841-02-25"},
              "dead": {"$": "1919-12-03"},
              "qualification": {"$": "painter"}
            },
            "estimation": {"$": 10000.0}
          },
          {
            "@id": "b0836217463",
            "@available": true,
            "position": {"$": 2},
            "title": {},
            "year": {"$": "1925"},
            "author": {
              "@id": "JM",
              "name": {"$": "Joan Mir\u00f3"},
              "born": {"$": "1893-04-20"},
              "dead": {"$": "1983-12-25"},
              "qualification": {"$": "painter, sculptor, ceramicist"}
            }
          }
        ]
      }
    }
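The BadgerFish rules visible in the output above (attributes under "@name" keys, text under "$", empty elements as empty objects) can be sketched with the standard library. This is an illustrative sketch, not the xmlschema converter: the function name `badgerfish` and the reduced, namespace-free input are assumptions, namespace declarations ("@xmlns") are not handled, and values stay strings because no schema is used.

```python
import xml.etree.ElementTree as ET

def badgerfish(elem):
    """BadgerFish-style conversion of one element (namespaces not handled)."""
    # Attributes become properties whose names start with "@".
    node = {"@" + name: value for name, value in elem.attrib.items()}
    # Text content goes into the "$" property.
    text = (elem.text or "").strip()
    if text:
        node["$"] = text
    for child in elem:
        value = badgerfish(child)
        if child.tag in node:
            # Repeated sibling tags are grouped into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node

# Reduced sample input (hypothetical, without namespaces).
XML = """<collection>
  <object id="b0836217463" available="true">
    <position>2</position>
    <title/>
  </object>
</collection>"""

root = ET.fromstring(XML)
data = {root.tag: badgerfish(root)}
print(data)
```

Unlike Parker, this convention keeps the root element and the attributes, at the cost of wrapping every text value in an object.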
  14. 16 JsonML convention

    • JSON arrays represent XML elements
    • JSON objects represent attributes
    • JSON strings represent text nodes

    XML input: the same col:collection document shown on the xmltodict slide.

    JSON output:

    [
      "col:collection",
      {"xmlns:col": "http://example.com/ns/collection"},
      [
        "object",
        {"id": "b0836217462", "available": true},
        ["position", 1],
        ["title", "The Umbrellas"],
        ["year", "1886"],
        ["author", {"id": "PAR"},
          ["name", "Pierre-Auguste Renoir"],
          ["born", "1841-02-25"],
          ["dead", "1919-12-03"],
          ["qualification", "painter"]
        ],
        ["estimation", 10000.0]
      ],
      [
        "object",
        {"id": "b0836217463", "available": true},
        ["position", 2],
        ["title"],
        ["year", "1925"],
        ["author", {"id": "JM"},
          ["name", "Joan Mir\u00f3"],
          ["born", "1893-04-20"],
          ["dead", "1983-12-25"],
          ["qualification", "painter, sculptor, ceramicist"]
        ]
      ]
    ]
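The three JsonML rules (arrays for elements, objects for attributes, strings for text) map naturally onto a short recursive function. The sketch below uses only the standard library; the function name `jsonml` and the reduced, namespace-free input are assumptions for illustration, and values stay strings because no schema is applied.

```python
import xml.etree.ElementTree as ET

def jsonml(elem):
    """JsonML-style conversion: [tag, {attributes}?, child-or-text, ...]."""
    node = [elem.tag]
    if elem.attrib:
        # Attributes, when present, form a single object right after the tag name.
        node.append(dict(elem.attrib))
    if elem.text and elem.text.strip():
        node.append(elem.text.strip())
    for child in elem:
        node.append(jsonml(child))
        # Text following a child element (its tail) is kept in document order.
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node

# Reduced sample input (hypothetical, without namespaces).
XML = (
    '<object id="b0836217463" available="true">'
    "<position>2</position><title/><year>1925</year>"
    "</object>"
)

data = jsonml(ET.fromstring(XML))
print(data)
```

Because element order and mixed content are preserved in arrays, this convention suits document-like XML better than the object-based ones.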
  15. 17 Columnar converter (xmlschema)

    XML input: the same col:collection document shown on the xmltodict slide.

    JSON output:

    {
      "collection": {
        "object": [
          {
            "objectid": "b0836217462",
            "objectavailable": true,
            "position": 1,
            "title": "The Umbrellas",
            "year": "1886",
            "author": {
              "authorid": "PAR",
              "name": "Pierre-Auguste Renoir",
              "born": "1841-02-25",
              "dead": "1919-12-03",
              "qualification": "painter"
            },
            "estimation": 10000.0
          },
          {
            "objectid": "b0836217463",
            "objectavailable": true,
            "position": 2,
            "title": null,
            "year": "1925",
            "author": {
              "authorid": "JM",
              "name": "Joan Mir\u00f3",
              "born": "1893-04-20",
              "dead": "1983-12-25",
              "qualification": "painter, sculptor, ceramicist"
            }
          }
        ]
      }
    }

    ➔ Developed by a user to convert JSON data to the Parquet format (Spark)
    ➔ Attributes are renamed with a prefix based on the element tag
  16. 18 Converting large files

    • Supported through the lazy mode of xmlschema
    • Requires a custom JSONEncoder class

        import json
        from collections.abc import Iterator
        from typing import Any, List, Type

        from xmlschema import XMLSchemaValidationError

        def get_lazy_json_encoder(errors: List[XMLSchemaValidationError]) -> Type[json.JSONEncoder]:
            class JSONLazyEncoder(json.JSONEncoder):
                def default(self, obj: Any) -> Any:
                    if isinstance(obj, Iterator):
                        # Consume the lazy decoder: collect validation errors
                        # and return the first item that is not an error.
                        while True:
                            result = next(obj, None)
                            if isinstance(result, XMLSchemaValidationError):
                                errors.append(result)
                            else:
                                return result
                    return json.JSONEncoder.default(self, obj)

            return JSONLazyEncoder
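How such an encoder behaves can be tried without xmlschema installed. The sketch below reproduces the same pattern with stand-ins: `FakeValidationError`, `make_encoder` and `fake_iter_decode` are hypothetical names invented for this example, and the generator only imitates a lazy decoder that yields validation errors before the decoded data.

```python
import json
from collections.abc import Iterator

class FakeValidationError(Exception):
    """Hypothetical stand-in for XMLSchemaValidationError."""

def make_encoder(errors):
    """Build an encoder class that drains iterators, collecting errors."""
    class LazyEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, Iterator):
                while True:
                    result = next(obj, None)
                    if isinstance(result, FakeValidationError):
                        errors.append(result)
                    else:
                        # First non-error item (or None when exhausted).
                        return result
            return super().default(obj)
    return LazyEncoder

def fake_iter_decode():
    """Hypothetical lazy decoder: yields an error, then the decoded data."""
    yield FakeValidationError("invalid value for 'year'")
    yield {"title": "The Umbrellas", "position": 1}

errors = []
text = json.dumps(fake_iter_decode(), cls=make_encoder(errors))
print(text)
print(len(errors))
```

The encoder's `default` hook is only called for objects the `json` module cannot serialize itself, which is why passing an iterator triggers the draining loop.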
  17. 20 XPath 3.1 (2017)

    https://www.w3.org/TR/xpath/

    "this version of XPath supports JSON as well as XML, adding maps and arrays to the data model and supporting them with new expressions in the language and new functions"

    • maps
    • arrays
    • functions for JSON encoding/decoding
      – fn:parse-json
      – fn:json-doc
      – fn:json-to-xml
      – fn:xml-to-json
  18. 21 XPath maps

    • Key/value maps, like Python dictionaries
    • The key must be an atomic value
    • The value can also be a node
    • Lookup with a function call, e.g. $b("book")("title")
    • Disambiguation:
        ✗ map{a:b}
        ✔ map{a :b}
        ✔ map{a: b}
        ✔ map{a:b:c}
        ✔ map{a:*:c}
        ✔ map{*:b:c}

    map {
      "book": map {
        "title": "Data on the Web",
        "year": 2000,
        "author": [
          map { "last": "Abiteboul", "first": "Serge" },
          map { "last": "Buneman", "first": "Peter" },
          map { "last": "Suciu", "first": "Dan" }
        ],
        "publisher": "Morgan Kaufmann Publishers",
        "price": 39.95
      }
    }
  19. 22 XPath arrays

    • An indexable sequence, like a Python list
    • Two constructors are available:
      – Square array constructor:
        • [ 1, 2, 5, 7 ]
        • [ (), (27, 17, 0)]
      – Curly array constructor:
        • array { 1, 2, 5, 7 }
        • array { (), (27, 17, 0) }
    • Evaluation with a function call:
      – array { (), (27, 17, 0) }(1) evaluates to 27
      – [ [1, 2, 3], [4, 5, 6]](2)(2) evaluates to 5
      – [ 'a', 123, <name>Robert Johnson</name> ](3) evaluates to <name>Robert Johnson</name>
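The difference between the two constructors, and the 1-based lookup, can be mimicked in Python. This is only an analogy under stated assumptions: the helper names `square_array`, `curly_array` and `lookup` are invented here, and a Python tuple stands in for an XPath sequence.

```python
def square_array(*members):
    """[ a, b ]: each operand becomes exactly one member, even if it is a sequence."""
    return list(members)

def curly_array(*operands):
    """array { a, b }: operands are concatenated into one sequence, one member per item."""
    members = []
    for operand in operands:
        if isinstance(operand, tuple):  # a tuple stands in for an XPath sequence
            members.extend(operand)
        else:
            members.append(operand)
    return members

def lookup(array, position):
    """XPath array lookup is 1-based: array(1) is the first member."""
    if not 1 <= position <= len(array):
        raise IndexError(f"array index {position} out of bounds")
    return array[position - 1]

square = square_array((), (27, 17, 0))  # two members: () and (27, 17, 0)
curly = curly_array((), (27, 17, 0))    # three members: 27, 17 and 0

print(len(square), len(curly))
print(lookup(curly, 1))
print(lookup(lookup(square_array([1, 2, 3], [4, 5, 6]), 2), 2))
```

This is why `array { (), (27, 17, 0) }(1)` evaluates to 27: the empty sequence contributes no member, so 27 is the first one.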
  20. 23 fn:json-to-xml

    • XML Representation of JSON
      – Represents any valid JSON in XML
      – Lossless conversion
      – Information is lost if:
        • duplicate keys appear within a JSON object
        • double-precision floating-point numbers are represented differently
    • Uses an implicit XSD schema
    • Signatures

        fn:json-to-xml($json-text as xs:string?) as document-node()?
        fn:json-to-xml($json-text as xs:string?, $options as map(*)) as document-node()?
  21. 24 XML Representation of JSON

    JSON input:

    {
      "desc" : "Distances between several cities.",
      "updated" : "2014-02-04T18:50:45",
      "uptodate": true,
      "author" : null,
      "cities" : {
        "Brussels": [
          {"to": "Paris", "distance": 265},
          {"to": "Amsterdam", "distance": 173}
        ],
        "Amsterdam": [
          {"to": "Brussels", "distance": 173},
          {"to": "Paris", "distance": 431}
        ]
      }
    }

    XML representation:

    <map xmlns="http://www.w3.org/2005/xpath-functions">
      <string key='desc'>Distances between several cities.</string>
      <string key='updated'>2014-02-04T18:50:45</string>
      <boolean key="uptodate">true</boolean>
      <null key="author"/>
      <map key='cities'>
        <array key="Brussels">
          <map>
            <string key="to">Paris</string>
            <number key="distance">265</number>
          </map>
          <map>
            <string key="to">Amsterdam</string>
            <number key="distance">173</number>
          </map>
        </array>
        <array key="Amsterdam">
          <map>
            <string key="to">Brussels</string>
            <number key="distance">173</number>
          </map>
          <map>
            <string key="to">Paris</string>
            <number key="distance">431</number>
          </map>
        </array>
      </map>
    </map>
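The mapping shown above (one element per JSON value, named after its type, with object keys carried in a `key` attribute) can be approximated in plain Python. This is a rough sketch of the idea, not an implementation of fn:json-to-xml: the names `to_element` and `json_to_xml` are invented for this example, and options, escaping rules and duplicate-key handling from the specification are ignored.

```python
import json
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"

def to_element(value, key=None):
    """Build the element that represents one decoded JSON value."""
    if isinstance(value, dict):
        elem = ET.Element(f"{{{FN_NS}}}map")
        for child_key, child_value in value.items():
            elem.append(to_element(child_value, child_key))
    elif isinstance(value, list):
        elem = ET.Element(f"{{{FN_NS}}}array")
        for item in value:
            elem.append(to_element(item))
    elif isinstance(value, bool):  # checked before numbers: bool is an int subclass
        elem = ET.Element(f"{{{FN_NS}}}boolean")
        elem.text = "true" if value else "false"
    elif value is None:
        elem = ET.Element(f"{{{FN_NS}}}null")
    elif isinstance(value, (int, float)):
        elem = ET.Element(f"{{{FN_NS}}}number")
        elem.text = repr(value)
    else:
        elem = ET.Element(f"{{{FN_NS}}}string")
        elem.text = str(value)
    if key is not None:
        # Members of a map carry their JSON key in the "key" attribute.
        elem.set("key", key)
    return elem

def json_to_xml(json_text):
    """Return the XML representation of a JSON text as a string."""
    ET.register_namespace("", FN_NS)  # serialize with a default namespace
    return ET.tostring(to_element(json.loads(json_text)), encoding="unicode")

out = json_to_xml('{"uptodate": true, "author": null, "distance": 265, "to": ["Paris"]}')
print(out)
```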
  22. 25 fn:xml-to-json

    • Converts an XML tree into a string that conforms to the JSON grammar
      – The XML tree must conform to the XML representation of JSON
    • Signatures

        fn:xml-to-json($input as node()?) as xs:string?
        fn:xml-to-json($input as node()?, $options as map(*)) as xs:string?

    • Examples

        <array xmlns="http://www.w3.org/2005/xpath-functions">
          <number>1</number>
          <string>is</string>
          <boolean>1</boolean>
        </array>
        ➔ [1,"is",true]

        <map xmlns="http://www.w3.org/2005/xpath-functions">
          <number key="Sunday">1</number>
          <number key="Monday">2</number>
        </map>
        ➔ {"Sunday":1,"Monday":2}
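The reverse direction can be sketched the same way. The code below is an approximation written for illustration, not an implementation of fn:xml-to-json: the names `from_element` and `xml_to_json` are invented, the input is assumed to already follow the XML representation of JSON, and no options or escaping attributes are handled.

```python
import json
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"

def from_element(elem):
    """Convert one element of the XML representation back to a Python value."""
    kind = elem.tag.replace(f"{{{FN_NS}}}", "")
    text = elem.text or ""
    if kind == "map":
        return {child.get("key"): from_element(child) for child in elem}
    if kind == "array":
        return [from_element(child) for child in elem]
    if kind == "string":
        return text
    if kind == "number":
        try:
            return int(text)
        except ValueError:
            return float(text)
    if kind == "boolean":
        # xs:boolean accepts both "true"/"false" and "1"/"0".
        return text.strip() in ("true", "1")
    if kind == "null":
        return None
    raise ValueError(f"unexpected element: {elem.tag}")

def xml_to_json(xml_text):
    """Serialize the XML representation of JSON as a compact JSON string."""
    return json.dumps(from_element(ET.fromstring(xml_text)), separators=(",", ":"))

array_xml = (
    '<array xmlns="http://www.w3.org/2005/xpath-functions">'
    "<number>1</number><string>is</string><boolean>1</boolean></array>"
)
map_xml = (
    '<map xmlns="http://www.w3.org/2005/xpath-functions">'
    '<number key="Sunday">1</number><number key="Monday">2</number></map>'
)
print(xml_to_json(array_xml))
print(xml_to_json(map_xml))
```

The two inputs are the examples from the slide, so the printed strings can be compared directly with the results given there.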
  23. 26 fn:parse-json/fn:json-doc

    • A different approach: parse a JSON text and return the result as maps and arrays
      – json-doc is identical to parse-json, but decodes the JSON starting from a reference to a resource
    • Signatures

        fn:parse-json($json-text as xs:string?) as item()?
        fn:parse-json($json-text as xs:string?, $options as map(*)) as item()?
        fn:json-doc($href as xs:string?) as item()?
        fn:json-doc($href as xs:string?, $options as map(*)) as item()?

    • Examples
      – parse-json('"abcd"') returns "abcd"
      – parse-json('{"x":1, "y":[3,4,5]}') returns map{"x":1e0,"y":[3e0,4e0,5e0]}
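In the second example above every JSON number comes back as a double (1e0, 3e0, ...). A similar effect can be obtained in Python by asking the standard `json` decoder to build floats for integer literals too. This is an analogy only; the helper name `parse_json_like_xpath` is made up for this sketch.

```python
import json

def parse_json_like_xpath(json_text):
    """Decode JSON so that every number becomes a float, mirroring xs:double."""
    return json.loads(json_text, parse_int=float)

result = parse_json_like_xpath('{"x":1, "y":[3,4,5]}')
print(result)
print(parse_json_like_xpath('"abcd"'))
```

JSON objects become Python dictionaries and JSON arrays become lists, which play the role of the XPath maps and arrays returned by fn:parse-json.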
  24. 27 Beyond ...

    • Data conversion is a necessity
      – Machine-to-machine
      – Big-data analysis
      – There is no wildcard or silver bullet that suits everyone
    • xmlschema as a decoder/encoder
      – In high demand from the start
      – Several options to control data decoding (filler, fill_missing, keep_unknown, process_skipped, max_depth, depth_filler, value_hook)
    • XPath 3.1 implementation in elementpath
      – A language within the language (301 tokens, with 201 functions)
      – The XPath nodes problem (7 node types)
      – The ElementTree problem and the namespace map