
Validating and decoding XML files with Python


The standards for describing schemas of XML documents, with a focus on XML Schema. A survey of the software available for XML validation and decoding. Development of a new Python library that decodes XML according to XML Schema definitions.

Davide Brunato

April 08, 2017


Transcript

  1. Agenda
    – Context
    – XML schemas
    – Available tools
    – Needs
    – Solutions
    – Conclusions
  2. What is SISSA?
    • An Italian and international scientific center of excellence
    • Located in Trieste
    • Founded in 1978
    • Research and teaching in 3 areas:
      – Physics
      – Mathematics
      – Neuroscience
    • About 260 PhD students
    • 12 active PhD programs
    • 2 Master's programs in:
      – Science Communication
      – High Performance Computing
  3. Materials design at the eXascale
    • Center of excellence funded by the European Union (H2020-EINFRA-2015-1, project ID: 676598)
    • Focused on a set of quantum simulation codes
  4. Quantum ESPRESSO
    • ESPRESSO stands for “opEn Source Package for Research in Electronic Structure, Simulation, and Optimization”
    • Integrated suite of atomic-scale simulation programs based on DFT, pseudopotentials and plane waves
    • Built around a set of simulation programs designed to interoperate, including with other software
    • Free open source software (GPL2 license)
    • Initiative coordinated by the Quantum ESPRESSO Foundation, with the participation of SISSA, EPFL, ICTP, CINECA, CNR, University of North Texas and many partners in Europe and worldwide
  5. Quantum ESPRESSO in numbers
    • 260000+ lines of Fortran/C code
    • 53 registered developers
    • 1000+ registered users
    • 4000+ downloads for each new release
    • 3 websites
    • 1 mailing list with 2500+ messages/year
    • 30 schools and workshops since 2002, with 1200+ participants
  6. Extensible Markup Language
    • A meta-language
    • A simplified form of SGML
      – SGML was designed for complete, long-lived encodings
    • Also meant to replace HTML ... which did not happen:
      – HTML 5
      – Syntax with associated semantics
    • Rich in manipulation languages:
      – XPath: selecting nodes of an XML document
      – XSLT: transforming XML documents
      – XQuery: querying XML documents
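As a rough illustration of XPath node selection, here is a minimal sketch using only Python's standard library. It assumes a small `vehicles` document similar to the one used in the later slides (the Fiat entry is made up for the example). Note that `xml.etree.ElementTree` implements only a subset of XPath 1.0.

```python
import xml.etree.ElementTree as ET

# Illustrative document; the data values are assumptions, not from the talk.
XML = """
<vehicles>
  <cars>
    <car make="Porsche" model="911"/>
    <car make="Fiat" model="500"/>
  </cars>
  <bikes>
    <bike make="Yamaha" model="XS650"/>
  </bikes>
</vehicles>
"""

root = ET.fromstring(XML)

# Child steps plus an attribute-value predicate.
porsche_models = [
    car.get("model") for car in root.findall("./cars/car[@make='Porsche']")
]

# Descendant step plus an attribute-existence predicate.
all_makes = [elem.get("make") for elem in root.findall(".//*[@make]")]

print(porsche_models)
print(all_makes)
```

For full XPath 1.0 support a library such as lxml would be needed.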
  7. XML schemas
    • Describe the structure of XML documents:
      – Elements and attributes that may appear
      – Order and repetition of elements
      – Types for elements and attributes
      – Predicates on content
      – Constraints on types, attributes, elements
    • Main languages for defining XML schemas:
      – Grammar-based (structure, form and syntax)
        • Document Type Definition (DTD)
        • XML Schema
        • Relax NG (pronounced “relaxing”)
      – Rule-based (relationships between data)
        • Schematron
  8. Document Type Definition
    • Definition language inherited from SGML
    • PROS:
      – Relatively simple and compact
      – Widely supported
    • CONS:
      – Very limited support for data types and constraints
      – No native namespace support
      – Does not use XML syntax

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE vehicles [
      <!ELEMENT vehicles (cars|bikes)*>
      <!ELEMENT cars (car)*>
      <!ELEMENT car (#PCDATA)>
      <!ATTLIST car make CDATA #REQUIRED model CDATA #REQUIRED>
      <!ELEMENT bikes (bike)*>
      <!ELEMENT bike (#PCDATA)>
      <!ATTLIST bike make CDATA #REQUIRED model CDATA #REQUIRED>
    ]>
    <vehicles>
      <cars>
        <car make="Porsche" model="911" />
        <car make="Porsche" model="911" />
      </cars>
      <bikes>
        <bike make="Harley-Davidson" model="WL" />
        <bike make="Yamaha" model="XS650" />
      </bikes>
    </vehicles>
  9. XML Schema
    • Developed by the W3C
      – v. 1.0 in 2001, v. 1.1 in 2012
      – Preceded by XDR, DCD, SOX, DDML ...
    • PROS:
      – Includes predefined data types
      – Support for defining new data types
      – Constraints on types and on the content model
      – Assertions (since v. 1.1)
      – Local and global definitions
    • CONS:
      – Fairly complex
      – Longer schemas

    <xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="vehicles">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="cars">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="car" maxOccurs="unbounded" minOccurs="0">
                    <xs:complexType>
                      <xs:simpleContent>
                        <xs:extension base="xs:string">
                          <xs:attribute type="xs:string" name="make" use="optional"/>
                          <xs:attribute type="xs:string" name="model" use="optional"/>
                        </xs:extension>
                      </xs:simpleContent>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
            <xs:element name="bikes">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="bike" maxOccurs="unbounded" minOccurs="0">
                    <xs:complexType>
                      <xs:simpleContent>
                        <xs:extension base="xs:string">
                          <xs:attribute type="xs:string" name="make" use="optional"/>
                          <xs:attribute type="xs:string" name="model" use="optional"/>
                        </xs:extension>
                      </xs:simpleContent>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  10. Relax NG
    • Developed by an OASIS technical committee
    • PROS:
      – Simpler than XML Schema
      – The schema of the document
      – Better support for unordered models
      – Also available in a compact non-XML format
    • CONS:
      – No predefined data types
      – Little support for defining types

    <grammar ns="" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
      <start>
        <element name="vehicles">
          <element name="cars">
            <oneOrMore>
              <element name="car">
                <attribute name="make"> <data type="NCName"/> </attribute>
                <attribute name="model"> <data type="integer"/> </attribute>
              </element>
            </oneOrMore>
          </element>
          <element name="bikes">
            <oneOrMore>
              <element name="bike">
                <attribute name="make"> <data type="NCName"/> </attribute>
                <attribute name="model"> <data type="NCName"/> </attribute>
              </element>
            </oneOrMore>
          </element>
        </element>
      </start>
    </grammar>
  11. Relax NG: compact form
    XML form:

    <grammar ns="" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
      <start>
        <element name="vehicles">
          <element name="cars">
            <oneOrMore>
              <element name="car">
                <attribute name="make"> <data type="NCName"/> </attribute>
                <attribute name="model"> <data type="integer"/> </attribute>
              </element>
            </oneOrMore>
          </element>
          <element name="bikes">
            <oneOrMore>
              <element name="bike">
                <attribute name="make"> <data type="NCName"/> </attribute>
                <attribute name="model"> <data type="NCName"/> </attribute>
              </element>
            </oneOrMore>
          </element>
        </element>
      </start>
    </grammar>

    Compact form:

    default namespace = ""
    start =
      element vehicles {
        element cars {
          element car {
            attribute make { xsd:NCName },
            attribute model { xsd:integer }
          }+
        },
        element bikes {
          element bike {
            attribute make { xsd:NCName },
            attribute model { xsd:NCName }
          }+
        }
      }
  12. Schematron
    • Based on XPath rules
      – ISO Schematron (2006, 2016)
      – An XML document is valid if it violates no rule

    <schema xmlns="http://purl.oclc.org/dsdl/schematron">
      <title>Vehicles</title>
      <pattern id="vehicles-rules">
        <rule context="vehicles">
          <assert test="cars">No list of cars!</assert>
          <assert test="bikes">No list of bikes!</assert>
        </rule>
        <rule context="vehicles/cars">
          <report test="car">You have a car!</report>
        </rule>
        <rule context="vehicles/bikes">
          <report test="bike">You have a bike!</report>
        </rule>
      </pattern>
    </schema>

    • PROS
      – Easy to learn and use
      – Expressive and flexible
    • CONS
      – No data model
      – No types and no defaults
      – Errors in the schema have a significant impact
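The rule-based idea can be sketched in a few lines of standard-library Python. This is only a rough analogue of the Schematron pattern above, not an implementation of ISO Schematron: the `RULES` table and the `check` function are hypothetical names, and the paths use the limited XPath subset of `xml.etree.ElementTree`, evaluated relative to the `vehicles` root.

```python
import xml.etree.ElementTree as ET

# Each rule: (context path, test path, message, kind).
# An "assert" fires when the test selects nothing;
# a "report" fires when the test selects something.
RULES = [
    (".", "cars", "No list of cars!", "assert"),
    (".", "bikes", "No list of bikes!", "assert"),
    ("cars", "car", "You have a car!", "report"),
    ("bikes", "bike", "You have a bike!", "report"),
]


def check(root):
    """Return the messages produced by applying RULES to the root element."""
    messages = []
    for context, test, message, kind in RULES:
        for node in root.findall(context):
            found = node.find(test) is not None
            if (kind == "assert" and not found) or (kind == "report" and found):
                messages.append(message)
    return messages


doc = ET.fromstring("<vehicles><cars><car/></cars></vehicles>")
print(check(doc))
```

A document with no failed asserts would count as valid under this scheme; reports are informational only.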
  13. XML Schema: elements
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:vh="http://example.com/vehicles" targetNamespace="http://example.com/vehicles" elementFormDefault="qualified">
      <xs:include schemaLocation="cars.xsd"/>
      <xs:include schemaLocation="bikes.xsd"/>
      <xs:element name="vehicles">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="vh:cars" />
            <xs:element ref="vh:bikes" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:attribute type="xs:positiveInteger" name="step"/>
    </xs:schema>

    all alternative annotation any anyAttribute appinfo assert assertion attribute attributeGroup choice complexContent complexType element enumeration extension field fractionDigits group import include key length list maxExclusive maxInclusive maxLength minExclusive minInclusive minLength notation pattern redefine schema selector sequence simpleContent simpleType totalDigits union unique whiteSpace
  14. XML Schema: structures
    • Simple types
    • Attributes
    • Attribute groups
    • Elements
    • (Model) groups
    • Complex types
  15. XML Schema: simple types
    • Apply to attributes and elements
    • Derivation by:
      – Restriction
      – List
      – Union

    <simpleType
      final = (#all | (list | union | restriction))
      id = ID
      name = NCName
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (restriction | list | union))
    </simpleType>

    <simpleType name="lowhighType">
      <restriction base="string">
        <enumeration value="low"/>
        <enumeration value="high"/>
      </restriction>
    </simpleType>

    <simpleType name="constr_parms_listType">
      <restriction>
        <simpleType>
          <list itemType="double"/>
        </simpleType>
        <length value="4"/>
      </restriction>
    </simpleType>
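To make the two derivations above concrete, the following sketch shows what decoding values of `lowhighType` (restriction by enumeration) and `constr_parms_listType` (a list of doubles restricted to length 4) could look like in plain Python. The function names are hypothetical, and Python's `float()` is only an approximation of the `xs:double` lexical space (it accepts some spellings that XSD does not).

```python
def decode_lowhigh(text):
    """Restriction by enumeration: only 'low' or 'high' are accepted."""
    if text not in ("low", "high"):
        raise ValueError("%r is not in the enumeration" % text)
    return text


def decode_constr_parms(text):
    """List derivation: whitespace-separated doubles, exactly four items."""
    values = [float(item) for item in text.split()]
    if len(values) != 4:
        raise ValueError("expected 4 items, got %d" % len(values))
    return values


print(decode_lowhigh("low"))
print(decode_constr_parms("1.0 2 3e1 -4.5"))
```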
  16. XML Schema: restrictions
    • Define constraints (facets)
    • May also be empty
    • 12 facets (14 in XSD 1.1)

    <restriction
      base = QName
      id = ID
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (simpleType?, (minExclusive | minInclusive | maxExclusive | maxInclusive | totalDigits | fractionDigits | length | minLength | maxLength | enumeration | whiteSpace | pattern)*))
    </restriction>

    <xs:simpleType name="farenheitWaterTemp">
      <xs:restriction base="xs:decimal">
        <xs:fractionDigits value="2"/>
        <xs:minExclusive value="0.00"/>
        <xs:maxExclusive value="100.00"/>
      </xs:restriction>
    </xs:simpleType>
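The three facets of `farenheitWaterTemp` can be checked by hand with the `decimal` module. This is a sketch under stated assumptions: `is_valid_water_temp` is a hypothetical helper, `fractionDigits` is applied to the normalized value (so trailing zeros do not count), and `Decimal` accepts exponent notation that `xs:decimal` would reject.

```python
from decimal import Decimal, InvalidOperation


def is_valid_water_temp(text):
    """Check fractionDigits=2, minExclusive=0.00 and maxExclusive=100.00."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return False
    if not value.is_finite():
        return False
    # fractionDigits: at most two significant digits after the decimal point.
    if -value.normalize().as_tuple().exponent > 2:
        return False
    # minExclusive / maxExclusive: both bounds are excluded.
    return Decimal("0.00") < value < Decimal("100.00")


for sample in ("37.50", "0.00", "100", "12.345", "abc"):
    print(sample, is_valid_water_temp(sample))
```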
  17. XML Schema: attributes
    • Attributes allowed in a complex type
      – Name or reference
      – Default value
      – Fixed value
      – Use

    <attribute
      default = string
      fixed = string
      form = (qualified | unqualified)
      id = ID
      name = NCName
      ref = QName
      type = QName
      use = (optional | prohibited | required) : optional
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (simpleType?))
    </attribute>

    <xs:attribute name="space">
      <xs:simpleType>
        <xs:restriction base="xs:NCName">
          <xs:enumeration value="default"/>
          <xs:enumeration value="preserve"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="base" type="xs:anyURI"/>
    <xs:attribute name="id" type="xs:ID"/>
  18. XML Schema: attribute groups
    • Attributes allowed in a complex type
    • Global definitions only
    • Do not take part in validation

    <attributeGroup
      id = ID
      name = NCName
      ref = QName
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, ((attribute | attributeGroup)*, anyAttribute?))
    </attributeGroup>

    <xs:attributeGroup name="i18n">
      <xs:attribute name="lang" type="LanguageCode"/>
      <xs:attribute ref="xml:lang"/>
      <xs:attribute name="dir">
        <xs:simpleType>
          <xs:restriction base="xs:token">
            <xs:enumeration value="ltr"/>
            <xs:enumeration value="rtl"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:attributeGroup>

    <xs:attributeGroup name="attrs">
      <xs:attributeGroup ref="coreattrs"/>
      <xs:attributeGroup ref="i18n"/>
      <xs:attributeGroup ref="events"/>
    </xs:attributeGroup>
  19. XML Schema: elements
    • Allowed elements
      – Simple or complex type
      – Name or reference
      – Default value
      – Fixed value
      – Occurrence

    <element
      abstract = boolean : false
      block = (#all | List of (extension | restriction | substitution))
      default = string
      final = (#all | List of (extension | restriction))
      fixed = string
      form = (qualified | unqualified)
      id = ID
      maxOccurs = (nonNegativeInteger | unbounded) : 1
      minOccurs = nonNegativeInteger : 1
      name = NCName
      nillable = boolean : false
      ref = QName
      substitutionGroup = QName
      type = QName
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, ((simpleType | complexType)?, (unique | key | keyref)*))
    </element>

    <xs:element name="PurchaseOrder" type="PurchaseOrderType"/>
    <xs:element name="gift">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="birthday" type="xs:date"/>
          <xs:element ref="PurchaseOrder"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
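The occurrence defaults in the syntax summary (`minOccurs` and `maxOccurs` both default to 1) matter whenever a schema is read programmatically. The sketch below, using only the standard library, extracts the occurrence bounds of local element declarations. The `occurs` helper and the sample schema (including the `item` element) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="gift">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="birthday" type="xs:date"/>
        <xs:element name="item" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def occurs(elem):
    """Return (minOccurs, maxOccurs); None stands for 'unbounded'."""
    min_occurs = int(elem.get("minOccurs", "1"))
    max_raw = elem.get("maxOccurs", "1")
    max_occurs = None if max_raw == "unbounded" else int(max_raw)
    return min_occurs, max_occurs


root = ET.fromstring(XSD)
occurrences = {
    child.get("name"): occurs(child)
    for sequence in root.iter(XSD_NS + "sequence")
    for child in sequence
    if child.tag == XSD_NS + "element"
}
print(occurrences)
```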
  20. XML Schema: groups
    • Global definitions only
    • Define the content model
      – Choices
      – Ordered sequences
      – Unordered sequences
      – Nesting of groups
      – Occurrence

    <group
      id = ID
      maxOccurs = (nonNegativeInteger | unbounded) : 1
      minOccurs = nonNegativeInteger : 1
      name = NCName
      ref = QName
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (all | choice | sequence)?)
    </group>

    <xs:group name="special.pre">
      <xs:choice>
        <xs:element ref="br"/>
        <xs:element ref="span"/>
        <xs:element ref="bdo"/>
        <xs:element ref="map"/>
      </xs:choice>
    </xs:group>

    <xs:group name="special">
      <xs:choice>
        <xs:group ref="special.pre"/>
        <xs:element ref="object"/>
        <xs:element ref="img"/>
      </xs:choice>
    </xs:group>
  21. XML Schema: complex types
    • Define types with attributes
    • Content type
      – Simple
      – Complex
    • Derivation:
      – Restriction
      – Extension

    <complexType
      abstract = boolean : false
      block = (#all | List of (extension | restriction))
      final = (#all | List of (extension | restriction))
      id = ID
      mixed = boolean : false
      name = NCName
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (simpleContent | complexContent | ((group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?))))
    </complexType>

    <xs:complexType name="PurchaseOrderType">
      <xs:sequence>
        <xs:element name="shipTo" type="USAddress"/>
        <xs:element name="billTo" type="USAddress"/>
        <xs:element ref="comment" minOccurs="0"/>
        <xs:element name="items" type="Items"/>
      </xs:sequence>
      <xs:attribute name="orderDate" type="xs:date"/>
    </xs:complexType>
  22. XML Schema: schema
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" targetNamespace="http://example.com/vehicles">
      <xs:include schemaLocation="cars.xsd"/>
      <xs:include schemaLocation="bikes.xsd"/>
      <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="xml.xsd"/>
      <xs:import namespace="http://www.w3.org/1999/xlink" schemaLocation="xlink.xsd"/>
      <!-- global declarations follow -->
    </xs:schema>
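The difference between the two mechanisms in this slide is that `xs:include` pulls in definitions for the same target namespace, while `xs:import` refers to a different namespace. A minimal standard-library sketch that lists a schema's dependencies with the namespace each one contributes to (the `schema_dependencies` helper is a hypothetical name):

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/vehicles">
  <xs:include schemaLocation="cars.xsd"/>
  <xs:include schemaLocation="bikes.xsd"/>
  <xs:import namespace="http://www.w3.org/1999/xlink" schemaLocation="xlink.xsd"/>
</xs:schema>"""


def schema_dependencies(source):
    """Return (namespace, schemaLocation) pairs for includes and imports."""
    root = ET.fromstring(source)
    target = root.get("targetNamespace", "")
    deps = []
    for child in root:
        if child.tag == "{%s}include" % XSD_NS:
            # Included schemas share the including schema's target namespace.
            deps.append((target, child.get("schemaLocation")))
        elif child.tag == "{%s}import" % XSD_NS:
            # Imported schemas declare their own namespace.
            deps.append((child.get("namespace", ""), child.get("schemaLocation")))
    return deps


for namespace, location in schema_dependencies(SCHEMA):
    print(namespace, location)
```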
  23. XML Schema: remarks
    • Well suited to structured data
    • Very expressive
    • Interoperability problems between implementations
      – XSD 1.1 still to be implemented
    • Complicated!?
      – If needed, start with the W3C Primer: http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/
      – Then a good book (there are few)
    • Some errors are also hard to interpret:
      – “When <simpleContent> is used, the base type must be a complexType whose content type is simple, or, only if restriction is specified, a complex type with mixed content and emptiable particle, or, only if extension is specified, a simple type.”
  24. Validating XML documents
    • Checks that an XML document satisfies both conditions:
      – it is well-formed
      – it is valid with respect to a given schema
    • Diagram: XML schema (.xsd) + well-formed XML instance (.xml) → XML Schema validator → XML instance is valid/invalid
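The first of the two conditions can be checked without any schema. A minimal sketch with the standard library (the `is_well_formed` helper is a hypothetical name): a document that parses is well-formed, which says nothing yet about its validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<A><B></B></A>"))  # parses
print(is_well_formed("<A><B></A>"))      # mismatched tags
```

Note that `<A><C/></A>` is also well-formed, even though a schema requiring a `B` child would reject it.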
  25. PSVI
    • Post-Schema-Validation Infoset
      – Concept introduced with XML Schema
      – Expresses an XML document as a data model
      – XML document as an object
    • The PSVI contains
      – Vocabulary (element and attribute names)
      – The content model
      – The data types
  26. XML Schema validators
    • libxml2
      – A standalone part of the GNOME framework
      – Written in C
      – Nearly complete XSD 1.0 implementation
      – xmllint (libxml2 tool)
      – lxml (Python bindings for libxml2)
    • Xerces
      – Supported by the Apache Foundation
      – Available for Java, C++, Perl
      – Fully implements XSD 1.0
      – Used in many software products (PyCharm)
  27. lxml
    • Python library for parsing XML and HTML documents
    • Compared with the standard library:
      – Extended ElementTree interface
      – Faster
      – Full XPath support
      – Validation (DTD, XML Schema, Relax NG)
    • PSVI?
      – Not available after validation
      – Not in the C API (file etreepublic.pxd)
    “lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.”
  28. Validating with lxml
    >>> import lxml.etree as etree
    >>> import io
    >>> f = io.StringIO("""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    ... <xs:element name="A" type="TipoA"/>
    ... <xs:complexType name="TipoA">
    ... <xs:sequence>
    ... <xs:element name="B" type="xs:string"/>
    ... </xs:sequence>
    ... </xs:complexType>
    ... </xs:schema>
    ... """)
    >>> xs_document = etree.parse(f)
    >>> xs = etree.XMLSchema(xs_document)
    >>> xml_document = etree.fromstring("<A><B></B></A>")
    >>> xs.validate(xml_document)
    True
    >>> xml_document = etree.fromstring("<A><C></C><D/></A>")
    >>> xs.validate(xml_document)
    False
    >>> xs.assertValid(xml_document)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "src/lxml/lxml.etree.pyx", line 3491, in lxml.etree._Validator.assertValid (src/lxml/lxml.etree.c:186663)
    lxml.etree.DocumentInvalid: Element 'C': This element is not expected. Expected is ( B )., line 1
    >>> xs.error_log
    <string>:1:0:ERROR:SCHEMASV:SCHEMAV_ELEMENT_CONTENT: Element 'C': This element is not expected. Expected is ( B ).
    >>>
  29. XML in QE
    • Output already in XML, written by an in-house library
    • Standardizing the use of XML:
      1. Fortran XML library (FoX)
      2. XSD schemas to describe input and output
    • XML Schema was chosen:
      – Predefined data types
      – Oriented to data structures
      – Already used by other software in the group

    &CONTROL
      calculation='relax'
      dipfield=.false.
      disk_io='low'
      dt=20.0
      etot_conv_thr=0.0001
      forc_conv_thr=0.001
      input_xml_schema_file='Al001_relax_bfgs.xml'
      iprint=100000
      max_seconds=10000000
      outdir='/tmp/espresso/tempdir'
      prefix='Al'
      pseudo_dir='/tmp/espresso/pseudo'
      restart_mode='from_scratch'
      title=''
      tprnfor=.false.
      tstress=.false.
      verbosity='high'
      wf_collect=.false.
    /
    &SYSTEM
      degauss=0.05
      ecutwfc=12.0
      force_symmorphic=.false.
      ibrav=0
      input_dft='PZ'
      lspinorb=.false.
      nat=7
      no_t_rev=.false.
      noinv=.false.
      noncolin=.false.
      nosym=.false.
      nosym_evc=.false.
      nspin=1
    /
  30. XML Schema in QE
    • One schema per application:
      – PW (2016)
      – Phonon (2016)
      – Neb (2017)
      – ...
    • How the schema is used:
      – Convert the XML input into the Fortran namelist format
      – Structure the XML output
      – Generate Fortran code for the output data
      – Validate the data produced ...
  31. PW input converter
    • Initial problem:
      – There is no one-to-one mapping between the XML input and the namelists
      – Some namelist parameters are a function of several XML parameters
      – Python is used for QE post-processing
    • Decomposing the problem:
      – Diagram (XML to Namelist translator): PW XML input → PW input file (namelists and cards)
      – Internally: XML + qes.xsd → XML Schema Decoder → Python dictionary → Namelist translator → Fortran namelist
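The last stage of the pipeline, from a Python dictionary to a Fortran namelist, can be sketched as follows. This is an illustration only, not the converter used in QE: `fortran_value` and `write_namelist` are hypothetical helpers, and keys are sorted simply to make the output deterministic.

```python
def fortran_value(value):
    """Format a Python scalar as a Fortran namelist literal."""
    # bool must be tested first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return ".true." if value else ".false."
    if isinstance(value, str):
        # Fortran escapes a single quote by doubling it.
        return "'%s'" % value.replace("'", "''")
    return repr(value)


def write_namelist(name, params):
    """Render one namelist group from a dictionary of parameters."""
    lines = ["&%s" % name.upper()]
    for key in sorted(params):
        lines.append("  %s=%s" % (key, fortran_value(params[key])))
    lines.append("/")
    return "\n".join(lines)


print(write_namelist("control", {"calculation": "relax", "dt": 20.0, "tprnfor": False}))
```

Parameters that depend on several XML values would need an extra mapping step before this one, which is the part that makes the translation non-trivial.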
  32. XML Schema decoder
    • Building a dictionary is simple
    • Converting the base types is not trivial
    • Derived types make it more complicated

    <complexType name="total_energyType">
      <sequence>
        <element type="double" name="etot"/>
        <element type="double" name="eband" minOccurs="0"/>
        <element type="double" name="ehart" minOccurs="0"/>
        <element type="double" name="vtxc" minOccurs="0"/>
        <element type="double" name="etxc" minOccurs="0"/>
        <element type="double" name="ewald" minOccurs="0"/>
        <element type="double" name="demet" minOccurs="0"/>
        <element type="double" name="efieldcorr" minOccurs="0"/>
        <element type="double" name="potentiostat_contr" minOccurs="0"/>
      </sequence>
    </complexType>

    <simpleType name="doubleListType">
      <list itemType="double"/>
    </simpleType>

    <complexType name="matrixType">
      <simpleContent>
        <extension base="qes:doubleListType" />
      </simpleContent>
    </complexType>
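A hand-written sketch shows why the dictionary is the easy part and the type conversion is where the schema is needed. The mappings below (`CONVERTERS`, `TOTAL_ENERGY_FIELDS`) are hypothetical and written by hand for a subset of `total_energyType`; a real decoder would derive them from the XSD. The numeric values in the sample document are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XSD built-in type names to Python converters.
CONVERTERS = {"double": float, "int": int, "string": str}

# Child name -> XSD type, a hand-written subset of total_energyType.
TOTAL_ENERGY_FIELDS = {"etot": "double", "eband": "double", "ehart": "double"}


def decode_total_energy(elem):
    """Build a dictionary of typed values from a total_energy element."""
    result = {}
    for child in elem:
        xsd_type = TOTAL_ENERGY_FIELDS[child.tag]
        result[child.tag] = CONVERTERS[xsd_type](child.text)
    return result


def decode_double_list(elem):
    """Decode a doubleListType value: whitespace-separated doubles."""
    return [float(item) for item in (elem.text or "").split()]


energy = ET.fromstring("<total_energy><etot>-1.5</etot><ehart>0.25</ehart></total_energy>")
matrix = ET.fromstring("<matrix>1.0 0.0\n0.0 1.0</matrix>")
print(decode_total_energy(energy))
print(decode_double_list(matrix))
```

Derived types such as `matrixType` (a complex type extending a list type) are where a hand-written table stops scaling and a schema-driven decoder pays off.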
  33. Beyond the XSD decoder
    • More QE schemas still to be defined
    • Changes to the existing schemas
    • Complexity of XSD parsing
    • Code also tested with other schemas
    • The lxml validator is of no use for decoding ...
    • Diagram: XML converter + PyCon 7 (jsonschema) → xmlschema
  34. xmlschema
    https://github.com/brunato/xmlschema

    >>> import xmlschema
    >>> import io
    >>> f = io.StringIO("""
    ... <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    ... <xs:element name="A" type="TipoA"/>
    ... <xs:complexType name="TipoA">
    ... <xs:sequence>
    ... <xs:element name="B" type="xs:string"/>
    ... </xs:sequence>
    ... </xs:complexType>
    ... </xs:schema>""")
    >>> xs = xmlschema.XMLSchema(f)
    >>> xs.is_valid("<A><B></B></A>")
    True
    >>> xs.is_valid("<A><C></C></A>")
    False
  35. xmlschema: validation error
    >>> xs.validate("<A><B></B></A>")
    >>> xs.validate("<A><C></C></A>")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.5/site-packages/xmlschema/schema.py", line 385, in validate
        raise error
    xmlschema.exceptions.XMLSchemaValidationError: failed validating 'C' with <XsdGroup 'None' at 0x7fc7f5d45d68>.

    Reason: element not in schema!

    Schema:
      <xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="B" type="xs:string" />
      </xs:sequence>

    Instance:
      <C />
  36. xmlschema: generating errors
    >>> [e for e in xs.iter_errors("<A><B></B></A>")]
    []
    >>> [e for e in xs.iter_errors("<A><C></C></A>")]
    [XMLSchemaValidationError(<XsdGroup 'None' at 0x7fc7f5d45d68>, 'C', 'element not in schema!', <Element 'C' at 0x7fc7f430ab88>, <Element '{http://www.w3.org/2001/XMLSchema}sequence' at 0x7fc7f430a408>), XMLSchemaValidationError(<XsdElement 'B' at 0x7fc7f5d45da0>, <Element 'A' at 0x7fc7f430aae8>, "tag 'B' expected.")]
    >>>
  37. xmlschema: invalid XSD
    >>> f = io.StringIO("""
    ... <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    ... <xs:element name="A" type="TipoAx"/>
    ... <xs:complexType name="TipoA">
    ... <xs:sequence>
    ... <xs:element name="B" type="xs:string"/>
    ... </xs:sequence>
    ... </xs:complexType>
    ... </xs:schema>""")
    >>> xs = xmlschema.XMLSchema(f)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.5/site-packages/xmlschema/schema.py", line 266, in __init__
        self.maps.build()
      File "/usr/lib/python3.5/site-packages/xmlschema/schema.py", line 152, in build
        build_xsd_elements(self.elements, XSD_ELEMENT_TAG, **kwargs)
      File "/usr/lib/python3.5/site-packages/xmlschema/xsdbase.py", line 403, in build_xsd_map
        raise errors[0] if errors else XMLSchemaParseError(message=str(err), elem=elem)
      File "/usr/lib/python3.5/site-packages/xmlschema/xsdbase.py", line 381, in build_xsd_map
        elem, schema, is_global=True, **kwargs
      File "/usr/lib/python3.5/site-packages/xmlschema/factories.py", line 60, in xsd_factory_wrapper
        result = factory_function(elem, schema, instance, **kwargs)
      File "/usr/lib/python3.5/site-packages/xmlschema/factories.py", line 733, in xsd_element_factory
        element_type = xsd_lookup(type_qname, schema.maps.types)
      File "/usr/lib/python3.5/site-packages/xmlschema/xsdbase.py", line 305, in xsd_lookup
        raise XMLSchemaLookupError("Missing XSD reference %r!" % qname)
    xmlschema.exceptions.XMLSchemaLookupError: Missing XSD reference 'TipoAx'!
  38. xmlschema: XSD validation
    >>> f = io.StringIO("""
    ... <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    ... <xs:element name="A" type="TipoA"/>
    ... <xs:complexType name="TipoA">
    ... <xs:sequence>
    ... <xs:element name="B" type="xs:string"/>
    ... </xs:sequence>
    ... </xs:complexType>
    ... <xs:simplexType name="lista_interi">
    ... <xs:list itemType="xs:int"/>
    ... </xs:simplexType>
    ... </xs:schema>""")
    >>> xs = xmlschema.XMLSchema(f, check_schema=True)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.5/site-packages/xmlschema/schema.py", line 261, in __init__
        self.check_schema(self.root)
      File "/usr/lib/python3.5/site-packages/xmlschema/schema.py", line 310, in check_schema
        raise error
    xmlschema.exceptions.XMLSchemaValidationError: failed validating '{http://www.w3.org/2001/XMLSchema}simplexType' with <XsdGroup 'None' at 0x7f562900bc88>.

    Reason: element not in schema!

    Schema:
      <xs:complexContent xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:extension base="xs:openAttrs">
        ...
      </xs:complexContent>

    Instance:
      <xs:simplexType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="lista_interi">
        <xs:list itemType="xs:int" />
      </xs:simplexType>
  39. xmlschema: PSVI
    >>> xs.types
    {'TipoA': <XsdComplexType 'TipoA' at 0x7fc7f5d45320>}
    >>> xs.elements
    {'A': <XsdElement 'A' at 0x7fc7f62e19b0>}
    >>> xs.namespaces
    {'': '', 'xs': 'http://www.w3.org/2001/XMLSchema', 'xml': 'http://www.w3.org/XML/1998/namespace'}
    >>> xs.target_namespace
    ''
    >>>
  40. xmlschema: decoding
    >>> import xmlschema
    >>> import io
    >>> f = io.StringIO("""
    ... <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    ... <xs:element name="A" type="TipoA"/>
    ... <xs:complexType name="TipoA">
    ... <xs:sequence>
    ... <xs:element name="B" type="xs:string"/>
    ... <xs:element name="C" type="xs:int"/>
    ... <xs:element name="D" type="xs:float"/>
    ... </xs:sequence>
    ... </xs:complexType>
    ... </xs:schema>""")
    >>> xs = xmlschema.XMLSchema(f)
    >>> xs.to_dict("<A><B>Nome</B><C>99</C><D>100.921</D></A>")
    {'D': 100.921, 'B': 'Nome', 'C': 99}
    >>> d = xs.to_dict("<A><B>Nome</B><C>99.9</C><D>100.921</D></A>")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/brunato/Development/projects/xmlschema/xmlschema/schema.py", line 410, in to_dict
        return self.maps.elements[xml_root.tag].decode(xml_root)
      File "/home/brunato/Development/projects/xmlschema/xmlschema/xsdbase.py", line 531, in decode
        raise obj
    xmlschema.exceptions.XMLSchemaDecodeError: cannot decode '99.9' using the type <class 'int'> of validator <XsdAtomicBuiltin '{http://www.w3.org/2001/XMLSchema}int' at 0x7fd245289240>.
  41. xmlschema: builtins
    • Some schemas are base schemas:
      – XML
      – XSD (1.0/1.1)
      – XSI (XML Schema Instance)
      – XLink
      – HFP (Has Facet and Property)
    • Built-in types are defined directly:
      – Decoding/encoding of simple types
      – Higher speed
      – Hooked into the XSD meta-schema
  42. xmlschema: status
    • First stable release: February 2017
    • Tested with 200+ schemas
    • Implements ~80% of XSD 1.0
    • Used by QE
    • To be adopted in other MaX software (AiiDA plugins)
  43. xmlschema: notes
    • Decoding and validation are tightly coupled
    • OOP to represent XSD structures
      – Interconnected factory functions (kwargs!)
      – Decorators
      – Closures
      – Type checking (in moderation …)
    • Python's standard library helps ...
    • TODO:
      – XSD 1.0 and 1.1 at 100%
      – XPath
      – Complete encoding