XML and Related Technologies - Lecture 7 - Web Technologies (1019888BNR)

Beat Signer
November 05, 2023

This lecture forms part of the course Web Technologies given at the Vrije Universiteit Brussel.


Transcript

  1. 2 December 2005 Web Technologies XML and Related Technologies

    Prof. Beat Signer Department of Computer Science Vrije Universiteit Brussel beatsigner.com
  2. Beat Signer - Department of Computer Science - [email protected] 2

    November 7, 2023 What is XML? ▪ Standardised text format for (semi-)structured information ▪ Meta markup language (Extensible Markup Language) ▪ tool for defining other markup languages - e.g. XHTML, WML, VoiceXML, SVG, Office Open XML (OOXML) ▪ Data surrounded by text markup that describes the data ▪ ordered labelled tree <note date="2023-11-07"> <to>Maxim Van de Wynckel</to> <from>Beat Signer</from> <content>Let us discuss exercise 7 this afternoon ...</content> </note>
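
The "ordered labelled tree" view of the note example can be made concrete with a short sketch. It assumes Python's standard `xml.etree.ElementTree` module, which is not part of the lecture; the helper name `describe` is purely illustrative.

```python
import xml.etree.ElementTree as ET

# The note document from the slide above.
NOTE = """<note date="2023-11-07">
  <to>Maxim Van de Wynckel</to>
  <from>Beat Signer</from>
  <content>Let us discuss exercise 7 this afternoon ...</content>
</note>"""


def describe(element, depth=0):
    """Return one line per element (label, attributes, text), indented by depth."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {element.attrib} {text!r}"]
    for child in element:  # children are kept in document order
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(NOTE)
print("\n".join(describe(root)))
```

The element names are the labels of the tree, and iterating over an element yields its children in document order, which is what makes the tree ordered.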
  3. Beat Signer - Department of Computer Science - [email protected] 3

    November 7, 2023 ... and What is it Not? ▪ XML is not a programming language ▪ however, it can be used to represent program instructions, configuration files etc. ▪ note that there is an XML application (XSLT) which is Turing complete ▪ XML is not a database ▪ XML is often used to store long-term data, but it lacks many database management system (DBMS) features ▪ many existing databases offer an XML import/export ▪ more recently there also exist native XML databases - e.g. BaseX or eXist
  4. Beat Signer - Department of Computer Science - [email protected] 4

    November 7, 2023 XML Example <?xml version="1.0"?> <publications> <publication type="inproceedings"> <title>Towards Cross-Media Information Spaces and Architectures</title> <author> <surname>Signer</surname> <forename>Beat</forename> </author> <howpublished>Proceedings of RCIS 2019</howpublished> <month>5</month> <year>2019</year> </publication> <publication type="article"> ... </publications>
  5. Beat Signer - Department of Computer Science - [email protected] 5

    November 7, 2023 Evolution of XML ▪ Descendant of Standard Generalized Markup Language (SGML) ▪ SGML is more powerful but (too) complex ▪ HTML is an SGML application ▪ XML was developed as an “SGML-Lite” version ▪ XML 1.0 published in February 1998 ▪ Since the initial XML release numerous associated standards have been published
  6. Beat Signer - Department of Computer Science - [email protected] 6

    November 7, 2023 Why has XML been so Successful? ▪ Simple ▪ General ▪ Accepted ▪ Many associated standards ▪ Many (freely) available tools
  7. Beat Signer - Department of Computer Science - [email protected] 7

    November 7, 2023 XML Specification ▪ Provides a grammar for XML documents in terms of ▪ placement of tags ▪ legal element names ▪ how attributes are attached to elements ▪ ... ▪ General tools ▪ parsers that can parse any XML document regardless of particular application tags ▪ editors (e.g. XMLSpy) and various programming APIs ▪ Specification available at https://www.w3.org/TR/xml/
  8. Beat Signer - Department of Computer Science - [email protected] 8

    November 7, 2023 XML Tree Document Structure ▪ An XML document tree can contain 7 types of nodes ▪ root node - always exactly one root node ▪ element nodes - element node with optional attribute nodes ▪ attribute nodes - name/value pairs ▪ text nodes - text belonging to an element or attribute ▪ comment nodes (<!-- ... -->) ▪ processing instruction nodes - pass information to a specific application via <? ... ?> ▪ namespace nodes - e.g. <publications xmlns:a="http://…" xmlns:b="http://...">
  9. Beat Signer - Department of Computer Science - [email protected] 9

    November 7, 2023 Well-Formedness and Validity ▪ An XML document is well-formed if it follows the rules of the XML specification ▪ correct nesting, only valid names, attribute values in quotes, … ▪ An XML document can be valid according to its Document Type Definition (DTD) or XML Schema ▪ completely self-describing about its structure and content through - the document content - auxiliary files referred to in the document ▪ validity can be checked by a validating XML parser - online validation service available at https://validator.w3.org <!ELEMENT publication (title, author+, howpublished?, month, year)> <!ELEMENT title (#PCDATA)> <!ELEMENT author (surname, forename)> <!ATTLIST publication type CDATA> ...
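
A minimal sketch of a well-formedness check, assuming Python's standard `xml.etree.ElementTree` parser. This parser is non-validating: it reports violations of the XML syntax rules but does not check a document against a DTD or XML Schema. The function name `is_well_formed` is illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>text</p>"))              # matching tags
print(is_well_formed("<Bold>text</BOLD>"))        # tag names differ in case
print(is_well_formed("<a><b></a></b>"))           # incorrect nesting
print(is_well_formed("<p class=intro>text</p>"))  # attribute value not quoted
```

Only the first document is accepted. Checking validity would additionally require a validating parser and a schema, which this sketch deliberately leaves out.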
  10. Beat Signer - Department of Computer Science - [email protected] 10

    November 7, 2023 Differences Between XML and HTML ▪ XML is a tool for specifying markup languages rather than a markup language itself ▪ specify “special markup languages for special applications” ▪ XML is not a presentation language ▪ defines content rather than presentation ▪ HTML mixes content, structure and presentation ▪ XML was designed to support a number of applications and not just web browsing ▪ XML documents should be well-formed and valid ▪ XML documents are easier to process by a program (parser)
  11. Beat Signer - Department of Computer Science - [email protected] 11

    November 7, 2023 Differences Between XML and HTML ... ▪ Readability is more important than conciseness ▪ e.g. <tablerow> rather than <tr> ▪ Matching of tags is case sensitive ▪ e.g. start tag <Bold> does not match end tag </BOLD> ▪ Markup requires matching start and end tags ▪ e.g. <p> and </p> ▪ exceptions are special non-enclosing tags e.g. <br/> or <image ... />
  12. Beat Signer - Department of Computer Science - [email protected] 12

    November 7, 2023 XHTML ▪ XHTML is a reformulation of HTML to make it an XML application ▪ we accept that HTML is here to stay ▪ improve HTML by using XML (with minimal effort) ▪ W3C stopped their work on XHTML (as discussed in lecture 4) <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Vrije Universiteit Brussel</title> </head> <body> ... </body> </html>
  13. Beat Signer - Department of Computer Science - [email protected] 13

    November 7, 2023 Differences Between XHTML and HTML ▪ Documents must be valid ▪ XHTML namespace must be declared in <html> element ▪ <head> and <body> elements cannot be omitted ▪ <title> element must be the first element in the <head> ▪ End tags are required for non-empty elements ▪ empty elements must consist of a start-tag and end-tag pair or an empty-element tag (e.g. <br/>) ▪ Element and attribute names must be in lowercase ▪ Attribute values must always be quoted ▪ Attribute names cannot be used without a value
  14. Beat Signer - Department of Computer Science - [email protected] 14

    November 7, 2023 XML Technologies XPointer XLink XPath XQuery XSLT
  15. Beat Signer - Department of Computer Science - [email protected] 15

    November 7, 2023 Overview of XML Technologies ▪ XPath and XPointer ▪ addressing of XML elements and parts of elements ▪ XSL (Extensible Stylesheet Language) ▪ transforming XML documents (XSLT and XSL-FO) ▪ XLink (XML Linking Language) ▪ linking in XML ▪ XQuery (XML Query Language) ▪ querying XML documents ▪ Document Type Definition (DTD) and XML Schema ▪ definition of schemas for XML documents ▪ DTDs have a limited expressive power ▪ XML Schema introduces datatypes, inheritance etc.
  16. Beat Signer - Department of Computer Science - [email protected] 16

    November 7, 2023 Overview of XML Technologies ... ▪ SAX (Simple API for XML) ▪ event-based programming API for reading XML documents ▪ DOM (Document Object Model) ▪ programming API to access and manipulate XML documents as tree structures
  17. Beat Signer - Department of Computer Science - [email protected] 17

    November 7, 2023 Document Object Model (DOM) ▪ Defines a language-neutral API for accessing and manipulating XML documents as a tree structure ▪ we have already seen the HTML DOM ▪ The entire document must be read and parsed before it can be used by a DOM application ▪ DOM parser not suited for large documents! ▪ Two different types of DOM Core interfaces for accessing supported content types ▪ generic node interface ▪ node type-specific interfaces (for each of the 7 node types) ▪ Various available DOM parsers ▪ e.g. JDOM parser specifically for Java
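
The two kinds of DOM interfaces can be sketched with Python's standard `xml.dom.minidom` module, used here as a stand-in for the Java parsers named on the slide. The comment inside the sample document is an assumption added for illustration.

```python
from xml.dom import Node, minidom

DOC = """<?xml version="1.0"?>
<publications>
  <!-- sample data -->
  <publication type="inproceedings">
    <title>An Architecture for Open Cross-Media Annotation Services</title>
    <year>2009</year>
  </publication>
</publications>"""

# The whole document is parsed into an in-memory tree before any access.
dom = minidom.parseString(DOC)
root = dom.documentElement

# Generic node interface: every node offers nodeType and childNodes.
kinds = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
}
child_kinds = [kinds.get(node.nodeType, "other") for node in root.childNodes]

# Type-specific interface: Element adds getElementsByTagName and getAttribute.
publication = root.getElementsByTagName("publication")[0]
title = publication.getElementsByTagName("title")[0].firstChild.data

print(child_kinds)
print(publication.getAttribute("type"), "-", title)
```

Note that the whitespace between elements shows up as text nodes, which is why `child_kinds` contains more entries than the two visible children.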
  18. Beat Signer - Department of Computer Science - [email protected] 18

    November 7, 2023 XPath ▪ Expression language to address elements of an XML document (used in XPointer, XSLT and XQuery) ▪ A location path is a sequence of location steps separated by a slash (/) ▪ various navigation axes such as child, parent, following etc. ▪ have a look at our XSLT/XPath reference document that is available on Canvas for all the details about XPath ▪ XPath expressions look similar to file pathnames /publications/publication /publications/publication[year>2008]/title //author[3] //title[@lang='eng']
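
The location paths above can be tried out with a small sketch. It assumes Python's `xml.etree.ElementTree`, which implements only a subset of XPath: paths are evaluated relative to an element, and predicates support equality tests and positions but not comparisons such as `year>2008`. The `lang` attributes in the sample data are an assumption added so that the attribute predicate has something to match.

```python
import xml.etree.ElementTree as ET

DOC = """<publications>
  <publication type="inproceedings">
    <title lang="eng">An Architecture for Open Cross-Media Annotation Services</title>
    <author><surname>Signer</surname><forename>Beat</forename></author>
    <author><surname>Norrie</surname><forename>Moira</forename></author>
    <year>2009</year>
  </publication>
  <publication type="inproceedings">
    <title lang="eng">Towards Cross-Media Information Spaces and Architectures</title>
    <author><surname>Signer</surname><forename>Beat</forename></author>
    <year>2019</year>
  </publication>
</publications>"""

root = ET.fromstring(DOC)  # root is the <publications> element

# /publications/publication, written relative to the root element
publications = root.findall("publication")

# //title[@lang='eng']: attribute predicate at any depth
english_titles = [t.text for t in root.findall(".//title[@lang='eng']")]

# //author[2]: each <author> that is the second author child of its parent
second_authors = [a.findtext("surname") for a in root.findall(".//author[2]")]

# /publications/publication[year>2008]/title: the comparison is done in
# Python because ElementTree predicates do not support the > operator
recent_titles = [
    p.findtext("title") for p in publications if int(p.findtext("year")) > 2008
]

print(len(publications), english_titles, second_authors, recent_titles)
```

A full XPath implementation would evaluate all four expressions directly, so the Python-side filtering is a limitation of this sketch rather than of XPath.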
  19. Beat Signer - Department of Computer Science - [email protected] 19

    November 7, 2023 XML Pointer Language (XPointer) ▪ Address points or ranges in an XML document ▪ Uses XPath expressions ▪ Introduces addressing relative to elements ▪ supports links to points without anchors URI#xpointer(publications/publication[1]) // relative to URI URI#xpointer(publications/publication[1]/range-to(publications/publication[3]/howpublished)) // range
  20. Beat Signer - Department of Computer Science - [email protected] 20

    November 7, 2023 XML Linking Language (XLink) ▪ Standard way for creating links in XML documents ▪ Fixes limitations of HTML links where ▪ anchors must be placed within documents ▪ only entire documents or predefined marks (#) can be linked ▪ only one-to-one unidirectional links are supported ▪ XLinks can be defined in separate documents ▪ third-party link (metadata) server ▪ Two types of links ▪ simple links - associate exactly one local and one remote resource (similar to HTML links) ▪ extended links - associate an arbitrary number of resources
  21. Beat Signer - Department of Computer Science - [email protected] 21

    November 7, 2023 XML Linking Language (XLink) ... ▪ other attributes ▪ xlink:show: new, replace, embed ▪ xlink:actuate: onLoad, onRequest <!-- Simple Link --> <book xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://...">Touching the Void</book> <!-- Extended Link --> <book xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended"> <author xlink:type="resource" xlink:label="a1" xlink:title="Click Here"/> <style xlink:type="locator" xlink:href="http://..." xlink:label="s1"/> <style xlink:type="locator" xlink:href="http://..." xlink:label="s2"/> <path xlink:type="arc" xlink:from="a1" xlink:to="s1"/> </book>
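
How the arc of the extended link connects labelled resources can be sketched as follows. The sketch assumes Python's `xml.etree.ElementTree`; it only reads the XLink attributes and is not an XLink processor. The elided `http://...` targets are kept exactly as on the slide, and the helper name `xl` is illustrative.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

DOC = """<book xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <author xlink:type="resource" xlink:label="a1" xlink:title="Click Here"/>
  <style xlink:type="locator" xlink:href="http://..." xlink:label="s1"/>
  <style xlink:type="locator" xlink:href="http://..." xlink:label="s2"/>
  <path xlink:type="arc" xlink:from="a1" xlink:to="s1"/>
</book>"""


def xl(name):
    """Build the namespace-qualified name ElementTree uses for xlink:* attributes."""
    return f"{{{XLINK}}}{name}"


book = ET.fromstring(DOC)
by_label = {}  # label -> participating local or remote resources
arcs = []      # (from label, to label) pairs

for child in book:
    kind = child.get(xl("type"))
    if kind in ("resource", "locator"):
        by_label.setdefault(child.get(xl("label")), []).append(child)
    elif kind == "arc":
        arcs.append((child.get(xl("from")), child.get(xl("to"))))

# Each arc defines a traversal from every resource labelled "from"
# to every resource labelled "to".
traversals = [
    (src.tag, dst.tag, dst.get(xl("href")))
    for start, end in arcs
    for src in by_label.get(start, [])
    for dst in by_label.get(end, [])
]
print(traversals)
```

The locator labelled `s2` takes part in the link but is not reachable, because no arc refers to it.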
  22. Beat Signer - Department of Computer Science - [email protected] 22

    November 7, 2023 XML Linking Language (XLink) ... ▪ Other XLink features ▪ linking parts of resources ▪ typed links ▪ The Annotea project uses XLink for managing external annotations ▪ for example used in the Amaya Web Browser ▪ Microsoft Edge web browser ▪ originally supported annotation of arbitrary webpages (functionality has been “temporarily” removed) Annotation in the Amaya Browser
  23. Beat Signer - Department of Computer Science - [email protected] 23

    November 7, 2023 Simple API for XML (SAX) ▪ Event-based API for XML document parsing ▪ many free SAX parsers available (e.g. Apache Xerces) ▪ Scans the document from start to end ▪ invokes callback methods ▪ Different kinds of events ▪ start of document ▪ end of document ▪ start tag of an element ▪ end tag of an element ▪ character data ▪ processing instruction ▪ SAX parser needs less memory than DOM parser ▪ DOM parser often uses SAX parser to build the DOM tree
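
The event kinds listed above map directly onto callback methods. A minimal sketch, assuming Python's standard `xml.sax` module rather than the Java parsers mentioned on the slide; the processing instruction `<?app hint?>` and the class name `EventLogger` are assumptions made for illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records every callback the parser invokes, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def endDocument(self):
        self.events.append(("endDocument",))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        if content.strip():  # ignore whitespace-only chunks
            self.events.append(("characters", content))

    def processingInstruction(self, target, data):
        self.events.append(("processingInstruction", target, data))


DOC = b'<?xml version="1.0"?><?app hint?><author><surname>Signer</surname></author>'

handler = EventLogger()
xml.sax.parseString(DOC, handler)
for event in handler.events:
    print(event)
```

The handler keeps only what it chooses to store, which is why a SAX-based application can process documents that would be too large to hold as a DOM tree.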
  24. Beat Signer - Department of Computer Science - [email protected] 24

    November 7, 2023 XML Transformations ▪ Developers want to be able to transform data from one format to another ▪ processing of XML documents - XML to XML transformation ▪ post-processing of documents - e.g. XML to XHTML, XML to WML, XML to PDF, ... ▪ The Extensible Stylesheet Language Transformations (XSLT) language can be used for that purpose
  25. Beat Signer - Department of Computer Science - [email protected] 25

    November 7, 2023 XSLT Processor ▪ The XSLT processor (e.g. Xalan) applies an XSLT stylesheet to an XML document and produces the corresponding output document [Figure: a DOM parser turns the input XML document (with DTD) into a source tree and the XSLT stylesheet (with DTD) into a stylesheet tree; the XSLT processor combines both into a result tree, which becomes the output document (XHTML, WML, ...)]
  26. Beat Signer - Department of Computer Science - [email protected] 26

    November 7, 2023 XSL Transformations (XSLT) ▪ Most important part of XSL ▪ uses XPath for the navigation ▪ XSLT is an expression-based language based on functional programming concepts ▪ XSLT uses ▪ pattern matching to select parts of documents ▪ templates to perform transformations ▪ Most web browsers support XSLT ▪ transformation can be done on the client side based on an XML document and an associated XSLT document
  27. Beat Signer - Department of Computer Science - [email protected] 27

    November 7, 2023 Example <?xml version="1.0"?> <publications> <publication type="inproceedings"> <title>An Architecture for Open Cross-Media Annotation Services</title> <author> <surname>Signer</surname> <forename>Beat</forename> </author> <author> <surname>Norrie</surname> <forename>Moira</forename> </author> <howpublished>Proceedings of WISE 2009</howpublished> <month>10</month> <year>2009</year> </publication> <publication type="article"> ... </publications>
  28. Beat Signer - Department of Computer Science - [email protected] 28

    November 7, 2023 XSLT Stylesheet <?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> ... <xsl:template match="author"> <p> <xsl:value-of select="surname"/> </p> </xsl:template> ... </xsl:stylesheet> Output: <?xml version="1.0" encoding="utf-8"?> <html> ... <p>Signer</p> <p>Norrie</p> ... </html>
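
To make the effect of the `author` template tangible, the following sketch mimics it in plain Python. This is not XSLT: Python's standard library has no XSLT processor, so the template is imitated by a hypothetical function `author_template` applied to every matching element, assuming `xml.etree.ElementTree` for parsing and serialisation.

```python
import xml.etree.ElementTree as ET

SOURCE = """<publications>
  <publication type="inproceedings">
    <title>An Architecture for Open Cross-Media Annotation Services</title>
    <author><surname>Signer</surname><forename>Beat</forename></author>
    <author><surname>Norrie</surname><forename>Moira</forename></author>
  </publication>
</publications>"""


def author_template(author):
    """Imitates <xsl:template match="author">: a <p> holding the surname."""
    paragraph = ET.Element("p")
    paragraph.text = author.findtext("surname")  # <xsl:value-of select="surname"/>
    return paragraph


source = ET.fromstring(SOURCE)
html = ET.Element("html")
for author in source.iter("author"):  # pattern matching: every <author> element
    html.append(author_template(author))

result = ET.tostring(html, encoding="unicode")
print(result)
```

A real XSLT processor additionally handles the elided templates, built-in rules and output serialisation, none of which this sketch attempts.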
  29. Beat Signer - Department of Computer Science - [email protected] 29

    November 7, 2023 Other XSLT Statements ▪ <xsl:for-each select="..."> ▪ select every XML element of a specified node-set ▪ <xsl:if test="..."> ▪ conditional test ▪ <xsl:sort select="..."/> ▪ sort the output ▪ ... ▪ Have a look at the XSLT/XPath reference document that is available on Canvas ▪ in exercise 7 you will have the chance to implement and execute different XSLT transformations
  30. Beat Signer - Department of Computer Science - [email protected] 30

    November 7, 2023 XML for Data Interchange ▪ Standard representation to exchange information between different systems ▪ General way to query data from different systems ▪ e.g. via the XML Query (XQuery) language ▪ Connect applications running on different operating systems and computers with different architectures ▪ XML Remote Procedure Call (XML-RPC) ▪ Simple Object Access Protocol (SOAP) which is a successor of XML-RPC and used for accessing Big Web Services - discussed later in the course
  31. Beat Signer - Department of Computer Science - [email protected] 31

    November 7, 2023 XML Remote Procedure Call (XML-RPC) ▪ XML-RPC specification released in April 1998 ▪ Advantages ▪ XML-based lingua franca understood by different applications ▪ HTTP as carrier protocol ▪ not tied to a single object model (as for example in CORBA) ▪ easy to implement (based on HTTP and XML standards) ▪ lightweight protocol ▪ built-in error handling ▪ Disadvantages ▪ slower than specialised protocols that are used in closed networks
  32. Beat Signer - Department of Computer Science - [email protected] 32

    November 7, 2023 XML-RPC Request and Response ▪ XML-RPC Request: POST /RPC2 HTTP/1.0 User-Agent: Java1.2 Host: macrae.vub.ac.be Content-Type: text/xml;charset=UTF-8 Content-length: 245 <?xml version="1.0" encoding="ISO-8859-1"?> <methodCall> <methodName>Math.multiply</methodName> <params> <param> <value><double>128.0</double></value> </param> <param> <value><double>256.0</double></value> </param> </params> </methodCall> ▪ XML-RPC Response: HTTP/1.1 200 OK Connection: close Content-Length: 159 Content-Type: text/xml Server: macbain.vub.ac.be <?xml version="1.0" encoding="ISO-8859-1"?> <methodResponse> <params> <param> <value><double>32768.0</double></value> </param> </params> </methodResponse>
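
The request and response bodies can be reproduced without any network traffic. A minimal sketch, assuming Python's standard `xmlrpc.client` module; the multiplication is performed locally to stand in for the remote `Math.multiply` method, which is not actually called.

```python
import xmlrpc.client

# Client side: marshal the call into a <methodCall> document.
request_body = xmlrpc.client.dumps((128.0, 256.0), methodname="Math.multiply")

# Server side: unmarshal the request, compute, marshal a <methodResponse>.
params, method = xmlrpc.client.loads(request_body)
product = params[0] * params[1]  # local stand-in for the remote method
response_body = xmlrpc.client.dumps((product,), methodresponse=True)

# Client side again: unmarshal the response.
result, _ = xmlrpc.client.loads(response_body)

print(request_body)
print(response_body)
print(method, params, result)
```

In a real deployment the request body would be sent in an HTTP POST, as the headers on the slide show; `xmlrpc.client.ServerProxy` wraps both steps.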
  33. Beat Signer - Department of Computer Science - [email protected] 33

    November 7, 2023 XML-RPC Error Message ▪ XML-RPC Response: HTTP/1.1 200 OK Connection: close Content-Length: 159 Content-Type: text/xml Server: macbain.vub.ac.be <?xml version="1.0" encoding="ISO-8859-1"?> <methodResponse> <fault> <value> <struct> <member> <name>faultCode</name> <value><int>873</int></value> </member> <member> <name>faultString</name> <value><string>Error message</string></value> </member> </struct> </value> </fault> </methodResponse>
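
Note that the fault is delivered with HTTP status 200, so the client has to detect it in the body. A minimal sketch of that round trip, assuming Python's standard `xmlrpc.client` module, which turns a `<fault>` response into a `Fault` exception:

```python
import xmlrpc.client

# Server side: marshal a fault with the code and message from the slide.
fault_body = xmlrpc.client.dumps(
    xmlrpc.client.Fault(873, "Error message"), methodresponse=True
)

# Client side: unmarshalling a <fault> response raises an exception.
code, message = None, None
try:
    xmlrpc.client.loads(fault_body)
except xmlrpc.client.Fault as fault:
    code, message = fault.faultCode, fault.faultString

print(fault_body)
print(code, message)
```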
  34. Beat Signer - Department of Computer Science - [email protected] 34

    November 7, 2023 XML-RPC Scalar Values (XML tag: type, corresponding Java type) ▪ <i4> or <int>: four-byte signed integer, Integer ▪ <boolean>: 0 or 1, Boolean ▪ <string>: ASCII string, String ▪ <double>: double-precision signed float, Double ▪ <dateTime.iso8601>: date/time, Date ▪ <base64>: base64-encoded binary, byte[]
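
The same scalar tags appear when values are marshalled from another language. The sketch below uses Python's standard `xmlrpc.client` instead of the Java types from the table; the sample values are assumptions chosen for illustration.

```python
import datetime
import xmlrpc.client

values = (
    42,                             # <int>
    True,                           # <boolean>
    "text",                         # <string>
    3.5,                            # <double>
    datetime.datetime(2023, 11, 7), # <dateTime.iso8601>
    b"bytes",                       # <base64>
)

body = xmlrpc.client.dumps(values, methodname="example.echo")  # hypothetical method name

# use_builtin_types=True maps dates and binary data back to datetime and bytes.
decoded, _ = xmlrpc.client.loads(body, use_builtin_types=True)

print(body)
print(decoded)
```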
  35. Beat Signer - Department of Computer Science - [email protected] 35

    November 7, 2023 XML-RPC Composed Values ▪ Complex data types can be represented by nested <struct> and <array> structures (XML tag: type, corresponding Java type) ▪ <struct>: a structure contains <member> elements and each member contains a <name> and a <value> element, Hashtable ▪ <array>: an array contains a single <data> element which can contain any number of <value> elements, Vector
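
Nesting of `<struct>` and `<array>` can be observed by marshalling a nested value. A sketch assuming Python's standard `xmlrpc.client`, where a `dict` plays the role of the Java `Hashtable` and a `list` that of the `Vector`; the record itself is made-up sample data.

```python
import xmlrpc.client

# Hypothetical record: a struct whose "authors" member is an array.
record = {"title": "XML in a Nutshell", "authors": ["Harold", "Means"]}

body = xmlrpc.client.dumps((record,), methodresponse=True)
decoded, _ = xmlrpc.client.loads(body)

print(body)
print(decoded[0])
```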
  36. Beat Signer - Department of Computer Science - [email protected] 36

    November 7, 2023 XML-RPC Example: GOMES ▪ Object-Oriented GUI for the Object Model Multi-User Extended Filesystem ▪ GOMES is implemented in Java and uses XML-RPC to communicate with the Object Model Multi-User Extended Filesystem (OMX-FS), which was implemented in the Oberon programming language [Figure: GOMES communicating with OMX-FS via XML-RPC]
  37. Beat Signer - Department of Computer Science - [email protected] 37

    November 7, 2023 Framework for Universal Client Access ▪ Generic database interface instead of developing a new interface from scratch for each new device type ▪ The presented eXtensible Information Management Architecture (XIMA) is based on ▪ OMS Java object database - managing the application data ▪ Java Servlet Technology ▪ generic XML database interface - separation of content and representation ▪ XSLT - appropriate XSLT stylesheet chosen based on User-Agent HTTP header field
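
The last point, choosing a stylesheet from the User-Agent header, can be illustrated with a small sketch. Everything in it is hypothetical: the marker strings, the stylesheet file names and the function `choose_stylesheet` are assumptions for illustration and do not describe how XIMA itself is implemented.

```python
# Hypothetical mapping from User-Agent markers to XSLT stylesheet files.
STYLESHEET_RULES = [
    ("VoiceXML", "vxml.xsl"),
    ("WAP", "wml.xsl"),
]
DEFAULT_STYLESHEET = "xhtml.xsl"  # hypothetical fallback for regular browsers


def choose_stylesheet(user_agent: str) -> str:
    """Return the stylesheet for the first marker found in the User-Agent."""
    agent = user_agent.lower()
    for marker, stylesheet in STYLESHEET_RULES:
        if marker.lower() in agent:
            return stylesheet
    return DEFAULT_STYLESHEET


print(choose_stylesheet("Mozilla/5.0"))
print(choose_stylesheet("ExampleWAPBrowser/1.0"))
```

The point of the design is that the XML produced by the database interface stays the same for every device; only the stylesheet applied to it changes.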
  38. Beat Signer - Department of Computer Science - [email protected] 38

    November 7, 2023 XIMA Architecture (figure labels) ▪ clients: HTML Browser, WML Browser, VXML Browser ▪ Main Entry Servlet with delegation to the HTML Servlet, WML Servlet and VXML Servlet (XML + XSLT → Response) ▪ XML Server (builds XML based on JDOM) ▪ OMS Java API and OMS Java Workspace ▪ OM Model: collections, associations, multiple inheritance and multiple instantiation
  39. Beat Signer - Department of Computer Science - [email protected] 39

    November 7, 2023 Generic XIMA Interfaces XHTML Interface WML Interface
  40. Beat Signer - Department of Computer Science - [email protected] 40

    November 7, 2023 Voice Interfaces ▪ Trend for ubiquitous information services ▪ small screens, keyboards etc. often clumsy to use ▪ Sometimes it is necessary to have hands-free interfaces ▪ e.g. while driving or operating a machine ▪ Alternative input modality for visually impaired users ▪ Voice interfaces can be accessed by a regular phone ▪ no new device is required ▪ no installation effort ▪ Improvements in speech recognition and text-to-speech synthesis make automatic voice interfaces more feasible ▪ e.g. for call centres
  41. Beat Signer - Department of Computer Science - [email protected] 41

    November 7, 2023 VoiceXML Architecture ▪ Speech Recogniser (speech model) - converts voice input into text ▪ Language Analyser (grammar) - extracts meaning from text ▪ Application Server (application database) - gets data (text) from database ▪ Speech Synthesiser (pronunciation rules) - generates speech output ▪ data flow: voice input (speech) → text → meaning → text → voice output (speech)
  42. Beat Signer - Department of Computer Science - [email protected] 42

    November 7, 2023 VoiceXML Architecture (for XIMA) XIMA Framework Apache Web Server Tomcat OMS Java Database Websphere Voice Server SDK BeVocal Voice Portal
  43. Beat Signer - Department of Computer Science - [email protected] 43

    November 7, 2023 Basic VoiceXML Concepts ▪ Dialogue ▪ conversational state in a form or menu ▪ form - interaction that collects values for field item variables ▪ menu - presents user with a choice of options - transition to next dialogue based on choice ▪ Input ▪ recognition of spoken input (or recording of spoken input) ▪ recognition of DTMF (dual-tone multi-frequency) input ▪ Output ▪ speech synthesis (TTS) ▪ recorded audio files
  44. Beat Signer - Department of Computer Science - [email protected] 44

    November 7, 2023 VoiceXML Form Example <?xml version="1.0" encoding="UTF-8"?> <vxml xmlns="http://www.w3.org/2001/vxml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd" version="2.0"> <form id="drinkForm"> <field name="drink"> <prompt>Would you like to order beer, wine, whisky, or nothing?</prompt> <grammar src="drinks.grxml" type="application/srgs+xml"/> </field> <block> <submit next="http://www.wise.vub.ac.be/drinks.php"/> </block> </form> </vxml>
  45. Beat Signer - Department of Computer Science - [email protected] 45

    November 7, 2023 associations collections objects The database contains #Collections and #Associations Would you like to go to the collections, to the associations, directly to an object or back to the main menu? The database contains the following # associations Choose an association Association 'name' contains #A Would you like to list the members or go back? Association 'name' contains the following # associations Choose a 'domaintype' or a 'rangetype' or say back Object 'oID' is dressed with type 'type' and currently viewed as type 'type'. It contains #Attr, #Links, and #Methods Choose a link or say back The object contains the following # attributes Would you like to hear the attributes, the links or the methods or go back? You can choose among the following links You can choose among the following methods You can view the object as the following types The database contains the following # collections Choose a collection Collection 'name' contains #M Would you like to list the members or go back? Collection 'name' contains the following # members Choose one of the members The database contains #Objects Choose an object or say back Choose a method or say back Choose one of the types or say back The result of the method is Result
  46. Beat Signer - Department of Computer Science - [email protected] 46

    November 7, 2023 Example: Avalanche Forecasting System Project to provide WAP and voice access
  47. Beat Signer - Department of Computer Science - [email protected] 47

    November 7, 2023 Other XML Applications ▪ Synchronized Multimedia Integration Language (SMIL) ▪ animations (timing, transitions etc.) ▪ Mathematical Markup Language (MathML) ▪ mathematical notations (content and structure) ▪ Scalable Vector Graphics (SVG) ▪ two-dimensional vector graphics (static or dynamic) ▪ Ink Markup Language (InkML) ▪ digital ink representation (e.g. from digital pen) ▪ Note that XML standards can also be combined ▪ e.g. XHTML+Voice Profile 1.0
  48. Beat Signer - Department of Computer Science - [email protected] 48

    November 7, 2023 Other XML Applications … ▪ Office Open XML (OOXML) ▪ file format (ZIP) for representing word processing documents, presentations etc. (e.g. *.docx, *.pptx and *.xlsx) - various XML files within these ZIP documents - specific markup languages for different domains (wordprocessingML, presentationML, spreadsheetML, …) <?xml version="1.0" encoding="UTF-8" standalone="yes" ?> <p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"> ... <a:p> <a:r><a:rPr lang="en-GB" dirty="0" smtClean="0" /> <a:t>Other XML</a:t> </a:r> <a:r><a:rPr lang="en-GB" dirty="0" smtClean="0" /> <a:t>Applications ...</a:t> </a:r> <a:endParaRPr lang="en-GB" dirty="0" /> </a:p> ... </p:sld> single slide from a pptx file
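
That an OOXML file is a ZIP archive of XML parts can be checked with a short sketch. It assumes Python's standard `zipfile` and `xml.etree.ElementTree` modules and builds a tiny in-memory archive containing a simplified version of the slide fragment above; the result is a stand-in for illustration, not a complete presentation that an office application could open.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

A = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Simplified stand-in for one slide part (elided parts of the original omitted).
SLIDE = """<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
       xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
  <a:p>
    <a:r><a:rPr lang="en-GB" dirty="0" smtClean="0" /><a:t>Other XML</a:t></a:r>
    <a:r><a:rPr lang="en-GB" dirty="0" smtClean="0" /><a:t>Applications ...</a:t></a:r>
    <a:endParaRPr lang="en-GB" dirty="0" />
  </a:p>
</p:sld>"""

# Write the part into an in-memory ZIP archive under the usual slide path.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("ppt/slides/slide1.xml", SLIDE)

# Read it back the way one would inspect a real *.pptx file.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    slide = ET.fromstring(archive.read("ppt/slides/slide1.xml"))

runs = [t.text for t in slide.iter(f"{{{A}}}t")]  # text of all <a:t> runs
print(names, runs)
```

Replacing the in-memory buffer with the path of an existing `*.pptx` file should list its XML parts in the same way.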
  49. Beat Signer - Department of Computer Science - [email protected] 49

    November 7, 2023 References ▪ Elliotte Rusty Harold and W. Scott Means, XML in a Nutshell, O'Reilly Media, September 2004 ▪ XML and XML Technology Tutorials ▪ https://www.w3schools.com/xml/ ▪ Masoud Kalali, Using XML in Java ▪ https://dzone.com/refcardz/using-xml-java ▪ VoiceXML Version 2.0 ▪ https://www.w3.org/TR/voicexml20/ ▪ XML-RPC Homepage ▪ http://www.xmlrpc.com
  50. Beat Signer - Department of Computer Science - [email protected] 50

    November 7, 2023 References ... ▪ B. Signer et al., Aural Interfaces to Databases Based on VoiceXML, Proceedings of VDB6, Brisbane, Australia, 2002 ▪ https://beatsigner.com/publications/signer_VDB6.pdf ▪ eXtensible Information Management Architecture (XIMA) ▪ https://beatsigner.com/xima.html