XML and Related Technologies - Lecture 7 - Web Technologies (1019888BNR)

Beat Signer
November 05, 2023

This lecture forms part of the course Web Technologies given at the Vrije Universiteit Brussel.

Transcript

  1. 2 December 2005
    Web Technologies
    XML and Related Technologies
    Prof. Beat Signer
    Department of Computer Science
    Vrije Universiteit Brussel
    beatsigner.com

  2. Beat Signer - Department of Computer Science - [email protected] 2
    November 7, 2023
    What is XML?
    ▪ Standardised text format for (semi-)structured
    information
    ▪ Meta markup language (Extensible Markup Language)
    ▪ tool for defining other markup languages
    - e.g. XHTML, WML, VoiceXML, SVG, Office Open XML (OOXML)
    ▪ Data surrounded by text markup that describes the data
    ▪ ordered labelled tree

    Maxim Van de Wynckel
    Beat Signer
    Let us discuss exercise 7 this afternoon ...

  3. ... and What is it Not?
    ▪ XML is not a programming language
    ▪ however, it can be used to represent program
    instructions, configuration files etc.
    ▪ note that there is an XML application (XSLT) which is
    Turing complete
    ▪ XML is not a database
    ▪ XML is often used to store long-term data, but it lacks many
    database management system (DBMS) features
    ▪ many existing databases offer an XML import/export
    ▪ more recently there also exist native XML databases
    - e.g. BaseX or eXist

  4. XML Example



    [XML listing of a publications document; element tags not preserved in this transcript]
    Towards Cross-Media Information Spaces and Architectures
    Signer
    Beat
    Proceedings of RCIS 2019
    5
    2019
    ...
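
A minimal sketch of how such a document forms an ordered labelled tree, using Python's standard `xml.etree.ElementTree`. The element names `publications`, `publication`, `title`, `author`, `howpublished` and `year` appear in the XPath and XPointer examples later in this lecture; `surname`, `forename` and `month` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Sketch document: surname, forename and month are assumed element names.
DOC = """<?xml version="1.0"?>
<publications>
  <publication>
    <title>Towards Cross-Media Information Spaces and Architectures</title>
    <author>
      <surname>Signer</surname>
      <forename>Beat</forename>
    </author>
    <howpublished>Proceedings of RCIS 2019</howpublished>
    <month>5</month>
    <year>2019</year>
  </publication>
</publications>"""

root = ET.fromstring(DOC)


def walk(element, depth=0):
    """Return (depth, tag, text) triples in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(walk(child, depth + 1))
    return rows


tree_rows = walk(root)
for depth, tag, text in tree_rows:
    # Indentation mirrors the nesting depth of each element.
    print("  " * depth + tag + (": " + text if text else ""))
```

The indentation of the printed outline mirrors the element nesting, which is the tree structure that the later slides on DOM and XPath operate on.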

  5. Evolution of XML
    ▪ Descendant of Standard Generalized Markup
    Language (SGML)
    ▪ SGML is more powerful but (too) complex
    ▪ HTML is an SGML application
    ▪ XML was developed as an “SGML-Lite” version
    ▪ XML 1.0 published in February 1998
    ▪ Since the initial XML release numerous associated
    standards have been published

  6. Why has XML been so Successful?
    ▪ Simple
    ▪ General
    ▪ Accepted
    ▪ Many associated standards
    ▪ Many (freely) available tools

  7. XML Specification
    ▪ Provides a grammar for XML documents in terms of
    ▪ placement of tags
    ▪ legal element names
    ▪ how attributes are attached to elements
    ▪ ...
    ▪ General tools
    ▪ parsers that can parse any XML document regardless of
    particular application tags
    ▪ editors (e.g. XMLSpy) and various programming APIs
    ▪ Specification available at https://www.w3.org/TR/xml/

  8. XML Tree Document Structure
    ▪ An XML document tree can contain 7 types of nodes
    ▪ root node
    - always exactly one root node
    ▪ element nodes
    - element node with optional attribute nodes
    ▪ attribute nodes
    - name/value pairs
    ▪ text nodes
    - text belonging to an element or attribute
    ▪ comment nodes (<!-- ... -->)
    ▪ processing instruction nodes
    - pass information to a specific application via <? ... ?>
    ▪ namespace nodes
    - e.g. xmlns:prefix="URI"

  9. Well-Formedness and Validity
    ▪ An XML document is well-formed if it follows
    the rules of the XML specification
    ▪ correct nesting, only valid names, attributes in quotes, …
    ▪ An XML document can be valid according to its
    Document Type Definition (DTD) or XML Schema
    ▪ completely self-describing about its structure and content through
    - the document content
    - auxiliary files referred to in the document
    ▪ validity can be checked by a validating XML parser
    - online validation service available at https://validator.w3.org
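
A small sketch of a well-formedness check, assuming Python's standard `xml.etree.ElementTree`. Its parser is non-validating, so it only tests the well-formedness rules listed above and never checks a document against a DTD or XML Schema. The sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "correct nesting": "<a><b>text</b></a>",
    "overlapping elements": "<a><b>text</a></b>",
    "unquoted attribute": "<a id=1/>",
    "two root elements": "<a/><b/>",
}

results = {name: is_well_formed(xml) for name, xml in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'not well-formed'}")
```

Checking validity would additionally require a validating parser together with the document's DTD or schema.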




  10. Differences Between XML and HTML
    ▪ XML is a tool for specifying markup languages rather
    than a markup language itself
    ▪ specify “special markup languages for special applications”
    ▪ XML is not a presentation language
    ▪ defines content rather than presentation
    ▪ HTML mixes content, structure and presentation
    ▪ XML was designed to support a number of applications
    and not just web browsing
    ▪ XML documents should be well-formed and valid
    ▪ XML documents are easier to process by a program (parser)

  11. Differences Between XML and HTML ...
    ▪ Readability is more important than conciseness
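    ▪ e.g. a descriptive element name rather than a terse abbreviation
    ▪ Matching of tags is case sensitive
    ▪ e.g. a start tag does not match an end tag whose name differs only in case
    ▪ Markup requires matching start and end tags
    ▪ e.g. <name> and </name>
    ▪ exceptions are special non-enclosing (empty-element) tags,
    e.g. <br /> in XHTML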
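
The two rules above can be tried out with a short sketch based on Python's standard `xml.etree.ElementTree`. The element names `Title`, `doc` and `br` are chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Tag matching is case sensitive: <Title> is not closed by </title>.
same_case = parses("<Title>XML</Title>")
mixed_case = parses("<Title>XML</title>")

# An empty-element tag and an empty start/end tag pair give the same element.
short_form = ET.fromstring("<doc><br/></doc>")[0]
long_form = ET.fromstring("<doc><br></br></doc>")[0]
equivalent = (
    short_form.tag == long_form.tag
    and (short_form.text or "") == (long_form.text or "")
    and len(short_form) == len(long_form) == 0
)

print(same_case, mixed_case, equivalent)
```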

  12. XHTML
    ▪ XHTML is a reformulation of HTML to make
    it an XML application
    ▪ we accept that HTML is here to stay
    ▪ improve HTML by using XML (with minimal effort)
    ▪ W3C stopped their work on XHTML (as discussed in lecture 4)
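    <!DOCTYPE html PUBLIC
    "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title>Vrije Universiteit Brussel</title>
    </head>
    <body>
    ...
    </body>
    </html>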
  13. Differences Between XHTML and HTML
    ▪ Documents must be valid
    ▪ XHTML namespace must be declared in the <html> element
    ▪ <head> and <body> elements cannot be omitted
    ▪ <title> element must be the first element in the <head>
    ▪ End tags are required for non-empty elements
    ▪ empty elements must consist of a start-tag and end-tag pair or an
    empty-element tag (e.g. <br />)
    ▪ Element and attribute names must be in lowercase
    ▪ Attribute values must always be quoted
    ▪ Attribute names cannot be used without a value

  14. XML Technologies
    XPointer
    XLink
    XPath
    XQuery
    XSLT

  15. Overview of XML Technologies
    ▪ XPath and XPointer
    ▪ addressing of XML elements and parts of elements
    ▪ XSL (Extensible Stylesheet Language)
    ▪ transforming XML documents (XSLT and XSL-FO)
    ▪ XLink (XML Linking Language)
    ▪ linking in XML
    ▪ XQuery (XML Query Language)
    ▪ querying XML documents
    ▪ Document Type Definition (DTD) and XML Schema
    ▪ definition of schemas for XML documents
    ▪ DTDs have a limited expressive power
    ▪ XML Schema introduces datatypes, inheritance etc.

  16. Overview of XML Technologies ...
    ▪ SAX (Simple API for XML)
    ▪ event-based programming API for reading XML documents
    ▪ DOM (Document Object Model)
    ▪ programming API to access and manipulate XML documents as
    tree structures

  17. Document Object Model (DOM)
    ▪ Defines a language neutral API for accessing and
    manipulating XML documents as a tree structure
    ▪ have already seen the HTML DOM model
    ▪ The entire document must be read and parsed before it
    can be used by a DOM application
    ▪ DOM parser not suited for large documents!
    ▪ Two different types of DOM Core interfaces for
    accessing supported content types
    ▪ generic node interface
    ▪ node type-specific interfaces (for each of the 7 node types)
    ▪ Various available DOM parsers
    ▪ e.g. JDOM parser specifically for Java
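
A minimal sketch of DOM-style access and manipulation, using Python's standard `xml.dom.minidom` rather than the JDOM parser named above. The sample document and the added `year` element are assumptions made for illustration. The generic node interface (`nodeType`, `childNodes`) is used for the traversal, while type-specific calls such as `getAttribute` are only available on element nodes.

```python
from xml.dom import Node, minidom

# The whole document is parsed into an in-memory tree before it can be used.
doc = minidom.parseString(
    '<publications><!-- sample --><publication lang="eng">'
    "<title>XML</title></publication></publications>"
)

NODE_TYPE_NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
}


def describe(node, depth=0):
    """Outline the tree via the generic node interface."""
    kind = NODE_TYPE_NAMES.get(node.nodeType, "other")
    label = getattr(node, "data", node.nodeName)
    lines = ["  " * depth + f"{kind}: {label}"]
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines


outline = describe(doc.documentElement)
print("\n".join(outline))

# Manipulation: append a new <year> element to the first publication.
publication = doc.getElementsByTagName("publication")[0]
year = doc.createElement("year")
year.appendChild(doc.createTextNode("2023"))
publication.appendChild(year)
print(doc.documentElement.toxml())
```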

  18. XPath
    ▪ Expression language to address elements of an XML
    document (used in XPointer, XSLT and XQuery)
    ▪ A location path is a sequence of location steps separated
    by a slash (/)
    ▪ various navigation axes such as child, parent, following etc.
    ▪ have a look at our XSLT/XPath reference document that is
    available on Canvas for all the details about XPath
    ▪ XPath expressions look similar to file pathnames
    /publications/publication
    /publications/publication[year>2008]/title
    //author[3]
    //title[@lang='eng']
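
The expressions above can be approximated with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. In this sketch paths are written relative to the root element, and the numeric comparison `year>2008` is done in Python because the subset has no comparison operators. The sample data, the `surname` element and the use of `author[2]` instead of `author[3]` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<publications>
  <publication>
    <title lang="eng">Towards Cross-Media Information Spaces and Architectures</title>
    <author><surname>Signer</surname></author>
    <year>2019</year>
  </publication>
  <publication>
    <title lang="eng">An Architecture for Open Cross-Media Annotation Services</title>
    <author><surname>Signer</surname></author>
    <author><surname>Norrie</surname></author>
    <year>2009</year>
  </publication>
</publications>"""

root = ET.fromstring(DOC)  # root is the <publications> element

# /publications/publication
publications = root.findall("publication")


def titles_after(year):
    """Emulate /publications/publication[year>N]/title."""
    return [
        p.findtext("title")
        for p in root.findall("publication")
        if int(p.findtext("year")) > year
    ]


# //author[2]: every author that is the second author child of its parent
second_authors = [a.findtext("surname") for a in root.findall(".//author[2]")]

# //title[@lang='eng']
english_titles = root.findall(".//title[@lang='eng']")

print(len(publications), titles_after(2008), second_authors, len(english_titles))
```

A full XPath implementation would evaluate the original expressions directly, including the navigation axes mentioned above.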

  19. XML Pointer Language (XPointer)
    ▪ Address points or ranges in an XML document
    ▪ Uses XPath expressions
    ▪ Introduces addressing relative to elements
    ▪ supports links to points without anchors
    URI#xpointer(publications/publication[1]) // relative to URI
    URI#xpointer(publications/publication[1]/range-to(publications/publication[3]/howpublished)) // range

  20. XML Linking Language (XLink)
    ▪ Standard way for creating links in XML documents
    ▪ Fixes limitations of HTML links where
    ▪ anchors must be placed within documents
    ▪ only entire documents or predefined marks (#) can be linked
    ▪ only one-to-one unidirectional links are supported
    ▪ XLinks can be defined in separate documents
    ▪ third-party link (metadata) server
    ▪ Two types of links
    ▪ simple links
    - associate exactly one local and one remote resource (similar to HTML links)
    ▪ extended links
    - associate an arbitrary number of resources

  21. XML Linking Language (XLink) ...
    ▪ other attributes
    ▪ xlink:show: new, replace, embed
    ▪ xlink:actuate: onLoad, onRequest

    <... xlink:type="simple" xlink:href="http://...">Touching the Void</...>

    <... xlink:type="extended">
    ...
    </...>
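
A minimal sketch of how an application might pick up simple XLinks, using Python's standard `xml.etree.ElementTree`. XLink attributes live in the namespace `http://www.w3.org/1999/xlink`; the element names `books` and `book` and the target URL are hypothetical and not taken from the slide.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # XLink namespace URI

# Hypothetical document: element names and the href value are illustrative.
DOC = f"""<books xmlns:xlink="{XLINK}">
  <book xlink:type="simple" xlink:href="http://example.org/touching-the-void"
        xlink:show="new" xlink:actuate="onRequest">Touching the Void</book>
</books>"""

root = ET.fromstring(DOC)

# ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
links = [
    {
        "text": element.text,
        "href": element.get(f"{{{XLINK}}}href"),
        "show": element.get(f"{{{XLINK}}}show"),
    }
    for element in root.iter()
    if element.get(f"{{{XLINK}}}type") == "simple"
]

print(links)
```

Any element can act as a link this way, since the linking behaviour is carried by the attributes rather than by a fixed element name.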





  22. XML Linking Language (XLink) ...
    ▪ Other XLink features
    ▪ linking parts of resources
    ▪ typed links
    ▪ The Annotea project
    uses XLink for managing
    external annotations
    ▪ for example used in the
    Amaya Web Browser
    ▪ Microsoft Edge web browser
    ▪ originally supported annotation of arbitrary webpages
    (functionality has been “temporarily” removed)
    Annotation in the Amaya Browser

  23. Simple API for XML (SAX)
    ▪ Event-based API for XML document parsing
    ▪ many free SAX parsers available (e.g. Apache Xerces)
    ▪ Scans the document from start to end
    ▪ invokes callback methods
    ▪ Different kinds of events
    ▪ start of document
    ▪ end of document
    ▪ start tag of an element
    ▪ end tag of an element
    ▪ character data
    ▪ processing instruction
    ▪ SAX parser needs less memory than DOM parser
    ▪ DOM parser often uses SAX parser to build the DOM tree
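
The event kinds listed above can be observed with a short sketch based on Python's standard `xml.sax` module (used here instead of Apache Xerces). The handler class name, the processing instruction target `render` and the sample document are made up for illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each callback that the parser invokes while scanning."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("start of document")

    def endDocument(self):
        self.events.append("end of document")

    def startElement(self, name, attrs):
        self.events.append(f"start tag: {name}")

    def endElement(self, name):
        self.events.append(f"end tag: {name}")

    def characters(self, content):
        # Whitespace-only chunks are skipped to keep the log short.
        if content.strip():
            self.events.append(f"character data: {content.strip()}")

    def processingInstruction(self, target, data):
        self.events.append(f"processing instruction: {target}")


handler = EventLogger()
xml.sax.parseString(
    b'<?xml version="1.0"?><?render mode="fast"?><title>XML</title>', handler
)
for event in handler.events:
    print(event)
```

No tree is built at any point: the handler sees one event at a time, which is why the memory footprint stays small.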

  24. XML Transformations
    ▪ Developers want to be able to transform data from one
    format to another
    ▪ processing of XML documents
    - XML to XML transformation
    ▪ post-processing of documents
    - e.g. XML to XHTML, XML to WML, XML to PDF, ...
    ▪ The Extensible Stylesheet Language Transformations
    (XSLT) language can be used for that purpose

  25. XSLT Processor
    ▪ The XSLT processor (e.g. Xalan) applies an XSLT stylesheet to an
    XML document and produces the corresponding output document
    [Figure: an input XML document (with DTD) and an XSLT stylesheet (with DTD) are parsed by a DOM parser into a source tree and a stylesheet tree; the XSLT processor produces a result tree, which forms the output document (XHTML, WML, ...)]

  26. XSL Transformations (XSLT)
    ▪ Most important part of XSL
    ▪ uses XPath for the navigation
    ▪ XSLT is an expression-based language based on
    functional programming concepts
    ▪ XSLT uses
    ▪ pattern matching to select parts of documents
    ▪ templates to perform transformations
    ▪ Most web browsers support XSLT
    ▪ transformation can be done on the client side based on an XML
    document and an associated XSLT document
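
Python's standard library does not include an XSLT processor, so the following is not XSLT. It is a hand-written stand-in that mimics the idea of matching source elements against patterns and applying a template to each match, in order to turn XML into XHTML-like output. The sample data, the `surname` element and all function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SOURCE = """<publications>
  <publication>
    <title>An Architecture for Open Cross-Media Annotation Services</title>
    <author><surname>Signer</surname></author>
    <author><surname>Norrie</surname></author>
    <year>2009</year>
  </publication>
</publications>"""


def author_template(author):
    """Template for <author>: emit a list item with the surname."""
    item = ET.Element("li")
    item.text = author.findtext("surname")
    return item


def publication_template(publication):
    """Template for <publication>: a heading followed by an author list."""
    block = ET.Element("div")
    heading = ET.SubElement(block, "h2")
    heading.text = publication.findtext("title")
    authors = ET.SubElement(block, "ul")
    for author in publication.findall("author"):
        authors.append(author_template(author))
    return block


# Pattern (element name) mapped to the template that handles it.
TEMPLATES = {"publication": publication_template}


def transform(source_root):
    """Apply the matching template to every element of the source tree."""
    body = ET.Element("body")
    for node in source_root.iter():
        template = TEMPLATES.get(node.tag)
        if template is not None:
            body.append(template(node))
    return body


result = ET.tostring(transform(ET.fromstring(SOURCE)), encoding="unicode")
print(result)
```

In real XSLT the patterns are XPath expressions and the templates are declared in the stylesheet, so the transformation can be changed without touching program code.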

  27. Example



    [XML listing of a publications document; element tags not preserved in this transcript]
    An Architecture for Open Cross-Media Annotation Services
    Signer
    Beat
    Norrie
    Moira
    Proceedings of WISE 2009
    10
    2009
    ...

  28. XSLT Stylesheet


    [XSLT stylesheet listing; markup not preserved in this transcript]
    output:
    ...
    Signer
    Norrie
    ...

  29. Other XSLT Statements

    ▪ <xsl:for-each>: select every XML element of a specified node-set
    ▪ <xsl:if>: conditional test
    ▪ <xsl:sort>: sort the output
    ▪ ...
    ▪ Have a look at the XSLT/XPath reference document that
    is available on Canvas
    ▪ in exercise 7 you will have the chance to implement and execute
    different XSLT transformations

  30. XML for Data Interchange
    ▪ Standard representation to exchange information
    between different systems
    ▪ General way to query data from different systems
    ▪ e.g. via the XML Query (XQuery) language
    ▪ Connect applications running on different operating
    systems and computers with different architectures
    ▪ XML Remote Procedure Call (XML-RPC)
    ▪ Simple Object Access Protocol (SOAP) which is a successor
    of XML-RPC and used for accessing Big Web Services
    - discussed later in the course

  31. XML Remote Procedure Call (XML-RPC)
    ▪ XML-RPC specification released in April 1998
    ▪ Advantages
    ▪ XML-based lingua franca understood by different applications
    ▪ HTTP as carrier protocol
    ▪ not tied to a single object model (as for example in CORBA)
    ▪ easy to implement (based on HTTP and XML standards)
    ▪ lightweight protocol
    ▪ built-in error handling
    ▪ Disadvantages
    ▪ slower than specialised protocols that are used in closed
    networks

  32. XML-RPC Request and Response
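    XML-RPC Request
    POST /RPC2 HTTP/1.0
    User-Agent: Java1.2
    Host: macrae.vub.ac.be
    Content-Type: text/xml;charset=UTF-8
    Content-length: 245

    <methodCall>
      <methodName>Math.multiply</methodName>
      <params>
        <param>
          <value><double>128.0</double></value>
        </param>
        <param>
          <value><double>256.0</double></value>
        </param>
      </params>
    </methodCall>

    XML-RPC Response
    HTTP/1.1 200 OK
    Connection: close
    Content-Length: 159
    Content-Type: text/xml
    Server: macbain.vub.ac.be

    <methodResponse>
      <params>
        <param>
          <value><double>32768.0</double></value>
        </param>
      </params>
    </methodResponse>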
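
A minimal sketch of the same exchange, using the marshalling functions of Python's standard `xmlrpc.client` module without any network access. The method name `Math.multiply` and the operands come from the slide; computing the product locally stands in for the remote server.

```python
import xmlrpc.client

# Client side: marshal the call into an XML-RPC request body.
request_body = xmlrpc.client.dumps((128.0, 256.0), methodname="Math.multiply")
print(request_body)

# Server side (simulated locally): unmarshal, compute, marshal the response.
params, method = xmlrpc.client.loads(request_body)
response_body = xmlrpc.client.dumps((params[0] * params[1],), methodresponse=True)
print(response_body)

# Client side: unmarshal the response body.
(result,), _ = xmlrpc.client.loads(response_body)
print(method, params, result)
```

In a real deployment the request body would be sent in an HTTP POST, for example via `xmlrpc.client.ServerProxy`, which performs these steps internally.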

  33. XML-RPC Error Message
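    XML-RPC Response
    HTTP/1.1 200 OK
    Connection: close
    Content-Length: 159
    Content-Type: text/xml
    Server: macbain.vub.ac.be

    <methodResponse>
      <fault>
        <value>
          <struct>
            <member>
              <name>faultCode</name>
              <value><int>873</int></value>
            </member>
            <member>
              <name>faultString</name>
              <value><string>Error message</string></value>
            </member>
          </struct>
        </value>
      </fault>
    </methodResponse>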
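
A short sketch of the built-in error handling, again using Python's standard `xmlrpc.client` without a network connection. The fault code 873 and the fault string are the values shown on the slide.

```python
import xmlrpc.client

# Server side: a Fault is marshalled into a <fault> response body.
fault_body = xmlrpc.client.dumps(
    xmlrpc.client.Fault(873, "Error message"), methodresponse=True
)
print(fault_body)

# Client side: unmarshalling a fault response raises the Fault again.
try:
    xmlrpc.client.loads(fault_body)
    outcome = None
except xmlrpc.client.Fault as fault:
    outcome = (fault.faultCode, fault.faultString)

print(outcome)
```

Note that the HTTP status is still 200 OK, so the error is signalled purely inside the XML payload.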

  34. XML-RPC Scalar Values
    XML-Tag            | Type                          | Corresponding Java Type
    <i4> or <int>      | four-byte signed integer      | Integer
    <boolean>          | 0 or 1                        | Boolean
    <string>           | ASCII string                  | String
    <double>           | double-precision signed float | Double
    <dateTime.iso8601> | date/time                     | Date
    <base64>           | base64-encoded binary         | byte[]

  35. XML-RPC Composed Values
    ▪ Complex data types can be represented by nested
    <struct> and <array> structures
    XML-Tag  | Type | Corresponding Java Type
    <struct> | A structure contains <member> elements and each member contains a <name> and a <value> element | Hashtable
    <array>  | An array contains a single <data> element which can contain any number of <value> elements | Vector
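
A minimal sketch of how nested values are marshalled, using Python's standard `xmlrpc.client`, where a `dict` plays the role of the Java Hashtable and a `list` that of the Vector. The record contents and the method name `Library.add` are hypothetical.

```python
import xmlrpc.client

# A struct (dict) that contains scalars and a nested array (list).
record = {"title": "XML", "year": 2023, "authors": ["Signer", "Norrie"]}

body = xmlrpc.client.dumps((record,), methodname="Library.add")
print(body)

# Unmarshalling restores the same nested structure.
(decoded,), method = xmlrpc.client.loads(body)
print(method, decoded)
```

The printed body shows `<struct>` with `<member>`, `<name>` and `<value>` elements, and the author list wrapped in `<array><data>`.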

  36. XML-RPC Example: GOMES
    ▪ Object-Oriented GUI for the Object Model Multi-User Extended
    Filesystem
    ▪ GOMES is implemented in Java and uses XML-RPC to communicate
    with the Object Model Multi-User Extended File System (OMX-FS),
    which was implemented in the Oberon programming language
    [Figure: GOMES communicating with OMX-FS via XML-RPC]

  37. Framework for Universal Client Access
    ▪ Generic database interface instead of developing a new
    interface from scratch for each new device type
    ▪ The presented eXtensible Information Management
    Architecture (XIMA) is based on
    ▪ OMS Java object database
    - managing the application data
    ▪ Java Servlet Technology
    ▪ generic XML database interface
    - separation of content and representation
    ▪ XSLT
    - appropriate XSLT stylesheet chosen based on User-Agent HTTP header field
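
The last point can be sketched as a simple lookup on the User-Agent header. This is an illustration only and not the XIMA implementation: the marker strings, the stylesheet file names and the function name are all hypothetical.

```python
# Hypothetical mapping from User-Agent markers to stylesheet files.
STYLESHEETS = [
    ("voicexml", "vxml.xsl"),  # voice browsers
    ("wap", "wml.xsl"),        # WAP phones
]
DEFAULT_STYLESHEET = "xhtml.xsl"  # regular web browsers


def choose_stylesheet(user_agent):
    """Pick the stylesheet for the device that sent the request."""
    agent = (user_agent or "").lower()
    for marker, stylesheet in STYLESHEETS:
        if marker in agent:
            return stylesheet
    return DEFAULT_STYLESHEET


for agent in ["Mozilla/5.0", "ExamplePhone/1.0 WAP", None]:
    print(agent, "->", choose_stylesheet(agent))
```

Because the content is produced once as XML, supporting a new device type then only requires a new stylesheet and a new entry in the mapping.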

  38. XIMA Architecture
    [Figure: XIMA architecture. Components: HTML, WML and VXML browsers; Main Entry Servlet; HTML, WML and VXML servlets; XML Server; OMS Java API; OMS Java Workspace. Labels: Delegation; Builds XML based on JDOM; XML + XSLT → Response; OM Model (collections, associations, multiple inheritance and multiple instantiation)]

  39. Generic XIMA Interfaces
    XHTML Interface WML Interface

  40. Voice Interfaces
    ▪ Trend for ubiquitous information services
    ▪ small screens, keyboards etc. often clumsy to use
    ▪ Sometimes it is necessary to have hands-free interfaces
    ▪ e.g. while driving or operating a machine
    ▪ Alternative input modality for visually impaired users
    ▪ Voice interfaces can be accessed by a regular phone
    ▪ no new device is required
    ▪ no installation effort
    ▪ Improvements in speech recognition and text-to-speech
    synthesis make automatic voice interfaces more feasible
    ▪ e.g. for call centres

  41. VoiceXML Architecture
    [Figure: processing pipeline from voice input (speech) to voice output (speech)]
    ▪ Speech Recogniser: converts voice input into text (speech model)
    ▪ Language Analyser: extracts meaning from text (grammar)
    ▪ Application Server: gets data (text) from database (application database)
    ▪ Speech Synthesiser: generates speech output (pronunciation rules)
  42. VoiceXML Architecture (for XIMA)
    XIMA Framework
    Apache
    Web Server
    Tomcat
    OMS Java
    Database
    Websphere Voice
    Server SDK
    BeVocal
    Voice Portal

  43. Basic VoiceXML Concepts
    ▪ Dialogue
    ▪ conversational state in a form or menu
    ▪ form
    - interaction that collects values for field item variables
    ▪ menu
    - presents user with a choice of options
    - transition to next dialogue based on choice
    ▪ Input
    ▪ recognition of spoken input (or recording of spoken input)
    ▪ recognition of DTMF (dual-tone multi-frequency) input
    ▪ Output
    ▪ speech synthesis (TTS)
    ▪ recorded audio files

  44. VoiceXML Form Example

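    <vxml xmlns="http://www.w3.org/2001/vxml"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2001/vxml
    http://www.w3.org/TR/voicexml20/vxml.xsd" version="2.0">
    <form>
    <field ...>
    <prompt>Would you like to order beer, wine, whisky, or nothing?</prompt>
    ...
    </field>
    ...
    </form>
    </vxml>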

  45.
    associations
    collections objects
    The database contains #Collections and #Associations
    Would you like to go to the collections, to the associations,
    directly to an object or back to the main menu?
    The database contains the
    following # associations
    Choose an association
    Association 'name' contains #A
    Would you like to list the
    members or go back?
    Association 'name' contains the
    following # associations
    Choose a 'domaintype' or
    a 'rangetype' or say back
    Object 'oID' is dressed with type 'type' and currently viewed as type 'type'. It contains #Attr, #Links, and #Methods
    Choose a link
    or say back
    The object contains the
    following # attributes
    Would you like to hear the attributes, the links or
    the methods or go back?
    You can choose among
    the following links
    You can choose among
    the following methods
    You can view the object
    as the following types
    The database contains the
    following # collections
    Choose a collection
    Collection 'name' contains #M
    Would you like to list the
    members or go back?
    Collection 'name' contains the
    following # members
    Choose one of the members
    The database contains #Objects
    Choose an object or say back
    Choose a method
    or say back
    Choose one of the
    types or say back
    The result of the
    method is Result

  46. Example: Avalanche Forecasting System
    Project to provide WAP
    and voice access

  47. Other XML Applications
    ▪ Synchronized Multimedia Integration Language (SMIL)
    ▪ animations (timing, transitions etc.)
    ▪ Mathematical Markup Language (MathML)
    ▪ mathematical notations (content and structure)
    ▪ Scalable Vector Graphics (SVG)
    ▪ two-dimensional vector graphics (static or dynamic)
    ▪ Ink Markup Language (InkML)
    ▪ digital ink representation (e.g. from digital pen)
    ▪ Note that XML standards can also be combined
    ▪ e.g. XHTML+Voice Profile 1.0

  48. Other XML Applications …
    ▪ Office Open XML (OOXML)
    ▪ file format (ZIP) for representing word processing documents,
    presentations etc. (e.g. *.docx, *.pptx and *.xlsx)
    - various XML files within these ZIP documents
    - specific markup languages for different domains (wordprocessingML,
    presentationML, spreadsheetML, …)

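    <p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
    ...
    <a:t>Other XML</a:t>
    ...
    <a:t>Applications ...</a:t>
    ...
    </p:sld>
    single slide from a pptx file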

  49. References
    ▪ Elliotte Rusty Harold and W. Scott Means,
    XML in a Nutshell, O'Reilly Media, September 2004
    ▪ XML and XML Technology Tutorials
    ▪ https://www.w3schools.com/xml/
    ▪ Masoud Kalali, Using XML in Java
    ▪ https://dzone.com/refcardz/using-xml-java
    ▪ VoiceXML Version 2.0
    ▪ https://www.w3.org/TR/voicexml20/
    ▪ XML-RPC Homepage
    ▪ http://www.xmlrpc.com

  50. References ...
    ▪ B. Signer et al., Aural Interfaces to Databases
    Based on VoiceXML, Proceedings of VDB6, Brisbane,
    Australia, 2002
    ▪ https://beatsigner.com/publications/signer_VDB6.pdf
    ▪ eXtensible Information Management Architecture (XIMA)
    ▪ https://beatsigner.com/xima.html

  51. 2 December 2005
    Next Lecture
    Web 2.0 Patterns and Technologies
