XML: A New Standard for Data

Overview of XML and its use on the web and in academic libraries.

Daniel Stout

May 30, 2003

Transcript

  1. Find  this  presentation  Online n To  find  this  presentation  online,

     visit: http://staffweb.lib.uiowa.edu/dstout/xml.htm n Or  will  be  up  on  Libraries  Intranet
  2. XML:  What  is  it? n Extensible  Markup  Language (XML) n

    What’s  a  Markup  Language? ¨ Example:  HTML–Hypertext  Markup  Language ¨ It’s  just  a  text  file… ¨ …which  makes  it  easy  to  transfer  on  the  Web. n It  has  a  variety  of  functions,  such  as…
  3. What  does  XML  do  exactly? n Standardized  method  for  encapsulating

    data   and  digital  objects. n It  is  a  wrapper  that  goes  around  digital   information  – text,  images,  video. n XML  can  encode  metadata… n …but  also  can  define  the  features  of  a  document   (e.g.  TOC,  formatting) n XML  is  a  way  to  describe  document  structure – like  the  structure  of  a  book,  for  example.
  4. XML  is  between  the  brackets n It  uses  tags in

     brackets,  just  like  HTML. ¨ HTML  example  file: <html> <head> <title>This is My Web Page</title> </head> <body background=“#FFFFFF”> <p>Hello, World! </body> </html>
  5. XML  can  look  very  simple n A  very  basic  and

     valid  XML  file: <?xml version="1.0"?> <oldjoke> <burns>Say <quote>goodnight</quote>, Gracie.</burns> <allen><quote>Goodnight, Gracie.</quote></allen> <applause /> </oldjoke>
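
Because an XML file is just structured text, any conforming parser can turn it into a tree of elements. The sketch below uses Python's standard-library xml.etree.ElementTree module (a tool chosen here for illustration; the slides do not name one) to read the "oldjoke" file and pull out its parts.

```python
import xml.etree.ElementTree as ET

# The "oldjoke" document from the slide, as a Python string.
# The XML declaration must be the very first thing in the text.
OLDJOKE = """<?xml version="1.0"?>
<oldjoke>
  <burns>Say <quote>goodnight</quote>, Gracie.</burns>
  <allen><quote>Goodnight, Gracie.</quote></allen>
  <applause />
</oldjoke>"""

root = ET.fromstring(OLDJOKE)

# Each element knows its tag and its child elements.
speakers = [child.tag for child in root]

# itertext() joins the text of an element and all of its descendants,
# so the nested <quote> inside <burns> is included.
burns_line = "".join(root.find("burns").itertext())
allen_line = "".join(root.find("allen").itertext())

print(root.tag)    # oldjoke
print(speakers)    # ['burns', 'allen', 'applause']
print(burns_line)  # Say goodnight, Gracie.
```

Note that the empty `<applause />` element still appears as a child: it carries meaning through its presence alone.
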
  6. A MARC Record in XML

         <fixfield id="1">" 90178038 "</fixfield>
         <fixfield id="3">"DLC"</fixfield>
         <fixfield id="5">"19900814092959.1"</fixfield>
         <fixfield id="8">"900724s1974 po af 000 0 fre "</fixfield>
         <varfield id="10" i1=" " i2=" ">
           <subfield label="a">90178038</subfield>
         </varfield>
         <varfield id="40" i1=" " i2=" ">
           <subfield label="a">DLC</subfield>
           <subfield label="c">DLC</subfield>
         </varfield>
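
Once a MARC record is in XML, ordinary XML tools can read individual fields and subfields. A minimal sketch, again using Python's xml.etree.ElementTree; the `<record>` wrapper and the `subfields` helper are hypothetical, added because the slide shows a fragment with no single root element.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, wrapped in a hypothetical <record> root element
# so that it forms a complete, well-formed document.
MARC_XML = """<record>
  <fixfield id="1">" 90178038 "</fixfield>
  <fixfield id="3">"DLC"</fixfield>
  <fixfield id="5">"19900814092959.1"</fixfield>
  <fixfield id="8">"900724s1974 po af 000 0 fre "</fixfield>
  <varfield id="10" i1=" " i2=" ">
    <subfield label="a">90178038</subfield>
  </varfield>
  <varfield id="40" i1=" " i2=" ">
    <subfield label="a">DLC</subfield>
    <subfield label="c">DLC</subfield>
  </varfield>
</record>"""

record = ET.fromstring(MARC_XML)

# Fixed fields: map the numeric field id to its value. The slide's values
# are wrapped in literal double quotes, which are stripped here.
fixed = {int(f.get("id")): f.text.strip('"') for f in record.findall("fixfield")}

def subfields(rec, field_id):
    """Hypothetical helper: return {label: value} for one variable field."""
    field = rec.find(f"varfield[@id='{field_id}']")
    if field is None:
        return {}
    return {sf.get("label"): sf.text for sf in field.findall("subfield")}

print(fixed[3])               # DLC
print(subfields(record, 40))  # {'a': 'DLC', 'c': 'DLC'}
```
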
  7. But  XML  can  be  complicated n Less  readable  than  HTML…

    n …because  it  is  more  powerful. <xml xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:p="urn:schemas-microsoft- com:office:powerpoint" xmlns:oa="urn:schemas- microsoft-com:office:activation"> <p:presentation sizeof="screen" gridspacingx="49152" gridspacingy="49152"> <p:master id="8" slidesn="1C00DA9,3702FA30" type="main" href="master08.htm" xmlhref="master08.xml" template="Pixel" layout="title_body" slots="title,body,dateTime,footer,slideNumber"> <p:schemes>
  8. Why  XML  and  not  HTML? n Unlimited  tagsets  and  definitions

    ¨ XML  is  a  metalanguage n HTML  describes  a  web  page ¨ XML  describes  all  manner  of  “documents” n That  is,  HTML  is  fixed,  limited  and  informal ¨ XML  is  versatile,  multifaceted  and  formal
  9. Advantages  of  XML n Rigorous  Grammar  – all  tags  are

     balanced n Open  Standard  – anyone  can  use  XML n Flexibility  – can  define  many  types  of  data n Relatively  Simple  – concepts  are  easy
  10. XML  has  a  Rigorous  Grammar n Balanced  Tags n Tags

     come  in  sets <strong>This is some bold text</strong> <ol><li>One Item</li> <li>Second Item</li></ol> n Individual  tags  must  have  a  terminator <br /> n Tags  must  be  nested  – cannot  overlap <strong><em>Invalid</strong></em>
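
These rules are strict enough that a parser can enforce them mechanically. A minimal sketch using Python's xml.etree.ElementTree; the `is_well_formed` helper is our own illustration, not a standard function.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Hypothetical helper: True if the text parses as XML, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

examples = {
    "balanced": "<strong>This is some bold text</strong>",
    "list": "<ol><li>One Item</li> <li>Second Item</li></ol>",
    "terminated": "<br />",
    "unterminated": "<br>",  # acceptable in HTML, an error in XML
    "overlapping": "<strong><em>Invalid</strong></em>",
}

for name, snippet in examples.items():
    print(f"{name:13} {is_well_formed(snippet)}")
```

The first three snippets parse; the unterminated `<br>` and the overlapping tags are rejected, exactly the distinctions the slide draws.
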
  11. XML  has  a  Rigorous  Grammar n DTD  – Document  Type

     Definition ¨ DTD  defines  how  the  document  is  structured,  that  is,  allowable  tags  and   grammar ¨ Sets  rules  for  the  document,  such  as: A  <p>  is  part  of  a  <chapter>  which  is  part  of  a  <book>  -­-­ but  don’t  allow  a  <p>   in  a  <toc> n Schemas  – A  Restriction  of  DTD ¨ Can  use  multiple  schemas  with  a  given  DTD n Rigorous  Grammar  =  Machine  Readable ¨ Platform  independent…software  independent ¨ If  you  know  the  DTD,  you  can  write  software  to  read  that  type  of  XML   file. ¨ Correctly  formatted  XML  can  be  parsed.
  12. XML  is  an  Open  Standard n W3C  has  control  of

     the  XML  specification ¨ World  Wide  Web  Consortium-­Cambridge,  MA ¨ http://www.w3.org/XML/Core/#Publications n Anyone  can  use  the  standard  – no  fees n Only  the  W3C  can  maintain  and  update n W3C  maintains  many  web  standards… …such  as:  HTML,  XHTML,  CSS,  PNG
  13. XML  is  Flexible n No  predefined  tags… n …DTD  defines

     the  grammar… n …which  means  that  XML  can  contain n Text,  Graphics,  Video  …  and  so  on. n Many  new  languages  appearing  that  are   based  on  XML. ¨ Such  as….
  14. Flexibility – XML-based Languages
      - XHTML: Extensible HyperText Markup Language
      - MetaL: Meta Programming Language
      - MML: Music Markup Language
      - XBRL: Extensible Business Reporting Language
      - MathML: Mathematical Markup Language
      - OML: Weather Observation Definition Format
      - Adex: Newspaper Classified Ads Format
      - AML: Astronomical Markup Language
      - rezML: Resume and Job Listing Markup Language
  15. XML as a concept is simple
      - Designed as a common platform for electronic delivery of data
      - The Swiss Army knife of file formats
      - Simpler than SGML
        - XML is actually a simplified subset of SGML (Standard Generalized Markup Language).
        - SGML & XML were both initially intended to facilitate large-scale electronic publishing.
  16. Why  XML  and  not  SGML? n Simpler  structure ¨ Easier

     to  parse…  and  therefore… ¨ …easier  to  build  software ¨ SGML  systems  are  complex  &  expensive ¨ XML-­based  systems  are  much  easier  to  build n …easier  to  transmit  on  the  Internet. n Greater  degree  of  flexibility… …with  less  complicated  grammar.
  17. Can I parse it and does it validate?
      - Properly formatted (well-formed) documents can be mechanically checked for correctness.
      - Validation against a DTD or schema ensures proper structure…
        …but does not ensure correct content.
      - Any XML-based language that has a DTD or schema can be validated.
        - XHTML: http://validator.w3.org/
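
A parser's check is mechanical in both senses: it pinpoints where the structure breaks, and it says nothing about whether the content makes sense. A minimal sketch with Python's xml.etree.ElementTree, whose ParseError carries a (line, column) `position`; the `check` helper and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def check(text):
    """Hypothetical helper: 'ok', or the line/column of the first parse error."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError as err:
        line, column = err.position
        return f"error at line {line}, column {column}"

# Structurally fine, but the date is nonsense: the parser accepts it,
# because checking structure says nothing about whether content is correct.
wrong_content = "<item><date>2003-13-45</date></item>"

# Structurally broken: <date> is never closed before </item>.
broken_structure = "<item>\n  <date>2003-05-30</item>"

print(check(wrong_content))     # ok
print(check(broken_structure))
```
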
  18. XML  and  XSL/XSLT n Extensible  Stylesheet  Language n Like  Cascading

     StyleSheets  in  HTML n Defines  the  look of  an  XML  document n …that  is,  how  individual  tags  are   presented  in,  say,  a  browser  or  software n Multiple  stylesheets  for  multiple  uses         (i.e.  print,  on-­screen,  etc.)
  19. RSS: Really Simple Syndication
      A New Way to Read and Receive News on the Internet
  20. The  RSS  Format n Really  Simple  Syndication  …  or,  

    n RDF  Site  Summary   n A  way  to  provide  headlines  and  content  through   a  method  of  syndication n Exciting  new  format  being  used…   n …by  the  press  and  by  individuals  (e.g.  blogs)   n You  can  “subscribe”  to  an  RSS  news  feed.
  21. RSS  Readers n A  program  designed  to  read  RSS  feeds.

      n SharpReader,  Syndirella,  Radio  Userland   n Common:  3-­pane  window  (like  email)   n Also:  some  use  a  web-­based  reader   n The  reader  automatically  updates  the   feeds  on  a  regular  basis.   n Full  text  messages  vs.  Summaries
  22. RSS  is  another  example  of  XML n RSS  is  an

     XML-­based  language n Profusion  of  versions  and  formats ¨ 7  different  versions ¨ And  2  significantly  different  formats ¨ A  problem  with  non-­proprietary  standards n RDF  – Resource  Description  Framework
  23. XML  in  Libraries n Uses: ¨ Digital  Collections  /  Digital

     Libraries ¨ Metadata  &  Cataloging ¨ Document  delivery ¨ Archival  storage
  24. XML  &  Digital  Collections/Libraries n Storage  format  for  digital  objects

    n Encoded  Archival  Description  (EAD) – uses  SGML  – shift  to  XML http://www.loc.gov/ead/ n XML:  the  new  standard n Interoperability  – less  likely  obsolescence
  25. XML  &  Metadata/Cataloging n Metadata  Encoding  and  Description  Standard  (METS)

    http://www.loc.gov/standards/mets/ n Dublin  Core  XML  Schemas http://www.dublincore.org/schemas/xmls/ n Open  Archives  Initiative  Protocol  for  Metadata   Harvesting  (OAI-­PMH) -­-­ a  schema  for  MARC  records  in  XML http://www.openarchives.org/OAI/2.0/guidelines-­ oai_marc.htm n RDF – Dublin  Core,  Open  Directory  and  General   Purpose  Catalogs http://www.w3.org/RDF/#gen-­col
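
As a small illustration of XML metadata, the sketch below describes this presentation itself as a simple Dublin Core record, using the Dublin Core element namespace (http://purl.org/dc/elements/1.1/) and Python's xml.etree.ElementTree. The `<metadata>` wrapper element and the `dublin_core` helper are hypothetical; the published Dublin Core schemas define their own container conventions.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC)          # serialize with the familiar dc: prefix

def dublin_core(fields):
    """Hypothetical helper: wrap (element, value) pairs in a <metadata> element."""
    root = ET.Element("metadata")
    for name, value in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root

record = dublin_core([
    ("title", "XML: A New Standard for Data"),
    ("creator", "Daniel Stout"),
    ("date", "2003-05-30"),
    ("subject", "XML"),
])
print(ET.tostring(record, encoding="unicode"))
```
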
  26. XML  &  Archival  Storage n TEI:  Text  Encoding  Initiative using

     an  SGML  encoding  scheme  that  is  maximally   expressive  and  minimally  obsolescent http://www.tei-­c.org/ n HPSS:  High  Performance  Storage  System   http://www.sdsc.edu/hpss/ n ADSM n The  Question:  Is  XML  an  Archival  Format?
  27. HYPERLINKS to RESOURCES
      - http://www.w3.org/XML/
      - http://www.xml.com/
      - http://www.xml.com/pub/a/98/10/guide0.html
      - http://www.tei-c.org/
      - http://www.dublincore.org/schemas/xmls/
      - http://validator.w3.org/
      - http://www.ucc.ie:8080/cocoon/xmlfaq