Flat ODF: the under-estimated flavour of Open Document

Fridrich Strba

May 07, 2014

Transcript

  1. Whois?
    Software Engineer in SUSE Linux Enterprise
    Used to work for SUSE on LibreOffice and OpenOffice
    Diverse background
    FLOSS enthusiast
    Working in free time on various projects including LibreOffice
    Document Liberation Project
  2. “Normal” ODF File
    Package OpenDocument File
        Chapter 3.1.3 of ODF 1.2 specifications
    Collection of XML files in a ZIP container
        <office:document-content>
        <office:document-styles>
        <office:document-meta>
        <office:document-settings>
    Images and OLE objects are sub-streams in the ZIP container
        Referenced by xlink:href attribute
  3. How it looks inside
        47 b- stor 14-May-05 22:23 mimetype
     26087 b- stor 14-May-05 22:23 Thumbnails/thumbnail.png
      1526 bl defN 14-May-05 22:23 meta.xml
     11315 bl defN 14-May-05 22:23 settings.xml
     44472 bl defN 14-May-05 22:23 content.xml
     37148 b- stor 14-May-05 22:23 Pictures/100002010000038D00000226C198A9BD.png
     99784 bl defN 14-May-05 22:23 Pictures/100185C800006D7C00004242BCCE6537.svg
       385 b- stor 14-May-05 22:23 Pictures/100002010000001E0000001E57F6E610.png
    181031 bl defN 14-May-05 22:23 styles.xml
         0 b- stor 14-May-05 22:23 Configurations2/popupmenu/
         0 b- stor 14-May-05 22:23 Configurations2/menubar/
         0 b- stor 14-May-05 22:23 Configurations2/images/Bitmaps/
         0 b- stor 14-May-05 22:23 Configurations2/toolpanel/
         0 b- stor 14-May-05 22:23 Configurations2/progressbar/
         0 b- stor 14-May-05 22:23 Configurations2/statusbar/
         0 bl defN 14-May-05 22:23 Configurations2/accelerator/current.xml
         0 b- stor 14-May-05 22:23 Configurations2/floater/
         0 b- stor 14-May-05 22:23 Configurations2/toolbar/
      1367 bl defN 14-May-05 22:23 META-INF/manifest.xml
  4. “Flat” ODF file
    Single OpenDocument XML Files
        Chapter 3.1.2 of ODF 1.2 specifications
    Contains the whole document
        <office:document> root element
        office:mimetype and office:version attributes
    Images and OLE objects inlined
        <office:binary-data> [base64 data] </office:binary-data>
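The structure described on this slide can be sketched as a small C++ generator. This is a minimal illustration, not a complete producer: `makeFlatOdt` and `escapeXml` are hypothetical helper names, the document holds a single unstyled paragraph, and a real application would normally declare more namespaces and styles.

```cpp
#include <string>

// Escape the characters that are special in XML text content.
std::string escapeXml(const std::string &text)
{
    std::string out;
    for (char c : text)
    {
        switch (c)
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        default:  out += c;
        }
    }
    return out;
}

// Build a one-paragraph flat ODT document as a single XML string.
// The root element is <office:document> and carries the office:mimetype
// and office:version attributes, as the slide describes.
std::string makeFlatOdt(const std::string &paragraph)
{
    return
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        "<office:document"
        " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
        " xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\""
        " office:version=\"1.2\""
        " office:mimetype=\"application/vnd.oasis.opendocument.text\">\n"
        " <office:body>\n"
        "  <office:text>\n"
        "   <text:p>" + escapeXml(paragraph) + "</text:p>\n"
        "  </office:text>\n"
        " </office:body>\n"
        "</office:document>\n";
}
```

Writing the returned string to a file with the `.fodt` extension should give a document that an application supporting flat ODF can open. No ZIP library is involved.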
  5. ZIP Storage (1)
    Main entry point is the End of Central Directory record, located at the end of the ZIP file.
        Found by scanning backwards from the end of the file
        Whole file must be present
        Points at the Central Directory containing the entries
    For name access, iterate over the entries
        Need to seek back to read the entry
        Which contains a pointer to the Local File Header
    Seek to the local file header offset
        Need to seek back again
        The content of the stream comes after the header
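The first step above, locating the end-of-central-directory record, can be sketched as follows. This is a simplified illustration with assumptions: the whole file is already in memory, ZIP64 is ignored, and a signature that happens to appear inside the trailing archive comment is not ruled out. `EndOfCentralDirectory` and `findEndOfCentralDirectory` are names chosen for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fields of interest from the end-of-central-directory record.
struct EndOfCentralDirectory
{
    std::uint16_t totalEntries;
    std::uint32_t centralDirectorySize;
    std::uint32_t centralDirectoryOffset;
};

// ZIP stores all multi-byte integers in little-endian order.
static std::uint16_t readLe16(const std::vector<std::uint8_t> &d, std::size_t pos)
{
    return static_cast<std::uint16_t>(d[pos] | (d[pos + 1] << 8));
}

static std::uint32_t readLe32(const std::vector<std::uint8_t> &d, std::size_t pos)
{
    return static_cast<std::uint32_t>(d[pos]) |
           (static_cast<std::uint32_t>(d[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(d[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(d[pos + 3]) << 24);
}

// Scan backwards from the end of the buffer for the record signature
// (bytes 'P' 'K' 0x05 0x06). Returns false if no record is found, which
// is what happens when the tail of the file has not arrived yet.
bool findEndOfCentralDirectory(const std::vector<std::uint8_t> &zip,
                               EndOfCentralDirectory &out)
{
    const std::size_t recordSize = 22; // fixed part, without the comment
    if (zip.size() < recordSize)
        return false;
    for (std::size_t pos = zip.size() - recordSize + 1; pos-- > 0;)
    {
        if (readLe32(zip, pos) == 0x06054b50u)
        {
            out.totalEntries = readLe16(zip, pos + 10);
            out.centralDirectorySize = readLe32(zip, pos + 12);
            out.centralDirectoryOffset = readLe32(zip, pos + 16);
            return true;
        }
    }
    return false;
}
```

Only after this lookup succeeds can a reader seek back to the central directory, and from each entry there seek back again to the local file header. That is why a package ODF file cannot be parsed before its last bytes are available.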
  6. ZIP Storage (2)
    [Diagram: ZIP file layout. Local headers 1 to n, each followed by its file entry <data>, come first. The Central Directory at the end holds file entries 1 to n, each with a relative offset pointing back to the corresponding local header.]
  7. No Need for Special tools
    No need for compression and stream extraction tools
    ODF documents can be generated manually
        Developer modifying a document
        XSLT tools
        Programmatic generation of documents
    Further processing of documents
        Easier parsing of document and extraction of relevant information
  8. Sequential Access to File
    Difference with ZIP file
        No need to know the end of the file to start to parse it
    Exchange file-format par excellence
    Possibility to communicate document as SAX messages
        Communicate XML to an application sequentially
    Possibility to stream document over wire
        Collaboration protocol?
        Communication with CMS systems?
  9. Simple API for XML (SAX)
    A class containing typical functions
        startDocument();
        startElement(const char* name, std::map<std::string, std::string> &attributes);
        characters(const char* characters);
        endElement(const char* name);
        endDocument();
    An XML producer calls functions of the class
        Passing content to it in abstract XML form
    An XML consumer inherits from the class
        Processing the content received from the producer
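The producer/consumer split can be sketched in a few lines of C++. The interface follows the function list on this slide, except that the attribute map is taken by const reference, which is a choice made for this sketch. `DocumentHandler`, `XmlWriter` and `produceParagraph` are hypothetical names, and the writer does no escaping to keep the example short.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> Attributes;

// The abstract SAX-style interface: one callback per XML event.
class DocumentHandler
{
public:
    virtual ~DocumentHandler() {}
    virtual void startDocument() = 0;
    virtual void startElement(const char *name, const Attributes &attributes) = 0;
    virtual void characters(const char *characters) = 0;
    virtual void endElement(const char *name) = 0;
    virtual void endDocument() = 0;
};

// A consumer: inherits from the interface and serialises the events it
// receives back into XML text (no escaping, for brevity).
class XmlWriter : public DocumentHandler
{
public:
    void startDocument() override { m_out.clear(); }
    void startElement(const char *name, const Attributes &attributes) override
    {
        m_out += "<" + std::string(name);
        for (const auto &attr : attributes)
            m_out += " " + attr.first + "=\"" + attr.second + "\"";
        m_out += ">";
    }
    void characters(const char *characters) override { m_out += characters; }
    void endElement(const char *name) override
    {
        m_out += "</" + std::string(name) + ">";
    }
    void endDocument() override {}
    const std::string &result() const { return m_out; }

private:
    std::string m_out;
};

// A producer: knows nothing about the consumer beyond the interface and
// pushes one paragraph to it as a sequence of events.
void produceParagraph(DocumentHandler &handler, const char *text)
{
    Attributes attrs;
    attrs["text:style-name"] = "Standard";
    handler.startDocument();
    handler.startElement("text:p", attrs);
    handler.characters(text);
    handler.endElement("text:p");
    handler.endDocument();
}
```

Because the producer only sees `DocumentHandler`, the same event stream could go to a file writer, a network socket or an import filter without the producer changing.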
  10. Duplication of embedded binaries
    Package ODF file
        The embedded images and OLE objects are in a special storage
        Referenced by link
        Possibility to reference the same file several times and save space
    Flat ODF file
        Embedded binary objects are inlined as base64 data
        Inlined on every reference
        Potentially huge files in corner cases.
    Possible solution in specification extension
        Data section listing binaries as we do with gradients or bitmap fills
        Needs “political” will and has to pass through the specification process.
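To put a number on the duplication cost, here is a minimal base64 encoder together with a size estimate. Base64 turns every 3 input bytes into 4 output characters, so an image of n bytes inlined at k references costs roughly k * 4 * ceil(n / 3) characters in a flat file. `base64Encode` and `inlinedSize` are names invented for this sketch, and the estimate ignores any line breaks a producer may insert into the base64 text.

```cpp
#include <cstddef>
#include <string>

// Standard base64 encoding with '=' padding.
std::string base64Encode(const std::string &data)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Full 3-byte groups become 4 characters each.
    for (; i + 2 < data.size(); i += 3)
    {
        unsigned n = (static_cast<unsigned char>(data[i]) << 16) |
                     (static_cast<unsigned char>(data[i + 1]) << 8) |
                     static_cast<unsigned char>(data[i + 2]);
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += alphabet[n & 63];
    }
    // A trailing group of 1 or 2 bytes is padded to 4 characters.
    const std::size_t remaining = data.size() - i;
    if (remaining == 1)
    {
        unsigned n = static_cast<unsigned char>(data[i]) << 16;
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += "==";
    }
    else if (remaining == 2)
    {
        unsigned n = (static_cast<unsigned char>(data[i]) << 16) |
                     (static_cast<unsigned char>(data[i + 1]) << 8);
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Characters needed when an image of imageBytes bytes is inlined once
// per reference, as a flat ODF file does.
std::size_t inlinedSize(std::size_t imageBytes, std::size_t references)
{
    return references * 4 * ((imageBytes + 2) / 3);
}
```

For example, a 3000-byte image referenced five times takes `inlinedSize(3000, 5)` = 20000 characters of base64 in the flat file, whereas a package file stores the 3000 bytes once and references them by link.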
  11. Bitrot and underspecification
    Few ODF producers support it out of the box
        LibreOffice supports this file format out of the box
    New features disregard it
        Unspecified how some features in package transform to flat
        Risk of implementation-specific solutions
        Risk of abandonment and bitrot
    Possible solution in specification extension
        Needs some extra work and “political” will
  12. LibreOffice UNO Filter API
    XSLT based filters
        com.sun.star.comp.Writer.XMLOasisImporter
            pushes to LibreOffice flat ODT
        com.sun.star.comp.Calc.XMLOasisExporter
            receives from LibreOffice flat ODS
    XML based filters
        com::sun::star::xml::sax::XDocumentHandler
        com.sun.star.document.ImportFilter pushes SAX messages to com::sun::star::xml::sax::XDocumentHandler
        com.sun.star.document.ExportFilter is itself a com::sun::star::xml::sax::XDocumentHandler and receives SAX messages from LibreOffice
  13. File importers
    libodfgen generators
        OdtGenerator – generates ODT from librevenge::RVNGTextInterface
            Used by libwpd, libwps, libmwaw, libabw, libetonyek, libe-book, ...
        OdgGenerator – generates ODG from librevenge::RVNGDrawingInterface
            Used by libwpg, libvisio, libcdr, libmspub, libpagemaker, …
        OdpGenerator – generates ODP from librevenge::RVNGPresentationInterface
            Added for the use in libetonyek
        OdsGenerator – generates ODS from librevenge::RVNGSpreadsheetInterface
            Added for needs of libwps and libmwaw