Slide 1

Slide 1 text

Fridrich Štrba, Software Engineer
Flat ODF: the underestimated flavour of Open Document

Slide 2

Slide 2 text

Whois?
- Software Engineer in SUSE Linux Enterprise
- Used to work for SUSE on LibreOffice and OpenOffice
- Diverse background
- FLOSS enthusiast
- Working in free time on various projects including LibreOffice
- Document Liberation Project

Slide 3

Slide 3 text

What is Flat ODF

Slide 4

Slide 4 text

“Normal” ODF file
- Package OpenDocument File
  - Chapter 3.1.3 of the ODF 1.2 specification
- Collection of XML files in a ZIP container
- Images and OLE objects are sub-streams in the ZIP container
  - Referenced by the xlink:href attribute
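A minimal Python sketch of the package layout described above, using only the standard library. The stream name `Pictures/example.png`, the helper names and the trimmed-down `content.xml` are assumptions for illustration; a real package also carries `META-INF/manifest.xml`, `styles.xml` and other streams.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MIMETYPE = "application/vnd.oasis.opendocument.text"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Illustrative content.xml: one image referenced through xlink:href.
CONTENT_XML = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"'
    ' xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<draw:image xlink:href="Pictures/example.png"/>'
    '</office:document-content>'
)


def build_package(picture: bytes) -> bytes:
    """Build a tiny, incomplete ODF-like package in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype stream comes first and is stored uncompressed.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", CONTENT_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        # The binary lives in its own sub-stream of the container.
        zf.writestr("Pictures/example.png", picture,
                    compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()


def read_linked_streams(package: bytes) -> dict:
    """Resolve every xlink:href in content.xml to a sub-stream."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        names = set(zf.namelist())
        root = ET.fromstring(zf.read("content.xml"))
        linked = {}
        for element in root.iter():
            href = element.get(XLINK_HREF)
            if href is not None and href in names:
                linked[href] = zf.read(href)
        return linked
```

Reading one picture therefore takes two steps: parse `content.xml` to find the link, then go back to the container for the stream it names.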

Slide 5

Slide 5 text

How it looks inside

    47 b- stor 14-May-05 22:23 mimetype
 26087 b- stor 14-May-05 22:23 Thumbnails/thumbnail.png
  1526 bl defN 14-May-05 22:23 meta.xml
 11315 bl defN 14-May-05 22:23 settings.xml
 44472 bl defN 14-May-05 22:23 content.xml
 37148 b- stor 14-May-05 22:23 Pictures/100002010000038D00000226C198A9BD.png
 99784 bl defN 14-May-05 22:23 Pictures/100185C800006D7C00004242BCCE6537.svg
   385 b- stor 14-May-05 22:23 Pictures/100002010000001E0000001E57F6E610.png
181031 bl defN 14-May-05 22:23 styles.xml
     0 b- stor 14-May-05 22:23 Configurations2/popupmenu/
     0 b- stor 14-May-05 22:23 Configurations2/menubar/
     0 b- stor 14-May-05 22:23 Configurations2/images/Bitmaps/
     0 b- stor 14-May-05 22:23 Configurations2/toolpanel/
     0 b- stor 14-May-05 22:23 Configurations2/progressbar/
     0 b- stor 14-May-05 22:23 Configurations2/statusbar/
     0 bl defN 14-May-05 22:23 Configurations2/accelerator/current.xml
     0 b- stor 14-May-05 22:23 Configurations2/floater/
     0 b- stor 14-May-05 22:23 Configurations2/toolbar/
  1367 bl defN 14-May-05 22:23 META-INF/manifest.xml

Slide 6

Slide 6 text

“Flat” ODF file
- Single OpenDocument XML Files
  - Chapter 3.1.2 of the ODF 1.2 specification
- Contains the whole document
  - root element
  - office:mimetype and office:version attributes
- Images and OLE objects inlined
  - [base64 data]
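A minimal Python sketch of what such a single-file document can look like. The element names `office:document` and `office:binary-data` are taken from the ODF specification rather than from the slide text, and the helper names are hypothetical. The frame deliberately omits size and anchoring attributes, so treat the output as an illustration of the structure, not as a complete document.

```python
import base64
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
}


def qn(prefix: str, local: str) -> str:
    """Return the ElementTree '{uri}local' form of a prefixed name."""
    return "{%s}%s" % (NS[prefix], local)


def build_flat_odt(paragraph: str, picture: bytes) -> str:
    """Build a tiny flat ODT with one paragraph and one inlined image."""
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)
    # Single root element carrying the mimetype and version attributes.
    root = ET.Element(qn("office", "document"), {
        qn("office", "mimetype"): "application/vnd.oasis.opendocument.text",
        qn("office", "version"): "1.2",
    })
    body = ET.SubElement(root, qn("office", "body"))
    text = ET.SubElement(body, qn("office", "text"))
    para = ET.SubElement(text, qn("text", "p"))
    para.text = paragraph
    # The image is not a separate stream: its bytes are inlined as base64.
    frame = ET.SubElement(para, qn("draw", "frame"))
    image = ET.SubElement(frame, qn("draw", "image"))
    data = ET.SubElement(image, qn("office", "binary-data"))
    data.text = base64.b64encode(picture).decode("ascii")
    return ET.tostring(root, encoding="unicode")
```

The whole document, picture included, is one string that can be written to a file or sent as-is.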

Slide 7

Slide 7 text

ZIP Storage (1)
- Main entry point is the End of Central Directory record, located at the end of the ZIP file
  - Found by scanning backwards from the end of the file
  - The whole file must be present
- It points at the Central Directory, which contains the entries
  - For access by name, iterate over them
  - Need to seek back to read an entry
  - Each entry contains a pointer to its Local File Header
- Seek to the local file header offset
  - Need to seek back again
  - The content of the stream comes after the header
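The lookup sequence above can be sketched in Python with `struct`. This is a simplified illustration, not a full ZIP reader: it assumes the whole archive is already in memory, that there is no ZIP64 extension, no encryption and no archive comment containing the end-record signature. The function and helper names are hypothetical.

```python
import io
import struct
import zipfile
import zlib

EOCD = struct.Struct("<4sHHHHIIH")           # End of Central Directory record
CDH = struct.Struct("<4sHHHHHHIIIHHHHHII")   # Central Directory file header
LFH = struct.Struct("<4sHHHHHIIIHH")         # Local File Header


def read_stream(data: bytes, wanted: str) -> bytes:
    """Return the content of the stream called `wanted`."""
    # 1. Scan backwards for the End of Central Directory record.
    pos = data.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("not a ZIP file")
    _, _, _, _, total, _, cd_offset, _ = EOCD.unpack_from(data, pos)

    # 2. Seek back to the Central Directory and iterate over its entries.
    offset = cd_offset
    for _ in range(total):
        fields = CDH.unpack_from(data, offset)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("corrupt central directory")
        method, compressed_size = fields[4], fields[8]
        name_len, extra_len, comment_len = fields[10:13]
        local_offset = fields[16]
        name_start = offset + CDH.size
        name = data[name_start:name_start + name_len].decode("utf-8")
        if name == wanted:
            # 3. Seek back again to the Local File Header; the stream
            #    content follows its name and extra fields.
            local = LFH.unpack_from(data, local_offset)
            start = local_offset + LFH.size + local[9] + local[10]
            raw = data[start:start + compressed_size]
            if method == 0:                   # stored
                return raw
            if method == 8:                   # deflated
                return zlib.decompress(raw, -15)
            raise ValueError("unsupported compression method")
        offset = name_start + name_len + extra_len + comment_len
    raise KeyError(wanted)


def demo_package() -> bytes:
    """Small archive used to exercise read_stream()."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", "<office:document-content/>" * 50,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Nothing can be read until the last bytes of the archive are available, which is the point the slide makes about needing the whole file.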

Slide 8

Slide 8 text

ZIP Storage (2)
[Diagram: layout of a ZIP file. File entries 1 to n each follow their own local header (local header 1 to n). The Central Directory at the end of the file holds one file entry per stream (file entry 1 to n), each with the relative offset of its local header (relative offset 1 to n).]

Slide 9

Slide 9 text

Advantages of Flat ODF

Slide 10

Slide 10 text

No need for special tools
- No need for compression and stream extraction tools
- ODF documents can be generated manually
  - Developer modifying a document
  - XSLT tools
  - Programmatic generation of documents
- Further processing of documents
  - Easier parsing of the document and extraction of relevant information
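As an illustration of the "easier parsing" point, the Python sketch below pulls the headings and paragraphs out of a flat ODT with nothing but the standard XML parser. The sample document and the function name are made up for the example.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hand-written sample; any text editor is enough to produce it.
FLAT_ODT = b"""<?xml version="1.0" encoding="UTF-8"?>
<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2"
    office:mimetype="application/vnd.oasis.opendocument.text">
  <office:body>
    <office:text>
      <text:h text:outline-level="1">Flat ODF</text:h>
      <text:p>One XML file, <text:span>no ZIP</text:span> needed.</text:p>
    </office:text>
  </office:body>
</office:document>
"""


def extract_text(flat_odt: bytes) -> list:
    """Return the text of every heading and paragraph, in document order."""
    root = ET.fromstring(flat_odt)
    wanted = {"{%s}h" % TEXT_NS, "{%s}p" % TEXT_NS}
    return ["".join(element.itertext())
            for element in root.iter()
            if element.tag in wanted]


def document_mimetype(flat_odt: bytes) -> str:
    """Read the document type straight from the root element."""
    return ET.fromstring(flat_odt).get("{%s}mimetype" % OFFICE_NS)
```

No decompression or stream extraction step is involved; the same input could equally be fed to an XSLT processor.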

Slide 11

Slide 11 text

Sequential access to the file
- Difference from a ZIP file
  - No need to know the end of the file to start parsing it
- Exchange file format par excellence
- Possibility to communicate a document as SAX messages
  - Communicate XML to an application sequentially
- Possibility to stream a document over the wire
  - Collaboration protocol?
  - Communication with CMS systems?
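A small Python sketch of sequential access, assuming the document arrives in chunks (for example from a socket). It uses the incremental `feed()` interface of the standard `xml.sax` parser. The chunk boundaries, the handler class and the matching on the literal `text:p` prefix are simplifications chosen for the example.

```python
import xml.sax


class ParagraphCollector(xml.sax.ContentHandler):
    """Collect paragraph text as soon as each paragraph is complete."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = None

    def startElement(self, name, attrs):
        if name == "text:p":
            self._current = []

    def characters(self, content):
        if self._current is not None:
            self._current.append(content)

    def endElement(self, name):
        if name == "text:p" and self._current is not None:
            self.paragraphs.append("".join(self._current))
            self._current = None


# Two chunks of one flat ODT, as they might arrive over the wire.
CHUNKS = [
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<office:document'
    b' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    b' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    b' office:version="1.2"'
    b' office:mimetype="application/vnd.oasis.opendocument.text">'
    b'<office:body><office:text><text:p>First</text:p>',
    b'<text:p>Second</text:p></office:text></office:body></office:document>',
]


def parse_stream(chunks, handler):
    """Feed chunks one by one; yield what is known after each of them."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
        yield list(handler.paragraphs)
    parser.close()
```

Iterating over `parse_stream(CHUNKS, ParagraphCollector())` gives a snapshot after every chunk, so the first paragraph can be processed while the rest of the document is still in transit. A ZIP package offers no equivalent, since its directory sits at the end of the file.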

Slide 12

Slide 12 text

Simple API for XML (SAX)
- A class containing typical functions
  - startDocument();
  - startElement(const char* name, std::map &attributes);
  - characters(const char* characters);
  - endElement(const char* name);
  - endDocument();
- An XML producer calls functions of the class
  - Passing it content in abstract XML form
- An XML consumer inherits from the class
  - Processing the content received from the producer
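The producer and consumer roles can be sketched with Python's `xml.sax` classes, whose `ContentHandler` has the same five callbacks as the C++ class on the slide. The function and class names below are hypothetical; no XML text is parsed anywhere, the producer calls the handler directly.

```python
import io
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
NO_ATTRS = AttributesImpl({})


def produce_document(handler, paragraphs):
    """Producer: describe a flat ODT purely through SAX calls."""
    handler.startDocument()
    handler.startElement("office:document", AttributesImpl({
        "xmlns:office": OFFICE_NS,
        "xmlns:text": TEXT_NS,
        "office:version": "1.2",
        "office:mimetype": "application/vnd.oasis.opendocument.text",
    }))
    handler.startElement("office:body", NO_ATTRS)
    handler.startElement("office:text", NO_ATTRS)
    for paragraph in paragraphs:
        handler.startElement("text:p", NO_ATTRS)
        handler.characters(paragraph)
        handler.endElement("text:p")
    handler.endElement("office:text")
    handler.endElement("office:body")
    handler.endElement("office:document")
    handler.endDocument()


class ParagraphCounter(ContentHandler):
    """Consumer: inherits from the handler class and counts paragraphs."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "text:p":
            self.count += 1


def serialize(paragraphs) -> str:
    """Use the stock XMLGenerator consumer to write the XML out."""
    out = io.StringIO()
    produce_document(XMLGenerator(out, encoding="utf-8"), paragraphs)
    return out.getvalue()
```

The producer does not know whether its calls end up in a counter, a serializer or an application's import code. Swapping the consumer changes the result without touching the producer.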

Slide 13

Slide 13 text

Issues with Flat ODF

Slide 14

Slide 14 text

Duplication of embedded binaries
- Package ODF file
  - The embedded images and OLE objects are in a special storage
  - Referenced by link
  - Possibility to reference the same file several times and save space
- Flat ODF file
  - Embedded binary objects are inlined as base64 data
  - Inlined on every reference
  - Potentially huge files in corner cases
- Possible solution in a specification extension
  - Data section listing binaries, as we do with gradients or bitmap fills
  - Needs “political” will and has to pass through the specification process
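A rough back-of-the-envelope sketch in Python of how the two representations scale. It counts only the bytes of the binary itself and ignores XML markup, ZIP headers and any compression, so the numbers are indicative only. The function names are hypothetical.

```python
import base64


def packaged_size(picture: bytes, references: int) -> int:
    """Package ODF: the picture is stored once, however often it is linked."""
    return len(picture)


def inlined_size(picture: bytes, references: int) -> int:
    """Flat ODF: the base64 form is repeated on every reference."""
    # base64 turns every 3 input bytes into 4 output characters.
    return references * len(base64.b64encode(picture))
```

For a 37148-byte picture (the size of one PNG in the listing on slide 5) used three times, which is an assumed number of references, `packaged_size` gives 37148 bytes while `inlined_size` gives 3 × 49532 = 148596 bytes.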

Slide 15

Slide 15 text

Bitrot and underspecification
- Few ODF producers support it out of the box
  - LibreOffice supports this file format out of the box
- New features disregard it
  - Unspecified how some features in the package form transform to flat
  - Risk of implementation-specific solutions
  - Risk of abandonment and bitrot
- Possible solution in a specification extension
  - Needs some extra work and “political” will

Slide 16

Slide 16 text

Examples of Flat ODF use

Slide 17

Slide 17 text

LibreOffice UNO Filter API
- XSLT-based filters
  - com.sun.star.comp.Writer.XMLOasisImporter
    - Pushes flat ODT to LibreOffice
  - com.sun.star.comp.Calc.XMLOasisExporter
    - Receives flat ODS from LibreOffice
- XML-based filters
  - com::sun::star::xml::sax::XDocumentHandler
  - com.sun.star.document.ImportFilter pushes SAX messages to com::sun::star::xml::sax::XDocumentHandler
  - com.sun.star.document.ExportFilter is itself a com::sun::star::xml::sax::XDocumentHandler and receives SAX messages from LibreOffice

Slide 18

Slide 18 text

File importers
- libodfgen generators
  - OdtGenerator – generates ODT from librevenge::RVNGTextInterface
    - Used by libwpd, libwps, libmwaw, libabw, libetonyek, libe-book, ...
  - OdgGenerator – generates ODG from librevenge::RVNGDrawingInterface
    - Used by libwpg, libvisio, libcdr, libmspub, libpagemaker, ...
  - OdpGenerator – generates ODP from librevenge::RVNGPresentationInterface
    - Added for use in libetonyek
  - OdsGenerator – generates ODS from librevenge::RVNGSpreadsheetInterface
    - Added for the needs of libwps and libmwaw

Slide 19

Slide 19 text

Thank you!
www.documentliberation.org