Flat ODF: the under-estimated flavour of Open Document

Fridrich Štrba

May 07, 2014

Transcript

  1. Flat ODF: the underestimated flavour of Open Document
    Fridrich Štrba, Software Engineer

  2. Whois?
    - Software Engineer working on SUSE Linux Enterprise
    - Used to work for SUSE on LibreOffice and OpenOffice
    - Diverse background
    - FLOSS enthusiast
    - Working in free time on various projects, including LibreOffice and the Document Liberation Project

  3. What is Flat ODF

  4. “Normal” ODF File
    - Package OpenDocument file
      - Chapter 3.1.3 of the ODF 1.2 specification
    - Collection of XML files in a ZIP container
    - Images and OLE objects stored as sub-streams in the ZIP container
      - Referenced by an xlink:href attribute

  5. How it looks inside
    47 b- stor 14-May-05 22:23 mimetype
    26087 b- stor 14-May-05 22:23 Thumbnails/thumbnail.png
    1526 bl defN 14-May-05 22:23 meta.xml
    11315 bl defN 14-May-05 22:23 settings.xml
    44472 bl defN 14-May-05 22:23 content.xml
    37148 b- stor 14-May-05 22:23 Pictures/100002010000038D00000226C198A9BD.png
    99784 bl defN 14-May-05 22:23 Pictures/100185C800006D7C00004242BCCE6537.svg
    385 b- stor 14-May-05 22:23 Pictures/100002010000001E0000001E57F6E610.png
    181031 bl defN 14-May-05 22:23 styles.xml
    0 b- stor 14-May-05 22:23 Configurations2/popupmenu/
    0 b- stor 14-May-05 22:23 Configurations2/menubar/
    0 b- stor 14-May-05 22:23 Configurations2/images/Bitmaps/
    0 b- stor 14-May-05 22:23 Configurations2/toolpanel/
    0 b- stor 14-May-05 22:23 Configurations2/progressbar/
    0 b- stor 14-May-05 22:23 Configurations2/statusbar/
    0 bl defN 14-May-05 22:23 Configurations2/accelerator/current.xml
    0 b- stor 14-May-05 22:23 Configurations2/floater/
    0 b- stor 14-May-05 22:23 Configurations2/toolbar/
    1367 bl defN 14-May-05 22:23 META-INF/manifest.xml

  6. “Flat” ODF file
    - Single OpenDocument XML file
      - Chapter 3.1.2 of the ODF 1.2 specification
    - Contains the whole document
      - <office:document> root element
      - office:mimetype and office:version attributes
    - Images and OLE objects inlined
      - <office:binary-data>[base64 data]</office:binary-data>

  7. ZIP Storage (1)
    - The main entry point is the End of Central Directory record, located at the end of the ZIP file
      - Found by scanning backwards from the end of the file
      - The whole file must be present
    - It points at the Central Directory, which contains the entries
      - For access by name, iterate over them
      - Need to seek back to read the entries
    - Each entry contains a pointer to its Local File Header
      - Seek to the local file header offset
      - Need to seek back again
    - The content of the stream comes after the header

  8. ZIP Storage (2)
    [Diagram: ZIP file layout. Local headers 1 to n, each followed by its file entry data, then the central directory, whose file entries 1 to n each store the relative offset of the corresponding local header.]

  9. Advantages of Flat ODF

  10. No Need for Special tools
    - No need for compression and stream extraction tools
    - ODF documents can be generated directly
      - A developer modifying a document by hand
      - XSLT tools
      - Programmatic generation of documents
    - Further processing of documents
      - Easier parsing of documents and extraction of relevant information

  11. Sequential Access to File
    - Difference from a ZIP file
      - No need to know where the file ends to start parsing it
    - Exchange file format par excellence
    - Possibility to communicate a document as SAX messages
      - Communicate XML to an application sequentially
    - Possibility to stream a document over the wire
      - Collaboration protocol?
      - Communication with CMS systems?

  12. Simple API for XML (SAX)
    - A class containing typical functions:

        void startDocument();
        void startElement(const char* name,
                          const std::map<std::string, std::string>& attributes);
        void characters(const char* characters);
        void endElement(const char* name);
        void endDocument();

    - An XML producer calls the functions of the class
      - Passing to it the content in abstract XML form
    - An XML consumer inherits from the class
      - Processing the content received from the producer

  13. Issues with Flat ODF

  14. Duplication of embedded binaries
    - Package ODF file
      - The embedded images and OLE objects are in a separate storage
      - Referenced by link
      - The same file can be referenced several times, which saves space
    - Flat ODF file
      - Embedded binary objects are inlined as base64 data
      - Inlined again on every reference
      - Potentially huge files in corner cases
    - Possible solution in a specification extension
      - A data section listing the binaries, as is done for gradients or bitmap fills
      - Needs “political” will and has to pass through the specification process

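The following rough worked example shows the cost. It uses the 37148-byte PNG from the package listing on slide 5 and assumes, for illustration only, that the document references it $k = 3$ times. Base64 turns every 3 input bytes (rounded up) into 4 characters. Any line breaks a writer may add are ignored here.

```latex
\begin{aligned}
\text{base64 size} &= 4\left\lceil \frac{n}{3} \right\rceil
  = 4\left\lceil \frac{37148}{3} \right\rceil = 4 \cdot 12383 = 49532 \text{ bytes} \\
\text{flat ODF, } k = 3 \text{ references} &= k \cdot 49532 = 148596 \text{ bytes} \\
\text{package ODF, stored once} &= n = 37148 \text{ bytes}
\end{aligned}
```

In this case the inlined copies take about four times the space of the single stored picture.
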
  15. Bitrot and underspecification
    - Few ODF producers support it out of the box
      - LibreOffice supports this file format out of the box
    - New features disregard it
      - How some package features translate to the flat format is unspecified
      - Risk of implementation-specific solutions
      - Risk of abandonment and bitrot
    - Possible solution in a specification extension
      - Needs some extra work and “political” will

  16. Examples of Flat ODF use

  17. LibreOffice UNO Filter API
    - XSLT-based filters
      - com.sun.star.comp.Writer.XMLOasisImporter pushes flat ODT into LibreOffice
      - com.sun.star.comp.Calc.XMLOasisExporter receives flat ODS from LibreOffice
    - XML-based filters
      - com::sun::star::xml::sax::XDocumentHandler
      - com.sun.star.document.ImportFilter pushes SAX messages to a com::sun::star::xml::sax::XDocumentHandler
      - com.sun.star.document.ExportFilter is itself a com::sun::star::xml::sax::XDocumentHandler and receives SAX messages from LibreOffice

  18. File importers
    - libodfgen generators
      - OdtGenerator: generates ODT from librevenge::RVNGTextInterface
        - Used by libwpd, libwps, libmwaw, libabw, libetonyek, libe-book,...
      - OdgGenerator: generates ODG from librevenge::RVNGDrawingInterface
        - Used by libwpg, libvisio, libcdr, libmspub, libpagemaker, …
      - OdpGenerator: generates ODP from librevenge::RVNGPresentationInterface
        - Added for use in libetonyek
      - OdsGenerator: generates ODS from librevenge::RVNGSpreadsheetInterface
        - Added for the needs of libwps and libmwaw

  19. Thank you!
    www.documentliberation.org
