Slide 1

Slide 1 text

Document Liberation Project
News from the reverse and straight engineering world
David Tardon, Fridrich Štrba, Валёк Филиппов

Slide 2

Slide 2 text

Agenda
- Admitted agenda
  - About the project
  - Document Liberation
  - News
  - Boring technical details
- Hidden agenda
  - You'll have to wait until the end to know :)

Slide 3

Slide 3 text

The Project

Slide 4

Slide 4 text

History
- Launched officially today (2014-04-02)
- Group working on file-formats within LibreOffice since the beginning
  - GSoC 2011 - libvisio
- Clear feeling that this is bigger than LibreOffice itself
  - A service of the LibreOffice community to the wider FOSS world
  - Reuse by other projects
- Scalability issues
  - One person can produce a maximum of 24 man-hours per day
  - Need to attract more people
  - Not everybody wants to become a LibreOffice developer

Slide 5

Slide 5 text

We believe...
- Ownership of documents
  - ... that documents and their content belong to their creators, not software vendors
- Unhindered access of an owner
  - ... that access to content you own should not be hindered by the fact that the application that created it is no longer maintained, or that the application does not work on the particular operating system that you use
- Importance of truly open standards as a long-term solution
  - ... that use of truly open and free standards for encoding digital content is the only long-term guarantee that a user's digital content will never be beholden to a single vendor
- Importance of FOSS implementation
  - ... that implementation of Free and Open Source Software that can read proprietary file-formats is the best solution to escape vendor lock-in during the transition period to truly open and free standards

Slide 6

Slide 6 text

Our mission is...
- File-format understanding
  - ... to try to understand the structure and details of proprietary, undocumented file-formats
- Parser library implementations
  - ... to use the understanding of the file-formats to implement FOSS libraries that are able to parse such documents and extract as much information as possible from them
- Being good citizens of the ODF ecosystem
  - ... to use our existing framework to encode this data in a truly free and open standard file-format: the Open Document Format

Slide 7

Slide 7 text

The boring specifics

Slide 8

Slide 8 text

Goodness in OLEToy
- New file-formats understood
  - Adobe PageMaker
    - Versions 3 to 7
    - New contributor to OLEToy!
      - David Tardon
  - Software602 602Text
  - Zoner Callisto (Draw)
  - Zoner Zebra (predecessor of Callisto)
  - Apple Keynote 6 / Pages 5 / Numbers 3

Slide 9

Slide 9 text

Cool feature everybody envies
Binary Diff (special thanks to nomis for the encouragement)
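
A binary diff is useful when two sample files differ by one known change: the bytes that change hint at where the format stores that property. The sketch below is not OLEToy's implementation. `diff_ranges` and `format_diff` are hypothetical helpers that compare two byte strings offset by offset, and the sample bytes are made up.

```python
def diff_ranges(a: bytes, b: bytes):
    """Return (start, end) offset pairs, end exclusive, where a and b differ.

    Bytes are compared at equal offsets. If the lengths differ, the tail of
    the longer input is part of the last differing range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for offset in range(common):
        if a[offset] != b[offset]:
            if start is None:
                start = offset  # a new differing run begins here
        elif start is not None:
            ranges.append((start, offset))  # the run ended at this offset
            start = None
    longest = max(len(a), len(b))
    if longest > common:
        # Extra bytes in one input count as a difference.
        ranges.append((common if start is None else start, longest))
    elif start is not None:
        ranges.append((start, common))
    return ranges


def format_diff(a: bytes, b: bytes):
    """Render each differing range as 'offset: old bytes -> new bytes'."""
    return [
        f"{start:#06x}: {a[start:end].hex(' ')} -> {b[start:end].hex(' ')}"
        for start, end in diff_ranges(a, b)
    ]


# Made-up sample data: the same "document" saved with one setting changed.
before = bytes.fromhex("aa bb 00 01 0a 00 ff")
after = bytes.fromhex("aa bb 00 01 0c 00 ff")
for line in format_diff(before, after):
    print(line)
```

A same-offset comparison is the simplest choice and works when the change does not shift later data; an alignment-based diff would be needed for insertions.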

Slide 10

Slide 10 text

New libraries
- libetonyek
  - Support first for Keynote documents
  - Extending support to Numbers and Pages
- libe-book
  - Supports a host of e-book file-formats
- libfreehand
  - Started implementing a FreeHand import filter
- libabw
  - Now we can load documents of our "cousin"
- ... and more still to come

Slide 11

Slide 11 text

New document types
- Previously only text documents and graphics
  - Text documents based on libwpd API
    - libwpd, libwps, libmwaw
  - Graphics based on libwpg API
    - libwpg, libvisio, libcdr, libmspub, libfreehand
- New presentation support
  - Presentations based on libetonyek API
  - libwpg's API was too limited for presentations
- Need to extend to spreadsheets too
  - libmwaw
  - libwps

Slide 12

Slide 12 text

libodfgen
- ODF generation was duplicated in several places
  - LibreOffice writerperfect module
  - Standalone writerperfect
  - Calligra sources
- It makes sense to collect all bugs in the same place
- OdtGenerator class
  - Implementation of WPXDocumentInterface
- OdgGenerator class
  - Implementation of WPGPaintInterface
- OdpGenerator class added later
  - Implementation of KEYPresentationInterface
- OdfDocumentHandler interface
  - SAX-like interface to output XML in a generic way
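
To illustrate what "SAX-like" means here: the generator pushes events (start element, characters, end element) into a handler, and the handler decides where the XML goes. The sketch below is a Python analog under that assumption, not the libodfgen API; `StringDocumentHandler` and `emit_paragraph` are hypothetical names.

```python
from xml.sax.saxutils import escape, quoteattr


class StringDocumentHandler:
    """Hypothetical handler that serializes the events it receives to a string."""

    def __init__(self):
        self._parts = []

    def start_element(self, name, attributes=None):
        attrs = "".join(
            f" {key}={quoteattr(str(value))}"
            for key, value in (attributes or {}).items()
        )
        self._parts.append(f"<{name}{attrs}>")

    def end_element(self, name):
        self._parts.append(f"</{name}>")

    def characters(self, text):
        # Escape &, < and > so the output stays well-formed XML.
        self._parts.append(escape(text))

    def result(self):
        return "".join(self._parts)


def emit_paragraph(handler, style_name, text):
    """Generator side: it only emits events and never builds the XML itself."""
    handler.start_element("text:p", {"text:style-name": style_name})
    handler.characters(text)
    handler.end_element("text:p")


handler = StringDocumentHandler()
emit_paragraph(handler, "Standard", "Fish & chips")
print(handler.result())
```

Because the generator only talks to the handler, the same generator code could feed a file writer, an in-memory buffer, or an application's importer.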

Slide 13

Slide 13 text

librevenge
- Interface of each document type in a different library
  - libwpd, libwpg, libetonyek
- The common types in libwpd
  - libwpd is a text-related library
  - All others had to link to it
- Consolidating the types and interfaces
  - Interfaces
    - RVNGTextInterface, RVNGDrawingInterface, RVNGPresentationInterface, RVNGSpreadsheetInterface
  - Types
    - RVNGProperty, RVNGPropertyList, RVNGPropertyListVector
- Extended the capabilities
  - RVNGBinaryData, RVNGString, RVNGStringVector
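
The consolidated design can be pictured as follows: a parser knows nothing about the output format; it only calls callbacks on an interface and passes properties in a generic list. The sketch below is a Python analog, not the librevenge API. `TextInterface`, `PlainTextGenerator`, `parse_toy_format`, the `indent` property, and the toy input format are all hypothetical, and the property list is modeled as a plain dict.

```python
from abc import ABC, abstractmethod


class TextInterface(ABC):
    """Hypothetical callback interface that a text parser talks to."""

    @abstractmethod
    def open_paragraph(self, props: dict): ...

    @abstractmethod
    def insert_text(self, text: str): ...

    @abstractmethod
    def close_paragraph(self): ...


class PlainTextGenerator(TextInterface):
    """One possible consumer: renders paragraphs as plain text lines."""

    def __init__(self):
        self.lines = []
        self._current = []

    def open_paragraph(self, props):
        # Honor the one property this generator understands; ignore the rest.
        self._current = [" " * int(props.get("indent", 0))]

    def insert_text(self, text):
        self._current.append(text)

    def close_paragraph(self):
        self.lines.append("".join(self._current))
        self._current = []


def parse_toy_format(data: str, out: TextInterface):
    """Toy format: one paragraph per line; a leading '>' marks an indented one."""
    for line in data.splitlines():
        if line.startswith(">"):
            out.open_paragraph({"indent": 4})
            line = line[1:]
        else:
            out.open_paragraph({})
        out.insert_text(line)
        out.close_paragraph()


generator = PlainTextGenerator()
parse_toy_format("Title\n>Quoted line", generator)
print(generator.lines)
```

Swapping `PlainTextGenerator` for another implementation of the same interface changes the output format without touching the parser.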

Slide 14

Slide 14 text

librevenge-stream
- RVNGInputStream interface
  - Extended to handle structured documents a bit more efficiently
- Several implementations:
  - RVNGFileStream
    - Implementation using a file name
  - RVNGStringStream
    - Implementation using a buffer of data
  - RVNGDirectoryStream
    - Accesses a directory structure as if it were a structured document
- OLE2 and ZIP documents handled transparently
  - No need to know what the container type is
  - Gives the responsibility to the implementers!
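
The idea of transparent container handling can be sketched as a single stream abstraction with one implementation per container type, so parser code never checks what it was given. This is a Python analog under stated assumptions, not the librevenge-stream API: all class and function names are hypothetical, `content.xml` is a made-up part name, and only directories and ZIP archives are covered (OLE2 is left out).

```python
import os
import zipfile
from abc import ABC, abstractmethod


class InputStream(ABC):
    """Hypothetical view of a structured document as named sub-streams."""

    @abstractmethod
    def sub_stream_names(self): ...

    @abstractmethod
    def read_sub_stream(self, name) -> bytes: ...


class DirectoryStream(InputStream):
    """Treats a directory tree as if it were a structured document."""

    def __init__(self, root):
        self._root = root

    def sub_stream_names(self):
        names = []
        for folder, _dirs, files in os.walk(self._root):
            for file_name in files:
                full = os.path.join(folder, file_name)
                relative = os.path.relpath(full, self._root)
                names.append(relative.replace(os.sep, "/"))
        return sorted(names)

    def read_sub_stream(self, name):
        with open(os.path.join(self._root, *name.split("/")), "rb") as handle:
            return handle.read()


class ZipStream(InputStream):
    """Exposes the members of a ZIP archive under the same interface."""

    def __init__(self, path):
        self._path = path

    def sub_stream_names(self):
        with zipfile.ZipFile(self._path) as archive:
            return sorted(n for n in archive.namelist() if not n.endswith("/"))

    def read_sub_stream(self, name):
        with zipfile.ZipFile(self._path) as archive:
            return archive.read(name)


def open_stream(path):
    """Pick the implementation; callers do not need to know the container type."""
    if os.path.isdir(path):
        return DirectoryStream(path)
    if zipfile.is_zipfile(path):
        return ZipStream(path)
    raise ValueError(f"not a structured document: {path}")


def find_main_part(stream, name="content.xml"):
    """Parser-side code: identical for a package directory and a ZIP archive."""
    if name in stream.sub_stream_names():
        return stream.read_sub_stream(name)
    return None
```

With this shape, `find_main_part(open_stream(path))` behaves the same whether `path` points at a package directory or at a ZIP file holding the same parts.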

Slide 15

Slide 15 text

librevenge-generators
- Useful implementations of the different interfaces
- Raw generators
  - Implementations of the different RVNG interfaces that print the callbacks called and the properties passed
  - Used for regression testing
- CSV generator for spreadsheets, HTML, Text generators
- SVG generators
  - Exception: SVG generator for drawings included in librevenge core library
    - Historical reasons
- ODF generators in libodfgen
  - More complicated
  - Historical reasons
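
A raw generator in the sense above can be sketched as an interface implementation that does nothing but record each callback and its properties; comparing that record with a stored expected dump is then a regression test for the parser. The code below is a Python analog, not the librevenge-generators code. `RawTextGenerator`, `toy_parser`, and the callback names are hypothetical.

```python
class RawTextGenerator:
    """Hypothetical raw generator: logs each callback and the properties passed."""

    def __init__(self):
        self.log = []

    def _record(self, callback, props=None):
        # Sort the properties so the dump is stable between runs.
        shown = ", ".join(f"{k}: {v}" for k, v in sorted((props or {}).items()))
        self.log.append(f"{callback}({shown})")

    def open_paragraph(self, props):
        self._record("open_paragraph", props)

    def insert_text(self, text):
        self._record("insert_text", {"text": text})

    def close_paragraph(self):
        self._record("close_paragraph")

    def dump(self):
        return "\n".join(self.log)


def toy_parser(data, out):
    """Stand-in for a real parser: emits one paragraph per input line."""
    for line in data.splitlines():
        out.open_paragraph({"indent": 0})
        out.insert_text(line)
        out.close_paragraph()


# The expected dump would normally live in a file next to the sample document.
EXPECTED = "\n".join([
    "open_paragraph(indent: 0)",
    "insert_text(text: hello)",
    "close_paragraph()",
])

raw = RawTextGenerator()
toy_parser("hello", raw)
regression_ok = raw.dump() == EXPECTED
print("regression test passed" if regression_ok else "output changed")
```

Because the dump is plain text, a change in parser behavior shows up as an ordinary textual diff against the stored expectation.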

Slide 16

Slide 16 text

Advantages of the design
- Parser libraries independent and self-contained
- Much easier life for filter writers
  - Enough to focus on the structure of the document to parse
  - Call the interface callbacks that one needs
- Avoid sucking in unrelated libraries
  - librevenge itself and libodfgen have boost as a build-time dependency
  - No need to link text-related libraries in a drawing application
- Considerable reduction of code duplication
  - Less risk of bugs being fixed in one place and hanging around in another
- Faster to start a library skeleton

Slide 17

Slide 17 text

I am excited! I want to be part of this!

Slide 18

Slide 18 text

Ways to contribute
- Code development
  - Contribute to one of our existing libraries, or
  - Start a new one
- Understanding and documenting file-formats
  - OLEToy
    - Preferred way to visualize documents
    - Need a bit of knowledge of Python
- Preparation of sample documents
  - Need to access a generating application
  - Important for regression testing

Slide 19

Slide 19 text

Future file-formats to import?
- Google Summer of Code
  - The possibility for a student to work with outstanding mentors
    - David Tardon
    - Fridrich Štrba
    - Валёк Филиппов
- Several formats ready for straight engineering
  - Apple Numbers, Pages
  - Adobe PageMaker
  - Zoner Draw

Slide 20

Slide 20 text

Thank you! www.documentliberation.org