Document Liberation Project: Trying to Achieve Freedom from Vendor Lock

Presentation of the Document Liberation Project at FISL15 in Porto Alegre (RS), Brazil, explaining the philosophy of the project, the tools it uses, and technical details of the software framework.

Fridrich Strba

May 09, 2014


  1. Document Liberation Project: Trying to Achieve Freedom from Vendor Lock
    Fridrich Štrba, Software Engineer
  2. Whois?
    - Software Engineer in SUSE Linux Enterprise
    - Used to work for SUSE on LibreOffice and OpenOffice
    - Diverse background
    - FLOSS enthusiast
    - Working in free time on various projects including
      - LibreOffice
      - Document Liberation Project
  3. The Project

  4. History
    - Launched officially on April 2nd 2014 at 11:00 UTC
    - First talk given by the founding members 4 hours later
      - Libre Graphics Meeting 2014 in Leipzig, Germany
    - Group working on file-formats within LibreOffice since the beginning
      - GSoC 2011 – import filter for Visio file-formats (libvisio)
      - During the year of 2012 – import filter for CorelDraw file-formats (libcdr)
      - GSoC 2012 – import filter for Microsoft Publisher (libmspub)
    - And more is to come...
  5. Beyond LibreOffice itself
    - Clear feeling that this is bigger than LibreOffice itself
      - Feedback from conferences
      - Approached by other projects with a lot of interest
    - Reuse by other projects
      - Inkscape
      - Calligra
      - Scribus
    - A service of the LibreOffice community to the wider FOSS world
      - We receive
      - We give back
  6. Philosophy

  7. Ownership of documents
    - Whose painting is “Independência ou Morte”?
      - A) The oil paint producer's
      - B) The canvas producer's
      - C) Pedro Américo's
  8. Ownership of documents
    We believe that documents and their content belong to their creators, not software vendors.
  9. Access to documents
    We believe that access to content you own should not be hindered by the fact that the application that created it is not maintained any more or that the application does not work on the particular operating system that you use.
  10. Role of open standards
    We believe that use of truly open and free standards for encoding digital content is the only long-term guarantee that a user's digital content will never be beholden to a single vendor.
  11. Transitory period
    We believe that implementation of Free and Open Source Software that can read proprietary file-formats is the best solution to escape vendor lock during the transition period to truly open and free standards.
  12. Our mission

  13. File-format understanding
    Our mission is to try to understand the structure and details of proprietary, undocumented file-formats.
  14. FOSS parser implementations
    Our mission is to use the understanding of the file-formats to implement FOSS libraries that are able to parse such documents and extract as much information as possible from them.
  15. ODF eco-system
    Our mission is to use our existing framework to encode this data in a truly free and open standard file-format: the Open Document Format.
  16. The boring specifics

  17. Introspection tools
    - OLEToy
      - Introspection of different file-formats
      - We do NOT produce documentation
      - Here we encode the file-format knowledge
    - Colupatr
      - Hexadecimal editor on steroids
      - Variable length lines
      - Scripting support
  18. Cool feature everybody envies
    - Binary diff
  19. Software Framework
    - librevenge
      - APIs and general-use types
    - libodfgen
      - Generators of Open Document files from librevenge APIs
    - Parser libraries
      - libwpd, libwpg, libvisio, libcdr, libmspub, libetonyek, ...
      - Parsing file-format
      - Processing information
    - writerperfect
      - Command-line tools to convert to ODF
  20. librevenge::RVNGDrawingInterface
    - Callback examples

      virtual void startDocument (const RVNGPropertyList &propList) = 0;
      virtual void endDocument () = 0;
      virtual void startGraphics (const RVNGPropertyList &propList) = 0;
      virtual void endGraphics () = 0;
      virtual void setStyle (const RVNGPropertyList &propList) = 0;
      virtual void startLayer (const RVNGPropertyList &propList) = 0;
      virtual void endLayer () = 0;
      virtual void startEmbeddedGraphics (const RVNGPropertyList &propList) = 0;
      virtual void endEmbeddedGraphics () = 0;
      virtual void drawRectangle (const RVNGPropertyList &propList) = 0;
      virtual void drawEllipse (const RVNGPropertyList &propList) = 0;
      virtual void drawPolygon (const RVNGPropertyListVector &vertices) = 0;
      virtual void drawPolyline (const RVNGPropertyListVector &vertices) = 0;
      virtual void drawPath (const RVNGPropertyListVector &path) = 0;
      virtual void drawGraphicObject (const RVNGPropertyList &propList) = 0;
      virtual void startTextObject (const RVNGPropertyList &propList) = 0;
      virtual void endTextObject () = 0;
      virtual void openParagraph (const RVNGPropertyList &propList) = 0;
      virtual void closeParagraph () = 0;
      virtual void openSpan (const RVNGPropertyList &propList) = 0;
      virtual void closeSpan () = 0;
      virtual void insertText (const RVNGString &str) = 0;
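
The callback style of this interface can be illustrated with a small sketch. Everything below is a stand-in written for illustration, not the librevenge API: `PropertyList` is a plain `std::map`, `DrawingInterface` keeps only three of the callbacks, the property keys (`x`, `y`, `width`, `height`) and the `rect x y width height` input format are invented. The point it tries to show is that the parser knows only the interface, and the consumer decides what to do with each callback.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Simplified stand-in for a property list; NOT librevenge::RVNGPropertyList.
using PropertyList = std::map<std::string, std::string>;

// Hypothetical, reduced interface modelled on a few of the callbacks above.
class DrawingInterface {
public:
    virtual ~DrawingInterface() = default;
    virtual void startDocument(const PropertyList &propList) = 0;
    virtual void endDocument() = 0;
    virtual void drawRectangle(const PropertyList &propList) = 0;
};

// Consumer side: turns the callbacks into a minimal SVG-like string.
class SvgSketchGenerator : public DrawingInterface {
public:
    void startDocument(const PropertyList &) override { m_out << "<svg>"; }
    void endDocument() override { m_out << "</svg>"; }
    void drawRectangle(const PropertyList &p) override {
        m_out << "<rect x=\"" << p.at("x") << "\" y=\"" << p.at("y")
              << "\" width=\"" << p.at("width")
              << "\" height=\"" << p.at("height") << "\"/>";
    }
    std::string str() const { return m_out.str(); }

private:
    std::ostringstream m_out;
};

// Producer side: parses an invented line format "rect x y width height"
// and reports what it finds through the interface only.
bool parseToyDrawing(const std::string &input, DrawingInterface &painter) {
    std::istringstream lines(input);
    std::string line;
    painter.startDocument(PropertyList());
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string kind, x, y, w, h;
        if (!(fields >> kind >> x >> y >> w >> h) || kind != "rect")
            continue; // skip anything the toy parser does not understand
        painter.drawRectangle({{"x", x}, {"y", y}, {"width", w}, {"height", h}});
    }
    painter.endDocument();
    return true;
}

// Convenience wrapper connecting the toy parser to the sketch generator.
std::string toyDrawingToSvg(const std::string &input) {
    SvgSketchGenerator generator;
    const bool ok = parseToyDrawing(input, generator);
    assert(ok);
    (void)ok;
    return generator.str();
}
```

Calling `toyDrawingToSvg("rect 1 2 3 4\n")` runs the toy parser against the sketch generator; swapping in another `DrawingInterface` implementation would not require touching the parser.
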
  21. librevenge::RVNGTextInterface
    - Callback examples

      virtual void startDocument (const RVNGPropertyList &propList) = 0;
      virtual void endDocument () = 0;
      virtual void definePageStyle (const RVNGPropertyList &propList) = 0;
      virtual void openPageSpan (const RVNGPropertyList &propList) = 0;
      virtual void closePageSpan () = 0;
      virtual void openHeader (const RVNGPropertyList &propList) = 0;
      virtual void closeHeader () = 0;
      virtual void openFooter (const RVNGPropertyList &propList) = 0;
      virtual void closeFooter () = 0;
      virtual void defineParagraphStyle (const RVNGPropertyList &propList) = 0;
      virtual void openParagraph (const RVNGPropertyList &propList) = 0;
      virtual void closeParagraph () = 0;
      virtual void defineCharacterStyle (const RVNGPropertyList &propList) = 0;
      virtual void openSpan (const RVNGPropertyList &propList) = 0;
      virtual void closeSpan () = 0;
      virtual void defineSectionStyle (const RVNGPropertyList &propList) = 0;
      virtual void openSection (const RVNGPropertyList &propList) = 0;
      virtual void closeSection () = 0;
      virtual void insertTab () = 0;
      virtual void insertSpace () = 0;
      virtual void insertText (const RVNGString &text) = 0;
      virtual void insertLineBreak () = 0;
      virtual void insertField (const RVNGPropertyList &propList) = 0;
      virtual void openOrderedListLevel (const RVNGPropertyList &propList) = 0;
      virtual void openUnorderedListLevel (const RVNGPropertyList &propList) = 0;
      virtual void closeOrderedListLevel () = 0;
      virtual void closeUnorderedListLevel () = 0;
      virtual void openListElement (const RVNGPropertyList &propList) = 0;
      virtual void closeListElement () = 0;
      virtual void openFootnote (const RVNGPropertyList &propList) = 0;
      virtual void closeFootnote () = 0;
      virtual void openEndnote (const RVNGPropertyList &propList) = 0;
      virtual void closeEndnote () = 0;
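
As a rough illustration of how a consumer of such text callbacks might look, here is a plain-text generator sketch. It is an assumption-laden toy: `TextInterface` is a hypothetical interface with five of the callbacks above, the property-list arguments are dropped for brevity, and `PlainTextSketchGenerator` and `plainTextDemo` are names made up for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced text interface with a handful of the callbacks above.
// Property lists are omitted for brevity; the callbacks listed above take them.
class TextInterface {
public:
    virtual ~TextInterface() = default;
    virtual void openParagraph() = 0;
    virtual void closeParagraph() = 0;
    virtual void insertText(const std::string &text) = 0;
    virtual void insertTab() = 0;
    virtual void insertLineBreak() = 0;
};

// Consumer that keeps only the textual content and ends each paragraph
// with a newline.
class PlainTextSketchGenerator : public TextInterface {
public:
    void openParagraph() override {
        assert(!m_inParagraph); // paragraphs do not nest in this sketch
        m_inParagraph = true;
    }
    void closeParagraph() override {
        assert(m_inParagraph);
        m_inParagraph = false;
        m_text += '\n';
    }
    void insertText(const std::string &text) override { m_text += text; }
    void insertTab() override { m_text += '\t'; }
    void insertLineBreak() override { m_text += '\n'; }
    const std::string &text() const { return m_text; }

private:
    std::string m_text;
    bool m_inParagraph = false;
};

// Plays the role of a parser: emits the callback sequence for two paragraphs.
std::string plainTextDemo() {
    PlainTextSketchGenerator generator;
    generator.openParagraph();
    generator.insertText("Name");
    generator.insertTab();
    generator.insertText("FISL15");
    generator.closeParagraph();
    generator.openParagraph();
    generator.insertText("Porto Alegre");
    generator.closeParagraph();
    return generator.text();
}
```

The open/close pairing is what lets a consumer track state (here, whether a paragraph is open) without knowing anything about the source file-format.
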
  22. librevenge-stream
    - RVNGInputStream interface
      - Virtual interface allowing stream abstraction
    - Several implementations:
      - RVNGFileStream: implementation using file name
      - RVNGStringStream: implementation using a buffer of data
      - RVNGDirectoryStream: accesses a directory structure as if it was a structured document
    - OLE2 and ZIP documents handled transparently
      - No need to know what is the container type
    - Gives the responsibility to the implementers!
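
The idea of a virtual stream interface with interchangeable implementations can be sketched as follows. This is not the real `RVNGInputStream`, which has more members and different signatures; `InputStream`, `BufferStream`, `readU16LE` and `streamDemo` are hypothetical names, and the buffer-backed class only loosely mirrors the "buffer of data" implementation mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stream interface (not the librevenge one).
class InputStream {
public:
    virtual ~InputStream() = default;
    // Returns up to numBytes bytes and advances the read position.
    virtual std::vector<unsigned char> read(std::size_t numBytes) = 0;
    // Absolute seek from the start; fails when offset is out of range.
    virtual bool seek(std::size_t offset) = 0;
    virtual std::size_t tell() const = 0;
    virtual bool isEnd() const = 0;
};

// Implementation backed by an in-memory buffer.
class BufferStream : public InputStream {
public:
    explicit BufferStream(const std::string &data)
        : m_data(data.begin(), data.end()), m_pos(0) {}

    std::vector<unsigned char> read(std::size_t numBytes) override {
        const std::size_t n = std::min(numBytes, m_data.size() - m_pos);
        std::vector<unsigned char> out(m_data.data() + m_pos,
                                       m_data.data() + m_pos + n);
        m_pos += n;
        return out;
    }
    bool seek(std::size_t offset) override {
        if (offset > m_data.size())
            return false;
        m_pos = offset;
        return true;
    }
    std::size_t tell() const override { return m_pos; }
    bool isEnd() const override { return m_pos >= m_data.size(); }

private:
    std::vector<unsigned char> m_data;
    std::size_t m_pos;
};

// Parser-side helper: it sees only the interface, so the bytes could just
// as well come from a file or from a sub-stream of a container.
unsigned readU16LE(InputStream &input) {
    const std::vector<unsigned char> bytes = input.read(2);
    assert(bytes.size() == 2);
    return static_cast<unsigned>(bytes[0]) |
           (static_cast<unsigned>(bytes[1]) << 8);
}

// Reads a little-endian 16-bit value from a three-byte buffer.
unsigned streamDemo() {
    BufferStream stream(std::string("\x34\x12\xff", 3));
    return readU16LE(stream);
}
```

Because `readU16LE` takes the abstract type, a parser written against it would not need to change if a file-backed or container-backed implementation were passed instead.
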
  23. librevenge-generators
    - Useful implementations of the different interfaces
    - Raw Generators
      - Implementations of the different librevenge interfaces, printing callbacks called and properties passed
      - Used for regression testing
    - CSV generator for spreadsheets, HTML, Text generators
    - SVG generators
    - Exception: SVG generator for drawings
      - Included in librevenge core library
      - Historical reasons
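
A raw generator used for regression testing might, in spirit, look like the sketch below: every callback is written out with its properties, and the resulting dump is compared with a stored reference. The class `RawSketchGenerator`, the dump format, and the property keys (`cx`, `cy`, `rx`, `ry`) are assumptions made for this example and do not reproduce the output of the real raw generators.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Simplified stand-in for a property list; NOT librevenge::RVNGPropertyList.
using PropertyList = std::map<std::string, std::string>;

// Hypothetical raw generator: logs each callback and its properties.
class RawSketchGenerator {
public:
    void startDocument(const PropertyList &p) { log("startDocument", p); }
    void endDocument() { log("endDocument", PropertyList()); }
    void drawEllipse(const PropertyList &p) { log("drawEllipse", p); }
    std::string str() const { return m_out.str(); }

private:
    void log(const std::string &callback, const PropertyList &p) {
        m_out << callback << '(';
        bool first = true;
        // std::map iterates in key order, so the dump is deterministic.
        for (const auto &entry : p) {
            if (!first)
                m_out << ", ";
            m_out << entry.first << ": " << entry.second;
            first = false;
        }
        m_out << ")\n";
    }
    std::ostringstream m_out;
};

// Stands in for "run the parser on a sample document".
std::string rawDump() {
    RawSketchGenerator generator;
    generator.startDocument(PropertyList());
    generator.drawEllipse({{"cx", "10"}, {"cy", "20"}, {"rx", "5"}, {"ry", "5"}});
    generator.endDocument();
    return generator.str();
}

// Regression check: the fresh dump must match the stored reference.
void checkAgainstReference(const std::string &reference) {
    assert(rawDump() == reference);
}
```

A change in the parser that alters the sequence of callbacks or any property value shows up as a textual difference in the dump, which is what makes this kind of output convenient for regression tests.
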
  24. libodfgen
    - Generators for OpenDocument from librevenge interfaces
      - OdtGenerator class: implementation of RVNGTextInterface
      - OdgGenerator class: implementation of RVNGDrawingInterface
      - OdpGenerator class: implementation of RVNGPresentationInterface
      - OdsGenerator class: implementation of RVNGSpreadsheetInterface
    - OdfDocumentHandler interface
      - SAX-like interface to output XML in a generic way
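
What "SAX-like interface to output XML" means can be shown with a small sketch. The interface below is hypothetical and only inspired by the description above; it is not the `OdfDocumentHandler` declaration from libodfgen. `DocumentHandler`, `StringHandler` and `handlerDemo` are invented names. A generator pushes element events to the handler, and the handler decides where the XML ends up, here a string.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for a property list; NOT librevenge::RVNGPropertyList.
using PropertyList = std::map<std::string, std::string>;

// Hypothetical SAX-like handler interface.
class DocumentHandler {
public:
    virtual ~DocumentHandler() = default;
    virtual void startElement(const std::string &name, const PropertyList &attrs) = 0;
    virtual void endElement(const std::string &name) = 0;
    virtual void characters(const std::string &text) = 0;
};

// Handler that serializes the events into an in-memory XML string.
class StringHandler : public DocumentHandler {
public:
    void startElement(const std::string &name, const PropertyList &attrs) override {
        m_xml += '<' + name;
        for (const auto &attr : attrs)
            m_xml += ' ' + attr.first + "=\"" + escape(attr.second) + '"';
        m_xml += '>';
        m_open.push_back(name);
    }
    void endElement(const std::string &name) override {
        // Elements must be closed in the reverse order they were opened.
        assert(!m_open.empty() && m_open.back() == name);
        m_open.pop_back();
        m_xml += "</" + name + '>';
    }
    void characters(const std::string &text) override { m_xml += escape(text); }
    const std::string &xml() const { return m_xml; }

private:
    // Escapes the characters that are special in XML text and attributes.
    static std::string escape(const std::string &s) {
        std::string out;
        for (char c : s) {
            switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
            }
        }
        return out;
    }
    std::string m_xml;
    std::vector<std::string> m_open;
};

// Emits one paragraph element the way a generator might.
std::string handlerDemo() {
    StringHandler handler;
    handler.startElement("text:p", {{"text:style-name", "Standard"}});
    handler.characters("Tom & Jerry");
    handler.endElement("text:p");
    return handler.xml();
}
```

Keeping the output behind such a handler is one way a generator could stay agnostic about whether the XML goes to a flat file, a zipped package, or an application's own importer.
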
  25. writerperfect
    - Command-line tools linking the components together
      - RVNGInputStream implementation: librevenge-stream
      - Different ODF generators: libodfgen
      - Different parser libraries: libvisio, libcdr, libmspub, libetonyek, libwpd, libwpg, ...
    - Generates Open Document files
      - Flat ODF
      - Package (zipped) ODF
  26. Advantage of the design
    - Parser libraries independent and self-contained
    - Much easier life of filter writers
      - Enough to focus on the structure of document to parse
      - Call the interface callbacks that one needs
    - Avoid sucking in unrelated libraries
      - librevenge itself and libodfgen have only boost as build-time dependency
      - No need to link text-related libraries in drawing application
    - Considerable reduction of code duplication
      - Less risk to have bugs fixed in one place and hanging around in another
      - Faster to start a library skeleton
  27. I am excited! I want to be part of this!

  28. Ways to contribute
    - Code development
      - Contribute to one of our existing libraries, or
      - Start a new one
    - Understanding and documenting file-formats
      - OLEToy
      - Preferred way to visualize documents
      - Need a bit of knowledge of Python
    - Preparation of sample documents
      - Need to access a generating application
      - Important for regression testing
  29. New libraries for dummies

      git clone git://git.code.sf.net/p/libwpd/project-generator
      cd project-generator/
      ./project-generator -h

      project-generator <options> <name> [<outputpath>]

      Options -a, -e and -p are required. The project will be created in
      <outputpath> or in the current directory, if no <outputpath> was given.

      General options:
        -h              Show this text.

      Setting project parameters:
        -a author       Set main author of the library.
        -c importer     Set the name of the public importer class. Default is ProjectDocument.
        -d description  Set project description. Default is empty.
        -e email        Set author e-mail.
        -p project      Set the name of the project.
        -t tool         Set base name for conversion tools (e.g., tool2raw). Default is project2*.
        -y year         Set year. Default is current year.

      Project kind:
        -D              Create a vector drawing importer
        -P              Create a presentation importer
        -S              Create a spreadsheet importer
        -T              Create a text importer. This is the default.
  30. Demonstration
    - [If it does not work, blame it on everything but yourself]

      ./project-generator -p libfisl -d "FISL15 Document importer library" -a "Fridrich Strba" -e "fridrich@libreoffice.org" -T -c FISLDocument -t fisl
      cd libfisl
      ./autogen.sh && ./configure --prefix=/usr --libdir=/usr/lib64 --enable-debug --disable-werror
      make -j4
  31. Future file-formats to import?
    - Google Summer of Code
      - The possibility for a student to work with outstanding mentors
        - David Tardon
        - Fridrich Štrba
        - Валёк Филиппов
    - Several formats ready for straight engineering
      - Apple Numbers, Pages
      - Adobe PageMaker
      - Zoner Draw
  32. Thank you!
    www.documentliberation.org
    @DocLiberation