Slide 1

Slide 1 text

LGM 2012 - LibreOffice Graphic Import filters
LibreOffice import filters for vector graphic formats
The fun of reverse- and straight engineering
Fridrich Štrba
The Document Foundation
Software Engineer, SUSE

Slide 2

Slide 2 text

Who Am I?
- Software Engineer in the SUSE LibreOffice Team
- Diverse background
- FLOSS enthusiast
- Working on various projects in my free time

Slide 3

Slide 3 text

Agenda
- Vector graphic import filters resulting from the work of the LibreOffice community
- How we did it
  - Framework used
  - Missing file-format documentation
  - Collaboration patterns
  - Incremental reverse-engineering

Slide 4

Slide 4 text

Why do we handle legacy file-formats?

Slide 5

Slide 5 text

Legacy formats out there
- ODF is the future of humanity
  - Nevertheless, humanity does not know about it yet
- Other de facto standards
  - Some people use other office suites and graphic applications :(
  - Hard disks full of bad teenage poetry and indecent drawings in funny formats
- LibreOffice offers the freedom to read that pile of …

Slide 6

Slide 6 text

Pure intellectual exercise
- Lets you program for LibreOffice without having to understand its internals
  - Fairly stand-alone functionality communicating with LibreOffice over well-defined interfaces …
  - … almost
- Happy users will reward you
  - You will be the hero of the people who can now read their documents...
  - … and they will get on your nerves listing features that are not converted.

Slide 7

Slide 7 text

Import filters available to all
resulting from the work of the LibreOffice community

Slide 8

Slide 8 text

Vector graphics import filters based on libwpg
- WordPerfect Graphics import filter and libwpg
  - Started by Marc Oude-Kotte and yours truly
  - Google Summer of Code by Ariya Hidayat in 2006
- MS Visio import filter and libvisio
  - Google Summer of Code by Eilidh McAdam in 2011
  - Guest appearance of re-lab's Valek Filippov
- Corel Draw import filter and libcdr
  - Work in progress (kind of), started basically at the end of 2011
  - Will be in LibreOffice 3.6
  - Check http://dev-builds.libreoffice.org for preview fun

Slide 9

Slide 9 text

Future directions
- I prefer to speak about the future when it becomes a feature
  - Too many projects with declarations of intent and nothing delivered
  - Code speaks louder than press releases
- Google Summer of Code 2012
  - An attempt at the MS Publisher file-format
- Valek's personal pet file-format
  - Macromedia Freehand
  - Trying to crowd-source the development

Slide 10

Slide 10 text

How did we do it?

Slide 11

Slide 11 text

Minimize the count of reinvented wheels
- Reuse, embrace and extend
  - ODF as interchange format: the way import filters communicate with LibreOffice
  - libwpg's application programming interface: reusing the OdgGenerator class that implements it
- Speedy development
  - No need to write any boilerplate code: the LibreOffice import filter itself is about 100 LOC
  - The core is written as a standalone library: faster testing

Slide 12

Slide 12 text

Graphic Document Representation

namespace libwpg
{

class WPGPaintInterface
{
public:
    virtual ~WPGPaintInterface() {}

    virtual void startGraphics(const ::WPXPropertyList &propList) = 0;
    virtual void endGraphics() = 0;

    virtual void setStyle(const ::WPXPropertyList &propList, const ::WPXPropertyListVector &gradient) = 0;

    virtual void startLayer(const ::WPXPropertyList &propList) = 0;
    virtual void endLayer() = 0;

    virtual void startEmbeddedGraphics(const ::WPXPropertyList &propList) = 0;
    virtual void endEmbeddedGraphics() = 0;

    virtual void drawRectangle(const ::WPXPropertyList &propList) = 0;
    virtual void drawEllipse(const ::WPXPropertyList &propList) = 0;
    virtual void drawPolygon(const ::WPXPropertyListVector &vertices) = 0;
    virtual void drawPolyline(const ::WPXPropertyListVector &vertices) = 0;
    virtual void drawPath(const ::WPXPropertyListVector &path) = 0;
    virtual void drawGraphicObject(const ::WPXPropertyList &propList, const ::WPXBinaryData &binaryData) = 0;

    virtual void startTextObject(const ::WPXPropertyList &propList, const ::WPXPropertyListVector &path) = 0;
    virtual void endTextObject() = 0;
    virtual void startTextLine(const ::WPXPropertyList &propList) = 0;
    virtual void endTextLine() = 0;
    virtual void startTextSpan(const ::WPXPropertyList &propList) = 0;
    virtual void endTextSpan() = 0;
    virtual void insertText(const ::WPXString &str) = 0;
};

} // namespace libwpg
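
To illustrate how a callback interface of this kind separates the parser from its consumer, here is a minimal, self-contained sketch. It does not use libwpg itself: `PropertyList`, `MiniPaintInterface`, `CallLogger` and `emitSampleDrawing` are hypothetical stand-ins invented for this example, and the property names are only illustrative.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for ::WPXPropertyList: a plain string-to-string map.
using PropertyList = std::map<std::string, std::string>;

// Hypothetical, much reduced cousin of libwpg::WPGPaintInterface.
class MiniPaintInterface {
public:
    virtual ~MiniPaintInterface() {}
    virtual void startGraphics(const PropertyList &propList) = 0;
    virtual void endGraphics() = 0;
    virtual void drawRectangle(const PropertyList &propList) = 0;
};

// A consumer that only records which callbacks it received, in order.
// A real consumer would generate ODG or SVG instead.
class CallLogger : public MiniPaintInterface {
public:
    std::vector<std::string> calls;

    void startGraphics(const PropertyList &) override { calls.push_back("startGraphics"); }
    void endGraphics() override { calls.push_back("endGraphics"); }
    void drawRectangle(const PropertyList &propList) override {
        calls.push_back("drawRectangle" + describe(propList));
    }

private:
    // Renders the properties as " key=value" pairs, sorted by key (std::map order).
    static std::string describe(const PropertyList &propList) {
        std::string out;
        for (const auto &kv : propList)
            out += " " + kv.first + "=" + kv.second;
        return out;
    }
};

// Plays the role of a file-format parser: it knows nothing about the
// output format and only talks to the abstract interface.
void emitSampleDrawing(MiniPaintInterface &painter) {
    PropertyList page;
    page["svg:width"] = "2in";
    page["svg:height"] = "1in";
    painter.startGraphics(page);

    PropertyList rect;
    rect["svg:width"] = "2in";
    rect["svg:height"] = "1in";
    painter.drawRectangle(rect);

    painter.endGraphics();
}
```

Passing a `CallLogger` to `emitSampleDrawing` leaves three entries in `calls`; swapping in a different implementation of the interface changes the output format without touching the parser side.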

Slide 13

Slide 13 text

Key Classes
- OdgGenerator.?xx
  - Implementation of libwpg::WPGPaintInterface
- OdfDocumentHandler.hxx
  - Abstract SAX interface to receive the ODF document
  - Code that serializes the SAX calls into a file (flat ODF and zip-based ODF)
- *SVGGenerator.?xx
  - Each library has an internal SVG generator (suboptimal)
  - New libwpg will make the SVG generator part of the public API
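
The idea of an abstract SAX-style handler plus a serializer behind it can be sketched as follows. This is not the real OdfDocumentHandler: `MiniDocumentHandler`, `AttributeList`, `escapeXml` and `StringSerializer` are hypothetical names for this example, and the sketch writes to a string rather than to a flat or zipped ODF file.

```cpp
#include <map>
#include <string>

// Hypothetical attribute container: attribute name -> value.
using AttributeList = std::map<std::string, std::string>;

// Hypothetical SAX-style interface: the generator pushes events into it
// and does not care where they end up.
class MiniDocumentHandler {
public:
    virtual ~MiniDocumentHandler() {}
    virtual void startElement(const std::string &name, const AttributeList &attrs) = 0;
    virtual void endElement(const std::string &name) = 0;
    virtual void characters(const std::string &text) = 0;
};

// Escapes the characters that may not appear verbatim in XML text
// or in double-quoted attribute values.
std::string escapeXml(const std::string &in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += c; break;
        }
    }
    return out;
}

// One possible consumer: serializes the event stream into an XML string.
class StringSerializer : public MiniDocumentHandler {
public:
    std::string xml;

    void startElement(const std::string &name, const AttributeList &attrs) override {
        xml += "<" + name;
        for (const auto &kv : attrs)
            xml += " " + kv.first + "=\"" + escapeXml(kv.second) + "\"";
        xml += ">";
    }
    void endElement(const std::string &name) override { xml += "</" + name + ">"; }
    void characters(const std::string &text) override { xml += escapeXml(text); }
};
```

Because the generator only sees the abstract handler, the same event stream could in principle feed a file writer, a zip packager or an in-memory consumer.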

Slide 14

Slide 14 text

Advantages
- Generating ODF is not trivial
  - Settings
  - Styles
  - Automatic styles
  - Content
- Provide a linear interface
  - Reuse instead of copying the existing ODF generators
- Developer does not waste time designing an interface
  - Speeds up development by focusing on the essentials

Slide 15

Slide 15 text

File-format documentation
- Almost none
- For libvisio
  - Marginally useful user and developer documentation on MSDN
  - Possibility to save using the VDX (XML) file-format
    - Basically an XML dump of the binary (the same concepts)
- For libcdr
  - Document explaining a bit of the CMX exchange format (similar concepts)
- Reverse engineering
  - Based on re-lab's work (oletoy)

Slide 16

Slide 16 text

Development method
- Focus on getting “some” result early
  - First embedded raster images
    - LibreOffice is able to render them without further processing
  - Next graphic primitives
    - “Everything is just a path”
- Develop tools along the implementation
  - Introspection tool improved constantly
    - Driven by the needs of the implementation
    - Reflecting growing understanding of the file-format
- Don't solve problems that don't exist
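
A small sketch of what “everything is just a path” can mean in practice: instead of giving each primitive its own output code, every shape is first reduced to a list of path commands. `PathSegment`, `Path`, `Point`, `rectangleToPath` and `polylineToPath` are hypothetical names for this example and are not taken from libwpg, libvisio or libcdr.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified path element: a command letter plus one point.
// 'M' = move to, 'L' = line to, 'Z' = close path (x and y are unused for 'Z').
struct PathSegment {
    char command;
    double x;
    double y;
};

using Path = std::vector<PathSegment>;

struct Point {
    double x;
    double y;
};

// A rectangle becomes a closed path through its four corners.
Path rectangleToPath(double x, double y, double width, double height) {
    Path path;
    path.push_back({'M', x, y});
    path.push_back({'L', x + width, y});
    path.push_back({'L', x + width, y + height});
    path.push_back({'L', x, y + height});
    path.push_back({'Z', 0.0, 0.0});
    return path;
}

// A polyline becomes an open path: move to the first vertex,
// then draw a line to each following vertex.
Path polylineToPath(const std::vector<Point> &vertices) {
    Path path;
    for (std::size_t i = 0; i < vertices.size(); ++i)
        path.push_back({i == 0 ? 'M' : 'L', vertices[i].x, vertices[i].y});
    return path;
}
```

With such a reduction, only one drawing routine has to work before the first shapes show up on screen; dedicated handling for individual primitives can come later if it turns out to be needed.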

Slide 17

Slide 17 text

The team
- Valentin Filippov
- Fridrich Štrba
- Eilidh McAdam

Slide 18

Slide 18 text

Development method II
- Design the software as you go
  - Some code is better than an abstract design
  - Possibility to find and fix real bugs
- Little communication overhead
  - Communication by code in git
  - Learning by making mistakes and fixing them
- Release soon, release often
  - A release every 2-3 weeks
  - Good to have intermediary targets

Slide 19

Slide 19 text

Extending the file-format coverage
- Incremental reverse-engineering
  - Nobody reinvents a wheel completely
  - Two subsequent versions of the same file-format will have some common DNA
- Try to parse a lower or higher version using the existing parser
  - Fix issues as they appear
  - Importance of a small number of reference documents covering many features
  - Importance of having a working parser for other versions
- Experience makes a difference
  - Different ways of encoding the same information
  - Different ways of structuring
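
One way the “common DNA” idea might look in code is a single parser whose low-level readers branch on the file version, so that supporting another version means adjusting a few readers rather than writing a new parser. The format below is purely hypothetical (coordinates stored as signed 16-bit values before version 6 and as signed 32-bit values afterwards); it is an assumption for illustration, not a description of any real file-format, and `readLE` and `readPoint` are invented names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reads a little-endian unsigned integer of `size` bytes (at most 4)
// starting at `offset`, and advances `offset` past it.
std::uint32_t readLE(const std::vector<std::uint8_t> &buf, std::size_t &offset, std::size_t size) {
    if (size > 4 || offset + size > buf.size())
        throw std::runtime_error("truncated or invalid record");
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < size; ++i)
        value |= static_cast<std::uint32_t>(buf[offset + i]) << (8 * i);
    offset += size;
    return value;
}

struct Point {
    std::int32_t x;
    std::int32_t y;
};

// Hypothetical rule: versions below 6 store 16-bit coordinates,
// later versions store 32-bit coordinates. Everything above this
// function can stay identical for both versions.
Point readPoint(const std::vector<std::uint8_t> &buf, std::size_t &offset, unsigned version) {
    Point p;
    if (version < 6) {
        // Go through the same-width unsigned type to recover the sign.
        p.x = static_cast<std::int16_t>(static_cast<std::uint16_t>(readLE(buf, offset, 2)));
        p.y = static_cast<std::int16_t>(static_cast<std::uint16_t>(readLE(buf, offset, 2)));
    } else {
        p.x = static_cast<std::int32_t>(readLE(buf, offset, 4));
        p.y = static_cast<std::int32_t>(readLE(buf, offset, 4));
    }
    return p;
}
```

Running the existing parser on a file of a neighbouring version and watching where values stop making sense points to the readers that need a new branch.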

Slide 20

Slide 20 text

Q&A and Stoning session
All text and image content in this document is licensed under the Creative Commons Attribution-Share Alike 3.0 License (unless otherwise specified). "LibreOffice" and "The Document Foundation" are registered trademarks. Their respective logos and icons are subject to international copyright laws. The use of these therefore is subject to the trademark policy.