
Give me my drawing back!

Dragging your proprietary files into the free-software world

Fridrich Strba

May 11, 2013

Transcript

  1. Give me my drawing back!

    Dragging your proprietary files into the free-software world
    Fridrich Štrba, LibreOffice Developer
  2. LibreOffice's contribution to the wider FOSS ecosystem

    libwpg, libvisio, libcdr and libmspub
      Standalone libraries
      Using the same interface
      Standalone framework to generate ODG (libodfgen)
      Internal class generating SVG for lazy hackers :)
    Also libwpd, libwps and libmwaw (for text documents)
    Users outside LibreOffice
      Inkscape reuses libvisio and libcdr in 0.49
      Calligra reuses libvisio and (possibly) libcdr
      Scribus recently started integration with libmspub
    More users, more bug reports and (eventually) fixes
  3. Achievements

  4. Visio import filter - libvisio

    Google Summer of Code 2011
      Eilidh McAdam (currently Lanedo)
      Started with Visio 2000 – Visio 2010 file-formats
      LibreOffice 3.5 release
    Extended in 2012 to ALL Visio file-format versions that ever existed
      LibreOffice 4.0 release
      Visio 2013 (*.vsdx), Visio 1–5, Visio XML Drawings (*.vdx)
      Stencils/master shapes extraction
    Used in trunk Inkscape (since December 2012)
  5. The team: Valentin Filippov, Fridrich Štrba, Eilidh McAdam

  6. CorelDraw import filter - libcdr

    Work started in late 2011
    Released in LibreOffice 3.6
    An interesting challenge after the success of libvisio
    Continuation of a fruitful collaboration
    Support for ALL CorelDraw file-formats
      Starting from version 1 (code Waldo)
      Ending with CorelDraw x6, released in March 2012
  7. Microsoft Publisher import filter - libmspub

    Google Summer of Code 2012
      Brennan T. Vincent
    Flagship feature of LibreOffice 4.0
    Version support
      MS Publisher 97
      MS Publisher 98/2000
      MS Publisher 2002-2013
  8. Tools - OleToy and colupatr

    Developed for reverse-engineering
      Collaboration between reverse and straight engineers
      Root-cause analysis (RCA) for import issues
    OleToy
      Support for many proprietary formats
      Knobs for quick navigation and information gathering
    Colupatr
      Hex viewer with variable string length
      “Hints” for values, comments, format for storing findings
  9. Open-source collaboration (1)

    Focus on getting “some” result early
      First, embedded raster images
        LibreOffice is able to render them without further processing
      Next, graphic primitives
        “Everything is just a path”
    Develop tools along with the implementation
      Introspection tool improved constantly
      Driven by the needs of the implementation
      Reflecting growing understanding of the file-format
    Don't solve problems that don't exist
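
The “everything is just a path” idea can be sketched as follows: a rectangle is rewritten as a closed path, so a consumer only has to know how to draw paths. This is a minimal illustration, not code from any of the libraries; `PathElement` and `rectangleToPath` are hypothetical names, and the single-letter actions merely borrow the SVG path convention.

```cpp
#include <cassert>
#include <vector>

// Hypothetical path element: 'M' = move to, 'L' = line to, 'Z' = close path.
struct PathElement
{
    char action;
    double x;
    double y;
};

// Express an axis-aligned rectangle as a closed path of five elements.
std::vector<PathElement> rectangleToPath(double x, double y, double width, double height)
{
    // Guard the assumption that the caller passes a non-negative size.
    assert(width >= 0.0 && height >= 0.0);
    return {
        {'M', x, y},
        {'L', x + width, y},
        {'L', x + width, y + height},
        {'L', x, y + height},
        {'Z', 0.0, 0.0} // coordinates are unused for the closing element
    };
}
```

Ellipses, polygons and polylines could be reduced to paths in the same way, which keeps the first working prototype small.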
  10. Graphic Document Representation

    namespace libwpg
    {
    class WPGPaintInterface
    {
    public:
        virtual ~WPGPaintInterface() {}
        virtual void startGraphics(const ::WPXPropertyList &propList) = 0;
        virtual void endGraphics() = 0;
        virtual void setStyle(const ::WPXPropertyList &propList, const ::WPXPropertyListVector &gradient) = 0;
        virtual void startLayer(const ::WPXPropertyList &propList) = 0;
        virtual void endLayer() = 0;
        virtual void startEmbeddedGraphics(const ::WPXPropertyList &propList) = 0;
        virtual void endEmbeddedGraphics() = 0;
        virtual void drawRectangle(const ::WPXPropertyList &propList) = 0;
        virtual void drawEllipse(const ::WPXPropertyList &propList) = 0;
        virtual void drawPolygon(const ::WPXPropertyListVector &vertices) = 0;
        virtual void drawPolyline(const ::WPXPropertyListVector &vertices) = 0;
        virtual void drawPath(const ::WPXPropertyListVector &path) = 0;
        virtual void drawGraphicObject(const ::WPXPropertyList &propList, const ::WPXBinaryData &binaryData) = 0;
        virtual void startTextObject(const ::WPXPropertyList &propList, const ::WPXPropertyListVector &path) = 0;
        virtual void endTextObject() = 0;
        virtual void startTextLine(const ::WPXPropertyList &propList) = 0;
        virtual void endTextLine() = 0;
        virtual void startTextSpan(const ::WPXPropertyList &propList) = 0;
        virtual void endTextSpan() = 0;
        virtual void insertText(const ::WPXString &str) = 0;
    };
    } // namespace libwpg
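
To show how a consumer plugs into a callback interface of this kind, here is a small sketch of an SVG-producing implementation. To stay self-contained it does not use the real `WPGPaintInterface` or `WPXPropertyList`; `MiniPaintInterface` and `MiniSvgGenerator` are hypothetical, heavily reduced stand-ins that pass plain numbers instead of property lists.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Simplified stand-in for a paint interface; NOT the real libwpg API.
class MiniPaintInterface
{
public:
    virtual ~MiniPaintInterface() {}
    virtual void startGraphics(double width, double height) = 0;
    virtual void endGraphics() = 0;
    virtual void drawRectangle(double x, double y, double width, double height) = 0;
    virtual void drawEllipse(double cx, double cy, double rx, double ry) = 0;
};

// Hypothetical consumer that turns the callbacks into SVG markup.
class MiniSvgGenerator : public MiniPaintInterface
{
public:
    MiniSvgGenerator() : m_started(false) {}

    void startGraphics(double width, double height) override
    {
        m_started = true;
        m_out << "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"" << width
              << "\" height=\"" << height << "\">";
    }
    void endGraphics() override
    {
        assert(m_started && "endGraphics() called before startGraphics()");
        m_out << "</svg>";
    }
    void drawRectangle(double x, double y, double width, double height) override
    {
        m_out << "<rect x=\"" << x << "\" y=\"" << y << "\" width=\"" << width
              << "\" height=\"" << height << "\"/>";
    }
    void drawEllipse(double cx, double cy, double rx, double ry) override
    {
        m_out << "<ellipse cx=\"" << cx << "\" cy=\"" << cy << "\" rx=\"" << rx
              << "\" ry=\"" << ry << "\"/>";
    }

    // The SVG text collected so far.
    std::string str() const { return m_out.str(); }

private:
    bool m_started;
    std::ostringstream m_out;
};
```

A parser only ever talks to the abstract interface, so the same parser could drive an SVG writer, an ODG generator, or an application's own document model.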
  11. Open-source collaboration (2)

    Design the software as you go
      Some code is better than abstract design
    Project financed by Google as part of Summer of Code
      Possibility to find and fix real bugs
    Little communication overhead
      Communication by code
      Learning by making mistakes and fixing them
    Release soon, release often
      A release every 2-3 weeks
      Good to have intermediary targets
  12. Interesting elements: Incremental reverse-engineering

  13. Progressive development of file-formats

    Nobody reinvents the wheel from scratch
      It is useful to know the release dates of different versions when doing reverse-engineering
      Two subsequent versions of the same file-format will have many things in common
    Design the parser to be able to parse lower and higher versions
      Open-ended version conditions
      Guard assumptions with exceptions and be verbose in debug mode
    Try to parse a lower or higher version using the existing parser
      Fix issues as they appear
    Importance of a small number of reference documents covering many features
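
One possible reading of the version-condition and exception advice on this slide, as a sketch: version checks are written as open ranges (`version >= 6`) rather than as lists of known versions, and every read is guarded so that a wrong assumption surfaces as an exception. The record layout is invented purely for illustration and does not describe any real CDR or Visio structure; `ParseError`, `readLE` and `readCoordinate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <vector>

struct ParseError : std::runtime_error
{
    explicit ParseError(const char *msg) : std::runtime_error(msg) {}
};

// Read a little-endian unsigned integer of 1 to 4 bytes.
// Throws instead of reading past the end of the buffer.
std::uint32_t readLE(const std::vector<std::uint8_t> &buf, std::size_t &pos, unsigned bytes)
{
    assert(bytes >= 1 && bytes <= 4);
    if (pos > buf.size() || buf.size() - pos < bytes)
        throw ParseError("unexpected end of data");
    std::uint32_t value = 0;
    for (unsigned i = 0; i < bytes; ++i)
        value |= static_cast<std::uint32_t>(buf[pos++]) << (8 * i);
    return value;
}

// Invented layout: versions below 6 store a 16-bit value,
// version 6 and anything newer store a 32-bit value.
std::uint32_t readCoordinate(const std::vector<std::uint8_t> &buf, std::size_t &pos, unsigned version)
{
    // Open-ended condition: an unknown higher version is attempted with the
    // newest known layout instead of being rejected outright.
    const unsigned width = (version >= 6) ? 4 : 2;
#ifdef DEBUG
    std::cerr << "readCoordinate: version " << version << ", width " << width
              << ", offset " << pos << std::endl;
#endif
    return readLE(buf, pos, width);
}
```

Feeding a document of an unsupported version to such a parser either works or fails loudly at a specific offset, which points at the next thing to investigate.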
  14. Extending the CorelDraw version coverage (1)

    Starting point
      Support for versions 7 to x3
    Extending the coverage upwards
      x4 and x5
        Support for RIFF documents inside structured ZIP storage
        The RIFF stream is just stored, so it is possible to parse it without any zip implementation
      x6
        More complicated structure inside the ZIP storage
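
Because a stored (uncompressed) ZIP entry sits in the archive byte for byte, it can be located by reading the ZIP local file header directly. The sketch below illustrates that idea and is not taken from libcdr; `findStoredEntry` is a hypothetical helper. It assumes the sizes in the local header are filled in, which is not the case for archives written with a trailing data descriptor (general-purpose flag bit 3), and a more robust reader would consult the central directory instead of scanning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace
{
// Little-endian read of 2 or 4 bytes; the caller guarantees the bytes exist.
std::uint32_t le(const std::vector<std::uint8_t> &b, std::size_t pos, unsigned bytes)
{
    std::uint32_t v = 0;
    for (unsigned i = 0; i < bytes; ++i)
        v |= static_cast<std::uint32_t>(b[pos + i]) << (8 * i);
    return v;
}
}

// Return the payload of the first *stored* entry called `name`,
// or an empty vector if no such entry is found.
std::vector<std::uint8_t> findStoredEntry(const std::vector<std::uint8_t> &zip, const std::string &name)
{
    assert(!name.empty());
    const std::size_t headerSize = 30; // fixed part of a ZIP local file header
    for (std::size_t pos = 0; pos + headerSize <= zip.size(); ++pos)
    {
        if (le(zip, pos, 4) != 0x04034b50) // local file header signature "PK\3\4"
            continue;
        const std::uint32_t method = le(zip, pos + 8, 2);  // 0 = stored
        const std::uint32_t size = le(zip, pos + 18, 4);   // compressed size
        const std::size_t nameLen = le(zip, pos + 26, 2);
        const std::size_t extraLen = le(zip, pos + 28, 2);
        const std::size_t nameStart = pos + headerSize;
        const std::size_t dataStart = nameStart + nameLen + extraLen;
        if (dataStart > zip.size() || size > zip.size() - dataStart)
            continue; // not a plausible header, keep scanning
        const std::string entryName(zip.begin() + nameStart, zip.begin() + nameStart + nameLen);
        if (method == 0 && entryName == name)
            return std::vector<std::uint8_t>(zip.begin() + dataStart, zip.begin() + dataStart + size);
    }
    return {};
}
```

Called with the name `content/riffData.cdr`, the returned bytes would be the embedded RIFF stream, ready to be handed to an existing RIFF parser.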
  15. Evolution of ZIP-based CDR file-formats (1)

    Version x4
    -rw----  4.5 fat   2610 bx stor 12-Mar-05 05:19 content/riffData.cdr
    -rw----  4.5 fat 196662 bx defN 12-Mar-05 05:19 metadata/thumbnails/thumbnail.bmp
    -rw----  4.5 fat 184374 bx defN 12-Mar-05 05:19 metadata/thumbnails/page1.bmp
    -rw----  4.5 fat   5860 bx defN 12-Mar-05 05:19 metadata/metadata.xml
    -rw----  4.5 fat    667 bx defN 12-Mar-05 05:19 metadata/textinfo.xml
    -rw----  4.5 fat     53 bx defN 12-Mar-05 05:19 links.xml

    Version x5
    -rw----  4.5 fat   2608 bx stor 12-Mar-05 05:19 content/riffData.cdr
    -rw----  4.5 fat 196662 bx defN 12-Mar-05 05:19 metadata/thumbnails/thumbnail.bmp
    -rw----  4.5 fat 184374 bx defN 12-Mar-05 05:19 metadata/thumbnails/page1.bmp
    -rw----  4.5 fat    252 bx defN 12-Mar-05 05:19 color/color.xml
    -rw----  4.5 fat   5860 bx defN 12-Mar-05 05:19 metadata/metadata.xml
    -rw----  4.5 fat    667 bx defN 12-Mar-05 05:19 metadata/textinfo.xml
    -rw----  4.5 fat    103 bx defN 12-Mar-05 05:19 color/docPalette.xml
    -rw----  4.5 fat     53 bx defN 12-Mar-05 05:19 links.xml
  16. Evolution of ZIP-based CDR file-formats (2)

    Version x6
    -rw----  4.5 fat  14315 bx defN 12-Mar-05 05:18 content/data/data1.dat
    -rw----  4.5 fat    445 bx defN 12-Mar-05 05:18 content/data/masterPage.dat
    -rw----  4.5 fat   3355 bx defN 12-Mar-05 05:18 content/data/page1.dat
    -rw----  4.5 fat     34 bx defN 12-Mar-05 05:18 content/dataFileList.dat
    -rw----  4.5 fat   1332 bx defN 12-Mar-05 05:18 content/root.dat
    -rw----  4.5 fat 196662 bx defN 12-Mar-05 05:18 metadata/thumbnails/thumbnail.bmp
    -rw----  4.5 fat 184374 bx defN 12-Mar-05 05:18 metadata/thumbnails/page1.bmp
    -rw----  4.5 fat    252 bx defN 12-Mar-05 05:18 color/color.xml
    -rw----  4.5 fat   5860 bx defN 12-Mar-05 05:18 metadata/metadata.xml
    -rw----  4.5 fat    667 bx defN 12-Mar-05 05:18 metadata/textinfo.xml
    -rw----  4.5 fat    103 bx defN 12-Mar-05 05:18 color/docPalette.xml
    -rw----  4.5 fat  15377 bx defN 12-Mar-05 05:18 styles/document.cdss
    -rw----  4.5 fat     53 bx defN 12-Mar-05 05:18 links.xml
  17. Extending the CorelDraw version coverage (2)

    Extending the coverage downwards
      Version 6 (first 32-bit version)
        Only some RIFF names different
      Versions 4 and 5 (16-bit versions)
        Different way to express coordinates
      Version 3
        First RIFF-based CDR file-format, but we did not know it by then
        Fill and outline information embedded inside the shape
        Shape transform does not accumulate group transforms
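
Since several of these versions share the RIFF container, a generic chunk walker is a natural starting point. The sketch below follows the general RIFF layout (a four-character code, a 32-bit little-endian size, then the data padded to an even length); `RiffChunk` and `listChunks` are hypothetical names, and nothing here encodes CDR-specific chunk names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct RiffChunk
{
    std::string fourCC;      // four-character chunk identifier
    std::size_t dataOffset;  // offset of the chunk data within the buffer
    std::uint32_t dataSize;  // size of the chunk data, excluding padding
};

// List the chunks found in buf[begin, end) without descending into them.
std::vector<RiffChunk> listChunks(const std::vector<std::uint8_t> &buf, std::size_t begin, std::size_t end)
{
    assert(begin <= end && end <= buf.size());
    std::vector<RiffChunk> chunks;
    std::size_t pos = begin;
    while (pos <= end && end - pos >= 8)
    {
        RiffChunk c;
        c.fourCC.assign(buf.begin() + pos, buf.begin() + pos + 4);
        c.dataSize = static_cast<std::uint32_t>(buf[pos + 4])
                   | static_cast<std::uint32_t>(buf[pos + 5]) << 8
                   | static_cast<std::uint32_t>(buf[pos + 6]) << 16
                   | static_cast<std::uint32_t>(buf[pos + 7]) << 24;
        c.dataOffset = pos + 8;
        if (c.dataSize > end - c.dataOffset)
            break; // truncated or misread chunk: stop rather than read past the end
        chunks.push_back(c);
        // Chunk data is padded to an even number of bytes.
        pos = c.dataOffset + c.dataSize + (c.dataSize & 1);
    }
    return chunks;
}
```

For a whole file, the outer `RIFF` chunk's data begins with a four-byte form type, so its sub-chunks would be listed starting at offset 12. If only the chunk names differ between two versions, a walker like this can stay unchanged while a per-version name table does the mapping.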
  18. Extending the CorelDraw version coverage (3)

    Extending the coverage downwards (continued)
      Versions 2 and 1
        Not RIFF-based at all
        Version 2 more structured
        With some exception handling both can be parsed alike
        A header with pointers to different sequences of chunks
        Implementation of linked list (“type 1”) and shape information (“type 2”)
        Embedded raster (“type 3” and “6”), group transforms (“type 7”), arrow information (“type 8”)
        Some of the binary content similar to version 3
  19. Reverse-engineering – the never-ending story

    The hypotheses and assumptions are challenged with every new document that “does not convert correctly”
    Even a complex feature document is easily beaten by real-life documents
    Heise example
      Not a very good review of the CDR import filter in c't
      “Ein neuer Import-Filter in Draw öffnet jetzt auch CorelDraw-Dateien, was uns im Test allerdings nur mit sehr einfachen Zeichnungen fehlerfrei gelang. In dieser Form ist er schlicht unbrauchbar.”
      “A new import filter in Draw now also opens CorelDraw files, which in our test worked without errors only for very simple drawings. In this form, it is simply unusable.”
  20. We hear our users even when they cry in the wilderness

    Before: LibreOffice 3.6.2.2
    After: LibreOffice 3.6.3.1
  21. Extending the Visio version coverage (1)

    Starting point
      Versions 6 and 11
        Difference in some offsets and in text encoding
      Common structure
        A trailer pointing to “streams”
        Some “streams” consist of a hierarchical sequence of “chunks”
        Shapes and text content in “chunks”
    Bug-driven rewrite
      A document (most likely generated by SDK)
      Completely challenged our assumptions and led to a more generalized parser
  22. Extending the Visio version coverage (2)

    Microsoft Visio 2013 Preview
      We wanted to support it before the official release
      XML-based (OOXML-ish) file-format (*.vsdx)
      Another rewrite of the parsers
        Need to separate the parsing and the information processing more clearly
        Side effect: support of Visio XML Drawing (*.vdx)
    Versions 1 to 5
      Some “chunks” of type list different
        An override for readers of some chunks
      “streams” format very similar
        Little abstractions and generalizations needed
    Improved understanding of the file-format
      Cleaner and simpler parser
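
The separation of parsing from information processing mentioned on this slide can be sketched as a collector interface that every parser front end reports to. This is an illustration under assumed names, not libvisio's actual class design: `ShapeCollector`, `parseToyFormat` and `CountingCollector` are hypothetical, and the whitespace-separated toy format stands in for a real binary or XML parser.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical collector interface: format-agnostic information processing.
class ShapeCollector
{
public:
    virtual ~ShapeCollector() {}
    virtual void collectShape(unsigned id, double x, double y) = 0;
    virtual void collectText(unsigned shapeId, const std::string &text) = 0;
};

// One of possibly several front ends. This toy parser reads records of the form
// "shape <id> <x> <y>" or "text <id> <word>" and only forwards what it finds.
void parseToyFormat(const std::string &input, ShapeCollector &collector)
{
    std::istringstream in(input);
    std::string kind;
    while (in >> kind)
    {
        unsigned id = 0;
        if (kind == "shape")
        {
            double x = 0.0, y = 0.0;
            if (in >> id >> x >> y)
                collector.collectShape(id, x, y);
        }
        else if (kind == "text")
        {
            std::string word;
            if (in >> id >> word)
                collector.collectText(id, word);
        }
    }
}

// A trivial collector that only counts what the parser reports.
class CountingCollector : public ShapeCollector
{
public:
    CountingCollector() : shapes(0), texts(0) {}
    void collectShape(unsigned, double, double) override { ++shapes; }
    void collectText(unsigned, const std::string &text) override
    {
        assert(!text.empty()); // the toy parser never forwards empty text
        ++texts;
    }
    unsigned shapes;
    unsigned texts;
};
```

With this split, a second front end for another flavour of the format only has to emit the same collector calls; everything downstream of the collector is shared.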
  23. Getting involved ... how you can make a difference

  24. Future file-formats to import?

    Google Summer of Code
      The possibility for a student to work with outstanding mentors
        Valentin Filippov
        Yours truly
    (Altsys, Aldus, Macromedia & Adobe) FreeHand
      File-format partially reverse-engineered
      The broad outline of the structure
      Ripe to be a successful project
    A talented student can make a difference in LibreOffice
  25. Impact within LibreOffice and the known universe

    Happy users will reward you
      You will be the hero of the people who can now read their documents...
      … and they will get on your nerves listing features that are not converted.
    Users outside LibreOffice
      Inkscape reuses libvisio and libcdr in 0.49
      Calligra reuses libvisio and (possibly) libcdr since 2.5
      Scribus will reuse libmspub (a Scribus developer contributes patches to libmspub)
  26. All text and image content in this document is licensed under the Creative Commons Attribution-Share Alike 3.0 License (unless otherwise specified). "LibreOffice" and "The Document Foundation" are registered trademarks. Their respective logos and icons are subject to international copyright laws. Their use is therefore subject to the trademark policy.

    QA and Stoning session