and libmspub
Standalone libraries
  Using the same interface
  Standalone framework to generate ODG (libodfgen)
  Internal class generating SVG for lazy hackers :)
  Also libwpd, libwps and libmwaw (for text documents)
Users outside LibreOffice
  Inkscape reuses libvisio and libcdr in 0.49
  Calligra reuses libvisio and (possibly) libcdr
  Scribus recently started integration with libmspub
  More users, more bug reports and (eventually) fixes
2011
Eilidh McAdam (currently Lanedo)
Started with Visio 2000 – Visio 2010 file-formats
  LibreOffice 3.5 release
Extended in 2012 to ALL Visio file-format versions that ever existed
  LibreOffice 4.0 release
  Visio 2013 (*.vsdx), Visio 1–5, Visio XML Drawings (*.vdx)
  Stencils/master shapes extraction
Used in trunk Inkscape (since December 2012)
2011
Released in LibreOffice 3.6
An interesting challenge after the success of libvisio
  Continuation of a fruitful collaboration
Support for ALL CorelDraw file-formats
  Starting from version 1 (code name Waldo)
  Ending with CorelDraw x6, released in March 2012
between reverse and straight engineers
  RCA for import issues
OleToy
  Support for many proprietary formats
  Knobs for quick navigation and information gathering
Colupatr
  Hexviewer with variable string length
  “Hints” for values, comments, format for storing findings
First embedded raster images
  LibreOffice is able to render them without further processing
Next graphic primitives
  “Everything is just a path”
Develop tools along the implementation
  Introspection tool improved constantly
  Driven by the need of the implementation
  Reflecting growing understanding of file-format
Don't solve problems that don't exist
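The “everything is just a path” idea can be illustrated with a minimal sketch: each primitive is reduced to the same small set of move/line/close commands, so a single output routine is enough. The function names and the tuple-based command encoding are illustrative assumptions, not the libraries' actual interface.

```python
def rectangle_to_path(x, y, width, height):
    """Describe an axis-aligned rectangle as a closed path of commands."""
    return [
        ("M", x, y),                    # move to the first corner
        ("L", x + width, y),
        ("L", x + width, y + height),
        ("L", x, y + height),
        ("Z",),                         # close the path
    ]


def polygon_to_path(points):
    """Describe a polygon given as (x, y) tuples as a closed path."""
    first, *rest = points
    return [("M", *first)] + [("L", *p) for p in rest] + [("Z",)]


def path_to_svg(path):
    """Serialize path commands into an SVG-style path data string."""
    return " ".join(" ".join(str(v) for v in command) for command in path)
```

Once every shape is a path, adding a new primitive only needs a new conversion function; the serializer stays untouched.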
Some code is better than abstract design
  Project financed by Google as part of Summer of Code
  Possibility to find and fix real bugs
Little communication overhead
  Communication by code
  Learning by making mistakes and fixing them
Release soon, release often
  A release every 2-3 weeks
  Good to have intermediary targets
scratch
It is useful to know the release dates of different versions when doing reverse-engineering
  Two subsequent versions of the same file-format will have many things in common
Design parser to be able to parse lower and higher versions
  Open-ended version conditions
  Guard assumptions by exceptions and be verbose in debug mode
Try to parse lower or higher version using the existing parser
  Fix issues as they appear
Importance of a small number of reference documents covering many features
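A minimal sketch of open-ended version conditions and guarded assumptions might look as follows. The byte widths (2 bytes below version 6, 4 bytes from version 6 on) and all names are illustrative assumptions, not the actual CDR layout or the libcdr code.

```python
import logging

log = logging.getLogger("parser_sketch")


class UnexpectedFormat(Exception):
    """Raised when the input violates an assumption made by the parser."""


def coordinate_size(version):
    """Return the assumed byte width of a coordinate for a format version.

    The condition is open-ended (``< 6``) rather than ``version in (4, 5)``,
    so versions never seen before fall into the nearest known branch.
    """
    return 2 if version < 6 else 4


def read_coordinate(data, offset, version):
    """Read one signed little-endian coordinate, failing loudly on short input."""
    size = coordinate_size(version)
    end = offset + size
    if end > len(data):
        # Guard the assumption instead of silently reading garbage.
        raise UnexpectedFormat(
            f"coordinate at offset {offset} needs {size} bytes, "
            f"only {len(data) - offset} left (version {version})"
        )
    value = int.from_bytes(data[offset:end], "little", signed=True)
    log.debug("version %d: coordinate %d at offset %d", version, value, offset)
    return value
```

With conditions written this way, feeding the parser a lower or higher version either works or fails with a precise message, which is what makes the "fix issues as they appear" loop practical.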
for versions 7 to x3
Extending the coverage upwards
  x4 and x5
    Support for RIFF documents inside structured ZIP storage
    RIFF stream is just stored (not compressed), so it can be parsed without any zip implementation
  x6
    More complicated structure inside the ZIP storage
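Why a stored stream needs no zip implementation can be shown with a short sketch: a ZIP local file header has a fixed 30-byte layout, and an entry with compression method 0 follows it byte for byte. The helper name `stored_entries` is hypothetical, and the sketch assumes the sizes are present in the local headers (no data descriptors); it is not how libcdr does it.

```python
import struct

# Fixed part of a ZIP local file header: signature, version, flags, method,
# time, date, CRC-32, compressed size, uncompressed size, name and extra lengths.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def stored_entries(blob):
    """Yield (name, data) for every entry stored without compression."""
    pos = 0
    while blob[pos:pos + 4] == b"PK\x03\x04":
        (_sig, _version, flags, method, _time, _date, _crc,
         compressed_size, _size, name_len, extra_len) = LOCAL_HEADER.unpack_from(blob, pos)
        if flags & 0x08:
            # Sizes would live in a trailing data descriptor; not handled here.
            raise ValueError("entry uses a data descriptor")
        name_start = pos + LOCAL_HEADER.size
        data_start = name_start + name_len + extra_len
        name = blob[name_start:name_start + name_len].decode("cp437")
        if method == 0:  # 0 means "stored": the payload is the raw stream
            yield name, blob[data_start:data_start + compressed_size]
        pos = data_start + compressed_size
```

Compressed entries are skipped using their recorded size, and the loop stops at the first signature that is not a local file header (normally the central directory).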
downwards
Version 6 (first 32-bit version)
  Only some RIFF names different
Versions 4 and 5 (16-bit versions)
  Different way to express coordinates
Version 3
  First RIFF based CDR file-format but we did not know it by then
  Fill and outline information embedded inside the shape
  Shape transform does not accumulate group transforms
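For readers unfamiliar with RIFF, the generic container layout can be sketched as below: each chunk is a four-character code, a little-endian 32-bit size, and a payload padded to an even length, while `RIFF` and `LIST` chunks carry a list type followed by nested chunks. The walker is a generic illustration with a hypothetical function name; it does not know any CDR-specific chunk names.

```python
import struct


def riff_chunks(data, offset=0, end=None, depth=0):
    """Yield (depth, fourcc, list_type, payload) for each chunk in data[offset:end]."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        fourcc = data[offset:offset + 4]
        (size,) = struct.unpack_from("<I", data, offset + 4)
        body = offset + 8
        if body + size > end:
            raise ValueError(f"chunk {fourcc!r} at {offset} overruns its parent")
        if fourcc in (b"RIFF", b"LIST"):
            # Container chunk: a 4-byte list type, then nested chunks.
            yield depth, fourcc, data[body:body + 4], b""
            yield from riff_chunks(data, body + 4, body + size, depth + 1)
        else:
            yield depth, fourcc, None, data[body:body + size]
        offset = body + size + (size & 1)  # payloads are padded to an even length
```

Because the walker only relies on the container rules, differing chunk names between versions affect the code that interprets the payloads, not the traversal itself.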
downwards (continued)
Versions 2 and 1
  Not RIFF based at all
  Version 2 more structured
  With some exception handling both can be parsed alike
A header with pointers to different sequences of chunks
  Implementation of linked list (“type 1”) and shape information (“type 2”)
  Embedded raster (“type 3” and “6”), group transforms (“type 7”), arrow information (“type 8”)
  Some of the binary content similar to version 3
assumptions are challenged with every new document that “does not convert correctly”
  Even a complex feature document is easily beaten by real life documents.
Heise example
  Not a very good review of the CDR import filter in c't
  “Ein neuer Import-Filter in Draw öffnet jetzt auch CorelDraw-Dateien, was uns im Test allerdings nur mit sehr einfachen Zeichnungen fehlerfrei gelang. In dieser Form ist er schlicht unbrauchbar.”
  “A new import filter in Draw now also opens CorelDraw files, which in our test worked without errors only for very simple drawings. In this form it is simply unusable.”
6 and 11
  Difference in some offsets and in text encoding
Common structure
  A trailer pointing to “streams”
  Some “streams” consist of a hierarchical sequence of “chunks”
  Shapes and text content in “chunks”
Bug driven rewrite
  A document (most likely generated by SDK)
  Completely challenged our assumptions and led to a more generalized parser
Preview
  We wanted to support it before the official release
  XML-based (ooxml-ish) file-format (*.vsdx)
Another rewrite of the parsers
  Need to separate more clearly the parsing and information processing
  Side-effect: support of Visio XML Drawing (*.vdx)
Versions 1 to 5
  Some “chunks” of type list different
  An override for readers of some chunks
  “streams” format very similar
  Little abstractions and generalizations needed
Improved understanding of the file-format
  Cleaner and simpler parser
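The separation of parsing from information processing can be sketched as two parsers that only emit events to a shared collector. Everything here is hypothetical: the `Collector` interface, the tiny binary layout and the XML element names are invented for illustration and are not the libvisio classes or the Visio formats.

```python
import struct
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class Collector(ABC):
    """Receives format-independent events; knows nothing about file layouts."""

    @abstractmethod
    def start_shape(self, shape_id): ...

    @abstractmethod
    def line_to(self, x, y): ...

    @abstractmethod
    def end_shape(self): ...


class RecordingCollector(Collector):
    """Toy collector that just records the events it receives."""

    def __init__(self):
        self.events = []

    def start_shape(self, shape_id):
        self.events.append(("start", shape_id))

    def line_to(self, x, y):
        self.events.append(("line_to", x, y))

    def end_shape(self):
        self.events.append(("end",))


def parse_binary(data, collector):
    """Invented binary layout: uint32 id, uint32 point count, float64 pairs."""
    shape_id, count = struct.unpack_from("<II", data, 0)
    collector.start_shape(shape_id)
    for i in range(count):
        x, y = struct.unpack_from("<dd", data, 8 + 16 * i)
        collector.line_to(x, y)
    collector.end_shape()


def parse_xml(text, collector):
    """Invented XML layout: <shape id="7"><pt x="0" y="0"/>...</shape>."""
    root = ET.fromstring(text)
    collector.start_shape(int(root.get("id")))
    for pt in root.findall("pt"):
        collector.line_to(float(pt.get("x")), float(pt.get("y")))
    collector.end_shape()
```

With this split, supporting an additional serialization means writing one more parser function; the collector that builds the drawing is reused unchanged.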
possibility for a student to work with outstanding mentors
  Valentin Filippov
  Yours truly
(Altsys, Aldus, Macromedia & Adobe) FreeHand
  File-format partially reverse-engineered
  The broad outlines of the structure
  Ripe to be a successful project
A talented student can make a difference in LibreOffice
will reward you
  You will be the hero of the people who can now read their documents...
  … and they will get on your nerves listing features that are not converted.
Users outside LibreOffice
  Inkscape reuses libvisio and libcdr in 0.49
  Calligra reuses libvisio and (possibly) libcdr since 2.5
  Scribus will reuse libmspub (a Scribus developer contributes patches to libmspub)
licensed under the Creative Commons Attribution-Share Alike 3.0 License (unless otherwise specified). "LibreOffice" and "The Document Foundation" are registered trademarks. Their respective logos and icons are subject to international copyright laws. The use of these therefore is subject to the trademark policy.
QA and Stoning session