• Good news: file filters are an area that can be unit-tested easily in most cases
  ‒ Compared to layout or UI
  ‒ So fixing a particular problem is not an endless task
• Bad news: in most cases a fix means modifying the source code
  ‒ In rare cases you can work around the problem by modifying the input file
the situation when your problem is caused by an import or export filter
• Good examples:
  ‒ The stroke weight of a line inside a group shape is too large when importing from DOCX
  ‒ This document is supposed to be a single page, not two
• Bad examples:
  ‒ The imported document causes a layout loop
  ‒ Writer doesn't support a particular feature that the given format supports
export
• “Open” on the UI: import into an empty document, then reset the undo stack
• “Copy and paste” on the UI: a partial export, followed by an import into an existing document
• “Save” on the UI: export to an already existing path
• This explains:
  ‒ Why a single-character modification totally rewrites the file
  ‒ Why it's not possible to extract the “conversion machine” from LibreOffice (but: we have a headless mode)
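Since the converter cannot be separated from the application, the usual route is to drive the whole of LibreOffice in headless mode. A minimal sketch, assuming a `soffice` binary is on the PATH; the helper names `build_convert_command` and `convert` are hypothetical and only wrap the command line:

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: Path, target_format: str, out_dir: Path,
                          soffice: str = "soffice") -> List[str]:
    """Build the headless conversion command line.

    Assumption: the LibreOffice launcher is called "soffice" and is on the PATH.
    """
    return [
        soffice,
        "--headless",                    # no UI: the filters run without a window
        "--convert-to", target_format,   # export filter is chosen by extension
        "--outdir", str(out_dir),
        str(source),                     # import filter is detected from the file
    ]


def convert(source: Path, target_format: str, out_dir: Path) -> None:
    """Run the conversion: an import followed by an export, as in the UI."""
    subprocess.run(build_convert_command(source, target_format, out_dir), check=True)
```

Calling `convert(Path("in.docx"), "odt", Path("out"))` would run a DOCX import followed by an ODT export, which is the same pair of steps as “Open” and “Save” on the UI.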
and export are supposed to be lossless
• ODF semantics are very close to the Writer document model:
  ‒ Example for paragraphs: UNO properties ↔ XML attributes
• Most of the implementation is a UNO filter
  ‒ Can serve as a good example for other filters
• Code under xmloff/ and sw/source/filter/xml/
• ODF validator:
  ‒ http://odf-validator2.rhcloud.com/odf-validator2/
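The paragraph example can be pictured as a two-way lookup table. This is a simplified sketch, not the xmloff implementation: the four property/attribute pairs are believed to match the UNO paragraph margin properties and the ODF `fo:margin-*` attributes, but the table, the function names and the fixed "cm" unit are assumptions made for illustration.

```python
from typing import Dict

# Assumed pairs: UNO paragraph property name -> ODF paragraph attribute name.
PARAGRAPH_PROPERTY_TO_ATTRIBUTE: Dict[str, str] = {
    "ParaLeftMargin": "fo:margin-left",
    "ParaRightMargin": "fo:margin-right",
    "ParaTopMargin": "fo:margin-top",
    "ParaBottomMargin": "fo:margin-bottom",
}
ATTRIBUTE_TO_PARAGRAPH_PROPERTY = {
    attr: prop for prop, attr in PARAGRAPH_PROPERTY_TO_ATTRIBUTE.items()
}


def export_paragraph_properties(props: Dict[str, int]) -> Dict[str, str]:
    """Map UNO values (1/100 mm) to XML attribute strings in cm."""
    return {
        PARAGRAPH_PROPERTY_TO_ATTRIBUTE[name]: "%.3fcm" % (value / 1000)
        for name, value in props.items()
    }


def import_paragraph_attributes(attrs: Dict[str, str]) -> Dict[str, int]:
    """Map XML attribute strings in cm back to UNO values (1/100 mm)."""
    result = {}
    for attr, text in attrs.items():
        if not text.endswith("cm"):
            raise ValueError("this sketch only handles cm: %r" % text)
        result[ATTRIBUTE_TO_PARAGRAPH_PROPERTY[attr]] = round(float(text[:-2]) * 1000)
    return result
```

Because each property has exactly one attribute and vice versa, exporting and re-importing returns the original values, which is what makes a lossless round trip plausible for this format.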
like HTML, but supports all word processing features (page size, columns, etc.)
  ‒ Can be hard to read
• Export
  ‒ New in LibreOffice 3.3
  ‒ Internal filter, core shared with DOC/DOCX
• Import
  ‒ New in LibreOffice 3.5
  ‒ UNO filter, domain mapper shared with DOCX
• Mostly my fault
  ‒ Not counting binfilter
• Both import and export are internal filters
• Specification is available as [MS-DOC]
• Tokenizer and domain mapper are not separated
• For tokenizer problems, mso-dumper can help:
  ‒ http://cgit.freedesktop.org/libreoffice/contrib/mso-dumper/
• Import and export code is somewhat shared
Example: left/right or start/end for paragraph margins
• Import is older
  ‒ Over-engineered in writerfilter/
  ‒ XSLT generates the tokenizer code, challenging to debug
  ‒ Inherited from OpenOffice.org, UNO-based
  ‒ Domain mapper shared with RTF
• Export is LibreOffice-only
  ‒ Internal
  ‒ Shared with DOC/RTF
can handle the file
  ‒ CVE tests
  ‒ If the filter provides the expected return value, we're good
• Internal tests
  ‒ Provide access to private Writer symbols
  ‒ Handy to test methods used by the UI
  ‒ In most cases not needed by filter tests
• Import
  ‒ Load the file, then assert on the document model accessed via UNO
• Export
  ‒ Import → export → import
  ‒ This way the same API can be used for tests, and export is tested as well
  ‒ Alternative: build the document from code, then somehow check the result (XPath for XML-based formats, but what about the rest?)
  ‒ Drawback of the round trip: the import has to be fine as well
  ‒ Not a bad thing anyway
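The round-trip idea above can be sketched with a toy format. Everything here is hypothetical (`ToyDocument`, `toy_import`, `toy_export` and the `<doc>` XML are stand-ins, not LibreOffice code); the point is only that one set of assertions serves both the import test and the export test, and that an XML-based export can additionally be checked with XPath.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyDocument:
    """Stand-in for the document model that a test would inspect."""
    paragraphs: List[str] = field(default_factory=list)


def toy_import(data: bytes) -> ToyDocument:
    """Toy import filter: XML bytes -> document model."""
    root = ET.fromstring(data)
    return ToyDocument([p.text or "" for p in root.findall("./body/p")])


def toy_export(doc: ToyDocument) -> bytes:
    """Toy export filter: document model -> XML bytes."""
    root = ET.Element("doc")
    body = ET.SubElement(root, "body")
    for text in doc.paragraphs:
        ET.SubElement(body, "p").text = text
    return ET.tostring(root)


SAMPLE = b"<doc><body><p>Hello</p><p>World</p></body></doc>"


def check_document(doc: ToyDocument) -> None:
    """Assertions shared by the import test and the export test."""
    assert len(doc.paragraphs) == 2
    assert doc.paragraphs[0] == "Hello"


def test_import() -> None:
    check_document(toy_import(SAMPLE))


def test_export_roundtrip() -> None:
    exported = toy_export(toy_import(SAMPLE))   # import -> export
    check_document(toy_import(exported))        # -> import, same assertions
    # Alternative for XML-based formats: query the exported bytes directly.
    assert ET.fromstring(exported).findtext("./body/p[2]") == "World"
```

If `toy_import` were broken, both tests would fail, which is the drawback noted above: the export test can only be trusted once the import is known to be fine.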
of the header in page style “Default”
  ‒ But: text in the header on page 3
• Sometimes handy, but be careful
  ‒ Writer layout is partly calculated in idle time, and tests won't wait for that
  ‒ It may be OK for the layout to differ
  ‒ E.g. because of missing fonts
content in this document, unless otherwise specified, is licensed under the Creative Commons Attribution-Share Alike 3.0 License. This does not include the LibreOffice name, logo, or icon. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States.