Slide 1

Status of the Writer import/export filters Miklós Vajna 2013-09-25

Slide 2

Introduction

Slide 3

3 / 25 LibreOffice Conference 2013 | Miklós Vajna
How to measure an import filter? ● Target is to handle every feature that the given format can describe ● We want numbers ● Number of … handled: ● XML tags ● Single PRoperty Modifiers (SPRMs) ● Control words ● Shape properties
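The "number of … handled" metric boils down to set arithmetic over the format's vocabulary. A minimal Python sketch, assuming the spec items and the filter's handled items have already been extracted as sets of names; the `measure` helper and the sample data are hypothetical, not the scripts used for this talk:

```python
from dataclasses import dataclass


@dataclass
class Coverage:
    """How much of a format's vocabulary (XML tags, SPRMs, ...) is handled."""

    total: int
    handled: int
    unhandled: list[str]

    @property
    def ratio(self) -> float:
        # An empty vocabulary counts as fully covered.
        return self.handled / self.total if self.total else 1.0


def measure(spec_items: set[str], handled_items: set[str]) -> Coverage:
    """Compare the items a specification defines with those a filter handles."""
    # Ignore handled names the spec does not define (e.g. vendor extensions),
    # so they cannot inflate the score.
    handled = spec_items & handled_items
    return Coverage(
        total=len(spec_items),
        handled=len(handled),
        unhandled=sorted(spec_items - handled_items),
    )


if __name__ == "__main__":
    # Made-up sample data, not real filter output.
    spec = {"p", "r", "t", "tbl", "drawing"}
    handled = {"p", "r", "t", "tbl"}
    cov = measure(spec, handled)
    print(f"{cov.handled}/{cov.total} handled ({cov.ratio:.0%}), missing: {cov.unhandled}")
    # prints: 4/5 handled (80%), missing: ['drawing']
```

The unhandled list matters as much as the ratio: it is the to-do list for the filter.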

Slide 4

How to measure an export filter? ● Target is to handle every Writer feature ● Checklist, e.g. vmiklos01_rtf_status.ods ● UNO filter → unhandled properties ● Internal filter → unhandled SfxPoolItems ● Word export → unimplemented methods
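Counting unimplemented methods of a shared exporter interface can be sketched in Python by checking which base-class methods a subclass still inherits unchanged. The class and method names below are hypothetical stand-ins for Writer's C++ export classes:

```python
import inspect


class SharedOutput:
    """Hypothetical shared exporter interface (the real one is C++)."""

    def char_bold(self, on: bool) -> str:
        raise NotImplementedError

    def par_indent(self, twips: int) -> str:
        raise NotImplementedError

    def table_cell_margins(self, left: int, right: int) -> str:
        raise NotImplementedError


class RtfOutput(SharedOutput):
    """Hypothetical RTF backend implementing only part of the interface."""

    def char_bold(self, on: bool) -> str:
        return r"\b" if on else r"\b0"

    def par_indent(self, twips: int) -> str:
        return rf"\li{twips}"


def unimplemented_methods(base: type, derived: type) -> list[str]:
    """List public methods of `base` that `derived` does not override."""
    missing = []
    for name, func in inspect.getmembers(base, inspect.isfunction):
        if name.startswith("_"):
            continue
        # If lookup on the subclass still finds the base class function
        # object, the subclass never overrode it.
        if getattr(derived, name) is func:
            missing.append(name)
    return missing


print(unimplemented_methods(SharedOutput, RtfOutput))
# prints: ['table_cell_margins']
```

Comparing function objects only catches methods that were never overridden; an override that is an empty stub still counts as implemented.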

Slide 5

Is this optimal? ● No, because: ● Even if pictures and tables are handled, pictures inside tables may still need explicit support ● Not every item (XML tag, SPRM, etc.) is equally important ● Still better than having no numbers at all ● Scripts used: swfilters-loconf-milan-2k13.txt

Slide 6

Status of the ODT / ODF filter

Slide 7

ODF import status ● ODF is XML-based, so it has a schema ● Almost everything is imported ● Since ODF evolved from OOoXML, with few differences ● No domain problems ● Writer and ODF concepts are really close to each other ● Main task is to keep it up to date

Slide 8

ODF export status ● Simplified architecture: ● Iterates over paragraphs, text portions, properties ● If something is not exported → unhandled property ● Current status is near perfect (apart from occasional bugs) ● Keep it up to date ● In the ideal case, handling a new property takes 3 lines
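The simplified export loop can be sketched in Python. The nested-list document model, the `HANDLED` table and the pseudo-markup output are hypothetical; the property names are borrowed from Writer's UNO API and the attribute names from ODF for flavor, while a real ODF exporter writes automatic styles and converts values:

```python
from collections import Counter

# Hypothetical document model: a list of paragraphs, each a list of
# (text, properties) portions.
Portion = tuple[str, dict[str, str]]
Document = list[list[Portion]]

# Handled properties and the ODF attribute each one is written as.
# Supporting a new property ideally means adding one entry here.
HANDLED = {
    "CharWeight": "fo:font-weight",
    "CharPosture": "fo:font-style",
    "CharColor": "fo:color",
}


def export(doc: Document) -> tuple[list[str], Counter]:
    """Serialize paragraphs to pseudo-markup and count skipped properties."""
    paragraphs: list[str] = []
    unhandled: Counter = Counter()
    for paragraph in doc:                        # 1. paragraphs
        spans = []
        for text, props in paragraph:            # 2. text portions
            attrs = ""
            for name, value in props.items():    # 3. properties
                attr = HANDLED.get(name)
                if attr is None:
                    unhandled[name] += 1         # not exported: report it
                else:
                    attrs += f' {attr}="{value}"'
            spans.append(f"<span{attrs}>{text}</span>")
        paragraphs.append("<p>" + "".join(spans) + "</p>")
    return paragraphs, unhandled


paras, missing = export([[("Hello", {"CharWeight": "bold", "CharShadowed": "true"})]])
print(paras)    # ['<p><span fo:font-weight="bold">Hello</span></p>']
print(missing)  # Counter({'CharShadowed': 1})
```

Because every property passes through one lookup, anything not exported shows up in the unhandled counter instead of being silently lost.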

Slide 9

LibreOffice-specific extensions ● When we add new features, we serialize them to/from ODF, see ODF_Implementer_Notes ● Each such commit is turned into an ODF proposal before we hit a release ● Still, a slow process ● e.g. ODF 1.2 is not yet an ISO standard

Slide 10

Intentional OOo/LO-specific data ● settings.xml's ● e.g. ooo:configuration-settings ● Contains application-specific configuration – e.g. Writer layout compat settings ● Macros:

Slide 11

Summary of ODF filter status ● Export is lossless ● Modulo the reported bugs in Bugzilla ;-) ● What we export is imported losslessly as well ● Other documents: imported reasonably well ● If only the above were true for other filters as well

Slide 12

Status of DOCX / OOXML filter

Slide 13

Introduction ● OOXML may mean lots of things; here: ● OOXML as in what Word does (there is also an ISO version) ● OOXML is DOCX, XLSX, PPTX, DrawingML, VML, etc. – here: WordprocessingML ● Import is older, written from scratch ● Export is newer (only in LO), reusing the DOC export

Slide 14

OOXML import ● Two parts: tokenizer and domain mapper ● Problems to be solved by the tokenizer: ● XML element → token ● RTF control word → token ● Problems to be solved by the domain mapper: ● Section → Page style
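The two-stage split can be sketched in Python: per-format tables turn OOXML elements and RTF control words into shared tokens, and a single domain mapper turns those tokens into Writer concepts. The token names, the `DomainMapper` class and the `ConvertedN` page style naming are illustrative assumptions; `w:sectPr`, `w:pgSz`, `\sectd` and `\pgwsxn` are real OOXML/RTF names, although the real `w:pgSz` carries the width in its `w:w` attribute rather than as a bare value:

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from dataclasses import dataclass, field

# Stage 1 (tokenizer): format-specific names map onto shared tokens.
# The token names are hypothetical; the OOXML element and RTF control
# word names are real.
OOXML_TOKENS = {"w:sectPr": "SECTION", "w:pgSz": "PAGE_WIDTH"}
RTF_TOKENS = {"sectd": "SECTION", "pgwsxn": "PAGE_WIDTH"}


def tokenize(items: Iterable[tuple[str, str | None]],
             table: dict[str, str]) -> Iterator[tuple[str, str | None]]:
    """Yield (token, value) for each (name, value) pair the table knows."""
    for name, value in items:
        token = table.get(name)
        if token is not None:  # unknown syntax is simply dropped here
            yield token, value


@dataclass
class PageStyle:
    name: str
    width_twips: int = 0


@dataclass
class DomainMapper:
    """Stage 2: a single mapper turns shared tokens into Writer concepts."""

    page_styles: list[PageStyle] = field(default_factory=list)

    def handle(self, token: str, value: str | None) -> None:
        if token == "SECTION":
            # Writer keeps page properties in page styles, so each Word
            # section starts a new page style (the naming is illustrative).
            self.page_styles.append(PageStyle(f"Converted{len(self.page_styles) + 1}"))
        elif token == "PAGE_WIDTH" and value is not None and self.page_styles:
            self.page_styles[-1].width_twips = int(value)


mapper = DomainMapper()
for token, value in tokenize([("w:sectPr", None), ("w:pgSz", "11906")], OOXML_TOKENS):
    mapper.handle(token, value)
for token, value in tokenize([("sectd", None), ("pgwsxn", "12240")], RTF_TOKENS):
    mapper.handle(token, value)
print(mapper.page_styles)
```

Both inputs reach the same mapper: 11906 twips is the A4 width coming from the OOXML side, 12240 twips the US Letter width coming from the RTF side, and each ends up in its own page style.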

Slide 15

OOXML export ● Shared Word export: the domain mapping is reused ● Still not perfect due to domain problems ● e.g. text frames: Writer and Draw text frames ↔ Word old-style frames and rectangle shapes

Slide 16

OOXML numbers ● # of XML elements in OfficeOpenXML-XMLSchema-Strict.zip / wml.xsd: 546 ● # of elements that are at least tokenized: 461 ● # of unhandled elements: 85 ● Looks pretty good, but remember: this does not measure the coverage of element attributes, combinations, etc.
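The element count can be approximated by collecting every named `xsd:element` declaration from the schema. A Python sketch using the standard library's `xml.etree.ElementTree`; it counts distinct names, which may differ from how the talk's script counted, and the inline schema is a tiny made-up sample rather than the real `wml.xsd`:

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Tiny made-up stand-in for wml.xsd.
SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="CT_P">
    <xsd:sequence>
      <xsd:element name="pPr" type="CT_PPr" minOccurs="0"/>
      <xsd:element name="r" type="CT_R" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="document" type="CT_Document"/>
</xsd:schema>"""


def element_names(xsd_text: str) -> set[str]:
    """Distinct names of all xsd:element declarations (refs are skipped)."""
    root = ET.fromstring(xsd_text)
    return {el.get("name") for el in root.iter(f"{XSD}element") if el.get("name")}


spec = element_names(SAMPLE)
tokenized = {"document", "r"}  # hypothetical tokenizer coverage
print(len(spec), len(spec & tokenized), sorted(spec - tokenized))
# prints: 3 2 ['pPr']
```

With the figures above, 461 of 546 elements is roughly 84% coverage at the tokenizer level.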

Slide 17

Status of DOC / WW8 filter

Slide 18

Introduction ● DOC is usually WW8 (Word 97-2003) ● Probably the oldest Writer filter ● Most of what is possible in Writer ↔ Word (up to 2003) handling is already done ● Still, not as up to date as it could be ● e.g. commented text ranges are not handled

Slide 19

Import / export status ● Import: ● Could measure unhandled FIB (File Information Block) entries – rarely a problem ● Can measure SPRMs – more useful ● Export: ● The export framework is from the WW8 filter ● All methods are implemented, so not useful to check ● Unhandled SfxPoolItems – better to measure this

Slide 20

WW8 numbers ● Specification has character (85), paragraph (93), table (80), section (59) and picture (8) SPRMs → 325 in total ● Number of handled SPRMs: 318 ● Handling of some may still be problematic, but again – looks pretty good
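As a quick check, the per-category counts add up to the stated total, and the handled count gives the coverage ratio:

```latex
\begin{aligned}
85 + 93 + 80 + 59 + 8 &= 325 && \text{SPRMs in the specification}\\
325 - 318 &= 7 && \text{unhandled SPRMs}\\
\frac{318}{325} &\approx 97.8\,\% && \text{handled}
\end{aligned}
```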

Slide 21

Status of the RTF filter

Slide 22

What to measure in import ● RTF has control words: ● A control word can be a flag, a destination, a symbol, a toggle or a value ● Shape import is Writer-specific as well (unlike ODF/OOXML/WW8) ● Same for math import
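A minimal Python sketch of tokenizing control words. The syntax (backslash, up to 32 ASCII letters, an optional signed numeric parameter, and a delimiter, where a single trailing space is consumed) follows the RTF specification; the `KIND` table is a hand-picked sample of the spec's keyword types, and the code is not LibreOffice's actual RTF tokenizer:

```python
from __future__ import annotations

import re

# Keyword type of a few control words, as listed in the RTF specification's
# keyword table (the full table is much larger).
KIND = {
    "b": "toggle",             # \b is bold on, \b0 is bold off
    "pard": "flag",            # reset to default paragraph properties
    "fonttbl": "destination",  # the font table group
    "f": "value",              # font number
    "fs": "value",             # font size in half-points
    "tab": "symbol",           # a tab character
}

# Either a control word (letters, optional signed parameter, optional
# space delimiter) or a control symbol (a backslash plus one non-letter,
# e.g. \\ or \*). Matching control symbols too keeps an escaped backslash
# from being misread as the start of a control word.
TOKEN = re.compile(r"\\(?:([a-zA-Z]{1,32})(-?\d{1,10})? ?|([^a-zA-Z]))")


def control_words(rtf: str) -> list[tuple[str, int | None, str]]:
    """Return (word, parameter, kind) for each control word in `rtf`."""
    result = []
    for match in TOKEN.finditer(rtf):
        word, param = match.group(1), match.group(2)
        if word is None:
            continue  # a control symbol, not a control word
        value = int(param) if param is not None else None
        result.append((word, value, KIND.get(word, "unknown")))
    return result


print(control_words(r"{\fonttbl\f0 Arial;}\pard\b\fs24 Hi\b0\tab"))
```

The syntax alone does not tell the kinds apart: a flag carries no meaningful parameter, a toggle such as `\b` is switched off by `\b0`, and a value such as `\fs` carries a number, so a tokenizer needs the type table to interpret parameters.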

Slide 23

What to measure in export ● Uses the shared Word exporter → can check unimplemented methods (almost none) ● Control words: as useful as XML tags in the OOXML case ● Checklist

Slide 24

RTF numbers ● # of control words from the spec: 1821 ● # of LO-specific RTF control words: 4 ● Obsolete, only there for compatibility reasons ● # of handled control words on import: 575 ● Does not cover e.g. shape properties ● # of handled control words on export: 368 ● Math status is on par with OOXML ● Easier, due to 1:1 mapping
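Expressed as fractions of the 1821 control words in the specification, the handled counts give:

```latex
\frac{575}{1821} \approx 31.6\,\% \ \text{(import)}, \qquad
\frac{368}{1821} \approx 20.2\,\% \ \text{(export)}
```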

Slide 25

Questions? ● Anyone? Slides: http://vmiklos.hu/odp