Status of the Writer import/export filters

Miklós Vajna
September 25, 2013

Transcript

  1. 3 / 25 LibreOffice Conference 2013 | Miklós Vajna

    How to measure an import filter?
    • Target is to handle every feature that the given format can describe
    • We want numbers
    • Number of … handled:
      • XML tags
      • Single PRoperty Modifiers
      • Control words
      • Shape properties
  2. How to measure an export filter?

    • Target is to handle every Writer feature
    • Checklist, e.g. vmiklos01_rtf_status.ods
    • UNO filter → unhandled properties
    • Internal filter → unhandled SfxPoolItems
    • Word export → unimplemented methods
  3. Is this optimal?

    • No, because:
      • Even if pictures and tables are handled, pictures inside tables may still need explicit support
      • Not every item (XML tag, SPRM, etc.) is equally important
    • Still better than having no numbers at all
    • Scripts used: swfilters-loconf-milan-2k13.txt
  4. ODF import status

    • ODF is XML-based, so it has a schema
    • Almost everything is imported
      • ODF was previously OOoXML, with only small differences
    • No domain problems
      • Writer and ODF concepts are really close to each other
    • Main task is to keep it up to date
  5. ODF export status

    • Simplified architecture:
      • Iterates over paragraphs, text portions, properties
      • If something is not exported → unhandled property
    • Current status is near perfect (apart from occasional bugs)
    • Keep it up to date
      • In the ideal case, handling a new property takes 3 lines
  6. LibreOffice-specific extensions

    • When we add new features, we serialize those from/to ODF, see ODF_Implementer_Notes
    • Each such commit is turned into an ODF proposal before we hit a release
    • Still, a slow process
      • e.g. ODF 1.2 is not yet an ISO standard
  7. Intentional OOo/LO-specific data

    • settings.xml's <config:config-item-set>
      • e.g. ooo:configuration-settings
      • Contains application-specific configuration – e.g. Writer layout compat settings
    • Macros: <script:event-listener script:language="ooo:StarBasic" …>
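
To make the settings.xml point concrete, here is a minimal sketch that reads the items of one `<config:config-item-set>` with Python's standard library. The `SAMPLE` document is hand-written for illustration (the two item names and values are examples only, not taken from the talk), and `read_config_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"

# Hand-written, heavily shortened stand-in for a real settings.xml.
SAMPLE = """<office:document-settings
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0">
  <office:settings>
    <config:config-item-set config:name="ooo:configuration-settings">
      <config:config-item config:name="AddParaTableSpacing" config:type="boolean">true</config:config-item>
      <config:config-item config:name="PrinterName" config:type="string"/>
    </config:config-item-set>
  </office:settings>
</office:document-settings>"""


def read_config_items(xml_text, set_name):
    """Return {item name: text value} for the config-item-set called set_name."""
    root = ET.fromstring(xml_text)
    items = {}
    for item_set in root.iter(f"{{{CONFIG_NS}}}config-item-set"):
        if item_set.get(f"{{{CONFIG_NS}}}name") != set_name:
            continue
        for item in item_set.iter(f"{{{CONFIG_NS}}}config-item"):
            # Empty elements have no text; normalise that to "".
            items[item.get(f"{{{CONFIG_NS}}}name")] = item.text or ""
    return items
```

For example, `read_config_items(SAMPLE, "ooo:configuration-settings")` returns a dict with the two sample items.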
  8. Summary of ODF filter status

    • Export is lossless
      • Modulo the reported bugs in Bugzilla ;-)
    • What we export is imported losslessly as well
    • Other: reasonably well
    • If only the above were true for other filters as well
  9. Introduction

    • OOXML may mean lots of things, here:
      • OOXML, as in what Word does (there is also an ISO version)
      • OOXML is DOCX, XLSX, PPTX, drawingml, VML, etc. – here: wordprocessingml
    • Import is older, from-scratch
    • Export is newer (only in LO), re-using the DOC export
  10. OOXML import

    • Two parts: tokenizer and domain mapper
    • Problems to be solved by tokenizer:
      • XML element → token
      • RTF control word → token
    • Problems to be solved by the domain mapper:
      • Section → Page style
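
The two-stage split can be sketched as follows. This is a toy illustration only: the `TOKENS` table, the token numbers, and the dictionary used as the "Writer model" are all invented for this example and do not mirror the real filter's data structures.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical token table: element local name -> token id.
TOKENS = {"document": 1, "body": 2, "p": 3, "r": 4, "t": 5, "sectPr": 6}

SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p><w:bookmarkStart/><w:r><w:t>Hello</w:t></w:r></w:p>"
    "<w:sectPr/></w:body></w:document>"
)


def tokenize(xml_text):
    """Stage 1: turn XML elements into (token, payload) pairs."""
    for element in ET.fromstring(xml_text).iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in TOKENS:
            yield TOKENS[local_name], element.text or ""
        else:
            # Element without a token: this is what would be counted as unhandled.
            yield None, local_name


def map_to_writer(tokens):
    """Stage 2: map tokens to (toy) Writer-side concepts."""
    model = {"text": [], "page_styles": 0, "unhandled": []}
    for token, payload in tokens:
        if token == TOKENS["t"]:
            model["text"].append(payload)
        elif token == TOKENS["sectPr"]:
            # A Word section has no direct Writer equivalent; model it as a page style.
            model["page_styles"] += 1
        elif token is None:
            model["unhandled"].append(payload)
    return model
```

Because the mapper only sees tokens, a second tokenizer (for instance one that emits the same tokens for RTF control words) could feed the same `map_to_writer` function, which is the point of separating the two stages.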
  11. OOXML export

    • Shared Word export: reused domain mapper
    • Still not perfect due to domain problems
      • e.g. textframes: Writer and Draw textframes ↔ Word old-style frames and rectangle shapes
  12. OOXML numbers

    • # of XML elements in OfficeOpenXML-XMLSchema-Strict.zip / wml.xsd: 546
    • # of at least tokenized elements: 461
    • # of unhandled elements: 85
    • Looks pretty good, but remember: this does not measure the coverage of element attributes, combinations, etc.
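
One way such numbers could be gathered is sketched below. The talk does not say how the 546 figure was computed, so counting distinct named `xsd:element` declarations is an assumption and may not reproduce it exactly; `element_names`, `coverage`, and the tiny `SAMPLE_XSD` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Tiny hand-written schema standing in for wml.xsd.
SAMPLE_XSD = f"""<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:element name="p" type="CT_P"/>
  <xsd:complexType name="CT_P">
    <xsd:sequence>
      <xsd:element name="r" type="CT_R"/>
      <xsd:element ref="other"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""


def element_names(xsd_text):
    """Collect the distinct names of all named xsd:element declarations."""
    root = ET.fromstring(xsd_text)
    return {
        node.get("name")
        for node in root.iter(f"{{{XSD_NS}}}element")
        if node.get("name") is not None  # skip ref="..." uses
    }


def coverage(total, handled):
    """Summarise handled vs. total as an unhandled count and a percentage."""
    return {
        "unhandled": total - handled,
        "percent": round(100.0 * handled / total, 1),
    }
```

With the figures from the slide, `coverage(546, 461)` gives 85 unhandled elements and roughly 84.4% of elements at least tokenized.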
  13. Introduction

    • DOC is usually WW8 (Word 97-2003)
    • Probably the oldest Writer filter
    • What's possible wrt. Writer ↔ Word (up to 2003) handling is mostly already done
    • Still, not as up to date as it could be
      • e.g. commented text ranges are not handled
  14. Import / export status

    • Import:
      • Could measure unhandled FIB entries – rare problem
      • Can measure SPRMs – more useful
    • Export:
      • The export framework is from the WW8 filter
      • All methods are implemented, not useful to check
      • Unhandled SfxPoolItems – better to measure this
  15. WW8 numbers

    • Specification has character (85), paragraph (93), table (80), section (59) and picture (8) SPRMs → 325 in total
    • Number of handled SPRMs: 318
    • Handling of some may still be problematic, but again – looks pretty good
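
The per-category counts can be checked and reproduced with a short sketch. It assumes the 16-bit SPRM opcode layout from the [MS-DOC] specification, where bits 10 to 12 (the `sgc` field) select the category; `sprm_category` and `tally` are hypothetical helper names, and the opcodes used in the usage note are just a few well-known examples.

```python
from collections import Counter

# SPRM counts per category as quoted on the slide.
SPEC_COUNTS = {
    "character": 85,
    "paragraph": 93,
    "table": 80,
    "section": 59,
    "picture": 8,
}

# sgc value (bits 10-12 of the opcode) -> category, assuming the [MS-DOC] layout.
SGC_NAMES = {1: "paragraph", 2: "character", 3: "picture", 4: "section", 5: "table"}


def sprm_category(opcode):
    """Return the category encoded in a 16-bit SPRM opcode."""
    return SGC_NAMES.get((opcode >> 10) & 0x7, "unknown")


def tally(opcodes):
    """Count opcodes per category, e.g. for a list of handled SPRMs."""
    return Counter(sprm_category(opcode) for opcode in opcodes)
```

`sum(SPEC_COUNTS.values())` confirms the total of 325, and `tally([0x0835, 0x2403, 0x5400])` sorts one character, one paragraph, and one table SPRM into their categories, so a list of handled opcodes can be compared against `SPEC_COUNTS` category by category.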
  16. What to measure in import

    • RTF has control words:
      • A control word can be a flag, a destination, a symbol, a toggle or a value
    • Shape import is Writer-specific as well (unlike ODF/OOXML/WW8)
    • Same for math import
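
A simplified sketch of pulling control words out of RTF and classifying them follows. The regular expression deliberately ignores control symbols such as `\\` and `\'hh`, the `KINDS` table covers only five control words chosen as one example per kind, and `control_words` is a hypothetical helper rather than anything from the real filter.

```python
import re

# Backslash, lowercase letters, optional signed numeric parameter, optional space delimiter.
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d+)? ?")

# One example per kind; anything missing is reported as "unhandled".
KINDS = {
    "pard": "flag",
    "fonttbl": "destination",
    "par": "symbol",
    "b": "toggle",
    "fs": "value",
}

SAMPLE = r"{\rtf1{\fonttbl{\f0 Arial;}}\pard\b\fs24 Hello\b0\par}"


def control_words(rtf):
    """Yield (name, numeric parameter or None, kind) for each control word."""
    for match in CONTROL_WORD.finditer(rtf):
        name, param = match.group(1), match.group(2)
        yield name, (int(param) if param is not None else None), KINDS.get(name, "unhandled")
```

Running `list(control_words(SAMPLE))` yields eight entries; `rtf` and `f` come out as "unhandled" because they are missing from the toy table, which is the kind of gap a handled-control-word count is meant to expose.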
  17. What to measure in export

    • Uses the shared Word exporter → can check unimplemented methods (almost none)
    • Control words: as useful as XML tags in the OOXML case
    • Checklist
  18. RTF numbers

    • # of control words from the spec: 1821
    • # of LO-specific RTF control words: 4
      • obsolete, only there for compatibility reasons
    • # of handled control words on import: 575
      • Does not cover e.g. shape properties
    • # of handled control words on export: 368
    • Math status is on par with OOXML
      • Easier, due to 1:1 mapping
  19. Questions?

    • Anyone?
    • Slides: http://vmiklos.hu/odp