Status of the Writer import/export filters

Miklos V
September 25, 2013

  1. Status of the Writer
    import/export filters
    Miklós Vajna
    2013-09-25

  2. Introduction

  3. 3 / 25 LibreOffice Conference 2013 | Miklós Vajna
    How to measure an import filter?
    ● Target is to handle every feature that the given
    format can describe
    ● We want numbers
    ● Number of … handled:
    ● XML tags
    ● Single PRoperty Modifiers (SPRMs)
    ● Control words
    ● Shape properties
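A minimal sketch of the counting idea in Python: given the items a format defines and the items an import filter has a handler for, report how many are handled and which are not. The function name and the toy item lists are assumptions for illustration, not LibreOffice code.

```python
def import_coverage(spec_items, handled_items):
    """Compare what a format defines with what an import filter handles.

    spec_items:    every item the format can describe (XML tags, SPRMs,
                   control words, shape properties, ...)
    handled_items: the items the filter has code for
    Returns (number handled, sorted list of unhandled items, handled ratio).
    """
    spec = set(spec_items)
    handled = spec & set(handled_items)  # handlers for non-spec items don't count
    unhandled = sorted(spec - handled)
    ratio = len(handled) / len(spec) if spec else 1.0
    return len(handled), unhandled, ratio


# Toy example: four run-property elements, three of them handled.
print(import_coverage(["b", "i", "u", "strike"], ["b", "i", "u"]))
# (3, ['strike'], 0.75)
```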

  4. How to measure an export filter?
    ● Target is to handle every Writer feature
    ● Checklist, e.g. vmiklos01_rtf_status.ods
    ● UNO filter unhandled properties

    ● Internal filter unhandled SfxPoolItems

    ● Word export unimplemented methods
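One way to count unimplemented methods, sketched in Python under the assumption that the shared exporter is a base class with one method per attribute and each backend overrides the ones it supports. The class and method names (`ExportCallbacks`, `ToyRtfBackend`, `char_weight`, ...) are hypothetical.

```python
import inspect


class ExportCallbacks:
    """Hypothetical shared exporter: one method per Writer attribute."""

    def char_weight(self, item):
        raise NotImplementedError

    def char_posture(self, item):
        raise NotImplementedError

    def para_adjust(self, item):
        raise NotImplementedError


class ToyRtfBackend(ExportCallbacks):
    """Hypothetical backend that only implements two of the callbacks."""

    def char_weight(self, item):
        return r"\b"

    def char_posture(self, item):
        return r"\i"


def unimplemented_methods(base, backend):
    """Public methods of `base` that `backend` inherits without overriding."""
    missing = []
    for name, func in inspect.getmembers(base, inspect.isfunction):
        if name.startswith("_"):
            continue
        if getattr(backend, name) is func:  # same function object: not overridden
            missing.append(name)
    return missing


print(unimplemented_methods(ExportCallbacks, ToyRtfBackend))  # ['para_adjust']
```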

  5. Is this optimal?
    ● No, because:
    ● Even if pictures and tables are handled,
    possibly pictures inside tables need explicit
    support
    ● Not every item (XML tag, SPRM, etc.) is
    equally important
    ● Still better than having no numbers at all
    ● Scripts used: swfilters-loconf-milan-2k13.txt

  6. Status of the ODT / ODF filter

  7. ODF import status
    ● ODF is XML-based, so it has a schema
    ● Almost everything is imported
    ● Because ODF was OOoXML before, with only
    minor differences
    ● No domain problems
    ● Writer and ODF concepts are really close to each
    other
    ● Main task is to keep it up to date

  8. ODF export status
    ● Simplified architecture:
    ● Iterates over paragraphs, text portions, properties
    ● If something is not exported → unhandled property
    ● Current status is near perfect (apart from occasional
    bugs)
    ● Keep it up to date
    ● In the ideal case, handling a new property takes 3 lines
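The iterate-and-collect structure of the exporter can be sketched as a toy Python model. This is not Writer's UNO-based exporter: the `Paragraph`/`Portion` classes and the property names in `HANDLERS` are hypothetical, while `fo:font-weight` and `fo:font-style` are real ODF attributes.

```python
from dataclasses import dataclass


@dataclass
class Portion:
    text: str
    properties: dict  # hypothetical property name -> value


@dataclass
class Paragraph:
    portions: list


# Hypothetical handler table: property name -> (ODF attribute, value).
HANDLERS = {
    "weight": lambda v: ("fo:font-weight", v),
    "italic": lambda v: ("fo:font-style", "italic" if v else "normal"),
}


def export(paragraphs):
    """Walk paragraphs -> text portions -> properties.

    Returns the exported (text, attributes) pairs and the set of property
    names that had no handler, i.e. the "unhandled properties".
    """
    out, unhandled = [], set()
    for para in paragraphs:
        for portion in para.portions:
            attrs = []
            for name, value in portion.properties.items():
                handler = HANDLERS.get(name)
                if handler is None:
                    unhandled.add(name)
                    continue
                attrs.append(handler(value))
            out.append((portion.text, attrs))
    return out, unhandled


doc = [Paragraph([Portion("Hello", {"weight": "bold", "shadow": True})])]
print(export(doc))  # ([('Hello', [('fo:font-weight', 'bold')])], {'shadow'})
```

In this sketch, supporting a new property means adding one entry to `HANDLERS`, which is the spirit of the "3 lines" point; the real code has more plumbing.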

  9. LibreOffice-specific extensions
    ● When we add new features, we serialize
    those from/to ODF, see
    ODF_Implementer_Notes
    ● Each such commit is turned into an ODF
    proposal before we hit a release
    ● Still, a slow process
    ● e.g. ODF 1.2 is not yet an ISO standard

  10. Intentional OOo/LO-specific data
    ● settings.xml's config item sets
    ● e.g. ooo:configuration-settings
    ● Contains application-specific configuration
    – e.g. Writer layout compat settings
    ● Macros: script:language="ooo:StarBasic" …>
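A hedged Python sketch of reading the `ooo:configuration-settings` item set from a document's settings.xml with the standard library. The two item names in the sample (`AddParaTableSpacing`, `TabsRelativeToIndent`) serve only as illustrative Writer compat settings, and their values are made up.

```python
import xml.etree.ElementTree as ET

CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"
NAME = f"{{{CONFIG_NS}}}name"  # qualified config:name attribute


def configuration_settings(settings_xml):
    """Return {item name: text} from the ooo:configuration-settings item set."""
    root = ET.fromstring(settings_xml)
    for item_set in root.iter(f"{{{CONFIG_NS}}}config-item-set"):
        if item_set.get(NAME) == "ooo:configuration-settings":
            return {
                item.get(NAME): item.text
                for item in item_set.findall(f"{{{CONFIG_NS}}}config-item")
            }
    return {}


# Illustrative settings.xml fragment; the values are made up.
SAMPLE = """<office:document-settings
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0">
  <office:settings>
    <config:config-item-set config:name="ooo:configuration-settings">
      <config:config-item config:name="AddParaTableSpacing"
          config:type="boolean">true</config:config-item>
      <config:config-item config:name="TabsRelativeToIndent"
          config:type="boolean">false</config:config-item>
    </config:config-item-set>
  </office:settings>
</office:document-settings>"""

print(configuration_settings(SAMPLE))
# {'AddParaTableSpacing': 'true', 'TabsRelativeToIndent': 'false'}
```

For a real document, the input can come from `zipfile.ZipFile(path).read("settings.xml")`; `ET.fromstring` accepts bytes as well.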

  11. Summary of ODF filter status
    ● Export is lossless
    ● Modulo the reported bugs in Bugzilla ;-)
    ● What we export is imported losslessly as well
    ● Other documents: imported reasonably well
    ● If only the above were true for the other
    filters as well

  12. Status of DOCX / OOXML filter

  13. Introduction
    ● OOXML may mean lots of things, here:
    ● OOXML, as in what Word does (there is also an
    ISO version)
    ● OOXML is DOCX, XLSX, PPTX, drawingml, VML,
    etc. – here: wordprocessingml
    ● Import is older, written from scratch
    ● Export is newer (only in LO), reusing the
    DOC export

  14. OOXML import
    ● Two parts: tokenizer and domain mapper
    ● Problems to be solved by the tokenizer:
    ● XML element → token
    ● RTF control word → token
    ● Problems to be solved by the domain
    mapper:
    ● Section → page style
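The tokenizer / domain mapper split can be illustrated with a toy Python model: two tokenizers turn format-specific syntax (an OOXML element, an RTF control word) into shared tokens, and a single domain mapper turns those tokens into Writer concepts, e.g. a Word section into a page style. All token names and mappings here are hypothetical, and mapping both `w:sectPr` and `\sect` to one `SECTION` token is a simplification.

```python
# Hypothetical shared token names; each tokenizer has its own lookup table.
OOXML_TOKENS = {"w:b": "BOLD", "w:i": "ITALIC", "w:sectPr": "SECTION"}
RTF_TOKENS = {r"\b": "BOLD", r"\i": "ITALIC", r"\sect": "SECTION"}


def tokenize(names, table):
    """Tokenizer: format-specific names -> shared tokens.

    Names missing from the table are returned separately; they are the
    kind of item that counts as not tokenized.
    """
    tokens = [table[n] for n in names if n in table]
    unknown = [n for n in names if n not in table]
    return tokens, unknown


class DomainMapper:
    """Format-neutral part: maps shared tokens to Writer concepts."""

    def __init__(self):
        self.char_props = set()
        self.page_styles = 0

    def handle(self, token):
        if token == "BOLD":
            self.char_props.add("bold")
        elif token == "ITALIC":
            self.char_props.add("italic")
        elif token == "SECTION":
            # Word has sections; Writer expresses page setup via page styles.
            self.page_styles += 1


mapper = DomainMapper()
ooxml, skipped = tokenize(["w:b", "w:foo", "w:sectPr"], OOXML_TOKENS)
rtf, _ = tokenize([r"\i", r"\sect"], RTF_TOKENS)
for token in ooxml + rtf:
    mapper.handle(token)
print(sorted(mapper.char_props), mapper.page_styles, skipped)
# ['bold', 'italic'] 2 ['w:foo']
```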

  15. OOXML export
    ● Shared Word export: the domain mapping
    code is reused
    ● Still not perfect due to domain problems
    ● e.g. text frames: Writer and Draw text frames ↔
    Word old-style frames and rectangle shapes

  16. OOXML numbers
    ● # of XML elements in OfficeOpenXML-XMLSchema-Strict.zip /
    wml.xsd: 546
    ● # of elements that are at least tokenized: 461
    ● # of unhandled elements: 85
    ● Looks pretty good, but remember: this
    does not measure the coverage of
    element attributes, combinations, etc.
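A sketch of how such numbers could be produced: collect the distinct `name` attributes of `xsd:element` declarations in the schema and intersect them with the set of tokenized element names. The slide does not say whether it counts distinct names or individual declarations, so the distinct-name choice below is an assumption, and the embedded schema is a tiny stand-in for wml.xsd.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def element_names(xsd_text):
    """Distinct names of all <xsd:element name="..."> declarations."""
    root = ET.fromstring(xsd_text)
    return {
        el.get("name")
        for el in root.iter(f"{{{XSD_NS}}}element")
        if el.get("name")  # skip <xsd:element ref="..."/>
    }


def coverage(schema_names, tokenized):
    """(total, at least tokenized, unhandled) element counts."""
    handled = schema_names & tokenized
    return len(schema_names), len(handled), len(schema_names - handled)


# Tiny stand-in for wml.xsd.
SAMPLE_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="CT_RPr">
    <xsd:sequence>
      <xsd:element name="b" minOccurs="0"/>
      <xsd:element name="i" minOccurs="0"/>
      <xsd:element name="u" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="document"/>
</xsd:schema>"""

names = element_names(SAMPLE_XSD)
print(coverage(names, {"b", "i", "document"}))  # (4, 3, 1)
```

With the slide's numbers, 461 / 546 ≈ 0.84, i.e. about 84% of the elements are at least tokenized.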

  17. Status of DOC / WW8 filter

  18. Introduction
    ● DOC is usually WW8 (Word 97-2003)
    ● Probably the oldest Writer filter
    ● What's possible with respect to Writer ↔ Word (up to
    2003) handling is mostly already done
    ● Still, not as up to date as it could be
    ● e.g. commented text ranges are not handled

  19. Import / export status
    ● Import:
    ● Could measure unhandled FIB entries – rare problem
    ● Can measure SPRMs – more useful
    ● Export:
    ● The export framework is from the WW8 filter
    ● All methods are implemented, so checking them is not useful
    ● Unhandled SfxPoolItems – better to measure this

  20. WW8 numbers
    ● The specification has character (85),
    paragraph (93), table (80), section (59)
    and picture (8) SPRMs → 325 in total
    ● Number of handled SPRMs: 318
    ● Handling of some may still be problematic,
    but again – looks pretty good
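The five SPRM groups correspond to the `sgc` field of the 16-bit SPRM opcode in the [MS-DOC] specification, so a list of handled SPRMs can be bucketed by category mechanically. A Python sketch of the decoding; `decode_sprm`, `tally` and `SLIDE_COUNTS` are illustrative names, and the sample opcodes are sprmCFBold (0x0835), sprmPJc80 (0x2403) and sprmSBkc (0x3009).

```python
# Fields of a 16-bit SPRM opcode, per the [MS-DOC] specification.
SGC_NAMES = {1: "paragraph", 2: "character", 3: "picture", 4: "section", 5: "table"}
# Operand size in bytes for each spra value; None means variable length.
SPRA_SIZE = {0: 1, 1: 1, 2: 2, 3: 4, 4: 2, 5: 2, 6: None, 7: 3}


def decode_sprm(opcode):
    """Split an opcode into (ispmd, fSpec, category, operand size)."""
    ispmd = opcode & 0x1FF          # bits 0-8: id within the category
    f_spec = (opcode >> 9) & 0x1    # bit 9: needs special handling
    sgc = (opcode >> 10) & 0x7      # bits 10-12: category (sgc)
    spra = (opcode >> 13) & 0x7     # bits 13-15: operand size code (spra)
    return ispmd, f_spec, SGC_NAMES.get(sgc, "unknown"), SPRA_SIZE[spra]


def tally(opcodes):
    """Count opcodes per category, e.g. for a list of handled SPRMs."""
    counts = {}
    for opcode in opcodes:
        category = decode_sprm(opcode)[2]
        counts[category] = counts.get(category, 0) + 1
    return counts


# sprmCFBold, sprmPJc80, sprmSBkc
print(tally([0x0835, 0x2403, 0x3009]))
# {'character': 1, 'paragraph': 1, 'section': 1}

# Per-category totals from the slide.
SLIDE_COUNTS = {"character": 85, "paragraph": 93, "table": 80, "section": 59, "picture": 8}
print(sum(SLIDE_COUNTS.values()), round(318 / 325, 3))  # 325 0.978
```

So roughly 98% of the SPRMs listed in the specification have a handler.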

  21. Status of the RTF filter

  22. What to measure in import
    ● RTF has control words:
    ● A control word can be a flag, a destination, a
    symbol, a toggle or a value
    ● Shape import is Writer-specific as well
    (unlike ODF/OOXML/WW8)
    ● Same for math import
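Syntactically, a control word is a backslash, a run of letters, an optional signed integer parameter and an optional space delimiter; a control symbol is a backslash followed by one non-letter. Whether a word is a flag, destination, symbol, toggle or value is not visible in the syntax and comes from the specification's tables. A simplified Python tokenizer sketch (it ignores details such as `\'hh` hex escapes and binary data); the `KINDS` table is a small hand-picked excerpt, not the full list.

```python
import re

# \word[-N][space]  |  \<one non-letter>  |  group brace  |  plain text
TOKEN_RE = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word (lenient: spec words are lowercase)
    r"|\\([^a-zA-Z])"            # control symbol
    r"|([{}])"                   # group start / end
    r"|([^\\{}]+)"               # text
)

# Hand-picked excerpt of the spec's kind table; anything else is "unknown".
KINDS = {"b": "toggle", "i": "toggle", "fs": "value", "par": "symbol",
         "pard": "flag", "fonttbl": "destination"}


def tokenize(rtf):
    """Split RTF source into word / symbol / group / text tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(rtf):
        word, param, symbol, brace, text = m.groups()
        if word:
            value = int(param) if param is not None else None
            tokens.append(("word", word, value, KINDS.get(word, "unknown")))
        elif symbol:
            tokens.append(("symbol", symbol))
        elif brace:
            tokens.append(("group", brace))
        else:
            tokens.append(("text", text))
    return tokens


for token in tokenize(r"{\rtf1\b Hello\b0 \par}"):
    print(token)
```

Counting the distinct words that end up as `"unknown"` (or have no handler) over a corpus is one way to arrive at handled / unhandled control word numbers.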

  23. What to measure in export
    ● Uses the shared Word exporter → can
    check unimplemented methods (almost
    none)
    ● Control words: as useful as XML tags in the
    OOXML case
    ● Checklist

  24. RTF numbers
    ● # of control words from the spec: 1821
    ● # of LO-specific RTF control words: 4
    ● Obsolete, only kept for compatibility reasons
    ● # of handled control words on import: 575
    ● Does not cover e.g. shape properties
    ● # of handled control words on export: 368
    ● Math status is on par with OOXML
    ● Easier, due to 1:1 mapping
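As a rough share of the 1821 control words in the specification (assuming the 4 LO-specific words are not part of that total, and keeping in mind that shape properties are not counted on import), the numbers above work out to:

```latex
\[
  \frac{575}{1821} \approx 0.316 \;(\approx 31.6\%\ \text{import}), \qquad
  \frac{368}{1821} \approx 0.202 \;(\approx 20.2\%\ \text{export})
\]
```

For comparison, the earlier slides give 461 / 546 ≈ 84% for OOXML elements and 318 / 325 ≈ 98% for WW8 SPRMs, each with the caveats stated there.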

  25. Questions?
    ● Anyone?
    Slides: http://vmiklos.hu/odp
