Slide 1

Slide 1 text

Writer Training #2 Miklos Vajna 2014­07­07

Slide 2

Slide 2 text

2 / 23 Writer Training | Miklos Vajna Overview ● Writer ● Layout ● Filters ● Testing ● UI ● Help ● Extending ODF ● Editing

Slide 3

Slide 3 text

3 / 23 Writer Training | Miklos Vajna Layout ● Most complex part: ● No easy way to test automatically – Think of missing fonts on test machines ● Document model has only paragraphs, not pages ● One opened document multiple layouts ↔ ● Try it: Window new window → ● Typically single layout: SwRootFrm (root frame) ● Inside: pages – SwPageFrm ● Paragraphs – SwTxtFrm

Slide 4

Slide 4 text

4 / 23 Writer Training | Miklos Vajna Layout

Slide 5

Slide 5 text

5 / 23 Writer Training | Miklos Vajna Layout inside a paragraph ● No more frames:

Slide 6

Slide 6 text

6 / 23 Writer Training | Miklos Vajna Doc. Model layout notification → ● SwModify: kind of a server, e.g. SwTxtNode ● SwClient: the client, e.g. SwTxtFrm ● SwModify SwClient is 1:N ↔ ● SwModify has Modify(SfxPoolItem* pOld, SfxPoolItem *pNew) ● So layout can react without building from scratch ● SwClient can only be registered in one SwModify – But SwClient can have multiple SwDepend (which is an SwClient)

Slide 7

Slide 7 text

7 / 23 Writer Training | Miklos Vajna Related: TextFrames and drawings ● Writer has its own TextFrame ● Can contain anything: tables, columns, fields, etc. ● Does not support advanced drawing features – Like rounded corners ● Drawinglayer (shared) takes care of all other drawings ● Also has a rectangle, with all features one can ever wish – Rounded edges, rotations, etc. – Except it doesn't know about Writer layout, so can't contain fields, etc. ● Problem for Word interop: ● They don't have this code shared, so combining the above two feature list is possible there Writer TextBoxes on master →

Slide 8

Slide 8 text

8 / 23 Writer Training | Miklos Vajna Filters ● Every feature stored in the document model has to be serialized / loaded back to every file format ● Or you loose data ● In practice: ODF should not loose data, the rest should be good enough ● Important filters: ● ODF (.odt) ● OOXML (.docx) ● WW8 (.doc) ● RTF (.rtf) ● Rest: HTML, plain text, etc.

Slide 9

Slide 9 text

9 / 23 Writer Training | Miklos Vajna ODF filter ● If you extend the document model, this has to be updated before the change hits a release ● So users have at least one format which don't loose data for sure ● Mostly uses the UNO API: ● Code under xmloff/ ● Some Writer­specific bits are using the internal API: ● sw/source/filter/xml/ ● Is an open standard, proposals for new features can be submitted

Slide 10

Slide 10 text

10 / 23 Writer Training | Miklos Vajna OOXML: DOCX ● Import: ● Uses the UNO API, code under writerfilter/ ● Tokenizer: – Shared XML parser, model.xml tokens → ● Domain mapper: – Handles the incoming stream of tokens and maps them to UNO API ● Tokenizer dmapper traffic is XML logged: → – cd writerfilter; make ­sr dbglevel=2, then /tmp/test.docx*.XML after load ● Export: ● Shared with RTF/WW8, uses internal API ● sw/source/filter/ww8/docx*

Slide 11

Slide 11 text

11 / 23 Writer Training | Miklos Vajna OOXML: shared parts ● For drawing and other shared parts, writerfilter calls into oox ● VML import: oox/source/vml/ ● VML export: oox/source/export/vmlexport.cxx ● 4.3 / master also supports drawingML: implemented in oox as well ● Also: metadata parsing (author date, etc.) ● Math expressions: both import/export under starmath/ ● starmath/source/ooxml*

Slide 12

Slide 12 text

12 / 23 Writer Training | Miklos Vajna com.sun.star.xml.sax.XFastParser ● DOCX import is a push parser ● Benefit: can implement feature incrementally ● Drawback: XML is text, would have to compare strings a lot slow → ● Solution: we know all the expected strings (namespaces, element names, attribute names, attribute values) ● Register a string id map before parsing → ● Exactly what XFastParser does

Slide 13

Slide 13 text

13 / 23 Writer Training | Miklos Vajna com.sun.star.xml.sax.XFastParser ● Other than being “fast”, how does it work? ● Problem: we don't want a single handler class (startElement, endElement, etc) for the whole document, it would be a God object ● Solution: XFastContextHandler interface ● createFastChildContext() method to handle child contexts can be a different class →

Slide 14

Slide 14 text

14 / 23 Writer Training | Miklos Vajna model.xml ● DOCX tokenizer works by having all its configuration in the model.xml, then generated code does the real work ● Input: XML stream + mapping definitions (model.xml) ● Output: token stream ● XML elements: SPRM tokens, contains Attribute tokens ● XML attributes: Attribute tokens

Slide 15

Slide 15 text

15 / 23 Writer Training | Miklos Vajna model.xml syntax ● Parsed using pattern­matching by XSLT scripts... ● Cleanup of that is in progress ● Concept: ● Take the RNG schema (grammar / defines) ● Add matching resource tags that define the token maps ● Example: framePr

Slide 16

Slide 16 text

16 / 23 Writer Training | Miklos Vajna WW8 (.doc) ● Oldest Writer filter: ● Binfilter was even older, but it's removed ● Import/export somewhat shared ● Uses the internal API ● Code under sw/source/filter/ww8/ ● Shared (doc, xls, ppt) parts: ● filter/source/msfilter/ ● Using doc­dumper may help

Slide 17

Slide 17 text

17 / 23 Writer Training | Miklos Vajna RTF (.rtf) ● Export is shared with DOC/DOCX: ● Code under sw/source/filters/ww8/rtf* ● Import is shared with DOCX: ● Code under writerfilter/source/rtftok/ ● Domain mapper is the same for RTF and DOCX ● Math: ● Import generates OOXML tokens (RTF­specific part is inside the normal RTF tokenizer) ● Export is shared with DOCX: – Code under starmath/source/rtf*

Slide 18

Slide 18 text

18 / 23 Writer Training | Miklos Vajna Testing ● What's easy: filter tests ● Both import / export ● Poke around with xray, then assert the UNO document model ● The rest is more challenging ● We have uwriter, which has access to private sw symbols ● UI tests: uiwriter, it tests the shell

Slide 19

Slide 19 text

19 / 23 Writer Training | Miklos Vajna UI ● Again, shared with other modules where makes sense ● Doesn't use the UNO API ● Input/output for the dialog is an SfxItemSet ● Own toolkit: VCL ● Most dialogs use the GTK .ui format now – Glade is a GUI to edit those ● If have to touch an older dialog: best to convert it first – Doesn't take too much time

Slide 20

Slide 20 text

20 / 23 Writer Training | Miklos Vajna Help ● Lots of help buttons on UI ● Typically every existing dialog has a related help page ● If you add a new UI element, makes sense to spend 5 minutes on updating the related help ● Requires a ­­with­help build ● XML based, also stored in git, just different repo ● Offline / online help is generated from that

Slide 21

Slide 21 text

21 / 23 Writer Training | Miklos Vajna Extending ODF ● ODF is really close to the UNO API what we offer ● Typically 1 UNO property 1 XML attribute in ODF ↔ ● If you extend the UNO API ● Go ahead with updating the ODF filter ● After implementation is ready: ● See https://wiki.documentfoundation.org/Development/ODF_Implementer_Notes#LibreOffice_ODF_extensions ● Submit a proposal to OASIS, so it can be part of the next version of the standard

Slide 22

Slide 22 text

22 / 23 Writer Training | Miklos Vajna Bookmarks ● Wiki: https://wiki.documentfoundation.org/Development/Writer ● New feature checklist, ODF implementer notes, etc. ● sw README: http://opengrok.libreoffice.org/xref/core/sw/README ● Older Writer notes: ● http://cgit.freedesktop.org/libreoffice/build/tree/doc/sw­flr.otl?h=master­backup ● http://cgit.freedesktop.org/libreoffice/build/tree/doc/sw.txt?h=master­backup

Slide 23

Slide 23 text

23 / 23 Writer Training | Miklos Vajna Questions? ● Anyone?