Why ODF is the best intermediate format for report generation systems

Slide for COSCUP 2020 RB105
https://coscup.org/2020/en/agenda/KXPYPW

Naruhiko Ogasawara

August 02, 2020

Transcript

  1. Why ODF is the best intermediate format for report generation systems

    Naruhiko Ogasawara
    Twitter: @naru0ga
    Facebook: naruoga
    Telegram: @naruoga
  2. 02/08/2020 COSCUP 2020 Day2: Who am I

    • 小笠原 (OGASAWARA) 徳彦 (Naruhiko)
      – Call me “NARU”
    • FLOSS lover from Japan
      – LibreOffice, Ubuntu, Selenium, Jenkins, ...
    • An employee of a security vendor in Japan
      – Internal tools development (such as report generation systems)
      – DevSecOps service development
  3. Agenda

    • PDF as a reporting file format
    • OpenDocument Format (ODF): overview
    • Implementation of an ODF-based report generation system: our case
    • Conclusion
  4. PDF as a reporting file format
  5. PDF is the best for reports

    • Application independent
    • Environment independent
    • Suitable for viewing on a monitor and for printing
    • Prevents casual modification
    • PDF is the best file format for easy-to-read reports that don't require editing
  6. Anyway… in my company

    • As a security vendor, we do vulnerability testing every day
    • We test customers’ software to find vulnerabilities
      – Sometimes manually
      – Sometimes automatically, with vulnerability scanners
    • Then we generate PDF reports from the test results
  7. We need a system like this

    [Diagram: Test results + Template → Report Generation System → Report]
  8. Our choice: use Scala + ODF + LibreOffice

    • Scala
      – Hybrid language: object-oriented + functional programming
      – Runs on the JVM
        • Can use the huge Java-based library ecosystem, and is multi-platform
    • ODF
      – LibreOffice native format
      – Easier to manipulate from code than OOXML (discussed later)
      – Suitable as an intermediate format
    • LibreOffice
      – Can convert ODF to PDF
  9. LibreOffice as the PDF generator

    • LibreOffice is a feature-rich OSS office suite; it can be used to create all kinds of nice-looking documents
    • It also has powerful PDF generation functions
      – PDF/A
      – Accessibility compliant
      – PDF forms
      – Digital signature
    • All of this works from the command line, without a GUI
      – Easy to integrate into your own software

        soffice --headless --convert-to pdf *.odt
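
Editor's sketch of the integration point above: a report generator can shell out to the same `soffice` command shown on the slide. This is a minimal Python sketch, not the speaker's code. It assumes `soffice` is on the `PATH`; the function names and the timeout value are hypothetical, and `--outdir` is LibreOffice's command-line option for choosing the output directory.

```python
import subprocess
from pathlib import Path


def build_convert_command(odt_path, out_dir, soffice="soffice"):
    """Return the argument list for a headless ODF-to-PDF conversion."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format, as on the slide
        "--outdir", str(out_dir),
        str(odt_path),
    ]


def convert_to_pdf(odt_path, out_dir, timeout=120):
    """Run the conversion and return the path where the PDF is expected."""
    cmd = build_convert_command(odt_path, out_dir)
    # check=True raises CalledProcessError if soffice exits with an error.
    subprocess.run(cmd, check=True, timeout=timeout)
    return Path(out_dir) / (Path(odt_path).stem + ".pdf")
```

Keeping the command construction separate from the process call makes the integration testable on machines where LibreOffice is not installed.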
  10. OpenDocument Format (ODF): overview

  11. OpenDocument Format (ODF): overview

    • http://opendocumentformat.org/
    • A “REAL” international standard file format for office productivity suites
      – Standardized by the OASIS Open Document Format for Office Applications TC
      – ISO/IEC 26300
    • Native format of LibreOffice (and its predecessor, OpenOffice.org)
    • Other software can use it because it is an open standard
      – Microsoft Office and Google Drive also support it
    • Simple, human-readable, easy to machine-manipulate zipped XML
    • Keeps up with the evolution of the applications
      – Unlike the “pseudo standard,” which is essentially unrevised from the proprietary application document format released in 2007
  12. ODF structure basics

    • Simple, human-readable, easy to machine-manipulate zipped XML
      – With some embedded media files
      – Easy to find the contents of your document
    • Same package structure for every application
      – Word processor, spreadsheet, presentation, …
    • Mostly common schema across applications
    • Easier to process than OOXML, which is also zipped XML
  13. Package structure: ODF

    Word Processor

      ODT
      ├── Configurations2
      │   ├── accelerator
      │   ├── floater
      │   ├── images
      │   │   └── Bitmaps
      │   ├── menubar
      │   ├── popupmenu
      │   ├── progressbar
      │   ├── statusbar
      │   ├── toolbar
      │   └── toolpanel
      ├── META-INF
      │   └── manifest.xml
      ├── Thumbnails
      │   └── thumbnail.png
      ├── content.xml
      ├── manifest.rdf
      ├── meta.xml
      ├── mimetype
      ├── settings.xml
      └── styles.xml

    Spreadsheet

      ODS
      ├── Configurations2
      │   ├── accelerator
      │   ├── floater
      │   ├── images
      │   │   └── Bitmaps
      │   ├── menubar
      │   ├── popupmenu
      │   ├── progressbar
      │   ├── statusbar
      │   ├── toolbar
      │   └── toolpanel
      ├── META-INF
      │   └── manifest.xml
      ├── Thumbnails
      │   └── thumbnail.png
      ├── content.xml
      ├── manifest.rdf
      ├── meta.xml
      ├── mimetype
      ├── settings.xml
      └── styles.xml
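
Editor's sketch: because an ODF file is an ordinary zip archive, the package listing above can be reproduced with a standard zip library. The Python below is a minimal illustration, not part of the talk; the function names are hypothetical.

```python
import zipfile


def list_package(odf_file):
    """Return the member names of an ODF package in archive order."""
    with zipfile.ZipFile(odf_file) as package:
        return package.namelist()


def read_mimetype(odf_file):
    """Return the media type stored in the package's 'mimetype' member."""
    with zipfile.ZipFile(odf_file) as package:
        return package.read("mimetype").decode("ascii").strip()
```

Both functions accept a file path or a file-like object, for example `list_package("report.odt")`.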
  14. Package structure: OOXML

    Word Processor

      DOCX
      ├── [Content_Types].xml
      ├── _rels
      ├── docProps
      │   ├── app.xml
      │   └── core.xml
      └── word
          ├── _rels
          │   └── document.xml.rels
          ├── document.xml
          ├── fontTable.xml
          ├── settings.xml
          └── styles.xml

    Spreadsheet

      XLSX
      ├── [Content_Types].xml
      ├── _rels
      ├── docProps
      │   ├── app.xml
      │   └── core.xml
      └── xl
          ├── _rels
          │   └── workbook.xml.rels
          ├── sharedStrings.xml
          ├── styles.xml
          ├── workbook.xml
          └── worksheets
              └── sheet1.xml
  15. Schema: ODF

    Word Processor

      <office:document-content xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" ...>
        <office:body>
          <office:text>
            <text:sequence-decls>
              <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
              <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
              <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
              <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
              <text:sequence-decl text:display-outline-level="0" text:name="Figure"/>
            </text:sequence-decls>
            <text:p text:style-name="Standard">THIS IS A TEST TEXT</text:p>
          </office:text>
        </office:body>
      </office:document-content>

    Spreadsheet

      <office:document-content xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" ...>
        <office:body>
          <office:spreadsheet>
            <table:calculation-settings table:automatic-find-labels="false" table:use-regular-expressions="false" table:use-wildcards="true"/>
            <table:table table:name="Sheet1" table:style-name="ta1">
              <table:table-column table:style-name="co1" table:default-cell-style-name="Default"/>
              <table:table-row table:style-name="ro1">
                <table:table-cell office:value-type="string" calcext:value-type="string">
                  <text:p>THIS IS A TEST TEXT</text:p>
                </table:table-cell>
              </table:table-row>
            </table:table>
            <table:named-expressions/>
          </office:spreadsheet>
        </office:body>
      </office:document-content>
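
Editor's sketch: the ODF schema above keeps the text directly inside `text:p` elements, so reading it takes only a namespace-aware XML parser. This Python example is an illustration, not the speaker's code; `SAMPLE_CONTENT` is a trimmed version of the slide's word processor XML with the namespace declarations the slide elides, and `paragraphs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs for the office: and text: prefixes.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

SAMPLE_CONTENT = """\
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <office:text>
      <text:p text:style-name="Standard">THIS IS A TEST TEXT</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""


def paragraphs(content_xml):
    """Return the text of every text:p element under office:text."""
    root = ET.fromstring(content_xml)
    found = root.iterfind(".//office:text//text:p", NS)
    # itertext() also collects text nested in child elements such as spans.
    return ["".join(p.itertext()) for p in found]


print(paragraphs(SAMPLE_CONTENT))
```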
  16. Schema: OOXML

    Word Processor

      <w:document xmlns:o="urn:schemas-microsoft-com:office:office" ...>
        <w:body>
          <w:p>
            <w:pPr>
              <w:pStyle w:val="Normal"/>
              <w:bidi w:val="0"/>
              <w:jc w:val="left"/>
              <w:rPr></w:rPr>
            </w:pPr>
            <w:r>
              <w:rPr></w:rPr>
              <w:t>THIS IS A TEST TEXT</w:t>
            </w:r>
          </w:p>
          ...
        </w:body>
      </w:document>

    Spreadsheet

      sharedStrings.xml

      <si>
        <t xml:space="preserve">THIS IS A TEST TEXT</t>
      </si>

      sheet1.xml

      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" ...>
        <sheetData>
          <row r="1" customFormat="false" ht="12.8" hidden="false" customHeight="false" outlineLevel="0" collapsed="false">
            <c r="A1" s="1" t="s">
              <v>0</v>
            </c>
          </row>
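
Editor's sketch: in the spreadsheet XML above, the cell holds only the index `0`; the text itself lives in a separate shared strings part. The Python below illustrates that extra lookup. It is not from the talk: the sample strings are trimmed versions of the slide's XML, wrapped in the `sst` root element that the slide omits, and `cell_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

MAIN = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

SHARED_STRINGS = """\
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <si><t xml:space="preserve">THIS IS A TEST TEXT</t></si>
</sst>
"""

SHEET = """\
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1"><c r="A1" s="1" t="s"><v>0</v></c></row>
  </sheetData>
</worksheet>
"""


def cell_text(sheet_xml, shared_xml, ref):
    """Return the text of cell `ref`, following shared-string indirection."""
    shared = [
        "".join(si.itertext())
        for si in ET.fromstring(shared_xml).iterfind("m:si", MAIN)
    ]
    for cell in ET.fromstring(sheet_xml).iterfind(".//m:c", MAIN):
        if cell.get("r") == ref:
            value = cell.findtext("m:v", namespaces=MAIN)
            # t="s" means the value is an index into the shared strings.
            if cell.get("t") == "s":
                return shared[int(value)]
            return value
    return None


print(cell_text(SHEET, SHARED_STRINGS, "A1"))
```

Compared with reading a `text:p` element in ODF, the two-part lookup is the kind of overhead the slides refer to.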
  17. Manipulate ODF: four major ways

    • Primitive
      – Unzip it, modify the XML, then zip it again; no special tools needed
    • Flat ODF
      – Special representation of ODF: all contents as a single XML file
    • Manipulate LibreOffice via the UNO interface
      – Powerful, but quite heavy
    • ODF manipulation libraries
      – Flexible, lightweight and powerful; most recommended!
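
Editor's sketch of the “primitive” way listed above (unzip, modify the XML, zip again), using only the Python standard library. This is an illustration, not the speaker's code: `replace_in_content` is a hypothetical name, and the plain string replacement is a simplification, since an editor may split a placeholder across several XML elements. One packaging detail to watch when re-zipping: the ODF packaging rules expect the `mimetype` entry to come first and to be stored uncompressed.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def replace_in_content(odf_bytes, placeholder, value):
    """Return a copy of the package with `placeholder` replaced in content.xml."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        # Copy members in their original order, so "mimetype" stays first.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                text = data.decode("utf-8")
                # Escape &, < and > so the result is still well-formed XML.
                data = text.replace(placeholder, escape(value)).encode("utf-8")
            # Store "mimetype" uncompressed; deflate everything else.
            method = (zipfile.ZIP_STORED if info.filename == "mimetype"
                      else zipfile.ZIP_DEFLATED)
            dst.writestr(info.filename, data, compress_type=method)
    return out.getvalue()
```

A caller would read a template file into bytes, call the function once per placeholder, and write the result to a new `.odt` file.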
  18. ODF manipulation libraries

    • http://opendocumentformat.org/developers/
    • https://github.com/search?q=opendocument+format&ref=opensearch
    • There should be several libraries available in your favorite programming language
    • Or you can easily develop your own library, because ODF is so simple
  19. Integration of ODF into the report generation system: our case
  20. Our library choice: jOpenDocument

    • http://www.jopendocument.org/
    • Good template handling with its dedicated extension
    • Simple API
    • A bit old: the latest release is from 2014 (1.4 rc2)
    • But still useful
  21. Use jOpenDocument with SBT

    • Unfortunately, jOpenDocument is not published in a public repository (such as Maven Central)
    • So grab the *.jar and put it in your project's ‘lib’ directory
    • Then SBT automatically recognizes the dependency
  22. Play with template files in resources

    • Put your template file into src/main/resources
    • Then just do this:
  23. Prepare template file with jOpenDocument extension

    • If you use LibreOffice 7.0 (which will be released within a week), DO NOT FORGET to save your template as ODF format version “1.2 Extended”
      – ODF 1.3 is the latest version of the ODF standard, and the library from 2014 does not support it
  24. More practical example

    • Reports have several parts
      – Title
      – Issues list
      – Issue details for each issue
      – End of report (such as disclaimer, contact, …)

    [Figure: report layout, in order: “ABC System Vulnerability Test Report” (title), Issues List, Issue Detail #1 ..., Issue Detail #2 ..., End Of Report]
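
Editor's sketch: the report parts listed above map naturally onto a small data model plus a function that emits the sections in order. The Python below is purely hypothetical. It is not the speaker's Project or Issue class, and the field names (`name`, `title`, `severity`, `detail`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str      # hypothetical field
    severity: str   # hypothetical field
    detail: str     # hypothetical field


@dataclass
class Project:
    name: str       # hypothetical field


def report_sections(project, issues):
    """Return (heading, body) pairs in the order the report presents them."""
    sections = [(f"{project.name} Vulnerability Test Report", "")]
    # One summary line per issue, numbered from 1.
    listing = "\n".join(
        f"#{n} {issue.title} ({issue.severity})"
        for n, issue in enumerate(issues, start=1)
    )
    sections.append(("Issues List", listing))
    # One detail section per issue, using the same numbering.
    for n, issue in enumerate(issues, start=1):
        sections.append((f"Issue Detail #{n}", issue.detail))
    sections.append(("End Of Report", ""))
    return sections
```

Each pair could then be written into the matching region of an ODF template before converting the result to PDF.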
  25. Project class

    NOTE: This is NOT our actual code, because that is our IP. And I know I shouldn’t mix the data model with its output procedure...
  26. Issue class

  27. Report generation

    • Assume that there are “project” and “issues” (a list of Issue instances)
  28. Example result

  29. Our reporting system(s)

  30. Future Plan

    • Migrate from jOpenDocument to another library
      – Such as ODFDOM, part of the ODF Toolkit
      – The ODF Toolkit is an official project of The Document Foundation, the home organization of LibreOffice
      – https://odftoolkit.org/odfdom/
    • Even better, re-implement jOpenDocument on top of ODFDOM
  31. Conclusion

  32. Conclusion

    • PDF is the best report file format
    • ODF is great for PDF report generation
      – Use an ODF manipulation library for your favorite programming language
    • In our case, we are happy with Scala + jOpenDocument + ODF + LibreOffice :)
  33. Questions?

    Twitter: @naru0ga
    Facebook: naruoga
    Telegram: @naruoga
  34. REFERENCE

    • Sample project of Scala + jOpenDocument + ODF (+ LibreOffice)
      – https://github.com/naruoga/jopendocumentsample
      – At this time it has no documentation, not even a README or LICENSE
      – And it might contain unused files
      – But I hope it helps you