Why ODF is the best intermediate format for report generation systems

Slides for COSCUP 2020 RB105
https://coscup.org/2020/en/agenda/KXPYPW

Naruhiko Ogasawara

August 02, 2020
Transcript

  1. Why ODF is the best intermediate format for report generation systems
    Naruhiko Ogasawara
    Twitter: @naru0ga, Facebook: naruoga, Telegram: @naruoga
  2. 02/08/2020 COSCUP 2020 Day2: Who am I
    • 小笠原 (OGASAWARA) 徳彦 (Naruhiko)
      – Call me “NARU”
    • FLOSS lover from Japan
      – LibreOffice, Ubuntu, Selenium, Jenkins, ...
    • An employee of a security vendor in Japan
      – Internal tools development (such as report generation systems)
      – DevSecOps service development
  3. Agenda
    • PDF as a reporting file format
    • OpenDocument Format (ODF): overview
    • Implementation of an ODF-based report generation system: our case
    • Conclusion
  4. PDF as a reporting file format
  5. PDF is the best for reports
    • Application independent
    • Environment independent
    • Suitable both for viewing on a monitor and for printing
    • Discourages casual modification
    • PDF is the best file format for easy-to-read reports that don't require editing
  6. Anyway… in my company
    • As a security vendor, we do vulnerability testing every day
    • We test customers’ software to find vulnerabilities
      – Sometimes manually, by hand
      – Sometimes automatically, with vulnerability scanners
    • Then we generate PDF reports from the test results
  7. We need a system like this
    [Diagram: Test results + Template → Report Generation System → Report]
  8. Our choice: use Scala + ODF + LibreOffice
    • Scala
      – Hybrid language: object-oriented + functional programming
      – Runs on the JVM
        • Can use the huge Java-based library ecosystem, and is multi-platform
    • ODF
      – LibreOffice native format
      – Easier to manipulate from code than OOXML (discussed later)
      – Suitable as an intermediate format
    • LibreOffice
      – Can convert from ODF to PDF
  9. LibreOffice as the PDF generator
    • LibreOffice is a feature-rich OSS office suite; it can be used to create all kinds of nice-looking documents
    • It also has powerful PDF generation functions
      – PDF/A
      – Accessibility compliant
      – PDF forms
      – Digital signatures
    • All of this works from the command line, without a GUI
      – Easy to integrate into your own software
      soffice --headless --convert-to pdf *.odt
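A minimal sketch of driving that command from Scala, assuming `soffice` is on the PATH. The object and method names are hypothetical, and `--outdir` is added here only so the PDF lands in a known directory; it is not on the slide.

```scala
import scala.sys.process._

object PdfConvert {
  // Build the command line from the slide, plus --outdir.
  // Assumption: "soffice" can be found on the PATH.
  def convertCommand(input: String, outDir: String): Seq[String] =
    Seq("soffice", "--headless", "--convert-to", "pdf", "--outdir", outDir, input)

  // Run the conversion, block until it finishes, and return the exit code.
  def convert(input: String, outDir: String): Int =
    convertCommand(input, outDir).!
}
```

Passing the arguments as a `Seq` avoids shell quoting problems when file names contain spaces.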
  10. OpenDocument Format (ODF): overview
  11. OpenDocument Format (ODF): overview
    • http://opendocumentformat.org/
    • A “REAL” international standard file format for office productivity suites
      – Standardized by the OASIS Open Document Format for Office Applications TC
      – ISO/IEC 26300
    • Native format of LibreOffice (and its predecessor, OpenOffice.org)
    • Other software can use it because it is an open standard
      – Microsoft Office and Google Drive also support it
    • Simple, human-readable, easy-to-machine-manipulate zipped XML
    • Keeps up with the evolution of the applications
      – Unlike the “pseudo standard,” which is essentially unrevised from the proprietary application document format released in 2007
  12. ODF structure basics
    • Simple, human-readable, easy-to-machine-manipulate zipped XML
      – With some embedded media files
      – The contents of your document are easy to find
    • Same package structure for every application
      – Word processor, spreadsheet, presentation, …
    • Mostly common schema across applications
    • Easier to process than OOXML, which is also zipped XML
  13. Package structure: ODF
    Word Processor:
    ODT
    ├── Configurations2
    │   ├── accelerator
    │   ├── floater
    │   ├── images
    │   │   └── Bitmaps
    │   ├── menubar
    │   ├── popupmenu
    │   ├── progressbar
    │   ├── statusbar
    │   ├── toolbar
    │   └── toolpanel
    ├── META-INF
    │   └── manifest.xml
    ├── Thumbnails
    │   └── thumbnail.png
    ├── content.xml
    ├── manifest.rdf
    ├── meta.xml
    ├── mimetype
    ├── settings.xml
    └── styles.xml
    Spreadsheet:
    ODS
    ├── Configurations2
    │   ├── accelerator
    │   ├── floater
    │   ├── images
    │   │   └── Bitmaps
    │   ├── menubar
    │   ├── popupmenu
    │   ├── progressbar
    │   ├── statusbar
    │   ├── toolbar
    │   └── toolpanel
    ├── META-INF
    │   └── manifest.xml
    ├── Thumbnails
    │   └── thumbnail.png
    ├── content.xml
    ├── manifest.rdf
    ├── meta.xml
    ├── mimetype
    ├── settings.xml
    └── styles.xml
  14. Package structure: OOXML
    Word Processor:
    DOCX
    ├── [Content_Types].xml
    ├── _rels
    ├── docProps
    │   ├── app.xml
    │   └── core.xml
    └── word
        ├── _rels
        │   └── document.xml.rels
        ├── document.xml
        ├── fontTable.xml
        ├── settings.xml
        └── styles.xml
    Spreadsheet:
    XLSX
    ├── [Content_Types].xml
    ├── _rels
    ├── docProps
    │   ├── app.xml
    │   └── core.xml
    └── xl
        ├── _rels
        │   └── workbook.xml.rels
        ├── sharedStrings.xml
        ├── styles.xml
        ├── workbook.xml
        └── worksheets
            └── sheet1.xml
  15. Schema: ODF
    Word Processor:
    <office:document-content xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" ...>
      <office:body>
        <office:text>
          <text:sequence-decls>
            <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
            <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
            <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
            <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
            <text:sequence-decl text:display-outline-level="0" text:name="Figure"/>
          </text:sequence-decls>
          <text:p text:style-name="Standard">THIS IS A TEST TEXT</text:p>
        </office:text>
      </office:body>
    </office:document-content>
    Spreadsheet:
    <office:document-content xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" ...>
      <office:body>
        <office:spreadsheet>
          <table:calculation-settings table:automatic-find-labels="false" table:use-regular-expressions="false" table:use-wildcards="true"/>
          <table:table table:name="Sheet1" table:style-name="ta1">
            <table:table-column table:style-name="co1" table:default-cell-style-name="Default"/>
            <table:table-row table:style-name="ro1">
              <table:table-cell office:value-type="string" calcext:value-type="string">
                <text:p>THIS IS A TEST TEXT</text:p>
              </table:table-cell>
            </table:table-row>
          </table:table>
          <table:named-expressions/>
        </office:spreadsheet>
      </office:body>
    </office:document-content>
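Because both document types keep their text in `text:p` elements, one small reader covers both. The following is a sketch using only the JDK's XML parser; the object and method names are hypothetical, and it assumes the content.xml text has already been read into a string.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

object OdfText {
  // Namespace URI bound to the "text:" prefix in ODF documents.
  val TextNs = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

  // Return the text of every text:p element, in document order.
  def paragraphs(contentXml: String): List[String] = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true) // needed for the *NS lookup below
    val doc = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(contentXml.getBytes(StandardCharsets.UTF_8)))
    val nodes = doc.getElementsByTagNameNS(TextNs, "p")
    (0 until nodes.getLength).map(i => nodes.item(i).getTextContent).toList
  }
}
```

The same call works on a word processor body and on a spreadsheet body, since the cell text is also wrapped in `text:p`.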
  16. Schema: OOXML
    Word Processor:
    <w:document xmlns:o="urn:schemas-microsoft-com:office:office" ...>
      <w:body>
        <w:p>
          <w:pPr>
            <w:pStyle w:val="Normal"/>
            <w:bidi w:val="0"/>
            <w:jc w:val="left"/>
            <w:rPr></w:rPr>
          </w:pPr>
          <w:r>
            <w:rPr></w:rPr>
            <w:t>THIS IS A TEST TEXT</w:t>
          </w:r>
        </w:p>
        ...
      </w:body>
    </w:document>
    Spreadsheet, SharedString.xml:
    <si>
      <t xml:space="preserve">THIS IS A TEST TEXT</t>
    </si>
    Spreadsheet, Sheet1.xml:
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" ...>
      <sheetData>
        <row r="1" customFormat="false" ht="12.8" hidden="false" customHeight="false" outlineLevel="0" collapsed="false">
          <c r="A1" s="1" t="s">
            <v>0</v>
          </c>
        </row>
  17. Manipulate ODF: four major ways
    • Primitive
      – Unzip it, modify the XML, then zip it again; no special tools needed
    • Flat ODF
      – A special representation of ODF: all contents in a single XML file
    • Manipulate LibreOffice via the UNO interface
      – Powerful, but quite heavy
    • ODF manipulation libraries
      – Flexible, lightweight and powerful; most recommended!
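The “primitive” way can be sketched with nothing but the JDK's zip classes. This is an illustration, not the talk's code: all names are hypothetical, the replacement is a plain string substitution (so the new text must already be XML-escaped), and `readAllBytes` assumes Java 9 or later. One detail worth keeping when re-zipping: the ODF package rules expect the `mimetype` entry to come first and to be stored uncompressed.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{CRC32, ZipEntry, ZipInputStream, ZipOutputStream}

object OdfPrimitive {
  // "Unzip": read every entry of the package into memory, keeping the order.
  def readEntries(pkg: Array[Byte]): Vector[(String, Array[Byte])] = {
    val zin = new ZipInputStream(new ByteArrayInputStream(pkg))
    try {
      val out = Vector.newBuilder[(String, Array[Byte])]
      var entry = zin.getNextEntry
      while (entry != null) {
        out += (entry.getName -> zin.readAllBytes()) // reads the current entry only
        entry = zin.getNextEntry
      }
      out.result()
    } finally zin.close()
  }

  // "Zip it again": mimetype goes first and is STORED (not compressed).
  def writeEntries(entries: Seq[(String, Array[Byte])]): Array[Byte] = {
    val (mime, rest) = entries.partition(_._1 == "mimetype")
    val bytes = new ByteArrayOutputStream()
    val zout = new ZipOutputStream(bytes)
    try {
      for ((name, data) <- mime ++ rest) {
        val entry = new ZipEntry(name)
        if (name == "mimetype") {
          // STORED entries need size and CRC set before putNextEntry.
          val crc = new CRC32()
          crc.update(data)
          entry.setMethod(ZipEntry.STORED)
          entry.setSize(data.length.toLong)
          entry.setCrc(crc.getValue)
        }
        zout.putNextEntry(entry)
        zout.write(data)
        zout.closeEntry()
      }
    } finally zout.close()
    bytes.toByteArray
  }

  // "Modify XML": replace a string inside content.xml, leave the rest untouched.
  def replaceInContent(pkg: Array[Byte], from: String, to: String): Array[Byte] =
    writeEntries(readEntries(pkg).map {
      case ("content.xml", data) =>
        "content.xml" -> new String(data, UTF_8).replace(from, to).getBytes(UTF_8)
      case other => other
    })
}
```

For a real file, `java.nio.file.Files.readAllBytes` and `Files.write` can supply and store the byte arrays.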
  18. ODF manipulation libraries
    • http://opendocumentformat.org/developers/
    • https://github.com/search?q=opendocument+format&ref=opensearch
    • There should be several libraries available for your favorite programming language
    • Or you can easily develop your own library, because ODF is so simple
  19. Integration of ODF into the report generation system: our case
  20. Our library choice: jOpenDocument
    • http://www.jopendocument.org/
    • Good template handling with a dedicated extension
    • Simple API
    • A bit old: the latest release was in 2014 (1.4 rc2)
    • But still useful
  21. Use jOpenDocument with SBT
    • Unfortunately, jOpenDocument is not published in a public repository (such as Maven Central)
    • So grab the *.jar and put it in your project's ‘lib’ directory
    • Then SBT automatically recognizes the dependency
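As a sketch, a `build.sbt` for this setup needs no dependency entry for the jar at all. The project name below is a placeholder, and the second setting only spells out what sbt 1.x already does by default.

```scala
// build.sbt (sketch): the project name is a placeholder.
name := "report-generator"

// sbt picks up every *.jar under lib/ as an unmanaged dependency by default,
// so this line only makes that default explicit.
Compile / unmanagedBase := baseDirectory.value / "lib"
```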
  22. Play with template files in resources
    • Put your template file into src/main/resources
    • Then just do this: [code on the slide is not captured in the transcript]
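The slide's own code is not part of this transcript. As a stand-in, here is a sketch of one common way to turn a classpath resource into a `java.io.File`, for libraries that want a file rather than a stream. The names are hypothetical and only JDK calls are used; no jOpenDocument API is shown.

```scala
import java.io.{File, InputStream}
import java.nio.file.{Files, StandardCopyOption}

object TemplateLoader {
  // Copy a stream (for example a classpath resource) to a temporary file.
  def toTempFile(in: InputStream, suffix: String): File = {
    val tmp = Files.createTempFile("template-", suffix)
    try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp.toFile.deleteOnExit()
    tmp.toFile
  }

  // Look up a template placed under src/main/resources by its resource name.
  def load(resourceName: String): File = {
    val in = getClass.getResourceAsStream("/" + resourceName)
    require(in != null, s"resource not found: $resourceName")
    toTempFile(in, ".odt")
  }
}
```

Copying to a temporary file also keeps this working when the application is packaged as a jar, where resources are no longer plain files on disk.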
  23. Prepare template file with jOpenDocument extension
    • If you use LibreOffice 7.0 (which will be released within a week), DO NOT FORGET to save your template as ODF format version “1.2 Extended”
      – ODF 1.3 is the latest standard version of ODF, and a library from 2014 does not support it
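One way to catch a template saved in the wrong version is to look at the `office:version` attribute on the root element of content.xml. The sketch below assumes that attribute reads "1.2" for a template saved as 1.2 Extended; the object and method names are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

object OdfVersion {
  // Namespace URI bound to the "office:" prefix in ODF documents.
  val OfficeNs = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

  // Return the office:version attribute of the root element, if present.
  def of(contentXml: String): Option[String] = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)
    val root = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(contentXml.getBytes(StandardCharsets.UTF_8)))
      .getDocumentElement
    // getAttributeNS returns "" when the attribute is missing.
    Option(root.getAttributeNS(OfficeNs, "version")).filter(_.nonEmpty)
  }
}
```

A build step could then refuse templates for which `OdfVersion.of(...)` is not `Some("1.2")`.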
  24. More practical example
    • Reports have several parts
      – Title
      – Issues list
      – Issue details for each issue
      – End of report (such as disclaimer, contact, …)
    [Diagram: report pages in order: “ABC System Vulnerability Test Report” (title), Issues List, Issue Detail #1 ..., Issue Detail #2 ..., End Of Report]
  25. Project class
    NOTE: This is NOT our actual code, because that is our IP. And I know I shouldn't mix the data model with its output procedure...
  26. Issue class
  27. Report generation
    • Assume that there are “project” and “issues” (a list of Issue instances)
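The classes shown on the slides are not in this transcript, so the following is only a hypothetical data model: every class and field name is invented for illustration. It lays out the report parts in the order the talk lists them, and keeps the data model separate from any output code.

```scala
// Hypothetical data model; none of these names come from the talk's code.
final case class Issue(title: String, severity: String, detail: String)
final case class Project(name: String)

object ReportModel {
  // Section headings in report order: title, issues list,
  // one detail section per issue, then the end-of-report part.
  def sections(project: Project, issues: List[Issue]): List[String] =
    List(s"${project.name} Vulnerability Test Report", "Issues List") ++
      issues.zipWithIndex.map { case (_, i) => s"Issue Detail #${i + 1}" } ++
      List("End Of Report")
}
```

A template-filling step would then walk this list and fill one template part per section.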
  28. Example result
  29. Our reporting system(s)
  30. Future Plan
    • Migrate from jOpenDocument to another library
      – Such as ODFDOM, part of ODF Toolkit
      – ODF Toolkit is an official project by The Document Foundation, home organization of LibreOffice
      – https://odftoolkit.org/odfdom/
    • Even better, re-implement jOpenDocument on top of ODFDOM
  31. Conclusion
  32. Conclusion
    • PDF is the best report file format
    • ODF is great for PDF report generation,
      – using an ODF manipulation library for your favorite programming language
    • In our case, we are happy with Scala + jOpenDocument + ODF + LibreOffice :)
  33. Questions?
    Twitter: @naru0ga, Facebook: naruoga, Telegram: @naruoga
  34. REFERENCE
    • Sample project of Scala + jOpenDocument + ODF (+ LibreOffice)
      – https://github.com/naruoga/jopendocumentsample
      – At this time it has no documentation, not even a README or LICENSE
      – And it might contain unused files
      – But I hope it helps you