Slide 1

Slide 1 text

Why ODF is the best intermediate format for report generation systems
Naruhiko Ogasawara
Twitter: @naru0ga
Facebook: naruoga
Telegram: @naruoga

Slide 2

Slide 2 text

02/08/2020 COSCUP 2020 Day2
Who am I
● 小笠原 (OGASAWARA) 徳彦 (Naruhiko)
  – Call me “NARU”
● FLOSS lover from Japan
  – LibreOffice, Ubuntu, Selenium, Jenkins, ...
● An employee of a security vendor in Japan
  – Internal tools development (such as report generation systems)
  – DevSecOps service development

Slide 3

Slide 3 text

Agenda
● PDF as a reporting file format
● OpenDocument Format (ODF): overview
● Implementation of an ODF-based report generation system: our case
● Conclusion

Slide 4

Slide 4 text

PDF as a reporting file format

Slide 5

Slide 5 text

PDF is the best for reports
● Application independent
● Environment independent
● Suitable for viewing on a monitor and for printing
● Discourages casual modification
● PDF is the best file format for easy-to-read reports that don't require editing

Slide 6

Slide 6 text

Anyway… in my company
● As a security vendor, we do vulnerability testing every day
● We test customers’ software to find vulnerabilities
  – Sometimes manually
  – Sometimes automatically, with vulnerability scanners
● Then we generate PDF reports from the test results

Slide 7

Slide 7 text

We need a system like this
[Diagram: Test results + Template → Report Generation System → Report]

Slide 8

Slide 8 text

Our choice: use Scala + ODF + LibreOffice
● Scala
  – Hybrid language: object-oriented + functional programming
  – Runs on the JVM
    ● Can use the huge Java-based library ecosystem, and is multi-platform
● ODF
  – LibreOffice native format
  – Easier to manipulate from code than OOXML (discussed later)
  – Suitable as an intermediate format
● LibreOffice
  – Can convert from ODF to PDF

Slide 9

Slide 9 text

LibreOffice as the PDF generator
● LibreOffice is a feature-rich OSS office suite; it can be used to create all kinds of nice-looking documents
● It also has powerful PDF generation functions
  – PDF/A
  – Accessibility compliant
  – PDF forms
  – Digital signature
● All of this works from the command line, without a GUI
  – Easy to integrate into your own software

    soffice --headless --convert-to pdf *.odt
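The command-line conversion above can be driven from Scala with the standard `scala.sys.process` package. This is a minimal sketch, not the speaker's code: the object and method names are made up for illustration, it assumes `soffice` is on the PATH, and it uses the `--outdir` option of `soffice` to choose where the PDF is written.

```scala
import java.nio.file.{Path, Paths}
import scala.sys.process._

object PdfConvert {
  // Build the argument list for a headless conversion of one ODF file to PDF.
  // Passing arguments as a Seq avoids shell quoting problems with spaces in paths.
  def command(input: Path, outDir: Path, soffice: String = "soffice"): Seq[String] =
    Seq(soffice, "--headless", "--convert-to", "pdf",
        "--outdir", outDir.toString, input.toString)

  // Run the conversion and return the process exit code.
  // Requires a LibreOffice installation; not exercised by the check below.
  def convert(input: Path, outDir: Path): Int =
    command(input, outDir).!
}
```

Keeping the command construction separate from the process call makes it easy to check the arguments without having LibreOffice installed.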

Slide 10

Slide 10 text

OpenDocument Format (ODF): overview

Slide 11

Slide 11 text

OpenDocument Format (ODF): overview
● http://opendocumentformat.org/
● A “REAL” international standard file format for office productivity suites
  – Standardized by the OASIS Open Document Format for Office Applications TC
  – ISO/IEC 26300
● Native format of LibreOffice (and its predecessor, OpenOffice.org)
● Other software can use it because it is an open standard
  – Microsoft Office and Google Drive also support it
● Simple, human-readable, easy-to-machine-manipulate zipped XML
● Keeps up with the evolution of the applications
  – Unlike the “pseudo standard,” which is essentially unrevised from the proprietary application document format released in 2007

Slide 12

Slide 12 text

ODF structure basics
● Simple, human-readable, easy-to-machine-manipulate zipped XML
  – With some embedded media files
  – The contents of your document are easy to find
● Same package structure for each application
  – Word processor, spreadsheet, presentation, …
● Mostly common schema across applications
● Easier to process than OOXML, which is also zipped XML

Slide 13

Slide 13 text

Package structure: ODF

Word Processor (ODT)
├── Configurations2
│   ├── accelerator
│   ├── floater
│   ├── images
│   │   └── Bitmaps
│   ├── menubar
│   ├── popupmenu
│   ├── progressbar
│   ├── statusbar
│   ├── toolbar
│   └── toolpanel
├── META-INF
│   └── manifest.xml
├── Thumbnails
│   └── thumbnail.png
├── content.xml
├── manifest.rdf
├── meta.xml
├── mimetype
├── settings.xml
└── styles.xml

Spreadsheet (ODS)
├── Configurations2
│   ├── accelerator
│   ├── floater
│   ├── images
│   │   └── Bitmaps
│   ├── menubar
│   ├── popupmenu
│   ├── progressbar
│   ├── statusbar
│   ├── toolbar
│   └── toolpanel
├── META-INF
│   └── manifest.xml
├── Thumbnails
│   └── thumbnail.png
├── content.xml
├── manifest.rdf
├── meta.xml
├── mimetype
├── settings.xml
└── styles.xml
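Because an ODF file is an ordinary zip archive, the package layout can be inspected with nothing but `java.util.zip`. The sketch below is illustrative only: `OdfPackage`, `entryNames`, and `samplePackage` are hypothetical names, and the in-memory sample is a tiny stand-in with placeholder XML, not a valid document.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object OdfPackage {
  // Return the entry names of a zip stream, in the order they appear in the package.
  def entryNames(in: InputStream): List[String] = {
    val zin = new ZipInputStream(in)
    try Iterator.continually(zin.getNextEntry).takeWhile(_ != null).map(_.getName).toList
    finally zin.close()
  }

  // Build a tiny stand-in package in memory so the sketch runs without a real file.
  def samplePackage(): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val zout = new ZipOutputStream(bytes)
    for ((name, body) <- Seq(
      "mimetype" -> "application/vnd.oasis.opendocument.text",
      "content.xml" -> "<placeholder/>",
      "styles.xml" -> "<placeholder/>")) {
      zout.putNextEntry(new ZipEntry(name))
      zout.write(body.getBytes("UTF-8"))
      zout.closeEntry()
    }
    zout.close()
    bytes.toByteArray
  }
}
```

For a real document, pass `new java.io.FileInputStream("report.odt")` (a hypothetical path) to `entryNames` and the listing should resemble the trees above.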

Slide 14

Slide 14 text

Package structure: OOXML

Word Processor (DOCX)
├── [Content_Types].xml
├── _rels
├── docProps
│   ├── app.xml
│   └── core.xml
└── word
    ├── _rels
    │   └── document.xml.rels
    ├── document.xml
    ├── fontTable.xml
    ├── settings.xml
    └── styles.xml

Spreadsheet (XLSX)
├── [Content_Types].xml
├── _rels
├── docProps
│   ├── app.xml
│   └── core.xml
└── xl
    ├── _rels
    │   └── workbook.xml.rels
    ├── sharedStrings.xml
    ├── styles.xml
    ├── workbook.xml
    └── worksheets
        └── sheet1.xml

Slide 15

Slide 15 text

Schema: ODF
Word Processor: THIS IS A TEST TEXT
Spreadsheet: THIS IS A TEST TEXT
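In ODF, paragraph text sits in `text:p` elements, both in a word processor document and inside spreadsheet cells, so one small extractor covers both. The following is a sketch using the JDK's DOM parser; `OdfText` and `paragraphs` are hypothetical names, and the sample XML is a hand-written, heavily trimmed stand-in for a real `content.xml`, not the markup from the slide.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

object OdfText {
  // Namespace URI of the ODF "text" vocabulary.
  val TextNs = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

  // Collect the text of every <text:p> element in a content.xml string.
  def paragraphs(contentXml: String): List[String] = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true) // needed for the namespace-based lookup below
    val doc = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(contentXml.getBytes("UTF-8")))
    val nodes = doc.getElementsByTagNameNS(TextNs, "p")
    (0 until nodes.getLength).map(i => nodes.item(i).getTextContent).toList
  }

  // Hand-written stand-in for a word processor content.xml (assumption, not real output).
  val sample: String =
    """<office:document-content
      |    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
      |    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
      |  <office:body><office:text>
      |    <text:p>THIS IS A TEST TEXT</text:p>
      |  </office:text></office:body>
      |</office:document-content>""".stripMargin
}
```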

Slide 16

Slide 16 text

Schema: OOXML
Word Processor: THIS IS A TEST TEXT ...
Spreadsheet:
  sharedStrings.xml: THIS IS A TEST TEXT
  sheet1.xml: 0

Slide 17

Slide 17 text

Manipulate ODF: four major ways
● Primitive
  – Unzip it, modify the XML, then zip it again; no special tools needed
● Flat ODF
  – A special representation of ODF: all contents in a single XML file
● Manipulate LibreOffice via the UNO interface
  – Powerful, but quite heavy
● ODF manipulation libraries
  – Flexible, lightweight, and powerful; most recommended!
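The "primitive" way can be sketched with the JDK alone: read every zip entry, patch `content.xml`, and write the entries back. All names here (`PrimitiveEdit`, the `@NAME@` placeholder) are hypothetical. The sketch assumes Java 9 or later for `readAllBytes`, stores the `mimetype` entry uncompressed as ODF packages expect, and does a naive string replacement, so the replacement text would need XML escaping in real use.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{CRC32, ZipEntry, ZipInputStream, ZipOutputStream}

object PrimitiveEdit {
  // "Unzip": read every entry into memory, preserving package order.
  def readAll(pkg: Array[Byte]): List[(String, Array[Byte])] = {
    val zin = new ZipInputStream(new ByteArrayInputStream(pkg))
    try Iterator.continually(zin.getNextEntry).takeWhile(_ != null)
      .map(e => e.getName -> zin.readAllBytes()).toList
    finally zin.close()
  }

  // "Zip again": write entries back in the given order.
  def writeAll(entries: List[(String, Array[Byte])]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val zout = new ZipOutputStream(bytes)
    for ((name, data) <- entries) {
      val entry = new ZipEntry(name)
      if (name == "mimetype") {
        // STORED entries need size and CRC set up front.
        val crc = new CRC32()
        crc.update(data)
        entry.setMethod(ZipEntry.STORED)
        entry.setSize(data.length.toLong)
        entry.setCompressedSize(data.length.toLong)
        entry.setCrc(crc.getValue)
      }
      zout.putNextEntry(entry)
      zout.write(data)
      zout.closeEntry()
    }
    zout.close()
    bytes.toByteArray
  }

  // "Modify XML": replace a literal placeholder inside content.xml only.
  def replaceInContent(pkg: Array[Byte], from: String, to: String): Array[Byte] =
    writeAll(readAll(pkg).map {
      case ("content.xml", data) =>
        "content.xml" -> new String(data, "UTF-8").replace(from, to).getBytes("UTF-8")
      case other => other
    })
}
```

This works for simple substitutions, but anything structural (repeating a section per issue, for example) is where a manipulation library starts to pay off.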

Slide 18

Slide 18 text

ODF manipulation libraries
● http://opendocumentformat.org/developers/
● https://github.com/search?q=opendocument+format&ref=opensearch
● Libraries should be available for your favorite programming language
● Or you can easily develop your own library, because ODF is so simple

Slide 19

Slide 19 text

Integration of ODF into the report generation system: our case

Slide 20

Slide 20 text

Our library choice: jOpenDocument
● http://www.jopendocument.org/
● Good template handling with a dedicated extension
● Simple API
● A bit old: the latest release was in 2014 (1.4 rc2)
● But still useful

Slide 21

Slide 21 text

Use jOpenDocument with SBT
● Unfortunately, jOpenDocument is not published in a public repository (such as Maven Central)
● So grab the *.jar and put it in your project's ‘lib’ directory
● SBT then recognizes the dependency automatically
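As a sketch of what that looks like in a build definition: sbt treats jars under `lib/` as unmanaged dependencies by default, so no `libraryDependencies` entry is needed. The project name and Scala version below are placeholders, and the `unmanagedBase` line only restates the default to make it visible.

```scala
// build.sbt (illustrative; name and version are placeholders)
name := "report-generator"
scalaVersion := "2.13.12"

// Jars placed in lib/ are picked up automatically as unmanaged dependencies.
// This setting just spells out sbt's default directory:
unmanagedBase := baseDirectory.value / "lib"
```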

Slide 22

Slide 22 text

Play with template files in resources
● Put your template file into src/main/resources
● Then just do this:
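The code from this slide did not survive extraction, so the following is only a guess at the general shape, using the JDK alone: look the template up on the classpath and copy it to a temporary file, assuming the template library wants a `java.io.File`. `TemplateLoader` and the resource name in the comment are hypothetical, and no jOpenDocument call is shown because its exact API is not given in the source.

```scala
import java.io.{File, InputStream}
import java.nio.file.{Files, StandardCopyOption}

object TemplateLoader {
  // Copy a template stream to a temporary file and close the stream afterwards.
  def toTempFile(in: InputStream, suffix: String = ".odt"): File = {
    val tmp = Files.createTempFile("template-", suffix)
    try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp.toFile.deleteOnExit()
    tmp.toFile
  }

  // Look up a classpath resource such as "/report-template.odt" (hypothetical name).
  // Returns None when the resource is not on the classpath.
  def fromResource(name: String): Option[File] =
    Option(getClass.getResourceAsStream(name)).map(in => toTempFile(in))
}
```

Files under src/main/resources end up at the root of the classpath, which is why the resource name starts with a slash.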

Slide 23

Slide 23 text

Prepare template file with jOpenDocument extension
● If you use LibreOffice 7.0 (which will be released within a week), DO NOT FORGET to save your template as ODF format version “1.2 Extended”
  – ODF 1.3 is the latest standard version of ODF, and it is not supported by a library from 2014
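One way to catch a template saved in the wrong version early is to read the `office:version` attribute on the root element of `content.xml`. This is a sketch with hypothetical names (`OdfVersion`, `isOdf13`), and it assumes the saved file declares its version there as "1.2" or "1.3"; verify against your own files before relying on it.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

object OdfVersion {
  // Namespace URI of the ODF "office" vocabulary.
  val OfficeNs = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

  // Read office:version from the root element; None if the attribute is absent.
  def of(xml: String): Option[String] = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)
    val root = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
      .getDocumentElement
    // getAttributeNS returns an empty string when the attribute is missing.
    Option(root.getAttributeNS(OfficeNs, "version")).filter(_.nonEmpty)
  }

  // True when the document declares ODF 1.3, which the 2014 library predates.
  def isOdf13(xml: String): Boolean = of(xml).contains("1.3")
}
```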

Slide 24

Slide 24 text

More practical example
● Reports have several parts
  – Title
  – Issues list
  – Issue details for each issue
  – End of report (such as disclaimer, contact, …)

[Diagram: ABC System Vulnerability Test Report → Issues List → Issue Detail #1 ... → Issue Detail #2 ... → End Of Report]

Slide 25

Slide 25 text

Project class
NOTE: This is NOT our actual code, because that is our IP. And I know I shouldn’t mix the data model with its output procedure...

Slide 26

Slide 26 text

Issue class

Slide 27

Slide 27 text

Report generation
● Assume that there are “project” and “issues” (a list of Issue instances)
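The slide's code is not in this transcript, so here is a purely hypothetical sketch of the shape such a step could take: a `Project`, a list of `Issue` values, and a function that lays out the report parts in order (title, issues list, one detail section per issue, end of report). The field names are invented for illustration, and the output is plain strings rather than ODF, so the ordering logic stays visible.

```scala
// Hypothetical data model; field names are illustrative, not the speaker's code.
final case class Issue(id: Int, title: String, severity: String, detail: String)
final case class Project(name: String)

object ReportOutline {
  // Lay out the report parts in order: title, issues list, one detail per issue, end.
  def sections(project: Project, issues: List[Issue]): List[String] = {
    val title = s"${project.name} Vulnerability Test Report"
    val list = "Issues List: " + issues.map(i => s"#${i.id} ${i.title}").mkString(", ")
    val details = issues.map(i => s"Issue Detail #${i.id}: ${i.title} [${i.severity}] ${i.detail}")
    (title :: list :: details) :+ "End Of Report"
  }
}
```

In a real system each string would instead be a fragment produced from an ODF template and appended to the output document.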

Slide 28

Slide 28 text

Example result

Slide 29

Slide 29 text

Our reporting system(s)

Slide 30

Slide 30 text

Future Plan
● Migrate from jOpenDocument to another library
  – Such as ODFDOM, part of ODF Toolkit
  – ODF Toolkit is an official project of The Document Foundation, the home organization of LibreOffice
  – https://odftoolkit.org/odfdom/
● Even better, re-implement jOpenDocument on top of ODFDOM

Slide 31

Slide 31 text

Conclusion

Slide 32

Slide 32 text

Conclusion
● PDF is the best report file format
● ODF is great for PDF report generation,
  – using ODF manipulation libraries for your favorite programming language
● In our case, we are happy with Scala + jOpenDocument + ODF + LibreOffice :)

Slide 33

Slide 33 text

Questions?
Twitter: @naru0ga
Facebook: naruoga
Telegram: @naruoga

Slide 34

Slide 34 text

REFERENCE
● Sample project of Scala + jOpenDocument + ODF (+ LibreOffice)
  – https://github.com/naruoga/jopendocumentsample
  – At this time it has no documentation, not even a README or LICENSE
  – And it might have unused files
  – But I hope it helps you