Why ODF is the best intermediate format for report generation systems

Slide for COSCUP 2020 RB105
https://coscup.org/2020/en/agenda/KXPYPW

Naruhiko Ogasawara

August 02, 2020

Transcript

  1. Why ODF is the best
    intermediate format for
    report generation systems
    Naruhiko Ogasawara
    Twitter: @naru0ga
    Facebook: naruoga
    Telegram: @naruoga

  2. 02/08/2020 COSCUP 2020 Day2 2
    Who am I
    ● 小笠原 (OGASAWARA) 徳彦 (Naruhiko)
    – Call me “NARU”
    ● FLOSS lover from Japan
    – LibreOffice, Ubuntu, Selenium, Jenkins, ...
    ● An employee of a security vendor in Japan
    – Internal tools development (like report generation systems)
    – DevSecOps service development

  3. Agenda
    ● PDF as a reporting file format
    ● OpenDocument Format (ODF): overview
    ● Implementation of ODF-based report generation
    system; in our case
    ● Conclusion

  4. PDF as a reporting file format

  5. PDF is the best for reports
    ● Application independent
    ● Environment independent
    ● Suitable for viewing on a monitor and for printing
    ● Discourages casual modification
    ● PDF is the best file format for easy-to-read reports
    that don't require editing

  6. Anyway… in my company
    ● As a security vendor, we do vulnerability testing
    every day
    ● We test customers’ software to find vulnerabilities
    – Sometimes manually
    – Sometimes automatically, with vulnerability scanners
    ● Then we generate PDF reports from the test results

  7. We need a system like this
    [Diagram: Test results + Template → Report Generation System → Report]

  8. Our choice: use Scala + ODF + LibreOffice
    ● Scala
    – Hybrid language: Object Oriented + Functional Programming
    – Runs on the JVM
    ● Can use the huge Java-based library ecosystem and is multi-platform
    ● ODF
    – LibreOffice native format
    – Easier to manipulate from code than OOXML (discussed later)
    – Suitable as an intermediate format
    ● LibreOffice
    – Can convert ODF to PDF

  9. LibreOffice as the PDF generator
    ● LibreOffice is a feature-rich OSS office suite;
    it can be used to create all kinds of nice-looking documents
    ● It also has powerful PDF generation functions
    – PDF/A
    – Accessibility compliant
    – PDF forms
    – Digital signature
    ● All of this works from the command line, without a GUI
    – Easy to integrate into your own software
    soffice --headless --convert-to pdf *.odt
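
    The command above can be wrapped for use from a program. The following is a minimal sketch in Python (standard library only, not the Scala used in the talk). The helper names build_convert_command and convert_to_pdf are hypothetical, and the sketch assumes that soffice is on PATH and that the --outdir option is used to choose the output directory.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(odt_path, out_dir, soffice="soffice"):
    """Build the argument list for a headless ODF-to-PDF conversion."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(odt_path),
    ]


def convert_to_pdf(odt_path, out_dir, soffice="soffice", timeout=120):
    """Run LibreOffice without a GUI and return the expected PDF path."""
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} was not found on PATH")
    subprocess.run(
        build_convert_command(odt_path, out_dir, soffice),
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # assumption: two minutes is enough for one report
    )
    # The output file takes the input's base name with a .pdf suffix.
    return Path(out_dir) / (Path(odt_path).stem + ".pdf")
```

    Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.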

  10. OpenDocument Format (ODF): overview

  11. OpenDocument Format (ODF): overview
    ● http://opendocumentformat.org/
    ● “REAL” International Standard file format for office productivity suites
    – Standardized by OASIS, Open Document Format for Office Applications TC
    – ISO/IEC 26300
    ● LibreOffice (and its predecessor, OpenOffice.org) native format
    ● Other software can use it because it is an open standard
    – Microsoft Office and Google Drive also support it
    ● Simple, human-readable, easy to machine-manipulate zipped XML
    ● Keeps up with the evolution of the applications
    – Unlike the “pseudo standard,” which is essentially unchanged from the proprietary application
    document format released in 2007

  12. ODF structure basics
    ● Simple, human-readable, easy to machine-manipulate zipped
    XML
    – With some embedded media files
    – Easy to find the contents of your document
    ● Same package structure for each application
    – Wordprocessor, Spreadsheet, Presentation, …
    ● Mostly common schema across applications
    ● Easier to process than OOXML, which is also zipped XML

  13. Package structure: ODF
    Word Processor: ODT
    ├── Configurations2
    │   ├── accelerator
    │   ├── floater
    │   ├── images
    │   │   └── Bitmaps
    │   ├── menubar
    │   ├── popupmenu
    │   ├── progressbar
    │   ├── statusbar
    │   ├── toolbar
    │   └── toolpanel
    ├── META-INF
    │   └── manifest.xml
    ├── Thumbnails
    │   └── thumbnail.png
    ├── content.xml
    ├── manifest.rdf
    ├── meta.xml
    ├── mimetype
    ├── settings.xml
    └── styles.xml
    Spreadsheet: ODS
    ├── Configurations2
    │   ├── accelerator
    │   ├── floater
    │   ├── images
    │   │   └── Bitmaps
    │   ├── menubar
    │   ├── popupmenu
    │   ├── progressbar
    │   ├── statusbar
    │   ├── toolbar
    │   └── toolpanel
    ├── META-INF
    │   └── manifest.xml
    ├── Thumbnails
    │   └── thumbnail.png
    ├── content.xml
    ├── manifest.rdf
    ├── meta.xml
    ├── mimetype
    ├── settings.xml
    └── styles.xml
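
    One way to inspect this layout is to list the entries of the zip package. The sketch below uses Python's standard zipfile module (not the talk's Scala stack). make_minimal_package and describe_package are hypothetical helper names, and the in-memory package is only a stand-in with placeholder XML, not a valid document.

```python
import io
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def make_minimal_package():
    """Build a tiny ODF-like zip in memory (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # ODF packages put 'mimetype' first and store it uncompressed.
        zf.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name in ("content.xml", "styles.xml", "meta.xml",
                     "META-INF/manifest.xml"):
            zf.writestr(name, "<placeholder/>",
                        compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def describe_package(data):
    """Return (mimetype, sorted entry names) for a package given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        mimetype = zf.read("mimetype").decode("ascii")
        return mimetype, sorted(zf.namelist())


if __name__ == "__main__":
    mimetype, names = describe_package(make_minimal_package())
    print(mimetype)
    for name in names:
        print(" ", name)
```

    To look at a real document, the bytes of an .odt or .ods file saved by LibreOffice can be passed to describe_package instead.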

  14. Package structure: OOXML
    Word Processor: DOCX
    ├── [Content_Types].xml
    ├── _rels
    ├── docProps
    │   ├── app.xml
    │   └── core.xml
    └── word
        ├── _rels
        │   └── document.xml.rels
        ├── document.xml
        ├── fontTable.xml
        ├── settings.xml
        └── styles.xml
    Spreadsheet: XLSX
    ├── [Content_Types].xml
    ├── _rels
    ├── docProps
    │   ├── app.xml
    │   └── core.xml
    └── xl
        ├── _rels
        │   └── workbook.xml.rels
        ├── sharedStrings.xml
        ├── styles.xml
        ├── workbook.xml
        └── worksheets
            └── sheet1.xml

  15. Schema: ODF
    [Two XML listings, only partly preserved in this transcript]
    Word Processor: root element with xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" ...;
    declarations with outline-level="0" and text:name="Illustration", "Table", "Text", "Drawing", "Figure";
    body text: THIS IS A TEST TEXT
    Spreadsheet: root element with xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" ...;
    table:automatic-find-labels="false" table:use-regular-expressions="false" table:use-wildcards="true";
    table:style-name="ta1"; name="co1" table:default-cell-style-name="Default"; name="ro1";
    cell with type="string" calcext:value-type="string", text: THIS IS A TEST TEXT
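
    Because the text sits directly in content.xml, it can be read with a plain XML parser. This is a minimal sketch in Python's standard library. SAMPLE_CONTENT is a hand-written, heavily reduced stand-in for a real content.xml (an assumption, not the file from the slide), and paragraphs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the ODF specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Reduced stand-in for the content.xml of a text document.
SAMPLE_CONTENT = """\
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <office:text>
      <text:p>THIS IS A TEST TEXT</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""


def paragraphs(content_xml):
    """Return the plain text of every text:p element, in document order."""
    root = ET.fromstring(content_xml)
    # itertext() also collects text nested in child elements such as spans.
    return ["".join(p.itertext()) for p in root.iterfind(".//text:p", NS)]


if __name__ == "__main__":
    print(paragraphs(SAMPLE_CONTENT))
```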







  16. Schema: OOXML
    [Two XML listings, only partly preserved in this transcript]
    Word Processor: document text: THIS IS A TEST TEXT ...
    Spreadsheet, SharedString.xml: THIS IS A TEST TEXT
    Spreadsheet, Sheet1.xml: standalone="yes"; xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" ...;
    row with ht="12.8" hidden="false" customHeight="false" outlineLevel="0" collapsed="false";
    cell value 0 (an index into the shared strings, not the text itself)

  17. Manipulate ODF: four major ways
    ● Primitive
    – Unzip it, modify XML, then zip it again; no special tools needed
    ● Flat ODF
    – Special representation of ODF: all contents as a single XML file
    ● Manipulate LibreOffice via UNO interface
    – Powerful, but quite heavy
    ● ODF manipulation libraries
    – Flexible, lightweight and powerful, most recommended!
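
    The “primitive” way can be sketched in a few lines. The example below is Python standard library only. The {{key}} placeholder syntax and the name fill_placeholders are assumptions made for illustration; they are not part of ODF or of any library named in this talk.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_placeholders(odf_bytes, values):
    """Return a new ODF package with {{key}} markers in content.xml replaced."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        # infolist() keeps the original entry order, so 'mimetype' stays first.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so the result is still well-formed XML.
                    text = text.replace("{{" + key + "}}", escape(value))
                data = text.encode("utf-8")
            # 'mimetype' must stay uncompressed; everything else is deflated.
            method = (zipfile.ZIP_STORED if info.filename == "mimetype"
                      else zipfile.ZIP_DEFLATED)
            dst.writestr(info.filename, data, compress_type=method)
    return out.getvalue()
```

    One caveat with plain string replacement: an editor may split a marker typed into a document across several XML elements, in which case the marker is not found. That is one reason a dedicated library is the more robust choice.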

  18. ODF manipulation libraries
    ● http://opendocumentformat.org/developers/
    ● https://github.com/search?q=opendocument+format&ref=opensearch
    ● There should be several libraries available for your
    favorite programming language
    ● Or you can easily develop your own library, because
    ODF is so simple

  19. Integration of ODF into the report
    generation system;
    in our case

  20. Our library choice: jOpenDocument
    ● http://www.jopendocument.org/
    ● Good template handling with its dedicated extension
    ● Simple API
    ● A bit old: the latest release was in 2014 (1.4 rc2)
    ● But still useful

  21. Use jOpenDocument with SBT
    ● Unfortunately,
    jOpenDocument has not been
    published in a public repo
    (like Maven Central)
    ● So grab the *.jar and put it in
    your project's ‘lib’ dir
    ● Then SBT automatically
    recognizes the dependency

  22. Play with template files in resources
    ● Put your template file into src/main/resources
    ● Then just do this:

  23. Prepare template file with jOpenDocument extension
    ● If you use LibreOffice 7.0
    (which will be released within
    a week),
    DO NOT FORGET to save your
    template as ODF format
    version “1.2 Extended”
    – ODF 1.3 is the latest standard
    version of ODF,
    which is not supported by
    a library from 2014
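
    A template saved in the wrong version can be caught before it reaches the library by reading the office:version attribute on the root element of content.xml. This is a minimal sketch in Python's standard library. odf_version and check_template are hypothetical helper names, and the default list of accepted versions is an assumption about what an older library can handle, not something stated by jOpenDocument.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def odf_version(odf_bytes):
    """Return the office:version of content.xml's root element, or None."""
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    # ElementTree spells namespaced attribute names as {uri}localname.
    return root.get(f"{{{OFFICE_NS}}}version")


def check_template(odf_bytes, supported=("1.0", "1.1", "1.2")):
    """Raise ValueError unless the template declares a supported version."""
    version = odf_version(odf_bytes)
    if version not in supported:
        raise ValueError(
            f"template declares ODF version {version!r}; "
            f"re-save it as one of {supported}"
        )
    return version
```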

  24. More practical example
    ● Reports have several parts
    – Title
    – Issues list
    – Issue details for each issue
    – End of report (such as disclaimer, contact, …)
    [Diagram: report layout, in order: title page “ABC System Vulnerability Test Report”, Issues List, Issue Detail #1 ..., Issue Detail #2 ..., End Of Report]

  25. Project class
    NOTE: This is NOT our actual code, because that is our IP,
    and I know I shouldn’t mix the data model with its output procedure...

  26. Issue class

  27. Report generation
    ● Assume that there are “project” and “issues” (list
    of Issue instances)
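
    The code shown on the slides is not part of this transcript. As a rough illustration of the idea only, the sketch below (Python, standard library) orders the report parts from a project and a list of issues. Every class, field, and function name here is hypothetical, it is not the speaker's Scala code, and it does not use jOpenDocument.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str  # hypothetical field: name of the tested system


@dataclass
class Issue:
    title: str        # hypothetical field
    description: str  # hypothetical field


def report_outline(project, issues):
    """Return the report parts in output order as (part, heading) pairs."""
    outline = [
        ("title", f"{project.name} Vulnerability Test Report"),
        ("issues_list", "Issues List"),
    ]
    # One detail part per issue, numbered from 1 in list order.
    for number, issue in enumerate(issues, start=1):
        outline.append(
            ("issue_detail", f"Issue Detail #{number}: {issue.title}")
        )
    outline.append(("end", "End Of Report"))
    return outline
```

    In a real generator, each pair would select a template and fill it with the corresponding data before the parts are joined into one document.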

  28. Example result

  29. Our reporting system(s)

  30. Future Plan
    ● Migrate from jOpenDocument to another library
    – Such as ODFDOM, part of the ODF Toolkit
    – ODF Toolkit is an official project of The Document
    Foundation, the home organization of LibreOffice
    – https://odftoolkit.org/odfdom/
    ● Even better, re-implement jOpenDocument on top
    of ODFDOM

  31. Conclusion

  32. Conclusion
    ● PDF is the best report file format
    ● ODF is great for PDF report generation,
    – using an ODF manipulation library for your favorite
    programming language
    ● In our case, we are happy with Scala +
    jOpenDocument + ODF + LibreOffice :)

  33. Questions?
    Twitter: @naru0ga
    Facebook: naruoga
    Telegram: @naruoga

  34. REFERENCE
    ● Sample project of Scala + jOpenDocument + ODF (+
    LibreOffice)
    – https://github.com/naruoga/jopendocumentsample
    – At this time, it has no documentation, not even a README or LICENSE
    – It might also contain unused files
    – But I hope it helps you