PyData Südwest Meetup KA 2018-06-13

Technology Insight · Document the Data – Creating Reports Using Docs Tooling

As a developer, use your existing programming and documentation skills to generate technical or business reports from distributed data sources.

https://www.meetup.com/PyData-Suedwest/events/250368244/

Jürgen Hermann

June 13, 2018

Transcript

  1. Architecture Board
    Application Monitoring Service
    Phase 1 – Metrics Gateway
    Jürgen Hermann
    PyData Meetup KA · 2018-06
    Technology Insight
    Document the Data
    Creating Reports Using Docs Tooling
    jhermann
    [email protected]

  2. Data Reporting & Visualization

  3. Goals & Requirements

    Mine already existing but distributed data

    Generate new insights…
    – by federation of isolated knowledge
    – by views specifically designed for different audiences

    Avoid one-shot reporting efforts
    – Commonly done manually & thus expensive / not amortized
    – Sustainable & Continuous – “After the audit is before the audit”

    Create demand for complete & correct data
    – Provide motivation for data entry and maintenance
    – Data quality is driven by data usage

  4. Basic Solution Idea: Use the Tools You Know (as a Developer)

    Documentation tools have a lot
    of similarities with classic reporting tools

    Almost everything is text on the way to the final rendering,
    and thus easy to work with and debug

    Just another application domain of these tools –
    use existing / easily gained & retained working knowledge

    Technology stack used here: Python3 · Jinja2 · Sphinx

  5. Python3 – CLI and Model / Controller Logic

    Provides the foundational and per-report business logic

    Development speed is far more important than
    runtime performance

    Full use of the Python software repository
    (e.g. API clients)

    One of the big names in Data Science
    – Reuse the ecosystem to create your data models
    – Many options to handle complex data models (NumPy/SciPy, ML frameworks, …)
    – Similar variety regarding data visualization
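
To illustrate the kind of model logic meant here, the following is a small sketch that federates two isolated data sets into one model. The `Stakeholder` class, the `federate` function, and the sample records are hypothetical and only stand in for data fetched through real API clients:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stakeholder:
    """One merged record per person, keyed by a shared login name."""
    login: str
    name: str = ""
    roles: List[str] = field(default_factory=list)


def federate(directory: List[dict], project_roles: List[dict]) -> Dict[str, Stakeholder]:
    """Join two isolated record sets on their common 'login' field."""
    merged: Dict[str, Stakeholder] = {}
    for entry in directory:
        merged[entry["login"]] = Stakeholder(login=entry["login"], name=entry["name"])
    for entry in project_roles:
        # People missing from the directory still get a (nameless) record,
        # which makes gaps in the source data visible in the report.
        person = merged.setdefault(entry["login"], Stakeholder(login=entry["login"]))
        person.roles.append(entry["role"])
    return merged


# Hypothetical sample data standing in for real API responses.
DIRECTORY = [{"login": "alice", "name": "Alice Example"}]
PROJECT_ROLES = [
    {"login": "alice", "role": "Product Owner"},
    {"login": "bob", "role": "Architect"},
]

model = federate(DIRECTORY, PROJECT_ROLES)
```

Keeping incomplete records in the model, rather than dropping them, fits the goal of creating demand for complete and correct data.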

  6. Jinja2 – Templating Engine

    Create target-oriented views on the assembled data

    Fill data into templates for rendering in the next step

    Powerful built-in mechanisms
    – Template inclusion and inheritance for consistency
    – Macros to avoid repetition and hide technical complexity
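
As a minimal sketch of template inheritance and macros, the following renders a small report view to reStructuredText. The template names (`base.rst.j2`, `macros.rst.j2`, `stakeholders.rst.j2`), the `person` macro, and the sample data are assumptions made for illustration, not taken from the talk:

```python
from jinja2 import DictLoader, Environment

# In-memory templates; a real project would more likely load them from files.
TEMPLATES = {
    # Base layout shared by all reports (inheritance keeps them consistent).
    "base.rst.j2": (
        "{{ title }}\n"
        "{{ '=' * (title | length) }}\n"
        "\n"
        "{% block body %}{% endblock %}\n"
    ),
    # A macro hides the markup details of a single list entry.
    "macros.rst.j2": (
        "{% macro person(p) %}"
        "* **{{ p.name }}** ({{ p.roles | join(', ') }})"
        "{% endmacro %}"
    ),
    # The concrete report only fills in the body block.
    "stakeholders.rst.j2": (
        "{% extends 'base.rst.j2' %}\n"
        "{% from 'macros.rst.j2' import person %}\n"
        "{% block body %}\n"
        "{% for p in people %}\n"
        "{{ person(p) }}\n"
        "{% endfor %}\n"
        "{% endblock %}\n"
    ),
}

env = Environment(
    loader=DictLoader(TEMPLATES),
    trim_blocks=True,    # drop the newline after a block tag
    lstrip_blocks=True,  # drop leading whitespace before a block tag
)

rst = env.get_template("stakeholders.rst.j2").render(
    title="Stakeholders",
    people=[
        {"name": "Alice Example", "roles": ["Product Owner"]},
        {"name": "Bob", "roles": ["Architect", "Developer"]},
    ],
)
print(rst)
```

The result is plain reStructuredText, so it can be inspected or diffed before the rendering step.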

  7. Sphinx – HTML Rendering

    Typically used for technical software documentation
    – User manuals, API references, …
    – Initially developed for the new Python documentation

    Renders reStructuredText markup into usable documents

    Cross-references, glossaries / indexes, themes, …

    Extensible by plugins (e.g. charts)

    Output formats: HTML, PDF, Confluence Publishing, …
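
For orientation, a Sphinx project is configured through a `conf.py` file. The fragment below is a minimal sketch with placeholder values (project name, author, and the choice of extension are assumptions, not the setup used in the talk):

```python
# conf.py: minimal Sphinx configuration (all values are placeholders)
project = "Stakeholder Report"
author = "Reporting Team"

# Built-in extension that renders Graphviz diagrams; other plugins are
# enabled the same way, by adding their module names to this list.
extensions = ["sphinx.ext.graphviz"]

# Do not treat previous build output as source files.
exclude_patterns = ["_build"]

# "alabaster" is the theme Sphinx uses by default for HTML output.
html_theme = "alabaster"
```

With such a file next to the generated `.rst` sources, `sphinx-build -b html <sourcedir> <outputdir>` produces the HTML output; other builders are selected through the `-b` option.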

  8. Use-Cases in the Wild

    Stakeholders (project contacts in different roles)

    System Overview (views on architectural + technical data)

    Progress Reporting (of large multi-team projects)
    – Aggregation by milestones and sub-systems
    – Multiple JIRAs and multiple queues
    – Driven by requirements linked to tasks
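
The aggregation step for progress reporting can be sketched as follows. The issue records, field names, and the `progress` function are hypothetical; real data would come from the issue trackers' APIs:

```python
from collections import defaultdict

# Hypothetical, already-fetched issues merged from several trackers.
ISSUES = [
    {"key": "CORE-1", "milestone": "M1", "subsystem": "gateway", "done": True},
    {"key": "CORE-2", "milestone": "M1", "subsystem": "gateway", "done": False},
    {"key": "OPS-7", "milestone": "M1", "subsystem": "dashboard", "done": True},
    {"key": "OPS-9", "milestone": "M2", "subsystem": "gateway", "done": False},
]


def progress(issues):
    """Return {(milestone, subsystem): (done, total)} for the given issues."""
    counts = defaultdict(lambda: [0, 0])
    for issue in issues:
        bucket = counts[(issue["milestone"], issue["subsystem"])]
        bucket[0] += int(issue["done"])  # finished issues in this group
        bucket[1] += 1                   # all issues in this group
    return {group: tuple(pair) for group, pair in sorted(counts.items())}


summary = progress(ISSUES)
for (milestone, subsystem), (done, total) in summary.items():
    print(f"{milestone} / {subsystem}: {done}/{total} done")
```

A dictionary like `summary` is the kind of plain data structure that can be handed to a template for rendering.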

  9. Example: Stakeholders

  10. Writing Your Own Reports

  11. Challenges…

    Writing your own configuration for
    an existing report – relatively easy

    Otherwise, you need far more time and
    know-how, and have to read the docs
    – Details on upcoming slides…

    Understanding your source data, and
    finding a way to access it

  12. Required Know-How

    As previously mentioned:
    Python, Jinja2, reStructuredText / Sphinx

    Working in a command line environment

    GitLab and GitLab CI

    Everything is designed for and tested only under Linux
    (macOS & WSL should work, though)

  13. Technical Details

  14. Reporting Engine Details & General Data Flow

  15. Report Script

  16. Report Template

  17. Report Markup after Injection

  18. Result as Rendered by Sphinx (déjà vu)

  19. Things To Do Next…

  20. Questions? Thank you!