
PyData Südwest Meetup KA 2018-06-13


Technology Insight · Document the Data – Creating Reports Using Docs Tooling

As a developer, use your existing skill set regarding programming and documentation to generate technical or business reports from distributed data sources.


Jürgen Hermann

June 13, 2018



  1. Architecture Board Application Monitoring Service Phase 1 – Metrics Gateway

    Jürgen Hermann · PyData Meetup KA · 2018-06
    Technology Insight: Document the Data – Creating Reports Using Docs Tooling
    jhermann [email protected]
  2. Goals & Requirements

    • Mine already existing but distributed data
    • Generate new insights…
      – by federation of isolated knowledge
      – by views specifically designed for different audiences
    • Avoid one-shot reporting efforts
      – Commonly done manually & thus expensive / not amortized
      – Sustainable & Continuous
      – “After the audit is before the audit”
    • Create demand for complete & correct data
      – Provide motivation for data entry and maintenance
      – Data quality is driven by data usage
  3. Basic Solution Idea: Use the Tools You Know (as a Developer)

    • Documentation tools have a lot of similarities with classic reporting tools
    • Most everything is text on the way to the final rendering, and thus easily worked with / debugged
    • Just another application domain of these tools – use existing / easily gained & retained working knowledge
    • Technology stack used here: Python3 · Jinja2 · Sphinx
  4. Python3 – CLI and Model / Controller Logic

    • Provides the foundational and per-report business logic
    • Development speed way more important than runtime performance
    • Full use of the Python software repository (e.g. API clients)
    • One of the big names in Data Science
      – Reuse the eco-system to create your data models
      – Many options to handle complex data models (NumPy/SciPy, ML frameworks, …)
      – Similar variety regarding data visualization
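To make the "CLI plus model logic" role concrete, here is a minimal sketch using only the standard library. The two in-memory dictionaries stand in for isolated data sources (in practice they might be API clients); `SERVICES`, `CONTACTS`, `build_model`, and `main` are hypothetical names for illustration and do not come from the talk.

```python
import argparse
import json

# Hypothetical stand-ins for two isolated data sources (e.g. API clients).
SERVICES = {"billing": {"owner": "team-a"}, "search": {"owner": "team-b"}}
CONTACTS = {"team-a": "alice@example.com", "team-b": "bob@example.com"}


def build_model(service_names):
    """Federate both sources into one report-ready list of dicts."""
    model = []
    for name in sorted(service_names):
        owner = SERVICES[name]["owner"]
        model.append({
            "service": name,
            "owner": owner,
            # Missing contact data is made visible instead of hidden.
            "contact": CONTACTS.get(owner, "unknown"),
        })
    return model


def main(argv=None):
    """CLI entry point: select services, print the assembled model as JSON."""
    parser = argparse.ArgumentParser(description="Assemble report data (sketch)")
    parser.add_argument("services", nargs="*", default=sorted(SERVICES),
                        help="service names to include (default: all)")
    args = parser.parse_args(argv)
    model = build_model(args.services)
    print(json.dumps(model, indent=2))
    return model
```

Calling `main(["billing"])` would restrict the model to one service. The resulting list of plain dicts is the kind of structure a template can consume in the next step.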
  5. Jinja2 – Templating Engine

    • Create target-oriented views on the assembled data
    • Fill data into templates for rendering in the next step
    • Powerful built-in mechanisms
      – Template inclusion and inheritance for consistency
      – Macros to avoid repetition and hide technical complexity
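The inheritance and macro mechanisms named above could look roughly like the following sketch, which renders a small reStructuredText page from assembled data. The template names, the `heading` macro, and `render_report` are assumptions made for illustration; the templates live in a `DictLoader` only to keep the example self-contained, whereas a real setup would more likely load them from files.

```python
from jinja2 import DictLoader, Environment

# Hypothetical templates, kept in memory so the sketch is self-contained.
TEMPLATES = {
    # Base layout shared by all reports (inheritance for consistency).
    "base.rst": "{% block title %}{% endblock %}\n\n{% block body %}{% endblock %}\n",
    # Macro library: hides the reST rule that a heading needs an underline
    # at least as long as its text.
    "macros.rst": (
        "{% macro heading(text, char='=') %}"
        "{{ text }}\n{{ char * (text | length) }}"
        "{% endmacro %}"
    ),
    # Concrete report: one audience-specific view on the assembled data.
    "contacts.rst": (
        "{% extends 'base.rst' %}"
        "{% from 'macros.rst' import heading %}"
        "{% block title %}{{ heading(title) }}{% endblock %}"
        "{% block body %}"
        "{% for row in rows %}* {{ row.service }}: {{ row.owner }}\n{% endfor %}"
        "{% endblock %}"
    ),
}


def render_report(title, rows):
    """Fill the data into the report template and return reST text."""
    env = Environment(loader=DictLoader(TEMPLATES))
    return env.get_template("contacts.rst").render(title=title, rows=rows)
```

Because the result is plain text, it can be printed or diffed before any rendering happens, which is the debugging advantage the deck points out.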
  6. Sphinx – HTML Rendering

    • Typically used for technical software documentation
      – User manuals, API references, …
      – Initially developed for the new Python documentation
    • Renders reStructuredText markup into usable documents
    • Cross-references, glossaries / indexes, themes, …
    • Extensible by plugins (e.g. charts)
    • Output formats: HTML, PDF, Confluence Publishing, …
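As a rough illustration of handing generated text over to Sphinx, the sketch below writes a minimal source directory: a `conf.py`, one `.rst` file per report, and an `index.rst` whose `toctree` links them. The function name, the project title, and the configuration values are placeholders, not the setup used in the talk.

```python
from pathlib import Path

# Minimal Sphinx configuration (sketch); all values are placeholders.
CONF_PY = '''\
project = "Data Reports"
master_doc = "index"
extensions = []
html_theme = "alabaster"
'''

# Landing page; the toctree directive links the generated report pages.
INDEX_RST = '''\
Data Reports
============

.. toctree::
   :maxdepth: 1

{entries}
'''


def write_sphinx_source(target_dir, reports):
    """Write conf.py, one .rst file per report, and index.rst.

    `reports` maps a document name to its reST text. Returns the index path.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    (target / "conf.py").write_text(CONF_PY, encoding="utf-8")
    for name, body in reports.items():
        (target / f"{name}.rst").write_text(body, encoding="utf-8")
    # toctree entries must be indented to belong to the directive.
    entries = "\n".join(f"   {name}" for name in sorted(reports))
    index = target / "index.rst"
    index.write_text(INDEX_RST.format(entries=entries), encoding="utf-8")
    return index
```

With Sphinx installed, running `sphinx-build -b html <source-dir> <output-dir>` on that directory should then produce the HTML pages.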
  7. Use-Cases in the Wild

    • Stakeholders (project contacts in different roles)
    • System Overview (views on architectural + technical data)
    • Progress Reporting (of large multi-team projects)
      – Aggregation by milestones and sub-systems
      – Multiple JIRAs and multiple queues
      – Driven by requirements linked to tasks
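The aggregation by milestones and sub-systems mentioned for progress reporting might be sketched as below. The issue records and their field names (`milestone`, `subsystem`, `done`) are invented for illustration; real data would come from the issue trackers and would need mapping into a comparable shape first.

```python
from collections import defaultdict

# Hypothetical, already-fetched issue records; field names are assumptions.
ISSUES = [
    {"milestone": "M1", "subsystem": "gateway", "done": True},
    {"milestone": "M1", "subsystem": "gateway", "done": False},
    {"milestone": "M1", "subsystem": "storage", "done": True},
    {"milestone": "M2", "subsystem": "gateway", "done": False},
]


def aggregate_progress(issues):
    """Count done/total issues per (milestone, subsystem) and add a percentage."""
    totals = defaultdict(lambda: {"done": 0, "total": 0})
    for issue in issues:
        key = (issue["milestone"], issue["subsystem"])
        totals[key]["total"] += 1
        totals[key]["done"] += int(issue["done"])
    # Sort keys so the report rows come out in a stable order.
    return {
        key: {**counts, "percent": round(100 * counts["done"] / counts["total"])}
        for key, counts in sorted(totals.items())
    }
```

Issues from several trackers can be concatenated into one list before aggregation, as long as they share these fields.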
  8. Challenges…

    • Writing your own configuration for an existing report – relatively easy
    • Otherwise, you need way more time, know-how, and reading the docs
      – Details on upcoming slides…
    • Understanding your source data, and finding a way to access it
  9. Required Know-How

    • As previously mentioned: Python, Jinja2, reStructuredText / Sphinx
    • Working in a command line environment
    • GitLab and GitLab CI
    • Things are only designed for and tested under Linux (Mac OSX & WSL should work though)