Technology Insight · Document the Data – Creating Reports Using Docs Tooling
As a developer, use your existing programming and documentation skills to generate technical or business reports from distributed data sources.
Jürgen Hermann (jhermann) · PyData Meetup KA · 2018-06
Goals & Requirements
● Mine existing but distributed data
● Generate new insights…
  – by federating isolated knowledge
  – by views designed specifically for different audiences
● Avoid one-shot reporting efforts
  – Commonly done manually, and thus expensive and never amortized
  – Aim for sustainable, continuous reporting instead
  – “After the audit is before the audit”
● Create demand for complete & correct data
  – Provide motivation for data entry and maintenance
  – Data quality is driven by data usage
Basic Solution Idea: Use the Tools You Know (as a Developer)
● Documentation tools have a lot in common with classic reporting tools
● Almost everything is text on the way to the final rendering, and thus easy to work with and debug
● Reporting is just another application domain for these tools: use existing, or easily gained and retained, working knowledge
● Technology stack used here: Python3 · Jinja2 · Sphinx
Python3 – CLI and Model / Controller Logic
● Provides the foundational and per-report business logic
● Development speed is far more important than runtime performance
● Full use of the Python package repository (e.g. API clients)
● One of the big names in Data Science
  – Reuse the ecosystem to create your data models
  – Many options to handle complex data models (NumPy/SciPy, ML frameworks, …)
  – Similar variety regarding data visualization
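The split into CLI, model, and controller logic might look like the following minimal sketch. All names here (`reportgen`, `Contact`, `load_contacts`, the `stakeholders` sub-command, and the sample records) are hypothetical and not taken from the talk; real reports would fetch their records through API clients instead of an in-memory list.

```python
import argparse
from dataclasses import dataclass


@dataclass
class Contact:
    """One row of the report's data model (hypothetical fields)."""
    name: str
    role: str
    project: str


def load_contacts(raw_records):
    """Model step: normalize raw records into typed objects.

    A real report would pull these records from API clients; they are
    passed in directly so the sketch stays self-contained.
    """
    return [
        Contact(r["name"].strip(), r.get("role", "unknown"), r["project"])
        for r in raw_records
    ]


def build_parser():
    """CLI step: one sub-command per report keeps per-report logic apart."""
    parser = argparse.ArgumentParser(prog="reportgen")
    sub = parser.add_subparsers(dest="report", required=True)
    stakeholders = sub.add_parser("stakeholders", help="list project contacts")
    stakeholders.add_argument("--project", help="only include this project")
    return parser


def run(argv, raw_records):
    """Controller step: parse arguments, build the model, select view data."""
    args = build_parser().parse_args(argv)
    contacts = load_contacts(raw_records)
    if args.project:
        contacts = [c for c in contacts if c.project == args.project]
    return sorted(contacts, key=lambda c: (c.project, c.name))


# Hypothetical sample data standing in for an API response.
SAMPLE = [
    {"name": " Ada ", "role": "architect", "project": "gateway"},
    {"name": "Bob", "project": "portal"},
]

if __name__ == "__main__":
    for contact in run(["stakeholders", "--project", "gateway"], SAMPLE):
        print(contact)
```

Keeping `run` free of printing and file access means the same controller can feed any template or be exercised directly in tests.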
Jinja2 – Templating Engine
● Create target-oriented views on the assembled data
● Fill data into templates for rendering in the next step
● Powerful built-in mechanisms
  – Template inclusion and inheritance for consistency
  – Macros to avoid repetition and hide technical complexity
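Inheritance and macros can be illustrated with a small sketch that renders reStructuredText from in-memory templates. The template names (`base.rst`, `macros.rst`, `stakeholders.rst`), the `contact` macro, and the `render_stakeholders` helper are assumptions made for this example; a real project would more likely load templates from files via `FileSystemLoader`.

```python
from jinja2 import DictLoader, Environment

TEMPLATES = {
    # Base template: shared skeleton (title plus underline) for every report.
    "base.rst": (
        "{{ title }}\n"
        "{{ '=' * (title|length) }}\n"
        "\n"
        "{% block body %}{% endblock %}\n"
    ),
    # Macro library: hides the markup details of a single contact entry.
    "macros.rst": (
        "{% macro contact(c) %}* **{{ c.name }}** ({{ c.role }}){% endmacro %}\n"
    ),
    # Child template: inherits the skeleton and fills in the body block.
    "stakeholders.rst": (
        "{% extends 'base.rst' %}\n"
        "{% from 'macros.rst' import contact %}\n"
        "{% block body %}\n"
        "{% for c in contacts %}\n"
        "{{ contact(c) }}\n"
        "{% endfor %}\n"
        "{% endblock %}\n"
    ),
}

# trim_blocks / lstrip_blocks keep block tags from leaving stray whitespace.
env = Environment(
    loader=DictLoader(TEMPLATES), trim_blocks=True, lstrip_blocks=True
)


def render_stakeholders(title, contacts):
    """Fill the assembled data into the child template, return reST text."""
    return env.get_template("stakeholders.rst").render(
        title=title, contacts=contacts
    )


if __name__ == "__main__":
    print(render_stakeholders(
        "Stakeholders",
        [{"name": "Ada", "role": "architect"}, {"name": "Bob", "role": "ops"}],
    ))
```

Because the output is plain text, a template problem can be debugged by reading the rendered file before any HTML is built.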
Sphinx – HTML Rendering
● Typically used for technical software documentation
  – User manuals, API references, …
  – Initially developed for the new Python documentation
● Renders reStructuredText markup into usable documents
● Cross-references, glossaries / indexes, themes, …
● Extensible by plugins (e.g. charts)
● Output formats: HTML, PDF, Confluence Publishing, …
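Since Sphinx consumes reStructuredText, report data only has to be turned into valid markup before rendering. The sketch below emits a heading and a docutils `list-table` directive using only the standard library; the helper names `rst_heading` and `rst_list_table` and the sample rows are hypothetical. The resulting text would be written into a Sphinx source directory and built with the usual `sphinx-build -b html <sourcedir> <outputdir>` call.

```python
def rst_heading(text, underline="="):
    """Return a reST section title with an underline of matching length."""
    return f"{text}\n{underline * len(text)}\n"


def rst_list_table(headers, rows, title=""):
    """Render headers and rows as a reST ``list-table`` directive."""
    lines = [f".. list-table:: {title}".rstrip(), "   :header-rows: 1", ""]
    for row in [headers, *rows]:
        for index, cell in enumerate(row):
            # The first cell opens a new table row, the rest continue it.
            prefix = "   * - " if index == 0 else "     - "
            lines.append(f"{prefix}{cell}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    page = rst_heading("System Overview") + "\n" + rst_list_table(
        ["System", "Owner"],
        [["gateway", "Ada"], ["portal", "Bob"]],
        title="Systems",
    )
    print(page)
```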
Use-Cases in the Wild
● Stakeholders (project contacts in different roles)
● System Overview (views on architectural + technical data)
● Progress Reporting (of large multi-team projects)
  – Aggregation by milestones and sub-systems
  – Multiple JIRAs and multiple queues
  – Driven by requirements linked to tasks
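The progress-reporting aggregation can be sketched as a simple grouping step. The task fields (`key`, `source`, `milestone`, `subsystem`, `status`), the status value `done`, and the sample records are assumptions for illustration only; they do not reflect the actual JIRA data model behind the talk.

```python
from collections import defaultdict


def progress_by(tasks, key):
    """Return {group: (done, total)} for tasks grouped by the given field."""
    totals = defaultdict(lambda: [0, 0])
    for task in tasks:
        bucket = totals[task[key]]
        bucket[1] += 1                      # every task counts towards total
        if task["status"] == "done":
            bucket[0] += 1                  # only finished tasks count as done
    return {group: tuple(counts) for group, counts in sorted(totals.items())}


# Hypothetical tasks merged from two trackers ("source") into one list.
TASKS = [
    {"key": "GW-1", "source": "jira-a", "milestone": "M1",
     "subsystem": "gateway", "status": "done"},
    {"key": "GW-2", "source": "jira-a", "milestone": "M2",
     "subsystem": "gateway", "status": "open"},
    {"key": "UI-7", "source": "jira-b", "milestone": "M1",
     "subsystem": "portal", "status": "open"},
]

if __name__ == "__main__":
    print(progress_by(TASKS, "milestone"))
    print(progress_by(TASKS, "subsystem"))
```

The same function serves both views, so a milestone report and a sub-system report differ only in the grouping key handed to the template.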
Challenges…
● Writing your own configuration for an existing report is relatively easy
● Anything beyond that takes considerably more time, know-how, and reading of the docs
  – Details on upcoming slides…
● Understanding your source data, and finding a way to access it
Required Know-How
● As previously mentioned: Python, Jinja2, reStructuredText / Sphinx
● Working in a command line environment
● GitLab and GitLab CI
● Everything is designed for and tested only under Linux (macOS & WSL should work though)