many definitions, obscure definitions. Vague opinions in production.
▪︎ Slow time-to-market
Getting from a requirement, or from observing a change, to deployment in production takes too long.
▪︎ Low performance
… despite having the best hardware, systems, algorithms. (some of it)
any known to work. Any separation of concerns is better than none.
2. Make it formal and documented. Otherwise our effort will dissolve and the content will be swampified.
3. Stick with it for a while and observe.
4. Adjust as necessary.
spreadsheet
Software at hand, no installation needed; universal, readable and editable by non-engineers.
3. Suffer through the spreadsheet-exchange drill phase
Mirror of our processes – seeing the genuine pain points will be useful later.
4. Use a functional approach to metadata composition and application
… from those spreadsheets. Example: a relational algebra library in the language of our ecosystem.
99. (later) Move spreadsheets into a metadata repository
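A minimal sketch of the functional approach from step 4, assuming the spreadsheets are exported as CSV. The sheet contents, the column names and the helper functions `read_sheet`, `select`, `project` and `join` are hypothetical illustrations, not a particular library.

```python
import csv
import io

# Hypothetical spreadsheet exports (CSV text); the column names are assumptions.
DATASETS_CSV = """dataset,owner_id
sales,1
visits,2
"""
OWNERS_CSV = """owner_id,owner
1,finance
2,marketing
"""

def read_sheet(text):
    """Parse CSV text into a list of row dictionaries (a relation)."""
    return list(csv.DictReader(io.StringIO(text)))

def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Relational projection: keep only the named columns."""
    return [{c: row[c] for c in columns} for row in rows]

def join(left, right, key):
    """Natural join of two relations on a single shared column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

datasets = read_sheet(DATASETS_CSV)
owners = read_sheet(OWNERS_CSV)

# Compose metadata from two sheets without mutating either of them.
owned = project(join(datasets, owners, "owner_id"), ["dataset", "owner"])
finance_only = select(owned, lambda row: row["owner"] == "finance")
```

Each operation returns a new relation, so compositions can be re-run whenever a sheet changes.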
To Data … metadata
Data quality measurements: data
Data quality indicators: metadata
definition, computation, warning/error thresholds, ownership, affected business entity, …
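One way to read the indicator attributes listed above is as a metadata record whose evaluation produces a measurement, which is plain data. This is a sketch under assumptions: the class `QualityIndicator`, the example indicator, its thresholds and the owner name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class QualityIndicator:
    """Metadata describing one data quality indicator (hypothetical shape)."""
    name: str
    definition: str
    compute: Callable[[Sequence[dict]], float]
    warning_threshold: float
    error_threshold: float
    owner: str
    business_entity: str

    def evaluate(self, rows):
        """Produce a measurement (data) from the indicator (metadata)."""
        value = self.compute(rows)
        if value >= self.error_threshold:
            status = "error"
        elif value >= self.warning_threshold:
            status = "warning"
        else:
            status = "ok"
        return {"indicator": self.name, "value": value, "status": status}

def missing_email_ratio(rows):
    """Share of rows with an empty or absent e-mail field."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if not row.get("email"))
    return missing / len(rows)

# Illustrative indicator; thresholds and owner are made up for the example.
indicator = QualityIndicator(
    name="missing_email_ratio",
    definition="Share of customer records without an e-mail address",
    compute=missing_email_ratio,
    warning_threshold=0.1,
    error_threshold=0.5,
    owner="crm-team",
    business_entity="Customer",
)

customers = [
    {"email": "a@example.com"},
    {"email": ""},
    {"email": "c@example.com"},
    {"email": "d@example.com"},
]
measurement = indicator.evaluate(customers)
```

With one of four records missing an e-mail, the measurement value is 0.25 and falls between the two thresholds.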
How can we drill down?
[Diagram: three layers, User Interface, Metadata and Physical Data. Metadata tables: Cubes (id, name: Sales, Revenue, Visits, …), Dimensions (id, name: Geography, Date, …) and Levels (id, dim, name, key, label) with the levels Region, Country, City, the keys region_code, country_iso, city_code and the labels region_name, country_name, city_name. Physical tables: regions, countries, cities. The user interface shows the generated path Europe, Germany, Berlin. Annotations: "which column?", "concept-to-user propagation".]
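The "which column?" question can be answered from level metadata alone. The sketch below assumes each level records its physical table, key column and label column; the assignment of columns to tables, the key values "EU" and "DE", and the function names are hypothetical.

```python
# Hypothetical level metadata for a Geography dimension, ordered coarse to fine.
GEOGRAPHY_LEVELS = [
    {"name": "Region", "table": "regions", "key": "region_code", "label": "region_name"},
    {"name": "Country", "table": "countries", "key": "country_iso", "label": "country_name"},
    {"name": "City", "table": "cities", "key": "city_code", "label": "city_name"},
]

def next_level(levels, path):
    """Return the level below the given path of key values, or None at the bottom."""
    depth = len(path)
    if depth > len(levels):
        raise ValueError("path is deeper than the hierarchy")
    return levels[depth] if depth < len(levels) else None

def drill_down(levels, path):
    """Describe one drill-down step: which columns to filter, group by and display."""
    target = next_level(levels, path)
    filters = [
        (f"{level['table']}.{level['key']}", value)
        for level, value in zip(levels, path)
    ]
    if target is None:
        return {"filters": filters, "group_by": None, "label": None}
    return {
        "filters": filters,
        "group_by": f"{target['table']}.{target['key']}",
        "label": f"{target['table']}.{target['label']}",
    }

# The user has clicked Europe, then Germany; the next level to show is City.
step = drill_down(GEOGRAPHY_LEVELS, ["EU", "DE"])
```

The user interface only needs the concept names; the physical columns are derived from the metadata.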
& SQL →
Cubes 1.1 – SQL Query Construction
[Diagram: layers Metadata, Logical Model, Physical, Physical Data Store; stages Input, Query Context, Output.
Input: cube, all attributes, base attributes, joins (⨝), database metadata, store, locale, parameters, mappings, fact table, naming convention, hierarchies.
Mapper: map attributes → mappings of base attributes.
Star Schema (star/❄): create schema; collect and sort dependencies; compile attributes (base attributes, dependent attributes) → columns; make star (topological sort).
Query Context: create context from the query attributes; base columns and column expressions for attributes → SELECT, GROUP BY; "star" join statement → FROM; conditions → WHERE.
Output: SQL for the requested attributes (A, B, C?).]
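The dependency-sorting and compilation steps in the diagram can be sketched as follows. This is not the Cubes 1.1 code: the mappings, the derived attributes, the join clause and the function names are hypothetical, and WHERE conditions are left out to keep the example short. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical logical-to-physical mappings of base attributes.
MAPPINGS = {
    "amount": "sales.amount",
    "discount": "sales.discount",
    "country": "countries.country_name",
}
# Hypothetical derived attributes: name -> (expression template, dependencies).
DERIVED = {
    "net_amount": ("({amount} - {discount})", ["amount", "discount"]),
    "net_with_tax": ("({net_amount} * 1.2)", ["net_amount"]),
}
# Hypothetical join clauses of the star schema, keyed by joined table.
JOINS = {"countries": "JOIN countries ON sales.country_id = countries.id"}

def required_attributes(requested):
    """Collect the requested attributes plus everything they depend on."""
    needed, stack = set(), list(requested)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(DERIVED.get(name, ("", []))[1])
    return needed

def compile_attributes(requested):
    """Sort dependencies topologically and build one column expression per attribute."""
    needed = required_attributes(requested)
    graph = {n: set(DERIVED[n][1]) if n in DERIVED else set() for n in needed}
    columns = {}
    for name in TopologicalSorter(graph).static_order():
        if name in DERIVED:
            # Dependencies are already compiled, so the template can be filled in.
            columns[name] = DERIVED[name][0].format(**columns)
        else:
            columns[name] = MAPPINGS[name]
    return columns

def build_query(fact, drilldown, aggregates):
    """Assemble SELECT, FROM with star joins, and GROUP BY."""
    columns = compile_attributes(drilldown + aggregates)
    tables = {expr.split(".")[0] for name, expr in columns.items() if name in MAPPINGS}
    select = [f"{columns[a]} AS {a}" for a in drilldown]
    select += [f"SUM({columns[a]}) AS {a}_sum" for a in aggregates]
    sql = f"SELECT {', '.join(select)} FROM {fact}"
    for table in sorted(tables - {fact}):
        sql += f" {JOINS[table]}"
    if drilldown:
        sql += " GROUP BY " + ", ".join(columns[a] for a in drilldown)
    return sql

query = build_query("sales", ["country"], ["net_with_tax"])
```

Only the tables actually referenced by the compiled base columns are joined, which is the point of collecting dependencies before building the statement.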
[Diagram: the metadata repository is the source of truth; the Aggregator (∑) and the Multi-Dimensional Query Server are derived and managed artefacts. Example question: past 12 months?]
⨝s are expensive
Alternative artefacts: a multi-dimensional data store
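To illustrate why a derived multi-dimensional store can stand in for repeated joins, here is a minimal sketch of an aggregator that pre-computes sums for every combination of dimensions. The fact rows, the dimension names and the functions `aggregate` and `lookup` are hypothetical; a real store would also need invalidation and incremental updates.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, already denormalised fact rows.
FACTS = [
    {"month": "2024-01", "country": "DE", "amount": 10},
    {"month": "2024-01", "country": "FR", "amount": 5},
    {"month": "2024-02", "country": "DE", "amount": 7},
]
DIMENSIONS = ("month", "country")

def aggregate(facts, dimensions, measure):
    """Pre-compute SUM(measure) for every subset of the dimensions."""
    store = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in facts:
                key = (dims, tuple(row[d] for d in dims))
                store[key] += row[measure]
    return dict(store)

def lookup(store, dimensions, cell):
    """Answer an aggregate query with a dictionary lookup instead of a join."""
    dims = tuple(d for d in dimensions if d in cell)
    return store.get((dims, tuple(cell[d] for d in dims)), 0)

store = aggregate(FACTS, DIMENSIONS, "amount")
total = lookup(store, DIMENSIONS, {})
germany = lookup(store, DIMENSIONS, {"country": "DE"})
january_germany = lookup(store, DIMENSIONS, {"month": "2024-01", "country": "DE"})
```

The store is disposable: it can be rebuilt from the facts and the metadata at any time.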
(structural) separation of concerns, modularity; optimisation through better reasoning
Complexity: separation of concerns, destroy-ability; reduction of problem-space, coping with heterogeneity
Threats: transparency, separation of quality; data accounting, verifiable data quality, provable consistency, source of truth