Slide 1

Slide 1 text

Cubes light-weight OLAP Stefan Urbanek ■ @Stiivi ■ stefan.urbanek@gmail.com ■ July 2012

Slide 2

Slide 2 text

Overview ■ purpose ■ analytical modeling and OLAP ■ slicing and dicing ■ OLAP server ■ SQL backend

Slide 3

Slide 3 text

analytical data modelling lightweight

Slide 4

Slide 4 text

http://tendre.sme.sk

Slide 5

Slide 5 text

aggregation browsing slicing and dicing

Slide 6

Slide 6 text

modelling reporting aggregation browsing

Slide 7

Slide 7 text

Architecture

Slide 8

Slide 8 text

backends model browser ✂ server http

Slide 9

Slide 9 text

Logical Model multidimensional, analytical

Slide 10

Slide 10 text

business/analyst’s point of view

Slide 11

Slide 11 text

transactions application (operational) data analysis analytical data OLTP OLAP

Slide 12

Slide 12 text

cubes dimensions measures levels, attributes, hierarchy Model { “name” = “My Model” “description” = .... “cubes” = [...] “dimensions” = [...] }

Slide 13

Slide 13 text

Facts fact most detailed information measurable fact data cell

Slide 14

Slide 14 text

dimensions location type time

Slide 15

Slide 15 text

■ provide context for facts ■ used to filter queries or reports ■ control scope of aggregation of facts Dimension

Slide 16

Slide 16 text

May 1st 2010 Hierarchy levels

Slide 17

Slide 17 text

Dimension ■ levels and attributes ■ hierarchy* ■ key attributes ■ label attributes *partial support for multiple hierarchies “dimensions” = [ { “name”:”date”, “levels”: ... “hierarchy”: ... }, ... ]

Slide 18

Slide 18 text

label attribute key attribute for links to slices

Slide 19

Slide 19 text

Cube ■ dimensions ■ measures *partial support for multiple hierarchies “cubes” = [ { “name”:”contracts”, “dimensions”: [ “date”, “category” ] “measures”: [ { “name”: “amount”, “label”: “Contract Amount”, “aggregations”: [“sum”] } ] }, ... ]

Slide 20

Slide 20 text

localizable model and attributes "attributes": [ { "name":"group", "label": "Group code" }, { "name":"group_label", "label": "Group", "locales": ["en", "sk"] } ]

Slide 21

Slide 21 text

Aggregation Browser ∑

Slide 22

Slide 22 text

∑ measures

Slide 23

Slide 23 text

get more details

Slide 24

Slide 24 text

“batteries” that are included SQL Snowflake Browser Aggregation Browser SQL Denormalized Browser MongoDB Browser Some HTTP Data Service Browser ?

Slide 25

Slide 25 text

logical model data + Browser Workspace

Slide 26

Slide 26 text

Cell

Slide 27

Slide 27 text

cell context of interest

Slide 28

Slide 28 text

cell

Slide 29

Slide 29 text

Path [2012, 6] [45,2] list of level keys

Slide 30

Slide 30 text

workspace.browser(cube) load_model("model.json") create_workspace("sql", model, url="sqlite:///data.sqlite") model.cube("sales") Aggregation Browser backend cubes Application ∑ 1 2 3 4

Slide 31

Slide 31 text

summary drill-down

Slide 32

Slide 32 text

browser.aggregate(o cell) summary

Slide 33

Slide 33 text

browser.aggregate(o cell, . drilldown=[9 "sector"]) drill-down

Slide 34

Slide 34 text

row[q label_attribute] row[k key] for row in result.drilldown: row["amount_sum"]

Slide 35

Slide 35 text

received_amount_sum record_count measure aggregation

Slide 36

Slide 36 text

browser.facts(o cell) browser.values(o cell, 9 dimension) browser.cell_details(o cell)

Slide 37

Slide 37 text

Slicing and Dicing ✂ ✂

Slide 38

Slide 38 text

type supplier date construction work in april 2012 construc tion work April 2012 ✂ ✂

Slide 39

Slide 39 text

cut types ✂ point set range [2010] [[2010,10], [2010,12]] from=[2010,10] to=[2010,12]

Slide 40

Slide 40 text

Implicit Hierarchy drilldown

Slide 41

Slide 41 text

✂ cut = PointCut(9 “date”, [2010]) o cell = o cell.slice(✂ cut) browser.aggregate(o cell, drilldown=[9 “date”]) 2006 2007 2008 2009 2010 Total Jan Feb Mar Apr March April May ... whole cube o cell = Cell(cube) browser.aggregate(o cell) browser.aggregate(o cell, drilldown=[9 “date”])

Slide 42

Slide 42 text

. drilldown = [9 "date"] . drilldown = {9 "date": "month"} implicit: next from o cell explicit Drill-down Level

Slide 43

Slide 43 text

Cross Table experimental interface

Slide 44

Slide 44 text

2009 2010 Assets Due from Banks 3044 1803 Assets Investments 41012 36012 Assets Loans Outstanding 103657 118104 Assets Nonnegotiable 1202 1123 Assets Other Assets 2247 3071 Assets Other Receivables 984 811 Assets Receivables 176 171 Assets Securities 33 289 Equity Capital Stock 11491 11492 Equity Deferred Amounts 359 313 Equity Other -1683 -3043 Equity Retained Earnings 29870 28793 Liabilities Borrowings 110040 128577 Liabilities Derivative Liabilities 115642 110418 Liabilities Other 57 8 Liabilities Other Liabilities 7321 5454 Liabilities Sold or Lent 2323 998

Slide 45

Slide 45 text

rows = ["item.category", "item.subcategory"] columns = ["year"] measures = ["amount_sum"] table = result.cross_table( rows, columns, measures )

Slide 46

Slide 46 text

Slicer The HTTP OLAP Server ✂

Slide 47

Slide 47 text

HTTP JSON Slicer Aggregation Browser ∑ Application

Slide 48

Slide 48 text

GET /model GET /aggregate GET /values GET /report

Slide 49

Slide 49 text

w logical model configuration data $ slicer serve slicer.ini

Slide 50

Slide 50 text

[server] backend: sql log_level: info [model] path: model.json locales: en,sk [workspace] url: postgres://localhost/database schema: datamart fact_prefix: ft_ dimension_prefix: dm_ w

Slide 51

Slide 51 text

∑amount GET /aggregate

Slide 52

Slide 52 text

GET aggregate { "cell": [], "drilldown": [], "summary": { "record_count": 62, "amount_sum": 1116860 } }

Slide 53

Slide 53 text

∑amount GET /aggregate?cut=date:2010 ✂

Slide 54

Slide 54 text

GET aggregate?cut=year:2010 { "cell": [ { "path": ["2010"], "type": "point", "dimension": "year", "level_depth": 1 } ], "drilldown": [], "summary": { "record_count": 31, "amount_sum": 566020 } }

Slide 55

Slide 55 text

GET aggregate?drilldown=year { "cell": [], "total_cell_count": 2, "drilldown": [ { "record_count": 31, "amount_sum": 550840, "year": 2009 }, { "record_count": 31, "amount_sum": 566020, "year": 2010 } ], "summary": { "record_count": 62, "amount_sum": 1116860 } }

Slide 56

Slide 56 text

GET report Content-Type: application/json { "cell": [ { "dimension": "date", "type": "range", "from": [2009], "to": [2011,6] } ], "queries": { "by_segment": { "query": "aggregate", "drilldown": ["segment"] }, "by_year": { "query": "aggregate", "drilldown": {"date":"year"} } } } list of cuts list of named queries

Slide 57

Slide 57 text

SQL Backend What data it works with?

Slide 58

Slide 58 text

or ★ ❄

Slide 59

Slide 59 text

★ fact table dimensions

Slide 60

Slide 60 text

❄ fact table dimensions

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

denormalized view snowflake ❄ Aggregation Browser Snowflake Mapper Denormalized Mapper Browsing Context or

Slide 63

Slide 63 text

❄ logical physical

Slide 64

Slide 64 text

SQL Features ■ does not require DB write access ■ denormalisation ■ denormalised browsing, indexing ■ simple date datatype dimension ■ extraction of date parts during mapping ■ multiple schema support

Slide 65

Slide 65 text

Slicer command-line tool

Slide 66

Slide 66 text

■ model validation slicer model validate model.json ■ model translation slicer model translate model.json translation.json ■ workspace testing slicer test config.ini ■ denormalization slicer denormalize --materialize --index config.ini

Slide 67

Slide 67 text

Future

Slide 68

Slide 68 text

■ formatters for visualisation libraries ■ JavaScript library* ■ backends ■ derived measures help needed *http://github.com/Stiivi/cubes-js

Slide 69

Slide 69 text

Open Data ■ shared repository of models ■ shared repository of dimensions ■ public cubes open Slicer HTTP APIs http://github.com/Stiivi/cubes/wiki

Slide 70

Slide 70 text

stay light Nutrition Facts Serving Size 1 cube Total Fat 0g Trans Fat 0g Amount Per Serving Saturated Fat 0g % Daily Value 0%

Slide 71

Slide 71 text

Thank You source: github.com/Stiivi/cubes documentation: packages.python.org/cubes/ examples: github.com/Stiivi/cubes-examples

Slide 72

Slide 72 text

bit.ly/cubes-ep2012 this presentation: