Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Cubes - Lightweight Python OLAP framework

Cubes - Lightweight Python OLAP framework

EuroPython 2012 talk about Cubes – lightweight Python OLAP framework for analytical modeling, multidimensional aggregation browsing and OLAP server.

Stefan Urbanek

July 13, 2012
Tweet

More Decks by Stefan Urbanek

Other Decks in Programming

Transcript

  1. Cubes
    light-weight OLAP
    Stefan Urbanek ■ @Stiivi ■ [email protected] ■ July 2012

    View full-size slide

  2. Overview
    ■ purpose
    ■ analytical modeling and OLAP
    ■ slicing and dicing
    ■ OLAP server
    ■ SQL backend

    View full-size slide

  3. analytical data modelling
    lightweight

    View full-size slide

  4. http://tendre.sme.sk

    View full-size slide

  5. aggregation browsing
    slicing and dicing

    View full-size slide

  6. modelling reporting
    aggregation browsing

    View full-size slide

  7. Architecture

    View full-size slide

  8. backends
    model browser

    server
    http

    View full-size slide

  9. Logical Model
    multidimensional, analytical

    View full-size slide

  10. business/analyst’s
    point of view

    View full-size slide

  11. transactions
    application (operational) data
    analysis
    analytical data
    OLTP OLAP

    View full-size slide

  12. cubes dimensions
    measures levels, attributes, hierarchy
    Model
    {
    “name” = “My Model”
    “description” = ....
    “cubes” = [...]
    “dimensions” = [...]
    }

    View full-size slide

  13. Facts
    fact
    most detailed information
    measurable
    fact data cell

    View full-size slide

  14. dimensions
    location
    type
    time

    View full-size slide

  15. ■ provide context for facts
    ■ used to filter queries or reports
    ■ control scope of aggregation of facts
    Dimension

    View full-size slide

  16. May 1st
    2010
    Hierarchy
    levels

    View full-size slide

  17. Dimension
    ■ levels and attributes
    ■ hierarchy*
    ■ key attributes
    ■ label attributes
    *partial support for multiple hierarchies
    “dimensions” = [
    {
    “name”:”date”,
    “levels”: ...
    “hierarchy”: ...
    },
    ...
    ]

    View full-size slide

  18. label attribute
    key attribute
    for links to slices

    View full-size slide

  19. Cube
    ■ dimensions
    ■ measures
    *partial support for multiple hierarchies
    “cubes” = [
    {
    “name”:”contracts”,
    “dimensions”: [ “date”,
    “category” ]
    “measures”: [
    {
    “name”: “amount”,
    “label”: “Contract Amount”,
    “aggregations”: [“sum”]
    }
    ]
    },
    ...
    ]

    View full-size slide

  20. localizable
    model and attributes
    "attributes": [
    {
    "name":"group",
    "label": "Group code"
    },
    {
    "name":"group_label",
    "label": "Group",
    "locales": ["en", "sk"]
    }
    ]

    View full-size slide

  21. Aggregation
    Browser

    View full-size slide

  22. ∑ measures

    View full-size slide

  23. get more details

    View full-size slide

  24. “batteries” that are included
    SQL Snowflake
    Browser
    Aggregation
    Browser
    SQL Denormalized
    Browser
    MongoDB Browser
    Some HTTP Data
    Service Browser
    ?

    View full-size slide

  25. logical model
    data
    +
    Browser Workspace

    View full-size slide

  26. cell
    context of interest

    View full-size slide

  27. Path
    [2012, 6]
    [45,2]
    list of level keys

    View full-size slide

  28. workspace.browser(cube)
    load_model("model.json")
    create_workspace("sql",
    model,
    url="sqlite:///data.sqlite")
    model.cube("sales")
    Aggregation Browser
    backend
    cubes
    Application

    1
    2
    3
    4

    View full-size slide

  29. summary
    drill-down

    View full-size slide

  30. browser.aggregate(o cell)
    summary

    View full-size slide

  31. browser.aggregate(o cell,
    . drilldown=[9 "sector"])
    drill-down

    View full-size slide

  32. row[q label_attribute] row[k key]
    for row in result.drilldown:
    row["amount_sum"]

    View full-size slide

  33. received_amount_sum
    record_count
    measure aggregation

    View full-size slide

  34. browser.facts(o cell)
    browser.values(o cell, 9 dimension)
    browser.cell_details(o cell)

    View full-size slide

  35. Slicing and Dicing


    View full-size slide

  36. type
    supplier
    date
    construction work
    in april 2012
    construc
    tion work
    April 2012


    View full-size slide

  37. cut types

    point set range
    [2010]
    [[2010,10],
    [2010,12]]
    from=[2010,10]
    to=[2010,12]

    View full-size slide

  38. Implicit Hierarchy
    drilldown

    View full-size slide

  39. ✂ cut = PointCut(9 “date”, [2010])
    o cell = o cell.slice(✂ cut)
    browser.aggregate(o cell,
    drilldown=[9 “date”])
    2006 2007 2008 2009 2010
    Total
    Jan Feb Mar Apr March April May ...
    whole cube
    o cell = Cell(cube)
    browser.aggregate(o cell)
    browser.aggregate(o cell,
    drilldown=[9 “date”])

    View full-size slide

  40. . drilldown = [9 "date"]
    . drilldown = {9 "date": "month"}
    implicit: next from o cell
    explicit
    Drill-down Level

    View full-size slide

  41. Cross Table
    experimental interface

    View full-size slide

  42. 2009 2010
    Assets Due from Banks 3044 1803
    Assets Investments 41012 36012
    Assets Loans Outstanding 103657 118104
    Assets Nonnegotiable 1202 1123
    Assets Other Assets 2247 3071
    Assets Other Receivables 984 811
    Assets Receivables 176 171
    Assets Securities 33 289
    Equity Capital Stock 11491 11492
    Equity Deferred Amounts 359 313
    Equity Other -1683 -3043
    Equity Retained Earnings 29870 28793
    Liabilities Borrowings 110040 128577
    Liabilities Derivative Liabilities 115642 110418
    Liabilities Other 57 8
    Liabilities Other Liabilities 7321 5454
    Liabilities Sold or Lent 2323 998

    View full-size slide

  43. rows = ["item.category",
    "item.subcategory"]
    columns = ["year"]
    measures = ["amount_sum"]
    table = result.cross_table(
    rows,
    columns,
    measures
    )

    View full-size slide

  44. Slicer
    The HTTP OLAP Server

    View full-size slide

  45. HTTP JSON
    Slicer
    Aggregation Browser

    Application

    View full-size slide

  46. GET /model
    GET /aggregate
    GET /values
    GET /report

    View full-size slide

  47. w
    logical model configuration data
    $ slicer serve slicer.ini

    View full-size slide

  48. [server]
    backend: sql
    log_level: info
    [model]
    path: model.json
    locales: en,sk
    [workspace]
    url: postgres://localhost/database
    schema: datamart
    fact_prefix: ft_
    dimension_prefix: dm_
    w

    View full-size slide

  49. ∑amount
    GET /aggregate

    View full-size slide

  50. GET aggregate
    {
    "cell": [],
    "drilldown": [],
    "summary": {
    "record_count": 62,
    "amount_sum": 1116860
    }
    }

    View full-size slide

  51. ∑amount
    GET /aggregate?cut=date:2010

    View full-size slide

  52. GET aggregate?cut=year:2010
    {
    "cell": [
    {
    "path": ["2010"],
    "type": "point",
    "dimension": "year",
    "level_depth": 1
    }
    ],
    "drilldown": [],
    "summary": {
    "record_count": 31,
    "amount_sum": 566020
    }
    }

    View full-size slide

  53. GET aggregate?drilldown=year
    {
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
    {
    "record_count": 31,
    "amount_sum": 550840,
    "year": 2009
    },
    {
    "record_count": 31,
    "amount_sum": 566020,
    "year": 2010
    }
    ],
    "summary": {
    "record_count": 62,
    "amount_sum": 1116860
    }
    }

    View full-size slide

  54. GET report
    Content-Type: application/json
    {
    "cell": [
    {
    "dimension": "date",
    "type": "range",
    "from": [2009],
    "to": [2011,6]
    }
    ],
    "queries": {
    "by_segment": {
    "query": "aggregate",
    "drilldown": ["segment"]
    },
    "by_year": {
    "query": "aggregate",
    "drilldown": {"date":"year"}
    }
    }
    }
    list of cuts
    list of
    named queries

    View full-size slide

  55. SQL Backend
    What data it works with?

    View full-size slide


  56. fact table
    dimensions

    View full-size slide


  57. fact table
    dimensions

    View full-size slide

  58. denormalized view
    snowflake ❄
    Aggregation Browser
    Snowflake
    Mapper
    Denormalized
    Mapper
    Browsing Context
    or

    View full-size slide


  59. logical
    physical

    View full-size slide

  60. SQL Features
    ■ does not require DB write access
    ■ denormalisation
    ■ denormalised browsing, indexing
    ■ simple date datatype dimension
    ■ extraction of date parts during mapping
    ■ multiple schema support

    View full-size slide

  61. Slicer
    command-line tool

    View full-size slide

  62. ■ model validation
    slicer model validate model.json
    ■ model translation
    slicer model translate model.json translation.json
    ■ workspace testing
    slicer test config.ini
    ■ denormalization
    slicer denormalize --materialize --index config.ini

    View full-size slide

  63. ■ formatters for visualisation libraries
    ■ JavaScript library*
    ■ backends
    ■ derived measures
    help needed
    *http://github.com/Stiivi/cubes-js

    View full-size slide

  64. Open Data
    ■ shared repository of models
    ■ shared repository of dimensions
    ■ public cubes
    open Slicer HTTP APIs
    http://github.com/Stiivi/cubes/wiki

    View full-size slide

  65. stay light
    Nutrition Facts
    Serving Size 1 cube
    Total Fat 0g
    Trans Fat 0g
    Amount Per Serving
    Saturated Fat 0g
    % Daily Value
    0%

    View full-size slide

  66. Thank You
    source:
    github.com/Stiivi/cubes
    documentation:
    packages.python.org/cubes/
    examples:
    github.com/Stiivi/cubes-examples

    View full-size slide

  67. bit.ly/cubes-ep2012
    this presentation:

    View full-size slide