Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Cubes - Lightweight Python OLAP framework

Cubes - Lightweight Python OLAP framework

EuroPython 2012 talk about Cubes – lightweight Python OLAP framework for analytical modeling, multidimensional aggregation browsing and OLAP server.

Stefan Urbanek

July 13, 2012
Tweet

More Decks by Stefan Urbanek

Other Decks in Programming

Transcript

  1. Cubes
    light-weight OLAP
    Stefan Urbanek ■ @Stiivi ■ [email protected] ■ July 2012

    View Slide

  2. Overview
    ■ purpose
    ■ analytical modeling and OLAP
    ■ slicing and dicing
    ■ OLAP server
    ■ SQL backend

    View Slide

  3. analytical data modelling
    lightweight

    View Slide

  4. http://tendre.sme.sk

    View Slide

  5. aggregation browsing
    slicing and dicing

    View Slide

  6. modelling reporting
    aggregation browsing

    View Slide

  7. Architecture

    View Slide

  8. backends
    model browser

    server
    http

    View Slide

  9. Logical Model
    multidimensional, analytical

    View Slide

  10. business/analyst’s
    point of view

    View Slide

  11. transactions
    application (operational) data
    analysis
    analytical data
    OLTP OLAP

    View Slide

  12. cubes dimensions
    measures levels, attributes, hierarchy
    Model
    {
    “name” = “My Model”
    “description” = ....
    “cubes” = [...]
    “dimensions” = [...]
    }

    View Slide

  13. Facts
    fact
    most detailed information
    measurable
    fact data cell

    View Slide

  14. dimensions
    location
    type
    time

    View Slide

  15. ■ provide context for facts
    ■ used to filter queries or reports
    ■ control scope of aggregation of facts
    Dimension

    View Slide

  16. May 1st
    2010
    Hierarchy
    levels

    View Slide

  17. Dimension
    ■ levels and attributes
    ■ hierarchy*
    ■ key attributes
    ■ label attributes
    *partial support for multiple hierarchies
    “dimensions” = [
    {
    “name”:”date”,
    “levels”: ...
    “hierarchy”: ...
    },
    ...
    ]

    View Slide

  18. label attribute
    key attribute
    for links to slices

    View Slide

  19. Cube
    ■ dimensions
    ■ measures
    *partial support for multiple hierarchies
    “cubes” = [
    {
    “name”:”contracts”,
    “dimensions”: [ “date”,
    “category” ]
    “measures”: [
    {
    “name”: “amount”,
    “label”: “Contract Amount”,
    “aggregations”: [“sum”]
    }
    ]
    },
    ...
    ]

    View Slide

  20. localizable
    model and attributes
    "attributes": [
    {
    "name":"group",
    "label": "Group code"
    },
    {
    "name":"group_label",
    "label": "Group",
    "locales": ["en", "sk"]
    }
    ]

    View Slide

  21. Aggregation
    Browser

    View Slide

  22. ∑ measures

    View Slide

  23. get more details

    View Slide

  24. “batteries” that are included
    SQL Snowflake
    Browser
    Aggregation
    Browser
    SQL Denormalized
    Browser
    MongoDB Browser
    Some HTTP Data
    Service Browser
    ?

    View Slide

  25. logical model
    data
    +
    Browser Workspace

    View Slide

  26. Cell

    View Slide

  27. cell
    context of interest

    View Slide

  28. cell

    View Slide

  29. Path
    [2012, 6]
    [45,2]
    list of level keys

    View Slide

  30. workspace.browser(cube)
    load_model("model.json")
    create_workspace("sql",
    model,
    url="sqlite:///data.sqlite")
    model.cube("sales")
    Aggregation Browser
    backend
    cubes
    Application

    1
    2
    3
    4

    View Slide

  31. summary
    drill-down

    View Slide

  32. browser.aggregate(o cell)
    summary

    View Slide

  33. browser.aggregate(o cell,
    . drilldown=[9 "sector"])
    drill-down

    View Slide

  34. row[q label_attribute] row[k key]
    for row in result.drilldown:
    row["amount_sum"]

    View Slide

  35. received_amount_sum
    record_count
    measure aggregation

    View Slide

  36. browser.facts(o cell)
    browser.values(o cell, 9 dimension)
    browser.cell_details(o cell)

    View Slide

  37. Slicing and Dicing


    View Slide

  38. type
    supplier
    date
    construction work
    in april 2012
    construc
    tion work
    April 2012


    View Slide

  39. cut types

    point set range
    [2010]
    [[2010,10],
    [2010,12]]
    from=[2010,10]
    to=[2010,12]

    View Slide

  40. Implicit Hierarchy
    drilldown

    View Slide

  41. ✂ cut = PointCut(9 “date”, [2010])
    o cell = o cell.slice(✂ cut)
    browser.aggregate(o cell,
    drilldown=[9 “date”])
    2006 2007 2008 2009 2010
    Total
    Jan Feb Mar Apr March April May ...
    whole cube
    o cell = Cell(cube)
    browser.aggregate(o cell)
    browser.aggregate(o cell,
    drilldown=[9 “date”])

    View Slide

  42. . drilldown = [9 "date"]
    . drilldown = {9 "date": "month"}
    implicit: next from o cell
    explicit
    Drill-down Level

    View Slide

  43. Cross Table
    experimental interface

    View Slide

  44. 2009 2010
    Assets Due from Banks 3044 1803
    Assets Investments 41012 36012
    Assets Loans Outstanding 103657 118104
    Assets Nonnegotiable 1202 1123
    Assets Other Assets 2247 3071
    Assets Other Receivables 984 811
    Assets Receivables 176 171
    Assets Securities 33 289
    Equity Capital Stock 11491 11492
    Equity Deferred Amounts 359 313
    Equity Other -1683 -3043
    Equity Retained Earnings 29870 28793
    Liabilities Borrowings 110040 128577
    Liabilities Derivative Liabilities 115642 110418
    Liabilities Other 57 8
    Liabilities Other Liabilities 7321 5454
    Liabilities Sold or Lent 2323 998

    View Slide

  45. rows = ["item.category",
    "item.subcategory"]
    columns = ["year"]
    measures = ["amount_sum"]
    table = result.cross_table(
    rows,
    columns,
    measures
    )

    View Slide

  46. Slicer
    The HTTP OLAP Server

    View Slide

  47. HTTP JSON
    Slicer
    Aggregation Browser

    Application

    View Slide

  48. GET /model
    GET /aggregate
    GET /values
    GET /report

    View Slide

  49. w
    logical model configuration data
    $ slicer serve slicer.ini

    View Slide

  50. [server]
    backend: sql
    log_level: info
    [model]
    path: model.json
    locales: en,sk
    [workspace]
    url: postgres://localhost/database
    schema: datamart
    fact_prefix: ft_
    dimension_prefix: dm_
    w

    View Slide

  51. ∑amount
    GET /aggregate

    View Slide

  52. GET aggregate
    {
    "cell": [],
    "drilldown": [],
    "summary": {
    "record_count": 62,
    "amount_sum": 1116860
    }
    }

    View Slide

  53. ∑amount
    GET /aggregate?cut=date:2010

    View Slide

  54. GET aggregate?cut=year:2010
    {
    "cell": [
    {
    "path": ["2010"],
    "type": "point",
    "dimension": "year",
    "level_depth": 1
    }
    ],
    "drilldown": [],
    "summary": {
    "record_count": 31,
    "amount_sum": 566020
    }
    }

    View Slide

  55. GET aggregate?drilldown=year
    {
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
    {
    "record_count": 31,
    "amount_sum": 550840,
    "year": 2009
    },
    {
    "record_count": 31,
    "amount_sum": 566020,
    "year": 2010
    }
    ],
    "summary": {
    "record_count": 62,
    "amount_sum": 1116860
    }
    }

    View Slide

  56. GET report
    Content-Type: application/json
    {
    "cell": [
    {
    "dimension": "date",
    "type": "range",
    "from": [2009],
    "to": [2011,6]
    }
    ],
    "queries": {
    "by_segment": {
    "query": "aggregate",
    "drilldown": ["segment"]
    },
    "by_year": {
    "query": "aggregate",
    "drilldown": {"date":"year"}
    }
    }
    }
    list of cuts
    list of
    named queries

    View Slide

  57. SQL Backend
    What data it works with?

    View Slide

  58. or
    ★ ❄

    View Slide


  59. fact table
    dimensions

    View Slide


  60. fact table
    dimensions

    View Slide

  61. View Slide

  62. denormalized view
    snowflake ❄
    Aggregation Browser
    Snowflake
    Mapper
    Denormalized
    Mapper
    Browsing Context
    or

    View Slide


  63. logical
    physical

    View Slide

  64. SQL Features
    ■ does not require DB write access
    ■ denormalisation
    ■ denormalised browsing, indexing
    ■ simple date datatype dimension
    ■ extraction of date parts during mapping
    ■ multiple schema support

    View Slide

  65. Slicer
    command-line tool

    View Slide

  66. ■ model validation
    slicer model validate model.json
    ■ model translation
    slicer model translate model.json translation.json
    ■ workspace testing
    slicer test config.ini
    ■ denormalization
    slicer denormalize --materialize --index config.ini

    View Slide

  67. Future

    View Slide

  68. ■ formatters for visualisation libraries
    ■ JavaScript library*
    ■ backends
    ■ derived measures
    help needed
    *http://github.com/Stiivi/cubes-js

    View Slide

  69. Open Data
    ■ shared repository of models
    ■ shared repository of dimensions
    ■ public cubes
    open Slicer HTTP APIs
    http://github.com/Stiivi/cubes/wiki

    View Slide

  70. stay light
    Nutrition Facts
    Serving Size 1 cube
    Total Fat 0g
    Trans Fat 0g
    Amount Per Serving
    Saturated Fat 0g
    % Daily Value
    0%

    View Slide

  71. Thank You
    source:
    github.com/Stiivi/cubes
    documentation:
    packages.python.org/cubes/
    examples:
    github.com/Stiivi/cubes-examples

    View Slide

  72. bit.ly/cubes-ep2012
    this presentation:

    View Slide