Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Business analytics with Python-based tools

Business analytics with Python-based tools

Python is arguably the leading language for open source data analytics in 2017, with a wealth of libraries, frameworks and solution to ingest, transform, aggregate, analyse, explore and visualise data, in various contexts (enterprise, scientific, journalistic…).

In this talk, we will focus on a few tools with a focus on enterprise applications:

- Pandas - a software library for data manipulation and analysis, which offers data structures and operations for manipulating numerical tables and time series.
- Apache Superset (incubating) - an enterprise-ready web application for data exploration, data visualization and dashboarding.
- OlaPy - an open source developed by the speaker’s company, which provide traditional “business intelligence” interfaces on top of Pandas
- Bonobo (ETL)

Stefane Fermigier

December 07, 2017
Tweet

More Decks by Stefane Fermigier

Other Decks in Programming

Transcript

  1. Stéfane Fermigier
    Founder & CEO, Abilian - Enterprise Social Software
    Business analytics with Python-based tools
    Paris Open Source Summit - 7 December 2017

    View Slide

  2. Intro

    View Slide

  3. Who am I ?
    • Stefane Fermigier, Python developer since
    1996
    • Organizer of the PyData Paris PyParis
    conference (2015+)
    • Founder of Abilian SAS
    • Python shop, developing business
    application (collaboration, CRM,
    workflow…)
    • R&D activity (Wendelin -> Olapy)

    View Slide

  4. Why use Python for business data analysis ?
    • Why not? :)
    • Python is one of the leading languages for
    data science / data processing, and also a
    leading language for web & business apps
    • As a Python shop, we’d like to leverage this
    leadership in data processing tools to build
    exploration / reporting features in our
    business applications using a familiar
    language
    Source:  KDnuggets

    View Slide

  5. Our goal today
    • Overview and demo a few useful tools related to business data
    analytics
    • Use a very common dataset, called “Black Friday” (sales for a
    variety of products, over a variety of categories, locations, etc.), as
    a starting point for our explorations

    View Slide

  6. “Black Friday” dataset

    View Slide

  7. Pandas & Jupyter

    View Slide

  8. Jupyter  Notebooks
    • Originally  called  iPython  notebooks  
    • Very  simple  to  use  
    • Web  based  notebook  
    • Great  environment  for  exploration  
    • Rich  text  (markdown)  inline  comments  
    • Figures  embed  into  the  documents  
    Install  
    • pip install jupyter
    Run  
    • jupyter notebook

    View Slide

  9. • Data  analysis  tools  library  
    • Built  on  NumPy,  inspired  by  R  
    • Provides  built-­‐in  data  structures  which  simplify  the  manipulation  and  
    analysis  of  data  sets.    
    • https://pandas.pydata.org/

    Use  the  following  import  convention:

    View Slide

  10. Pandas  Data  Structures
    • A  one-­‐dimensional  labeled  array  

    capable  of  holding  any  data  type   index
    A 3
    B -­‐5
    C 7
    D 4
    Series

    View Slide

  11. • A  two-­‐dimensional  labeled  data  structure  with  

    columns  of  potentially  different  types
    DataFrames
    Belgium Brussels 11190846
    India New  Delhi 1303171035
    Brazil Brasília 207847528
    Country Capital Population
    1
    2
    3
    Columns
    Index

    View Slide

  12. Working  with  files
    Read
    Write

    View Slide

  13. Advanced  manipulations
    Combining  Data

    View Slide

  14. Grouping  Data

    View Slide

  15. • Spread  rows  into  columns
    Pivot  Table

    View Slide

  16. Apache Superset (incubating)

    View Slide

  17. “A  modern,  enterprise-­‐ready  business  intelligence  tool”  
    • Data  exploration  and  visualisation  platform  
    • A  Rich  SQL  IDE  
    • A  Data  Exploration  Interface  
    • Create  and  share  interactive  dashboards  
    • Flexible  authentication  and  authorisation  
    • Customisable  and  hackable  (based  on  Flask)!  
    • Supports  many  backends  (MySQL,  Postgres,  Redshift,  SparkSQL…)

    View Slide

  18. Black  Friday  dataset
    • Table  by  Table  (  report/analyses  )
    https://github.com/apache/incubator-­‐superset

    View Slide

  19. Black  Friday  dataset
    • Table  by  Table  (  report/analyses  )
    https://github.com/apache/incubator-­‐superset

    View Slide

  20. Olapy

    View Slide

  21. Olapy

    • Developed  since  2016  by  Abilian  
    • In-­‐memory  data  processing  using  
    Pandas  
    • Aggregated  data  browsing  
    • MDX  support  
    • XMLA  interface  (-­‐>  Excel)  
    • Multiple  back-­‐ends  (CSV,  SQL)  
    • Simple  web  front-­‐end
    https://github.com/abilian/olapy

    View Slide

  22. ❖MDX  =  Multi  Dimensional  Expressions  
    ❖  SQL  extension  for  querying  a  multi-­‐dimensional  database  
    Select    
       [Geography].[Geo].[Country]    on  Rows,  
       [Time].[Calendar].[Year].[2010]  on  Columns  
    From  Sales  
    Where  [Measures].[Count]

    View Slide

  23. ❖MDX  =  Multi  Dimensional  Expressions  
    ❖  SQL  extension  for  querying  a  multi-­‐dimensional  database  
    Select    
       [Geography].[Geo].[Country]    on  Rows,  
       [Time].[Calendar].[Year].[2010]  on  Columns  
    From  Sales  
    Where  [Measures].[Count]
    Dimension  
    MDX  ??


    View Slide

  24. ❖MDX  =  Multi  Dimensional  Expressions  
    ❖  SQL  extension  for  querying  a  multi-­‐dimensional  database  
    Select    
       [Geography].[Geo].[Country]    on  Rows,  
       [Time].[Calendar].[Year].[2010]  on  Columns  
    From  Sales  
    Where  [Measures].[Count]
    Hierarchy
    MDX  ??


    View Slide

  25. ❖MDX  =  Multi  Dimensional  Expressions  
    ❖  SQL  extension  for  querying  a  multi-­‐dimensional  database  
    Select    
       [Geography].[Geo].[Country]    on  Rows,  
       [Time].[Calendar].[Year].[2010]  on  Columns  
    From  Sales  
    Where  [Measures].[Count]
    Level
    MDX  ??


    View Slide

  26. ❖MDX  =  Multi  Dimensional  Expressions  
    ❖  SQL  extension  for  querying  a  multi-­‐dimensional  database  
    Select    
       [Geography].[Geo].[Country]    on  Rows,  
       [Time].[Calendar].[Year].[2010]  on  Columns  
    From  Sales  
    Where  [Measures].[Count]
    Axis
    MDX  ??


    View Slide

  27. ❖MDX  =  Multi  Dimensional  Expressions  
    ❖  SQL  extension  for  querying  a  multi-­‐dimensional  database  
    Select    
       [Geography].[Geo].[Country]    on  Rows,  
       [Time].[Calendar].[Year].[2010]  on  Columns  
    From  Sales  
    Where  [Measures].[Count]
    MDX  ??


    View Slide

  28. Extensible  Markup  Language  for  Analysis    -­‐  XMLA
    • Data  Access  Protocol  
    • Supports  exchange  of  analytical  
    data  between  clients  and  servers  
    • Available  on  Any  Device  or  Platform  
    • Using  Any  Programming  Language  
    • Just  SOAP  
    • Discover  
    • Execute

    View Slide

  29. •A  multidimensional  BD  is  a  hypercube:
    •Axes  are  called  user-­‐defined  dimensions
    •Cells  contain  measures  calculated  from  more  or  less  complex  formulas.
    •operators  on  the  cube  are  algebraic  (return  a  cube)  and  can  thus  be  combined.
    Multi-dimensional database = "super-spreadsheet"
    Geography
    Time
    Product
    2014 2015 2016
    Continent
    Country
    City
    Company
    Category
    Sub category
    dimensions
    mesures
    Black Friday
    On-­‐Line  Analytical  Processing  (OLAP)  &  Multidimensional  Databases

    View Slide

  30. Architecture
    Olapy
    XM
    LA  &  M
    DX XM
    LA
    Request Response

    View Slide

  31. Olapy  as  server
    Install:  
    • pip install olapy
    Run:  
    • olapy runserver
    ➢From  excel  go  to:  
    Data/from  other  sources/    
    from  analyses  services  
    ➢Use  URL  
    http://127.0.0.1:8000/xmla

    View Slide

  32. Olapy  as  server
    Install:  
    • pip install olapy
    Run:  
    • olapy runserver
    ➢From  excel  go  to:  
    Data/from  other  sources/    
    from  analyses  services  
    ➢Use  URL  
    http://127.0.0.1:8000/xmla

    View Slide

  33. olapy-­‐web

    • Web  client  for  olapy-­‐core  (very  basic)  
    • Interactive  data  exploration  
    • Dashboard  based  on  configuration  file  
    • Based  on  pivottable.js  and  Plotly  
    1. git clone https://github.com/abilian/olapy-web.git
    2. cd olapy-web
    3. pip install –r requirements.txt
    4. export FLASK_APP=manage.py
    5. flask run
    6. Use URL 127.0.0.1:5000 on your web browser

    View Slide

  34. olapy-­‐web

    • Web  client  for  olapy-­‐core  (very  basic)  
    • Interactive  data  exploration  
    • Dashboard  based  on  configuration  file  
    • Based  on  pivottable.js  and  Plotly  
    1. git clone https://github.com/abilian/olapy-web.git
    2. cd olapy-web
    3. pip install –r requirements.txt
    4. export FLASK_APP=manage.py
    5. flask run
    6. Use URL 127.0.0.1:5000 on your web browser

    View Slide

  35. Use  olapy  as  library
    • Execute  MDX  queries

    View Slide

  36. Roadmap
    • Version 0.5 just released !
    • WIP
    • Benchmarking & performance tuning
    • Web front-end / OnlyOffice integration
    • Integration in real projects
    • Multi-core / multi-server scalability using the wendelin-core
    out-of-core computation engine

    View Slide

  37. Bonobo ETL

    View Slide

  38. Bonobo  
    • python  3.5+

    View Slide

  39. Bonobo  with  olapy
    Source
    Transform
    Extract
    Load
    Olapy

    View Slide

  40. More tools

    View Slide

  41. Redash
    • Query  all  your  data  sources  in  one  place  
    • Convert  your  queried  data  into  visualisations  
    Online  Demo:  
    http://demo.redash.io
    https://redash.io/

    View Slide

  42. Redash
    • Query  all  your  data  sources  in  one  place  
    • Convert  your  queried  data  into  visualisations  
    Online  Demo:  
    http://demo.redash.io
    https://redash.io/

    View Slide

  43. Cubes  &  friends
    http://cubes.databrewery.org/
    • Light-­‐weight  Python  framework  and  OLAP  HTTP  server  
    • OLAP  and  aggregated  browsing  
    • Multiple  hierarchies  in  a  dimension  
    • Authentication  and  authorisation  of  cubes  and  their  data
    CubesViewer
    • Data  exploration  and  visualisation  tool  for  Cubes

    Online  demo:  
    http://www.cubesviewer.com/studio.html http://www.cubesviewer.com/

    View Slide

  44. Conclusion

    View Slide

  45. More info
    • Slides will appear soon on https://speakerdeck.com/sfermigier/
    • Repo for this talk: https://github.com/abilian/talks
    • Doc Olapy: http://olapy.readthedocs.io/en/latest/
    • Repo Olapy: https://github.com/abilian/olapy
    • Contact: [email protected]

    View Slide