Analytics Design Patterns

Valentin
November 27, 2014

A first attempt at formalizing a set of Analytics Design Patterns created for the data2day conference.




  1. analytics design patterns Valentin Zacharias, 27.11.2014

  2. A design pattern is a general reusable solution to a commonly occurring problem in the design of analytics solutions. It is a best-practice template for solving a problem that can be used in many different situations.
    • Patterns establish a language to organize best practices.
    • Patterns establish a knowledge baseline.
    (Partly adapted from Wikipedia’s definition of software design patterns.)
  3. motivation
    • How to scale delivery of analytics services (beyond algorithms and tools)?
    • How to organize analytics know-how and best practices?
    • How to systematically bridge the gap between business need and tools / algorithms?
  4. goal (of this presentation)
    • Spread and discuss ideas on how to structure know-how and best practices in analytics
    • Present some interesting analytics design patterns that might be of use
  5. me
    • 13 years of experience in software-driven innovation
      o AI researcher at FZI / PhD student at KIT
      o Manager of a research division at FZI concerned with all aspects of information-driven decisions
      o Big Data consultant with codecentric
      o Analytics Consultant / Data Scientist with Daimler TSS
  6. structure: generic CRISP-DM as organizational structure (for this talk)

  7. business understanding (Source: CRISP-DM process)

  8. project patterns
    • Data First: Find, understand and utilize patterns in a given data set
    • Decision Driven: Identify and realize a concrete analytics use case
    • Metrics Model: Create systems to understand and optimize a business's core drivers of growth
    • Integration First: Break open data silos and harness integration
  9. value patterns
    • Data Driven Business
      o Operational Excellence
      o Customer Intimacy
      o Product Leadership
    • Analytics as Added Value
      o For Capital Goods: No Unplanned Downtime, Asset Optimization, Enterprise Optimization
    • Data & Analytics as Business
  10. data understanding (Source: CRISP-DM process)

  11. data understanding patterns
    • Data Source Visualization
    • Event Log Exploration
    • Query-Based Data Extracts (with / without sampling)
    • Analytical Record
    • Reverse Pivot
    • Data Audit
  12. data source visualization
    • Database structure visualization
    • Colors:
      o Blue = numeric
      o Green = categorical
      o Yellow = date/time
      o Red = other
      o White = missing
  13. Analytical Record (1/2)
    Most data mining tools and methods demand their input structured around one kind of observation (rows) with a number of values (columns). The extract that represents the elements of interest (customer, product, process instance, ...) in this way is called the Analytical Record.
  14. Analytical Record (2/2)
    • e.g. a feature vector representing a host's behaviour at a point in time, based on log data from IDS / firewall (for intrusion detection)
    • e.g. a feature vector representing a customer’s recent interactions with the company (calls to hotline, use of services), drawn from different transactional / log systems
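To make the customer example concrete, here is a minimal sketch of turning an event log into an Analytical Record with one row per customer and one column per event type. The event data, the column names and the helper `build_analytical_record` are hypothetical and only serve as illustration; a real extract would usually add time windows and further features.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (customer_id, event_type) pairs, as they might
# arrive from different transactional / log systems.
events = [
    ("c1", "hotline_call"),
    ("c1", "service_use"),
    ("c2", "service_use"),
    ("c1", "hotline_call"),
    ("c2", "service_use"),
]


def build_analytical_record(events, event_types):
    """Return one row per customer with one count column per event type."""
    counts = defaultdict(Counter)
    for customer_id, event_type in events:
        counts[customer_id][event_type] += 1
    # A Counter returns 0 for event types a customer never produced.
    return {
        customer_id: [counter[t] for t in event_types]
        for customer_id, counter in sorted(counts.items())
    }


columns = ["hotline_call", "service_use"]
records = build_analytical_record(events, columns)
for customer_id, row in records.items():
    print(customer_id, dict(zip(columns, row)))
```

Each row is a feature vector for one element of interest, which is the shape most modeling tools expect as input.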
  15. data preparation (Source: CRISP-DM process)

  16. data preparation patterns
    • handling outliers
      o Winsorization
      o Trimming
      o Spatial Sign
    • data weighting and balancing
      o Case Weighting
      o Oversampling / Undersampling
      o Partitioning
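As an illustration of the first two outlier patterns, the following sketch contrasts Winsorization (extreme values are clamped) with Trimming (extreme values are dropped). It is a simplified count-based variant, assuming the `k` smallest and `k` largest values are treated as outliers; percentile-based cut-offs are just as common. The function names are chosen for this example only.

```python
def winsorize(values, k=1):
    """Clamp the k smallest and k largest values to the nearest retained value."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must be non-negative and leave at least one value")
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    # Original order and length are preserved; only extremes are pulled in.
    return [min(max(v, low), high) for v in values]


def trim(values, k=1):
    """Drop the k smallest and k largest values (result is sorted)."""
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must be non-negative and leave at least one value")
    return sorted(values)[k:len(values) - k]


data = [1, 12, 11, 13, 10, 95]
print(winsorize(data))  # the extremes 1 and 95 are clamped to 10 and 13
print(trim(data))       # the extremes 1 and 95 are removed
```

Winsorization keeps the number of cases constant, which matters when rows must stay aligned with other columns; Trimming shrinks the data set.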
  17. data preparation patterns
    • handling missing values
      o Case Deletion
      o Available Case Analysis
      o Single Imputation: Mean / Reasonable Value, Regression
      o Multiple Imputation: Regression
      o Maximum Likelihood
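Two of the simpler missing-value patterns can be sketched in a few lines: Case Deletion removes every row with a missing entry, and Single Imputation with the mean fills the gap with the mean of the observed values. The sketch assumes missing values are encoded as `None`; the helper names are hypothetical.

```python
from statistics import mean


def delete_incomplete_cases(rows):
    """Case Deletion: keep only rows without any missing (None) value."""
    return [row for row in rows if all(v is not None for v in row)]


def impute_mean(values):
    """Single Imputation: replace missing entries with the observed mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column without observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


rows = [(1, 2.0), (2, None), (3, 4.0)]
print(delete_incomplete_cases(rows))     # the row with the None is dropped
print(impute_mean([v for _, v in rows])) # the None is replaced by the mean
```

Mean imputation keeps all cases but shrinks the variance of the imputed column, which is one motivation for the regression-based and multiple-imputation patterns on the same slide.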
  18. data preparation patterns
    • data transformation
      o Generalization Abstraction
      o Temporal Abstraction
      o Rates
      o Dummy Variables for Categories
      o Centering and Scaling
      o Polynomials
      o Interaction Terms
      o Box-Cox Transformation
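Two of these transformations, Centering and Scaling and Dummy Variables for Categories, can be sketched as follows. This is a minimal illustration under two assumptions: scaling uses the population standard deviation, and the dummy encoding keeps one column for every category level (for regression models one level is often dropped). The function names are made up for this example.

```python
from statistics import mean, pstdev


def center_and_scale(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot scale a constant column")
    return [(v - mu) / sigma for v in values]


def dummy_variables(categories):
    """Encode each category as a 0/1 indicator column per level."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


print(center_and_scale([2, 4, 4, 4, 5, 5, 7, 9]))
levels, rows = dummy_variables(["red", "blue", "red"])
print(levels, rows)
```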
  19. data preparation patterns
    • Feature Selection
      ◦ Filter / Proxy Based
      ◦ Subset Selection
      ◦ Embedded
      ◦ Dimensionality Reduction
  20. modeling (Source: CRISP-DM process)

  21. modeling
    • Descriptive Modeling
      o density estimation
      o dependency modeling
      o cluster analysis / segmentation
    • Predictive Modeling
      o Classification
      o Regression
    • Pattern / Rule Discovery
    • Content Based Retrieval
    • Recommendation
  22. Evaluation
    • Training/Test/Validation
    • LOOCV / K-Fold CV
    • Bootstrap
    • Expected Value and Cost Benefit Matrix
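Two of the evaluation patterns lend themselves to a short sketch: splitting data into K folds (with K equal to the number of cases this becomes LOOCV), and scoring a classifier by its expected value under a cost-benefit matrix. The fold splitter below is a simplified version without shuffling, and the confusion counts and benefit figures are invented for illustration.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over range(n)."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def expected_value(confusion_counts, benefit):
    """Sum over confusion-matrix cells of P(cell) * benefit(cell)."""
    total = sum(confusion_counts.values())
    return sum(count / total * benefit[cell]
               for cell, count in confusion_counts.items())


# Hypothetical confusion counts and per-case benefits (costs are negative).
counts = {"TP": 50, "FP": 10, "FN": 5, "TN": 135}
benefit = {"TP": 99.0, "FP": -1.0, "FN": 0.0, "TN": 0.0}

for train, test in k_fold_indices(5, 2):
    print(train, test)
print(expected_value(counts, benefit))  # expected benefit per scored case
```

The expected value expresses model quality in business terms (benefit per scored case) rather than accuracy, which ties the evaluation back to the business understanding phase.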
  23. Deployment (Source: CRISP-DM process)

  24. Deployment Patterns
    • PMML
    • A/B Testing
    • Canary Pattern
    • (Visualizations)
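One common building block behind both A/B Testing and the Canary Pattern is a stable assignment of users to variants. The sketch below is one possible approach, not something prescribed by the slides: it hashes a user id into the interval [0, 1) and routes a small fraction to the new model. The function name, the variant labels and the 5% default are assumptions made for this example.

```python
import hashlib


def assign_variant(user_id, canary_fraction=0.05):
    """Deterministically route a fraction of users to the 'canary' variant."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "canary" if bucket < canary_fraction else "stable"


for user in ["alice", "bob", "carol"]:
    print(user, assign_variant(user, canary_fraction=0.5))
```

Because the assignment depends only on the user id, a user sees the same variant on every request, and raising `canary_fraction` gradually widens the rollout without reshuffling users who are already on the canary.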
  25. Credits

  26. end. valentin zacharias