Analytics Design Patterns

Valentin
November 27, 2014

A first attempt at formalizing a set of Analytics Design Patterns created for the data2day conference.

Transcript

  1. analytics design patterns
    Valentin Zacharias, 27.11.2014

  2. A design pattern is a general reusable solution
    to a commonly occurring problem in the
    design of analytics solutions.
    It is a best practice template for how to
    solve a problem that can be used in many
    different situations.
    Patterns establish a language to organize
    best practices.
    Patterns establish a knowledge baseline.
    (partly adapted from Wikipedia’s definition of software design patterns)

  3. motivation
    how to scale delivery of analytics services?
    (beyond algorithms and tools)
    how to organize analytics know-how and best
    practices?
    how to systematically bridge the gap between
    business need and tools / algorithms?

  4. goal (of this presentation)
    spread and discuss ideas on how to structure
    know-how and best practices in analytics
    present some interesting analytics design
    patterns that might be of use

  5. me
    ● 13 years of experience in software-driven
    innovation
    o AI researcher at FZI / PhD Student at KIT
    o Manager of a research division at FZI concerned
    with all aspects of information-driven decisions
    o Big Data consultant with codecentric
    o Analytics Consultant / Data Scientist with Daimler
    TSS

  6. structure
    Generic CRISP-DM
    as organizational
    structure (for this
    talk)

  7. business understanding
    Source: CRISP-DM Process

  8. project patterns
    ● Data First: Find, understand and utilize
    patterns in a given data set
    ● Decision Driven: Identify and realize
    a concrete analytics use case
    ● Metrics Model: Create systems to
    understand and optimize a business's core
    drivers of growth
    ● Integration First: Break open data silos and
    harness integration

  9. value patterns
    ● Data Driven Business
    o Operational Excellence
    o Customer Intimacy
    o Product Leadership
    ● Analytics as added Value
    o For Capital Goods
      ▪ No unplanned Downtime
      ▪ Asset Optimization
      ▪ Enterprise Optimization
    ● Data & Analytics as Business

  10. data understanding
    Source: CRISP-DM Process

  11. data understanding patterns
    Data Source Visualization
    Event Log Exploration
    Query Based Data Extracts (w/wo sampling)
    Analytical Record
    Reverse Pivot
    Data Audit

  12. data source visualization
    ● Database structure
    visualization
    ● Colors:
    o Blue = numeric
    o Green = categorical
    o Yellow = date/time
    o Red = other
    o White = missing
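
The colour legend above could be applied per cell roughly as follows. This is a minimal sketch, not the tool shown on the slide: the function name `color_for` is hypothetical, and treating strings and booleans as categorical is an assumption.

```python
import math
from datetime import date, datetime, time

# Colour legend taken from the slide; the function name is hypothetical.
def color_for(value):
    """Return the legend colour for a single cell value."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "white"   # missing
    if isinstance(value, (date, datetime, time)):
        return "yellow"  # date/time
    if isinstance(value, (bool, str)):
        return "green"   # categorical (assumption: booleans and strings)
    if isinstance(value, (int, float)):
        return "blue"    # numeric
    return "red"         # other

row = [42, "gold", date(2014, 11, 27), None, b"\x00"]
colors = [color_for(v) for v in row]
```

Rendering one coloured pixel per cell then gives a quick overview of types and gaps across a whole table.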

  13. Analytical Record (1/2)
    Most Data Mining tools and methods demand
    their input structured around one kind of
    observation (rows) with a number of values
    (columns).
    The extract that represents the elements of
    interest (customer / product / process instance, ...)
    in this way is called the Analytical Record.

  14. Analytical Record (2/2)
    ● e.g. a feature vector representing a host's
    behaviour at a point in time based on log
    data from IDS/Firewall (for intrusion
    detection)
    ● e.g. a feature vector representing a
    customer’s recent interactions with the
    company (calls to hotline, use of services)
    from different transactional / log systems
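
The customer example above might be sketched like this: an event log with many rows per customer is condensed into one row per customer with one count column per event type. The event data, field names and the function `build_analytical_record` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical event log: several rows per customer.
events = [
    {"customer": "c1", "type": "hotline_call"},
    {"customer": "c1", "type": "service_use"},
    {"customer": "c2", "type": "service_use"},
    {"customer": "c1", "type": "hotline_call"},
]

def build_analytical_record(events, key, event_types):
    """Return one row per entity with a count column per event type."""
    counts = defaultdict(lambda: dict.fromkeys(event_types, 0))
    for event in events:
        if event["type"] in event_types:  # ignore event types not asked for
            counts[event[key]][event["type"]] += 1
    return [{key: entity, **counts[entity]} for entity in sorted(counts)]

records = build_analytical_record(
    events, "customer", ["hotline_call", "service_use"]
)
```

Each resulting row is one observation (a customer) with a fixed set of columns, which is the shape most mining tools expect.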

  15. data preparation
    Source: CRISP-DM Process

  16. data preparation patterns
    ● handling outliers
    o Winsorization
    o Trimming
    o Spatial Sign
    ● data weighting and balancing
    o Case Weighting
    o Oversampling / Undersampling
    o Partitioning
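
The three outlier patterns can be illustrated in a few lines. This is a sketch under simplifying assumptions: the cut-off is given as a count `k` of values per tail rather than a percentile, and `spatial_sign` expects a row that has already been centered and scaled. The function names are illustrative.

```python
import math

def winsorize(values, k):
    """Clamp the k smallest and k largest values to the nearest kept value."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]  # requires len(values) > 2 * k
    return [min(max(v, low), high) for v in values]

def trim(values, k):
    """Drop the k smallest and k largest values (result is sorted)."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]

def spatial_sign(row):
    """Project a feature vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm else list(row)

data = [1, 2, 3, 4, 100]
winsorized = winsorize(data, 1)   # extreme values are pulled in, none removed
trimmed = trim(data, 1)           # extreme values are removed
unit = spatial_sign([3.0, 4.0])
```

Winsorization keeps the sample size, trimming shrinks it, and the spatial sign limits the influence of any single extreme row.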

  17. data preparation patterns
    ● handling missing values
    o Case Deletion
    o Available Case Analysis
    o Single Imputation
      ▪ Mean / Reasonable Value
      ▪ Regression
    o Multiple Imputation
      ▪ Regression
    o Maximum Likelihood
    o Maximum Likelihood
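
The two simplest options from this list, case deletion and single imputation with the column mean, could look like this. It is a minimal sketch: missing values are assumed to be encoded as `None`, and the function names are illustrative.

```python
from statistics import mean

def case_deletion(rows):
    """Keep only rows without any missing value."""
    return [row for row in rows if all(v is not None for v in row)]

def mean_imputation(rows):
    """Replace each missing value with the mean of its column."""
    columns = list(zip(*rows))
    # Raises statistics.StatisticsError if a column is entirely missing.
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

rows = [[1.0, 10.0], [None, 20.0], [3.0, None]]
complete = case_deletion(rows)
imputed = mean_imputation(rows)
```

Case deletion discards two of the three rows here, while mean imputation keeps all rows at the price of understating the variance of the imputed columns.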

  18. data preparation patterns
    ● data transformation
    o Generalization Abstraction
    o Temporal Abstraction
    o Rates
    o Dummy Variables for Categories
    o Centering and Scaling
    o Polynomials
    o Interaction Terms
    o Box-Cox Transformation
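
Three of these transformations are small enough to sketch directly. The function names are illustrative, and the Box-Cox function takes lambda as a given parameter; estimating lambda from the data is not shown.

```python
import math
from statistics import mean, stdev

def dummy_variables(values):
    """One 0/1 column per category level (levels sorted alphabetically)."""
    levels = sorted(set(values))
    return levels, [[1 if v == lvl else 0 for lvl in levels] for v in values]

def center_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def box_cox(x, lam):
    """Box-Cox transform of a positive value x for a given lambda."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

levels, dummies = dummy_variables(["red", "blue", "red"])
scaled = center_and_scale([1.0, 2.0, 3.0])
```

In regression settings one dummy column is usually dropped to avoid perfect collinearity with the intercept.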

  19. data preparation patterns
    ● Feature Selection
    ○ Filter / Proxy Based
    ○ Subset Selection
    ○ Embedded
    ○ Dimensionality Reduction
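
A filter-based selection scores every feature on its own and keeps the best ones. The sketch below uses the absolute Pearson correlation with the target as the score, which is one common choice and an assumption here; the feature data and function names are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation; fails with ZeroDivisionError on a constant column."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def filter_select(features, target, k):
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

features = {
    "a": [1, 2, 3, 4],
    "b": [1, -1, 1, -1],
    "c": [2, 1, 4, 3],
}
target = [2, 4, 6, 8]
selected = filter_select(features, target, 2)
```

Because every feature is scored independently, a filter is cheap but cannot detect features that are only useful in combination, which is where subset selection and embedded methods come in.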

  20. modeling
    Source: CRISP-DM Process

  21. modeling
    ● Descriptive Modeling
    o density estimation
    o dependency modeling
    o cluster analysis / segmentation
    ● Predictive Modeling
    o Classification
    o Regression
    ● Pattern / Rule Discovery
    ● Content Based Retrieval
    ● Recommendation

  22. Evaluation
    ● Training/Test/Validation
    ● LOOCV / K-Fold CV
    ● Bootstrap
    ● Expected Value and Cost Benefit Matrix
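
Two of these evaluation patterns can be sketched briefly: index splits for K-fold cross-validation (LOOCV is the special case k = n) and the expected value of a classifier computed from a confusion matrix and a cost-benefit matrix. The numbers and function names are invented for illustration, and the splits are not shuffled.

```python
def k_fold_indices(n, k):
    """Return k (train, test) index splits over n observations."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits

def expected_value(confusion, cost_benefit):
    """Sum of outcome probability times outcome value over all matrix cells."""
    total = sum(sum(row) for row in confusion)
    return sum(
        count / total * value
        for counts, values in zip(confusion, cost_benefit)
        for count, value in zip(counts, values)
    )

splits = k_fold_indices(6, 3)

# Rows: predicted class, columns: actual class (hypothetical counts and values).
confusion = [[50, 10], [5, 35]]
cost_benefit = [[99, -1], [0, 0]]
value_per_case = expected_value(confusion, cost_benefit)
```

The expected value ties model quality to business impact: a classifier with lower accuracy can still be preferable if its errors fall into cheap cells.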

  23. Deployment
    Source: CRISP-DM Process

  24. Deployment Patterns
    ● PMML
    ● A/B Testing
    ● Canary Pattern
    ● (Visualizations)
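
For A/B testing and the canary pattern, the core mechanism is a stable assignment of users to model versions. One possible sketch, assuming string user ids and a hash-based split; the function name, variant labels and the 5% default are illustrative:

```python
import hashlib

def assign_variant(user_id, canary_fraction=0.05):
    """Deterministically route a fraction of users to the new model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"

variant = assign_variant("user-123")
```

Hashing the id instead of drawing a random number keeps each user on the same variant across requests, and raising `canary_fraction` step by step rolls the new model out gradually.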

  25. Credits

  26. end.
    valentin zacharias
    www.vzach.de
    [email protected]
