Slide 1

Slide 1 text

analytics design patterns Valentin Zacharias, 27.11.2014

Slide 2

Slide 2 text

A design pattern is a general reusable solution to a commonly occurring problem in the design of analytics solutions. It is a best practice template for how to solve a problem that can be used in many different situations. Patterns establish a language to organize best practices. Patterns establish a knowledge baseline. (partly adapted from Wikipedia’s definition of software design patterns)

Slide 3

Slide 3 text

motivation
● how to scale delivery of analytics services (beyond algorithms and tools)?
● how to organize analytics know-how and best practices?
● how to systematically bridge the gap between business needs and tools / algorithms?

Slide 4

Slide 4 text

goal (of this presentation)
● spread and discuss ideas on how to structure know-how and best practices in analytics
● present some interesting analytics design patterns that might be of use

Slide 5

Slide 5 text

me
● 13 years of experience in software-driven innovation
o AI researcher at FZI / PhD student at KIT
o Manager of a research division at FZI concerned with all aspects of information-driven decisions
o Big Data consultant with codecentric
o Analytics Consultant / Data Scientist with Daimler TSS

Slide 6

Slide 6 text

structure Generic CRISP-DM as organizational structure (for this talk)

Slide 7

Slide 7 text

business understanding Source: Crisp-DM Process

Slide 8

Slide 8 text

project patterns
● Data First: Find, understand and utilize patterns in a given data set
● Decision Driven: Identify and realize a concrete analytics use case
● Metrics Model: Create systems to understand and optimize a business's core drivers of growth
● Integration First: Break open data silos and harness integration

Slide 9

Slide 9 text

value patterns
● Data Driven Business
o Operational Excellence
o Customer Intimacy
o Product Leadership
● Analytics as Added Value
o For Capital Goods: No Unplanned Downtime, Asset Optimization, Enterprise Optimization
● Data & Analytics as Business

Slide 10

Slide 10 text

data understanding Source: Crisp-DM Process

Slide 11

Slide 11 text

data understanding patterns
● Data Source Visualization
● Event Log Exploration
● Query Based Data Extracts (with/without sampling)
● Analytical Record
● Reverse Pivot
● Data Audit

Slide 12

Slide 12 text

data source visualization
● Database structure visualization
● Colors:
o Blue = numeric
o Green = categorical
o Yellow = date/time
o Red = other
o White = missing

Slide 13

Slide 13 text

Analytical Record (1/2) Most data mining tools and methods demand that their input be structured around one kind of observation (rows) with a number of values (columns). The extract that represents the elements of interest (customer / product / process instance ...) in this way is called the Analytical Record.

Slide 14

Slide 14 text

Analytical Record (2/2)
● e.g. a feature vector representing a host's behaviour at a point in time, based on log data from IDS/firewall (for intrusion detection)
● e.g. a feature vector representing a customer's recent interactions with the company (calls to hotline, use of services), drawn from different transactional / log systems
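The customer example above can be sketched in a few lines. This is only an illustration: the event log, the system names and the function `build_analytical_record` are hypothetical and not taken from the talk. It turns a log with many rows per customer into one row per customer with one count column per event type.

```python
from collections import defaultdict

# Hypothetical event log: (customer_id, source_system, event_type)
events = [
    ("c1", "hotline", "call"),
    ("c1", "portal", "login"),
    ("c1", "hotline", "call"),
    ("c2", "portal", "login"),
]


def build_analytical_record(events):
    """Return one row per customer: a dict mapping feature name -> event count."""
    # Every (system, event) combination seen in the log becomes a column.
    features = sorted({f"{system}_{event}" for _, system, event in events})
    rows = defaultdict(lambda: dict.fromkeys(features, 0))
    for customer, system, event in events:
        rows[customer][f"{system}_{event}"] += 1
    return dict(rows)


records = build_analytical_record(events)
for customer, row in sorted(records.items()):
    print(customer, row)
```

Each resulting row has the same columns, so customers without a given event still get an explicit zero rather than a missing field.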

Slide 15

Slide 15 text

data preparation Source: Crisp-DM Process

Slide 16

Slide 16 text

data preparation patterns
● handling outliers
o Winsorization
o Trimming
o Spatial Sign
● data weighting and balancing
o Case Weighting
o Oversampling / Undersampling
o Partitioning
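The three outlier patterns differ in what happens to extreme values: winsorization clips them to a cut-off, trimming drops them, and the spatial sign projects each observation onto the unit sphere. A minimal sketch, assuming simple nearest-rank cut-offs at the 10% and 90% positions (the helper names and the cut-off rule are illustrative choices, not from the talk):

```python
import math


def _cutoffs(values, lower_q, upper_q):
    """Nearest-rank cut-offs; a simplification of proper quantile estimation."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower_q * last)], ordered[round(upper_q * last)]


def winsorize(values, lower_q=0.1, upper_q=0.9):
    """Clip values outside the cut-offs to the cut-off values."""
    lo, hi = _cutoffs(values, lower_q, upper_q)
    return [min(max(v, lo), hi) for v in values]


def trim(values, lower_q=0.1, upper_q=0.9):
    """Drop values outside the cut-offs."""
    lo, hi = _cutoffs(values, lower_q, upper_q)
    return [v for v in values if lo <= v <= hi]


def spatial_sign(row):
    """Scale one observation (a list of features) to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm else list(row)


values = [-50, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(winsorize(values))
print(trim(values))
print(spatial_sign([3.0, 4.0]))
```

The spatial sign is usually applied after centering and scaling the features, since otherwise the feature with the largest range dominates the norm.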

Slide 17

Slide 17 text

data preparation patterns
● handling missing values
o Case Deletion
o Available Case Analysis
o Single Imputation: Mean / Reasonable Value, Regression
o Multiple Imputation: Regression
o Maximum Likelihood
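Two of the simpler patterns from this list, case deletion and single imputation by the mean, can be sketched as follows. The sample rows and function names are made up for illustration, and missing values are assumed to be encoded as `None`.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 30, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 50, "income": None},
    {"age": 40, "income": 60000},
]


def case_deletion(rows):
    """Keep only the rows in which every value is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_imputation(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = list(rows[0].keys())
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [
        {c: (r[c] if r[c] is not None else means[c]) for c in columns}
        for r in rows
    ]


complete = case_deletion(rows)
imputed = mean_imputation(rows)
print(len(complete), "complete rows")
print(imputed)
```

Case deletion discards half of this small sample, while mean imputation keeps every row but shrinks the variance of the imputed columns, which is one reason the list also includes multiple imputation and maximum likelihood.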

Slide 18

Slide 18 text

data preparation patterns
● data transformation
o Generalization Abstraction
o Temporal Abstraction
o Rates
o Dummy Variables for Categories
o Centering and Scaling
o Polynomials
o Interaction Terms
o Box-Cox Transformation
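Three of these transformations are short enough to show directly. The following is a sketch with hypothetical function names; the Box-Cox parameter `lam` is passed in by hand here, whereas in practice it is usually estimated from the data.

```python
import math
from statistics import mean, stdev


def dummy_variables(values):
    """One 0/1 indicator column per category (full set, no reference level dropped)."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def center_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or log(x) when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


print(dummy_variables(["red", "blue", "red"]))
print(center_and_scale([1, 2, 3]))
print(box_cox([1, 4, 9], 0.5))
```

For linear models with an intercept, one of the dummy columns is typically dropped to avoid perfect collinearity.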

Slide 19

Slide 19 text

data preparation patterns
● Feature Selection
○ Filter / Proxy Based
○ Subset Selection
○ Embedded
○ Dimensionality Reduction
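As one possible reading of the filter-based pattern: score each feature independently of any model and keep the top k. The sketch below uses the absolute Pearson correlation with the target as the score; both the scoring choice and the names `pearson` and `filter_select` are assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def filter_select(features, target, k):
    """Return the names of the k features most correlated with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


features = {
    "a": [1, 2, 3, 4],
    "noise": [2, 1, 2, 1],
    "const": [5, 5, 5, 5],
}
target = [2, 4, 6, 8]
print(filter_select(features, target, k=2))
```

A filter like this is cheap but looks at one feature at a time, so it misses features that are only useful in combination; that is the gap the subset selection and embedded approaches address.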

Slide 20

Slide 20 text

modeling Source: Crisp-DM Process

Slide 21

Slide 21 text

modeling
● Descriptive Modeling
o Density Estimation
o Dependency Modeling
o Cluster Analysis / Segmentation
● Predictive Modeling
o Classification
o Regression
● Pattern / Rule Discovery
● Content Based Retrieval
● Recommendation

Slide 22

Slide 22 text

Evaluation
● Training/Test/Validation
● LOOCV / K-Fold CV
● Bootstrap
● Expected Value and Cost Benefit Matrix
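Two of these evaluation patterns reduce to little code. The sketch below generates k-fold index splits (LOOCV is the special case where k equals the number of observations) and computes the expected value per case from a confusion matrix and a cost-benefit matrix. The function names and the example figures are hypothetical, and the split does not shuffle, which real data usually needs.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n observations."""
    indices = list(range(n))
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def expected_value(confusion_counts, cost_benefit):
    """Sum over outcomes of P(outcome) * value(outcome), per scored case."""
    total = sum(confusion_counts.values())
    return sum(
        confusion_counts[outcome] / total * cost_benefit[outcome]
        for outcome in confusion_counts
    )


for train, test in k_fold_indices(5, 2):
    print("train", train, "test", test)

# Hypothetical campaign: a true positive earns 100, a false positive costs 10.
counts = {"tp": 10, "fp": 10, "fn": 5, "tn": 75}
values = {"tp": 100, "fp": -10, "fn": 0, "tn": 0}
print(expected_value(counts, values))
```

Framing the result as expected value per case makes models comparable in business terms rather than by accuracy alone.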

Slide 23

Slide 23 text

Deployment Source: Crisp-DM Process

Slide 24

Slide 24 text

Deployment Patterns
● PMML
● A/B Testing
● Canary Pattern
● (Visualizations)
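A/B testing and the canary pattern both need a way to send a stable share of traffic to a second model. One common way to do this, sketched here under the assumption that every request carries a string identifier, is deterministic hash-based bucketing; the function name `route` and the 5% default are illustrative only.

```python
import hashlib


def route(request_id, canary_fraction=0.05):
    """Assign a request to 'canary' or 'stable' based on a hash of its identifier."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000  # bucket in 0..9999
    return "canary" if bucket < canary_fraction * 10_000 else "stable"


assignments = [route(f"user-{i}") for i in range(10_000)]
share = assignments.count("canary") / len(assignments)
print(f"canary share: {share:.3f}")
```

Because the assignment depends only on the identifier, the same user always sees the same model variant, and the canary share can be raised step by step by changing a single parameter.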

Slide 25

Slide 25 text

Credits

Slide 26

Slide 26 text

end. valentin zacharias www.vzach.de