A design pattern is a general reusable solution to a commonly occurring problem in the design of analytics solutions. It is a best practice template for how to solve a problem that can be used in many different situations. Patterns establish a language to organize best practices. Patterns establish a knowledge baseline. (partly adapted from Wikipedia’s definition of software design patterns)
motivation ● How to scale delivery of analytics services (beyond algorithms and tools)? ● How to organize analytics know-how and best practices? ● How to systematically bridge the gap between business needs and tools / algorithms?
goal (of this presentation) ● Spread and discuss ideas on how to structure know-how and best practices in analytics ● Present some interesting analytics design patterns that might be of use
me ● 13 years of experience in software-driven innovation o AI researcher at FZI / PhD student at KIT o Manager of a research division at FZI concerned with all aspects of information-driven decisions o Big Data consultant with codecentric o Analytics Consultant / Data Scientist with Daimler TSS
project patterns ● Data First: Find, understand and utilize patterns in a given data set ● Decision Driven: Identify and realize a concrete analytics use case ● Metrics Model: Create systems to understand and optimize a business's core drivers of growth ● Integration First: Break open data silos and harness integration
value patterns ● Data Driven Business o Operational Excellence o Customer Intimacy o Product Leadership ● Analytics as Added Value o For Capital Goods: No Unplanned Downtime, Asset Optimization, Enterprise Optimization ● Data & Analytics as Business
data understanding patterns ● Data Source Visualization ● Event Log Exploration ● Query Based Data Extracts (with/without sampling) ● Analytical Record ● Reverse Pivot ● Data Audit
data source visualization ● Database structure visualization ● Colors: o Blue = numeric o Green = categoric o Yellow = date/time o Red = other o White = missing
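The colour legend above can be sketched as a small per-cell lookup. This is only an illustration: the function name `cell_color`, the decision to colour individual cells rather than whole columns, and the rule that strings and booleans count as categoric are all assumptions, not part of the original slides.

```python
from datetime import date, datetime
from numbers import Number

def cell_color(value):
    """Map a single cell value to a colour of the legend (hypothetical helper)."""
    if value is None:
        return "white"   # missing
    if isinstance(value, (str, bool)):
        return "green"   # categoric (assumption: strings and booleans)
    if isinstance(value, Number):
        return "blue"    # numeric
    if isinstance(value, (date, datetime)):
        return "yellow"  # temporal values
    return "red"         # anything else

# Hypothetical table row used only for illustration.
row = {"id": 17, "segment": "retail", "signup": date(2015, 3, 1), "score": None}
row_colors = {column: cell_color(v) for column, v in row.items()}
```

Rendering the resulting colour grid (for example as an image with one pixel per cell) gives a quick overview of type mix and missingness per table.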
Analytical Record (1/2) Most data mining tools and methods demand that their input be structured around one kind of observation (rows) with a number of values (columns). The extract that represents the elements of interest (customer / product / process instance, ...) in this way is called the Analytical Record.
Analytical Record (2/2) ● e.g. a feature vector representing a host's behaviour at a point in time, based on log data from IDS/Firewall (for intrusion detection) ● e.g. a feature vector representing a customer's recent interactions with the company (calls to hotline, use of services), drawn from different transactional / log systems
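A minimal sketch of the customer example: a long-format event log is aggregated into one row per customer. The event fields (`customer`, `channel`, `duration_min`) and the chosen features are hypothetical and serve only to show the shape of an analytical record.

```python
from collections import defaultdict

# Hypothetical event log: one entry per customer interaction (long format).
events = [
    {"customer": "c1", "channel": "hotline", "duration_min": 12},
    {"customer": "c1", "channel": "service", "duration_min": 3},
    {"customer": "c2", "channel": "hotline", "duration_min": 7},
    {"customer": "c1", "channel": "hotline", "duration_min": 5},
]

def build_analytical_record(events):
    """Aggregate events into one feature row per customer."""
    records = defaultdict(
        lambda: {"hotline_calls": 0, "service_uses": 0, "total_minutes": 0}
    )
    for event in events:
        row = records[event["customer"]]
        if event["channel"] == "hotline":
            row["hotline_calls"] += 1
        elif event["channel"] == "service":
            row["service_uses"] += 1
        row["total_minutes"] += event["duration_min"]
    return dict(records)

analytical_record = build_analytical_record(events)
```

Each key of `analytical_record` is one observation (row) and each feature is one column, which is the layout most modeling tools expect.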
data preparation patterns ● handling outliers o Winsorization o Trimming o Spatial Sign ● data weighting and balancing o Case Weighting o Oversampling / Undersampling o Partitioning
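Two of the outlier patterns above, winsorization and trimming, can be sketched as follows. The percentile cut-offs, the linear-interpolation percentile helper and all function names are assumptions chosen for illustration; libraries differ in how they define the cut points.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def winsorize(values, lower=0.05, upper=0.95):
    """Clamp values outside the [lower, upper] percentiles to the cut-offs."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]

def trim(values, lower=0.05, upper=0.95):
    """Drop values outside the [lower, upper] percentiles."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [v for v in values if lo_cut <= v <= hi_cut]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
winsorized = winsorize(data, lower=0.1, upper=0.9)  # keeps all 10 cases
trimmed = trim(data, lower=0.1, upper=0.9)          # removes the extremes
```

Winsorization keeps the number of cases constant and only limits extreme values, whereas trimming shrinks the data set; which one fits depends on whether the extreme cases are considered errors or valid but influential observations.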
data preparation patterns ● handling missing values o Case Deletion o Available Case Analysis o Single Imputation: Mean / Reasonable Value, Regression o Multiple Imputation: Regression o Maximum Likelihood
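As a rough illustration of the two simplest patterns in this list, the sketch below contrasts case deletion with single (mean) imputation. The sample rows and function names are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 30, "income": 4000},
    {"age": None, "income": 5000},
    {"age": 50, "income": None},
    {"age": 40, "income": 4800},
]

def case_deletion(rows):
    """Keep only the rows without any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def mean_imputation(rows):
    """Replace each missing value with the mean of the observed column values."""
    columns = list(rows[0].keys())
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [
        {c: (r[c] if r[c] is not None else means[c]) for c in columns}
        for r in rows
    ]

complete_cases = case_deletion(rows)
imputed = mean_imputation(rows)
```

Case deletion discards half of this small sample, while mean imputation keeps every row but understates the variance of the imputed columns, which is one reason the list also names regression-based and multiple imputation.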
data preparation patterns ● data transformation o Generalization Abstraction o Temporal Abstraction o Rates o Dummy Variables for Categories o Centering and Scaling o Polynomials o Interaction Terms o Box-Cox Transformation
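Two of the transformations above, centering and scaling and dummy variables for categories, can be sketched in a few lines. The function names and the `is_<level>` column naming are assumptions; the dummy coding shown creates one indicator per level, whereas regression models often drop one level as a reference.

```python
from statistics import mean, stdev

def center_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation (z-scores)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def dummy_variables(categories):
    """Encode each category as a set of 0/1 indicator columns, one per level."""
    levels = sorted(set(categories))
    return [{f"is_{lvl}": int(c == lvl) for lvl in levels} for c in categories]

z_scores = center_and_scale([2.0, 4.0, 6.0])
dummies = dummy_variables(["red", "blue", "red"])
```

After centering and scaling, the values have mean 0 and sample standard deviation 1, which puts variables measured in different units on a comparable footing.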
modeling ● Descriptive Modeling o density estimation o dependency modeling o cluster analysis / segmentation ● Predictive Modeling o Classification o Regression ● Pattern / Rule Discovery ● Content Based Retrieval ● Recommendation